.. meta::
  :description: This page describes the call stack concept in HIP
  :keywords: AMD, ROCm, HIP, call stack

*******************************************************************************
Call stack
*******************************************************************************

The call stack is a data structure for managing function calls by saving the
state of the current function. Each time a function is called, a new call frame
is added to the top of the stack, containing information such as local
variables, return addresses, and function parameters. When the function
execution completes, its frame is removed from the stack and the saved state is
restored into the corresponding registers. This allows the program to return to
the calling function and continue execution from where it left off.

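The push and pop behavior of call frames can be illustrated with a minimal
host-side sketch that replaces recursion with an explicit stack. The ``Frame``
struct and the function name are hypothetical and exist only for this
illustration. Real call frames are laid out by the compiler and also hold
return addresses and saved registers.

.. code-block:: cpp

  #include <cstdint>
  #include <iostream>
  #include <vector>

  // Hypothetical frame layout, for illustration only: a real frame also
  // stores the return address and saved registers.
  struct Frame {
      std::uint64_t n;  // the function parameter saved for this call
  };

  // Computes n! with an explicit stack instead of recursive calls.
  std::uint64_t factorial_with_explicit_stack(std::uint64_t n) {
      std::vector<Frame> stack;

      // "Call" phase: each nested call pushes a new frame onto the stack.
      while (n > 1) {
          stack.push_back(Frame{n});
          --n;
      }

      // "Return" phase: frames are popped in reverse order, and each caller
      // continues with the result of its callee.
      std::uint64_t result = 1;
      while (!stack.empty()) {
          result *= stack.back().n;
          stack.pop_back();
      }
      return result;
  }

  int main() {
      std::cout << factorial_with_explicit_stack(5) << '\n';  // prints 120
      return 0;
  }

The deepest point of the "call" phase determines how much stack memory is
needed, which is why deep call chains need a larger stack.
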
The call stack for each thread must track its function calls, local variables,
and return addresses. In GPU programming, however, the total memory required
to store call stacks grows with the parallelism inherent to GPUs. NVIDIA and
AMD GPUs use different approaches. NVIDIA GPUs have the independent thread
scheduling feature, where each thread has its own call stack and effective
program counter. On AMD GPUs threads are grouped; each warp has its own call
stack and program counter. Warps are described and explained in
:ref:`inherent_thread_model`.

If a thread or warp exceeds its stack size, a stack overflow occurs, causing
kernel failure. This can be detected using debuggers.

Call stack management with HIP
===============================================================================

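You can adjust the call stack size as shown in the following example, allowing
fine-tuning based on specific kernel requirements. In the HIP runtime API, the
stack size is the ``hipLimitStackSize`` limit, which can be read with
``hipDeviceGetLimit`` and changed with ``hipDeviceSetLimit``. Adjusting it
helps prevent stack overflow errors by ensuring sufficient stack memory is
allocated.
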
.. literalinclude:: ../../tools/example_codes/call_stack_management.cpp
  :start-after: // [sphinx-start]
  :end-before: // [sphinx-end]
  :language: cpp

Depending on the GPU model, the call stack can consume a significant amount of
memory at full occupancy. For instance, an MI300X with 304 compute units (CUs)
and up to 2048 threads per CU could use 304 · 2048 · 1024 bytes = 608 MiB for
the call stack by default.

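The arithmetic above can be reproduced with a short host-side sketch, which is
also handy for estimating the cost of a larger stack size. The helper name
``total_stack_bytes`` is hypothetical. The figures are the ones quoted above,
and the result is an upper bound that assumes every thread slot is occupied.

.. code-block:: cpp

  #include <cstdint>
  #include <iostream>

  // Upper bound on call stack memory at full occupancy:
  // compute units x resident threads per CU x stack bytes per thread.
  constexpr std::uint64_t total_stack_bytes(std::uint64_t compute_units,
                                            std::uint64_t threads_per_cu,
                                            std::uint64_t stack_bytes_per_thread) {
      return compute_units * threads_per_cu * stack_bytes_per_thread;
  }

  int main() {
      // MI300X figures from the text: 304 CUs, 2048 threads per CU,
      // 1024 bytes of stack per thread.
      constexpr std::uint64_t bytes = total_stack_bytes(304, 2048, 1024);
      std::cout << bytes / (1024 * 1024) << " MiB\n";  // prints "608 MiB"
      return 0;
  }

Doubling the per-thread stack size to 2048 bytes doubles the estimate to
1216 MiB, so larger stack sizes should be chosen with the available device
memory in mind.
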
Handling recursion and deep function calls
-------------------------------------------------------------------------------

Similar to CPU programming, recursive functions and deeply nested function
calls are supported. However, developers must ensure that these functions do
not exceed the available stack memory, considering the large amount of memory
needed for the call stack due to the GPU's inherent parallelism. This can be
achieved by increasing the stack size or by optimizing code to reduce stack
usage. To detect stack overflows, add proper error handling or use debugging
tools.

.. literalinclude:: ../../tools/example_codes/device_recursion.hip
  :start-after: // [sphinx-start]
  :end-before: // [sphinx-end]
  :language: cpp

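One way to reduce stack usage is to rewrite a recursive function as a loop.
The following is a minimal sketch in plain C++ with hypothetical function
names. In HIP device code both functions would additionally carry the
``__device__`` qualifier. The compiler may optimize simple recursion on its
own, so the difference described in the comments is conceptual.

.. code-block:: cpp

  #include <cstdint>
  #include <iostream>

  // Recursive version: conceptually, every level of recursion needs its own
  // stack frame, so stack usage grows linearly with n.
  std::uint64_t sum_recursive(std::uint64_t n) {
      return n == 0 ? 0 : n + sum_recursive(n - 1);
  }

  // Iterative version: a single frame is enough regardless of n.
  std::uint64_t sum_iterative(std::uint64_t n) {
      std::uint64_t total = 0;
      for (std::uint64_t i = 1; i <= n; ++i) {
          total += i;
      }
      return total;
  }

  int main() {
      // Both versions compute 1 + 2 + ... + 100.
      std::cout << sum_recursive(100) << ' ' << sum_iterative(100) << '\n';  // prints "5050 5050"
      return 0;
  }

When a recursive algorithm cannot easily be rewritten this way, increasing the
stack size remains the alternative.
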