There are some techniques provided in HIP for developers to trace and debug codes during execution, this section describes some details and practical suggestions on debugging.
Table of Contents
=================
* [ Debugging Tools](#debugging-tools)
* [Using ltrace](#using-ltrace)
* [Using ROCgdb](#using-rocgdb)
* [Other Debugging Tools](#Other-debugging-tools)
* [ Debugging HIP Application](#debugging-hip-application)
* [HSA related environment variables](#HSA-related-environment-variables)
* [ General Debugging Tips](#general-debugging-tips)
## Debugging tools
### Using ltrace
ltrace is a standard linux tool which provides a message to stderr on every dynamic library call.
Since ROCr and the ROCt (the ROC thunk, which is the thin user-space interface to the ROC kernel driver) are both dynamic libraries, this provides an easy way to trace the activity in these libraries.
Tracing can be a powerful way to quickly observe the flow of the application before diving into the details with a command-line debugger.
ltrace is a helpful tool to visualize the runtime behavior of the entire ROCm software stack.
The trace can also show performance issues related to accidental calls to expensive API calls on the critical path.
Here's a simple sample with command-line to trace hip APIs and output:
HIP developers on ROCm can use AMD's ROCgdb for debugging and profiling.
ROCgdb is the ROCm source-level debugger for Linux, based on GDB, the GNU source-level debugger, equivalent of cuda-gdb, can be used with debugger frontends, such as eclipse, vscode, or gdb-dashboard.
For details, see (https://github.com/ROCm-Developer-Tools/ROCgdb).
Below is a sample how to use ROCgdb run and debug HIP application, rocgdb is installed with ROCM package in the folder /opt/rocm/bin.
```
$ export PATH=$PATH:/opt/rocm/bin
$ rocgdb ./hipTexObjPitch
GNU gdb (rocm-dkms-no-npi-hipclang-6549) 10.1
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
Below is an example to show how to get useful information from the debugger while running a simple memory copy test, which caused an issue of segmentation fault.
| AMD_LOG_LEVEL <br><sub> Enable HIP log on different Level. </sub> | 0 | 0: Disable log. <br> 1: Enable log on error level. <br> 2: Enable log on warning and below levels. <br> 0x3: Enable log on information and below levels. <br> 0x4: Decode and display AQL packets. |
| AMD_LOG_MASK <br><sub> Enable HIP log on different Level. </sub> | 0x7FFFFFFF | 0x1: Log API calls. <br> 0x02: Kernel and Copy Commands and Barriers. <br> 0x4: Synchronization and waiting for commands to finish. <br> 0x8: Enable log on information and below levels. <br> 0x20: Queue commands and queue contents. <br> 0x40:Signal creation, allocation, pool. <br> 0x80: Locks and thread-safety code. <br> 0x100: Copy debug. <br> 0x200: Detailed copy debug. <br> 0x400: Resource allocation, performance-impacting events. <br> 0x800: Initialization and shutdown. <br> 0x1000: Misc debug, not yet classified. <br> 0x2000: Show raw bytes of AQL packet. <br> 0x4000: Show code creation debug. <br> 0x8000: More detailed command info, including barrier commands. <br> 0x10000: Log message location. <br> 0xFFFFFFFF: Log always even mask flag is zero. |
| HIP_VISIBLE_DEVICES <br><sub> Only devices whose index is present in the sequence are visible to HIP. </sub> | | 0,1,2: Depending on the number of devices on the system. |
| AMD_SERIALIZE_KERNEL <br><sub> Serialize kernel enqueue. </sub> | 0 | 1: Wait for completion before enqueue. <br> 2: Wait for completion after enqueue. <br> 3: Both. |
| AMD_SERIALIZE_COPY <br><sub> Serialize copies. </sub> | 0 | 1: Wait for completion before enqueue. <br> 2: Wait for completion after enqueue. <br> 3: Both. |
| HIP_HOST_COHERENT <br><sub> Coherent memory in hipHostMalloc. </sub> | 0 | 0: memory is not coherent between host and GPU. <br> 1: memory is coherent with host. |
- From inside GDB, you can set environment variables "set env". Note the command does not use an '=' sign:
```
(gdb) set env AMD_SERIALIZE_KERNEL 3
```
- The fault will be caught by the runtime but was actually generated by an asynchronous command running on the GPU. So, the GDB backtrace will show a path in the runtime.
- To determine the true location of the fault, force the kernels to execute synchronously by seeing the environment variables AMD_SERIALIZE_KERNEL=3 AMD_SERIALIZE_COPY=3. This will force HIP runtime to wait for the kernel to finish executing before retuning. If the fault occurs during the execution of a kernel, you can see the code which launched the kernel inside the backtrace. A bit of guesswork is required to determine which thread is actually causing the issue - typically it will the thread which is waiting inside the libhsa-runtime64.so.
- VM faults inside kernels can be caused by:
- incorrect code (ie a for loop which extends past array boundaries),
- memory issues - kernel arguments which are invalid (null pointers, unregistered host pointers, bad pointers),
- synchronization issues,
- compiler issues (incorrect code generation from the compiler),