rocm-systems/projects/rocm-core/rdhc
solaiys 8912930840 [rocm-core] Adding a tool for ROCM Deployment Health Check (#958)
* Adding a tool for ROCM Deployment Health Check

rdhc.py - This simple tool will check for the rocm
installation and its readiness on the current system and its working status.
Check the README file for more info.

Signed-off-by: Saravanan Solaiyappan <saravanan.solaiyappan@amd.com>
2025-09-24 22:43:42 +05:30

rdhc

ROCm Deployment Health Check Tool

Features of the ROCm Deployment Health Check Tool

  1. Cross-Platform Support: Works on Ubuntu, RHEL, and SLES distributions
  2. Comprehensive Testing:
    • Default tests (GPU presence, driver status, rocminfo, rocm-smi)
    • Library dependency verification
    • Checks for the presence of selected kernel parameters and environment variables
    • Component-specific tests
      • Dynamically builds and runs the test programs from the rocm-examples git repository.
  3. Dynamic Component Detection: Identifies installed ROCm components using distribution-specific package manager commands
  4. Flexible Reporting:
    • Pretty table output for terminal display
    • JSON export for further analysis or integration
  5. Configurable Verbosity: Through command-line options (-v for verbose, -s for silent)
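As an illustration of feature 3, here is a minimal sketch of how a package-listing command could be chosen per distribution. It is not taken from rdhc.py: the names PACKAGE_LIST_COMMANDS, parse_os_release, and package_list_command are hypothetical, and the use of the ID field of /etc/os-release with dpkg-query and rpm is an assumption about one reasonable approach.

```python
import shlex

# Hypothetical mapping from the os-release ID field to a command that
# lists installed package names, one per line. Not rdhc.py's actual table.
PACKAGE_LIST_COMMANDS = {
    "ubuntu": ["dpkg-query", "-W", "-f=${Package}\\n"],
    "rhel": ["rpm", "-qa", "--qf", "%{NAME}\\n"],
    "sles": ["rpm", "-qa", "--qf", "%{NAME}\\n"],
}


def parse_os_release(text):
    """Parse the KEY=VALUE lines of an os-release file into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # shlex strips the optional shell-style quoting around values.
        parts = shlex.split(value)
        info[key] = parts[0] if parts else ""
    return info


def package_list_command(os_release_text):
    """Return the package-listing command for the detected distribution."""
    distro = parse_os_release(os_release_text).get("ID", "").lower()
    if distro not in PACKAGE_LIST_COMMANDS:
        raise ValueError(f"unsupported distribution: {distro!r}")
    return PACKAGE_LIST_COMMANDS[distro]
```

A caller would read /etc/os-release, pass its text to package_list_command(), run the returned command with subprocess.run(), and filter the output for ROCm package names.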

Install dependency pip packages

sudo pip3 install -r requirements.txt

Usage

./rdhc.py -h
usage: sudo -E rdhc.py [options]

ROCm Deployment Health Check Tool

optional arguments:
  -h, --help            show this help message and exit
  --quick               Run quick tests only (default)
  --all                 Default tests + Compile and executes simple program for each component.
  -v, --verbose         Enable verbose output
  -s, --silent          Silent mode (errors only)
  -j FILE, --json FILE  Export results to JSON file
  -d DIR, --dir DIR     Directory path for temporary files (default: /tmp/rdhc/)

Usage examples:
# Run quick test (default tests only)
sudo -E ./rdhc.py

# Run all tests, including compiling and executing the rocm-examples program for each component
sudo -E ./rdhc.py --all

# Run all tests with verbose output
sudo -E ./rdhc.py --all -v

# Enable verbose output
sudo -E ./rdhc.py -v

# Run in silent mode (only errors shown)
sudo -E ./rdhc.py -s

# Export results to a specific JSON file
sudo -E ./rdhc.py --all --json rdhc-results.json

# Specify a directory for temp files and logs (default: /tmp/rdhc/)
sudo -E ./rdhc.py -d /home/user/rdhc-dir/
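For readers who want to wrap or extend the command line, the options in the help output above could be declared with argparse roughly as follows. This is a sketch reconstructed from the help text only, not the tool's source: build_parser is a hypothetical name, and how rdhc.py resolves --quick against --all is not shown in the help, so both are plain flags here.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented rdhc.py options (a sketch)."""
    parser = argparse.ArgumentParser(
        prog="rdhc.py",
        usage="sudo -E rdhc.py [options]",
        description="ROCm Deployment Health Check Tool",
    )
    parser.add_argument("--quick", action="store_true",
                        help="Run quick tests only (default)")
    parser.add_argument("--all", action="store_true",
                        help="Default tests plus a compile-and-run test per component")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Enable verbose output")
    parser.add_argument("-s", "--silent", action="store_true",
                        help="Silent mode (errors only)")
    parser.add_argument("-j", "--json", metavar="FILE",
                        help="Export results to JSON file")
    parser.add_argument("-d", "--dir", metavar="DIR", default="/tmp/rdhc/",
                        help="Directory path for temporary files")
    return parser
```

For example, build_parser().parse_args(["--all", "--json", "rdhc-results.json"]) yields a namespace with all set to True, json set to "rdhc-results.json", and dir left at its documented default.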

RDHC Environment Variables

The RDHC tool reads the following environment variables and adjusts its behavior when they are set.

# The ROCm installation path can be set with the environment variable below. The default is "/opt/rocm/".
export ROCM_PATH="/opt/rocm"

# For library dependency validation, the library search depth can be set with the environment variable below.
# The default is full depth: all library files under ROCM_PATH/lib/ are checked recursively.
export LIBDIR_MAX_DEPTH=""

# To check only the libraries directly inside ROCM_PATH/lib/, set the depth to 1.
export LIBDIR_MAX_DEPTH=1
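The effect of these two variables can be sketched in Python as below. This is an assumed implementation, not rdhc.py's own code: read_settings and find_shared_libs are hypothetical names, and matching files by a ".so" suffix or ".so." infix is a simplification of whatever the tool treats as a library file.

```python
import os


def read_settings(environ=os.environ):
    """Return (rocm_path, max_depth); max_depth None means full depth."""
    rocm_path = environ.get("ROCM_PATH") or "/opt/rocm"
    raw_depth = environ.get("LIBDIR_MAX_DEPTH", "").strip()
    max_depth = int(raw_depth) if raw_depth else None
    return rocm_path, max_depth


def find_shared_libs(lib_dir, max_depth=None):
    """List shared libraries under lib_dir, descending at most max_depth levels.

    Depth 1 is lib_dir itself, so max_depth=1 ignores all subdirectories.
    """
    lib_dir = os.path.normpath(lib_dir)
    base_depth = lib_dir.count(os.sep)
    found = []
    for root, dirs, files in os.walk(lib_dir):
        depth = root.count(os.sep) - base_depth + 1
        if max_depth is not None and depth >= max_depth:
            dirs[:] = []  # prune in place so os.walk stops descending
        for name in files:
            # Assumed filter: libfoo.so and versioned names like libfoo.so.1
            if name.endswith(".so") or ".so." in name:
                found.append(os.path.join(root, name))
    return sorted(found)
```

With LIBDIR_MAX_DEPTH unset or empty, read_settings() returns None for the depth and find_shared_libs() walks the whole tree; with LIBDIR_MAX_DEPTH=1 only files directly in ROCM_PATH/lib/ are returned.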

The tool is designed to be easily extended with additional component tests by adding new test methods following the naming convention test_check_component_name().
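One way such a naming convention can drive test discovery is sketched below. The class HealthChecks, its placeholder methods, and the function discover_component_tests are hypothetical and only illustrate the idea; the way rdhc.py actually collects its test methods may differ.

```python
class HealthChecks:
    """Hypothetical container for component tests; not the tool's real class."""

    def test_check_rocminfo(self):
        # Placeholder body; a real test would run and validate rocminfo.
        return True

    def test_check_example_component(self):
        # Placeholder for a newly added component test.
        return True

    def helper(self):
        # Ignored by discovery because it lacks the test_check_ prefix.
        return None


def discover_component_tests(obj, prefix="test_check_"):
    """Map component names to bound methods whose names start with prefix."""
    tests = {}
    for name in sorted(dir(obj)):
        attr = getattr(obj, name)
        if name.startswith(prefix) and callable(attr):
            tests[name[len(prefix):]] = attr
    return tests
```

Under this scheme, adding a method named test_check_example_component() is enough for it to be picked up; no registration list needs to be edited.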