8912930840
* Add rdhc.py, a ROCm Deployment Health Check tool. This simple tool checks the ROCm installation, its readiness on the current system, and its working status. See the README file for more information. Signed-off-by: Saravanan Solaiyappan <saravanan.solaiyappan@amd.com>
rdhc
ROCm Deployment Health Check Tool
Features of the ROCm Deployment Health Check Tool
- Cross-Platform Support: Works on Ubuntu, RHEL, and SLES distributions
- Comprehensive Testing:
  - Default tests (GPU presence, driver status, rocminfo, rocm-smi)
  - Library dependency verification
  - Checks of selected kernel parameters and of the presence of environment variables
  - Component-specific tests
  - Dynamic build and execution of test programs from the rocm-examples Git repository
- Dynamic Component Detection: Identifies installed ROCm components using distribution-specific package manager commands
- Flexible Reporting:
  - Pretty table output for terminal display
  - JSON export for further analysis or integration
- Configurable Verbosity: Through command-line options (-v for verbose, -s for silent)
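The README does not say which commands rdhc.py runs for Dynamic Component Detection. As a rough illustration only, the sketch below assumes the distribution is identified from the ID and ID_LIKE fields of /etc/os-release, that package names are listed with dpkg-query on Ubuntu and rpm on RHEL/SLES, and that ROCm packages can be recognized by a hypothetical prefix list (ROCM_PREFIXES). None of these helper names come from the tool itself.

```python
# Hypothetical sketch: choose a package-query command per distribution and
# filter ROCm package names. Not taken from rdhc.py.

# Assumed name prefixes used to recognize ROCm packages (illustrative only).
ROCM_PREFIXES = ("roc", "hip", "miopen", "rccl")


def parse_os_release(text):
    """Parse KEY=value lines (as in /etc/os-release) into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"')
    return info


def package_query_command(os_release):
    """Return an argv list that prints one package name per line."""
    ids = {os_release.get("ID", "")} | set(os_release.get("ID_LIKE", "").split())
    if ids & {"ubuntu", "debian"}:
        # Debian family (Ubuntu): dpkg-query expands the \n escape itself.
        return ["dpkg-query", "--show", "--showformat=${Package}\\n"]
    if ids & {"rhel", "centos", "fedora", "sles", "suse"}:
        # RPM family (RHEL, SLES): rpm expands the \n escape in its query format.
        return ["rpm", "-qa", "--queryformat", "%{NAME}\\n"]
    raise ValueError(f"unsupported distribution: {sorted(ids)}")


def rocm_packages(package_names):
    """Keep only names that start with one of the assumed ROCm prefixes."""
    return sorted({name for name in package_names if name.startswith(ROCM_PREFIXES)})
```

On a real system you would read /etc/os-release, run the returned command with subprocess.run(cmd, capture_output=True, text=True), and pass the output's splitlines() to rocm_packages().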
Install dependency pip packages
sudo pip3 install -r requirements.txt
Usage
./rdhc.py -h
usage: sudo -E rdhc.py [options]
ROCm Deployment Health Check Tool
optional arguments:
  -h, --help            show this help message and exit
  --quick               Run quick tests only (default)
  --all                 Default tests + Compile and executes simple program for each component.
  -v, --verbose         Enable verbose output
  -s, --silent          Silent mode (errors only)
  -j FILE, --json FILE  Export results to JSON file
  -d DIR, --dir DIR     Directory path for temporary files (default: /tmp/rdhc/)
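For readers extending the tool, here is a minimal argparse sketch that accepts the same options as the help text above. It is not taken from rdhc.py: the mutually exclusive groups and the mapping of -v/-s to logging levels are assumptions.

```python
import argparse
import logging


def build_parser():
    """Parser mirroring the documented rdhc.py options (sketch, not the original)."""
    parser = argparse.ArgumentParser(
        usage="sudo -E rdhc.py [options]",
        description="ROCm Deployment Health Check Tool",
    )
    # Assumption: --quick and --all are alternatives; --quick is the default.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--quick", action="store_true",
                      help="Run quick tests only (default)")
    mode.add_argument("--all", action="store_true",
                      help="Default tests + compile and run a simple program for each component")
    # Assumption: -v and -s cannot be combined.
    verbosity = parser.add_mutually_exclusive_group()
    verbosity.add_argument("-v", "--verbose", action="store_true",
                           help="Enable verbose output")
    verbosity.add_argument("-s", "--silent", action="store_true",
                           help="Silent mode (errors only)")
    parser.add_argument("-j", "--json", metavar="FILE",
                        help="Export results to JSON file")
    parser.add_argument("-d", "--dir", metavar="DIR", default="/tmp/rdhc/",
                        help="Directory path for temporary files (default: /tmp/rdhc/)")
    return parser


def log_level(args):
    """Map the verbosity flags to a logging level (assumed mapping)."""
    if args.verbose:
        return logging.DEBUG
    if args.silent:
        return logging.ERROR
    return logging.INFO
```

A caller would typically do args = build_parser().parse_args() and then logging.basicConfig(level=log_level(args)). Note that Python 3.10 and later label the option section "options:" rather than "optional arguments:".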
Usage examples:
# Run quick test (default tests only)
sudo -E ./rdhc.py
# Run all tests, including compiling and running the rocm-examples program for each component
sudo -E ./rdhc.py --all
# Run all tests with verbose output
sudo -E ./rdhc.py --all -v
# Run quick tests with verbose output
sudo -E ./rdhc.py -v
# Run in silent mode (only errors shown)
sudo -E ./rdhc.py -s
# Export results to a specific JSON file
sudo -E ./rdhc.py --all --json rdhc-results.json
# Specify a directory for temp files and logs (default: /tmp/rdhc/)
sudo -E ./rdhc.py -d /home/user/rdhc-dir/
RDHC Environment Variables
The RDHC tool reads the following environment variables and adjusts its behavior accordingly when they are set.
# The ROCm installation path can be set with the variable below. The default is "/opt/rocm/".
export ROCM_PATH="/opt/rocm"
# For library dependency validation, the library search depth can be set with the variable below.
# The default is full depth: all library files under ROCM_PATH/lib/ are checked recursively.
export LIBDIR_MAX_DEPTH=""
# To check only the libraries directly inside ROCM_PATH/lib/, set the depth to 1.
export LIBDIR_MAX_DEPTH=1
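The two variables above could be interpreted roughly as follows. This sketch assumes that an empty or unset LIBDIR_MAX_DEPTH means unlimited depth, that depth 1 means only files directly inside ROCM_PATH/lib/, and that unresolved dependencies are found by scanning ldd output for "not found". The README does not state how rdhc.py performs the check, and all helper names here are hypothetical.

```python
import os
from pathlib import Path


def rocm_path():
    """ROCm root from ROCM_PATH, falling back to /opt/rocm if unset or empty."""
    return Path(os.environ.get("ROCM_PATH") or "/opt/rocm")


def max_depth():
    """LIBDIR_MAX_DEPTH as an int, or None (unlimited) if unset or empty."""
    raw = os.environ.get("LIBDIR_MAX_DEPTH", "").strip()
    return int(raw) if raw else None


def is_shared_lib(name):
    """True for names like libfoo.so or libfoo.so.6."""
    return name.endswith(".so") or ".so." in name


def find_libs(lib_dir, depth=None):
    """Shared libraries under lib_dir; depth=1 means lib_dir itself only."""
    lib_dir = Path(lib_dir)
    found = []
    for root, dirs, files in os.walk(lib_dir):
        level = len(Path(root).relative_to(lib_dir).parts) + 1
        if depth is not None and level >= depth:
            dirs[:] = []  # prune: do not descend below the requested depth
        found.extend(Path(root) / name for name in files if is_shared_lib(name))
    return sorted(found)


def missing_deps(ldd_output):
    """Library names that ldd reports as 'not found'."""
    return [line.split("=>")[0].strip()
            for line in ldd_output.splitlines() if "not found" in line]
```

For example, find_libs(rocm_path() / "lib", max_depth()) lists the candidate files; running ldd on each one with subprocess.run and passing its stdout to missing_deps() would flag unresolved dependencies.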
The tool is designed to be easily extended with additional component tests by
adding new test methods following the naming convention test_check_component_name().
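A naming convention like this is commonly implemented with introspection. The sketch below shows one way to discover and run every method whose name starts with test_check_; the HealthCheck and ExtendedHealthCheck classes and their methods are hypothetical stand-ins, not the classes in rdhc.py.

```python
class HealthCheck:
    """Hypothetical checker; each test_check_<component>() returns True on success."""

    def test_check_gpu_presence(self):
        return True

    def test_check_rocminfo(self):
        return True

    def summary(self):  # not a test: name lacks the test_check_ prefix
        return "helper"


class ExtendedHealthCheck(HealthCheck):
    """Adding a component test only requires a method with the right prefix."""

    def test_check_rocblas(self):
        # Hypothetical failure, to show how errors are recorded.
        raise RuntimeError("librocblas.so not found")


def discover_tests(checker):
    """Names of callable attributes that follow the test_check_ convention."""
    return sorted(name for name in dir(checker)
                  if name.startswith("test_check_") and callable(getattr(checker, name)))


def run_tests(checker):
    """Run every discovered test; record PASS, FAIL, or the raised error."""
    results = {}
    for name in discover_tests(checker):
        try:
            results[name] = "PASS" if getattr(checker, name)() else "FAIL"
        except Exception as exc:  # one failing check must not stop the rest
            results[name] = f"ERROR: {exc}"
    return results
```

Because discovery relies only on the method name, a new component check is picked up without registering it anywhere else.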