[rocm-core] Adding a tool for ROCM Deployment Health Check (#958)
* Adding a tool for ROCM Deployment Health Check rdhc.py - This simple tool will check for the rocm installation and its readiness on the current system and its working status. Check the README file for more info. Signed-off-by: Saravanan Solaiyappan <saravanan.solaiyappan@amd.com>
@@ -0,0 +1,81 @@
# rdhc

ROCm Deployment Health Check Tool

## Features of the ROCm Deployment Health Check Tool

1. **Cross-Platform Support**: Works on Ubuntu, RHEL, and SLES distributions
2. **Comprehensive Testing**:
   - Default tests (GPU presence, driver status, rocminfo, rocm-smi)
   - Library dependency verification
   - Checks for the presence of selected kernel parameters and environment variables
   - Component-specific tests
   - Dynamically builds and runs the test programs from the rocm-examples Git repository
3. **Dynamic Component Detection**: Identifies installed ROCm components using distribution-specific package manager commands
4. **Flexible Reporting**:
   - Pretty table output for terminal display
   - JSON export for further analysis or integration
5. **Configurable Verbosity**: Through command-line options (`-v` for verbose, `-s` for silent)

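
As an illustration of the dynamic component detection feature, the sketch below shows one way a distribution-specific package query could be chosen. It is not taken from rdhc.py: the `LIST_COMMANDS` table, the helper names, and the `rocm`/`hip` prefix filter are all assumptions made for this example, and the real tool may use different commands or matching rules.

```python
import subprocess

# Hypothetical mapping from the ID field of /etc/os-release to a command
# that prints one installed package name per line. Assumption, not rdhc.py.
LIST_COMMANDS = {
    "ubuntu": ["dpkg-query", "-W", r"-f=${Package}\n"],
    "rhel": ["rpm", "-qa", "--qf", r"%{NAME}\n"],
    "sles": ["rpm", "-qa", "--qf", r"%{NAME}\n"],
}


def read_distro_id(os_release_text):
    """Return the lowercased ID value from os-release content, or ''."""
    for line in os_release_text.splitlines():
        if line.startswith("ID="):
            return line.split("=", 1)[1].strip().strip('"').lower()
    return ""


def select_components(package_names, prefixes=("rocm", "hip")):
    """Keep package names that start with one of the assumed prefixes."""
    return [name for name in package_names if name.startswith(prefixes)]


def list_installed_components(distro_id):
    """Run the package query for this distribution and filter the result."""
    command = LIST_COMMANDS[distro_id]
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return select_components(result.stdout.split())
```

On a real system, `read_distro_id(open("/etc/os-release").read())` would supply the key for `list_installed_components`.
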
## Install dependency pip packages

```bash
sudo pip3 install -r requirements.txt
```

## Usage

```bash
./rdhc.py -h
usage: sudo -E rdhc.py [options]

ROCm Deployment Health Check Tool

optional arguments:
  -h, --help            show this help message and exit
  --quick               Run quick tests only (default)
  --all                 Default tests + Compile and executes simple program for each component.
  -v, --verbose         Enable verbose output
  -s, --silent          Silent mode (errors only)
  -j FILE, --json FILE  Export results to JSON file
  -d DIR, --dir DIR     Directory path for temporary files (default: /tmp/rdhc/)

Usage examples:
# Run quick test (default tests only)
sudo -E ./rdhc.py

# Run all tests including compile and execute the rocm-example program for each component
sudo -E ./rdhc.py --all

# Run all tests with verbose output
sudo -E ./rdhc.py --all -v

# Enable verbose output
sudo -E ./rdhc.py -v

# Run in silent mode (only errors shown)
sudo -E ./rdhc.py -s

# Export results to a specific JSON file
sudo -E ./rdhc.py --all --json rdhc-results.json

# Specify a directory for temp files and logs (default: /tmp/rdhc/)
sudo -E ./rdhc.py -d /home/user/rdhc-dir/
```

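
For scripted runs, the documented options can be assembled programmatically. The following is a minimal Python sketch and not part of rdhc.py: the helper names `build_rdhc_command` and `run_and_load` are hypothetical, and because this README does not describe the structure of the exported JSON, the parsed result is returned as-is without assuming any keys.

```python
import json
import subprocess


def build_rdhc_command(run_all=False, verbose=False, silent=False,
                       json_file=None, temp_dir=None, script="./rdhc.py"):
    """Assemble an rdhc.py command line using only the documented options."""
    command = ["sudo", "-E", script]
    if run_all:
        command.append("--all")
    if verbose:
        command.append("-v")
    if silent:
        command.append("-s")
    if json_file is not None:
        command += ["--json", json_file]
    if temp_dir is not None:
        command += ["-d", temp_dir]
    return command


def run_and_load(json_file, **options):
    """Run the tool, then return whatever JSON document it exported."""
    command = build_rdhc_command(json_file=json_file, **options)
    # The tool's exit-code convention is not documented here, so a
    # non-zero status is not treated as an error in this sketch.
    subprocess.run(command, check=False)
    with open(json_file, encoding="utf-8") as handle:
        return json.load(handle)
```

For example, `run_and_load("rdhc-results.json", run_all=True)` corresponds to `sudo -E ./rdhc.py --all --json rdhc-results.json`.
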
## RDHC Environment Variables

The RDHC tool reads the following environment variables, when set, and adjusts its behavior accordingly.

```bash
# ROCm installation path. Default is "/opt/rocm/".
export ROCM_PATH="/opt/rocm"

# Search depth for library dependency validation.
# Default is full depth: all library files under ROCM_PATH/lib/ are checked recursively.
export LIBDIR_MAX_DEPTH=""

# To check only the libraries directly in ROCM_PATH/lib/, set the depth to 1.
export LIBDIR_MAX_DEPTH=1
```

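
To make the depth semantics concrete, here is a small Python sketch of how `ROCM_PATH` and `LIBDIR_MAX_DEPTH` could be interpreted. It is an assumption-based illustration, not the code in rdhc.py: the function names `read_settings` and `find_shared_libs` are hypothetical, and matching files by the substring `.so` is a simplification chosen for this example.

```python
import os
from pathlib import Path


def read_settings(env=None):
    """Return (rocm_path, max_depth); max_depth None means full depth."""
    env = os.environ if env is None else env
    rocm_path = Path(env.get("ROCM_PATH") or "/opt/rocm")
    raw_depth = env.get("LIBDIR_MAX_DEPTH", "").strip()
    max_depth = int(raw_depth) if raw_depth else None
    return rocm_path, max_depth


def find_shared_libs(lib_dir, max_depth=None):
    """Yield files whose name contains '.so' under lib_dir.

    Depth 1 covers only lib_dir itself, depth 2 adds its direct
    subdirectories, and None walks the whole tree.
    """
    lib_dir = Path(lib_dir)
    for root, dirs, files in os.walk(lib_dir):
        depth = len(Path(root).relative_to(lib_dir).parts) + 1
        if max_depth is not None and depth >= max_depth:
            dirs[:] = []  # stop os.walk from descending any further
        for name in files:
            if ".so" in name:
                yield Path(root) / name
```

A caller would combine the two as `rocm_path, depth = read_settings()` followed by `find_shared_libs(rocm_path / "lib", depth)`.
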
The tool is designed to be easily extended with additional component tests by
adding new test methods following the naming convention `test_check_component_name()`.

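
The README does not say how rdhc.py locates these methods. One common way to support such a naming convention is to scan an object for callables with the expected prefix, as in the sketch below. The class `HealthCheckSketch`, the component names `hipcc` and `rocblas`, the `"PASS"` return value, and the helper `discover_component_tests` are all hypothetical and exist only for this illustration.

```python
class HealthCheckSketch:
    """Hypothetical stand-in for the class that holds component tests."""

    def test_check_hipcc(self):
        return "PASS"

    def test_check_rocblas(self):
        return "PASS"

    def helper(self):
        return "not a component test"


def discover_component_tests(obj, prefix="test_check_"):
    """Map component names to bound methods whose names start with prefix."""
    found = {}
    for name in sorted(dir(obj)):
        attribute = getattr(obj, name)
        if name.startswith(prefix) and callable(attribute):
            found[name[len(prefix):]] = attribute
    return found


if __name__ == "__main__":
    for component, test in discover_component_tests(HealthCheckSketch()).items():
        print(component, test())
```

With this pattern, adding a method such as `test_check_mycomponent()` is enough for it to be picked up without registering it anywhere else.
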
Executable file, +1837 lines (file diff is too large to display)
@@ -0,0 +1,2 @@
prettytable>=3.14.0
PyYAML>=5.4.1