diff --git a/README.md b/README.md
index 20a8689180..66464fc033 100644
--- a/README.md
+++ b/README.md
@@ -8,14 +8,10 @@ The ROCm™ Data Center Tool simplifies the administration and addresses key inf
 - Integration with third-party tools
 - Open source
-For complete list of features and how to start using RDC from pre-built packages, please refer to [**user guide**](docs/AMD_ROCm_Data_Center_Tool_User_Guide.pdf)
+For up-to-date document and how to start using RDC from pre-built packages, please refer to [**user guide**](https://docs.amd.com/bundle/ROCm-Data-Center-Tool-User-Guide-v5.1/page/Overview.html)
 # Supported platforms
- Ubuntu 18.04.5 (Kernel 5.3)
- CentOS v7.7 (Using devtoolset-7 runtime support)
- RHEL v7.7 (Using devtoolset-7 runtime support)
- SLES 15 SP1
- CentOS and RHEL 8.1(Kernel 4.18.0-147)
+RDC can run on AMD ROCm supported platforms, please refer to [List of Supported Operating Systems](https://rocmdocs.amd.com/en/latest/Current_Release_Notes/Current-Release-Notes.html)
 # Building RDC from source
@@ -30,7 +26,7 @@ For complete list of features and how to start using RDC from pre-built packages
 AMD ROCm platform (https://github.com/RadeonOpenCompute/ROCm)
 * It is recommended to install the complete AMD ROCm platform.
- For installation instruction see https://rocmdocs.amd.com/en/latest/Installation_Guide/Installation-Guide.html
+ For installation instruction see https://rocmdocs.amd.com/en/latest/Current_Release_Notes/Current-Release-Notes.html
 * At the minimum, these two components are required
 (i) AMD ROCm SMI Library (https://github.com/RadeonOpenCompute/rocm_smi_lib)
 (ii) AMD ROCk Kernel driver (https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver)
@@ -96,7 +92,7 @@ The following commands need to be executed as root (sudo). It may be easiest to
 # Running RDC
 RDC supports encrypted communications between clients and servers. The
-communication can be configured to be *authenticated* or *not authenticated*. The [**user guide**](docs/AMD_ROCm_Data_Center_Tool_User_Guide.pdf) has information on how to generate and install SSL keys and certificates for authentication. By default, authentication is enabled.
+communication can be configured to be *authenticated* or *not authenticated*. The [**user guide**](https://docs.amd.com/bundle/ROCm-Data-Center-Tool-User-Guide-v5.1/page/Developer_Handbook.html) has information on how to generate and install SSL keys and certificates for authentication. By default, authentication is enabled.
 ## Starting ROCm™ Data Center Daemon (RDCD)
 For an RDC client application to monitor and/or control a remote system, the RDC server daemon, *rdcd*, must be running on the remote system. *rdcd* can be configured to run with (a) full-capabilities which includes ability to set or change GPU configuration or (b) monitor-only capabilities which limits to monitoring GPU metrics.
@@ -117,10 +113,10 @@ When *rdcd* is started from a command-line the *capabilities* are determined by
 $ sudo /opt/rocm-/rdc/bin/rdcd -u ## rdcd is started will full-capabilities
 ### Start RDCD using systemd
-*rdcd* can be started by using the systemctl command. systemctl will read /lib/systemd/system/rdc.service, which is installed with rdc. This file has 2 lines that control what *capabilities* with which *rdcd* will run. If left uncommented, rdcd will run with full-capabilities.
+*rdcd* can be started by using the systemctl command. You can copy /opt/rocm-\/rdc/lib/rdc.service, which is installed with RDC, to the systemd folder. This file has 2 lines that control what *capabilities* with which *rdcd* will run. If left uncommented, rdcd will run with full-capabilities.
- ## file: /lib/systemd/system/rdc.service
+ ## file: /opt/rocm-/rdc/lib/rdc.service
 ## Comment the following two lines to run with monitor-only capabilities
 CapabilityBoundingSet=CAP_DAC_OVERRIDE
 AmbientCapabilities=CAP_DAC_OVERRIDE
@@ -128,15 +124,15 @@ When *rdcd* is started from a command-line the *capabilities* are determined by
 systemctl start rdc ## start rdc as systemd service
 ## Invoke RDC using ROCm™ Data Center Interface (RDCI)
-RDCI provides command-line interface to all RDC features. This CLI can be run locally or remotely. Refer to [**user guide**](docs/AMD_ROCm_Data_Center_Tool_User_Guide) for the current list of features.
+RDCI provides command-line interface to all RDC features. This CLI can be run locally or remotely. Refer to [**user guide**](https://docs.amd.com/bundle/ROCm-Data-Center-Tool-User-Guide-v5.1/page/Feature_Overview.html) for the current list of features.
 ## sample rdci commands to test RDC functionality
 ## discover devices in a local or remote compute node
 ## NOTE: option -u (for unauthenticated) is required is rdcd was started in this mode
 $ cd rdc_install_prefix ## If specified in Building RDC section
- ./opt/rocm/rdc/bin/rdci discovery -l <-u> ## list available GPUs in localhost
- ./opt/rocm/rdc/bin/rdci discovery -l <-u> ## list available GPUs in host machine
+ ./opt/rocm-/rdc/bin/rdci discovery -l <-u> ## list available GPUs in localhost
+ ./opt/rocm-/rdc/bin/rdci discovery -l <-u> ## list available GPUs in host machine
 ## Troubleshooting rdcd
@@ -149,4 +145,3 @@ Log messages that can provide useful debug information.
 ## version will be the version number(ex:3.10.0) of ROCm where RDC was pacakged with
 RDC_LOG=DEBUG /opt/rocm-/rdc/bin/rdcd
-
diff --git a/docs/AMD_ROCm_Data_Center_Tool_User_Guide.pdf b/docs/AMD_ROCm_Data_Center_Tool_User_Guide.pdf
deleted file mode 100644
index 935c24a7a0..0000000000
Binary files a/docs/AMD_ROCm_Data_Center_Tool_User_Guide.pdf and /dev/null differ
diff --git a/docs/RDC_API_Manual.pdf b/docs/RDC_API_Manual.pdf
deleted file mode 100644
index 111f160622..0000000000
Binary files a/docs/RDC_API_Manual.pdf and /dev/null differ
diff --git a/python_binding/README.md b/python_binding/README.md
index 61a8f250d9..3470af014d 100644
--- a/python_binding/README.md
+++ b/python_binding/README.md
@@ -1,29 +1,36 @@
-* Quick start
+# Quick start
 If you do not have the RDC installed, please specify the RDC library path using:
-export LD_LIBRARY_PATH=
+
+ $ export LD_LIBRARY_PATH=
 Then you can run RdcReader in python_binding folder:
-python RdcReader.py
-* Prometheus plugin
+ $ python RdcReader.py
+
+# Prometheus plugin
 Install the prometheus_client:
-% pip install prometheus_client
+
+ $ pip install prometheus_client
 Start the rdcd with auth and then run plugin to connect to it:
-% python rdc_prometheus.py
+
+ $ python rdc_prometheus.py
 Check the options of the plugin:
-% python rdc_prometheus.py --help
+
+ $ python rdc_prometheus.py --help
 Verify the plugin is running:
-% curl localhost:5000
+
+ $ curl localhost:5000
 In the managment computer, install the Prometheus from https://github.com/prometheus/prometheus
 Modify the file prometheus_targets.json to add the compute nodes running the plugin.
 Start the Prometheus
-% prometheus --config.file=
+
+ $ prometheus --config.file=
 Browse to localhost:9090 in the managment computer for metrics from RDC.
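
The `curl localhost:5000` check in the `python_binding/README.md` hunk can also be scripted. The following is a minimal sketch, assuming the plugin serves the standard Prometheus text exposition format with one `series value` pair per line and no trailing timestamp. `parse_metrics` and `fetch_metrics` are hypothetical helper names, not part of RDC, and the default URL simply mirrors the port used in the README's own example.

```python
import urllib.request


def parse_metrics(text):
    """Parse Prometheus text-format samples into a {series: value} dict.

    Assumes each sample line is "<series> <value>" with no timestamp.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comment lines
        # Split on the last space so label values containing spaces survive.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def fetch_metrics(url="http://localhost:5000", timeout=5):
    """Fetch the metrics page (same target as `curl localhost:5000`)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

With the plugin running, `fetch_metrics()` would return whatever series it exports; an empty dict or a connection error suggests the plugin is not serving metrics on that port.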