# Introduction to ROCm Data Center Tool User Guide

The ROCm™ Data Center Tool™ (RDC) simplifies the administration of AMD GPUs and addresses key infrastructure challenges in cluster and data center environments. Its main features are:

- GPU telemetry
- GPU statistics for jobs
- Integration with third-party tools
- Open source

You can use RDC in standalone mode if all of its components are installed. Alternatively, existing management tools can access the same set of features through RDC's library interface.

For details on the different modes of operation, refer to [Starting RDC](install).

## Objective

This user guide is intended to:

- Provide an overview of the RDC tool features.
- Describe how system administrators and Data Center (or HPC) users can administer and configure AMD GPUs.
- Describe the RDC components.
- Provide an overview of the open source developer handbook.

## Terminology

Table 1: Terminology and Abbreviations

| Term | Description |
| ---- | ----------- |
| RDC | ROCm Data Center Tool |
| Compute node (CN) | One of many nodes in the Data Center that contain one or more GPUs and on which compute jobs run |
| Management node (MN) or main console | A machine running system administration applications to administer and manage the Data Center |
| GPU Groups | A logical grouping of one or more GPUs in a compute node |
| Fields | Metrics that RDC can monitor, such as GPU temperature, memory usage, and power usage |
| Field Groups | A logical grouping of multiple fields |
| Job | A workload submitted to one or more compute nodes |

## Target Audience

The audience for the AMD RDC tool consists of:

- Administrators: RDC gives cluster administrators the capability to monitor, validate, and configure policies.
- HPC users: RDC provides GPU-centric feedback on their workload submissions.
- OEMs: RDC lets OEMs add GPU information to their existing cluster management software.
- Open source contributors: RDC is open source and accepts contributions from the community.