Moved partition_id from static --asic-info to static --partition.

partition_id was also removed from the `amdsmi_asic_info_t` struct, and a
supporting API has been added for querying partition information.

Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: Id5a6291a77d11bb97a1c7a200fc465898e86e081


[ROCm/amdsmi commit: c9a489d437]
This commit is contained in:
gabrpham
2024-09-18 19:53:32 -05:00
committed by Maisam Arif
parent 82096d7f74
commit 0fd0b46b7f
10 changed files with 351 additions and 155 deletions
+111 -95
@@ -16,18 +16,15 @@ Full documentation for amd_smi_lib is available at [https://rocm.docs.amd.com/pr
- **Added more supported utilization count types to `amdsmi_get_utilization_count()`**.
- **Added `amd-smi set -L/--clk-limit ...` command**.
Equivalent to rocm-smi's '--extremum' command which sets sclk's or mclk's soft minimum or soft maximum clock frequency.
- Equivalent to rocm-smi's '--extremum' command which sets sclk's or mclk's soft minimum or soft maximum clock frequency.
- **Added Pytest functionality to test amdsmi API calls in Python**.
- **Changed the `power` parameter in `amdsmi_get_energy_count()` to `energy_accumulator`**.
Changes propagate forward into the Python interface as well; however, we are maintaining backwards compatibility and keeping the `power` field in the Python API until ROCm 6.4.
- Changes propagate forward into the Python interface as well; however, we are maintaining backwards compatibility and keeping the `power` field in the Python API until ROCm 6.4.
- **Added GPU memory overdrive percentage to `amd-smi metric -o`**.
Added `amdsmi_get_gpu_mem_overdrive_level()` function to amd-smi C and Python Libraries.
- **Added Subsystem Device ID to `amd-smi static --asic`**.
No underlying changes to amdsmi_get_gpu_asic_info
- Added `amdsmi_get_gpu_mem_overdrive_level()` function to amd-smi C and Python Libraries.
- **Added retrieving connection type and P2P capabilities between two GPUs**.
- Added `amdsmi_topo_get_p2p_status` function to amd-smi C and Python Libraries.
@@ -44,14 +41,14 @@ If no topology argument is provided all topology information will be displayed.
Topology arguments:
-h, --help show this help message and exit
-g, --gpu GPU [GPU ...] Select a GPU ID, BDF, or UUID from the possible choices:
ID: 0 | BDF: 0000:0c:00.0 | UUID: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
ID: 1 | BDF: 0000:22:00.0 | UUID: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
ID: 2 | BDF: 0000:38:00.0 | UUID: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
ID: 3 | BDF: 0000:5c:00.0 | UUID: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
ID: 4 | BDF: 0000:9f:00.0 | UUID: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
ID: 5 | BDF: 0000:af:00.0 | UUID: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
ID: 6 | BDF: 0000:bf:00.0 | UUID: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
ID: 7 | BDF: 0000:df:00.0 | UUID: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
ID: 0 | BDF: 0000:0c:00.0 | UUID: <redacted>
ID: 1 | BDF: 0000:22:00.0 | UUID: <redacted>
ID: 2 | BDF: 0000:38:00.0 | UUID: <redacted>
ID: 3 | BDF: 0000:5c:00.0 | UUID: <redacted>
ID: 4 | BDF: 0000:9f:00.0 | UUID: <redacted>
ID: 5 | BDF: 0000:af:00.0 | UUID: <redacted>
ID: 6 | BDF: 0000:bf:00.0 | UUID: <redacted>
ID: 7 | BDF: 0000:df:00.0 | UUID: <redacted>
all | Selects all devices
@@ -75,62 +72,7 @@ Command Modifiers:
```
```shell
$ amd-smi topology
ACCESS TABLE:
0000:0c:00.0 0000:22:00.0 0000:38:00.0 0000:5c:00.0 0000:9f:00.0 0000:af:00.0 0000:bf:00.0 0000:df:00.0
0000:0c:00.0 ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED
0000:22:00.0 ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED
0000:38:00.0 ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED
0000:5c:00.0 ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED
0000:9f:00.0 ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED
0000:af:00.0 ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED
0000:bf:00.0 ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED
0000:df:00.0 ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED
WEIGHT TABLE:
0000:0c:00.0 0000:22:00.0 0000:38:00.0 0000:5c:00.0 0000:9f:00.0 0000:af:00.0 0000:bf:00.0 0000:df:00.0
0000:0c:00.0 0 15 15 15 15 15 15 15
0000:22:00.0 15 0 15 15 15 15 15 15
0000:38:00.0 15 15 0 15 15 15 15 15
0000:5c:00.0 15 15 15 0 15 15 15 15
0000:9f:00.0 15 15 15 15 0 15 15 15
0000:af:00.0 15 15 15 15 15 0 15 15
0000:bf:00.0 15 15 15 15 15 15 0 15
0000:df:00.0 15 15 15 15 15 15 15 0
HOPS TABLE:
0000:0c:00.0 0000:22:00.0 0000:38:00.0 0000:5c:00.0 0000:9f:00.0 0000:af:00.0 0000:bf:00.0 0000:df:00.0
0000:0c:00.0 0 1 1 1 1 1 1 1
0000:22:00.0 1 0 1 1 1 1 1 1
0000:38:00.0 1 1 0 1 1 1 1 1
0000:5c:00.0 1 1 1 0 1 1 1 1
0000:9f:00.0 1 1 1 1 0 1 1 1
0000:af:00.0 1 1 1 1 1 0 1 1
0000:bf:00.0 1 1 1 1 1 1 0 1
0000:df:00.0 1 1 1 1 1 1 1 0
LINK TYPE TABLE:
0000:0c:00.0 0000:22:00.0 0000:38:00.0 0000:5c:00.0 0000:9f:00.0 0000:af:00.0 0000:bf:00.0 0000:df:00.0
0000:0c:00.0 SELF XGMI XGMI XGMI XGMI XGMI XGMI XGMI
0000:22:00.0 XGMI SELF XGMI XGMI XGMI XGMI XGMI XGMI
0000:38:00.0 XGMI XGMI SELF XGMI XGMI XGMI XGMI XGMI
0000:5c:00.0 XGMI XGMI XGMI SELF XGMI XGMI XGMI XGMI
0000:9f:00.0 XGMI XGMI XGMI XGMI SELF XGMI XGMI XGMI
0000:af:00.0 XGMI XGMI XGMI XGMI XGMI SELF XGMI XGMI
0000:bf:00.0 XGMI XGMI XGMI XGMI XGMI XGMI SELF XGMI
0000:df:00.0 XGMI XGMI XGMI XGMI XGMI XGMI XGMI SELF
NUMA BW TABLE:
0000:0c:00.0 0000:22:00.0 0000:38:00.0 0000:5c:00.0 0000:9f:00.0 0000:af:00.0 0000:bf:00.0 0000:df:00.0
0000:0c:00.0 N/A 50000-50000 50000-50000 50000-50000 50000-50000 50000-50000 50000-50000 50000-50000
0000:22:00.0 50000-50000 N/A 50000-50000 50000-50000 50000-50000 50000-50000 50000-50000 50000-50000
0000:38:00.0 50000-50000 50000-50000 N/A 50000-50000 50000-50000 50000-50000 50000-50000 50000-50000
0000:5c:00.0 50000-50000 50000-50000 50000-50000 N/A 50000-50000 50000-50000 50000-50000 50000-50000
0000:9f:00.0 50000-50000 50000-50000 50000-50000 50000-50000 N/A 50000-50000 50000-50000 50000-50000
0000:af:00.0 50000-50000 50000-50000 50000-50000 50000-50000 50000-50000 N/A 50000-50000 50000-50000
0000:bf:00.0 50000-50000 50000-50000 50000-50000 50000-50000 50000-50000 50000-50000 N/A 50000-50000
0000:df:00.0 50000-50000 50000-50000 50000-50000 50000-50000 50000-50000 50000-50000 50000-50000 N/A
$ amd-smi topology -cndz
CACHE COHERANCY TABLE:
0000:0c:00.0 0000:22:00.0 0000:38:00.0 0000:5c:00.0 0000:9f:00.0 0000:af:00.0 0000:bf:00.0 0000:df:00.0
0000:0c:00.0 SELF C NC NC C C C NC
@@ -203,22 +145,40 @@ typedef struct {
$ amd-smi list
GPU: 0
BDF: 0000:23:00.0
UUID: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
UUID: <redacted>
KFD_ID: 45412
NODE_ID: 1
PARTITION_ID: 0
GPU: 1
BDF: 0000:26:00.0
UUID: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
UUID: <redacted>
KFD_ID: 59881
NODE_ID: 2
PARTITION_ID: 0
```
- **Added Target_Graphics_Version and partition id to `amd-smi static --asic`**.
- **Added Subsystem Device ID to `amd-smi static --asic`**.
- No underlying changes to amdsmi_get_gpu_asic_info
Due to fixes needed to properly enumerate all logical GPUs in CPX, new device identifiers
were placed within the `amdsmi_asic_info_t` struct. These new fields are only available for BM/Guest Linux
devices at this time.
```shell
$ amd-smi static --asic
GPU: 0
ASIC:
MARKET_NAME: MI308X
VENDOR_ID: 0x1002
VENDOR_NAME: Advanced Micro Devices Inc. [AMD/ATI]
SUBVENDOR_ID: 0x1002
DEVICE_ID: 0x74a2
SUBSYSTEM_ID: 0x74a2
REV_ID: 0x00
ASIC_SERIAL: <redacted>
OAM_ID: 5
NUM_COMPUTE_UNITS: 20
TARGET_GRAPHICS_VERSION: gfx942
```
- **Added Target_Graphics_Version to `amd-smi static --asic` and `amdsmi_get_gpu_asic_info()`**.
```C
typedef struct {
@@ -232,13 +192,12 @@ typedef struct {
uint32_t oam_id; //< 0xFFFF if not supported
uint32_t num_of_compute_units; //< 0xFFFFFFFF if not supported
uint64_t target_graphics_version; //< 0xFFFFFFFFFFFFFFFF if not supported
uint32_t partition_id; //< 0xFFFFFFFF if not supported
uint32_t reserved[14];
uint32_t reserved[15];
} amdsmi_asic_info_t;
```
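Growing `reserved` from 14 to 15 entries appears intended to keep the struct's size and the offsets of the remaining fields unchanged once `partition_id` is gone. The following is a minimal Python `ctypes` sketch of only the tail of the struct; the class names `AsicInfoTailOld` and `AsicInfoTailNew` are hypothetical, and all fields before `target_graphics_version` are omitted.

```python
import ctypes


class AsicInfoTailOld(ctypes.Structure):
    # Tail of the struct before this change (illustrative subset only).
    _fields_ = [
        ("target_graphics_version", ctypes.c_uint64),
        ("partition_id", ctypes.c_uint32),
        ("reserved", ctypes.c_uint32 * 14),
    ]


class AsicInfoTailNew(ctypes.Structure):
    # Tail of the struct after this change: partition_id is folded
    # back into the reserved array, which grows by one element.
    _fields_ = [
        ("target_graphics_version", ctypes.c_uint64),
        ("reserved", ctypes.c_uint32 * 15),
    ]


# Both layouts occupy the same number of bytes, and the new reserved
# array starts where partition_id used to live.
print(ctypes.sizeof(AsicInfoTailOld), ctypes.sizeof(AsicInfoTailNew))
print(AsicInfoTailOld.partition_id.offset, AsicInfoTailNew.reserved.offset)
```

Callers that previously read `partition_id` from this struct would instead use `amdsmi_get_gpu_accelerator_partition_profile()`, as described in this commit.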
```shell
$ amd-smi static --asic --partition
$ amd-smi static --asic
GPU: 0
ASIC:
MARKET_NAME: MI308X
@@ -246,47 +205,102 @@ GPU: 0
VENDOR_NAME: Advanced Micro Devices Inc. [AMD/ATI]
SUBVENDOR_ID: 0x1002
DEVICE_ID: 0x74a2
TARGET_GRAPHICS_VERSION: gfx942
KFD_ID: 24248
NODE_ID: 2
PARTITION_ID: 0
SUBSYSTEM_ID: 0x74a2
REV_ID: 0x00
ASIC_SERIAL: <redacted>
OAM_ID: 5
NUM_COMPUTE_UNITS: 20
TARGET_GRAPHICS_VERSION: gfx942
```
- **Updated Partition APIs and struct information and added partition_id to `amd-smi static --partition` & `amd-smi list`**.
- As part of an overhaul to partition information, some partition information will be made available in the `amdsmi_accelerator_partition_profile_t`.
- This struct will be filled out by a new API, `amdsmi_get_gpu_accelerator_partition_profile()`.
- Data from these APIs will eventually be added to `static --partition`.
```C
#define AMDSMI_MAX_ACCELERATOR_PROFILE 32
#define AMDSMI_MAX_CP_PROFILE_RESOURCES 32
#define AMDSMI_MAX_ACCELERATOR_PARTITIONS 8
/**
* @brief Accelerator Partition. This enum is used to identify
* various accelerator partitioning settings.
*/
typedef enum {
AMDSMI_ACCELERATOR_PARTITION_INVALID = 0,
AMDSMI_ACCELERATOR_PARTITION_SPX, //!< Single GPU mode (SPX)- All XCCs work
//!< together with shared memory
AMDSMI_ACCELERATOR_PARTITION_DPX, //!< Dual GPU mode (DPX)- Half XCCs work
//!< together with shared memory
AMDSMI_ACCELERATOR_PARTITION_TPX, //!< Triple GPU mode (TPX)- One-third XCCs
//!< work together with shared memory
AMDSMI_ACCELERATOR_PARTITION_QPX, //!< Quad GPU mode (QPX)- Quarter XCCs
//!< work together with shared memory
AMDSMI_ACCELERATOR_PARTITION_CPX, //!< Core mode (CPX)- Per-chip XCC with
//!< shared memory
} amdsmi_accelerator_partition_type_t;
typedef struct {
amdsmi_accelerator_partition_type_t profile_type; // SPX, DPX, QPX, CPX and so on
uint32_t num_partitions; // On MI300X, SPX: 1, DPX: 2, QPX: 4, CPX: 8, the length of resources array
uint32_t profile_index; // The index in the profiles array in amdsmi_compute_partition_profile_t
uint32_t num_resources; // length of index_of_resources_profile
uint32_t resources[AMDSMI_MAX_ACCELERATOR_PARTITIONS][AMDSMI_MAX_CP_PROFILE_RESOURCES];
uint32_t reserved[12];
} amdsmi_accelerator_partition_profile_t;
```
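As a rough illustration of how a Python caller might interpret the `profile_type` and `num_partitions` fields, the sketch below mirrors the enum values from the header and the MI300X partition counts quoted in the struct comment. The names `AcceleratorPartitionType`, `MI300X_NUM_PARTITIONS`, and `describe_profile` are hypothetical helpers, not part of the amdsmi package. TPX has no count in the source comment, so it is deliberately left out of the table.

```python
from enum import IntEnum


class AcceleratorPartitionType(IntEnum):
    # Values mirror amdsmi_accelerator_partition_type_t in this commit.
    INVALID = 0
    SPX = 1
    DPX = 2
    TPX = 3
    QPX = 4
    CPX = 5


# Partition counts quoted in the struct comment for MI300X only.
MI300X_NUM_PARTITIONS = {
    AcceleratorPartitionType.SPX: 1,
    AcceleratorPartitionType.DPX: 2,
    AcceleratorPartitionType.QPX: 4,
    AcceleratorPartitionType.CPX: 8,
}


def describe_profile(profile_type: int) -> str:
    """Return a short label for a raw profile_type integer."""
    try:
        kind = AcceleratorPartitionType(profile_type)
    except ValueError:
        # Unknown values are treated as INVALID in this sketch.
        kind = AcceleratorPartitionType.INVALID
    count = MI300X_NUM_PARTITIONS.get(kind)
    if count is None:
        return kind.name
    return f"{kind.name} (num_partitions={count})"


print(describe_profile(5))
```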
```shell
$ amd-smi static --partition
GPU: 0
PARTITION:
COMPUTE_PARTITION: CPX
MEMORY_PARTITION: NPS4
PARTITION_ID: 0
$ amd-smi list
GPU: 0
BDF: 0000:23:00.0
UUID: <redacted>
KFD_ID: 45412
NODE_ID: 1
PARTITION_ID: 0
GPU: 1
BDF: 0000:26:00.0
UUID: <redacted>
KFD_ID: 59881
NODE_ID: 2
PARTITION_ID: 0
```
### Removals
- **Removed usage of _validate_positive in Parser and replaced with _positive_int and _not_negative_int as appropriate**.
This will allow 0 to be a valid input for several options in setting CPUs where appropriate (for example, as a mode or NBIOID)
- This will allow 0 to be a valid input for several options in setting CPUs where appropriate (for example, as a mode or NBIOID)
### Optimizations
- **Adjusted ordering of gpu_metrics calls to ensure that pcie_bw values remain stable in `amd-smi metric` & `amd-smi monitor`**.
With this change, additional padding was added to PCIE_BW in `amd-smi monitor --pcie`
- With this change, additional padding was added to PCIE_BW in `amd-smi monitor --pcie`
### Resolved issues
- **Improved Offline install process & lowered dependency for PyYAML**.
- **Fixed CPX not showing total number of logical GPUs**.
Updates were made to `amdsmi_init()` and `amdsmi_get_gpu_bdf_id(..)`. In order to display all logical devices, we needed a way to order the enumerated GPUs. This was done
- Updates were made to `amdsmi_init()` and `amdsmi_get_gpu_bdf_id(..)`. In order to display all logical devices, we needed a way to order the enumerated GPUs. This was done
by adding a partition_id within the BDF optional pci_id bits.
Due to driver changes in KFD, some devices may report bits [31:28] or [2:0]. With the newly added `amdsmi_get_gpu_bdf_id(..)`, we provided this fallback to properly retrieve the partition ID. We
- Due to driver changes in KFD, some devices may report bits [31:28] or [2:0]. With the newly added `amdsmi_get_gpu_bdf_id(..)`, we provided this fallback to properly retrieve the partition ID. We
plan to eventually remove partition ID from the function portion of the BDF (Bus Device Function). See below for PCI ID description.
- bits [63:32] = domain
- bits [31:28] or bits [2:0] = partition id
- bits [27:16] = reserved
- bits [15:8] = Bus
- bits [7:3] = Device
- bits [2:0] = Function (partition id may be in bits [2:0]) <-- Fallback for non SPX modes
- bits [63:32] = domain
- bits [31:28] or bits [2:0] = partition id
- bits [27:16] = reserved
- bits [15:8] = Bus
- bits [7:3] = Device
- bits [2:0] = Function (partition id may be in bits [2:0]) <-- Fallback for non SPX modes
Previously in non-SPX modes (ex. CPX/TPX/DPX/etc) some MI3x ASICs would not report all logical GPU devices within AMD SMI.
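
The PCI ID layout listed above can be sketched as a small Python decoder. This is an illustrative example only: `decode_bdf_id` and `DecodedBdfId` are hypothetical names, and the rule for choosing between bits [31:28] and bits [2:0] (prefer the upper bits when they are non-zero) is an assumption, since the changelog does not say how the library picks between them.

```python
from typing import NamedTuple


class DecodedBdfId(NamedTuple):
    domain: int
    partition_id: int
    bus: int
    device: int
    function: int


def decode_bdf_id(bdf_id: int) -> DecodedBdfId:
    """Split a 64-bit BDF ID into the fields listed in the changelog."""
    domain = (bdf_id >> 32) & 0xFFFFFFFF  # bits [63:32]
    partition_hi = (bdf_id >> 28) & 0xF   # bits [31:28]
    bus = (bdf_id >> 8) & 0xFF            # bits [15:8]
    device = (bdf_id >> 3) & 0x1F         # bits [7:3]
    function = bdf_id & 0x7               # bits [2:0]
    # ASSUMPTION: use bits [31:28] when non-zero, otherwise fall back
    # to the function bits, which may carry the partition id.
    partition_id = partition_hi if partition_hi else function
    return DecodedBdfId(domain, partition_id, bus, device, function)


# Partition id carried in bits [31:28], bus 0x23.
print(decode_bdf_id((3 << 28) | (0x23 << 8)))
# Partition id carried in the function bits, bus 0x0c.
print(decode_bdf_id((0x0C << 8) | 5))
```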
@@ -329,6 +343,8 @@ GPU POWER GPU_TEMP MEM_TEMP VRAM_USED VRAM_TOTAL
- **Fixed incorrect implementation of the Python API `amdsmi_get_gpu_metrics_header_info()`**.
- **`amd-smi static --partition` will have updates with additional partition information from `amdsmi_get_gpu_accelerator_partition_profile()`**.
### Known issues
- N/A
@@ -1005,7 +1021,7 @@ Use the watch arguments to run continuously
Monitor Arguments:
-h, --help show this help message and exit
-g, --gpu GPU [GPU ...] Select a GPU ID, BDF, or UUID from the possible choices:
ID: 0 | BDF: 0000:01:00.0 | UUID: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
ID: 0 | BDF: 0000:01:00.0 | UUID: <redacted>
all | Selects all devices
-U, --cpu CPU [CPU ...] Select a CPU ID from the possible choices:
ID: 0
+21 -5
@@ -175,20 +175,29 @@ class AMDSMICommands():
kfd_id = kfd_info['kfd_id']
node_id = kfd_info['node_id']
except amdsmi_exception.AmdSmiLibraryException as e:
kfd_id = node_id = e.get_error_info()
kfd_id = node_id = "N/A"
logging.debug("Failed to get kfd info for gpu %s | %s", gpu_id, e.get_error_info())
try:
partition_info = amdsmi_interface.amdsmi_get_gpu_accelerator_partition_profile(args.gpu)
partition_id = partition_info['partition_id']
except amdsmi_exception.AmdSmiLibraryException as e:
partition_id = "N/A"
logging.debug("Failed to get partition ID for gpu %s | %s", gpu_id, e.get_error_info())
# CSV format is intentionally aligned with Host
if self.logger.is_csv_format():
self.logger.store_output(args.gpu, 'gpu_bdf', bdf)
self.logger.store_output(args.gpu, 'gpu_uuid', uuid)
self.logger.store_output(args.gpu, 'kfd_id', kfd_id)
self.logger.store_output(args.gpu, 'node_id', node_id)
self.logger.store_output(args.gpu, 'partition_id', partition_id)
else:
self.logger.store_output(args.gpu, 'bdf', bdf)
self.logger.store_output(args.gpu, 'uuid', uuid)
self.logger.store_output(args.gpu, 'kfd_id', kfd_id)
self.logger.store_output(args.gpu, 'node_id', node_id)
self.logger.store_output(args.gpu, 'partition_id', partition_id)
if multiple_devices:
self.logger.store_multiple_device_output()
@@ -380,8 +389,7 @@ class AMDSMICommands():
"asic_serial" : "N/A",
"oam_id" : "N/A",
"num_compute_units" : "N/A",
"target_graphics_version" : "N/A",
"partition_id" : "N/A"
"target_graphics_version" : "N/A"
}
try:
@@ -679,8 +687,16 @@ class AMDSMICommands():
memory_partition = "N/A"
logging.debug("Failed to get memory partition info for gpu %s | %s", gpu_id, e.get_error_info())
try:
partition_info = amdsmi_interface.amdsmi_get_gpu_accelerator_partition_profile(args.gpu)
partition_id = partition_info['partition_id']
except amdsmi_exception.AmdSmiLibraryException as e:
partition_id = "N/A"
logging.debug("Failed to get partition ID for gpu %s | %s", gpu_id, e.get_error_info())
static_dict['partition'] = {"compute_partition": compute_partition,
"memory_partition": memory_partition}
"memory_partition": memory_partition,
"partition_id": partition_id}
if 'soc_pstate' in current_platform_args:
if args.soc_pstate:
try:
@@ -4996,4 +5012,4 @@ class AMDSMICommands():
except Exception as e:
print(e)
listener.stop()
listener.stop()
+48 -2
@@ -87,6 +87,9 @@ typedef enum {
#define AMDSMI_MAX_CONTAINER_TYPE 2
#define AMDSMI_MAX_CACHE_TYPES 10
#define AMDSMI_MAX_NUM_XGMI_PHYSICAL_LINK 64
#define AMDSMI_MAX_ACCELERATOR_PROFILE 32
#define AMDSMI_MAX_CP_PROFILE_RESOURCES 32
#define AMDSMI_MAX_ACCELERATOR_PARTITIONS 8
#define AMDSMI_GPU_UUID_SIZE 38
@@ -275,6 +278,24 @@ typedef enum {
AMDSMI_CLK_TYPE__MAX = AMDSMI_CLK_TYPE_DCLK1
} amdsmi_clk_type_t;
/**
* @brief Accelerator Partition. This enum is used to identify
* various accelerator partitioning settings.
*/
typedef enum {
AMDSMI_ACCELERATOR_PARTITION_INVALID = 0,
AMDSMI_ACCELERATOR_PARTITION_SPX, //!< Single GPU mode (SPX)- All XCCs work
//!< together with shared memory
AMDSMI_ACCELERATOR_PARTITION_DPX, //!< Dual GPU mode (DPX)- Half XCCs work
//!< together with shared memory
AMDSMI_ACCELERATOR_PARTITION_TPX, //!< Triple GPU mode (TPX)- One-third XCCs
//!< work together with shared memory
AMDSMI_ACCELERATOR_PARTITION_QPX, //!< Quad GPU mode (QPX)- Quarter XCCs
//!< work together with shared memory
AMDSMI_ACCELERATOR_PARTITION_CPX, //!< Core mode (CPX)- Per-chip XCC with
//!< shared memory
} amdsmi_accelerator_partition_type_t;
/**
* @brief Compute Partition. This enum is used to identify
* various compute partitioning settings.
@@ -590,8 +611,7 @@ typedef struct {
uint32_t oam_id; //< 0xFFFF if not supported
uint32_t num_of_compute_units; //< 0xFFFFFFFF if not supported
uint64_t target_graphics_version; //< 0xFFFFFFFFFFFFFFFF if not supported
uint32_t partition_id; //< 0xFFFFFFFF if not supported
uint32_t reserved[14];
uint32_t reserved[15];
} amdsmi_asic_info_t;
typedef struct {
@@ -600,6 +620,15 @@ typedef struct {
uint32_t reserved[13];
} amdsmi_kfd_info_t;
typedef struct {
amdsmi_accelerator_partition_type_t profile_type; // SPX, DPX, QPX, CPX and so on
uint32_t num_partitions; // On MI300X, SPX: 1, DPX: 2, QPX: 4, CPX: 8, the length of resources array
uint32_t profile_index; // The index in the profiles array in amdsmi_accelerator_partition_profile_t
uint32_t num_resources; // length of index_of_resources_profile
uint32_t resources[AMDSMI_MAX_ACCELERATOR_PARTITIONS][AMDSMI_MAX_CP_PROFILE_RESOURCES];
uint64_t reserved[6];
} amdsmi_accelerator_partition_profile_t;
typedef enum {
AMDSMI_LINK_TYPE_PCIE,
AMDSMI_LINK_TYPE_XGMI,
@@ -4517,6 +4546,23 @@ amdsmi_status_t amdsmi_reset_gpu_memory_partition(amdsmi_processor_handle proces
/** @} */ // end of memory_partition
/*****************************************************************************/
/** @defgroup accelerator_partition_profile Accelerator Partition Profile Functions
* These functions are used to configure and query the device's
* accelerator partition profile setting.
* @{
*/
// TODO: declare rest of partition profile functions and complete doc commentary.
/*
Get the current accelerator partition profile. The function will return current profile.
*/
amdsmi_status_t
amdsmi_get_gpu_accelerator_partition_profile(amdsmi_processor_handle processor_handle,
amdsmi_accelerator_partition_profile_t *profile,
uint32_t *partition_id);
/** @} */ // end of accelerator_partition_profile
/*****************************************************************************/
/** @defgroup EvntNotif Event Notification Functions
* These functions are used to configure for and get asynchronous event
+40 -1
@@ -2102,6 +2102,7 @@ except AmdSmiException as e:
```
### amdsmi_set_gpu_process_isolation
Description: Enable/disable the system Process Isolation for the given device handle.
Input parameters:
@@ -2132,6 +2133,7 @@ except AmdSmiException as e:
```
### amdsmi_clean_gpu_local_data
Description: Clear the SRAM data of the given device. This can be called between user logins to prevent information leak.
Input parameters:
@@ -2160,7 +2162,6 @@ except AmdSmiException as e:
print(e)
```
### amdsmi_get_gpu_overdrive_level
Description: Get the overdrive percent associated with the device with provided
@@ -3826,6 +3827,44 @@ except AmdSmiException as e:
print(e)
```
### amdsmi_get_gpu_accelerator_partition_profile
**Note: CURRENTLY HARDCODED TO RETURN EMPTY VALUES**
Description: Get partition information for target device
Input parameters:
* `processor_handle` the device handle
Output: Dictionary with fields:
Field | Description
---|---
`partition_id` | ID of the partition on the GPU provided
`partition_profile` | Dict containing partition data (TBD)
Exceptions that can be thrown by `amdsmi_get_gpu_accelerator_partition_profile` function:
* `AmdSmiLibraryException`
* `AmdSmiRetryException`
* `AmdSmiParameterException`
Example:
```python
try:
devices = amdsmi_get_processor_handles()
if len(devices) == 0:
print("No GPUs on machine")
else:
for device in devices:
partition_id = amdsmi_get_gpu_accelerator_partition_profile(device)["partition_id"]
print(partition_id)
except AmdSmiException as e:
print(e)
```
### amdsmi_get_xgmi_info
Description: Returns XGMI information for the GPU.
+1
@@ -224,6 +224,7 @@ from .amdsmi_interface import amdsmi_reset_gpu_compute_partition
from .amdsmi_interface import amdsmi_get_gpu_memory_partition
from .amdsmi_interface import amdsmi_set_gpu_memory_partition
from .amdsmi_interface import amdsmi_reset_gpu_memory_partition
from .amdsmi_interface import amdsmi_get_gpu_accelerator_partition_profile
# # Individual GPU Metrics Functions
from .amdsmi_interface import amdsmi_get_gpu_metrics_header_info
+35 -2
@@ -1665,8 +1665,7 @@ def amdsmi_get_gpu_asic_info(
"asic_serial": asic_info_struct.asic_serial.decode("utf-8"),
"oam_id": asic_info_struct.oam_id,
"num_compute_units": asic_info_struct.num_of_compute_units,
"target_graphics_version": "gfx" + str(asic_info_struct.target_graphics_version),
"partition_id": asic_info_struct.partition_id
"target_graphics_version": "gfx" + str(asic_info_struct.target_graphics_version)
}
string_values = ["market_name", "vendor_name"]
@@ -1746,6 +1745,7 @@ def amdsmi_get_power_cap_info(
"min_power_cap": power_info.min_power_cap,
"max_power_cap": power_info.max_power_cap}
def amdsmi_get_gpu_pm_metrics_info(
processor_handle: amdsmi_wrapper.amdsmi_processor_handle,
) -> Dict[str, Any]:
@@ -1773,6 +1773,7 @@ def amdsmi_get_gpu_pm_metrics_info(
amdsmi_wrapper.amdsmi_free_name_value_pairs(pm_metrics)
return results
def amdsmi_get_gpu_reg_table_info(
processor_handle: amdsmi_wrapper.amdsmi_processor_handle,
reg_type: amdsmi_wrapper.amdsmi_reg_type_t,
@@ -1801,6 +1802,7 @@ def amdsmi_get_gpu_reg_table_info(
amdsmi_wrapper.amdsmi_free_name_value_pairs(pm_metrics)
return results
def amdsmi_get_gpu_vram_info(
processor_handle: amdsmi_wrapper.amdsmi_processor_handle,
) -> Dict[str, Any]:
@@ -2564,6 +2566,7 @@ def amdsmi_topo_get_link_type(
return {"hops": hops.value, "type": type.value}
def amdsmi_topo_get_p2p_status(
processor_handle_src: amdsmi_wrapper.amdsmi_processor_handle,
processor_handle_dst: amdsmi_wrapper.amdsmi_processor_handle,
@@ -2716,6 +2719,36 @@ def amdsmi_reset_gpu_memory_partition(processor_handle: amdsmi_wrapper.amdsmi_pr
_check_res(amdsmi_wrapper.amdsmi_reset_gpu_memory_partition(processor_handle))
def amdsmi_get_gpu_accelerator_partition_profile(
processor_handle: amdsmi_wrapper.amdsmi_processor_handle
) -> Dict[str, Any]:
if not isinstance(processor_handle, amdsmi_wrapper.amdsmi_processor_handle):
raise AmdSmiParameterException(
processor_handle, amdsmi_wrapper.amdsmi_processor_handle
)
partition_id = ctypes.c_uint32()
profile = amdsmi_wrapper.amdsmi_accelerator_partition_profile_t()
_check_res(
amdsmi_wrapper.amdsmi_get_gpu_accelerator_partition_profile(processor_handle,
ctypes.byref(profile),
ctypes.byref(partition_id))
)
partition_profile_dict = {
"profile_type" : profile.profile_type,
"num_partitions" : profile.num_partitions,
"profile_index" : profile.profile_index,
"num_resources" : profile.num_resources,
"resources" : "N/A"
}
return {
"partition_id" : partition_id.value,
"partition_profile" : partition_profile_dict
}
def amdsmi_get_xgmi_info(processor_handle: amdsmi_wrapper.amdsmi_processor_handle):
if not isinstance(processor_handle, amdsmi_wrapper.amdsmi_processor_handle):
raise AmdSmiParameterException(
+73 -31
@@ -377,6 +377,23 @@ AMDSMI_CLK_TYPE_DCLK1 = 9
AMDSMI_CLK_TYPE__MAX = 9
amdsmi_clk_type_t = ctypes.c_uint32 # enum
# values for enumeration 'amdsmi_accelerator_partition_type_t'
amdsmi_accelerator_partition_type_t__enumvalues = {
0: 'AMDSMI_ACCELERATOR_PARTITION_INVALID',
1: 'AMDSMI_ACCELERATOR_PARTITION_SPX',
2: 'AMDSMI_ACCELERATOR_PARTITION_DPX',
3: 'AMDSMI_ACCELERATOR_PARTITION_TPX',
4: 'AMDSMI_ACCELERATOR_PARTITION_QPX',
5: 'AMDSMI_ACCELERATOR_PARTITION_CPX',
}
AMDSMI_ACCELERATOR_PARTITION_INVALID = 0
AMDSMI_ACCELERATOR_PARTITION_SPX = 1
AMDSMI_ACCELERATOR_PARTITION_DPX = 2
AMDSMI_ACCELERATOR_PARTITION_TPX = 3
AMDSMI_ACCELERATOR_PARTITION_QPX = 4
AMDSMI_ACCELERATOR_PARTITION_CPX = 5
amdsmi_accelerator_partition_type_t = ctypes.c_uint32 # enum
# values for enumeration 'amdsmi_compute_partition_type_t'
amdsmi_compute_partition_type_t__enumvalues = {
0: 'AMDSMI_COMPUTE_PARTITION_INVALID',
@@ -759,19 +776,6 @@ amdsmi_card_form_factor_t = ctypes.c_uint32 # enum
class struct_amdsmi_pcie_info_t(Structure):
pass
class struct_pcie_static_(Structure):
pass
struct_pcie_static_._pack_ = 1 # source:False
struct_pcie_static_._fields_ = [
('max_pcie_width', ctypes.c_uint16),
('PADDING_0', ctypes.c_ubyte * 2),
('max_pcie_speed', ctypes.c_uint32),
('pcie_interface_version', ctypes.c_uint32),
('slot_type', amdsmi_card_form_factor_t),
('reserved', ctypes.c_uint64 * 10),
]
class struct_pcie_metric_(Structure):
pass
@@ -790,6 +794,19 @@ struct_pcie_metric_._fields_ = [
('reserved', ctypes.c_uint64 * 13),
]
class struct_pcie_static_(Structure):
pass
struct_pcie_static_._pack_ = 1 # source:False
struct_pcie_static_._fields_ = [
('max_pcie_width', ctypes.c_uint16),
('PADDING_0', ctypes.c_ubyte * 2),
('max_pcie_speed', ctypes.c_uint32),
('pcie_interface_version', ctypes.c_uint32),
('slot_type', amdsmi_card_form_factor_t),
('reserved', ctypes.c_uint64 * 10),
]
struct_amdsmi_pcie_info_t._pack_ = 1 # source:False
struct_amdsmi_pcie_info_t._fields_ = [
('pcie_static', struct_pcie_static_),
@@ -904,8 +921,7 @@ struct_amdsmi_asic_info_t._fields_ = [
('num_of_compute_units', ctypes.c_uint32),
('PADDING_0', ctypes.c_ubyte * 4),
('target_graphics_version', ctypes.c_uint64),
('partition_id', ctypes.c_uint32),
('reserved', ctypes.c_uint32 * 14),
('reserved', ctypes.c_uint32 * 15),
('PADDING_1', ctypes.c_ubyte * 4),
]
@@ -921,6 +937,20 @@ struct_amdsmi_kfd_info_t._fields_ = [
]
amdsmi_kfd_info_t = struct_amdsmi_kfd_info_t
class struct_amdsmi_accelerator_partition_profile_t(Structure):
pass
struct_amdsmi_accelerator_partition_profile_t._pack_ = 1 # source:False
struct_amdsmi_accelerator_partition_profile_t._fields_ = [
('profile_type', amdsmi_accelerator_partition_type_t),
('num_partitions', ctypes.c_uint32),
('profile_index', ctypes.c_uint32),
('num_resources', ctypes.c_uint32),
('resources', ctypes.c_uint32 * 32 * 8),
('reserved', ctypes.c_uint64 * 6),
]
amdsmi_accelerator_partition_profile_t = struct_amdsmi_accelerator_partition_profile_t
# values for enumeration 'amdsmi_link_type_t'
amdsmi_link_type_t__enumvalues = {
@@ -2250,6 +2280,9 @@ amdsmi_set_gpu_memory_partition.argtypes = [amdsmi_processor_handle, amdsmi_memo
amdsmi_reset_gpu_memory_partition = _libraries['libamd_smi.so'].amdsmi_reset_gpu_memory_partition
amdsmi_reset_gpu_memory_partition.restype = amdsmi_status_t
amdsmi_reset_gpu_memory_partition.argtypes = [amdsmi_processor_handle]
amdsmi_get_gpu_accelerator_partition_profile = _libraries['libamd_smi.so'].amdsmi_get_gpu_accelerator_partition_profile
amdsmi_get_gpu_accelerator_partition_profile.restype = amdsmi_status_t
amdsmi_get_gpu_accelerator_partition_profile.argtypes = [amdsmi_processor_handle, ctypes.POINTER(struct_amdsmi_accelerator_partition_profile_t), ctypes.POINTER(ctypes.c_uint32)]
amdsmi_init_gpu_event_notification = _libraries['libamd_smi.so'].amdsmi_init_gpu_event_notification
amdsmi_init_gpu_event_notification.restype = amdsmi_status_t
amdsmi_init_gpu_event_notification.argtypes = [amdsmi_processor_handle]
@@ -2447,7 +2480,12 @@ amdsmi_get_esmi_err_msg = _libraries['libamd_smi.so'].amdsmi_get_esmi_err_msg
amdsmi_get_esmi_err_msg.restype = amdsmi_status_t
amdsmi_get_esmi_err_msg.argtypes = [amdsmi_status_t, ctypes.POINTER(ctypes.POINTER(ctypes.c_char))]
__all__ = \
['AGG_BW0', 'AMDSMI_AVERAGE_POWER',
['AGG_BW0', 'AMDSMI_ACCELERATOR_PARTITION_CPX',
'AMDSMI_ACCELERATOR_PARTITION_DPX',
'AMDSMI_ACCELERATOR_PARTITION_INVALID',
'AMDSMI_ACCELERATOR_PARTITION_QPX',
'AMDSMI_ACCELERATOR_PARTITION_SPX',
'AMDSMI_ACCELERATOR_PARTITION_TPX', 'AMDSMI_AVERAGE_POWER',
'AMDSMI_CACHE_PROPERTY_CPU_CACHE',
'AMDSMI_CACHE_PROPERTY_DATA_CACHE',
'AMDSMI_CACHE_PROPERTY_ENABLED',
@@ -2651,21 +2689,23 @@ __all__ = \
'AMDSMI_XGMI_STATUS_MULTIPLE_ERRORS',
'AMDSMI_XGMI_STATUS_NO_ERRORS', 'CLK_LIMIT_MAX', 'CLK_LIMIT_MIN',
'RD_BW0', 'WR_BW0', 'amd_metrics_table_header_t',
'amdsmi_asic_info_t', 'amdsmi_bdf_t', 'amdsmi_bit_field_t',
'amdsmi_board_info_t', 'amdsmi_cache_property_type_t',
'amdsmi_card_form_factor_t', 'amdsmi_clean_gpu_local_data',
'amdsmi_clk_info_t', 'amdsmi_clk_limit_type_t',
'amdsmi_clk_type_t', 'amdsmi_compute_partition_type_t',
'amdsmi_container_types_t', 'amdsmi_counter_command_t',
'amdsmi_counter_value_t', 'amdsmi_cpu_apb_disable',
'amdsmi_cpu_apb_enable', 'amdsmi_cpusocket_handle',
'amdsmi_ddr_bw_metrics_t', 'amdsmi_dev_perf_level_t',
'amdsmi_dimm_power_t', 'amdsmi_dimm_thermal_t',
'amdsmi_dpm_level_t', 'amdsmi_dpm_policy_entry_t',
'amdsmi_dpm_policy_t', 'amdsmi_driver_info_t',
'amdsmi_engine_usage_t', 'amdsmi_error_count_t',
'amdsmi_event_group_t', 'amdsmi_event_handle_t',
'amdsmi_event_type_t', 'amdsmi_evt_notification_data_t',
'amdsmi_accelerator_partition_profile_t',
'amdsmi_accelerator_partition_type_t', 'amdsmi_asic_info_t',
'amdsmi_bdf_t', 'amdsmi_bit_field_t', 'amdsmi_board_info_t',
'amdsmi_cache_property_type_t', 'amdsmi_card_form_factor_t',
'amdsmi_clean_gpu_local_data', 'amdsmi_clk_info_t',
'amdsmi_clk_limit_type_t', 'amdsmi_clk_type_t',
'amdsmi_compute_partition_type_t', 'amdsmi_container_types_t',
'amdsmi_counter_command_t', 'amdsmi_counter_value_t',
'amdsmi_cpu_apb_disable', 'amdsmi_cpu_apb_enable',
'amdsmi_cpusocket_handle', 'amdsmi_ddr_bw_metrics_t',
'amdsmi_dev_perf_level_t', 'amdsmi_dimm_power_t',
'amdsmi_dimm_thermal_t', 'amdsmi_dpm_level_t',
'amdsmi_dpm_policy_entry_t', 'amdsmi_dpm_policy_t',
'amdsmi_driver_info_t', 'amdsmi_engine_usage_t',
'amdsmi_error_count_t', 'amdsmi_event_group_t',
'amdsmi_event_handle_t', 'amdsmi_event_type_t',
'amdsmi_evt_notification_data_t',
'amdsmi_evt_notification_type_t',
'amdsmi_first_online_core_on_cpu_socket',
'amdsmi_free_name_value_pairs', 'amdsmi_freq_ind_t',
@@ -2695,6 +2735,7 @@ __all__ = \
'amdsmi_get_cpu_socket_temperature', 'amdsmi_get_cpucore_handles',
'amdsmi_get_cpusocket_handles', 'amdsmi_get_energy_count',
'amdsmi_get_esmi_err_msg', 'amdsmi_get_fw_info',
'amdsmi_get_gpu_accelerator_partition_profile',
'amdsmi_get_gpu_activity', 'amdsmi_get_gpu_asic_info',
'amdsmi_get_gpu_available_counters',
'amdsmi_get_gpu_bad_page_info', 'amdsmi_get_gpu_bdf_id',
@@ -2804,6 +2845,7 @@ __all__ = \
'amdsmi_vram_vendor_type_t', 'amdsmi_xgmi_info_t',
'amdsmi_xgmi_status_t', 'processor_type_t', 'size_t',
'struct__links', 'struct_amd_metrics_table_header_t',
'struct_amdsmi_accelerator_partition_profile_t',
'struct_amdsmi_asic_info_t', 'struct_amdsmi_board_info_t',
'struct_amdsmi_clk_info_t', 'struct_amdsmi_counter_value_t',
'struct_amdsmi_ddr_bw_metrics_t', 'struct_amdsmi_dimm_power_t',
+20 -11
@@ -774,15 +774,6 @@ amdsmi_get_gpu_asic_info(amdsmi_processor_handle processor_handle, amdsmi_asic_i
info->target_graphics_version = tmp_target_gfx_version;
}
// default to 0xffffffff as not supported
info->partition_id = std::numeric_limits<uint32_t>::max();
auto tmp_partition_id = uint32_t(0);
status = rsmi_wrapper(rsmi_dev_partition_id_get, processor_handle,
&(tmp_partition_id));
if (status == amdsmi_status_t::AMDSMI_STATUS_SUCCESS) {
info->partition_id = tmp_partition_id;
}
return AMDSMI_STATUS_SUCCESS;
}
@@ -1168,6 +1159,24 @@ amdsmi_reset_gpu_memory_partition(amdsmi_processor_handle processor_handle) {
return rsmi_wrapper(rsmi_dev_memory_partition_reset, processor_handle);
}
amdsmi_status_t
amdsmi_get_gpu_accelerator_partition_profile(amdsmi_processor_handle processor_handle,
amdsmi_accelerator_partition_profile_t *profile,
uint32_t *partition_id) {
AMDSMI_CHECK_INIT();
// TODO: also fill out profile later
// default to 0xffffffff if not supported
*partition_id = std::numeric_limits<uint32_t>::max();
auto tmp_partition_id = uint32_t(0);
amdsmi_status_t status = rsmi_wrapper(rsmi_dev_partition_id_get, processor_handle, &tmp_partition_id);
if (status == amdsmi_status_t::AMDSMI_STATUS_SUCCESS){
*partition_id = tmp_partition_id;
}
return status;
}
// TODO(bliu) : other xgmi related information
amdsmi_status_t
amdsmi_get_xgmi_info(amdsmi_processor_handle processor_handle, amdsmi_xgmi_info_t *info) {
@@ -1303,8 +1312,8 @@ void amdsmi_free_name_value_pairs(void *p) {
amdsmi_status_t
amdsmi_get_power_cap_info(amdsmi_processor_handle processor_handle,
uint32_t sensor_ind,
amdsmi_power_cap_info_t *info) {
uint32_t sensor_ind,
amdsmi_power_cap_info_t *info) {
AMDSMI_CHECK_INIT();
if (info == nullptr)
+2 -6
@@ -60,7 +60,7 @@ TestSysInfoRead::TestSysInfoRead() : TestBase() {
set_title("AMDSMI System Info Read Test");
set_description("This test verifies that system information such as the "
"BDFID, AMDSMI version, VBIOS version, "
"vendor_id, unique_id, target_gfx_version, kfd_id, node_id, partition_id, etc. "
"vendor_id, unique_id, target_gfx_version, kfd_id, node_id, etc. "
"can be read properly.");
}
@@ -153,7 +153,7 @@ void TestSysInfoRead::Run(void) {
ASSERT_EQ(err, AMDSMI_STATUS_INVAL);
// vendor_id, unique_id, target_gfx_version, partition_id
// vendor_id, unique_id, target_gfx_version
amdsmi_asic_info_t asic_info = {};
err = amdsmi_get_gpu_asic_info(processor_handles_[i], &asic_info);
if (err == AMDSMI_STATUS_NOT_SUPPORTED) {
@@ -161,7 +161,6 @@ void TestSysInfoRead::Run(void) {
"\t**amdsmi_dev_unique_id() is not supported"
" on this machine" << std::endl;
EXPECT_EQ(asic_info.target_graphics_version, std::numeric_limits<uint64_t>::max());
EXPECT_EQ(asic_info.partition_id, std::numeric_limits<uint32_t>::max());
// Verify api support checking functionality is working
err = amdsmi_get_gpu_asic_info(processor_handles_[i], nullptr);
ASSERT_EQ(err, AMDSMI_STATUS_NOT_SUPPORTED);
@@ -172,12 +171,9 @@ void TestSysInfoRead::Run(void) {
<< asic_info.vendor_name << std::endl;
std::cout << "\t**Target GFX version: " << std::dec
<< asic_info.target_graphics_version << "\n";
std::cout << "\t**Partition ID: " << std::dec
<< asic_info.partition_id << "\n";
}
EXPECT_EQ(err, AMDSMI_STATUS_SUCCESS);
EXPECT_NE(asic_info.target_graphics_version, std::numeric_limits<uint64_t>::max());
EXPECT_NE(asic_info.partition_id, std::numeric_limits<uint32_t>::max());
// Verify api support checking functionality is working
err = amdsmi_get_gpu_asic_info(processor_handles_[i], nullptr);
ASSERT_EQ(err, AMDSMI_STATUS_INVAL);
-2
@@ -511,8 +511,6 @@ def walk_through(self):
asic_info['oam_id']))
print(" asic_info['target_graphics_version'] is: {}\n".format(
asic_info['target_graphics_version']))
print(" asic_info['partition_id'] is: {}\n".format(
asic_info['partition_id']))
print("\n###Test amdsmi_get_gpu_kfd_info \n")
kfd_info = amdsmi.amdsmi_get_gpu_kfd_info(processors[i])
print(" kfd_info['kfd_id'] is: {}\n".format(