df6de25624
Changes:
- Updated amd-smi firmware to display "N/A" for firmware that is unavailable in partitioned environments, improving clarity.
Example (in DPX):
$ amd-smi firmware
GPU: 0
FW_LIST:
...
FW 12:
FW_ID: PM
FW_VERSION: 00.86.39.00
GPU: 1
FW_LIST: N/A
- Fixed amd-smi partition not showing current partition information on
ASICs that cannot set memory or accelerator partitions.
Example:
$ amd-smi partition -c -m
CURRENT_PARTITION:
GPU_ID  MEMORY  ACCELERATOR_TYPE  ACCELERATOR_PROFILE_INDEX  PARTITION_ID
0       NPS1    CPX               2                          0
1       N/A     N/A               N/A                        1
2       N/A     N/A               N/A                        2
3       N/A     N/A               N/A                        3
4       N/A     N/A               N/A                        4
5       N/A     N/A               N/A                        5
6       NPS1    SPX               0                          0
7       NPS1    SPX               0                          0
8       NPS1    SPX               0                          0
MEMORY_PARTITION:
GPU_ID  MEMORY_PARTITION_CAPS  CURRENT_MEMORY_PARTITION
0       N/A                    NPS1
1       N/A                    N/A
2       N/A                    N/A
3       N/A                    N/A
4       N/A                    N/A
5       N/A                    N/A
6       N/A                    NPS1
7       N/A                    NPS1
8       N/A                    NPS1
- Refactored amd_smi_drm_example.cc:
- Grouped partition changes together and restored the original partition settings afterward.
- Added handling for partitioned environments, allowing the example to continue even when some APIs are not supported in partitioned configurations.
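Restoring the original partition settings after a group of changes can be pictured as a scope guard that captures the setting on entry and writes it back on exit, even on early returns. A minimal C++ sketch follows; PartitionRestorer and its getter/setter callbacks are hypothetical stand-ins, not the actual amd_smi_drm_example.cc code or any AMD SMI API.

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical scope guard: reads the current partition setting when
// constructed and restores it when destroyed. The getter/setter stand in
// for real partition APIs; their names and types are assumptions.
class PartitionRestorer {
 public:
  using Getter = std::function<std::string()>;
  using Setter = std::function<void(const std::string&)>;

  PartitionRestorer(Getter get, Setter set)
      : set_(std::move(set)), original_(get()) {}

  // Runs on every exit path. The setter must not throw, because it is
  // invoked from a (noexcept) destructor.
  ~PartitionRestorer() { set_(original_); }

  PartitionRestorer(const PartitionRestorer&) = delete;
  PartitionRestorer& operator=(const PartitionRestorer&) = delete;

  const std::string& original() const { return original_; }

 private:
  Setter set_;
  std::string original_;
};
```

Grouping all partition changes inside one such scope keeps the restore logic in a single place instead of repeating it after each change.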
- Modified amdsmi_asic_info_t (see amdsmi_get_gpu_asic_info()) to report the OAM ID as N/A when its value is 0xFFFFFFFF (previously 0xFFFF).
This improves handling of OAM IDs in partitioned environments: the OAM ID is a physical identifier, so it does not exist for non-primary nodes.
The sentinel now matches the maximum value of the structure's field, which makes it easier to handle in tests and example code.
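A minimal sketch of the display rule, assuming the OAM ID is held in a 32-bit unsigned field (as the note about the structure's maximum value implies). format_oam_id() and kOamIdUnavailable are hypothetical names, not part of AMD SMI.

```cpp
#include <cstdint>
#include <string>

// Hypothetical sentinel: the maximum value of a 32-bit field marks an OAM ID
// that does not exist (e.g. on a non-primary partition node).
constexpr std::uint32_t kOamIdUnavailable = 0xFFFFFFFFu;

// Hypothetical helper: render an OAM ID for display, using "N/A" for the
// sentinel and the decimal value otherwise.
std::string format_oam_id(std::uint32_t oam_id) {
  if (oam_id == kOamIdUnavailable) {
    return "N/A";
  }
  return std::to_string(oam_id);
}
```

Under this rule the old sentinel 0xFFFF is just an ordinary value, so only the full-width maximum is treated as unavailable.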
- Introduced amdsmi_RAII_open_FD() (internal API) to manage file descriptors using RAII, ensuring proper closure and preventing resource leaks.
Updated the following APIs to use this function:
- amdsmi_get_gpu_asic_info(), amdsmi_get_gpu_vram_usage(),
amdsmi_get_gpu_vram_info(), amdsmi_get_gpu_vbios_info(),
amdsmi_get_gpu_driver_info(), amdsmi_get_gpu_virtualization_mode()
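The RAII idea behind amdsmi_RAII_open_FD() can be sketched as a move-only wrapper that closes the descriptor in its destructor. The class ScopedFd below is hypothetical; the real internal function's name, signature, and return type are not shown in this change and may differ.

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <string>
#include <utility>

// Hypothetical RAII owner of a POSIX file descriptor: opens in the
// constructor, closes in the destructor, and transfers ownership on move.
class ScopedFd {
 public:
  explicit ScopedFd(const std::string& path, int flags = O_RDONLY | O_CLOEXEC)
      : fd_(::open(path.c_str(), flags)) {}
  ~ScopedFd() { reset(); }

  // Copying would lead to a double close, so only moves are allowed.
  ScopedFd(const ScopedFd&) = delete;
  ScopedFd& operator=(const ScopedFd&) = delete;
  ScopedFd(ScopedFd&& other) noexcept : fd_(std::exchange(other.fd_, -1)) {}
  ScopedFd& operator=(ScopedFd&& other) noexcept {
    if (this != &other) {
      reset();
      fd_ = std::exchange(other.fd_, -1);
    }
    return *this;
  }

  int get() const { return fd_; }
  bool valid() const { return fd_ >= 0; }  // open() returns -1 on failure

 private:
  void reset() {
    if (fd_ >= 0) {
      ::close(fd_);
      fd_ = -1;
    }
  }
  int fd_;
};
```

Because close() runs in the destructor, every early error return in an API such as amdsmi_get_gpu_vram_info() would release the descriptor without a separate cleanup path.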
- Updated AMD SMI test_base.cc/.h:
- Improved output and handling for partitioned environments.
- Added detailed ASIC information logging to align with structure changes.
- Added more context to error messages emitted before ASSERT checks.
- Resolved test failures in partitioned environments by updating
test logic to handle partition-specific configurations.
Fixed tests include:
- computepartition_read_write.cc, frequencies_read_write.cc,
gpu_metrics_read.cc, mem_util_read.cc, memorypartition_read_write.cc,
perf_level_read.cc, perf_level_read_write.cc, power_cap_read_write.cc,
power_read.cc, sys_info_read.cc, gpu_busy_read.cc
Change-Id: I36e903f8fddd714c74c719459c71aba8bbb77e6f
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
Resetting head + adding fixes for tests run in partitions
Change-Id: I0c1e9ac07488b50c95f3bc6d8a724e67d2c715dc
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
[ROCm/amdsmi commit: 391451752b]