Commit Graph

175 Commits

Author SHA1 Message Date
Marko Oblak 5e168c8f6a SWDEV-394359 - [AMDSMI] [Linux] [Guest] Resolved issue: status string and socket info API failing
Signed-off-by: Marko Oblak <Marko.Oblak@amd.com>
Change-Id: I6fe3beafbf2cd3d2701dd5f78dac2bcf2d9b3aa9
2023-04-19 05:43:25 -04:00
Maisam Arif 452bffb6b4 CI Build directory fix
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I7600c3c4c2fd45ca240ee8ec04de55dc29c26365
2023-03-30 11:38:58 -04:00
Maisam Arif 4cc7244fb6 AMDSMI CLI Version 0.0.2
Added Rocm Set Commands
Wrapped all amdsmi_interface calls with error handling

Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: Ic6b3648ef01ded8ee1fb7f0f14f3ca7bc069c567
2023-03-30 01:25:11 -05:00
AravindanC 689d58d2c9 SWDEV-351540 - ASAN packaging for amd-smi
Change-Id: I5f0bf5330727e11159db87c2814904a2832df385
2023-03-23 10:40:10 -07:00
Dalibor Stanisavljevic 3af2687f17 SWDEV-387561 - Fixed market name
In case there is no device id to map to the corresponding market_name,
the rsmi_dev_brand_get function is used to retrieve the market_name

Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
Change-Id: I565089d34b2b7e5f714e0dd41062ac8d52095835
2023-03-15 12:54:14 +01:00
Marko Oblak d1325fcf40 SWDEV-379772 - [Navi32] [SMI-LIB] [Linux] [BM] [Guest] Wrong market name
Signed-off-by: Marko Oblak <Marko.Oblak@amd.com>
Change-Id: I12d3e650851a3aa474ccbf62628b60d4c385e68c
2023-03-06 17:08:33 +01:00
Marko Oblak 8429df989c SWDEV-371210 - [AMDSMI][LinuxBM] SMILIB returns wrong pcie speed value
Signed-off-by: Marko Oblak <Marko.Oblak@amd.com>
Change-Id: Ie3ca6997f11d18505df799fef9cd9d53716d53f9
2023-02-28 11:49:20 +01:00
Marko Oblak db9d8793be SWDEV-381227 - [AMDSMI][Linux][BM] SMILIB returns wrong temperature value
Signed-off-by: Marko Oblak <Marko.Oblak@amd.com>
Change-Id: Idc9929d1cfd882bb33abf040378587f68d22b31a
2023-02-21 17:21:15 +01:00
Dalibor Stanisavljevic c469c3d505 SWDEV-375213 - Renamed gpu_device to gpudevice for amdsmi_get_power_cap_info
Change-Id: I8518587f35e4ce897317a09505435eee7a8f81f8
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2023-01-26 14:11:26 +01:00
Dalibor Stanisavljevic ff553cdb56 SWDEV-375213 - Separate smi from rocm part inside functions
Change-Id: I81d2e9d02794ac017a74b3273c6f5a8c85b042a0
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2023-01-26 07:28:53 -05:00
Dalibor Stanisavljevic 411ef54087 SWDEV-375113 - Fixed process info
The format of the fdinfo file has changed

Change-Id: Iad2e26487e75f3e614e364456e929aa1f6f949a4
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2023-01-23 08:13:55 -05:00
Dalibor Stanisavljevic cf7a92f383 SWDEV-373282 - Fixed compiler warnings
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
Change-Id: Ieacf1057ad23f9a31d47a6d1199c90d8fa0d12db
2023-01-23 03:37:29 -05:00
Jason Albert 86de0f441f Remove tag values from enum/union/struct declarations
The tag values largely were not used and were causing doxygen
generation issues.
In the few cases where the tags were being referenced, clean up
those compile issues.

Signed-off-by: Jason Albert <jason.albert@amd.com>
Change-Id: I7b32eac742fb5af560400c13dda2721705d882bc
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2023-01-16 13:14:45 +01:00
Dalibor Stanisavljevic 49aad0f898 SWDEV-375098 - Added check if driver sysfs node exists
Change-Id: I2524f96e5447fd3a34aa16efe3dfc271b7df62b9
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2023-01-16 10:58:27 +01:00
Bill(Shuzhou) Liu f19da1bb2c Crash when fails to open sysfs file
When it fails to open a sysfs file, it may crash. Modify the condition
to check the file descriptor after opening the file.

Change-Id: I2acdc55f8194a2d734db20d16e1660a20ba09574
2023-01-13 08:15:58 -06:00
Dalibor Stanisavljevic 943c42f58f SWDEV-374716 - Fixed asic info
Change-Id: I8d806ef09eca4300fcec0ce6a226d13547dfb728
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2023-01-11 11:03:17 -05:00
Bill(Shuzhou) Liu ec48312c61 Remove duplicate temperature function
amdsmi_dev_get_temp_metric() covers both functions:
amdsmi_get_temperature_measure() using AMDSMI_TEMP_CURRENT
and
amdsmi_get_temperature_limit() using AMDSMI_TEMP_CRITICAL
Remove those two functions.

It also merges amdsmi_get_power_limit() into
amdsmi_get_power_measure()

Change-Id: I40d4afeb2ec0ac7b64832729f36adfaae120c990
2023-01-11 08:13:37 -06:00
Bill(Shuzhou) Liu 79bd9c1d5f change sensor_type in amdsmi_dev_get_temp_metric() to enum
The sensor_type in amdsmi_dev_get_temp_metric() will be changed to
amdsmi_temperature_type_t

Change-Id: I72a7f271b0a55a025acc2ca523062a3d51cc036d
2023-01-04 13:01:04 -06:00
Dalibor Stanisavljevic e22e72d4c3 SWDEV-371492 - Added check that device_handle is valid
Change-Id: Ic1b593fd5f781650528c860c372fa9864624255d
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2022-12-22 12:57:42 +01:00
Dalibor Stanisavljevic 4c56e9e3d6 SWDEV-371199 - Return NOT_INIT when amdsmi initialization fails
Change-Id: Ifb40aef3a62885b08164e9aa944bf9b5c375ebfd
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2022-12-19 16:29:29 +01:00
Bill(Shuzhou) Liu 221d6fdc5c Make amdsmi function name consistent
Some of the amdsmi functions have the verb (set or get) at the
end of the function name. Move it to the middle to be consistent with
other APIs.

Change-Id: I8053d16f46af951c25aaaf8febf2896a33633fa1
2022-12-16 10:20:49 -06:00
Dalibor Stanisavljevic 238f885e14 SWDEV-371561 - Fixed vbios version string value
Change-Id: Ide06784200084741e6cde606492bf03a760b9601
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2022-12-09 15:19:12 +01:00
Dalibor Stanisavljevic a2a38a5aa2 SWDEV-371210 - Fixed pcie link speed
Change-Id: I736d8095c05ee0685db0c209ea0fdb5832e14744
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2022-12-08 12:03:50 -05:00
Dalibor Stanisavljevic b93baf686d SWDEV-371191 - Fixed amdsmi_get_bad_page_info
Change-Id: I97134f548164eff588d9caa9b9f31c4361c78804
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2022-12-08 12:03:34 -05:00
Dalibor Stanisavljevic b4b761d02f SWDEV-370223 - Change the name of the header to amdsmi.h
Change dev to device_handle throughout the file
Change the pcie_info pcie_speed field type to uint32_t
Add AMDSMI prefix before amdsmi_mm_ip enum

Change-Id: I242145389ddc3f2ad05dfd6ca371640f4d118fc4
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2022-12-08 13:34:34 +01:00
Jason Albert 3b1584915b Set status codes to fixed values
Assign fixed values to status codes to prevent enum auto assign
from changing them.

Signed-off-by: Jason Albert <jason.albert@amd.com>
Change-Id: I0ca1de7ba503ce8a75c56026f5a54e212204595b
2022-12-07 10:39:26 -05:00
Galantsev, Dmitrii aeb0bf5832 CMAKE: Repackage whole project for ROCm 5.5 release
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Change-Id: I932b11a111c8e0db04bd8c5e0c3d1a470e5b2386
2022-11-29 17:04:32 -06:00
Dalibor Stanisavljevic 76f6cf7a9d SWDEV-366720 - Changed amdsmi_get_device_handle_from_bdf
Changed implementation and input parameters

Change-Id: Ifca3247132eb4033f99d74617a53f54ad076dad0
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2022-11-22 10:28:45 -05:00
Dalibor Stanisavljevic 9cad9e5216 SWDEV-361376 - Add README for python tool
- Add up to date README file for python tool

Change-Id: I7a02f79469e990870398b3741b033ea447998fdd
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2022-11-10 16:57:49 +01:00
Bill(Shuzhou) Liu b34b7451e8 Init the amdsmi using rocm_smi for libdrm
Init the amdsmi using the rocm-smi, which makes the GPU discovery
consistent with or without libdrm.

Change-Id: Ic714781f8ce791451b0c057621525926edb7f5ee
2022-11-07 11:09:09 -06:00
Bill(Shuzhou) Liu 9a92ea833f The device name and vbios version are incorrect
Get the device name from rocm-smi, since it is not displayed properly
on some cards. Set the vbios version using the rocm-smi.

Change-Id: I138f1760cde94007cb93cad02c6d8cccbb4afa28
2022-10-28 13:03:18 -05:00
Dejan Andjelkovic 6064f160a3 SWDEV-361376 - Add python wrapper
- Add generator for python wrapper
- Add interface, exception and init files
- Add CMake custom targets

Change-Id: I63c1d94fbb587387c22f559a3db79987eb214a2e
Signed-off-by: Dejan Andjelkovic <Dejan.Andjelkovic@amd.com>
2022-10-20 09:24:53 -05:00
Bill(Shuzhou) Liu 2b2d11c446 Change the get_socket_handles and get_device_handles APIs interface
Those two APIs are changed to let the user get the handles count,
allocate memory, and then return handles to the allocated memory.

Change-Id: Ibe28a89ad188c99da6af3af1740b2b25ff22ba06
2022-10-20 09:24:31 -05:00
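The count-then-fill calling convention described in commit 2b2d11c446 can be sketched as follows. This is a minimal illustration, not the library's API: `demo_handle_t`, `demo_get_handles` and `demo_list_handles` are hypothetical names, and the real signatures of get_socket_handles and get_device_handles may differ.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical handle type and backing data, for illustration only. */
typedef int demo_handle_t;
static const demo_handle_t k_demo_handles[] = {10, 20, 30};

/*
 * Hypothetical query function using the two-call convention:
 *  - handles == NULL: only report how many handles exist via *count.
 *  - handles != NULL: copy at most *count handles into the caller's buffer
 *    and write back the number actually copied.
 */
static int demo_get_handles(uint32_t *count, demo_handle_t *handles)
{
    const uint32_t available =
        (uint32_t)(sizeof k_demo_handles / sizeof k_demo_handles[0]);

    if (count == NULL) {
        return -1;
    }
    if (handles == NULL) {
        *count = available;
        return 0;
    }

    uint32_t n = (*count < available) ? *count : available;
    for (uint32_t i = 0; i < n; ++i) {
        handles[i] = k_demo_handles[i];
    }
    *count = n;
    return 0;
}

/* Caller side: ask for the count, allocate, then ask for the handles. */
static demo_handle_t *demo_list_handles(uint32_t *count)
{
    if (demo_get_handles(count, NULL) != 0) {
        return NULL;
    }
    demo_handle_t *handles = calloc(*count, sizeof *handles);
    if (handles == NULL) {
        return NULL;
    }
    if (demo_get_handles(count, handles) != 0) {
        free(handles);
        return NULL;
    }
    return handles; /* the caller owns this buffer and must free() it */
}
```

With this convention the caller owns the memory, so the library never has to decide how long a returned array stays valid.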
Dalibor Stanisavljevic 3daf9c1063 SWDEV-353742 - Port smilib function to amdsmi
Change-Id: I99df249755a5c665a8dd1777fa82d046e139bd77
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2022-10-20 09:24:22 -05:00
Bill(Shuzhou) Liu 0c91ef919d Restructure the folder
Move rocm_smi related function to rocm_smi folder. Move amd_smi to
top level include/ and src/ folder. Remove obsolete oam folder.
Change the CMakeLists.txt to update folder locations.

Change-Id: I52e6be739e49f3b0545865f25364787f5985e9c3
2022-10-20 09:23:51 -05:00
Divya Shikre b23cfc0e82 Fix mem leaks observed while running rsmitst
1. Memory allocated for handle was not deleted
when no variant, subvariant or supported function
was found
2. handle->func_id_iter address was set to 0
before delete[]

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Iab50fdfbe03eec8e6fd0e84e03bd2c47e645b3d8
2022-05-18 14:31:44 -04:00
Divya Shikre afe996c2ed Update get_frequencies to handle failures.
Show an optional debug log (RSMI_DEBUG_BITFIELD=2) to
the user in the following scenarios:
1. If more than one current frequency is found
2. If frequencies are not read in increasing order of
   their value
If the current frequency is not available, its index is
set to -1 and no value will have * next to it in the
output. This will also be handled in rocm_smi.py.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I477ec065f7513c8045d6392f12ef6cb835a6b8f6
2022-05-11 15:33:15 -04:00
Divya Shikre 99be3451d7 Add DEBUG_LOG macro
Add DEBUG_LOG that will optionally print error
message when RSMI_DEBUG_BITFIELD is set to 2.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I6017e92d8a9e5f9861ae29ece0488d4bc198f996
2022-05-11 11:03:24 -04:00
Divya Shikre c9b42bff57 Add RSMI_CLK_TYPE_PCIE to rsmi_clk_type_t
showclocks/showclkfrq does not display pp_dpm_pcie values
in sriov. This fix adds pcie clocks to rsmi_clk_type_t
where rest of the clocks are present.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I6d129ae412623b369c14456ae9781b2dbceb2139
2022-05-06 09:15:39 -04:00
Ori Messinger 9d6403bb17 ROCm SMI LIB: Add Missing GPU Blocks
This patch adds the following 4 missing GPU blocks to the SMI LIB:
-RSMI_GPU_BLOCK_MMHUB
-RSMI_GPU_BLOCK_PCIE_BIF
-RSMI_GPU_BLOCK_HDP
-RSMI_GPU_BLOCK_XGMI_WAFL

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: Ia1ec6f53e195f4bf7b8f073d6bed4fdb6572e546
2022-05-05 00:44:16 -04:00
Bill(Shuzhou) Liu 7860de5107 Suppress "rsmi_init() failed" error message
When an application calls the library on a system without amdgpu,
it may always print out "rsmi_init() failed". Suppress the error
message in the library.

Change-Id: Ice63dd3a764b221a6935536bff1bfa6aa3e51a46
2022-04-12 09:44:00 -04:00
Sreekant Somasekharan dbe3403bd3 make string variable 'tpath' an empty string.
string variable not being empty can lead to incorrect compilation
and corrupted output.

Change-Id: Ie66756c28aef7417759c29387500970a8b53e44c
2022-03-11 21:22:28 -05:00
Bill(Shuzhou) Liu 4b65b0307f Prevent stack buffer overflow
readlink() does not append a null byte to buffer. Initialize the
tpath to prevent stack buffer overflow.

Change-Id: I17895dc3576b080a0c35bd0528a5b83223ec1c1b
2022-03-03 15:43:53 -05:00
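The readlink() pitfall named in commit 4b65b0307f can be illustrated with a short sketch. `read_link_target` is a hypothetical helper, not code from the repository: it zero-initializes the buffer (the approach the commit message describes) and also terminates the string explicitly at the returned length.

```c
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: read the target of the symlink at `path` into `buf`.
 * readlink() returns the number of bytes written and does NOT append a
 * terminating null byte, so one byte of the buffer is reserved for it.
 * Returns 0 on success, -1 on failure (buf is then an empty string).
 */
static int read_link_target(const char *path, char *buf, size_t buf_size)
{
    if (path == NULL || buf == NULL || buf_size == 0) {
        return -1;
    }

    /* Zero the buffer first so it is a valid C string on every path. */
    memset(buf, 0, buf_size);

    ssize_t len = readlink(path, buf, buf_size - 1);
    if (len < 0) {
        return -1;
    }

    /* Explicit termination; redundant after memset, but makes intent clear. */
    buf[len] = '\0';
    return 0;
}
```

Passing `buf_size - 1` to readlink() means a target longer than the buffer is silently truncated; a caller that needs to detect truncation can compare the returned length against `buf_size - 1`.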
Laurent Morichetti 2804bf7c28 Don't use NDEBUG when the intent is !DEBUG
CMakeLists.txt does not set up the DEBUG macro correctly to mean
!NDEBUG, so, as a workaround, replace all uses of ifdef NDEBUG with
ifndef DEBUG in the library sources.

Change-Id: I408adb36d1a2310fb894a486574469662ebb27cd
(cherry picked from commit 9f87197d8d)
2022-01-27 11:08:48 -05:00
Divya Shikre ec71380e1c Add fix to check for vector size while reading pp_dpm_pcie
pop_back() was causing a seg fault when pp_dpm_pcie file is empty and returns whitespace.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I888f1f79751cd456e43751a5b96d08560a039677
2022-01-26 10:34:57 -05:00
Divya Shikre 432df20321 Add null ptr check for temperature read from all sensors.
The (temperature == nullptr) check happens only when HBM temperature is retrieved.
This check needs to apply in other cases as well, hence moving this outside the HBM condition.
This should return RSMI_STATUS_INVALID_ARGS consistently in all cases when nullptr is passed through rsmitst.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Iea3cec75312a0a669c7da27e15e9782e6a885c5f
2021-12-01 14:05:46 -05:00
Divya Shikre 7b1daaef96 Add fix to display correct GPU Memory Activity and GFX Activity value.
Driver mem fills in 0xFF for all the metrics not supported for that ASIC.
So if 0xFF is detected, return RSMI_STATUS_NOT_SUPPORTED

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I86a38148c7a288ea0db94893f685560eaac098ab
2021-11-25 14:28:06 -05:00
Divya Shikre f61cb1b41d Add fix for out of range temperature value for HBM.
Driver mem fills in 0xFF for all the metrics not supported for that ASIC.
So if 0xFF is detected, return RSMI_STATUS_NOT_SUPPORTED

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Iacb6474486e3732f2aa824ff447c17f8243b65cd
2021-11-23 15:37:41 -05:00
Elena Sakhnovitch 50ea68e694 [ROCm SMI LIB]: Add rsmi_minmax_bandwidth_get()
API provides min/max bandwidth values between nodes.
(Current implementation only supports directly (1 hop)
connected XGMI devices.)

Signed-off-by: Elena Sakhnovitch
Change-Id: Ifc95da13845fbe7903c5386d320183ffd58c5b53
2021-10-28 17:00:41 -04:00
Ori Messinger ff02042c64 ROCm SMI LIB: Add rsmi_is_P2P_accessible() API
Implements rsmi_is_p2p_accessible API.
The function returns True if P2P is possible between two nodes.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: Ic7316eebcec4480175c7ad04c21a42b2e1a4c454
2021-10-13 22:01:33 -04:00