Commit Graph

150 Commits

Author SHA1 Message Date
Suma Hegde c4aa7d2c03 Change AMDSmiDevice to AMDSmiProcessor
grep -rli 'AMDSmiDevice' * | xargs -i@ sed -i 's/AMDSmiDevice/AMDSmiProcessor/g' @

Change-Id: Ib71e11d7122699cc62df3c4e9711ce3fc51e6fdf
2023-05-11 10:08:40 -04:00
Marko Oblak d1325fcf40 SWDEV-379772 - [Navi32] [SMI-LIB] [Linux] [BM] [Guest] Wrong market name
Signed-off-by: Marko Oblak <Marko.Oblak@amd.com>
Change-Id: I12d3e650851a3aa474ccbf62628b60d4c385e68c
2023-03-06 17:08:33 +01:00
Marko Oblak 8429df989c SWDEV-371210 - [AMDSMI][LinuxBM] SMILIB returns wrong pcie speed value
Signed-off-by: Marko Oblak <Marko.Oblak@amd.com>
Change-Id: Ie3ca6997f11d18505df799fef9cd9d53716d53f9
2023-02-28 11:49:20 +01:00
Marko Oblak 7eea4e596b SWDEV-384678 - Resolve issue with amdsmi build failure
Signed-off-by: Marko Oblak <Marko.Oblak@amd.com>
Change-Id: I073113814d2f6740c9eaea1b298d8aff9ea58c72
2023-02-22 11:00:57 +01:00
Marko Oblak 0aadf7eab2 SWDEV-373291 - Added implementation of versioning solution
Signed-off-by: Marko Oblak <Marko.Oblak@amd.com>
Change-Id: Ifd0be3f81902466339b6c098ce16d5e49740056c
2023-02-21 17:37:54 +01:00
Dalibor Stanisavljevic 411ef54087 SWDEV-375113 - Fixed process info
The format of the fdinfo file has changed

Change-Id: Iad2e26487e75f3e614e364456e929aa1f6f949a4
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2023-01-23 08:13:55 -05:00
Jason Albert 86de0f441f Remove tag values from enum/union/struct declarations
The tag values largely were not used and were causing doxygen
generation issues.
In the few cases where the tags were being referenced, clean up
those compile issues.

Signed-off-by: Jason Albert <jason.albert@amd.com>
Change-Id: I7b32eac742fb5af560400c13dda2721705d882bc
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2023-01-16 13:14:45 +01:00
Dalibor Stanisavljevic 943c42f58f SWDEV-374716 - Fixed asic info
Change-Id: I8d806ef09eca4300fcec0ce6a226d13547dfb728
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2023-01-11 11:03:17 -05:00
Bill(Shuzhou) Liu ec48312c61 Remove duplicate temperature function
amdsmi_dev_get_temp_metric() covers both functions:
amdsmi_get_temperature_measure() using AMDSMI_TEMP_CURRENT
and
amdsmi_get_temperature_limit() using AMDSMI_TEMP_CRITICAL
Remove those two functions.

It also merges amdsmi_get_power_limit() into
amdsmi_get_power_measure().

Change-Id: I40d4afeb2ec0ac7b64832729f36adfaae120c990
2023-01-11 08:13:37 -06:00
Bill(Shuzhou) Liu 79bd9c1d5f change sensor_type in amdsmi_dev_get_temp_metric() to enum
The sensor_type in amdsmi_dev_get_temp_metric() will be changed to
amdsmi_temperature_type_t

Change-Id: I72a7f271b0a55a025acc2ca523062a3d51cc036d
2023-01-04 13:01:04 -06:00
Dalibor Stanisavljevic cb013d25ff SWDEV-370502 - Reserved fields in structs
Change-Id: I23aed12baf6b3173eb149eb3b969e55d7e4360ee
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2022-12-26 10:51:44 -05:00
Dalibor Stanisavljevic 4c56e9e3d6 SWDEV-371199 - Return NOT_INIT when amdsmi initialization fails
Change-Id: Ifb40aef3a62885b08164e9aa944bf9b5c375ebfd
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2022-12-19 16:29:29 +01:00
Bill(Shuzhou) Liu 221d6fdc5c Make amdsmi function name consistent
Some of the amdsmi functions have the verb (set or get) at the
end of the function name. Move it to the middle to be consistent
with other APIs.

Change-Id: I8053d16f46af951c25aaaf8febf2896a33633fa1
2022-12-16 10:20:49 -06:00
Dalibor Stanisavljevic b4b761d02f SWDEV-370223 - Change the name of the header to amdsmi.h
Change dev to device_handle throughout the file
Change the pcie_info pcie_speed field type to uint32_t
Add AMDSMI prefix before amdsmi_mm_ip enum

Change-Id: I242145389ddc3f2ad05dfd6ca371640f4d118fc4
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2022-12-08 13:34:34 +01:00
Jason Albert b4cde9adec Doxygen related cleanup
- Made all doxygen formatting consistent with @ use
- Added @file definition to fix a lot of missed references
- Simplified return definitions for easier maintainability
- Fixed bad formatting and missing section closures

Signed-off-by: Jason Albert <jason.albert@amd.com>
Change-Id: I02cc55f7d0ae277f318a4620978af096f56cac6c
2022-12-07 10:41:33 -05:00
Jason Albert 3b1584915b Set status codes to fixed values
Assign fixed values to status codes to prevent enum auto assign
from changing them.

Signed-off-by: Jason Albert <jason.albert@amd.com>
Change-Id: I0ca1de7ba503ce8a75c56026f5a54e212204595b
2022-12-07 10:39:26 -05:00
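The motivation behind commit 3b1584915b can be illustrated with a minimal sketch. It uses Python's `IntEnum` as a stand-in for a C enum, and every member name below is hypothetical rather than taken from the library's header. With automatic numbering, inserting a new code in the middle renumbers the codes after it; with fixed values, existing codes keep their numbers.

```python
from enum import IntEnum, auto


class AutoStatusV1(IntEnum):
    # Hypothetical status codes numbered automatically.
    SUCCESS = 0
    INVAL = auto()
    NOT_INIT = auto()


class AutoStatusV2(IntEnum):
    # Same codes after a new member is inserted in the middle.
    SUCCESS = 0
    INVAL = auto()
    NOT_SUPPORTED = auto()  # newly inserted member
    NOT_INIT = auto()       # silently renumbered


class FixedStatusV2(IntEnum):
    # Explicit values: the new member takes an unused number and
    # the existing members keep theirs.
    SUCCESS = 0
    INVAL = 1
    NOT_SUPPORTED = 3
    NOT_INIT = 2


auto_changed = AutoStatusV1.NOT_INIT.value != AutoStatusV2.NOT_INIT.value
fixed_stable = AutoStatusV1.NOT_INIT.value == FixedStatusV2.NOT_INIT.value
```

A caller compiled against the first version would misread `NOT_INIT` from the auto-numbered second version, which is the kind of drift fixed values are meant to prevent.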
Dalibor Stanisavljevic 76f6cf7a9d SWDEV-366720 - Changed amdsmi_get_device_handle_from_bdf
Changed implementation and input parameters

Change-Id: Ifca3247132eb4033f99d74617a53f54ad076dad0
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2022-11-22 10:28:45 -05:00
Dalibor Stanisavljevic 9cad9e5216 SWDEV-361376 - Add README for python tool
- Add up to date README file for python tool

Change-Id: I7a02f79469e990870398b3741b033ea447998fdd
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2022-11-10 16:57:49 +01:00
Bill(Shuzhou) Liu b34b7451e8 Init the amdsmi using rocm_smi for libdrm
Init amdsmi using rocm-smi, which makes GPU discovery
consistent with or without libdrm.

Change-Id: Ic714781f8ce791451b0c057621525926edb7f5ee
2022-11-07 11:09:09 -06:00
Galantsev, Dmitrii c99e4e1501 Cleanup CMakeLists.txt for packaging
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2022-11-03 12:44:23 -05:00
Dejan Andjelkovic 6064f160a3 SWDEV-361376 - Add python wrapper
- Add generator for python wrapper
- Add interface, exception and init files
- Add CMake custom targets

Change-Id: I63c1d94fbb587387c22f559a3db79987eb214a2e
Signed-off-by: Dejan Andjelkovic <Dejan.Andjelkovic@amd.com>
2022-10-20 09:24:53 -05:00
Bill(Shuzhou) Liu 2b2d11c446 Change the get_socket_handles and get_device_handles APIs interface
Those two APIs are changed so that the user first gets the handle count,
allocates memory, and then has the handles written into the allocated memory.

Change-Id: Ibe28a89ad188c99da6af3af1740b2b25ff22ba06
2022-10-20 09:24:31 -05:00
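The count-then-fill calling pattern described in commit 2b2d11c446 might look roughly like the sketch below. `get_handles` is a hypothetical stand-in written for this illustration, not the library's actual signature, and the handle values are made up. The caller asks for the count, allocates a buffer of that size, then calls again to have the buffer filled.

```python
import ctypes

# Made-up handle values standing in for what a real library would return.
_FAKE_HANDLES = [0x1000, 0x2000, 0x3000]


def get_handles(count, handles):
    """Hypothetical two-call API: with handles=None only the count is
    reported; otherwise up to count.value handles are written."""
    if handles is None:
        count.value = len(_FAKE_HANDLES)
        return 0
    n = min(count.value, len(_FAKE_HANDLES))
    for i in range(n):
        handles[i] = _FAKE_HANDLES[i]
    count.value = n
    return 0


# First call: ask how many handles exist.
count = ctypes.c_uint32(0)
status = get_handles(count, None)

# Caller allocates the memory, then the second call fills it.
buffer = (ctypes.c_void_p * count.value)()
status = get_handles(count, buffer)
handles = [buffer[i] for i in range(count.value)]
```

The design leaves memory ownership with the caller, so the library never has to allocate or free on the caller's behalf.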
Dalibor Stanisavljevic 3daf9c1063 SWDEV-353742 - Port smilib function to amdsmi
Change-Id: I99df249755a5c665a8dd1777fa82d046e139bd77
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2022-10-20 09:24:22 -05:00
Bill(Shuzhou) Liu 0c91ef919d Restructure the folder
Move rocm_smi related functions to the rocm_smi folder. Move amd_smi to
the top level include/ and src/ folders. Remove the obsolete oam folder.
Change CMakeLists.txt to update folder locations.

Change-Id: I52e6be739e49f3b0545865f25364787f5985e9c3
2022-10-20 09:23:51 -05:00
Bill(Shuzhou) Liu f1d02aca79 Port rocm-smi function to amd-smi
Port most rocm-smi functions to amd-smi and add unit tests.

Change-Id: I6387a4bdaf20ead2389c99bb01d438156ccd0747
2022-09-06 12:08:59 -04:00
Divya Shikre afe996c2ed Update get_frequencies to handle failures.
Show an optional debug log (RSMI_DEBUG_BITFIELD=2) to
the user in the following scenarios:
1. If more than one current frequency is found
2. If frequencies are not read in increasing order of
   their value
If the current frequency is not available, its index is
set to -1 and no value has a * next to it in the
output. This will also be handled in rocm_smi.py.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I477ec065f7513c8045d6392f12ef6cb835a6b8f6
2022-05-11 15:33:15 -04:00
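The checks listed in commit afe996c2ed can be sketched as follows. This is an illustration under assumptions, not the library's code: the input line format (`index: valueMhz`, with `*` marking the current level) and the function name are assumed, and keeping the last marked entry when several are marked is a choice made only for this sketch.

```python
import re

# Assumed line format: "<index>: <value>Mhz", with "*" marking the current level.
_LINE = re.compile(r"\s*(\d+):\s*(\d+)\s*[Mm][Hh]z\s*(\*)?")


def parse_frequencies(lines):
    """Return (frequencies_mhz, current_index, notes).

    current_index is -1 when no line is marked as current.
    """
    freqs = []
    current = -1
    notes = []
    for line in lines:
        match = _LINE.match(line)
        if match is None:
            continue  # skip lines that do not look like a frequency entry
        mhz = int(match.group(2))
        if freqs and mhz < freqs[-1]:
            notes.append("frequencies not in increasing order")
        if match.group(3):
            if current != -1:
                notes.append("more than one current frequency")
            current = len(freqs)
        freqs.append(mhz)
    return freqs, current, notes


freqs, current, notes = parse_frequencies(
    ["0: 500Mhz", "1: 800Mhz *", "2: 1000Mhz"]
)
_, missing_current, _ = parse_frequencies(["0: 500Mhz", "1: 800Mhz"])
_, _, bad_notes = parse_frequencies(["0: 800Mhz *", "1: 500Mhz *"])
```

In the sketch the notes are collected in a list; the commit instead describes emitting them as an optional debug log when RSMI_DEBUG_BITFIELD=2.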
Divya Shikre 99be3451d7 Add DEBUG_LOG macro
Add DEBUG_LOG that will optionally print an error
message when RSMI_DEBUG_BITFIELD is set to 2.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I6017e92d8a9e5f9861ae29ece0488d4bc198f996
2022-05-11 11:03:24 -04:00
Divya Shikre c9b42bff57 Add RSMI_CLK_TYPE_PCIE to rsmi_clk_type_t
showclocks/showclkfrq does not display pp_dpm_pcie values
in sriov. This fix adds pcie clocks to rsmi_clk_type_t,
where the rest of the clocks are defined.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I6d129ae412623b369c14456ae9781b2dbceb2139
2022-05-06 09:15:39 -04:00
Ori Messinger 9d6403bb17 ROCm SMI LIB: Add Missing GPU Blocks
This patch adds the following 4 missing GPU blocks to the SMI LIB:
-RSMI_GPU_BLOCK_MMHUB
-RSMI_GPU_BLOCK_PCIE_BIF
-RSMI_GPU_BLOCK_HDP
-RSMI_GPU_BLOCK_XGMI_WAFL

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: Ia1ec6f53e195f4bf7b8f073d6bed4fdb6572e546
2022-05-05 00:44:16 -04:00
Harish Kasiviswanathan 8de6ed2b8d rocm_smi_lib: add stdbool.h needed for C90
'bool' keyword is supported only from C99 onwards. Include stdbool.h
for older compilers

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I09fd5cf6eac20e7185e85a1123bc4826958b2b7c
2021-12-14 15:25:59 -05:00
Elena Sakhnovitch 50ea68e694 [ROCm SMI LIB]: Add rsmi_minmax_bandwidth_get()
The API provides min/max bandwidth values between nodes.
The current implementation only supports directly connected
(1 hop) XGMI devices.

Signed-off-by: Elena Sakhnovitch
Change-Id: Ifc95da13845fbe7903c5386d320183ffd58c5b53
2021-10-28 17:00:41 -04:00
Ori Messinger ff02042c64 ROCm SMI LIB: Add rsmi_is_P2P_accessible() API
Implements rsmi_is_p2p_accessible API.
The function returns True if P2P is possible between two nodes.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: Ic7316eebcec4480175c7ad04c21a42b2e1a4c454
2021-10-13 22:01:33 -04:00
Elena Sakhnovitch 5e1bfcadd7 rocm_smi_lib: fix gpu_metrics_v1_3 support
Signed-off-by: Elena Sakhnovitch
Change-Id: Ia7a6b17eb0f317465613ba92ae7548a221c46ee3
2021-08-13 11:59:50 -04:00
Elena Sakhnovitch fee82af1fe rocm_smi_lib: add gpu_metrics_v1_3 support
Signed-off-by: Elena Sakhnovitch
Change-Id: I4a9dedc80b8fce60e12c5baf8651d54d16a6a41c
2021-08-13 09:23:35 -04:00
Harish Kasiviswanathan 14201290a2 Add timestamp resolution info in comments
Specify that timestamp resolution is in ns in header file.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I4db00a07c0b5c43ae23c98213f2fbbcf93110234
2021-05-05 12:32:58 -04:00
Harish Kasiviswanathan 6b10a7761b Add support to read gpu_metrics version 1.2
gpu_metrics version 1.2 provides atomic timestamp. Use this timestamp.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I7a1a675f53b93718f34b1f2979173e9064e0ef93
2021-05-05 12:31:10 -04:00
Harish Kasiviswanathan e83cf605c6 Change #define RSMI_GPU_METRICS_API_CONTENT_VER
Change to RSMI_GPU_METRICS_API_CONTENT_VER_1 in preparation for
supporting additional formats.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I4367a2622a0fa41e6b05bc4436ecd24b8c4e30e2
2021-05-04 20:51:10 -04:00
Ori Messinger 83cd2fe4f1 ROCm SMI LIB: Add Default GPU Power Cap
Implement default GPU power cap functionality in the LIB.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: Ia6b3420beb0e4df5559c3e6d11d0667972590b53
2021-04-22 10:49:55 -04:00
Harish Kasiviswanathan 844acbc0d8 Add energy counter resolution to rsmi_dev_energy_count_get
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I03b70968257db7a45e21d7ba62542cdedd18eb85
2021-04-22 10:25:06 -04:00
Divya Shikre 9f9a7aaf65 Add new setrange function in C++ lib
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I670aaeb93827bf4b2cc08eb36d0f9756f00e4e4e
2021-04-19 22:38:59 -04:00
Elena a383dd23aa [rocm-smi-lib] add HBM temperature conversion factor
Change-Id: I45339c87c3d2a40670baf1b76ada60dceb650dc0
2021-04-19 16:41:48 -04:00
Bill(Shuzhou) Liu 8eec0a7d36 Add energy accumulator counter
The energy accumulator counter tracks all energy consumed.

Change-Id: I5b25f817b7802d81c477361447f0ecd7ec02fc61
2021-04-14 10:43:01 -04:00
Bill(Shuzhou) Liu 9bfb9ac297 Add coarse grain utilization counter
The coarse grain utilization counter includes GFX and Memory activity.

Change-Id: I5d09976792d3f4a1c1081651fa24ff857016d4c0
2021-04-14 10:40:19 -04:00
Divya Shikre aaf2120117 Update performance determinism api as per the modified sysfs interface.
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Ib0ec5128819644a2ff6c916da9194a7fe1dad795
2021-04-07 16:38:48 -04:00
Bill(Shuzhou) Liu da480b4589 Add support for the HBM temperature
rsmi_dev_temp_metric_get() also supports HBM
temperatures, which are retrieved from gpu_metrics.

Change-Id: I96b979296e90cf881523627b41b1a02849676416
2021-04-05 15:55:55 -04:00
Chris Freehill 5e2a4f3a15 Handle different gpu_metrics content versions for format v1
Change-Id: I344d1815da683befc8f8b5caf921803b267ae29f
2021-03-24 14:34:55 -05:00
Chris Freehill ce475b009c Adjust event counters to report only new events
Previously, RSMI assumed that the event counter values returned
from perf were only new events. But in fact, when we read the
counter values, they are running totals. To account for this, we
now record the value we read and take the difference between the
current value and the previously recorded value.

Change-Id: I1e04b514e89c7c4d4719889f2dae3a1283864e7f
2021-02-24 11:02:17 -06:00
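The adjustment described in commit ce475b009c amounts to remembering the last running total and reporting the difference on each read. A minimal sketch, with a hypothetical class name and made-up sample values:

```python
class DeltaCounter:
    """Turns a running-total counter into per-read deltas (hypothetical helper)."""

    def __init__(self):
        self._last_total = 0

    def read(self, running_total):
        # Report only the events that occurred since the previous read.
        delta = running_total - self._last_total
        self._last_total = running_total
        return delta


counter = DeltaCounter()
# Made-up running totals as they might come back from successive reads.
deltas = [counter.read(total) for total in (10, 25, 25, 40)]
```

A read with no new events yields 0 rather than repeating the accumulated total.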
Chris Freehill ff9546aa62 Don't use hwmon# as indicator of gpu
Previously, during the rsmi_init discovery process, the existence
of an hwmon# directory was used to distinguish between GPU nodes
and non-GPU nodes. This isn't reliable in some scenarios. Instead,
the existence of the vbios_version file is used as an
indicator that the node is indeed a GPU.

Change-Id: Icfbe5c42ed0970077b05f25c3d209308a31bec85
2021-01-29 13:05:10 -05:00
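The discovery check from commit ff9546aa62 could be sketched as a file-existence test. The directory layout, the function name, and the file contents below are assumptions for illustration; only the use of a vbios_version file as the indicator comes from the commit message.

```python
import tempfile
from pathlib import Path


def looks_like_gpu(device_dir):
    """Treat a node as a GPU only if it exposes a vbios_version file."""
    return (Path(device_dir) / "vbios_version").is_file()


# Build a throwaway directory tree to exercise the check.
with tempfile.TemporaryDirectory() as root:
    gpu_node = Path(root) / "card0"
    other_node = Path(root) / "card1"
    gpu_node.mkdir()
    other_node.mkdir()
    (gpu_node / "vbios_version").write_text("hypothetical-version\n")
    (other_node / "hwmon0").mkdir()  # has an hwmon directory but no vbios_version

    gpu_result = looks_like_gpu(gpu_node)
    other_result = looks_like_gpu(other_node)
```

The second node shows the case the commit targets: an hwmon directory alone no longer counts as evidence of a GPU.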
Ori Messinger 80f629b9be ROCm SMI Python CLI & LIB: Add GPU Reset Functionality
The purpose of this patch is to implement GPU reset functionality
in the LIB, and to call it from the rocm_smi python CLI.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: Iaf525f7016f8354a7fd93af0209ca2e97ef4fd56
2021-01-26 17:52:24 -05:00
Chris Freehill 68095b50e7 Introduce RSMI_DEBUG_INFINITE_LOOP
The environment variable RSMI_DEBUG_INFINITE_LOOP is introduced
to facilitate debugging RSMI in user applications. When this
env. variable is non-zero, an infinite loop will be entered in
rsmi_init(). At this point, a debugger can be attached and RSMI
can be debugged. This only applies to debug builds.

Change-Id: I23f6dd730fc965764295070de053314a1cc5b6aa
2021-01-06 10:30:24 -05:00