Commit Graph

420 Commits

Author SHA1 Message Date
Deepak Mewar ac1b857f24 Renamed API amdsmi_dev_get_drm_render_minor to
amdsmi_get_gpu_drm_render_minor

grep -rli 'amdsmi_dev_get_drm_render_minor' * | xargs -i@ sed -i
's/amdsmi_dev_get_drm_render_minor/amdsmi_get_gpu_drm_render_minor/g' @

Change-Id: Icf6a1ba28ee3ff7f1fa66bad5d600725aad0bfca
2023-05-11 10:34:01 -04:00
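The grep/sed pipeline used in this and the neighbouring rename commits substitutes the old name wherever it appears in a matching file. As a rough illustration, the same substitution on a single string might look like the sketch below. The helper name `rename_identifier` is hypothetical, and the word-boundary match is an added assumption: the sed expression in the commit matches plain substrings.

```python
import re


def rename_identifier(text: str, old: str, new: str) -> str:
    """Replace whole-identifier occurrences of `old` with `new`.

    Word-boundary anchors keep longer names that merely contain `old`
    untouched, which a plain substring replace would not do.
    """
    pattern = re.compile(r"\b" + re.escape(old) + r"\b")
    # A callable replacement avoids any escape processing of `new`.
    return pattern.sub(lambda match: new, text)


source = "ret = amdsmi_dev_get_drm_render_minor(handle, &minor);"
print(rename_identifier(source,
                        "amdsmi_dev_get_drm_render_minor",
                        "amdsmi_get_gpu_drm_render_minor"))
```

Because `_` counts as a word character, a name such as `amdsmi_dev_get_id_ext` (made up for this example) would be left alone when renaming `amdsmi_dev_get_id`.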
Deepak Mewar 82dbd07b2c Renamed API amdsmi_dev_get_subsystem_name to
amdsmi_get_gpu_subsystem_name

grep -rli 'amdsmi_dev_get_subsystem_name' * | xargs -i@ sed -i
's/amdsmi_dev_get_subsystem_name/amdsmi_get_gpu_subsystem_name/g' @

Change-Id: Ib6f2b03f824e1ee910cfbbd0bab09ad859ec252b
2023-05-11 10:33:20 -04:00
Deepak Mewar 14f1367d6b Renamed API amdsmi_dev_get_subsystem_id to
amdsmi_get_gpu_subsystem_id

grep -rli 'amdsmi_dev_get_subsystem_id' * | xargs -i@ sed -i
's/amdsmi_dev_get_subsystem_id/amdsmi_get_gpu_subsystem_id/g' @

Change-Id: I64616ac4c001f7761b8d83120d05d21c5c8e763f
2023-05-11 10:32:32 -04:00
Deepak Mewar f69a6ea64e Renamed API amdsmi_dev_get_vram_vendor to
amdsmi_get_gpu_vram_vendor

grep -rli 'amdsmi_dev_get_vram_vendor' * | xargs -i@ sed -i
's/amdsmi_dev_get_vram_vendor/amdsmi_get_gpu_vram_vendor/g' @

Change-Id: I3c11643a778f147027d0d3121b9782931439c752
2023-05-11 10:32:00 -04:00
Deepak Mewar 21da55b9df Renamed API amdsmi_dev_get_vendor_name to
amdsmi_get_gpu_vendor_name

grep -rli 'amdsmi_dev_get_vendor_name' * | xargs -i@ sed -i
's/amdsmi_dev_get_vendor_name/amdsmi_get_gpu_vendor_name/g' @

Change-Id: Ib31c1387150d0dd268d1bd54cfb43786c7ec41c1
2023-05-11 10:31:41 -04:00
Deepak Mewar 20222f771e Renamed API amdsmi_dev_get_id to amdsmi_get_gpu_id
grep -rli 'amdsmi_dev_get_id' * | xargs -i@ sed -i
's/amdsmi_dev_get_id/amdsmi_get_gpu_id/g' @

Change-Id: I78faeff9a94250454bcecfaa50b5c7cc7e04cb98
2023-05-11 10:30:53 -04:00
Deepak Mewar 5ed24711c7 Renamed API amdsmi_get_device_uuid to
amdsmi_get_gpu_device_uuid

grep -rli 'amdsmi_get_device_uuid' * | xargs -i@ sed -i
's/amdsmi_get_device_uuid/amdsmi_get_gpu_device_uuid/g' @

Change-Id: I40bf740235a1a98d7d12964378b0b45208987c9e
2023-05-11 10:30:32 -04:00
Deepak Mewar 3fec3b4b4a Renamed API amdsmi_get_device_bdf to amdsmi_get_gpu_device_bdf
grep -rli 'amdsmi_get_device_bdf' * | xargs -i@ sed -i
's/amdsmi_get_device_bdf/amdsmi_get_gpu_device_bdf/g' @

Change-Id: I3db605f8bdb0a83b1f0f7f300a663c47563ba651
2023-05-11 10:30:05 -04:00
Suma Hegde 6e6176b04a Change variable name from device to processor
device_count to processor_count
devices to processor
device to processor
also handle_to_device is renamed to handle_to_processor

Change-Id: Ie9c7e01bc1b83058eeaae80934526d06468f4f5c
2023-05-11 10:29:35 -04:00
Suma Hegde ae39d9707d Change device to processor in API names and variable names
Change-Id: Ide71fabeaa837f2035dc9726162cc537e40b4a57
2023-05-11 10:29:01 -04:00
Suma Hegde dd00a16124 Change device_type to processor_type
also rename amdsmi_get_device_type to amdsmi_get_processor_type

grep -rli 'device_type' * | xargs -i@ sed -i
's/device_type/processor_type/g' @

Change-Id: Ic6a73c1a170757d5ab5d10ad20b4fc2f0b280e78
2023-05-11 10:28:31 -04:00
Suma Hegde 3f9e4d95d4 Change device_handle to processor_handle
grep -rli 'device_handle' * | xargs -i@ sed -i
's/device_handle/processor_handle/g' @

Change-Id: Ifc8b7fa3b5488ce1fa8d8cf9eb3981a09450de11
2023-05-11 10:11:24 -04:00
Suma Hegde 3963036a05 Change amdsmi_device_handle to amdsmi_processor_handle
grep -rli 'amdsmi_device_handle' * | xargs -i@ sed -i
's/amdsmi_device_handle/amdsmi_processor_handle/g' @

Change-Id: Ie25c51933dcc31e5b34c8070d0d5ba0e8cd05cc1
2023-05-11 10:09:11 -04:00
Suma Hegde c4aa7d2c03 Change AMDSmiDevice to AMDSmiProcessor
grep -rli 'AMDSmiDevice' * | xargs -i@ sed -i 's/AMDSmiDevice/AMDSmiProcessor/g' @

Change-Id: Ib71e11d7122699cc62df3c4e9711ce3fc51e6fdf
2023-05-11 10:08:40 -04:00
Charis Poag f44d1ea8bc [SWDEV-387906] Fix rocm-smi initialize crash
Fix was needed due to hwmon updates.
Several voltage sensors (ex. vddgfx/vddnb)
are unsupported or not applicable
to upcoming hardware. This was not the case
for previous hardware sensors, resulting in
the rocm-smi crash observed.

Change-Id: Ib8593e10811638def26fc7a1eda29309e328db09
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2023-03-17 15:04:34 -05:00
Marko Oblak d1325fcf40 SWDEV-379772 - [Navi32] [SMI-LIB] [Linux] [BM] [Guest] Wrong market name
Signed-off-by: Marko Oblak <Marko.Oblak@amd.com>
Change-Id: I12d3e650851a3aa474ccbf62628b60d4c385e68c
2023-03-06 17:08:33 +01:00
Marko Oblak 8429df989c SWDEV-371210 - [AMDSMI][LinuxBM] SMILIB returns wrong pcie speed value
Signed-off-by: Marko Oblak <Marko.Oblak@amd.com>
Change-Id: Ie3ca6997f11d18505df799fef9cd9d53716d53f9
2023-02-28 11:49:20 +01:00
Charis Poag 0d3558945b [SWDEV-335697] Add RSMI_STATUS_SETTING_UNAVAILABLE for dynamic partition
Updates:
    * Added RSMI_STATUS_SETTING_UNAVAILABLE for
      rsmi_dev_compute_partition_set - gives users
      better error output when attempting to set
      compute partition to values not listed in
      available_compute_partition SYSFS
    * Updated python --setcomputepartition to
      provide better output when receiving
      RSMI_STATUS_SETTING_UNAVAILABLE
    * Updated all test & example files to check for
      RSMI_STATUS_SETTING_UNAVAILABLE when doing
      rsmi_dev_compute_partition_set

Change-Id: Ida5d54880d9b9b6e4a0468cdb962fdc0c18d6257
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2023-02-27 11:17:44 -06:00
Marko Oblak 7eea4e596b SWDEV-384678 - Resolve issue with amdsmi build failure
Signed-off-by: Marko Oblak <Marko.Oblak@amd.com>
Change-Id: I073113814d2f6740c9eaea1b298d8aff9ea58c72
2023-02-22 11:00:57 +01:00
Marko Oblak 0aadf7eab2 SWDEV-373291 - Added implementation of versioning solution
Signed-off-by: Marko Oblak <Marko.Oblak@amd.com>
Change-Id: Ifd0be3f81902466339b6c098ce16d5e49740056c
2023-02-21 17:37:54 +01:00
Charis Poag 77c950a4bf [SWDEV-381630] Add reset partition functionality
Updates:
    * Added rsmi_dev_compute_partition_reset & rsmi_dev_nps_mode_reset
    * Added --resetcomputepartition and --resetnpsmode python smi calls
    * Added temp data files rocmsmi_boot_compute_partition_<device num>
      & rocmsmi_boot_nps_mode_partition_<device num>; writes UNKNOWN
      if the data cannot be read or the device does not support it
    * Cleaned up NPS & compute API documentation
    * Added creation and reading of API temp files (used in reset
      functionality)
    * Cleaned up output of rocm_smi_example
    * Updated rocm_smi_example to check if running with sudo permission
      before executing write API calls (cleans up erroneous output)
    * Added template specialization for storing temp data, requires
      specific rsmi_type_t enums (restricts what data can be stored)
    * Added storage of temp data, if temp files do not exist
    * Updated google tests for NPS & compute to include reset API calls

Change-Id: I69895a466b97107617e6dbb355737b84499a76c9
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2023-02-17 12:55:08 -06:00
Charis Poag 9ef376cd61 SWDEV-342812- Add NPS support
Updates:
    * Added rsmi_dev_nps_mode_set and rsmi_dev_nps_mode_get
    * Added ability to set multiple SYSFS files in debug build
    * Added ability to see user's env variables set for debug build
    * Added tests for rsmi_dev_nps_mode_set and rsmi_dev_nps_mode_get
    * Added ability to restart AMD GPU driver, used in nps_mode_set
    * Updated ROCm_SMI_Manual.pdf to include new APIs
    * Added progress bar for long-running python_smi_tools, used
      in setting nps_mode if it runs longer than 0.1 seconds

Change-Id: I6d61bedd28d7cba6aff432ad2d127ba741b7d15a
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2023-02-14 11:54:24 -06:00
Dalibor Stanisavljevic 411ef54087 SWDEV-375113 - Fixed process info
The format of the fdinfo file has changed

Change-Id: Iad2e26487e75f3e614e364456e929aa1f6f949a4
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2023-01-23 08:13:55 -05:00
Jason Albert 86de0f441f Remove tag values from enum/union/struct declarations
The tag values largely were not used and were causing doxygen
generation issues.
In the few cases where the tags were being referenced, clean up
those compile issues.

Signed-off-by: Jason Albert <jason.albert@amd.com>
Change-Id: I7b32eac742fb5af560400c13dda2721705d882bc
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2023-01-16 13:14:45 +01:00
Charis Poag 4d7f3f2bc7 SWDEV-335697- Add support for dynamic partitioning
Original updates:
    * Added .gitignore to help with future commits
    * Updated/added copyrights on modified or added files
    * Updated rocm_smi.h/.cc
      - Added 3 new SMI API functions:
          rsmi_dev_compute_partition_set &
          rsmi_dev_compute_partition_get
      - Added helpful maps/enums used in
        new get/set compute_partition API calls
    * Updated rocm_smi.py
      - Added --showcomputepartition
      - Added --setcomputepartition
      - Fixed a few mistypes
    * Updated rsmiBindings.py - added helpful class/dict/list
    * Updated rocm_smi_example.cc
      - Added helpful MACRO to detect if api is not supported.
      - Added current_compute_partition set/get rocm lib calls
      - Added helpful macro to discover future RSMI errors
      - Commented out test_set_freq, was having permission issues
        on a Navi21
    * Updated rocm_smi_main.cc
      - Added helpful map to debug API calls, left in for future use
      - Added comment to better understand a non-class function returns
    * Added computepartition_read_write.cc/.h
      - Added get/set compute partition API test calls
      - Confirmed on devices that do not support the API calls, tests pass
    * Updated rocm_smi_test/main.cc
      - Calls new compute partition gtests

Added following updates from review feedback:
   * Updated rocm_smi.h/cc
       - Removed C++ API calls; supporting both C and C++
         API calls could cause confusion and adds extra work for us
       - rsmi_dev_compute_partition_get -> Fixed an edge case where
         the user gives a small buffer length (smaller than the data
         received), but does not receive the partial buffer back.
         Google Tests are updated to reflect this finding.
   * Updated rocm_smi_example.cc
       - Fixed test_set_freq, issue was that file was not writable.
         We now indicate this warning, so prior errors make sense.
       - General test code cleanup. Removed extra code,
         by creating loops for tests.
   * Updated rocm_smi_main.cc
     - Moved the map used for debugging RSMI enums and removed the
       external reference to it; it is now a const public reference.
   * Updated rocm_smi.py
     - Updated python code to identify NOT_SUPPORTED because
       (currently) only a few GPUs support the feature

Change-Id: I4a567acbb59d6771fb64df08d19175fe3604fd1b
2023-01-13 10:46:40 -05:00
Dalibor Stanisavljevic 943c42f58f SWDEV-374716 - Fixed asic info
Change-Id: I8d806ef09eca4300fcec0ce6a226d13547dfb728
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2023-01-11 11:03:17 -05:00
Bill(Shuzhou) Liu ec48312c61 Remove duplicate temperature function
amdsmi_dev_get_temp_metric() covers both functions:
amdsmi_get_temperature_measure() using AMDSMI_TEMP_CURRENT
and
amdsmi_get_temperature_limit() using AMDSMI_TEMP_CRITICAL.
Remove those two functions.

This commit also merges amdsmi_get_power_limit() into
amdsmi_get_power_measure().

Change-Id: I40d4afeb2ec0ac7b64832729f36adfaae120c990
2023-01-11 08:13:37 -06:00
Bill(Shuzhou) Liu 79bd9c1d5f change sensor_type in amdsmi_dev_get_temp_metric() to enum
The sensor_type in amdsmi_dev_get_temp_metric() will be changed to
amdsmi_temperature_type_t

Change-Id: I72a7f271b0a55a025acc2ca523062a3d51cc036d
2023-01-04 13:01:04 -06:00
Dalibor Stanisavljevic cb013d25ff SWDEV-370502 - Reserved fields in structs
Change-Id: I23aed12baf6b3173eb149eb3b969e55d7e4360ee
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2022-12-26 10:51:44 -05:00
Dalibor Stanisavljevic 4c56e9e3d6 SWDEV-371199 - Return NOT_INIT when amdsmi initialization fails
Change-Id: Ifb40aef3a62885b08164e9aa944bf9b5c375ebfd
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2022-12-19 16:29:29 +01:00
Bill(Shuzhou) Liu 221d6fdc5c Make amdsmi function name consistent
Some of the amdsmi functions have the verb (set or get) at the
end of the function name. Move it to the middle to be consistent with
other APIs.

Change-Id: I8053d16f46af951c25aaaf8febf2896a33633fa1
2022-12-16 10:20:49 -06:00
Dalibor Stanisavljevic b4b761d02f SWDEV-370223 - Change the name of the header to amdsmi.h
Change dev to device_handle throughout the file
Change the pcie_info pcie_speed field type to uint32_t
Add AMDSMI prefix before amdsmi_mm_ip enum

Change-Id: I242145389ddc3f2ad05dfd6ca371640f4d118fc4
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2022-12-08 13:34:34 +01:00
Jason Albert b4cde9adec Doxygen related cleanup
- Made all doxygen formatting consistent with @ use
- Added @file definition to fix a lot of missed references
- Simplified return definitions for easier maintainability
- Fixed bad formatting and missing section closures

Signed-off-by: Jason Albert <jason.albert@amd.com>
Change-Id: I02cc55f7d0ae277f318a4620978af096f56cac6c
2022-12-07 10:41:33 -05:00
Jason Albert 3b1584915b Set status codes to fixed values
Assign fixed values to status codes to prevent enum auto assign
from changing them.

Signed-off-by: Jason Albert <jason.albert@amd.com>
Change-Id: I0ca1de7ba503ce8a75c56026f5a54e212204595b
2022-12-07 10:39:26 -05:00
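The problem this commit guards against can be sketched in Python, where `enum.auto()` behaves much like automatic enumerator assignment in C. The member names below are hypothetical and are not the actual amdsmi status codes; the sketch only shows how inserting a member shifts auto-assigned values while explicit values stay put.

```python
from enum import IntEnum, auto


class AutoStatus(IntEnum):
    # Hypothetical status names; values follow declaration order.
    SUCCESS = 0
    INVALID_ARGS = auto()   # 1
    NOT_SUPPORTED = auto()  # 2


class AutoStatusAfterInsert(IntEnum):
    # The same enum after a new code is inserted in the middle.
    SUCCESS = 0
    INVALID_ARGS = auto()   # 1
    NO_PERMISSION = auto()  # 2 (new member)
    NOT_SUPPORTED = auto()  # 3, silently shifted from 2


class FixedStatus(IntEnum):
    # Explicit values stay stable wherever new members are added.
    SUCCESS = 0
    INVALID_ARGS = 1
    NO_PERMISSION = 3
    NOT_SUPPORTED = 2


print(int(AutoStatus.NOT_SUPPORTED),
      int(AutoStatusAfterInsert.NOT_SUPPORTED),
      int(FixedStatus.NOT_SUPPORTED))
```

A caller built against the old numbering would misread the shifted value, which is why pinning each code to a fixed number matters for a public API.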
Dalibor Stanisavljevic 76f6cf7a9d SWDEV-366720 - Changed amdsmi_get_device_handle_from_bdf
Changed implementation and input parameters

Change-Id: Ifca3247132eb4033f99d74617a53f54ad076dad0
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2022-11-22 10:28:45 -05:00
Dalibor Stanisavljevic 9cad9e5216 SWDEV-361376 - Add README for python tool
- Add up to date README file for python tool

Change-Id: I7a02f79469e990870398b3741b033ea447998fdd
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2022-11-10 16:57:49 +01:00
Bill(Shuzhou) Liu b34b7451e8 Init the amdsmi using rocm_smi for libdrm
Init amdsmi using rocm-smi, which makes the GPU discovery
consistent with or without libdrm.

Change-Id: Ic714781f8ce791451b0c057621525926edb7f5ee
2022-11-07 11:09:09 -06:00
Galantsev, Dmitrii c99e4e1501 Cleanup CMakeLists.txt for packaging
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2022-11-03 12:44:23 -05:00
Dejan Andjelkovic 6064f160a3 SWDEV-361376 - Add python wrapper
- Add generator for python wrapper
- Add interface, exception and init files
- Add CMake custom targets

Change-Id: I63c1d94fbb587387c22f559a3db79987eb214a2e
Signed-off-by: Dejan Andjelkovic <Dejan.Andjelkovic@amd.com>
2022-10-20 09:24:53 -05:00
Bill(Shuzhou) Liu 2b2d11c446 Change the get_socket_handles and get_device_handles APIs interface
Those two APIs are changed so that the user first gets the handle count,
allocates memory, and then receives the handles in the allocated memory.

Change-Id: Ibe28a89ad188c99da6af3af1740b2b25ff22ba06
2022-10-20 09:24:31 -05:00
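The count-then-fill calling convention described in this commit can be sketched as follows. Everything here is a hypothetical stand-in written for illustration: `get_handles`, `_DISCOVERED`, and the string handles are not the real amdsmi signatures or types.

```python
from typing import List, Optional

# Hypothetical stand-in for the library's internal device list.
_DISCOVERED = ["gpu0", "gpu1", "gpu2"]


def get_handles(buffer: Optional[List[Optional[str]]]) -> int:
    """Hypothetical query mimicking a count-then-fill interface.

    With buffer=None, only the number of available handles is returned.
    Otherwise the buffer is filled up to its length and the number of
    handles written is returned.
    """
    if buffer is None:
        return len(_DISCOVERED)
    written = min(len(buffer), len(_DISCOVERED))
    buffer[:written] = _DISCOVERED[:written]
    return written


# First call: ask how many handles exist.
count = get_handles(None)
# The caller allocates storage of that size.
handles: List[Optional[str]] = [None] * count
# Second call: the callee fills the caller-owned storage.
get_handles(handles)
print(handles)
```

The design keeps memory ownership with the caller, so the library never has to expose a matching free function for the handle array.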
Dalibor Stanisavljevic 3daf9c1063 SWDEV-353742 - Port smilib function to amdsmi
Change-Id: I99df249755a5c665a8dd1777fa82d046e139bd77
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2022-10-20 09:24:22 -05:00
Bill(Shuzhou) Liu 0c91ef919d Restructure the folder
Move rocm_smi related functions to rocm_smi folder. Move amd_smi to
top level include/ and src/ folders. Remove obsolete oam folder.
Change the CMakeLists.txt to update folder locations.

Change-Id: I52e6be739e49f3b0545865f25364787f5985e9c3
2022-10-20 09:23:51 -05:00
Bill(Shuzhou) Liu f1d02aca79 Port rocm-smi function to amd-smi
Port most rocm-smi functions to amd-smi and add unit tests.

Change-Id: I6387a4bdaf20ead2389c99bb01d438156ccd0747
2022-09-06 12:08:59 -04:00
Sreekant Somasekharan aa5cba122c Fix documentation mistake related to get memory overdrive function.
Changes made on rsmi_perf_determinism_mode_set function documentation
as well for styling consistency.

Change-Id: I09ce8139eb9cbda94352ac7725c4c9b9bb06bd59
2022-06-30 08:57:52 -04:00
Sreekant Somasekharan 1432e5e040 Add rsmi lib function to get memory overdrive value
Change-Id: I515b51d5ce4baf966bb31714886a0d72330026bc
2022-06-23 11:42:50 -04:00
Divya Shikre afe996c2ed Update get_frequencies to handle failures.
Show an optional debug log (RSMI_DEBUG_BITFIELD=2) to
the user in the following scenarios:
1. If more than one current frequency is found
2. If frequencies are not read in increasing order of
   their value
If the current frequency is not available, its index is
set to -1 and no value is marked with * in the
output. This will also be handled in rocm_smi.py.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I477ec065f7513c8045d6392f12ef6cb835a6b8f6
2022-05-11 15:33:15 -04:00
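The checks listed in this commit message can be sketched as a small parser. This is an illustration under stated assumptions, not the library's code: the function name `parse_frequencies` is hypothetical, the input is assumed to look like `1: 800Mhz *` with `*` marking the current level, and picking the first marked entry when several are marked is a choice made for this sketch.

```python
import re
from typing import List, Tuple


def parse_frequencies(text: str) -> Tuple[List[int], int, List[str]]:
    """Parse lines such as '1: 800Mhz *' into (values_mhz, current_index, warnings)."""
    values: List[int] = []
    current_indices: List[int] = []
    for line in text.splitlines():
        match = re.match(r"\s*\d+:\s*(\d+)\s*Mhz\s*(\*?)", line, re.IGNORECASE)
        if match is None:
            continue  # skip lines that do not look like a frequency entry
        if match.group(2):
            current_indices.append(len(values))
        values.append(int(match.group(1)))

    warnings: List[str] = []
    if len(current_indices) > 1:
        warnings.append("more than one current frequency found")
    # Flag any entry that is lower than the one before it.
    if any(a > b for a, b in zip(values, values[1:])):
        warnings.append("frequencies are not in increasing order")

    # -1 signals that no current frequency was marked.
    current = current_indices[0] if current_indices else -1
    return values, current, warnings


print(parse_frequencies("0: 500Mhz\n1: 800Mhz *\n2: 1000Mhz"))
```

Returning the warnings instead of printing them leaves it to the caller to decide whether to surface them, in the spirit of the optional debug log the commit describes.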
Divya Shikre 99be3451d7 Add DEBUG_LOG macro
Add DEBUG_LOG that will optionally print an error
message when RSMI_DEBUG_BITFIELD is set to 2.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I6017e92d8a9e5f9861ae29ece0488d4bc198f996
2022-05-11 11:03:24 -04:00
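An environment-gated debug log of this kind might be sketched as below. The function `debug_log` and the constant `DEBUG_ERROR_BIT` are hypothetical, and treating the variable as a bit mask is an assumption suggested by its name; the commit message itself only says the value is set to 2.

```python
import os
import sys

DEBUG_ERROR_BIT = 2  # value named in the commit message


def debug_log(message: str, environ=os.environ) -> bool:
    """Print `message` to stderr if the debug bit is set; return True if printed."""
    raw = environ.get("RSMI_DEBUG_BITFIELD", "0")
    try:
        bitfield = int(raw, 0)  # accepts decimal or 0x-prefixed values
    except ValueError:
        return False  # unparsable value: stay silent
    # Assumption: the variable is a bit mask, so 2 and 3 both enable logging.
    if bitfield & DEBUG_ERROR_BIT:
        print(message, file=sys.stderr)
        return True
    return False


debug_log("frequency read failed", {"RSMI_DEBUG_BITFIELD": "2"})
```

Passing the environment as a parameter keeps the sketch easy to exercise without changing the real process environment.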
Divya Shikre c9b42bff57 Add RSMI_CLK_TYPE_PCIE to rsmi_clk_type_t
showclocks/showclkfrq does not display pp_dpm_pcie values
in sriov. This fix adds pcie clocks to rsmi_clk_type_t,
where the rest of the clocks are present.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I6d129ae412623b369c14456ae9781b2dbceb2139
2022-05-06 09:15:39 -04:00
Ori Messinger 9d6403bb17 ROCm SMI LIB: Add Missing GPU Blocks
This patch adds the following 4 missing GPU blocks to the SMI LIB:
-RSMI_GPU_BLOCK_MMHUB
-RSMI_GPU_BLOCK_PCIE_BIF
-RSMI_GPU_BLOCK_HDP
-RSMI_GPU_BLOCK_XGMI_WAFL

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: Ia1ec6f53e195f4bf7b8f073d6bed4fdb6572e546
2022-05-05 00:44:16 -04:00
Harish Kasiviswanathan 8de6ed2b8d rocm_smi_lib: add stdbool.h needed for C90
'bool' is not a built-in keyword in C; since C99 it is provided by
stdbool.h. Include stdbool.h so that C compilers can build the header

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I09fd5cf6eac20e7185e85a1123bc4826958b2b7c
2021-12-14 15:25:59 -05:00