Commit Graph

1558 Commits

Author SHA1 Message Date
Galantsev, Dmitrii 6b1a7ce27a Merge amd-dev into amd-master 20230120
Change-Id: I846879aa2614c45250fa34ef76aead0b08b4e9d5
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-01-20 10:38:32 -06:00
Galantsev, Dmitrii 6ee793ca03 SWDEV-340919 - Move examples and tests install dir
Previous install locations:
- /opt/rocm/share/example/amd-smi
- /opt/rocm/share/tests/amd-smi

New install locations:
- /opt/rocm/share/amd_smi/example
- /opt/rocm/share/amd_smi/tests

Change-Id: I305477b9f66bdc5963923efe6da1c01f87ea2085
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-01-20 09:49:56 -06:00
Maisam Arif 6aa91da74c Revert "Added AMD-SMI Linux Baremetal"
This reverts commit 013400bee7.

Reason for revert: Branch is still WIP

Change-Id: I75eec813b3d81049f033fe0a534251bd69eeca0e
2023-01-19 11:45:20 -05:00
Dalibor Stanisavljevic ed8f865341 Revert "Adjusted folder naming and moved amdsmi_cli into amdsmi project folder"
This reverts commit 3eadf3a216
because build failed

Change-Id: Id9efa22f3e1167e1b1bb235b449aef60256c0e24
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2023-01-19 15:40:28 +01:00
Dalibor Stanisavljevic bf79fe4323 SWDEV-378294 - Fixed failing tests
Change-Id: Ie0f9dedd6901e05b1a5ca7846624c127d92ed67f
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2023-01-18 10:28:32 -05:00
Maisam Arif 3eadf3a216 Adjusted folder naming and moved amdsmi_cli into amdsmi project folder
Change-Id: I4b7c42161fc92450f496483e5b49c7def6810437
2023-01-18 08:47:38 -06:00
Maisam Arif 013400bee7 Added AMD-SMI Linux Baremetal
Change-Id: I39ec76f4e4a8ca32eba10f4541585b2284e71539
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
2023-01-18 03:32:52 -06:00
Hao Zhou 272a274313 Merge amd-staging into amd-master 20230118
Signed-off-by: Hao Zhou <Hao.Zhou@amd.com>
Change-Id: I4ce0f22e2d1fc3f347ab160d04a16a9c84afe0ab
2023-01-18 10:33:38 +08:00
Elena Sakhnovitch 2b449fe58d Measure api execution time
Add new test to measure api execution time.

Change-Id: I0ad10c822bad4a2ae04b5785173b4ff21996021d
2023-01-16 17:00:36 -05:00
Bill(Shuzhou) Liu 99034af009 Add missing string header for memcpy
Fix compile error: ‘memcpy’ was not declared

Change-Id: I54d1849a3a18901baac1e24986b82067eb2fd6b4
2023-01-16 12:11:10 -05:00
Jason Albert 86de0f441f Remove tag values from enum/union/struct declarations
The tag values largely were not used and were causing doxygen
generation issues.
In the few cases where the tags were being referenced, clean up
those compile issues.

Signed-off-by: Jason Albert <jason.albert@amd.com>
Change-Id: I7b32eac742fb5af560400c13dda2721705d882bc
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2023-01-16 13:14:45 +01:00
Dalibor Stanisavljevic bbcbe896ea SWDEV-375113 - Updated python wrapper
Change-Id: I779cd5d7ff3f3ca231d1fd90dcedcc070540e6e3
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2023-01-16 12:24:38 +01:00
Dalibor Stanisavljevic 49aad0f898 SWDEV-375098 - Added check if driver sysfs node exists
Change-Id: I2524f96e5447fd3a34aa16efe3dfc271b7df62b9
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2023-01-16 10:58:27 +01:00
Charis Poag 4d7f3f2bc7 SWDEV-335697- Add support for dynamic partitioning
Original updates:
    * Added .gitignore to help with future commits
    * Updated/added copyrights on modified or added files
    * Updated rocm_smi.h/.cc
      - Added 2 new SMI API functions:
          rsmi_dev_compute_partition_set &
          rsmi_dev_compute_partition_get
      - Added helpful maps/enums used in
        new get/set compute_partition API calls
    * Updated rocm_smi.py
      - Added --showcomputepartition
      - Added --setcomputepartition
      - Fixed a few typos
    * Updated rsmiBindings.py - added helpful class/dict/list
    * Updated rocm_smi_example.cc
      - Added helpful MACRO to detect if api is not supported.
      - Added current_compute_partition set/get rocm lib calls
      - Added helpful macro to discover future RSMI errors
      - Commented out test_set_freq, was having permission issues
        on a Navi21
    * Updated rocm_smi_main.cc
      - Added helpful map to debug API calls, left in for future use
      - Added comment to clarify what a non-class function returns
    * Added computepartition_read_write.cc/.h
      - Added get/set compute partition API test calls
      - Confirmed on devices that do not support the API calls, tests pass
    * Updated rocm_smi_test/main.cc
      - Calls new compute partition gtests

Added following updates from review feedback:
   * Updated rocm_smi.h/cc
       - Removed C++ API calls; supporting both C and C++
         API calls could cause confusion and adds extra work for us
       - rsmi_dev_compute_partition_get -> Fixed an edge case where
         the user gives a buffer length smaller than the data
         received, but does not receive the partial buffer back.
         Google Tests are updated to reflect this finding.
   * Updated rocm_smi_example.cc
       - Fixed test_set_freq, issue was that file was not writable.
         We now indicate this warning, so prior errors make sense.
       - General test code cleanup. Removed extra code,
         by creating loops for tests.
   * Updated rocm_smi_main.cc
     - Moved the map used for debugging RSMI enums and removed the
       external reference to it; it is now a const public reference.
   * Updated rocm_smi.py
     - Updated Python code to identify NOT_SUPPORTED, since
       (currently) only a few GPUs support the feature

Change-Id: I4a567acbb59d6771fb64df08d19175fe3604fd1b
2023-01-13 10:46:40 -05:00
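
The small-buffer edge case noted for rsmi_dev_compute_partition_get in commit 4d7f3f2bc7 can be illustrated with a minimal Python sketch. It is not the library's code: the function name, the status constants, and the choice to report truncation through a distinct status value are all assumptions made for illustration. It shows a copy that still hands back the partial, NUL-terminated data when the caller's buffer is too small.

```python
# Hypothetical status values, not the real RSMI codes.
STATUS_SUCCESS = 0
STATUS_INSUFFICIENT_SIZE = 1


def copy_to_buffer(value: str, buf: bytearray) -> int:
    """Copy value into buf as a NUL-terminated string, truncating if needed."""
    if len(buf) == 0:
        return STATUS_INSUFFICIENT_SIZE
    data = value.encode("ascii")
    # Leave one byte for the terminator; copy as much as fits.
    n = min(len(data), len(buf) - 1)
    buf[:n] = data[:n]
    buf[n] = 0
    # The caller gets the partial data either way; the status says whether
    # anything was cut off.
    return STATUS_SUCCESS if n == len(data) else STATUS_INSUFFICIENT_SIZE
```

With a 4-byte buffer and the sample value "EXAMPLE", the buffer ends up holding "EXA" plus the terminator and the call reports the insufficient size.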
Bill(Shuzhou) Liu f19da1bb2c Crash when fails to open sysfs file
A failure to open the sysfs file could cause a crash. Modify the
condition to check the file descriptor after opening the file.

Change-Id: I2acdc55f8194a2d734db20d16e1660a20ba09574
2023-01-13 08:15:58 -06:00
Hao Zhou 6cb72bda67 Merge amd-staging into amd-master 20230113
Signed-off-by: Hao Zhou <Hao.Zhou@amd.com>
Change-Id: If4c1ada5b8ef50c8cb117efa71d004057d9311cb
2023-01-13 09:42:52 +08:00
Dalibor Stanisavljevic 4caded6dc4 SWDEV-376644 - Renamed usage to engine_usage
Change-Id: Icaac74800e30c1769a491ef190359490aba757b7
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2023-01-12 15:58:29 +01:00
Dalibor Stanisavljevic 943c42f58f SWDEV-374716 - Fixed asic info
Change-Id: I8d806ef09eca4300fcec0ce6a226d13547dfb728
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2023-01-11 11:03:17 -05:00
Bill(Shuzhou) Liu ec48312c61 Remove duplicate temperature function
amdsmi_dev_get_temp_metric() covers both of these functions:
amdsmi_get_temperature_measure() using AMDSMI_TEMP_CURRENT
and
amdsmi_get_temperature_limit() using AMDSMI_TEMP_CRITICAL
Remove those two functions.

Also merge amdsmi_get_power_limit() into
amdsmi_get_power_measure()

Change-Id: I40d4afeb2ec0ac7b64832729f36adfaae120c990
2023-01-11 08:13:37 -06:00
Dalibor Stanisavljevic e217fff82c SWDEV-375181 - Fixed amdsmi_get_fw_info python output
Change-Id: I4bf4bf49cd921d52849e1bb140e464e2756b07c5
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2023-01-11 11:17:49 +01:00
Hao Zhou f10fb9e99e Merge amd-staging into amd-master 20230106
Signed-off-by: Hao Zhou <Hao.Zhou@amd.com>
Change-Id: I9f9a4caee65d3c11ea81193446a823497182a3db
2023-01-06 13:16:56 +08:00
Ori Messinger 5c478e9eb9 ROCm SMI CLI: Fix --showproductname bugs
This patch fixes a couple of --showproductname bugs, both of which
are related to the device's SKU.
Previously if a device with a non-standard VBIOS name was used,
fetching that device's SKU wasn't working correctly.

A standard VBIOS name should follow the following pattern:
AAA-BBBBBB-CCC
Where the middle section "BBBBBB" between the hyphens is the SKU.

Now, SKU can be correctly fetched even with a non-standard VBIOS
name, and return 'unkown' if SKU does not exist.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I5899a859c6131c6048bb31a4305ddacbac3075a9
2023-01-05 11:53:04 -05:00
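
The naming rule described in commit 5c478e9eb9 can be sketched as follows. This is a minimal illustration, not the code from rocm_smi.py: the function name extract_sku and the fallback parameter are hypothetical, and the sketch only recognizes the standard AAA-BBBBBB-CCC pattern. How the actual patch treats non-standard names is not shown in the commit message.

```python
def extract_sku(vbios_name: str, fallback: str = "unknown") -> str:
    """Return the middle section of an AAA-BBBBBB-CCC style name.

    Hypothetical helper: any name that does not have exactly three
    hyphen-separated sections yields the fallback string.
    """
    parts = vbios_name.split("-")
    if len(parts) == 3 and parts[1]:
        return parts[1]
    return fallback
```

For the pattern quoted in the commit message, extract_sku("AAA-BBBBBB-CCC") returns "BBBBBB", while a name without hyphens returns the fallback.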
Bill(Shuzhou) Liu 79bd9c1d5f change sensor_type in amdsmi_dev_get_temp_metric() to enum
The sensor_type in amdsmi_dev_get_temp_metric() will be changed to
amdsmi_temperature_type_t

Change-Id: I72a7f271b0a55a025acc2ca523062a3d51cc036d
2023-01-04 13:01:04 -06:00
Galantsev, Dmitrii 2184d0c3d7 SWDEV-374138 - Improve ASAN flags
Tests overwrote the linker flags, resulting in a failed build with clang.
This change improves ASAN linker flag assignment and fixes test issue.

Change-Id: I88f38360d46b20f6cc7298ad0d1fd09ff6ce47d6
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-01-03 16:30:48 -06:00
Hao Zhou 400baca011 Merge amd-staging into amd-master 20221230
Signed-off-by: Hao Zhou <Hao.Zhou@amd.com>
Change-Id: I1548ada3cb8b63e9261115ebc7ee402664293298
2022-12-30 10:36:34 +08:00
Dalibor Stanisavljevic 9d345c5797 SWDEV-375271 - Renamed AmdSmiClockType to AmdSmiClkType
Change-Id: I6af34f7c4701584357ae5ec1315fbc425f2a9f82
2022-12-28 12:55:15 +01:00
Dalibor Stanisavljevic 0f7c440d95 SWDEV-373280 - Updated python wrapper
Use ctypes to import types instead of via amdsmi_wrapper

Change-Id: I217e90b74aafdd39eaab5f50edfda80e0bf91cce
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2022-12-28 10:21:54 +01:00
Dalibor Stanisavljevic 36eb2145d9 SWDEV-373280 - Added new generator
Change-Id: I82e3663f8118f17dc6a223a79cadd95634329356
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2022-12-27 10:14:09 +01:00
Dalibor Stanisavljevic cb013d25ff SWDEV-370502 - Reserved fields in structs
Change-Id: I23aed12baf6b3173eb149eb3b969e55d7e4360ee
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2022-12-26 10:51:44 -05:00
Hao Zhou 4901ac954a Merge amd-staging into amd-master 20221226
Signed-off-by: Hao Zhou <Hao.Zhou@amd.com>
Change-Id: I0df8148f884113f53e7ccb4b9de0361c2663f7d5
2022-12-26 11:52:33 +08:00
Dalibor Stanisavljevic 0bdd45e935 SWDEV-371492 - Updated python wrapper
Change-Id: I04f9f825ecdbe06de9ca95cf19e3f5bca972ec95
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2022-12-23 14:09:50 +01:00
Dalibor Stanisavljevic e22e72d4c3 SWDEV-371492 - Added check that device_handle is valid
Change-Id: Ic1b593fd5f781650528c860c372fa9864624255d
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2022-12-22 12:57:42 +01:00
Dalibor Stanisavljevic 4c56e9e3d6 SWDEV-371199 - Return NOT_INIT when amdsmi initialization fails
Change-Id: Ifb40aef3a62885b08164e9aa944bf9b5c375ebfd
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2022-12-19 16:29:29 +01:00
Bill(Shuzhou) Liu 221d6fdc5c Make amdsmi function name consistent
Some of the amdsmi functions have the verb (set or get) at the
end of the function name. Move it to the middle to be consistent
with other APIs.

Change-Id: I8053d16f46af951c25aaaf8febf2896a33633fa1
2022-12-16 10:20:49 -06:00
Ori Messinger 932feb6e49 ROCm SMI CLI: Add --showtempgraph Feature
The purpose of this patch is to add a new feature to the smi cli.
Use ./rocm-smi --showtempgraph to print a persistent bar graph for
each GPU's temperature.

The bar graphs refresh continuously to show current temps, and the
graphs change in a color gradient depending on the temperature.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I98902b76c42cc7281420759f5ebe8c78f7785e66
2022-12-15 18:20:32 -05:00
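
A temperature bar of the kind described in commit 932feb6e49 could be rendered roughly as below. This is a sketch under stated assumptions, not the rocm-smi implementation: the function name temp_bar, the 100 C scale, the bar width, and the three color bands (standing in for a smoother gradient) are all invented for illustration. Colors use standard ANSI escape codes.

```python
def temp_bar(temp_c: float, max_c: float = 100.0, width: int = 20) -> str:
    """Render one temperature as a colored text bar (hypothetical helper)."""
    # Clamp to [0, 1] so out-of-range readings still draw a valid bar.
    frac = min(max(temp_c / max_c, 0.0), 1.0)
    filled = round(frac * width)
    # Assumed bands: green below 50%, yellow below 80%, red otherwise.
    if frac < 0.5:
        color = "\033[32m"
    elif frac < 0.8:
        color = "\033[33m"
    else:
        color = "\033[31m"
    reset = "\033[0m"
    return f"{color}{'#' * filled}{reset}{'.' * (width - filled)} {temp_c:.0f}C"
```

A continuously refreshing display would call such a function in a loop for each GPU and redraw the lines in place; that loop is omitted here.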
Hao Zhou 28d04a8f52 Merge amd-staging into amd-master 20221215
Signed-off-by: Hao Zhou <Hao.Zhou@amd.com>
Change-Id: I646d74c4a1185b6fa68659f63685a712618761a7
2022-12-15 09:45:26 +08:00
Dalibor Stanisavljevic a80bbd308c SWDEV-371565 - Fixed retrieval FW versions in python and example
Change-Id: I4e512584a50342dcd4f9c93f523112fb4b5099dd
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2022-12-14 13:06:41 +01:00
Bill(Shuzhou) Liu 552a7403bc SWDEV-373189: build error with g++ v12.1.0
Fix the g++ error: ‘memset’ was not declared in this scope

Change-Id: I6231f863801f84a5a8c46543c87499058f2ef381
2022-12-13 08:33:12 -06:00
Galantsev, Dmitrii 5db4424549 Merge branch 'amd-dev' into amd-master 20221212
Change-Id: I3dbaaae0e157487afbab1efb96d6f854c1249125
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2022-12-12 20:12:57 -06:00
Galantsev, Dmitrii a255393b5c SWDEV-372949 - Resolve ASAN failure
Change-Id: I622ba5e8fc4d30d98dae365a67a0b0e99ffae3a5
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2022-12-12 15:49:55 -06:00
Galantsev, Dmitrii 0c52236abd CMAKE: Resolve lib dependencies for tests
amdsmitst was failing and not finding libgtest and libamd_smi.

This change resolves the issue by

1. Installing gtest into tests directory
2. Modifying RUNPATH variable to point to libamd_smi.so

Change-Id: I126d01c88116d37c5f2b55b9ecb2c9f1313f26fe
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2022-12-12 14:37:57 -06:00
David Ceranic ab928f3be5 LWPVATS-4489 - [AMDSMI][LinuxBM] Implement smi-tool for calling rocm APIs using amdsmi wrapper
Signed-off-by: David Ceranic <David.Ceranic@amd.com>
Change-Id: I15900a6686e672291b2c0f9d54fd0b5b7e35e589
2022-12-09 16:16:41 +01:00
Dalibor Stanisavljevic 238f885e14 SWDEV-371561 - Fixed vbios version string value
Change-Id: Ide06784200084741e6cde606492bf03a760b9601
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2022-12-09 15:19:12 +01:00
kent.russell@amd.com 248c6f79f4 rocm_smi.py: Fix order of CE and UE reporting
We append CE then UE, but in the table right after, it goes UE then CE.
Fix the order of the table, and add capitals for consistency

Change-Id: I208f37685508ab1e2ff83d3456620bbbf3a16268
2022-12-08 12:28:37 -05:00
Dalibor Stanisavljevic a2a38a5aa2 SWDEV-371210 - Fixed pcie link speed
Change-Id: I736d8095c05ee0685db0c209ea0fdb5832e14744
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2022-12-08 12:03:50 -05:00
Dalibor Stanisavljevic b93baf686d SWDEV-371191 - Fixed amdsmi_get_bad_page_info
Change-Id: I97134f548164eff588d9caa9b9f31c4361c78804
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2022-12-08 12:03:34 -05:00
Dalibor Stanisavljevic b4b761d02f SWDEV-370223 - Change the name of the header to amdsmi.h
Change dev to device_handle throughout the file
Change the pcie_info pcie_speed field type to uint32_t
Add AMDSMI prefix before amdsmi_mm_ip enum

Change-Id: I242145389ddc3f2ad05dfd6ca371640f4d118fc4
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2022-12-08 13:34:34 +01:00
Jason Albert b4cde9adec Doxygen related cleanup
- Made all doxygen formatting consistent with @ use
- Added @file definition to fix a lot of missed references
- Simplified return definitions for easier maintainability
- Fixed bad formatting and missing section closures

Signed-off-by: Jason Albert <jason.albert@amd.com>
Change-Id: I02cc55f7d0ae277f318a4620978af096f56cac6c
2022-12-07 10:41:33 -05:00
Jason Albert 3b1584915b Set status codes to fixed values
Assign fixed values to status codes to prevent enum auto assign
from changing them.

Signed-off-by: Jason Albert <jason.albert@amd.com>
Change-Id: I0ca1de7ba503ce8a75c56026f5a54e212204595b
2022-12-07 10:39:26 -05:00
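
The idea behind commit 3b1584915b, pinning each status code to an explicit number so that adding or reordering members cannot renumber the others, has a direct Python analogue. The sketch below uses hypothetical names and values; they are not the real amdsmi status codes.

```python
from enum import IntEnum


class Status(IntEnum):
    """Hypothetical status codes with explicit, fixed values."""
    SUCCESS = 0
    INVALID_ARGS = 1
    NOT_SUPPORTED = 2
    # A code added later takes a new explicit number, so the values
    # above never shift the way auto-assigned ones could.
    NOT_INIT = 32
```

Callers that store or compare raw integers keep working across releases because each value stays tied to the same name.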
Bill(Shuzhou) Liu 8d347bb6c4 Return errors when set clock range
Instead of using assert to abort the application, the fix
returns an error code if the input parameters are incorrect.

Change-Id: I00861ddf1198386fb322ea06232a7178fb5ef4bd
2022-12-06 12:39:43 -06:00
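
The change in commit 8d347bb6c4 follows a common pattern: validate the inputs and return a status code rather than asserting. The Python sketch below is an analogue of that pattern only. The function name, the parameters, the validity rule, and the status constants are assumptions, not the library's API.

```python
# Hypothetical status values, not the real amdsmi codes.
STATUS_SUCCESS = 0
STATUS_INVALID_ARGS = 1


def set_clock_range(min_mhz: int, max_mhz: int) -> int:
    """Validate a clock range and report problems through the return value."""
    # In C or C++ a failed assert here would abort the whole application;
    # returning a status lets the caller decide how to recover.
    if min_mhz < 0 or max_mhz < 0 or min_mhz > max_mhz:
        return STATUS_INVALID_ARGS
    # Applying the range to hardware is omitted in this sketch.
    return STATUS_SUCCESS
```

A caller can then check the result, for example treating a non-zero return from set_clock_range(1000, 500) as a recoverable input error.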