19 Commits

Author SHA1 Message Date
Charis Poag 3a4abbd8c0 [SWDEV-422195/SWDEV-440985] GPU metrics 1.6
Changes:
    - Added new GPU metrics:
      1) Violation status (e.g. PVIOL/TVIOL) accumulators
      2) XCP (Graphics Compute Partitions) statistics
      3) PCIe other end recovery counter
    - CLI/API/tests changes were made accordingly

Change-Id: I589b9b1f570f25dda12d95bb501feca85da8b3bb
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2024-09-27 12:04:21 -05:00
Ryo Ficano 9979be8512 [SWDEV-482963] [Test updates] Add new tests for p0 items - BM v2
Updates:
- Added tests for these API calls:

amdsmi_get_socket_handles
amdsmi_get_processor_type
amdsmi_get_clk_freq
amdsmi_get_gpu_process_info
amdsmi_get_gpu_ras_block_features_enabled
amdsmi_get_gpu_ecc_count
amdsmi_get_gpu_memory_usage
amdsmi_get_gpu_vendor_name
amdsmi_get_utilization_count

- Added amdsmi_init() and amdsmi_shut_down() before and after each test.
- Updated README and removed all pytest references.

Change-Id: Ida0c165a466571b1df36c413161bd95c070f6ff1
Signed-off-by: Ryo Ficano <Ryo.Ficano@amd.com>
2024-09-26 14:08:13 -04:00
Charis Poag 6132074089 Merge rocm-smi/amd-staging into amd-dev 20240119
Change-Id: Ie706473ff92a91b19e95d2d58f64904cad73a89a
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2024-01-19 03:57:00 -05:00
Suma Hegde 597fb00bef esmi: Clone open-source esmi repo as part of build
1. Remove esmi (internal gerrit) repo as git submodule
2. Clone esmi (open-source) repo during cmake using "git clone"
3. Download amd_hsmp.h header file during cmake build

TODO:
We can switch amd_hsmp.h to the mainline Linux kernel repo after the
next Linux kernel release.

Change-Id: I763b5e287e24337c8e9e25f4e421cdb8698b9322
2023-10-16 15:06:02 -04:00
Galantsev, Dmitrii 936719eeb6 Merge remote-tracking branch 'rocmsmi/amd-staging' into amd-dev
Change-Id: I9c38b4facd472b877d1ad133f3176a023c890955
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-08-23 16:04:15 -05:00
Charis Poag 98c607c8fd Merge branch 'amd-dev' into change-895251-1
Change-Id: I778bda482973b292d6de1b3f266619cbc852c2f5
2023-07-24 17:23:31 -05:00
Charis Poag 4613e8dec3 Update logging and README for other project usage
Updates:
    * [rocm-smi] Logging can now update files on a
      per-project basis for install/remove
    * [rocm-smi] README now has latest build
      instructions, including test builds
    * [rocm-smi] Updated README to include
      revision dates

Change-Id: Ifb19a6f32ccf6938f47225db53fef88021909264
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2023-07-20 19:09:11 -05:00
Charis Poag afa174c655 Merge 'rocm-smi/amd-staging' into 'amd-smi/amd-dev'
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
Change-Id: Id35c9807c45a965c968fb430e3ce4f3c7069c210
2023-07-19 18:46:28 -05:00
Galantsev, Dmitrii b0fe2fbd07 Add .cache to gitignore
Change-Id: Ida03bf1f50704bea44827d7578cd74c1896d4368
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-07-17 15:52:23 -04:00
Galantsev, Dmitrii e6c42c6626 Simplify gitignore
Remove generic gitignore to simplify tracking of generated files

Change-Id: Idf1f9719b2cfd16b31332a3ed87be5943c2c1ce7
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-07-07 11:48:09 -04:00
Jeremy Newton 2d2c73a5e6 Fix python loading of librocm_smi64
The librocm_smi64.so is used for development, while
librocm_smi64.so.MAJOR is used for runtime, thus the python front end
should not be loading the .so binary, but rather the .so.MAJOR binary.

As well, it's good not to hardcode "lib" as some distros will change
this.

rsmiBindings.py is now generated with CMake

Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
Change-Id: I7cb745f8936fdf10d3ebd6c1e606031f713184ca
2023-07-06 09:52:56 -04:00
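The loading rule described in this commit can be sketched in Python roughly as follows. This is an illustrative sketch, not the generated rsmiBindings.py: `LIB_DIR` and `SO_MAJOR` are hypothetical stand-ins for values that CMake would substitute at configure time, and the helper names are invented for this example.

```python
import ctypes
import os

# Hypothetical placeholders: in a CMake-generated file these would be
# substituted at configure time instead of being hardcoded here.
LIB_DIR = "lib"   # some distros use "lib64" or a multiarch directory
SO_MAJOR = "1"    # assumed major version, for illustration only


def runtime_library_name(base="librocm_smi64", major=SO_MAJOR):
    """Return the runtime name (.so.MAJOR), not the development .so name."""
    return f"{base}.so.{major}"


def runtime_library_path(prefix, lib_dir=LIB_DIR, major=SO_MAJOR):
    """Build the full path from an install prefix and the configured lib dir."""
    return os.path.join(prefix, lib_dir, runtime_library_name(major=major))


def load_runtime_library(prefix):
    """Load the versioned library; raises OSError if it is not installed."""
    return ctypes.CDLL(runtime_library_path(prefix))
```

Keeping the directory and major version as configure-time values means the Python front end does not depend on the unversioned .so, which is typically shipped only with development packages.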
Sam Wu c5e06b4040 add configurations for sphinx documentation
Change-Id: I5672348aab0f20d0bfc4dd1efcfecdf4324342d6
2023-05-30 16:08:54 -06:00
Maisam Arif aa70b77ec5 AMDSMI_CLI version 0.0.1
Change-Id: I0b02ddf1cc22753635062475cccadcc235e3a603
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
2023-03-20 08:51:35 -05:00
Charis Poag 9ef376cd61 SWDEV-342812- Add NPS support
Updates:
    * Added rsmi_dev_nps_mode_set and rsmi_dev_nps_mode_get
    * Added ability to set multiple SYSFS files in debug build
    * Added ability to see user's env variables set for debug build
    * Added tests for rsmi_dev_nps_mode_set and rsmi_dev_nps_mode_get
    * Added ability to restart AMD GPU driver, used in nps_mode_set
    * Updated ROCm_SMI_Manual.pdf to include new APIs
    * Added progress bar for long-running python_smi_tools, used
      when setting nps_mode if it runs longer than 0.1 seconds

Change-Id: I6d61bedd28d7cba6aff432ad2d127ba741b7d15a
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2023-02-14 11:54:24 -06:00
Dalibor Stanisavljevic ed8f865341 Revert "Adjusted folder naming and moved amdsmi_cli into amdsmi project folder"
This reverts commit 3eadf3a216
because the build failed

Change-Id: Id9efa22f3e1167e1b1bb235b449aef60256c0e24
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2023-01-19 15:40:28 +01:00
Maisam Arif 3eadf3a216 Adjusted folder naming and moved amdsmi_cli into amdsmi project folder
Change-Id: I4b7c42161fc92450f496483e5b49c7def6810437
2023-01-18 08:47:38 -06:00
Charis Poag 4d7f3f2bc7 SWDEV-335697- Add support for dynamic partitioning
Original updates:
    * Added .gitignore to help with future commits
    * Updated/added copyrights on modified or added files
    * Updated rocm_smi.h/.cc
      - Added new SMI API functions:
          rsmi_dev_compute_partition_set &
          rsmi_dev_compute_partition_get
      - Added helpful maps/enums used in
        new get/set compute_partition API calls
    * Updated rocm_smi.py
      - Added --showcomputepartition
      - Added --setcomputepartition
      - Fixed a few typos
    * Updated rsmiBindings.py - added helpful class/dict/list
    * Updated rocm_smi_example.cc
      - Added helpful macro to detect if an API is not supported.
      - Added current_compute_partition set/get rocm lib calls
      - Added helpful macro to discover future RSMI errors
      - Commented out test_set_freq, was having permission issues
        on a Navi21
    * Updated rocm_smi_main.cc
      - Added helpful map to debug API calls, left in for future use
      - Added comment to clarify what a non-class function returns
    * Added computepartition_read_write.cc/.h
      - Added get/set compute partition API test calls
      - Confirmed on devices that do not support the API calls, tests pass
    * Updated rocm_smi_test/main.cc
      - Calls new compute partition gtests

Added following updates from review feedback:
   * Updated rocm_smi.h/cc
       - Removed C++ API calls; supporting both C and C++
         API calls could cause confusion and adds extra work for us
       - rsmi_dev_compute_partition_get -> Fixed an edge case where
         the user gives a small buffer length (smaller than the data
         received), but does not receive the partial buffer back.
         Google Tests are updated to reflect this finding.
   * Updated rocm_smi_example.cc
       - Fixed test_set_freq; the issue was that the file was not writable.
         We now emit a warning for this, so prior errors make sense.
       - General test code cleanup. Removed extra code,
         by creating loops for tests.
   * Updated rocm_smi_main.cc
     - Removed an external reference to a map used for debugging
       RSMI enums; the map is now a const public reference.
   * Updated rocm_smi.py
     - Updated Python code to identify NOT_SUPPORTED, since
       (currently) only a few GPUs support the feature

Change-Id: I4a567acbb59d6771fb64df08d19175fe3604fd1b
2023-01-13 10:46:40 -05:00
Galantsev, Dmitrii aeb0bf5832 CMAKE: Repackage whole project for ROCm 5.5 release
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Change-Id: I932b11a111c8e0db04bd8c5e0c3d1a470e5b2386
2022-11-29 17:04:32 -06:00
Galantsev, Dmitrii c99e4e1501 Cleanup CMakeLists.txt for packaging
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2022-11-03 12:44:23 -05:00