301 Commits

Author SHA1 Message Date
Galantsev, Dmitrii df4f5e8bf8 Merge rocmsmi/amd-staging into amd-dev 20231016
Change-Id: I137171162a64af4960d82336cc517c1b34a870f3
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-10-16 14:31:13 -05:00
Charis Poag 6f1afd2678 bdfid fix for partition & xgmi nodes
* Updates:
    - [API] After discovering all amd gpus, we now properly
      map correct bdf (xgmi nodes). Especially important for
      partition changes - aka secondary nodes.
    - [API] While adding new secondary nodes we now have
      better grouping -> due to resorting based on
      kfd properties list & matching to primary uniqueid
    - [API] All secondary nodes are now AddToDeviceList
      with correct bdf (location id), provided by kfd
    - [API] Modified AddToDeviceList(..., uint64_t bdfid):
      providing an optional field - bdfid. This allows working
      around primary pcie cards with xgmi nodes
    - [API] Utils - cpplint minor fixes
    - [Example] Replaced all endl references with newline, fixed
      spacing, and fixed some values incorrectly displayed as hex
      (needed dec representation)
    - [API] kfd node functions - now print full path of file
      for trace logs
    - [Tests] power_read.cc: Added a generic power test to
      confirm specific return values

Change-Id: I143474e8d64c4915a966e789be6bcea4fa7f4472
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2023-10-13 20:14:39 -05:00
Galantsev, Dmitrii 2a7589a065 TESTS - Skip XGMI test
Change-Id: Idd9f505f36fac4a670e5129f835aa051b5c4c9fa
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-10-12 21:27:55 -05:00
Galantsev, Dmitrii 6d72d65c48 Merge rocmsmi/amd-staging into amd-dev 20231010
Change-Id: I492562094a004eb78b2cc2b52d14d013d9f97112
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-10-11 18:58:12 -05:00
Charis Poag 31a1fcce7d Add rsmi_dev_power_get
* Updates:
  - [API] Added rsmi_dev_power_get(uint32_t dv_ind,
                                   uint64_t *power,
                                   RSMI_POWER_TYPE *type):
    provides a generic getter for average or
    current power & provides backwards
    compatibility
  - Added a utility function to get MonitorTypes
    (monitor_type_string(type)) &
    RSMI_POWER_TYPE (power_type_string(type))
    strings
  - [Tests] Added rsmi_dev_power_get tests and
    provided better verification of return values for
    all power APIs
  - [Tests] Updated power outputs to show correct
    units
  - [example] Now uses avg, current, and generic
    power functions with type output response

Change-Id: I5ca06ca37fd5f61e100f2835b664d6cdd1ca42e6
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2023-10-10 00:34:19 -05:00
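The "generic get" described in this commit can be pictured with a minimal sketch. It is not the library's implementation: `PowerType`, `PowerReader`, `generic_power_get`, and `power_type_name` are hypothetical stand-ins, and the order in which the two readings are tried (current first, then average) is an assumption that mirrors the CLI fallback described in the "Add Current (Instant) Socket Power" commit.

```cpp
#include <cstdint>
#include <functional>
#include <optional>
#include <string>

// Hypothetical stand-in for RSMI_POWER_TYPE; the real enumerators may differ.
enum class PowerType { kAverage, kCurrent, kInvalid };

// Hypothetical reader: yields a power value, or nullopt when the device
// does not expose that reading.
using PowerReader = std::function<std::optional<uint64_t>()>;

// Generic getter: tries current power first, then average power
// (this order is an assumption). On success it fills *power and *type
// and returns true; otherwise *type is kInvalid and it returns false.
bool generic_power_get(const PowerReader &read_current,
                       const PowerReader &read_average,
                       uint64_t *power, PowerType *type) {
  if (power == nullptr || type == nullptr) return false;
  *power = 0;
  *type = PowerType::kInvalid;
  if (auto value = read_current()) {
    *power = *value;
    *type = PowerType::kCurrent;
    return true;
  }
  if (auto value = read_average()) {
    *power = *value;
    *type = PowerType::kAverage;
    return true;
  }
  return false;
}

// Hypothetical helper that turns the reported type into a label.
std::string power_type_name(PowerType type) {
  switch (type) {
    case PowerType::kAverage: return "AVERAGE";
    case PowerType::kCurrent: return "CURRENT";
    default: return "INVALID";
  }
}
```

Reporting the type through an out-parameter lets older callers that only care about "some power value" keep working, while newer callers can tell which reading they received.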
Charis Poag b251bb0c9f Rename NPS -> memory partition + compute partition node fix
* Updates:
        - rocm_smi_lib + CLI:
          Rename all "NPS mode" -> "memory partition"
          related files/functions/API/CLI to align with correct
          technical naming
        - rocm_smi_main: fixed identification of the primary card's
          unique id; now utilizes rsmi_dev_unique_id_get to map which
          KFD nodes belong to it
        - rsmi_dev_*_partition*: now have better logging output
        - compute partition tests:
          Added 20 sec delay for workaround until GPU
          busy is confirmed as the issue
        - CPPLint fixes/formatting
        - [Example] Moved all endl to "\n" for efficiency
        - [Example] Added Edge & Junction temperature examples
        - [Example] Added rsmi_minmax_bandwidth_get() example - WIP

Change-Id: Ida6db6fda7e0ac9d696a34cb15b4746e69d58d51
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2023-10-06 11:51:09 -04:00
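The efficiency argument behind moving from endl to "\n" is that std::endl writes a newline and then flushes the stream, while "\n" only writes the newline. The sketch below makes the flushes visible with a counting stream buffer; `CountingBuf` and `count_flushes` are illustrative names, not part of rocm_smi_lib.

```cpp
#include <ostream>
#include <sstream>

// A string buffer that counts how often the stream asks it to flush.
class CountingBuf : public std::stringbuf {
 public:
  int flushes = 0;

 protected:
  // Called through pubsync() whenever the owning stream is flushed.
  int sync() override {
    ++flushes;
    return 0;
  }
};

// Writes `lines` lines and returns how many flushes were triggered.
int count_flushes(int lines, bool use_endl) {
  CountingBuf buf;
  std::ostream out(&buf);
  for (int i = 0; i < lines; ++i) {
    if (use_endl) {
      out << "line " << i << std::endl;  // newline plus flush
    } else {
      out << "line " << i << "\n";       // newline only
    }
  }
  return buf.flushes;
}
```

With a real file or terminal behind the stream, each of those flushes may turn into a separate write, which is where the cost comes from.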
Galantsev, Dmitrii 3d3759061a Merge remote-tracking branch 'rocmsmi/amd-staging' into amd-dev
Change-Id: I6037383a3efa777cc281a992fd9aa11d8e9ced28
2023-10-04 19:11:59 -05:00
Galantsev, Dmitrii e962d3b281 TESTS - Don't fail on TestFrequenciesRead
- Return from freq_output function early if clock is unsupported
- Right-align frequencies

Change-Id: I799c9351dac8a5be161bc9243cd3816539728357
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-10-04 18:24:56 -05:00
Galantsev, Dmitrii 6c8767a69a TESTS - Disable same tests as in rocm-smi
Change-Id: I2587baf8a76e4e3a54880e73941b1d973440e7d3
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-10-03 09:58:58 -04:00
Galantsev, Dmitrii 871fae8b25 Upgrade to CXX-17 gtest-1.14 and cmake-3.14
Change-Id: I3bceb90f79235a9c0616c5d7ef9e37e458ffdce6
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-09-29 13:18:48 -04:00
Galantsev, Dmitrii cf6bcbbb27 Upgrade to CXX-17 gtest-1.14 and cmake-3.14
Also change the TARGET from amd_smi_libraries to rocm_smi_libraries
This helps reduce confusion between rocm-smi and amd-smi

Change-Id: Ie54cedd831ba24bd9afc341ad15b7e8e20732059
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-09-28 12:44:51 -05:00
Galantsev, Dmitrii 31cc2eecfb Merge remote-tracking branch 'rocmsmi/amd-staging' into HEAD
Change-Id: I0661926c10eef2bc32b83d9a63a3a6eb6991e781
2023-09-25 04:35:53 -05:00
Charis Poag f078375350 Add Current (Instant) Socket Power
* Updates:
    - rocm_smi_logger:
      General cleanup &
      Aligned to cpplint rules for usage
    - rocm_smi_monitor:
      Fixed MonitorTypes
      not displaying properly in logs
      & Added socket power label + current
      socket power MonitorTypes
    - rocm_smi API:
      Added rsmi_dev_current_socket_power_get API
    - rocm_smi CLI:
      General cleanup,
      Concise info now displays device data
      in variable width (see printLogSpacer's
      new field),
      printLogSpacer now has an adjustable
      variable that overrides appWidth,
      Added Socket Power to base rocm-smi +
      --showpower CLI calls,
      --showpower & base rocm-smi CLI defaults
      to printing socket power (if not available,
      displays average power)
    - Cleaned up temp label references
    - power_read gtests:
      Added current socket power to testing

Change-Id: Ica57e6f98ad96e2584e7c7955e188f68d2dab89d
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2023-09-25 01:38:54 -04:00
Ori Messinger d44a6ef523 ROCm SMI LIB: Add Missing Firmware Blocks
The purpose of this patch is to add the following missing firmware
blocks to the SMI LIB:
-RSMI_FW_BLOCK_MES
-RSMI_FW_BLOCK_MES_KIQ

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I5d4d37d883878dd02ef8533d4eb8891d54d70630
2023-09-25 01:37:38 -04:00
Galantsev, Dmitrii 0c662611e9 SWDEV-423672 - Always compile and install gtest
This commit makes sure GTest is always compiled with rocm_smi_lib_tests.

GTest installation was inconsistent outside of the AMD CI environment.
libgtest.so wouldn't get installed with rocm_smi_lib_tests if gtest
existed on the build machine, which is undesirable when packaging.

Change-Id: I607df6c67c81480e3b6487b28f14924e8bf56ad4
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-09-23 21:10:12 -04:00
Galantsev, Dmitrii 5c41319c83 Merge remote-tracking branch 'rocmsmi/amd-staging' into HEAD
Change-Id: I65ed7f3a0d1b6e58bc8377932d7c39db21d1b422
2023-09-21 23:43:20 -05:00
Oliveira, Daniel e0483f2ee2 rocm_smi_lib: Fix [linux BM] [AMDSMI] Memory Bandwidth
Implements APIs for 'gpu_metrics_v1_3' utilization averages

Code changes related to the following:
  * rsmi_dev_activity_metric_get()
  * rsmi_dev_activity_avg_mm_get()
  * CLI shows "Avg.Memory Bandwidth" under "--showmemuse"

Change-Id: I8e4600f350a7c18499abf022534db2b875f09d5f
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2023-09-21 11:00:29 -04:00
Galantsev, Dmitrii 5c574ac79c TESTS - Check power and frequency support
It is not guaranteed that power can be read or set for some GPUs
(MI300). It is also not guaranteed that frequencies can be set.

As this is not a tool issue - we simply skip the failing test.

Change-Id: I134e96a476040cef513cd924f00e30cd6dea42a5
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-09-14 22:19:33 -04:00
Maisam Arif d2ef113457 SWDEV-412847 - Changed junction to hotspot
Change-Id: I7f6c1a0a77e6a09d2a3e831463cf03e35266bf40
Signed-off-by: Maisam Arif <maisarif@amd.com>
2023-09-14 17:43:26 -05:00
Galantsev, Dmitrii a4b470fe71 Add errors for existing but empty dev files
Change-Id: Iad9febc50f9b8e6085f8b605249ee884d2f134d6
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-09-14 17:30:03 -04:00
Galantsev, Dmitrii d9381b6dae Fix misspelling averge -> average
Change-Id: I3546348560acadb1e775e10ad24115de4ccfc800
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-09-13 19:49:46 -05:00
Galantsev, Dmitrii ff992e9b56 TESTS - re-enable frequency tests on aqua_vanjaram
Change-Id: I8fcd9418da5b973897ccfffc7d8a2f3ea833ea77
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-09-11 19:43:25 -05:00
Charis Poag ed6777a8e7 Add GPU partition nodes
* Updates:
    - Fixed infinite loop on systems
      which did not have VRAM files
    - Fixed concise info from throwing exception
      with no amdgpu driver loaded
    - Fixed ability to see all nodes
      after switching partitions (mirrors
      original card display/settings)
    - Added to logs build type, lib path,
      and set env. variables

Change-Id: Ic0333df355144ce2242cecea93fe4ce51caf311c
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2023-09-07 22:17:54 -05:00
Galantsev, Dmitrii 4aef767596 Cleanup rocm_smi.cc
Change-Id: Ia676c237222b0dd5d9e8a054a93776f3b11e2225
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-09-07 15:50:40 -04:00
Bill(Shuzhou) Liu b52034fed8 Add API for the memory type
Get the memory type from libdrm and add a new API.

Change-Id: I89327bca2ef860f2e3f4f6ca20def2331eba66c0
2023-09-07 13:05:58 -05:00
Bill(Shuzhou) Liu 9021ef96dc Support PCIe vendor name
Add the support for PCIe vendor name.

Change-Id: Ibc1d289a08731e4c5a14f992f3b0d31b51482396
2023-08-28 16:46:43 -05:00
Galantsev, Dmitrii 14190c5a94 Merge remote-tracking branch 'rocmsmi/amd-staging' into amd-dev
Change-Id: I7a35220a2283b92c5b4825ee99d6693401ef8e1e
2023-08-28 16:01:19 -05:00
Galantsev, Dmitrii 84e90e55d5 TESTS - Add 90402 and simplify description
Change-Id: Ie6ab12d4201841fcb832d6827a5ec0ae5bb65114
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-08-25 14:01:53 -05:00
Bill(Shuzhou) Liu 471fbfddc1 Numa affinity shows large number
Change the affinity from unsigned int to integer to represent -1.

Change-Id: I82dc6f476b45fa4ec03a3c686fe8e6e2b7761b56
2023-08-25 09:01:08 -04:00
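The "large number" in this commit's title is what a -1 sentinel looks like when it passes through an unsigned 32-bit field. The sketch below reproduces the effect, assuming -1 is the "no affinity" value the commit refers to; both function names are hypothetical.

```cpp
#include <cstdint>
#include <string>

// Formats the affinity as if it were stored in an unsigned field:
// -1 wraps around to the largest 32-bit value.
std::string affinity_as_unsigned(int32_t raw) {
  uint32_t stored = static_cast<uint32_t>(raw);
  return std::to_string(stored);
}

// Formats the affinity from a signed field: -1 stays -1.
std::string affinity_as_signed(int32_t raw) {
  return std::to_string(raw);
}
```

For a raw value of -1, the unsigned path yields "4294967295" while the signed path yields "-1", which matches the symptom and the fix described above.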
Galantsev, Dmitrii 936719eeb6 Merge remote-tracking branch 'rocmsmi/amd-staging' into amd-dev
Change-Id: I9c38b4facd472b877d1ad133f3176a023c890955
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-08-23 16:04:15 -05:00
Galantsev, Dmitrii 613bd8ad1d TESTS - Fix incorrect TestVoltCurvRead assert if not supported
Change-Id: I2242aa9be84543276c63f1f57fdc489754c9ee07
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-08-22 16:51:42 -04:00
Galantsev, Dmitrii 62f01cb150 TESTS - Use gpu version as a workaround for a missing name
Depends-On: Ifbd38f11fbde7ba28af4be1d611310dea1b5112a
Change-Id: Ia7b7975f03424854df0a470b2719cf2ff2cf8e40
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-08-21 19:18:22 -04:00
Maisam Arif ca59a60a9a Updated Versioning
Corrected to amd-smi version from rocm-smi version
Added newline characters in the gpu choices
Updated cli versioning to 23.2.1.0 to match amd-smi

Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: Ia6db3a281e2349e05a09209bdcfdfa5ac48e3a86
2023-08-01 14:28:27 -04:00
Charis Poag 98c607c8fd Merge branch 'amd-dev' into change-895251-1
Change-Id: I778bda482973b292d6de1b3f266619cbc852c2f5
2023-07-24 17:23:31 -05:00
Charis Poag afa174c655 Merge 'rocm-smi/amd-staging' into 'amd-smi/amd-dev'
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
Change-Id: Id35c9807c45a965c968fb430e3ce4f3c7069c210
2023-07-19 18:46:28 -05:00
Oliveira, Daniel 573620f586 Add revision to --showhw
Code changes related to the following:
  * Added 'rsmi_dev_revision_get()' related code
  * Test code
  * Functional tests

Change-Id: I8c2097c65384a028c8c8437b717d05d52fe45250
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2023-07-18 16:17:33 -05:00
Galantsev, Dmitrii 8fe848d10e Fix sys and id tests
The following read tests were failing:
*.TestIdInfoRead
*.TestSysInfoRead

1. *.TestIdInfoRead failed because rsmi_dev_brand_get did not specify
   dependency on vbios_version.

2. *.TestSysInfoRead failed because the test didn't expect vbios_version to
   be missing, which is a new behavior in Aqua Vanjaram.

Change-Id: I9ee88a12fcf6cff2032049e2ecdfb2957efb03ab
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-07-17 15:52:23 -04:00
Bill(Shuzhou) Liu 4307330cb0 Fix unit test errors
Add unit test error handling for set freq and volt.

Change-Id: I5877f8300b942caac8f38e6efc03264bfc432def
2023-07-12 09:39:39 -04:00
Bill(Shuzhou) Liu 9e2fcd0e40 Fix fan write unit test failure
Even if fan speed can be read, sometimes the set is not supported.

Change-Id: I8584e6fe170c34144800af78d76f04234def11c8
2023-06-29 07:58:23 -05:00
Galantsev, Dmitrii 82078565e9 SWDEV-406542 - Add gtest to install targets
Change-Id: I116505aaa33109fce66ab8daf9921e2de11a27d4
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-06-20 11:14:56 -05:00
Galantsev, Dmitrii 9519d5b8cf SWDEV-391041 - Disable TestPowerReadWrite
Change-Id: I56b5bea3e5206a6f0d5ecdb482103881f80f0b8b
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-06-16 15:18:27 -04:00
Galantsev, Dmitrii e7585cc045 Assign tests to aqua_vanjaram
Change-Id: Iee78b1e810356327261006087b081e39dab0b9e8
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-06-16 15:18:27 -04:00
Bill(Shuzhou) Liu d9b6af7a09 Expand showpids to provide more details
Provide details of GPU usage by an application.

Change-Id: I0f36df7d358754c2c8a60432b736d98f667ee99c
2023-06-16 08:52:18 -04:00
Galantsev, Dmitrii 0478d53e23 SWDEV-340919 - Package rsmitst
Similar to I879b21428e6642f19fda67092b365d8b78b7ba7b.

Main CMake improvements:

* Add rsmitst with -DBUILD_TESTS=ON
* Package tests into rocm-smi-lib-tests.deb and .rpm
* Note - this breaks build_rsmitst.sh

Misc improvements:

* Add .editorconfig to normalize code formatting
* Export compile_commands.json
* Remove gtest source and pull from github instead

Change-Id: Ib87ed4a5acd9f78badae6d028e5ff3d4f56dafc2
Depends-On: I8b26795471ad1432c805e45d8b58d7bb34abfcfc
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-06-13 22:52:10 -05:00
Galantsev, Dmitrii ac94bf5ed5 Temporarily ignore TestFrequencies
See SWDEV-391039 and SWDEV-391040 for details

Change-Id: I662ba43363d949465454ea4af4d4586b3d47a811
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-06-12 19:26:21 -05:00
Maisam Arif 9cebc93cee Cleaned up APIs
Change-Id: I93487e01d7126bdfa77439b571df927a6af3bb70
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
2023-06-07 10:48:37 -04:00
Galantsev, Dmitrii f78f9a4082 Fix test temp blacklist, ignore TestVoltCurvRead
Change-Id: I86fa14fdc06e1b170a0bc0c0727fc08e4f4e2074
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-06-06 17:02:14 -04:00
Bill(Shuzhou) Liu 62ce965409 Clean up the APIs
Remove and rename APIs after review.

Change-Id: I5464f200eb605b366673f8abca95183c3837843b
2023-05-30 16:08:54 -04:00
Dalibor Stanisavljevic 1bc1d431d8 SWDEV-384793 - Clean up API
Change-Id: I441b315d32df59a454e06d521e5ca8b2c229451a
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2023-05-19 16:40:26 +02:00
Bill(Shuzhou) Liu dc4ba12e00 Return NOT_SUPPORT for set function in VM guest
Fix the unit tests which fail in a VM guest environment.

Change-Id: Id7c58887692bbdecba54f5d2d8463b292e19b4ad
2023-05-11 10:42:55 -05:00