Commit graph

295 Commits

Author SHA1 Message Date
Bill(Shuzhou) Liu 1a233f93fb APIs for the cache level and size
Read the cache level and size from the topology sysfs file.

Change-Id: Id3c558c95bcb79139a19e4adbaa7ff333d06098f
2023-10-05 11:10:54 -05:00
Maisam Arif 572bf563d1 Added driver_name to amdsmi_cli tool
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: I8f3d52e0b23298443b2b16afec418cbbbc5f77e0
2023-10-04 08:54:19 -04:00
Maisam Arif fadf1b6cc9 SWDEV-410230 - Added slot_type to amd-smi static --bus
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: I2006a3525a8aa9091bf54501461d364f7237f00f
2023-10-02 10:15:34 -04:00
Bill(Shuzhou) Liu 9eccf20f0c Get PCIe slot type
Add API to get the PCIe slot type.

Change-Id: If6894af53894c524d61c7586c59768541bbf0ac6
2023-09-27 23:31:09 -04:00
Maisam Arif 95337c88fc Added sleep state to amd-smi metric --clock
Change-Id: Idb5fbc84a787ef1affdf0449b6dd77ab6e50e91d
Signed-off-by: Maisam Arif <maisarif@amd.com>
2023-09-26 15:21:25 -05:00
Galantsev, Dmitrii 21dcf6d66c SWDEV-423796 - Resolve stack smashing issue
Inconsistency between struct fields caused stack smashing

Change-Id: Ib06d67723e062d4306420854ba7ab45fb252ffe3
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-09-25 11:24:55 -05:00
Galantsev, Dmitrii 31cc2eecfb Merge remote-tracking branch 'rocmsmi/amd-staging' into HEAD
Change-Id: I0661926c10eef2bc32b83d9a63a3a6eb6991e781
2023-09-25 04:35:53 -05:00
Maisam Arif 25b055014d Updated tool & lib versions & README.md
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: Ic41a36bcfa988ce9c8304157593012752857e919
2023-09-25 02:02:22 -05:00
Charis Poag f078375350 Add Current (Instant) Socket Power
* Updates:
    - rocm_smi_logger:
      General cleanup &
      Aligned to cpplint rules for usage
    - rocm_smi_monitor:
      Fixed MonitorTypes
      from not displaying properly in logs
      & Added socket power label + current
      socket power MonitorTypes
    - rocm_smi API:
      Added rsmi_dev_current_socket_power_get API
    - rocm_smi CLI:
      General cleanup,
      Concise info now displays device data
      in variable width (see printLogSpacer's
      new field),
      printLogSpacer now has an adjustable
      variable that overrides appWidth,
      Added Socket Power to base rocm-smi +
      --showpower CLI calls,
      --showpower & base rocm-smi CLI defaults
      to printing socket power (if not available,
      displays average power)
    - Cleaned up temp label references
    - power_read gtests:
      Added current socket power to testing

Change-Id: Ica57e6f98ad96e2584e7c7955e188f68d2dab89d
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2023-09-25 01:38:54 -04:00
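
The display fallback described in f078375350 (print socket power, or average power when socket power is not available) can be sketched as follows. This is only an illustration of the described behavior, not the rocm-smi CLI code: the function name, the two reader callables, the labels, and the convention that a reader returns None when the value is unavailable are all assumptions.

```python
def power_to_display(read_current_socket_power, read_average_power):
    """Return a (label, watts) pair, preferring current socket power.

    Both arguments are hypothetical zero-argument callables; the first
    returns None when the current socket power is not available.
    """
    value = read_current_socket_power()
    if value is not None:
        return ("Current Socket Power", value)
    # Fallback described in the commit message: show average power instead.
    return ("Average Power", read_average_power())


if __name__ == "__main__":
    # Simulated device without a current socket power reading.
    print(power_to_display(lambda: None, lambda: 30.0))
```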
Galantsev, Dmitrii 3d40c4bb2c SWDEV-422836 - Add sleep frequency support
Change-Id: I0bde403b010bf036ce44ed0600cc7eb03742c6b6
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-09-25 01:38:27 -04:00
Ori Messinger d44a6ef523 ROCm SMI LIB: Add Missing Firmware Blocks
The purpose of this patch is to add the following missing firmware
blocks to the SMI LIB:
-RSMI_FW_BLOCK_MES
-RSMI_FW_BLOCK_MES_KIQ

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I5d4d37d883878dd02ef8533d4eb8891d54d70630
2023-09-25 01:37:38 -04:00
Galantsev, Dmitrii 2589d677b0 actvity -> activity
Change-Id: Ie31d9faca2181cb2d47f7f4764b64ed8cc7f8007
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-09-22 11:45:21 -05:00
Maisam Arif e4fac177c1 SWDEV-417124 - Implement Power Management
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: Ib0d37038e49cec61d5415076a46a5666d95dcea2
2023-09-21 14:23:26 -05:00
Oliveira, Daniel e0483f2ee2 rocm_smi_lib: Fix [linux BM] [AMDSMI] Memory Bandwidth
Implements APIs for 'gpu_metrics_v1_3' utilization averages

Code changes related to the following:
  * rsmi_dev_activity_metric_get()
  * rsmi_dev_activity_avg_mm_get()
  * CLI shows "Avg.Memory Bandwidth" under "--showmemuse"

Change-Id: I8e4600f350a7c18499abf022534db2b875f09d5f
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2023-09-21 11:00:29 -04:00
Bill(Shuzhou) Liu f86f62b3f7 Return the driver loading status
When the library is initialized, it can return a status indicating
whether the driver is loaded.

Change-Id: Id26b8058e32881ebe2514067a639a2a871d1f252
2023-09-18 08:38:16 -05:00
Maisam Arif 42b030def3 Spell check bandwith to bandwidth
Change-Id: Icfb3b2398fe0590dbab6e531c8ec1cdceebe658d
Signed-off-by: Maisam Arif <maisarif@amd.com>
2023-09-14 18:43:49 -04:00
Maisam Arif d2ef113457 SWDEV-412847 - Changed junction to hotspot
Change-Id: I7f6c1a0a77e6a09d2a3e831463cf03e35266bf40
Signed-off-by: Maisam Arif <maisarif@amd.com>
2023-09-14 17:43:26 -05:00
Shuzhou Liu ab615f6b2a Merge "Add API for the memory type" into amd-dev
2023-09-12 09:34:03 -04:00
Charis Poag ed6777a8e7 Add GPU partition nodes
* Updates:
    - Fixed infinite loop on systems
      which did not have VRAM files
    - Fixed concise info from throwing exception
      with no amdgpu driver loaded
    - Fix for ability to see all nodes
      after switching partitions (mirrors
      original card display/settings)
    - Added to logs build type, lib path,
      and set env. variables

Change-Id: Ic0333df355144ce2242cecea93fe4ce51caf311c
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2023-09-07 22:17:54 -05:00
Galantsev, Dmitrii 4aef767596 Cleanup rocm_smi.cc
Change-Id: Ia676c237222b0dd5d9e8a054a93776f3b11e2225
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-09-07 15:50:40 -04:00
Bill(Shuzhou) Liu b52034fed8 Add API for the memory type
Get the memory type from libdrm and add a new API.

Change-Id: I89327bca2ef860f2e3f4f6ca20def2331eba66c0
2023-09-07 13:05:58 -05:00
Bill(Shuzhou) Liu fab0542ab1 Fix doxygen warning messages
Doxygen will now treat warnings as errors.

Change-Id: Ie7a7c9a823388c4140f31489604d65ec43005772
2023-09-07 08:48:38 -04:00
Deepak Mewar 14cf5f2762 Updated esmi error checking for graceful return
Change-Id: I1bcd498e3482dc7acd92b1a762f892b3dd978ff2
2023-09-04 08:27:12 -04:00
Dmitrii Galantsev f96c7663b5 Merge "Update amdsmi_wrapper.py and name fields" into amd-dev
2023-08-30 17:30:38 -04:00
Galantsev, Dmitrii 03cfdeefd5 Update amdsmi_wrapper.py and name fields
When updating the wrapper I ran into an issue with anonymous structs.
Generated wrapper would contain a string split into multiple lines,
which is invalid python.

e.g.
    'struct_struct anonymous
    (struct.... amdsmi.h:355)'

After naming the structs - the issue is gone. BDF union now has to be
addressed with .fields

e.g.
    OLD: bdf.function_number
    NEW: bdf.fields.function_number

Change-Id: Ib3c640c088ad0cc67893d636827356902051f17f
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-08-30 16:30:03 -05:00
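
The access change described in 03cfdeefd5 (bdf.function_number becoming bdf.fields.function_number once the struct inside the union is named) can be illustrated with a small ctypes sketch. The class names and the bit widths below are assumptions for illustration and are not copied from amdsmi_wrapper.py; only the member names fields and function_number come from the commit message.

```python
import ctypes


class BdfFields(ctypes.Structure):
    # Hypothetical layout: widths are assumed for this sketch only.
    _fields_ = [
        ("function_number", ctypes.c_uint64, 3),
        ("device_number", ctypes.c_uint64, 5),
        ("bus_number", ctypes.c_uint64, 8),
        ("domain_number", ctypes.c_uint64, 48),
    ]


class Bdf(ctypes.Union):
    # The inner struct is a named member, so it is reached through .fields.
    _fields_ = [
        ("fields", BdfFields),
        ("as_uint", ctypes.c_uint64),
    ]


bdf = Bdf()
bdf.fields.function_number = 1   # NEW style: bdf.fields.function_number
bdf.fields.device_number = 2

if __name__ == "__main__":
    print(bdf.fields.function_number, bdf.fields.device_number)
    # OLD style no longer resolves, because the struct is not anonymous.
    print(hasattr(bdf, "function_number"))
```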
Shuzhou Liu fc5b481124 Merge "Support PCIe vendor name" into amd-dev
2023-08-30 09:58:21 -04:00
Deepak Mewar f1ade88d47 wrapper API to get first online core on cpu socket
Change-Id: Ia1785f94ff687e53fdb868e56d4a83c2466ba2ed
2023-08-29 05:15:33 -04:00
Deepak Mewar 0baa3f6b6a Renamed esmi library APIs and bound the APIs
to cpusocket handle

Change-Id: I6e3d8aa667df475339c28b27294349843f32230c
2023-08-29 05:15:12 -04:00
Deepak Mewar 7c0e21ddc7 Wrapper API declared for esmi error status
Change-Id: Ie3e00a50740d9ba58d7f4955ea6b76ab8b46fb5e
2023-08-29 05:14:01 -04:00
Galantsev, Dmitrii 1d24dd93a6 Fix uint32* -> int32* conversion error
Change-Id: I23c2a842468896e8d120ac4b8b55ef433dff6d85
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-08-28 18:32:31 -05:00
Bill(Shuzhou) Liu 9021ef96dc Support PCIe vendor name
Add the support for PCIe vendor name.

Change-Id: Ibc1d289a08731e4c5a14f992f3b0d31b51482396
2023-08-28 16:46:43 -05:00
Bill(Shuzhou) Liu 471fbfddc1 Numa affinity shows large number
Change the affinity from unsigned int to integer to represent -1.

Change-Id: I82dc6f476b45fa4ec03a3c686fe8e6e2b7761b56
2023-08-25 09:01:08 -04:00
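
Why an unsigned field "shows a large number" in 471fbfddc1 can be reproduced in a few lines. This sketch assumes a 32-bit integer width, which the commit message does not state; it only demonstrates the wraparound of -1, not the library's actual code.

```python
import ctypes

NO_AFFINITY = -1  # value the commit message says must be representable

# Stored in an unsigned 32-bit field (width assumed), -1 wraps around.
as_unsigned = ctypes.c_uint32(NO_AFFINITY).value

# Stored in a signed 32-bit field, -1 is preserved.
as_signed = ctypes.c_int32(NO_AFFINITY).value

if __name__ == "__main__":
    print(as_unsigned)  # 4294967295
    print(as_signed)    # -1
```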
Galantsev, Dmitrii 936719eeb6 Merge remote-tracking branch 'rocmsmi/amd-staging' into amd-dev
Change-Id: I9c38b4facd472b877d1ad133f3176a023c890955
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-08-23 16:04:15 -05:00
Charis Poag f191c2753c Error handling for unset freqs
Sending RSMI_STATUS_UNEXPECTED_DATA for drivers
which do not set some clock freqs

Change-Id: I43a9515c2757dddd412bb25cfd54095e63367030
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2023-08-23 10:44:57 -05:00
Bill(Shuzhou) Liu a10f00bf57 Fallback to kfd node when VRAM sysfs not available
The driver may not expose VRAM sysfs on certain systems. Add a
fallback for that case.

Change-Id: Ib3be71b4f4d2c79318d5026b0a97f3657d8a97b6
2023-08-17 14:36:03 -05:00
Charis Poag 755e14dbad [SWDEV-399953] Smart Temperature detection + partitioning display
* Updates:
    - Fix for devices which do not have edge sensors but do have junction sensors
    - Added partitioning (memory and dynamic) displays for
      base rocm-smi CLI calls
    - Added subheading for base rocm-smi call output
    - Added better hwmon and device detection logging

Change-Id: I8219884b2e532d6ed379527cacdc1f2b232a5451
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2023-08-10 19:53:38 -04:00
Oliveira, Daniel cc5ab079df Fix rsmitstReadWrite.TestPowerReadWrite test failure
Code changes related to the following:
  * All reinforcement work moved to their own files
  * Self contained changes only to support them
  * New files added to CMakeLists.txt

Change-Id: I761e91f54392824df9145eaed8b9805986861285
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2023-08-09 21:51:05 -05:00
Maisam Arif b14da692eb Added workaround for inconsistent current pcie speed from gpumetrics
Change-Id: If8404d21341cd15eb4d0221ab92cb0b351bbdf3e
Signed-off-by: Maisam Arif <maisarif@amd.com>
2023-08-09 11:35:35 -05:00
Maisam Arif 82ac307f9b Added Gen type to pcie info
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: Icaa050a6f53fad608ed0353b2a0cbea33dee1dd2
2023-08-02 23:42:48 -05:00
Charis Poag 9c7eed7edc [lib] Enhance Logger: gpu_metrics + enable console out
* Updates:
    - Env variable RSMI_LOGGING=0 or any other value
        -> all logging off
    - Env variable RSMI_LOGGING=1 -> logs only
    - Env variable RSMI_LOGGING=2 -> console only
    - Env variable RSMI_LOGGING=3 -> both logs + console
    - Metrics output includes hexdump of current file
      and decoded metrics (functions: logHexDump
      and log_gpu_metrics)
    - System info gathered now includes the system's
      perceived endianness (little or big endian),
      helpful for viewing decoded hexdump or any
      binary translation
    - Added templates for printing unsigned hex
      (print_unsigned_hex_and_int), unsigned integers
      (print_unsigned_int), and printing both unsigned
      hex and int with an optional header
      (print_unsigned_hex_and_int)
    - Fixed some build compile warnings/errors -
      ex. doing strncpys for sku or board names
      this operation is expected and needed
      and for temp file writes if unsuccessful
      we now properly send RSMI_STATUS_FILE_ERROR
    - Fixed logrotate not initializing properly
      on RHEL 8.8/9.x

Change-Id: Ifa0f0218c9cafd0a8cd6aa8e7f94d61e9107200f
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2023-08-01 21:46:19 -05:00
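
The RSMI_LOGGING values listed in 9c7eed7edc amount to a small lookup table. The sketch below restates that table in Python for reference; the function name and the (log_file, console) tuple are assumptions made for illustration, and the library's own parsing may differ.

```python
import os


def logging_targets(value):
    """Map an RSMI_LOGGING value to (log_file, console) flags.

    Follows the list in the commit message: 1 = logs only, 2 = console
    only, 3 = both, 0 or any other value = all logging off. Treating an
    unset variable as "off" is an assumption of this sketch.
    """
    table = {
        "1": (True, False),   # logs only
        "2": (False, True),   # console only
        "3": (True, True),    # both logs + console
    }
    return table.get(value, (False, False))


if __name__ == "__main__":
    print(logging_targets(os.environ.get("RSMI_LOGGING")))
```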
Maisam Arif a13d5be933 Updated READMEs
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: Idf34bc431184414a17c3cb50c06543151ce3cb56
2023-08-01 14:28:33 -04:00
Maisam Arif ca59a60a9a Updated Versioning
corrected to amd-smi version from rocm-smi version
	Added newline characters in the gpu choices
	Updated cli versioning to 23.2.1.0 to match amd-smi

Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: Ia6db3a281e2349e05a09209bdcfdfa5ac48e3a86
2023-08-01 14:28:27 -04:00
Maisam Arif d705801adf ASIC serial updates
Corrected asic serial fallback to use rsmi's unique id
	Removed product serial due to duplication

Change-Id: Ib4e9ac00d2bf31ccbc35060bc84f7e79e5332d37
Signed-off-by: Maisam Arif <maisarif@amd.com>
2023-08-01 14:28:19 -04:00
Deepak Mewar 8a9771b225 esmi library integration update v1.0
1. new class files for cpu socket and cpu core created
2. wrapper APIs for getting energy monitoring, system
   statistics, power monitoring values implemented
3. modified amdsmi init & cleanup functions for esmi lib support
4. modified amdsmi system class for esmi lib support
5. sample test code created in example dir

Change-Id: Ic41f31641c283a681de696bb4346b557265bad42
2023-07-27 17:29:27 -05:00
Deepak Mewar 0187de61e2 esmi library header changes
1. New processor types AMD_CPU_CORE, AMD_APU added to ENUM
2. esmi errorcodes, wrappers for structures and library APIs
3. Macro introduced to enable/disable the esmi library code

Change-Id: Ia64b29303c231d3f17ac6b40fcd09b09b4380903
2023-07-27 16:21:24 -05:00
Bill(Shuzhou) Liu 55bf9cbe13 Change API to get the driver date
Support the driver date from libdrm.

Change-Id: I88e694732b538220e11fdb4029712bb5a6f44380
2023-07-21 08:28:06 -05:00
Oliveira, Daniel 573620f586 Add revision to --showhw
Code changes related to the following:
  * Added 'rsmi_dev_revision_get()' related code
  * Test code
  * Functional tests

Change-Id: I8c2097c65384a028c8c8437b717d05d52fe45250
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2023-07-18 16:17:33 -05:00
Marko Oblak 78faf411f8 SWDEV-391188 - [AMDSMI][LinuxGuest] Added description in amdsmi header file for amdsmi_get_gpu_process_list, changed mentioned API in py_interface
Signed-off-by: Marko Oblak <Marko.Oblak@amd.com>
Change-Id: I8cb7f2c6595da6ab0263e6fa4365bde91d900979
2023-07-03 06:35:12 -04:00
Marko Oblak 01474ff14e SWDEV-392359 - [AMDSMI] [Linux] [Guest] Documented unsupported APIs
Signed-off-by: Marko Oblak <Marko.Oblak@amd.com>
Change-Id: I0cff925082e6bc637e4b5073df64445380b3a3f5
2023-06-21 13:18:32 +02:00
Bill(Shuzhou) Liu 8f26e881fb SWDEV-405668 - BDF difference between amdsmi and rocmsmi
The render node discovery is changed to match rocm-smi index.

Change-Id: I707d0844b377304f4e8fc15035902c707805c2dc
2023-06-16 17:06:00 -04:00