Commit Graph

789 Commits

Author SHA1 Message Date
Maisam Arif 1394f74b92 Fixed metric temp try catch
Fixed tabbing
Fixed 'gpu is' to 'gpu =='
Fixed metric temperature calls to do as much as possible and not
error when one metric is not supported

Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: I360c380ad18581ab2e0cc8f7d1109d3da2556907
2023-09-14 18:44:00 -04:00
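
The "do as much as possible" behavior described in the commit above can be sketched roughly as follows. This is an illustrative sketch only, not the actual amdsmi code: `NotSupportedError`, `read_all_temperatures`, and the three reader functions are hypothetical names invented for this example.

```python
class NotSupportedError(Exception):
    """Hypothetical error raised when a sensor is not available."""


def read_all_temperatures(readers):
    """Call every reader and record "N/A" for the ones that are unsupported."""
    results = {}
    for name, reader in readers.items():
        try:
            results[name] = reader()
        except NotSupportedError:
            # One unsupported metric must not abort the remaining reads.
            results[name] = "N/A"
    return results


# Hypothetical readers standing in for real sensor queries.
def _edge():
    return 41


def _hotspot():
    raise NotSupportedError("hotspot sensor not present")


def _mem():
    return 38


temps = read_all_temperatures({"edge": _edge, "hotspot": _hotspot, "mem": _mem})
print(temps)  # {'edge': 41, 'hotspot': 'N/A', 'mem': 38}
```

Wrapping each call individually, rather than the whole group in one try block, is what lets the remaining metrics still be reported.
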
Maisam Arif 42b030def3 Spell check bandwith to bandwidth
Change-Id: Icfb3b2398fe0590dbab6e531c8ec1cdceebe658d
Signed-off-by: Maisam Arif <maisarif@amd.com>
2023-09-14 18:43:49 -04:00
Maisam Arif d2ef113457 SWDEV-412847 - Changed junction to hotspot
Change-Id: I7f6c1a0a77e6a09d2a3e831463cf03e35266bf40
Signed-off-by: Maisam Arif <maisarif@amd.com>
2023-09-14 17:43:26 -05:00
Galantsev, Dmitrii fe0b6e5f4c README - Add documentation links
Change-Id: I048c159394286545d518176d2751a43934b7fe9d
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-09-13 19:38:38 -05:00
Shuzhou Liu ab615f6b2a Merge "Add API for the memory type" into amd-dev 2023-09-12 09:34:03 -04:00
Naveen Krishna Chatradhi 9a8770246a Merge "Updated esmi error checking for graceful return" into amd-dev 2023-09-11 05:36:58 -04:00
Dmitrii Galantsev d900792ef3 Merge "Removed replay_count from Virtual OS's" into amd-dev 2023-09-07 16:03:02 -04:00
Bill(Shuzhou) Liu b52034fed8 Add API for the memory type
Get the memory type from libdrm and add a new API.

Change-Id: I89327bca2ef860f2e3f4f6ca20def2331eba66c0
2023-09-07 13:05:58 -05:00
Deepak Mewar 14cf5f2762 Updated esmi error checking for graceful return
Change-Id: I1bcd498e3482dc7acd92b1a762f892b3dd978ff2
2023-09-04 08:27:12 -04:00
Galantsev, Dmitrii 489991a322 Fix temperature reads
Change-Id: Iad5e5201911f620495985591e21fc5aaae028faf
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-08-31 18:50:58 -05:00
Dmitrii Galantsev f96c7663b5 Merge "Update amdsmi_wrapper.py and name fields" into amd-dev 2023-08-30 17:30:38 -04:00
Galantsev, Dmitrii 03cfdeefd5 Update amdsmi_wrapper.py and name fields
When updating the wrapper I ran into an issue with anonymous structs.
Generated wrapper would contain a string split into multiple lines,
which is invalid python.

e.g.
    'struct_struct anonymous
    (struct.... amdsmi.h:355)'

After naming the structs - the issue is gone. BDF union now has to be
addressed with .fields

e.g.
    OLD: bdf.function_number
    NEW: bdf.fields.function_number

Change-Id: Ib3c640c088ad0cc67893d636827356902051f17f
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-08-30 16:30:03 -05:00
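
The OLD/NEW access change in the commit above can be illustrated with a small ctypes sketch. It assumes a union whose struct member is named `fields`, as the commit message states; the class names and the field layout (widths, order, the `as_uint` member) are hypothetical and are not copied from amdsmi.h.

```python
import ctypes


class BdfFields(ctypes.Structure):
    # Hypothetical layout, for illustration only.
    _fields_ = [
        ("function_number", ctypes.c_uint8),
        ("device_number", ctypes.c_uint8),
        ("bus_number", ctypes.c_uint8),
        ("domain_number", ctypes.c_uint32),
    ]


class Bdf(ctypes.Union):
    # The struct is a named member, so it must be reached through .fields.
    _fields_ = [
        ("fields", BdfFields),
        ("as_uint", ctypes.c_uint64),
    ]


bdf = Bdf()
bdf.fields.function_number = 1
bdf.fields.bus_number = 3

print(bdf.fields.function_number)        # 1
print(hasattr(bdf, "function_number"))   # False: the old access path is gone
```
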
Maisam Arif cc4417d301 Removed replay_count from Virtual OS's
Change-Id: I5abb12d32e756f954f7e36595979bb8f1cbe5ad9
Signed-off-by: Maisam Arif <maisarif@amd.com>
2023-08-30 10:43:35 -05:00
Shuzhou Liu fc5b481124 Merge "Support PCIe vendor name" into amd-dev 2023-08-30 09:58:21 -04:00
Deepak Mewar f1ade88d47 wrapper API to get first online core on cpu socket
Change-Id: Ia1785f94ff687e53fdb868e56d4a83c2466ba2ed
2023-08-29 05:15:33 -04:00
Deepak Mewar 0baa3f6b6a Renamed esmi library APIs and bound the APIs
to cpusocket handle

Change-Id: I6e3d8aa667df475339c28b27294349843f32230c
2023-08-29 05:15:12 -04:00
Deepak Mewar a14237b508 Wrapper APIs for esmi library
DDR BW monitor, temperature query, DIMM statistics, xGMI BW control,
GMI3 width control, APB and LCLK level control, BW monitor

Change-Id: I9207451fbd81cfd904a2f9c29d9c28856894cf95
2023-08-29 05:14:52 -04:00
Deepak Mewar a7d7c5c6e1 Wrapper APIs and sample tests for esmi power control,
boostlimit monitor, boostlimit control, esmi error status

Change-Id: Id8db926eab2f6be386ed21081e651fcc9b389a22
2023-08-29 05:14:26 -04:00
Deepak Mewar 7c0e21ddc7 Wrapper API declared for esmi error status
Change-Id: Ie3e00a50740d9ba58d7f4955ea6b76ab8b46fb5e
2023-08-29 05:14:01 -04:00
Galantsev, Dmitrii 1d24dd93a6 Fix uint32* -> int32* conversion error
Change-Id: I23c2a842468896e8d120ac4b8b55ef433dff6d85
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-08-28 18:32:31 -05:00
Bill(Shuzhou) Liu 9021ef96dc Support PCIe vendor name
Add the support for PCIe vendor name.

Change-Id: Ibc1d289a08731e4c5a14f992f3b0d31b51482396
2023-08-28 16:46:43 -05:00
Galantsev, Dmitrii 14190c5a94 Merge remote-tracking branch 'rocmsmi/amd-staging' into amd-dev
Change-Id: I7a35220a2283b92c5b4825ee99d6693401ef8e1e
2023-08-28 16:01:19 -05:00
Maisam Arif f4d25d7ba3 Added mi300 market names & change tabs to spaces
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: Ic0f363b888c9f19397fa84d2b26698b3881682b2
2023-08-28 15:53:57 -05:00
Galantsev, Dmitrii 84e90e55d5 TESTS - Add 90402 and simplify description
Change-Id: Ie6ab12d4201841fcb832d6827a5ec0ae5bb65114
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-08-25 14:01:53 -05:00
Bill(Shuzhou) Liu 471fbfddc1 Numa affinity shows large number
Change the affinity from unsigned int to integer to represent -1.

Change-Id: I82dc6f476b45fa4ec03a3c686fe8e6e2b7761b56
2023-08-25 09:01:08 -04:00
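
The "large number" symptom fixed in the commit above is what a sentinel of -1 looks like when it is stored in an unsigned integer. A minimal sketch, assuming a 32-bit field (the actual width used by the library is not stated in the commit message):

```python
import ctypes

NO_AFFINITY = -1  # sentinel meaning "no NUMA affinity reported"

as_unsigned = ctypes.c_uint32(NO_AFFINITY).value
as_signed = ctypes.c_int32(NO_AFFINITY).value

print(as_unsigned)  # 4294967295
print(as_signed)    # -1
```
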
Oliveira, Daniel 3602447109 rocm_smi_lib/rocm_smi.py: Fix Add 'GPU name' in rocm-smi output
Code changes related to the following:
  * rocm_smi.py

Change-Id: I600e776bf479f972b8d639ce5a658a24916aed3c
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2023-08-24 13:22:24 -05:00
Oliveira, Daniel 654f65118b rocm_smi_lib/rocm_smi.py: Fix rocm-smi --resetfans shows 'permission denied'
Properly handles 'Not supported' fan cases where:
 * sysfs file (pwm#_enable) exists
 * sysfs file (pwm#_enable) does not exist

Change-Id: Ifa3c290e5ee1d27a550e94d86cd25ad8dcef3f59
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2023-08-24 10:54:50 -05:00
Oliveira, Daniel f9fd6b0a96 rocm_smi_lib/rocm_smi.py: Fix rocm-smi --showfan shows 'unable to detect fan'
Properly handles 'Unable to detect' vs 'Not supported' fan cases where:
 * sysfs file (pwm#) exists, and readings report zero (0), "Unable to detect fan speed"
 * sysfs file (pwm#) does not exist, then "Not supported"

Change-Id: If4b0312c872b76647a3e54427ba2a3f3e8e6dab1
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2023-08-23 20:43:08 -05:00
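
The two fan cases listed in the commit above can be sketched as a small decision function. This is a simplified illustration, not the rocm_smi.py implementation: `fan_status` is a hypothetical helper, and the demo uses a temporary directory in place of the real sysfs hwmon tree.

```python
import os
import tempfile


def fan_status(pwm_path):
    """Classify a pwm# file following the two cases in the commit message."""
    if not os.path.exists(pwm_path):
        return "Not supported"
    with open(pwm_path) as f:
        raw = f.read().strip()
    if int(raw) == 0:
        return "Unable to detect fan speed"
    return raw


# Demo against a temporary directory standing in for sysfs.
with tempfile.TemporaryDirectory() as d:
    present = os.path.join(d, "pwm1")
    with open(present, "w") as f:
        f.write("0\n")
    missing = os.path.join(d, "pwm2")
    zero_result = fan_status(present)
    missing_result = fan_status(missing)

print(zero_result)     # Unable to detect fan speed
print(missing_result)  # Not supported
```
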
Galantsev, Dmitrii 936719eeb6 Merge remote-tracking branch 'rocmsmi/amd-staging' into amd-dev
Change-Id: I9c38b4facd472b877d1ad133f3176a023c890955
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-08-23 16:04:15 -05:00
Charis Poag f191c2753c Error handling for unset freqs
Sending RSMI_STATUS_UNEXPECTED_DATA for drivers
which do not set some clock freqs

Change-Id: I43a9515c2757dddd412bb25cfd54095e63367030
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2023-08-23 10:44:57 -05:00
Galantsev, Dmitrii 613bd8ad1d TESTS - Fix incorrect TestVoltCurvRead assert if not supported
Change-Id: I2242aa9be84543276c63f1f57fdc489754c9ee07
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-08-22 16:51:42 -04:00
Galantsev, Dmitrii 548b68cb67 .editorconfig - Remove broken whitespace rule
Change-Id: I67260f1f1952609dc89834d0763acd732bf39860
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-08-22 16:51:20 -04:00
Galantsev, Dmitrii 62f01cb150 TESTS - Use gpu version as a workaround for a missing name
Depends-On: Ifbd38f11fbde7ba28af4be1d611310dea1b5112a
Change-Id: Ia7b7975f03424854df0a470b2719cf2ff2cf8e40
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-08-21 19:18:22 -04:00
Maisam Arif ea560494d3 Merge "RPM:DEB: replace rocm_smi with amd_smi where needed" into amd-dev 2023-08-17 23:50:30 -04:00
Bill(Shuzhou) Liu a10f00bf57 Fallback to kfd node when VRAM sysfs not available
The driver may not expose VRAM sysfs on certain systems. Add a
fallback to the kfd node.

Change-Id: Ib3be71b4f4d2c79318d5026b0a97f3657d8a97b6
2023-08-17 14:36:03 -05:00
Galantsev, Dmitrii 7cd72a583d RPM:DEB: replace rocm_smi with amd_smi where needed
__pycache__ directory wasn't getting removed.
Turns out we missed some rocm-smi renames when merging changes from it.

Culprit: I695bd085d4a43b678b563b4c35f6d2e8ddfa7d7c

Change-Id: Ieb0db41163af0337f1a3c06eb63a6960e6c52ff6
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-08-17 01:37:40 -05:00
Sam Wu 18ef862886 update documentation requirements and dependabot configuration
Change-Id: I652872286c5975d770abb97513469f4527807d7f
2023-08-14 13:13:57 -06:00
Charis Poag 755e14dbad [SWDEV-399953] Smart Temperature detection + partitioning display
* Updates:
    - Fix for devices which have a junction sensor but no edge sensor
    - Added partitioning (memory and dynamic) displays for
      base rocm-smi CLI calls
    - Added subheading for base rocm-smi call output
    - Added better hwmon and device detection logging

Change-Id: I8219884b2e532d6ed379527cacdc1f2b232a5451
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2023-08-10 19:53:38 -04:00
Oliveira, Daniel cc5ab079df Fix rsmitstReadWrite.TestPowerReadWrite test failure
Code changes related to the following:
  * All reinforcement work moved to their own files
  * Self contained changes only to support them
  * New files added to CMakeLists.txt

Change-Id: I761e91f54392824df9145eaed8b9805986861285
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2023-08-09 21:51:05 -05:00
Maisam Arif b14da692eb Added workaround for inconsistent current pcie speed from gpumetrics
Change-Id: If8404d21341cd15eb4d0221ab92cb0b351bbdf3e
Signed-off-by: Maisam Arif <maisarif@amd.com>
2023-08-09 11:35:35 -05:00
Ranjith Ramakrishnan 9406cdd832 SWDEV-366827 - Disable file reorg backward compatibility support by default
Change-Id: I1de06d0d6a30c8c862d768b58460ef1b49d15e29
2023-08-07 09:21:19 -07:00
Maisam Arif c8f8734bc6 Updated amdsmi lib version call
Change-Id: Ibdf978760f0cd9126897a6a93b3c07ed34ee05cd
Signed-off-by: Maisam Arif <maisarif@amd.com>
2023-08-03 23:45:39 -05:00
Maisam Arif 38598c2ec5 Corrected bad_pages error checking
Change-Id: I9d00407987b28fcec523dfde7cab8db830c41174
Signed-off-by: Maisam Arif <maisarif@amd.com>
2023-08-03 00:43:14 -04:00
Maisam Arif d839192f21 SWDEV-412848 - Added power limit for parity with Host
Change-Id: Icb67a3642502107394bb525fcf6efb9e1830bbbd
Signed-off-by: Maisam Arif <maisarif@amd.com>
2023-08-03 00:43:09 -04:00
Maisam Arif 07a8287a18 SWDEV-412847 - Added Hotspot temp and edge limit checks
Change-Id: If549ee45214e784a28a3420f60bae7f4ae1a1022
Signed-off-by: Maisam Arif <maisarif@amd.com>
2023-08-03 00:43:05 -04:00
Maisam Arif 82ac307f9b Added Gen type to pcie info
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: Icaa050a6f53fad608ed0353b2a0cbea33dee1dd2
2023-08-02 23:42:48 -05:00
Charis Poag 9c7eed7edc [lib] Enhance Logger: gpu_metrics + enable console out
* Updates:
    - Env variable RSMI_LOGGING=0 or any other value
        -> all logging off
    - Env variable RSMI_LOGGING=1 -> logs only
    - Env variable RSMI_LOGGING=2 -> console only
    - Env variable RSMI_LOGGING=3 -> both logs + console
    - Metrics output includes hexdump of current file
      and decoded metrics (functions: logHexDump
      and log_gpu_metrics)
    - System info gathered now includes the system's
      perceived endianness (little or big endian), which is
      helpful for viewing the decoded hexdump or any
      binary translation
    - Added templates for printing unsigned hex
      (print_unsigned_hex_and_int), unsigned integers
      (print_unsigned_int), and printing both unsigned
      hex and int with an optional header
      (print_unsigned_hex_and_int)
    - Fixed some build compile warnings/errors,
      e.g. strncpy calls for sku or board names
      (this operation is expected and needed);
      unsuccessful temp file writes now
      properly send RSMI_STATUS_FILE_ERROR
    - Fixed logrotate not initializing properly
      on RHEL 8.8/9.x

Change-Id: Ifa0f0218c9cafd0a8cd6aa8e7f94d61e9107200f
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2023-08-01 21:46:19 -05:00
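
The RSMI_LOGGING values listed in the commit above map to two output targets. The sketch below only restates that mapping; `logging_targets` is a hypothetical helper, and treating an unset variable as "off" is an assumption, since the commit message covers only "0 or any other value".

```python
import os


def logging_targets(value):
    """Return (log_to_file, log_to_console) for an RSMI_LOGGING value."""
    table = {
        "1": (True, False),   # logs only
        "2": (False, True),   # console only
        "3": (True, True),    # both logs + console
    }
    # "0" or any other value -> all logging off.
    return table.get(value, (False, False))


# Assumption: an unset variable behaves like "0".
current = logging_targets(os.environ.get("RSMI_LOGGING", "0"))

for v in ("0", "1", "2", "3", "abc"):
    print(v, logging_targets(v))
```
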
Maisam Arif 8630b59b81 Added Error handling to generator
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: I77e869624e4f0c7586dc2c018242b8e5737f7d4b
2023-08-01 14:28:58 -04:00
Maisam Arif 6be5a69ef8 Checks before adding Units to output
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: Ib3f2cd8595693dd033a69523ed69d5807dc83346
2023-08-01 14:28:51 -04:00
Maisam Arif 27388c6208 Updated Clock minimum values
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: Ia4c34eca18077c595248ac34afed1b844a1be727
2023-08-01 14:28:45 -04:00