Commit graph

779 commits

Author SHA1 Message Commit date
Bill(Shuzhou) Liu b52034fed8 Add API for the memory type
Get the memory type from libdrm and add a new API.

Change-Id: I89327bca2ef860f2e3f4f6ca20def2331eba66c0
2023-09-07 13:05:58 -05:00
Dmitrii Galantsev f96c7663b5 Merge "Update amdsmi_wrapper.py and name fields" into amd-dev 2023-08-30 17:30:38 -04:00
Galantsev, Dmitrii 03cfdeefd5 Update amdsmi_wrapper.py and name fields
When updating the wrapper I ran into an issue with anonymous structs.
The generated wrapper contained a string split across multiple lines,
which is invalid Python.

e.g.
    'struct_struct anonymous
    (struct.... amdsmi.h:355)'

After naming the structs, the issue is gone. The BDF union's members now have to be
accessed through .fields

e.g.
    OLD: bdf.function_number
    NEW: bdf.fields.function_number

Change-Id: Ib3c640c088ad0cc67893d636827356902051f17f
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-08-30 16:30:03 -05:00
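The following is a minimal ctypes sketch of the access-path change described in the commit above. The member names come from the commit message. The bit widths and the `BdfFields`/`Bdf` class names are assumptions made for illustration, not the generated wrapper's actual definitions. Once the inner struct becomes a named union member, its bit-fields can only be reached through `.fields`.

```python
import ctypes


# Hypothetical layout: the bit widths are assumptions, not copied from
# amdsmi.h. Only the member names follow the commit message.
class BdfFields(ctypes.Structure):
    _fields_ = [
        ("function_number", ctypes.c_uint64, 3),
        ("device_number", ctypes.c_uint64, 5),
        ("bus_number", ctypes.c_uint64, 8),
        ("domain_number", ctypes.c_uint64, 48),
    ]


# The union names its struct member "fields", so the bit-fields are no
# longer attributes of the union itself.
class Bdf(ctypes.Union):
    _fields_ = [
        ("fields", BdfFields),
        ("as_uint", ctypes.c_uint64),
    ]


bdf = Bdf()
# bdf.fields shares the union's buffer, so these writes update bdf.
bdf.fields.bus_number = 0x03
bdf.fields.device_number = 0x00
bdf.fields.function_number = 0x1

print(bdf.fields.function_number)       # 1 (NEW access path)
print(hasattr(bdf, "function_number"))  # False (OLD path no longer exists)
```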
Shuzhou Liu fc5b481124 Merge "Support PCIe vendor name" into amd-dev 2023-08-30 09:58:21 -04:00
Deepak Mewar f1ade88d47 wrapper API to get first online core on cpu socket
Change-Id: Ia1785f94ff687e53fdb868e56d4a83c2466ba2ed
2023-08-29 05:15:33 -04:00
Deepak Mewar 0baa3f6b6a Renamed esmi library APIs and bound them
to the cpusocket handle

Change-Id: I6e3d8aa667df475339c28b27294349843f32230c
2023-08-29 05:15:12 -04:00
Deepak Mewar a14237b508 Wrapper APIs for esmi library
DDR BW monitor, temperature query, DIMM statistics, xGMI BW control,
GMI3 width control, APB and LCLK level control, BW monitor

Change-Id: I9207451fbd81cfd904a2f9c29d9c28856894cf95
2023-08-29 05:14:52 -04:00
Deepak Mewar a7d7c5c6e1 Wrapper APIs and sample tests for esmi power control,
boostlimit monitor, boostlimit control, esmi error status

Change-Id: Id8db926eab2f6be386ed21081e651fcc9b389a22
2023-08-29 05:14:26 -04:00
Deepak Mewar 7c0e21ddc7 Wrapper API declared for esmi error status
Change-Id: Ie3e00a50740d9ba58d7f4955ea6b76ab8b46fb5e
2023-08-29 05:14:01 -04:00
Galantsev, Dmitrii 1d24dd93a6 Fix uint32* -> int32* conversion error
Change-Id: I23c2a842468896e8d120ac4b8b55ef433dff6d85
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-08-28 18:32:31 -05:00
Bill(Shuzhou) Liu 9021ef96dc Support PCIe vendor name
Add support for the PCIe vendor name.

Change-Id: Ibc1d289a08731e4c5a14f992f3b0d31b51482396
2023-08-28 16:46:43 -05:00
Galantsev, Dmitrii 14190c5a94 Merge remote-tracking branch 'rocmsmi/amd-staging' into amd-dev
Change-Id: I7a35220a2283b92c5b4825ee99d6693401ef8e1e
2023-08-28 16:01:19 -05:00
Maisam Arif f4d25d7ba3 Added MI300 market names and changed tabs to spaces
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: Ic0f363b888c9f19397fa84d2b26698b3881682b2
2023-08-28 15:53:57 -05:00
Galantsev, Dmitrii 84e90e55d5 TESTS - Add 90402 and simplify description
Change-Id: Ie6ab12d4201841fcb832d6827a5ec0ae5bb65114
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-08-25 14:01:53 -05:00
Bill(Shuzhou) Liu 471fbfddc1 NUMA affinity shows a large number
Change the affinity type from unsigned int to signed int so that it can represent -1.

Change-Id: I82dc6f476b45fa4ec03a3c686fe8e6e2b7761b56
2023-08-25 09:01:08 -04:00
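The following ctypes illustration shows why storing -1 in an unsigned field produces the "large number" described in the commit above. The 32-bit width and the helper names are assumptions. The commit states only that `unsigned int` was changed to a signed integer.

```python
import ctypes


def read_as_unsigned(value: int) -> int:
    """Reinterpret value as a 32-bit unsigned int (the old field type)."""
    return ctypes.c_uint32(value).value


def read_as_signed(value: int) -> int:
    """Reinterpret value as a 32-bit signed int (the new field type)."""
    return ctypes.c_int32(value).value


print(read_as_unsigned(-1))  # 4294967295, the "large number"
print(read_as_signed(-1))    # -1
```

Both reads use the same bit pattern (0xFFFFFFFF). Only the signed type preserves the -1 sentinel.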
Oliveira, Daniel 3602447109 rocm_smi_lib/rocm_smi.py: Fix: add 'GPU name' to rocm-smi output
Code changes related to the following:
  * rocm_smi.py

Change-Id: I600e776bf479f972b8d639ce5a658a24916aed3c
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2023-08-24 13:22:24 -05:00
Oliveira, Daniel 654f65118b rocm_smi_lib/rocm_smi.py: Fix rocm-smi --resetfans shows 'permission denied'
Properly handles 'Not supported' fan cases where:
 * sysfs file (pwm#_enable) exists
 * sysfs file (pwm#_enable) does not exist

Change-Id: Ifa3c290e5ee1d27a550e94d86cd25ad8dcef3f59
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2023-08-24 10:54:50 -05:00
Oliveira, Daniel f9fd6b0a96 rocm_smi_lib/rocm_smi.py: Fix rocm-smi --showfan shows 'unable to detect fan'
Properly handles 'Unable to detect' vs 'Not supported' fan cases where:
 * sysfs file (pwm#) exists and reads zero (0): "Unable to detect fan speed"
 * sysfs file (pwm#) does not exist: "Not supported"

Change-Id: If4b0312c872b76647a3e54427ba2a3f3e8e6dab1
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2023-08-23 20:43:08 -05:00
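The following Python sketch covers the two fan cases described in the commit above. `fan_speed_status` is a hypothetical helper and is not the rocm_smi.py code. It assumes the standard hwmon convention that `pwm#` holds a duty cycle from 0 to 255.

```python
import os


def fan_speed_status(pwm_path: str) -> str:
    """Classify a hwmon pwm# reading (hypothetical helper, not rocm_smi.py)."""
    # No pwm# file at all: fan reporting is not supported.
    if not os.path.exists(pwm_path):
        return "Not supported"
    with open(pwm_path) as f:
        raw = int(f.read().strip())
    # File exists but reads zero: the speed could not be detected.
    if raw == 0:
        return "Unable to detect fan speed"
    # hwmon pwm values span 0-255; report them as a percentage.
    return f"{round(raw / 255 * 100)}%"
```

Checking for the file first keeps the two messages distinct. A missing file never reaches the zero-reading branch.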
Galantsev, Dmitrii 936719eeb6 Merge remote-tracking branch 'rocmsmi/amd-staging' into amd-dev
Change-Id: I9c38b4facd472b877d1ad133f3176a023c890955
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-08-23 16:04:15 -05:00
Charis Poag f191c2753c Error handling for unset freqs
Return RSMI_STATUS_UNEXPECTED_DATA for drivers
that do not set some clock frequencies

Change-Id: I43a9515c2757dddd412bb25cfd54095e63367030
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2023-08-23 10:44:57 -05:00
Galantsev, Dmitrii 613bd8ad1d TESTS - Fix incorrect TestVoltCurvRead assert if not supported
Change-Id: I2242aa9be84543276c63f1f57fdc489754c9ee07
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-08-22 16:51:42 -04:00
Galantsev, Dmitrii 548b68cb67 .editorconfig - Remove broken whitespace rule
Change-Id: I67260f1f1952609dc89834d0763acd732bf39860
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-08-22 16:51:20 -04:00
Galantsev, Dmitrii 62f01cb150 TESTS - Use gpu version as a workaround for a missing name
Depends-On: Ifbd38f11fbde7ba28af4be1d611310dea1b5112a
Change-Id: Ia7b7975f03424854df0a470b2719cf2ff2cf8e40
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-08-21 19:18:22 -04:00
Maisam Arif ea560494d3 Merge "RPM:DEB: replace rocm_smi with amd_smi where needed" into amd-dev 2023-08-17 23:50:30 -04:00
Bill(Shuzhou) Liu a10f00bf57 Fallback to kfd node when VRAM sysfs not available
The driver may not expose the VRAM sysfs files on certain systems.
Add a fallback to the KFD node for that case.

Change-Id: Ib3be71b4f4d2c79318d5026b0a97f3657d8a97b6
2023-08-17 14:36:03 -05:00
Galantsev, Dmitrii 7cd72a583d RPM:DEB: replace rocm_smi with amd_smi where needed
The __pycache__ directory wasn't being removed.
It turns out we missed some rocm-smi renames when merging changes from rocm-smi.

Culprit: I695bd085d4a43b678b563b4c35f6d2e8ddfa7d7c

Change-Id: Ieb0db41163af0337f1a3c06eb63a6960e6c52ff6
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-08-17 01:37:40 -05:00
Sam Wu 18ef862886 update documentation requirements and dependabot configuration
Change-Id: I652872286c5975d770abb97513469f4527807d7f
2023-08-14 13:13:57 -06:00
Charis Poag 755e14dbad [SWDEV-399953] Smart Temperature detection + partitioning display
* Updates:
    - Fix for devices that have a junction sensor but no edge sensor
    - Added partitioning (memory and dynamic) displays for
      base rocm-smi CLI calls
    - Added subheading for base rocm-smi call output
    - Added better hwmon and device detection logging

Change-Id: I8219884b2e532d6ed379527cacdc1f2b232a5451
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2023-08-10 19:53:38 -04:00
Oliveira, Daniel cc5ab079df Fix rsmitstReadWrite.TestPowerReadWrite test failure
Code changes related to the following:
  * All reinforcement work moved to their own files
  * Self contained changes only to support them
  * New files added to CMakeLists.txt

Change-Id: I761e91f54392824df9145eaed8b9805986861285
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2023-08-09 21:51:05 -05:00
Maisam Arif b14da692eb Added workaround for inconsistent current pcie speed from gpumetrics
Change-Id: If8404d21341cd15eb4d0221ab92cb0b351bbdf3e
Signed-off-by: Maisam Arif <maisarif@amd.com>
2023-08-09 11:35:35 -05:00
Ranjith Ramakrishnan 9406cdd832 SWDEV-366827 - Disable file reorg backward compatibility support by default
Change-Id: I1de06d0d6a30c8c862d768b58460ef1b49d15e29
2023-08-07 09:21:19 -07:00
Maisam Arif c8f8734bc6 Updated amdsmi lib version call
Change-Id: Ibdf978760f0cd9126897a6a93b3c07ed34ee05cd
Signed-off-by: Maisam Arif <maisarif@amd.com>
2023-08-03 23:45:39 -05:00
Maisam Arif 38598c2ec5 Corrected bad_pages error checking
Change-Id: I9d00407987b28fcec523dfde7cab8db830c41174
Signed-off-by: Maisam Arif <maisarif@amd.com>
2023-08-03 00:43:14 -04:00
Maisam Arif d839192f21 SWDEV-412848 - Added power limit for parity with Host
Change-Id: Icb67a3642502107394bb525fcf6efb9e1830bbbd
Signed-off-by: Maisam Arif <maisarif@amd.com>
2023-08-03 00:43:09 -04:00
Maisam Arif 07a8287a18 SWDEV-412847 - Added Hotspot temp and edge limit checks
Change-Id: If549ee45214e784a28a3420f60bae7f4ae1a1022
Signed-off-by: Maisam Arif <maisarif@amd.com>
2023-08-03 00:43:05 -04:00
Maisam Arif 82ac307f9b Added Gen type to pcie info
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: Icaa050a6f53fad608ed0353b2a0cbea33dee1dd2
2023-08-02 23:42:48 -05:00
Charis Poag 9c7eed7edc [lib] Enhance Logger: gpu_metrics + enable console out
* Updates:
    - Env variable RSMI_LOGGING=0 or any value other than 1-3
        -> all logging off
    - Env variable RSMI_LOGGING=1 -> logs only
    - Env variable RSMI_LOGGING=2 -> console only
    - Env variable RSMI_LOGGING=3 -> both logs + console
    - Metrics output includes hexdump of current file
      and decoded metrics (functions: logHexDump
      and log_gpu_metrics)
    - Gathered system info now includes the system's
      perceived endianness (little or big endian),
      which helps when viewing the decoded hexdump or any
      binary translation
    - Added templates for printing unsigned hex
      (print_unsigned_hex_and_int), unsigned integers
      (print_unsigned_int), and printing both unsigned
      hex and int with an optional header
      (print_unsigned_hex_and_int)
    - Fixed some build warnings/errors, e.g. strncpy
      calls for SKU or board names (this operation is
      expected and needed); unsuccessful temp file
      writes now properly return RSMI_STATUS_FILE_ERROR
    - Fixed logrotate not initializing properly on
      RHEL 8.8/9.x

Change-Id: Ifa0f0218c9cafd0a8cd6aa8e7f94d61e9107200f
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2023-08-01 21:46:19 -05:00
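The following Python model shows the `RSMI_LOGGING` value table from the commit above. The library implements this in C++, and `logging_targets` is a hypothetical name. The commit does not specify how an unset variable is handled, so treating it like "0" is an assumption.

```python
import os
from typing import Mapping, Optional, Tuple

# RSMI_LOGGING value -> (write to log file, write to console),
# following the table in the commit message.
_RSMI_LOGGING_TARGETS = {
    "1": (True, False),  # logs only
    "2": (False, True),  # console only
    "3": (True, True),   # both logs + console
}


def logging_targets(env: Optional[Mapping[str, str]] = None) -> Tuple[bool, bool]:
    """Return (log_to_file, log_to_console) for the RSMI_LOGGING setting.

    Python model only. Treating an unset variable like "0" is an assumption.
    """
    env = os.environ if env is None else env
    value = env.get("RSMI_LOGGING", "0").strip()
    # "0" or any value other than 1-3 turns all logging off.
    return _RSMI_LOGGING_TARGETS.get(value, (False, False))
```

For example, `logging_targets({"RSMI_LOGGING": "3"})` returns `(True, True)`. Passing the environment in as a mapping keeps the function easy to test without modifying `os.environ`.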
Maisam Arif 8630b59b81 Added Error handling to generator
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: I77e869624e4f0c7586dc2c018242b8e5737f7d4b
2023-08-01 14:28:58 -04:00
Maisam Arif 6be5a69ef8 Checks before adding Units to output
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: Ib3f2cd8595693dd033a69523ed69d5807dc83346
2023-08-01 14:28:51 -04:00
Maisam Arif 27388c6208 Updated Clock minimum values
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: Ia4c34eca18077c595248ac34afed1b844a1be727
2023-08-01 14:28:45 -04:00
Maisam Arif d5ad387252 Removed cmdline options
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: I3f98829e988468d657f280db6765f2f5e28ff5f1
2023-08-01 14:28:40 -04:00
Maisam Arif a13d5be933 Updated READMEs
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: Idf34bc431184414a17c3cb50c06543151ce3cb56
2023-08-01 14:28:33 -04:00
Maisam Arif ca59a60a9a Updated Versioning
Corrected to amd-smi version from rocm-smi version
Added newline characters in the gpu choices
Updated cli versioning to 23.2.1.0 to match amd-smi

Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: Ia6db3a281e2349e05a09209bdcfdfa5ac48e3a86
2023-08-01 14:28:27 -04:00
Maisam Arif d705801adf ASIC serial updates
Corrected ASIC serial fallback to use rsmi's unique id
Removed product serial due to duplication

Change-Id: Ib4e9ac00d2bf31ccbc35060bc84f7e79e5332d37
Signed-off-by: Maisam Arif <maisarif@amd.com>
2023-08-01 14:28:19 -04:00
Bill(Shuzhou) Liu 0522439ac2 Crash when the ECC count sysfs file cannot be read
Replace assert with error handling code.

Change-Id: I6500ae4d38a8caea87828aa7d76373d20c8354c7
2023-07-31 08:36:53 -05:00
Deepak Mewar 8a9771b225 esmi library integration update v1.0
1. Created new class files for CPU socket and CPU core
2. Implemented wrapper APIs for energy monitoring, system
   statistics, and power monitoring values
3. Modified amdsmi init and cleanup functions for esmi lib support
4. Modified amdsmi system class for esmi lib support
5. Created sample test code in the example dir

Change-Id: Ic41f31641c283a681de696bb4346b557265bad42
2023-07-27 17:29:27 -05:00
Deepak Mewar 0187de61e2 esmi library header changes
1. New processor types AMD_CPU_CORE, AMD_APU added to ENUM
2. esmi error codes, wrappers for structures and library APIs
3. Macro introduced to enable/disable the esmi library code

Change-Id: Ia64b29303c231d3f17ac6b40fcd09b09b4380903
2023-07-27 16:21:24 -05:00
Marko Oblak e0b84c5d1f SWDEV-413516 - [AMDSMI][Linux][BM] Changed mapping pcie speed from pcie type
Signed-off-by: Marko Oblak <Marko.Oblak@amd.com>
Change-Id: I5bcee99ec596bb831465d5a4e98a78681c24b20f
2023-07-27 16:21:24 -05:00
Marko Oblak f9ce3925a4 SWDEV-412839 - PCIe speed - change in mapping
Signed-off-by: Marko Oblak <Marko.Oblak@amd.com>
Change-Id: If75530a8b2e1647d2cb31733decfba3837dac7bf
2023-07-27 16:21:24 -05:00
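For reference, the following sketch maps each PCIe generation to its per-lane transfer rate, using the rates defined by the PCIe specifications. The function names and the GT/s unit are assumptions. This is not the mapping code changed in the commits above, because their messages do not give its details.

```python
# Per-lane transfer rate in GT/s for each PCIe generation.
PCIE_GEN_TO_GTS = {1: 2.5, 2: 5.0, 3: 8.0, 4: 16.0, 5: 32.0, 6: 64.0}


def pcie_speed_from_gen(gen: int) -> float:
    """Return the per-lane rate (GT/s) for a PCIe generation."""
    if gen not in PCIE_GEN_TO_GTS:
        raise ValueError(f"unknown PCIe generation: {gen}")
    return PCIE_GEN_TO_GTS[gen]


def pcie_gen_from_speed(gts: float) -> int:
    """Return the PCIe generation whose per-lane rate equals gts."""
    for gen, rate in PCIE_GEN_TO_GTS.items():
        if rate == gts:
            return gen
    raise ValueError(f"no PCIe generation runs at {gts} GT/s per lane")
```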
Bill(Shuzhou) Liu aeb6c61f54 Change reset power error message to logging
Since the reset continues when the reset power and the current power
are the same, an error message may confuse the user.

Change-Id: I35b9ef17afd47b5af5bd2b8882a44f63991fe509
2023-07-27 15:18:28 -05:00