533 Commits

Author SHA1 Message Date
Deepak Mewar 1bbb19c8b7 Added amdsmi cpu family & cpu model
- Updated header and source files
- Updated python interface
- Generated python wrapper for updated header
- Updated the CLI to have cpu family & cpu model
  as part of metric table

Change-Id: Iea440251797270d5d29ffe883b0ad6db790be658


[ROCm/amdsmi commit: 6f7273fda5]
2024-02-06 18:46:27 -05:00
Maisam Arif 39537d999d SWDEV-436533 - Cache Info Struct Update
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: Ic640fa657cdcc32d7b00ff78fc9452ec7e05dd07


[ROCm/amdsmi commit: 88192d8b6b]
2024-02-05 16:51:04 -05:00
Maisam Arif d5f2a6770a Fixed gpu_metric and cache cli checks
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: Ic71e2b50dfa8fc106a17079842a7564a8e24b69d


[ROCm/amdsmi commit: 59d885a9ca]
2024-02-01 05:47:18 -05:00
Oliveira, Daniel a2f04dd3bc fix: [rocm/amd_smi_lib] header cleanup Remove non-unified headers
Cleans up individual gpu metric APIs which will be implemented according to 'unified-headers' standards

Code changes related to the following:
  * '_get_gpu_metrics_' APIs
  * Functional tests

Change-Id: I2dd2ecde11c1d77e343e0ae0e10aeb9120ae9b99
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>


[ROCm/amdsmi commit: 55734d2d7a]
2024-01-26 10:38:48 -05:00
Charis Poag f357c180e7 Fix metric type error output + re-align with ROCm SMI metrics
Changes:
* [CLI] Provide fix for "/opt/rocm/bin/amd-smi metric
TypeError: '>' not supported between instances of 'str' and 'i"
--> Python API was updated, CLI needed to reflect these changes
* [API] Updated amdsmi.h's with ROCm SMI
--> Incorrectly added mem_bandwidth_acc & mem_max_bandwidth
--> Realigned wrapper with updates
* [Test] Added metrics not shown in gpu_metrics_read.cc

Change-Id: Ia3a172377fd5a582254dd5a46d81dbec7e763cd9
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/amdsmi commit: 34bd26c68e]
2024-01-24 21:23:40 -06:00
Bill(Shuzhou) Liu 25ffbb0304 Unified API
amdsmi_get_link_metrics() and amdsmi_get_pcie_info()

Change-Id: Iea060e449813b842236243b772e8809497ce98fe


[ROCm/amdsmi commit: 0b67c2ccc4]
2024-01-24 18:27:20 -05:00
Maisam Arif 7e831b1992 24.2.0 Version update
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: Ied7c24d63ca38c2e5ea5eca6b411e0156f61a403


[ROCm/amdsmi commit: c400a22d4d]
2024-01-24 11:13:02 -06:00
Maisam Arif 9eef868334 24.1.0 Version update
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: Ibfe92d199b10dc48ece85dfdeda1041f5ea98626


[ROCm/amdsmi commit: c48c989bbc]
2024-01-24 12:09:48 -05:00
Deepak Mewar 6ef2131a21 amdsmi library updated for esmi error status mapping to amdsmi
Change-Id: I7e4dd146a1a9af496556efcf811b2e1ed565b09e


[ROCm/amdsmi commit: 5d0b479661]
2024-01-16 11:41:22 -06:00
Deepak Mewar 171f4818f4 amdsmi library updated for metric table structure
Change-Id: Ie8a9840a9020282599dd413e964d86bfb8850f6a


[ROCm/amdsmi commit: a0c95e855b]
2024-01-16 11:41:22 -06:00
Deepak Mewar 3a00172186 amdsmi library and sample code updated for amdsmi_get_metrics_table
Change-Id: Ie03c556f5c38fe4a0365743d3a94220e3aa62b23


[ROCm/amdsmi commit: 9f3a6dbd29]
2024-01-16 11:41:22 -06:00
Bill(Shuzhou) Liu 28f354796d Use the same mutex as rocm-smi
Share the same mutex as rocm-smi implementation. Handle the crash
when a user is not in render group.

Change-Id: I486b26569f9b523b41bbdaf95d51f4a730978cfd


[ROCm/amdsmi commit: 5a6b5d2a0a]
2024-01-15 13:12:49 -05:00
Charis Poag 601a254f37 Fix GPU metric tests & cleanup test output
- CLI: Added average_power to display if current_power is empty
- CLI: fixed PCIe current_speed not displaying GT/s
- ROCm API: 1.3 & 1.4
    -> commented out setting avg clocks to current clock value
       (leave as max uint value, not re-assign; these are not same values)
    -> commented out setting current_socket_power = average_power
       (leave as max uint value, not re-assign; these are not same values)
    -> For all non-array clocks, placed value in first
       array[0] to keep outputs consistent
       (helps xcd calc)
- ROCm API: rsmi_dev_metrics_curr_gfxclk_get fixed to count
  XCDs using backwards compatible rsmi_dev_gpu_metrics_info_get.
- ^ Fixes XCD count overall + assigning clock[0] in 1.3 to curr
  freq
- AMD SMI API: amdsmi_get_gpu_metrics_info() initialized all new
  1.5 metric values for all lower metric tables
- AMD SMI API: wrapper -> fix is here + returns correct AMD SMI return
- AMD SMI API: wrapper -> now displays amdsmi return status as
  string in logs
- gpu_metrics_read.cc -> now has better overview of backwards
  compatible output
- gpu_metrics_read.cc -> Cleaned up output, added units, and
  display all array output

Signed-off-by: Charis Poag <Charis.Poag@amd.com>
Change-Id: Id5b60ded5b0ed2cdf0f96ca72c79e356f0410960


[ROCm/amdsmi commit: 5ff5af0b5a]
2023-12-19 14:18:15 -05:00
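
Note on commit 601a254f37: the message says unsupported metric fields are left at the maximum unsigned value rather than being re-assigned from a different metric. A minimal sketch of how a consumer might interpret such a field, assuming an all-ones value of the field's width means "not available". The helper name `metric_or_none` and the `bits` parameter are hypothetical and are not part of the ROCm SMI or AMD SMI API.

```python
from typing import Optional


def metric_or_none(raw: int, bits: int) -> Optional[int]:
    """Return None when raw equals the all-ones sentinel for its width.

    Assumption: an unsupported metric is reported as the maximum unsigned
    value of its field width (0xFFFF for a 16-bit field, and so on).
    """
    sentinel = (1 << bits) - 1
    return None if raw == sentinel else raw


# A 16-bit clock field that the firmware did not populate.
average_gfxclk = metric_or_none(0xFFFF, bits=16)
# A populated 16-bit field is passed through unchanged.
current_gfxclk = metric_or_none(1700, bits=16)
```

Keeping the sentinel visible, instead of copying the current value into the average field, lets a caller tell a missing reading from a real one.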
Naveen Krishna Chatradhi 4bd015f945 amd-smi: fix cpu specific apis and header
1. provide prototype and documentation for esmi specific api.
   define structures and update classes as required
2. update cmake files as required and add esmi api to the
   amdsmi esmi integration example.

Change-Id: I753ec176f9b381e74c9646525dfd9075237bf8d9


[ROCm/amdsmi commit: 65eed73f4d]
2023-12-18 06:28:15 -05:00
Charis Poag 4f502e5dab Add vcn and jpeg activity
Changes:
    - Add new engine field vcn_activity (from 1.4/1.5
      gpu_metrics)
    - Updated log output to enhance view of gpu_metric
      data as json pretty print
    - Added new fields provided in 1.5
    - Added unit overview in python API, CLI is WIP

Signed-off-by: Charis Poag <Charis.Poag@amd.com>
Change-Id: I7d9f29e7ecc35dcd0697814c222cdd02b0d5518e


[ROCm/amdsmi commit: 8f3861e1d9]
2023-12-15 22:18:46 -05:00
Bill(Shuzhou) Liu 9dc60e00cb Support max_num_cu_shared and num_cache_instance
Add the above fields to cache info. Remove driver_date from the CLI and
remove the disable properties of cache.

Change-Id: I80672490908d9e32a149076cc37459fa56b8b0bf


[ROCm/amdsmi commit: 59b510de2b]
2023-12-14 09:59:35 -05:00
Bill(Shuzhou) Liu 985ddbc5d5 Collect compute partition devices under the same socket
The socket represents a physical device, and the partition devices
should belong to the socket. The partition devices are only
different in function id in BDF. Use the BD part of the BDF to
identify a socket.

Change-Id: I5d355a6f5db02faa7555b760a36c7351b8d8d835


[ROCm/amdsmi commit: de7e74f7db]
2023-11-29 08:23:23 -06:00
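
Note on commit 985ddbc5d5: the message describes grouping partition devices under one socket by comparing the bus and device parts of the BDF and ignoring the function id. A minimal sketch of that idea, assuming BDF strings in the usual `domain:bus:device.function` text form. The function names are hypothetical, and keeping the PCI domain in the key is an assumption the commit message does not confirm.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def socket_key(bdf: str) -> str:
    """Drop the function id, e.g. '0000:0c:00.1' -> '0000:0c:00'."""
    return bdf.rsplit(".", 1)[0]


def group_by_socket(bdfs: Iterable[str]) -> Dict[str, List[str]]:
    """Group devices whose BDFs differ only in the function id."""
    sockets: Dict[str, List[str]] = defaultdict(list)
    for bdf in bdfs:
        sockets[socket_key(bdf)].append(bdf)
    return dict(sockets)


devices = ["0000:0c:00.0", "0000:0c:00.1", "0000:22:00.0"]
sockets = group_by_socket(devices)
```

Here the two functions on `0000:0c:00` land in the same group, while `0000:22:00.0` forms a group of its own.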
Maisam Arif a8138bfd5e Change xgmi_physical_id to oam_id
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: I35fb36ec0e9f72a7135d8bb9070dbdc0e956b93a


[ROCm/amdsmi commit: b54086a037]
2023-11-22 12:16:38 -06:00
Maisam Arif 09f4046345 Refactor gpu_metrics usage in CLI
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: I599878971ab94a768d008f046f2d303ad76fdb3b


[ROCm/amdsmi commit: 5b36b438b7]
2023-11-22 03:32:55 -06:00
Maisam Arif ff96f50145 Refactor gpu_metrics usage in libraries
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: I763638d4b546bf49b234e823df81028c357e8f49


[ROCm/amdsmi commit: d790ebc62b]
2023-11-22 03:32:15 -06:00
Bill(Shuzhou) Liu c7f9cff2cb Add APIs for PM table and register table
Read the PM table and register table as the name value pair.

Change-Id: Ie44fe67a28af3341bd6beb90d809e90f280351ac


[ROCm/amdsmi commit: ac1ba33371]
2023-11-20 12:31:18 -05:00
Maisam Arif 37a41c3bc8 SWDEV-426130 - Updated firmware subcommand output
Corrected truncation
	corrected xgmi to ta_xgmi
	remapped smc (system management controller) to pm (power management)

Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: I404cefa7b90a454d4f4b08f6490448b47cf32107


[ROCm/amdsmi commit: 545e57d3e3]
2023-11-14 11:56:43 -05:00
Deepak Mewar 591221eee6 modified local esmi functions called from amdsmi_init
for gtest compatibility

Change-Id: I627c9887a1f1e340c358f060818a1a7d74ce33f9


[ROCm/amdsmi commit: 0c790752ac]
2023-11-10 15:50:42 -05:00
Maisam Arif 0a20cc33ab Updated License Dates
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: Id6fd66b03c602232ecc1a063a534a15fe3a03f56


[ROCm/amdsmi commit: 5dba2f3120]
2023-11-07 03:57:08 -05:00
Bill(Shuzhou) Liu e05f594cba Support cache type in cache info
Add the cache type to the cache info.

Change-Id: Ic13ca9640b65d2b414eeebe7b884530f2036aac8


[ROCm/amdsmi commit: 56b246cc3c]
2023-11-02 04:53:38 -05:00
Deepak Mewar a09863f550 Esmi Auxiliary API wrappers removed from amdsmi library
that are called during amdsmi initialization
    amdsmi_get_cpu_family,
    amdsmi_get_cpu_model,
    amdsmi_get_cpu_threads_per_core,
    amdsmi_get_number_of_cpu_cores,
    amdsmi_get_number_of_cpu_sockets

Added amdsmi_get_cpucore_info to amdsmi library

Change-Id: Ib88d580e1d85afdf578963247e585cfae05c58ad


[ROCm/amdsmi commit: 28f6383639]
2023-10-30 20:59:21 -04:00
Maisam Arif b14d1ca543 SWDEV-410051 - Updates to board_info struct & CLI
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: I8735d8965140ee5da0c35106b388af1dca87ec71


[ROCm/amdsmi commit: 2b4637ff9f]
2023-10-27 16:52:56 -05:00
Maisam Arif 0bddd17717 Updated READMEs & Versioning for 6.0 Release
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: Idadece3c1022ecba4291b96ddbe23112e27394de


[ROCm/amdsmi commit: 5018a57b62]
2023-10-16 16:57:49 -05:00
Maisam Arif e77abc0a1d Added memory & compute partitions to amd-smi lib
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: If3acea6ad281298f1f05785b2e6d8e70fae8d89b


[ROCm/amdsmi commit: 1f8d9cb9ef]
2023-10-13 21:47:59 -04:00
Deepak Mewar fad0af9ba2 esmi: remove energy reporting, fix errors from clang compiler
Clang compiler reporting errors while generating python wrappers for esmi lib

Change-Id: I62352aba3b87f9a6b044c97af6b9fd649612b622


[ROCm/amdsmi commit: ee890c5060]
2023-10-13 14:45:25 -04:00
Bill(Shuzhou) Liu b9073f2bf7 Add new API for RAS related information
The API to get the EEPROM version and ECC schema.

Change-Id: Iee6b3c555541a33bf16bf9ac1fd60100dfff5643


[ROCm/amdsmi commit: d92d4e4b38]
2023-10-13 02:06:14 -04:00
Galantsev, Dmitrii e3ee60fc5e Merge rocmsmi/amd-staging into amd-dev 20231010
Change-Id: I492562094a004eb78b2cc2b52d14d013d9f97112
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/amdsmi commit: 6d72d65c48]
2023-10-11 18:58:12 -05:00
Galantsev, Dmitrii 4e46b9ebf1 Fix amdsmi.h and update wrapper
Having an unnamed struct confuses our wrapper generator.
Adding a name solved it.

Change-Id: Iab3e73317fb21fb3667beef04878d4f3da96eadf
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/amdsmi commit: 1b606acf73]
2023-10-10 17:58:25 -05:00
Bill(Shuzhou) Liu dc0d637136 Add support to XGMI physical id
Get XGMI physical id from sysfs.

Change-Id: Ifd9e431bc2fbfd759d888a71b99046a5eb07b6ed


[ROCm/amdsmi commit: 6ca95c1a2d]
2023-10-10 09:29:05 -04:00
Charis Poag d54164d733 Add rsmi_dev_power_get
* Updates:
  - [API] Added rsmi_dev_power_get(uint32_t dv_ind,
                                   uint64_t *power,
                                   RSMI_POWER_TYPE *type),
    which provides a generic getter for average or
    current power and preserves backwards
    compatibility
  - Added a utility function to get MonitorTypes
    (monitor_type_string(type)) &
    RSMI_POWER_TYPE (power_type_string(type))
    strings
  - [Tests] Added rsmi_dev_power_get tests and
    provided better verification of return values for
    all power APIs
  - [Tests] Updated power outputs to show correct
    units
  - [example] Now uses avg, current, and generic
    power functions with type output response

Change-Id: I5ca06ca37fd5f61e100f2835b664d6cdd1ca42e6
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/amdsmi commit: 31a1fcce7d]
2023-10-10 00:34:19 -05:00
Deepak Mewar fe29a848bc added metric table wrapper APIS & test code
Change-Id: I24207b3c32d7294337140a1f5108b81f3bf33580


[ROCm/amdsmi commit: 192fb538be]
2023-10-10 00:03:11 -04:00
Oliveira, Daniel 52f3e90525 rocm_smi_lib: Fix Modernize and refactor gpu_metrics
Adds support for 'gpu_metrics_v1_4' and new counters

Code changes related to the following:
  * rsmi gpu_metrics APIs
  * rsmi gpu_metrics Logs
  * The new gpu_metrics are now part of the Device

Build changes related to the following: None

Change-Id: Ie748e977cd0a01c6a2fb82260014c0699605dbb3
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>


[ROCm/amdsmi commit: 4e4ebde640]
2023-10-09 21:43:22 -05:00
Charis Poag 5d15251762 Rename NPS -> memory partition + compute partition node fix
* Updates:
        - rocm_smi_lib + CLI:
          Rename all "NPS mode" -> "memory partition"
          related files/functions/API/CLI to align with correct
          technical naming
        - rocm_smi_main: fixed identifying primary card's unique id
          utilize rsmi_dev_unique_id_get to map which
          KFD nodes belong to it
        - rsmi_dev_*_partition*: now have better logging output
        - compute partition tests:
          Added 20 sec delay for workaround until GPU
          busy is confirmed as the issue
        - CPPLint fixes/formatting
        - [Example] Moved all endl to "\n" for efficiency
        - [Example] Added Edge & Junction temperature examples
        - [Example] Added rsmi_minmax_bandwidth_get() example - WIP

Change-Id: Ida6db6fda7e0ac9d696a34cb15b4746e69d58d51
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/amdsmi commit: b251bb0c9f]
2023-10-06 11:51:09 -04:00
Bill(Shuzhou) Liu c1a7a09f30 APIs for the cache level and size
Read the cache level and size from the topology sysfs file.

Change-Id: Id3c558c95bcb79139a19e4adbaa7ff333d06098f


[ROCm/amdsmi commit: 1a233f93fb]
2023-10-05 11:10:54 -05:00
Maisam Arif 401d3f229c Added driver_name to amdsmi_cli tool
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: I8f3d52e0b23298443b2b16afec418cbbbc5f77e0


[ROCm/amdsmi commit: 572bf563d1]
2023-10-04 08:54:19 -04:00
Maisam Arif 76d025cff0 SWDEV-410230 - Added slot_type to amd-smi static --bus
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: I2006a3525a8aa9091bf54501461d364f7237f00f


[ROCm/amdsmi commit: fadf1b6cc9]
2023-10-02 10:15:34 -04:00
Bill(Shuzhou) Liu 90c9c8de4e Get PCIe slot type
Add API to get the PCIe slot type.

Change-Id: If6894af53894c524d61c7586c59768541bbf0ac6


[ROCm/amdsmi commit: 9eccf20f0c]
2023-09-27 23:31:09 -04:00
Maisam Arif fb0440d493 Added sleep state to amd-smi metric --clock
Change-Id: Idb5fbc84a787ef1affdf0449b6dd77ab6e50e91d
Signed-off-by: Maisam Arif <maisarif@amd.com>


[ROCm/amdsmi commit: 95337c88fc]
2023-09-26 15:21:25 -05:00
Galantsev, Dmitrii 07e65d05d4 SWDEV-423796 - Resolve stack smashing issue
Inconsistency between struct fields caused stack smashing

Change-Id: Ib06d67723e062d4306420854ba7ab45fb252ffe3
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/amdsmi commit: 21dcf6d66c]
2023-09-25 11:24:55 -05:00
Galantsev, Dmitrii 49553cf896 Merge remote-tracking branch 'rocmsmi/amd-staging' into HEAD
Change-Id: I0661926c10eef2bc32b83d9a63a3a6eb6991e781


[ROCm/amdsmi commit: 31cc2eecfb]
2023-09-25 04:35:53 -05:00
Maisam Arif d0656df4ca Updated tool & lib versions & README.md
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: Ic41a36bcfa988ce9c8304157593012752857e919


[ROCm/amdsmi commit: 25b055014d]
2023-09-25 02:02:22 -05:00
Charis Poag 6e81bbcf16 Add Current (Instant) Socket Power
* Updates:
    - rocm_smi_logger:
      General cleanup &
      Aligned to cpplint rules for usage
    - rocm_smi_monitor:
      Fixed MonitorTypes
      from not displaying properly in logs
      & Added socket power label + current
      socket power MonitorTypes
    - rocm_smi API:
      Added rsmi_dev_current_socket_power_get API
    - rocm_smi CLI:
      General cleanup,
      Concise info now displays device data
      in variable width (see printLogSpacer's
      new field),
      printLogSpacer now has an adjustable
      variable that overrides appWidth,
      Added Socket Power to base rocm-smi +
      --showpower CLI calls,
      --showpower & base rocm-smi CLI defaults
      to printing socket power (if not available,
      displays average power)
    - Cleaned up temp label references
    - power_read gtests:
      Added current socket power to testing

Change-Id: Ica57e6f98ad96e2584e7c7955e188f68d2dab89d
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/amdsmi commit: f078375350]
2023-09-25 01:38:54 -04:00
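
Note on commit 6e81bbcf16: the message says `--showpower` and the base `rocm-smi` output default to the current socket power and fall back to average power when it is not available. A minimal sketch of that selection order. The function `pick_power` and its labels are hypothetical and do not reproduce the rocm-smi CLI code; `None` stands in for "not available".

```python
from typing import Optional, Tuple


def pick_power(current_w: Optional[float],
               average_w: Optional[float]) -> Tuple[str, Optional[float]]:
    """Prefer current socket power, then average power, else report nothing."""
    if current_w is not None:
        return ("current", current_w)
    if average_w is not None:
        return ("average", average_w)
    return ("unavailable", None)


# Device that exposes only average power.
label, watts = pick_power(None, 95.0)
```

Returning the label alongside the value keeps the two readings distinct in the output, so an average is never shown as if it were an instantaneous reading.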
Galantsev, Dmitrii e9addd72cc SWDEV-422836 - Add sleep frequency support
Change-Id: I0bde403b010bf036ce44ed0600cc7eb03742c6b6
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/amdsmi commit: 3d40c4bb2c]
2023-09-25 01:38:27 -04:00
Ori Messinger 0d1ac5edac ROCm SMI LIB: Add Missing Firmware Blocks
The purpose of this patch is to add the following missing firmware
blocks to the SMI LIB:
-RSMI_FW_BLOCK_MES
-RSMI_FW_BLOCK_MES_KIQ

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I5d4d37d883878dd02ef8533d4eb8891d54d70630


[ROCm/amdsmi commit: d44a6ef523]
2023-09-25 01:37:38 -04:00
Galantsev, Dmitrii 49bd046e6e actvity -> activity
Change-Id: Ie31d9faca2181cb2d47f7f4764b64ed8cc7f8007
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/amdsmi commit: 2589d677b0]
2023-09-22 11:45:21 -05:00