Commit graph

337 commits

Author SHA1 Message Date
Deepak Mewar 34ccbb5d1b Updated amdsmi header for ESMI doxygen formatting
Referencing https://github.com/ROCm/amdsmi/pull/10

Change-Id: I516e3643130db8a4213aee7dfcaca27363e3171e
Signed-off-by: Maisam Arif <maisarif@amd.com>
2024-02-14 02:03:05 -06:00
Oliveira, Daniel 78074d7d77 fix: [rocm/amd_smi_lib] amdsmi_get_gpu_activity gfx/memory activity does not update
Checks and forces rereading gpu metrics unconditionally

Code changes related to the following:
  * Device::dev_log_gpu_metrics()
  * amdsmi_get_gpu_metrics_header_info()
    Removed unintentionally during work on 'header cleanup Remove non-unified headers'
  * Examples
  * Unit tests

Change-Id: I83710e173c0f7102d0b7f865c18474c979a95cd8
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2024-02-13 10:15:17 -06:00
Maisam Arif f831cf49f7 Renamed amdsmi_get_metrics_table to amdsmi_get_cpu_metrics_table
Renamed structs to be more consistent with what they are calling

Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: I6f2be2fcb76f004aa592f0dad8545565700ccd4b
2024-02-12 16:30:18 -06:00
Bill(Shuzhou) Liu 86d025daaa Add @platform doxygen alias
The @platform alias describes on which platform (for example,
gpu_baremetal and/or host) an API can be used.

get_platform.py is a tool to compare APIs across platforms.

Change-Id: I902bc4fea048269eace6e9f3f4a8e93f3adb7f87
2024-02-07 07:28:38 -05:00
Deepak Mewar 6f7273fda5 Added amdsmi cpu family & cpu model
- Updated header and source files
- Updated python interface
- Generated python wrapper for updated header
- Updated the CLI to have cpu family & cpu model
  as part of metric table

Change-Id: Iea440251797270d5d29ffe883b0ad6db790be658
2024-02-06 18:46:27 -05:00
Maisam Arif 88192d8b6b SWDEV-436533 - Cache Info Struct Update
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: Ic640fa657cdcc32d7b00ff78fc9452ec7e05dd07
2024-02-05 16:51:04 -05:00
Maisam Arif 59d885a9ca Fixed gpu_metric and cache cli checks
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: Ic71e2b50dfa8fc106a17079842a7564a8e24b69d
2024-02-01 05:47:18 -05:00
Oliveira, Daniel 55734d2d7a fix: [rocm/amd_smi_lib] header cleanup Remove non-unified headers
Cleans up individual gpu metric APIs which will be implemented according to 'unified-headers' standards

Code changes related to the following:
  * '_get_gpu_metrics_' APIs
  * Functional tests

Change-Id: I2dd2ecde11c1d77e343e0ae0e10aeb9120ae9b99
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2024-01-26 10:38:48 -05:00
Charis Poag 34bd26c68e Fix metric type error output + re-align with ROCm SMI metrics
Changes:
* [CLI] Provide fix for "/opt/rocm/bin/amd-smi metric
TypeError: '>' not supported between instances of 'str' and 'i"
--> Python API was updated, CLI needed to reflect these changes
* [API] Updated amdsmi.h's with ROCm SMI
--> Incorrectly added mem_bandwidth_acc & mem_max_bandwidth
--> Realigned wrapper with updates
* [Test] Added metrics not shown in gpu_metrics_read.cc

Change-Id: Ia3a172377fd5a582254dd5a46d81dbec7e763cd9
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2024-01-24 21:23:40 -06:00
Bill(Shuzhou) Liu 0b67c2ccc4 Unified API
amdsmi_get_link_metrics() and amdsmi_get_pcie_info()

Change-Id: Iea060e449813b842236243b772e8809497ce98fe
2024-01-24 18:27:20 -05:00
Maisam Arif c400a22d4d 24.2.0 Version update
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: Ied7c24d63ca38c2e5ea5eca6b411e0156f61a403
2024-01-24 11:13:02 -06:00
Maisam Arif c48c989bbc 24.1.0 Version update
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: Ibfe92d199b10dc48ece85dfdeda1041f5ea98626
2024-01-24 12:09:48 -05:00
Deepak Mewar 5d0b479661 amdsmi library updated for esmi error status mapping to amdsmi
Change-Id: I7e4dd146a1a9af496556efcf811b2e1ed565b09e
2024-01-16 11:41:22 -06:00
Deepak Mewar a0c95e855b amdsmi library updated for metric table structure
Change-Id: Ie8a9840a9020282599dd413e964d86bfb8850f6a
2024-01-16 11:41:22 -06:00
Deepak Mewar 9f3a6dbd29 amdsmi library and sample code updated for amdsmi_get_metrics_table
Change-Id: Ie03c556f5c38fe4a0365743d3a94220e3aa62b23
2024-01-16 11:41:22 -06:00
Bill(Shuzhou) Liu 5a6b5d2a0a Use the same mutex as rocm-smi
Share the same mutex as the rocm-smi implementation. Handle the crash
when a user is not in the render group.

Change-Id: I486b26569f9b523b41bbdaf95d51f4a730978cfd
2024-01-15 13:12:49 -05:00
Charis Poag 5ff5af0b5a Fix GPU metric tests & cleanup test output
- CLI: Added average_power to display if current_power is empty
- CLI: Fixed PCIe current_speed not displaying GT/s
- ROCm API: 1.3 & 1.4
  -> Commented out setting avg clocks to current clock value
     (leave as max uint value, do not re-assign; these are not the same values)
  -> Commented out setting current_socket_power = average_power
     (leave as max uint value, do not re-assign; these are not the same values)
  -> For all non-array clocks, placed value in first array[0]
     to keep outputs consistent (helps xcd calc)
- ROCm API: rsmi_dev_metrics_curr_gfxclk_get fixed to count
  XCDs using backwards compatible rsmi_dev_gpu_metrics_info_get.
- ^ Fixes XCD count overall + assigning clock[0] in 1.3 to curr
  freq
- AMD SMI API: amdsmi_get_gpu_metrics_info() initialized all new
  1.5 metric values for all lower metric tables
- AMD SMI API: wrapper -> fix is here + returns correct AMD SMI return
- AMD SMI API: wrapper -> now displays amdsmi return status as
  string in logs
- gpu_metrics_read.cc -> now has better overview of backwards
  compatible output
- gpu_metrics_read.cc -> Cleaned up output, added units, and
  display all array output

Signed-off-by: Charis Poag <Charis.Poag@amd.com>
Change-Id: Id5b60ded5b0ed2cdf0f96ca72c79e356f0410960
2023-12-19 14:18:15 -05:00
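The backwards-compatibility idea in this commit (place a non-array clock value in array[0], leave unset entries at the max-uint value rather than re-assigning them, then count XCDs) can be sketched in Python. This is a hypothetical illustration, not the rocm_smi_lib code: the helper names `normalize_gfxclk` and `count_xcds`, the 16-bit sentinel `0xFFFF`, and the eight-entry array length are all assumptions.

```python
# Hypothetical sketch: the sentinel value and array length below are
# assumptions for illustration, not values taken from this commit.
UINT16_MAX = 0xFFFF   # assumed "not populated" marker for 16-bit clock fields
MAX_GFX_CLKS = 8      # assumed length of the per-XCD clock array


def normalize_gfxclk(current_gfxclk):
    """Return a per-XCD clock list.

    A scalar (older, non-array metric tables) is placed in slot [0] and the
    remaining slots are left at the sentinel instead of being re-assigned.
    """
    if isinstance(current_gfxclk, int):
        clocks = [UINT16_MAX] * MAX_GFX_CLKS
        clocks[0] = current_gfxclk
        return clocks
    return list(current_gfxclk)


def count_xcds(clocks):
    """Count XCDs as the number of slots holding a real (non-sentinel) value."""
    return sum(1 for clk in clocks if clk != UINT16_MAX)


if __name__ == "__main__":
    legacy = normalize_gfxclk(1700)  # scalar value, as in an older table
    newer = normalize_gfxclk([1800, 1800, 1795, 1800] + [UINT16_MAX] * 4)
    print(count_xcds(legacy), count_xcds(newer))  # 1 4
```

Keeping unset slots at the sentinel is what lets the count stay correct: copying one clock into every slot would make an older table look like a multi-XCD part.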
Naveen Krishna Chatradhi 65eed73f4d amd-smi: fix cpu specific apis and header
1. provide prototype and documentation for esmi specific api.
   define structures and update classes as required
2. update cmake files as required and add esmi api to the
   amdsmi esmi integration example.

Change-Id: I753ec176f9b381e74c9646525dfd9075237bf8d9
2023-12-18 06:28:15 -05:00
Charis Poag 8f3861e1d9 Add vcn and jpeg activity
Changes:
    - Add new engine field vcn_activity (from 1.4/1.5
      gpu_metrics)
    - Updated log output to enhance view of gpu_metric
      data as json pretty print
    - Added new fields provided in 1.5
    - Added unit overview in python API, CLI is WIP

Signed-off-by: Charis Poag <Charis.Poag@amd.com>
Change-Id: I7d9f29e7ecc35dcd0697814c222cdd02b0d5518e
2023-12-15 22:18:46 -05:00
Bill(Shuzhou) Liu 59b510de2b Support max_num_cu_shared and num_cache_instance
Add above fields for cache info. Remove driver_date in CLI and
remove the disable properties of cache.

Change-Id: I80672490908d9e32a149076cc37459fa56b8b0bf
2023-12-14 09:59:35 -05:00
Bill(Shuzhou) Liu de7e74f7db Collect compute partition devices under the same socket
The socket represents a physical device, and the partition devices
should belong to the socket. The partition devices differ only in
the function ID of the BDF, so use the BD (bus and device) part of
the BDF to identify a socket.

Change-Id: I5d355a6f5db02faa7555b760a36c7351b8d8d835
2023-11-29 08:23:23 -06:00
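The socket grouping described in this commit can be sketched in Python. This is a hypothetical helper (`group_by_socket`), not the amdsmi implementation; it assumes BDF strings in the Linux `domain:bus:device.function` form (for example `0000:c1:00.1`) and, as an added assumption, keeps the PCI domain in the key alongside bus and device.

```python
from collections import defaultdict


def group_by_socket(bdfs):
    """Group partition devices into sockets by dropping the PCI function number.

    Hypothetical helper: assumes BDF strings like "0000:c1:00.1"
    (domain:bus:device.function). Everything before the final "." is
    used as the socket key.
    """
    sockets = defaultdict(list)
    for bdf in bdfs:
        socket_key, _, function = bdf.rpartition(".")
        if not socket_key or not function:
            raise ValueError(f"malformed BDF: {bdf!r}")
        sockets[socket_key].append(bdf)
    return dict(sockets)


if __name__ == "__main__":
    devices = ["0000:c1:00.0", "0000:c1:00.1", "0000:c1:00.2", "0000:c2:00.0"]
    for socket, parts in group_by_socket(devices).items():
        print(socket, parts)
```

Dropping only the function number places every compute partition of one physical GPU under the same key, while a GPU on a different bus gets its own socket.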
Maisam Arif b54086a037 Change xgmi_physical_id to oam_id
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: I35fb36ec0e9f72a7135d8bb9070dbdc0e956b93a
2023-11-22 12:16:38 -06:00
Maisam Arif 5b36b438b7 Refactor gpu_metrics usage in CLI
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: I599878971ab94a768d008f046f2d303ad76fdb3b
2023-11-22 03:32:55 -06:00
Maisam Arif d790ebc62b Refactor gpu_metrics usage in libraries
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: I763638d4b546bf49b234e823df81028c357e8f49
2023-11-22 03:32:15 -06:00
Bill(Shuzhou) Liu ac1ba33371 Add APIs for PM table and register table
Read the PM table and register table as name-value pairs.

Change-Id: Ie44fe67a28af3341bd6beb90d809e90f280351ac
2023-11-20 12:31:18 -05:00
Maisam Arif 545e57d3e3 SWDEV-426130 - Updated firmware subcommand output
Corrected truncation
	corrected xgmi to ta_xgmi
	remapped smc (system management controller) to pm (power management)

Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: I404cefa7b90a454d4f4b08f6490448b47cf32107
2023-11-14 11:56:43 -05:00
Deepak Mewar 0c790752ac modified local esmi functions called from amdsmi_init
for gtest compatibility

Change-Id: I627c9887a1f1e340c358f060818a1a7d74ce33f9
2023-11-10 15:50:42 -05:00
Maisam Arif 5dba2f3120 Updated License Dates
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: Id6fd66b03c602232ecc1a063a534a15fe3a03f56
2023-11-07 03:57:08 -05:00
Bill(Shuzhou) Liu 56b246cc3c Support cache type in cache info
Add the cache type to the cache info.

Change-Id: Ic13ca9640b65d2b414eeebe7b884530f2036aac8
2023-11-02 04:53:38 -05:00
Deepak Mewar 28f6383639 Esmi Auxillary API wrappers removed from amdsmi library
that are called during amdsmi inititalization
    amdsmi_get_cpu_family,
    amdsmi_get_cpu_model,
    amdsmi_get_cpu_threads_per_core,
    amdsmi_get_number_of_cpu_cores,
    amdsmi_get_number_of_cpu_sockets

Added amdsmi_get_cpucore_info to amdsmi library

Change-Id: Ib88d580e1d85afdf578963247e585cfae05c58ad
2023-10-30 20:59:21 -04:00
Maisam Arif 2b4637ff9f SWDEV-410051 - Updates to board_info struct & CLI
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: I8735d8965140ee5da0c35106b388af1dca87ec71
2023-10-27 16:52:56 -05:00
Maisam Arif 5018a57b62 Updated READMEs & Versioning for 6.0 Release
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: Idadece3c1022ecba4291b96ddbe23112e27394de
2023-10-16 16:57:49 -05:00
Maisam Arif 1f8d9cb9ef Added memory & compute partitions to amd-smi lib
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: If3acea6ad281298f1f05785b2e6d8e70fae8d89b
2023-10-13 21:47:59 -04:00
Deepak Mewar ee890c5060 esmi: remove energy reporting, fix errors from clang compiler
The Clang compiler reported errors while generating Python wrappers for the esmi lib

Change-Id: I62352aba3b87f9a6b044c97af6b9fd649612b622
2023-10-13 14:45:25 -04:00
Bill(Shuzhou) Liu d92d4e4b38 Add new API for RAS related information
The API to get the EEPROM version and ECC schema.

Change-Id: Iee6b3c555541a33bf16bf9ac1fd60100dfff5643
2023-10-13 02:06:14 -04:00
Galantsev, Dmitrii 6d72d65c48 Merge rocmsmi/amd-staging into amd-dev 20231010
Change-Id: I492562094a004eb78b2cc2b52d14d013d9f97112
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-10-11 18:58:12 -05:00
Galantsev, Dmitrii 1b606acf73 Fix amdsmi.h and update wrapper
Having an unnamed struct confuses our wrapper generator.
Adding a name solved it.

Change-Id: Iab3e73317fb21fb3667beef04878d4f3da96eadf
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-10-10 17:58:25 -05:00
Bill(Shuzhou) Liu 6ca95c1a2d Add support to XGMI physical id
Get XGMI physical id from sysfs.

Change-Id: Ifd9e431bc2fbfd759d888a71b99046a5eb07b6ed
2023-10-10 09:29:05 -04:00
Charis Poag 31a1fcce7d Add rsmi_dev_power_get
* Updates:
  - [API] Added rsmi_dev_power_get(uint32_t dv_ind,
                                   uint64_t *power,
                                   RSMI_POWER_TYPE
                                   *type)
          provides generic get to average or
          current power & provides backwards
          compatibility
  - Added a utility function to get MonitorTypes
    (monitor_type_string(type)) &
    RSMI_POWER_TYPE (power_type_string(type))
    strings
  - [Tests] Added rsmi_dev_power_get tests and
    provided better verification of return values for
    all power APIs
  - [Tests] Updated power outputs to show correct
    units
  - [example] Now uses avg, current, and generic
    power functions with type output response

Change-Id: I5ca06ca37fd5f61e100f2835b664d6cdd1ca42e6
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2023-10-10 00:34:19 -05:00
Deepak Mewar 192fb538be added metric table wrapper APIs & test code
Change-Id: I24207b3c32d7294337140a1f5108b81f3bf33580
2023-10-10 00:03:11 -04:00
Oliveira, Daniel 4e4ebde640 rocm_smi_lib: Fix Modernize and refactor gpu_metrics
Adds support for 'gpu_metrics_v1_4' and new counters

Code changes related to the following:
  * rsmi gpu_metrics APIs
  * rsmi gpu_metrics Logs
  * The new gpu_metrics are now part of the Device

Build changes related to the following: None

Change-Id: Ie748e977cd0a01c6a2fb82260014c0699605dbb3
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2023-10-09 21:43:22 -05:00
Charis Poag b251bb0c9f Rename NPS -> memory partition + compute partition node fix
* Updates:
        - rocm_smi_lib + CLI:
          Rename all "NPS mode" -> "memory partition"
          related files/functions/API/CLI to align with correct
          technical naming
        - rocm_smi_main: fixed identifying primary card's unique id
          utilize rsmi_dev_unique_id_get to map which
          KFD nodes belong to it
        - rsmi_dev_*_partition*: now have better logging output
        - compute partition tests:
          Added 20 sec delay for workaround until GPU
          busy is confirmed as the issue
        - CPPLint fixes/formatting
        - [Example] Moved all endl to "\n" for efficiency
        - [Example] Added Edge & Junction temperature examples
        - [Example] Added rsmi_minmax_bandwidth_get() example - WIP

Change-Id: Ida6db6fda7e0ac9d696a34cb15b4746e69d58d51
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2023-10-06 11:51:09 -04:00
Bill(Shuzhou) Liu 1a233f93fb APIs for the cache level and size
Read the cache level and size from the topology sysfs file.

Change-Id: Id3c558c95bcb79139a19e4adbaa7ff333d06098f
2023-10-05 11:10:54 -05:00
Maisam Arif 572bf563d1 Added driver_name to amdsmi_cli tool
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: I8f3d52e0b23298443b2b16afec418cbbbc5f77e0
2023-10-04 08:54:19 -04:00
Maisam Arif fadf1b6cc9 SWDEV-410230 - Added slot_type to amd-smi static --bus
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: I2006a3525a8aa9091bf54501461d364f7237f00f
2023-10-02 10:15:34 -04:00
Bill(Shuzhou) Liu 9eccf20f0c Get PCIe slot type
Add API to get the PCIe slot type.

Change-Id: If6894af53894c524d61c7586c59768541bbf0ac6
2023-09-27 23:31:09 -04:00
Maisam Arif 95337c88fc Added sleep state to amd-smi metric --clock
Change-Id: Idb5fbc84a787ef1affdf0449b6dd77ab6e50e91d
Signed-off-by: Maisam Arif <maisarif@amd.com>
2023-09-26 15:21:25 -05:00
Galantsev, Dmitrii 21dcf6d66c SWDEV-423796 - Resolve stack smashing issue
Inconsistency between struct fields caused stack smashing

Change-Id: Ib06d67723e062d4306420854ba7ab45fb252ffe3
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-09-25 11:24:55 -05:00
Galantsev, Dmitrii 31cc2eecfb Merge remote-tracking branch 'rocmsmi/amd-staging' into HEAD
Change-Id: I0661926c10eef2bc32b83d9a63a3a6eb6991e781
2023-09-25 04:35:53 -05:00
Maisam Arif 25b055014d Updated tool & lib versions & README.md
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: Ic41a36bcfa988ce9c8304157593012752857e919
2023-09-25 02:02:22 -05:00