Commit graph

351 commits

Author SHA1 Message Date
Maisam Arif 092908daee Bump Version to 24.5.1.0
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: I842e223b78f337a39098f652fa6e7ef51948fbaf
2024-04-05 02:31:08 -05:00
Oliveira, Daniel 08e2e21bab fix: [SWDEV-442525] [rocm/amd_smi_lib]
Fixes gpu_process_list

Code changes related to the following:
  * amdsmi_get_gpu_process_list()
  * CLI
  * Examples
  * Unit tests
  * Changelog
  * Readme
  * rocm_smi_lib commit: 677433b367

Change-Id: I9210fbca7a5da92d0a8b472b72ca82597c8e4fb5
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2024-03-27 16:48:24 -05:00
Maisam Arif 51b3f8cccb SWDEV-452739 - Add CEM slot type to amd-smi
Updated CHANGELOG.md and re-added spaces after bolded lines

Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: Ic728b3e9b083c62fe4c9791b8ede991f5dacc1ca
2024-03-27 02:01:25 -04:00
Bill(Shuzhou) Liu e4085c6414 Get and set the XGMI PLPD
Update the API and CLI to support XGMI Per-Link Power Down Policy.

Change-Id: Iaf04a771eb8bb0829a5b3088d803a7355a8dfd0b
2024-03-26 01:48:14 -05:00
Oliveira, Daniel 1310c767ce fix: [SWDEV-448201] [rocm/amd_smi_lib]
Adds PCIe errors

Code changes related to the following:
  * amdsmi_get_pcie_info()
  * CLI
  * examples

Change-Id: Ie0b7053e77c88fb18309c16e74bce75d862c45a9
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2024-03-24 23:33:32 -04:00
Bill(Shuzhou) Liu 108e6d4ae6 Set and get DPM policy for GPU device
Add new APIs to set and get dpm policy for the GPU device.

Change-Id: I26fa49cd17d0ce66bda3446c38945a6cf35717ff
2024-03-12 10:32:31 -04:00
Bill(Shuzhou) Liu c489cb8f3f Add support for deferred RAS errors in API
The API will support the deferred errors

Change-Id: I221a146f09fefde1fc31e5f746d0870e07c93561
2024-03-04 22:46:44 -05:00
Maisam Arif 69caba8727 Bump Version to 24.4.0.0 & Corrected argument checks for set subcommand
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: I651f8ca652c764f30845503dd869f435f728d5ba
2024-02-23 20:47:19 -06:00
Bill(Shuzhou) Liu db33cda0c1 Unify the amdsmi_get_pcie_info python interface
Make the python interface consistent with the C interface.

Change-Id: Idda08f888947c757e475d5a024b0ec3d8e1d846a
2024-02-22 03:33:59 -05:00
Maisam Arif f58613561c Refactor ESMI Initialization and Argument Parsing
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: Iefab3a8110e0d3c525ee0cef1bdef9101550e9de
2024-02-21 19:02:14 -05:00
Deepak Mewar 84608807da Fix for multiple hsmp freq sources not reported on some setups
Change-Id: I8afe7076bd7790cf408ef104c50ac8d258b7d3fc
Signed-off-by: Maisam Arif <maisarif@amd.com>
2024-02-21 06:30:03 -06:00
Maisam Arif 703fdb0ed2 Aligned cache property enum with Host
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: Ie64a33f55c9a9a7cc8c806419509897351f37c70
2024-02-20 05:48:53 -06:00
Maisam Arif 61f8888488 24.3.0 Version update
Change-Id: I936c896117ad64d06ea919a8b7bd6ba4cc388592
Signed-off-by: Maisam Arif <maisarif@amd.com>
2024-02-15 17:21:24 -05:00
Maisam Arif 77710921a4 Align list and cache_info to Host
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: I4fa55b360b74d5a202d0b9b4eb7aee660b0a1bcf
2024-02-15 01:47:59 -05:00
Deepak Mewar 34ccbb5d1b Updated amdsmi header for ESMI doxygen formatting
Referencing https://github.com/ROCm/amdsmi/pull/10

Change-Id: I516e3643130db8a4213aee7dfcaca27363e3171e
Signed-off-by: Maisam Arif <maisarif@amd.com>
2024-02-14 02:03:05 -06:00
Oliveira, Daniel 78074d7d77 fix: [rocm/amd_smi_lib] amdsmi_get_gpu_activity gfx/memory activity does not update
Checks and forces rereading gpu metrics unconditionally

Code changes related to the following:
  * Device::dev_log_gpu_metrics()
  * amdsmi_get_gpu_metrics_header_info()
    Removed unintentionally during work on 'header cleanup Remove non-unified headers'
  * Examples
  * Unit tests

Change-Id: I83710e173c0f7102d0b7f865c18474c979a95cd8
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2024-02-13 10:15:17 -06:00
Maisam Arif f831cf49f7 Renamed amdsmi_get_metrics_table to amdsmi_get_cpu_metrics_table
Renamed structs to be more consistent with what they are calling

Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: I6f2be2fcb76f004aa592f0dad8545565700ccd4b
2024-02-12 16:30:18 -06:00
Bill(Shuzhou) Liu 86d025daaa Add @platform doxygen alias
The @platform alias describes which platforms (for example,
gpu_baremetal and/or host) an API can be used on.

get_platform.py is a tool for comparing APIs across platforms.

Change-Id: I902bc4fea048269eace6e9f3f4a8e93f3adb7f87
2024-02-07 07:28:38 -05:00
Deepak Mewar 6f7273fda5 Added amdsmi cpu family & cpu model
- Updated header and source files
- Updated python interface
- Generated python wrapper for updated header
- Updated the CLI to have cpu family & cpu model
  as part of metric table

Change-Id: Iea440251797270d5d29ffe883b0ad6db790be658
2024-02-06 18:46:27 -05:00
Maisam Arif 88192d8b6b SWDEV-436533 - Cache Info Struct Update
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: Ic640fa657cdcc32d7b00ff78fc9452ec7e05dd07
2024-02-05 16:51:04 -05:00
Maisam Arif 59d885a9ca Fixed gpu_metric and cache cli checks
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: Ic71e2b50dfa8fc106a17079842a7564a8e24b69d
2024-02-01 05:47:18 -05:00
Oliveira, Daniel 55734d2d7a fix: [rocm/amd_smi_lib] header cleanup Remove non-unified headers
Cleans up individual gpu metric APIs which will be implemented according to 'unified-headers' standards

Code changes related to the following:
  * '_get_gpu_metrics_' APIs
  * Functional tests

Change-Id: I2dd2ecde11c1d77e343e0ae0e10aeb9120ae9b99
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2024-01-26 10:38:48 -05:00
Charis Poag 34bd26c68e Fix metric type error output + re-align with ROCm SMI metrics
Changes:
* [CLI] Provide fix for "/opt/rocm/bin/amd-smi metric
TypeError: '>' not supported between instances of 'str' and 'i"
--> Python API was updated, CLI needed to reflect these changes
* [API] Updated amdsmi.h's with ROCm SMI
--> Incorrectly added mem_bandwidth_acc & mem_max_bandwidth
--> Realigned wrapper with updates
* [Test] Added metrics not shown in gpu_metrics_read.cc

Change-Id: Ia3a172377fd5a582254dd5a46d81dbec7e763cd9
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2024-01-24 21:23:40 -06:00
Bill(Shuzhou) Liu 0b67c2ccc4 Unified API
amdsmi_get_link_metrics() and amdsmi_get_pcie_info()

Change-Id: Iea060e449813b842236243b772e8809497ce98fe
2024-01-24 18:27:20 -05:00
Maisam Arif c400a22d4d 24.2.0 Version update
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: Ied7c24d63ca38c2e5ea5eca6b411e0156f61a403
2024-01-24 11:13:02 -06:00
Maisam Arif c48c989bbc 24.1.0 Version update
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: Ibfe92d199b10dc48ece85dfdeda1041f5ea98626
2024-01-24 12:09:48 -05:00
Deepak Mewar 5d0b479661 amdsmi library updated for esmi error status mapping to amdsmi
Change-Id: I7e4dd146a1a9af496556efcf811b2e1ed565b09e
2024-01-16 11:41:22 -06:00
Deepak Mewar a0c95e855b amdsmi library updated for metric table structure
Change-Id: Ie8a9840a9020282599dd413e964d86bfb8850f6a
2024-01-16 11:41:22 -06:00
Deepak Mewar 9f3a6dbd29 amdsmi library and sample code updated for amdsmi_get_metrics_table
Change-Id: Ie03c556f5c38fe4a0365743d3a94220e3aa62b23
2024-01-16 11:41:22 -06:00
Bill(Shuzhou) Liu 5a6b5d2a0a Use the same mutex as rocm-smi
Share the same mutex as the rocm-smi implementation. Handle the crash
when a user is not in the render group.

Change-Id: I486b26569f9b523b41bbdaf95d51f4a730978cfd
2024-01-15 13:12:49 -05:00
Charis Poag 5ff5af0b5a Fix GPU metric tests & cleanup test output
- CLI: Added average_power to display if current_power is empty
- CLI: fixed PCIe current_speed not displaying GT/s
- ROCm API: 1.3 & 1.4
  -> commented out setting avg clocks to current clock value
     (leave as max uint value, not re-assign; these are not same values)
  -> commented out setting current_socket_power = average_power
     (leave as max uint value, not re-assign; these are not same values)
  -> For all non-array clocks, placed value in first array[0] to keep
     outputs consistent (helps xcd calc)
- ROCm API: rsmi_dev_metrics_curr_gfxclk_get fixed to count
  XCDs using backwards compatible rsmi_dev_gpu_metrics_info_get.
- ^ Fixes XCD count overall + assigning clock[0] in 1.3 to curr
  freq
- AMD SMI API: amdsmi_get_gpu_metrics_info() initialized all new
  1.5 metric values for all lower metric tables
- AMD SMI API: wrapper -> fix is here + returns correct AMD SMI return
- AMD SMI API: wrapper -> now displays amdsmi return status as
  string in logs
- gpu_metrics_read.cc -> now has better overview of backwards
  compatible output
- gpu_metrics_read.cc -> Cleaned up output, added units, and
  display all array output

Signed-off-by: Charis Poag <Charis.Poag@amd.com>
Change-Id: Id5b60ded5b0ed2cdf0f96ca72c79e356f0410960
2023-12-19 14:18:15 -05:00
Naveen Krishna Chatradhi 65eed73f4d amd-smi: fix cpu specific apis and header
1. provide prototype and documentation for esmi specific api.
   define structures and update classes as required
2. update cmake files as required and add esmi api to the
   amdsmi esmi integration example.

Change-Id: I753ec176f9b381e74c9646525dfd9075237bf8d9
2023-12-18 06:28:15 -05:00
Charis Poag 8f3861e1d9 Add vcn and jpeg activity
Changes:
    - Add new engine field vcn_activity (from 1.4/1.5
      gpu_metrics)
    - Updated log output to enhance view of gpu_metric
      data as json pretty print
    - Added new fields provided in 1.5
    - Added unit overview in python API, CLI is WIP

Signed-off-by: Charis Poag <Charis.Poag@amd.com>
Change-Id: I7d9f29e7ecc35dcd0697814c222cdd02b0d5518e
2023-12-15 22:18:46 -05:00
Bill(Shuzhou) Liu 59b510de2b Support max_num_cu_shared and num_cache_instance
Add the above fields to cache info. Remove driver_date from the CLI and
remove the disable properties of cache.

Change-Id: I80672490908d9e32a149076cc37459fa56b8b0bf
2023-12-14 09:59:35 -05:00
Bill(Shuzhou) Liu de7e74f7db Collect compute partition devices under the same socket
The socket represents a physical device, and the partition devices
should belong to the socket. The partition devices differ only in
the function ID of their BDF. Use the BD part of the BDF to
identify a socket.

Change-Id: I5d355a6f5db02faa7555b760a36c7351b8d8d835
2023-11-29 08:23:23 -06:00
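
The socket grouping described in commit de7e74f7db can be illustrated with a minimal sketch. It assumes BDF strings in the common `domain:bus:device.function` text form (for example `0000:23:00.1`); the helper names `socket_key` and `group_by_socket` are hypothetical and are not part of the amdsmi library, whose actual implementation may differ.

```python
from collections import defaultdict


def socket_key(bdf: str) -> str:
    """Return the bus/device part of a BDF string, dropping the function ID.

    Hypothetical helper: assumes the 'domain:bus:device.function' text form.
    """
    bus_device, _sep, _function = bdf.rpartition(".")
    if not bus_device:
        raise ValueError(f"not a BDF string: {bdf!r}")
    return bus_device


def group_by_socket(bdfs):
    """Group partition devices that share the same bus/device part."""
    sockets = defaultdict(list)
    for bdf in bdfs:
        sockets[socket_key(bdf)].append(bdf)
    return dict(sockets)


if __name__ == "__main__":
    devices = ["0000:23:00.0", "0000:23:00.1", "0000:43:00.0"]
    for key, members in group_by_socket(devices).items():
        print(key, members)
```

Devices whose BDFs differ only after the final dot end up under the same key, which mirrors the idea of treating compute partitions as children of one physical socket.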
Maisam Arif b54086a037 Change xgmi_physical_id to oam_id
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: I35fb36ec0e9f72a7135d8bb9070dbdc0e956b93a
2023-11-22 12:16:38 -06:00
Maisam Arif 5b36b438b7 Refactor gpu_metrics usage in CLI
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: I599878971ab94a768d008f046f2d303ad76fdb3b
2023-11-22 03:32:55 -06:00
Maisam Arif d790ebc62b Refactor gpu_metrics usage in libraries
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: I763638d4b546bf49b234e823df81028c357e8f49
2023-11-22 03:32:15 -06:00
Bill(Shuzhou) Liu ac1ba33371 Add APIs for PM table and register table
Read the PM table and register table as name-value pairs.

Change-Id: Ie44fe67a28af3341bd6beb90d809e90f280351ac
2023-11-20 12:31:18 -05:00
Maisam Arif 545e57d3e3 SWDEV-426130 - Updated firmware subcommand output
Corrected truncation
  corrected xgmi to ta_xgmi
  remapped smc (system management controller) to pm (power management)

Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: I404cefa7b90a454d4f4b08f6490448b47cf32107
2023-11-14 11:56:43 -05:00
Deepak Mewar 0c790752ac modified local esmi functions called from amdsmi_init
for gtest compatibility

Change-Id: I627c9887a1f1e340c358f060818a1a7d74ce33f9
2023-11-10 15:50:42 -05:00
Maisam Arif 5dba2f3120 Updated License Dates
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: Id6fd66b03c602232ecc1a063a534a15fe3a03f56
2023-11-07 03:57:08 -05:00
Bill(Shuzhou) Liu 56b246cc3c Support cache type in cache info
Add the cache type to the cache info.

Change-Id: Ic13ca9640b65d2b414eeebe7b884530f2036aac8
2023-11-02 04:53:38 -05:00
Deepak Mewar 28f6383639 Esmi Auxiliary API wrappers removed from amdsmi library
that are called during amdsmi initialization
    amdsmi_get_cpu_family,
    amdsmi_get_cpu_model,
    amdsmi_get_cpu_threads_per_core,
    amdsmi_get_number_of_cpu_cores,
    amdsmi_get_number_of_cpu_sockets

Added amdsmi_get_cpucore_info to amdsmi library

Change-Id: Ib88d580e1d85afdf578963247e585cfae05c58ad
2023-10-30 20:59:21 -04:00
Maisam Arif 2b4637ff9f SWDEV-410051 - Updates to board_info struct & CLI
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: I8735d8965140ee5da0c35106b388af1dca87ec71
2023-10-27 16:52:56 -05:00
Maisam Arif 5018a57b62 Updated READMEs & Versioning for 6.0 Release
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: Idadece3c1022ecba4291b96ddbe23112e27394de
2023-10-16 16:57:49 -05:00
Maisam Arif 1f8d9cb9ef Added memory & compute partitions to amd-smi lib
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: If3acea6ad281298f1f05785b2e6d8e70fae8d89b
2023-10-13 21:47:59 -04:00
Deepak Mewar ee890c5060 esmi: remove energy reporting, fix errors from clang compiler
The Clang compiler reported errors while generating Python wrappers for the esmi lib

Change-Id: I62352aba3b87f9a6b044c97af6b9fd649612b622
2023-10-13 14:45:25 -04:00
Bill(Shuzhou) Liu d92d4e4b38 Add new API for RAS related information
Adds an API to get the EEPROM version and ECC schema.

Change-Id: Iee6b3c555541a33bf16bf9ac1fd60100dfff5643
2023-10-13 02:06:14 -04:00
Galantsev, Dmitrii 6d72d65c48 Merge rocmsmi/amd-staging into amd-dev 20231010
Change-Id: I492562094a004eb78b2cc2b52d14d013d9f97112
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2023-10-11 18:58:12 -05:00