Commit graph

1307 commits

Author SHA1 Message Date
Joe Narlo 73f909cd8b SWDEV-495316 [AMDSMI] In amdsmi.h, change typedef amdsmi_accelerator_partition_profile_t to match definition in Confluence
Move memory_caps definition and correct the number in reserved to match Confluence

Signed-off-by: Joe Narlo <Joseph.Narlo@amd.com>
Change-Id: Id94144f4b3d2d3d7b4d7327211ffc1957ffd0a93


[ROCm/amdsmi commit: 54462ab447]
2024-10-31 12:48:48 -04:00
adapryor a33cdd7da6 [SWDEV-446215] Update cmake to put test libs in proper lib dir
Change-Id: I2e91b904b3f869cdba717d872c10d799d0260c30
Signed-off-by: adapryor <Adam.pryor@amd.com>


[ROCm/amdsmi commit: 6e01df00ca]
2024-10-29 16:07:58 -04:00
Charis Poag 6e0b0792ab [SWDEV-463406] Update sample rate + align metric output
Changes:
- Corrected the fastest rate at which users can sample from FW/driver
  to once every 100 ms
- Added a warning to the amdsmi_get_violation_status() call about the
  100 ms delay required to sample
- Removed guest support; this API will not be supported there
- Updated CLI `amd-smi metric --throttle` outputs from
    XXX_active -> XXX_status
    XXX_percent -> XXX_activity
  to align with host
- Changelog updated

Change-Id: Ib30dd35dcc04ff67904ca82c86a55a16689df226
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/amdsmi commit: 0ceca28f41]
2024-10-23 17:36:35 -04:00
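Scripts that parse `amd-smi metric --throttle` output may need to follow the key rename described in commit 6e0b0792ab (`XXX_active` to `XXX_status`, `XXX_percent` to `XXX_activity`). The sketch below shows one way to map old keys to new ones. The function name and the sample keys are hypothetical and are not taken from the CLI; only the suffix mapping comes from the commit message.

```python
def rename_throttle_keys(metrics: dict) -> dict:
    """Return a copy of `metrics` with old throttle suffixes mapped to new ones.

    Only the suffix mapping (_active -> _status, _percent -> _activity)
    comes from the commit message; everything else is illustrative.
    """
    renamed = {}
    for key, value in metrics.items():
        if key.endswith("_active"):
            key = key[: -len("_active")] + "_status"
        elif key.endswith("_percent"):
            key = key[: -len("_percent")] + "_activity"
        renamed[key] = value
    return renamed


# Hypothetical sample; real key names depend on the amd-smi version.
old_output = {"xxx_active": "ACTIVE", "xxx_percent": 12, "unrelated": 1}
new_output = rename_throttle_keys(old_output)
```

Keys without either suffix pass through unchanged, so the helper can be applied to a whole metrics dictionary.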
gabrpham 118ce35c67 SWDEV-478748 Changed TestPciReadWrite Test Failure message to Warning
TEST FAILURE message for `amdsmi_get_gpu_pci_throughput` and
`amdsmi_get_gpu_pci_bandwidth` changed to WARNING to indicate that
pcie_bw and/or pp_dpm_pcie sysfs files may not be supported on respective
devices.

Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: I1ad6e15eceacb5a00b022458ee5fb21df9d845c7


[ROCm/amdsmi commit: 00b3184e9f]
2024-10-18 16:32:57 -05:00
gabrpham 072e67c9c3 [SWDEV-490187] reset gpu partition were removed
The reset gpu partition support for both compute and memory was removed

Code changes related to the following:
  * amdsmi_reset_gpu_compute_partition()
  * amdsmi_reset_gpu_memory_partition()
  * CLI

Change-Id: I372589074b4da172bedd39223edde18939e373ae
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>


[ROCm/amdsmi commit: f5b7761ac7]
2024-10-18 16:22:26 -05:00
Justin Williams fcb033f780 [SWDEV-482058 / SWDEV-482971] Added setup.py install
Signed-off-by: Justin Williams <Justin.Williams@amd.com>
Change-Id: Ibad07d34dfb455043ce307fe036289f1d5c20a9a


[ROCm/amdsmi commit: 2e5b164c43]
2024-10-18 16:59:13 -04:00
Oliveira, Daniel d108c5ae2b [SWDEV-488526] BI-Direction Table mismatch
Implements DiscoverIOLinkPerNodeDirection() based on KFD Node infrastructure;
'/kfd/topology/nodes/*/io_links'

Code changes related to the following:
  * Internal implementation

Change-Id: Iccd84d1d69234dbeae4d4925f657e7e3bd801106
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>


[ROCm/amdsmi commit: 25bcf6af2a]
2024-10-17 15:27:09 -04:00
gabrpham d0ad17d9d5 [SWDEV-488846] Removed '--ecc' option from 'amd-smi monitor' when platform is VM
Change-Id: I8f5d7771cbfac3fe5f52dbccbd9f28020adb5f6f


[ROCm/amdsmi commit: 27b5a35d65]
2024-10-16 10:34:19 -04:00
gabrpham 4b461904b2 [SWDEV-486872] Removed '--ras' from static command when platform is VM
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: I0b03f168d7011428cfea3ab303865f4eaeea78ac


[ROCm/amdsmi commit: eb9116e8c2]
2024-10-16 09:29:24 -05:00
Maisam Arif 628682645b [SWDEV-491466] Fix throttle metrics CLI on VM
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I41166df4d155ec1d7d5f30b51dd9e0e02e655eb9


[ROCm/amdsmi commit: 9a0d56fea8]
2024-10-16 09:14:25 -05:00
Joe Narlo 06ef0819a3 SWDEV-487604 [AMD SMI][Unified Header] integration_test.py is failing with unified header
The script generator.py was not handling all of the anonymous and unnamed structures.
Logic was added to correct the errors seen in the script amdsmi_wrapper.py

Removed adding _ to structure definitions

Change-Id: I51958d23b3da40ec67e883e13dc74feaeaf1d58e
Signed-off-by: Joe Narlo <Joseph.Narlo@amd.com>


[ROCm/amdsmi commit: b5887c2f05]
2024-10-15 16:22:02 -04:00
Khader Basha Shaik 806d1a25c3 amdsmi [CPU]: Add implementation to get cpu handles and core handles API
- Update the API names, parameters to return cpu handles and core
handles in the system.
  - Update the amdsmi_wrapper.py.
  - Update the amdsmi_interface.py to use the processor handles and
    core handles API.

Change-Id: Ie24f62f345864f8b6773fdb3c6369993bca7e25b


[ROCm/amdsmi commit: 8308ede9e8]
2024-10-14 05:41:19 -04:00
Jeremy Newton 5c89fc37aa goamdsmi: Use CMAKE_INSTALL_LIBDIR
Match libamd_smi and don't hardcode to "lib", so distros can customize
the library location.

Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
Change-Id: I0d2ff761975529fc06776c75cefea6907ec1ee8f


[ROCm/amdsmi commit: dd8795b099]
2024-10-10 15:12:35 -04:00
Maisam Arif 5e3d644769 Corrected clean local data partition indexing
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: Ib0eeb065f160fccd3c3f4a2d13f0869af01a74ae


[ROCm/amdsmi commit: 27a48e69d8]
2024-10-10 10:54:45 -05:00
Maisam Arif 0368ce662d [SWDEV-447451] Fix attribute error for set/reset on Linux Guest
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I5d55bef44d2eea75c33ba489a57544976900c4a4


[ROCm/amdsmi commit: 4fcf281f1d]
2024-10-09 12:59:19 -05:00
Charis Poag 5278e0c290 [SWDEV-463406] Add violation_status current counter/accumulated values
Changes:
  - amdsmi_violation_status_t now includes current accumulated/counter
   values
  - Tests/wrapper now include added values
  - Removed ASIC references in header for host/bm alignment
  - Fix violation_status->per_hbm_thrm /
    violation_status->active_hbm_thrm
    calculations.

Change-Id: Ic86a7cbad5198a41018f82f6b588b83158d9ba0b
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/amdsmi commit: 5eff39915b]
2024-10-04 15:56:01 -04:00
Maisam Arif d759e8d704 Updated market name
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I71948b185b6ac60610fedf2d48dd9c95c26e5777


[ROCm/amdsmi commit: e402fe7f36]
2024-10-02 14:24:03 -05:00
Maisam Arif 014d0a4e96 [SWDEV-488819] - Backward Compatibility Disclaimer
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I8b00d2009e3d01da134ac21ddcb0994357d76a54


[ROCm/amdsmi commit: 30f6a114e1]
2024-10-01 14:57:23 -05:00
Maisam Arif ffcf02ba63 Corrected throttle status value check
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I2d75108c64c3ca3e290be1dd5b8c1435c5576f91


[ROCm/amdsmi commit: b233db729b]
2024-09-30 13:40:32 -05:00
Maisam Arif ee79530a3c Bump Version to 24.7.0.0
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: Ife9277f6abf64ed862e11e12a6472c6e6ea4d68f


[ROCm/amdsmi commit: a266d602c5]
2024-09-27 18:55:19 -05:00
Galantsev, Dmitrii 76c1321d87 CMAKE - Fix version
Change-Id: Ieefdd4c64ae657a53f1f5fd9a7fc94b3d2c899c2
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/amdsmi commit: 88ed9e2f09]
2024-09-27 18:34:16 -05:00
gabrpham 5ca4c2e976 Added amd-smi partition as preliminary command.
The new command includes the following arguments:
  - current - display the current partition information for the selected
    gpu(s)
  - memory - display memory partition information for the selected
    gpu(s)
  - accelerator - display accelerator partition information for the
    selected gpu(s)
Additional functionality will be added as more partition APIs are added.

Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: Ica86160139002ef5213d6d4b0e390670aeef01c8


[ROCm/amdsmi commit: 4e2fc2d604]
2024-09-27 17:05:04 -05:00
Maisam Arif 3e1f707eb7 Adjusted throttle unit logic in amdsmi_commands.py
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: Icce949ff93f45c9751f43df0a80614fd377318fa


[ROCm/amdsmi commit: 2c8e2060cb]
2024-09-27 13:26:58 -05:00
Charis Poag 7a35c805b0 [SWDEV-422195/SWDEV-440985] GPU metrics 1.6
Changes:
    - Added new GPU metrics:
      1) Violation status (e.g. PVIOL/TVIOL) accumulators
      2) XCP (Graphics Compute Partitions) statistics
      3) PCIe other end recovery counter
    - CLI/API/tests changes were made accordingly

Change-Id: I589b9b1f570f25dda12d95bb501feca85da8b3bb
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/amdsmi commit: 3a4abbd8c0]
2024-09-27 12:04:21 -05:00
Lang Yu 94d349573d SWDEV-463405: Add amdsmi_get_link_topology_nearest support
amdsmi_get_link_topology_nearest() is used to retrieve
the set of GPUs that are nearest to a given device
at a specific interconnectivity level.

Code changes related to the following:
    * API
    * CLI
    * Unit tests
    * Examples

Header Unification Change: "/amdsmi/+/1122408"

Change-Id: Id0317797c652c267742513936d321677793ec634
Signed-off-by: Lang Yu <lang.yu@amd.com>


[ROCm/amdsmi commit: 7a557b1c50]
2024-09-26 16:43:27 -05:00
Ranjith Ramakrishnan e7415cb9a9 Remove package provides field from RPM and DEB package
The provides tag is required when the package provides a virtual package.
Package name along with version will be provided by default and the provides tag is not required for this.

Change-Id: I6d42cd1a6e2247e33708a1fa2627897e86099815


[ROCm/amdsmi commit: f00a03ed2b]
2024-09-26 17:42:49 -04:00
Ryo Ficano 701e69686b [SWDEV-482963] [Test updates] Add new tests for p0 items - BM v2
Updates:
- Added tests for these API calls:

amdsmi_get_socket_handles
amdsmi_get_processor_type
amdsmi_get_clk_freq
amdsmi_get_gpu_process_info
amdsmi_get_gpu_ras_block_features_enabled
amdsmi_get_gpu_ecc_count
amdsmi_get_gpu_memory_usage
amdsmi_get_gpu_vendor_name
amdsmi_get_utilization_count

- Added amdsmi_init() and amdsmi_shut_down() before and after each test.
- Updated README and removed all pytest references.

Change-Id: Ida0c165a466571b1df36c413161bd95c070f6ff1
Signed-off-by: Ryo Ficano <Ryo.Ficano@amd.com>


[ROCm/amdsmi commit: 9979be8512]
2024-09-26 14:08:13 -04:00
Justin Williams 8eee7b2d1e Removed Post Install PyYAML and Pip Upgrades
Signed-off-by: Justin Williams <Justin.Williams@amd.com>
Change-Id: I25f0e8087a212fd29d33a8a40303436279789029


[ROCm/amdsmi commit: 807f1e3111]
2024-09-25 18:20:23 -05:00
muthusamy c1e6f3a1a7 amdsmi: Optimizing go shim to default pick amdsmi
Optimized the Go shim to pick amdsmi by default, plus other code cleanup in goshim.

Signed-off-by: muthusamy <muthusamy.ramalingam@amd.com>
Change-Id: I0e6a2d28404cbb751d2b6e90c793b359fec9be13


[ROCm/amdsmi commit: e037cde86b]
2024-09-25 16:30:02 -04:00
Bill(Shuzhou) Liu c20427e1f0 amdsmi cannot read power cap more than 10 characters
Extend the default read array size.

Change-Id: I2739981873cb3c360661e3ef5f6e70d4f36cb0e8


[ROCm/amdsmi commit: 69109de8d3]
2024-09-24 14:31:40 -04:00
Harkirat Gill c8da426b98 Updated error message when driver modules not loaded
Small change to add sudo modprobe amdgpu/amd_hsmp suggestion if modules are not loaded. Requested per Maisam, will close https://github.com/ROCm/amdsmi/issues/45

Change-Id: Ia7ffcc99df18296c5c682f2082ff8dd8f007d557


[ROCm/amdsmi commit: 3660724a08]
2024-09-23 23:56:05 -04:00
Maisam Arif c0ed174976 Update spacing in amdsmi.h
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I6147b8e545fdb50f3d3ef37f4df994e7cd9c3046


[ROCm/amdsmi commit: dfbd0ab8ba]
2024-09-23 22:53:13 -05:00
Maisam Arif 457619bfad [SWDEV-469278] - Lowered PyYAML dependency
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: Icfee09b84cf1071ec82b65fc2877be69e0283489


[ROCm/amdsmi commit: 09c9574454]
2024-09-20 18:03:00 -04:00
gabrpham 2aa8e8c755 Corrected partition changes in header and wrapper
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: Iafd7de8f08924873da841ee6eca62100a17b2b6c


[ROCm/amdsmi commit: 8bc4abc88b]
2024-09-20 17:01:55 -05:00
Dmitrii Galantsev f59f2caf2c Revert "[SWDEV-469278] Lowered PyYAML post install script dependency"
Revert submission 1125402

Reason for revert: Packaging a tar archive of 3rd party sources
Reverted Changes:
I8908451c0:[SWDEV-482058] Updated Packaging for offline insta...
I764c8bf01:[SWDEV-469278] Lowered PyYAML post install script ...

Change-Id: I3886b5370e352fc33a249c4657d7ed0c1ee75baf


[ROCm/amdsmi commit: 6beec5f3ec]
2024-09-20 16:42:29 -04:00
Dmitrii Galantsev 4fbcdc5862 Revert "[SWDEV-482058] Updated Packaging for offline installs"
Revert submission 1125402

Reason for revert: Packaging a tar archive of 3rd party sources
Reverted Changes:
I8908451c0:[SWDEV-482058] Updated Packaging for offline insta...
I764c8bf01:[SWDEV-469278] Lowered PyYAML post install script ...

Change-Id: Ib32fa5b9351b1cfc2a8d453e744c0d00209359eb


[ROCm/amdsmi commit: 9924574cbe]
2024-09-20 16:42:29 -04:00
muthusamy aee9386a04 amdsmi: Adding GO wrappers for amd_smi_exporter
Adding GO wrappers as part of amdsmi library, so that
amd_smi_exporter can fetch the cpu, gpu data directly from amdsmi library.

Signed-off-by: muthusamy <muthusamy.ramalingam@amd.com>
Change-Id: I8fba57c1d20d21758a1aed38ed2c00c9d5c9ecfa


[ROCm/amdsmi commit: 66c98fd722]
2024-09-20 04:08:27 -04:00
Maisam Arif 01b5b4a93c [SWDEV-456049] & [SWDEV-442181] Fix early exiting loop while enumerating GPU stats
Skip missing vram_str_path and sdma_str_path if sysfs files not created when passing some, but not all, GPUs to a docker image.

Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I83b7a62331672810688a94e4023b0ae740436e6d


[ROCm/amdsmi commit: b40b405332]
2024-09-20 03:01:22 -05:00
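The fix in commit 01b5b4a93c amounts to skipping a device whose sysfs files are missing instead of leaving the enumeration loop. The sketch below illustrates the idea against a temporary directory tree. The function names, directory layout, and the use of `mem_info_vram_total` as the file to read are assumptions for illustration; the actual change is in the library's C++ code and is not shown here.

```python
import tempfile
from pathlib import Path


def read_vram_totals(device_dirs):
    """Collect VRAM totals, skipping devices whose sysfs file is absent."""
    totals = {}
    for dev in device_dirs:
        vram_path = Path(dev) / "mem_info_vram_total"  # assumed file name
        if not vram_path.is_file():
            # Keep enumerating: a missing file on one device must not
            # end the loop for the remaining devices.
            continue
        totals[Path(dev).name] = int(vram_path.read_text().strip())
    return totals


def demo():
    # Simulate a container that was given card0 and card2 but not card1.
    with tempfile.TemporaryDirectory() as root:
        dirs = []
        for name, size in (("card0", "1024"), ("card1", None), ("card2", "2048")):
            dev = Path(root) / name
            dev.mkdir()
            if size is not None:
                (dev / "mem_info_vram_total").write_text(size + "\n")
            dirs.append(dev)
        return read_vram_totals(dirs)
```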
Maisam Arif cf183c9d10 Bump Version to 24.6.5.0
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I93d6d397bd8d647f472017c28101dabe9ff8199b


[ROCm/amdsmi commit: 6a76f8a705]
2024-09-20 02:53:45 -05:00
gabrpham 0fd0b46b7f Moved partition_id from static --asic-info to static --partition.
partition_id also removed from the `amdsmi_asic_info_t` struct and
supporting API has been added for querying partition information.

Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: Id5a6291a77d11bb97a1c7a200fc465898e86e081


[ROCm/amdsmi commit: c9a489d437]
2024-09-20 03:48:42 -04:00
Maisam Arif 82096d7f74 Moved KFD information to separate structure and API
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: If6eaea589edc704cf408d6391b5f2154134035e7


[ROCm/amdsmi commit: 3b7f661e71]
2024-09-20 03:48:42 -04:00
Maisam Arif f1eae9f051 [SWDEV-482058] Updated Packaging for offline installs
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I8908451c013fc944645b5b5df3104a2ff73e72bd


[ROCm/amdsmi commit: 2cfae06560]
2024-09-20 00:55:48 -04:00
Justin Williams 90c854ad72 [SWDEV-469278] Lowered PyYAML post install script dependency
Signed-off-by: Justin Williams <Justin.Williams@amd.com>
Change-Id: I764c8bf01e6cb6acb0b3fc1db396707099e5ed12


[ROCm/amdsmi commit: f2f02aa317]
2024-09-20 00:55:48 -04:00
Charis Poag 34ff05c5bb Fix python unittest not installing amd-smi-lib-test package install
Moving to TESTS_COMPONENT allows files to be placed
within the amd-smi-lib-test package.
Previously, they were placed within the amd-smi-lib package,
where they are never installed with the
latest changes.

Change-Id: Id49dbe69bfc7d5bd1af403c28b946fe1edf64d8e
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/amdsmi commit: ede0e6318d]
2024-09-18 19:25:48 -05:00
Charis Poag b6a68dd877 Fix amd-smi CLI calls returning TypeError
$ amd-smi version
TypeError: unsupported operand type(s) for |: 'type' and 'type'

---------------
Python 3 versions lower than 3.10
do not support `str | None`

Using typing Optional and Union, we can create equivalent logic for
`str | None`
and
`str | list | None`

Change-Id: I1f4a7ab67333914b33639dc62652881e1127411e
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/amdsmi commit: 5c778cadf1]
2024-09-18 16:59:12 -05:00
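The `TypeError` quoted in commit b6a68dd877 is raised when `str | None` is evaluated in an annotation on Python versions before 3.10. A minimal sketch of the equivalent `typing` spellings follows. The function names and parameters are hypothetical and are not taken from the amd-smi CLI source.

```python
from typing import List, Optional, Union


def format_version(tag: Optional[str] = None) -> str:
    # Optional[str] is the pre-3.10 spelling of `str | None`.
    return "unknown" if tag is None else tag.strip()


def normalize_gpus(selection: Union[str, List[str], None] = None) -> List[str]:
    # Union[str, List[str], None] stands in for `str | list | None`.
    if selection is None:
        return []
    if isinstance(selection, str):
        return [selection]
    return list(selection)
```

Both spellings describe the same types to a type checker; the `typing` forms simply avoid applying the `|` operator to classes at runtime, which older interpreters do not support.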
Harkirat Gill e73ae3d79d Fix for GitHub Issue #24: Update Event Stop Behavior
amd-smi event is failing to exit as it waits for all threads to complete before exiting. Each thread has to listen for a maximum of 10 seconds prior to exiting in the current implementation.

Lowered individual listen time for _event_thread allowing for a quicker exit while still capturing all events (Looped until escape sequence detected).

Added logging for escape character, not sure if needed but helps confirm that key press was registered.

Change-Id: I916608754798f966980a558342c7c62693252d7f


[ROCm/amdsmi commit: d263b53797]
2024-09-18 14:54:40 -04:00
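The approach described in commit e73ae3d79d, listening in short slices inside a loop so that a stop request is noticed quickly, can be sketched with the standard library. This is not the amd-smi implementation: the queue stands in for the event source, `stop.set()` stands in for the escape key press, and all names and the 0.05 s slice are assumptions.

```python
import queue
import threading
import time


def event_worker(events, stop, seen, listen_s=0.05):
    # Listen in short slices and loop, rather than blocking once for a
    # long timeout; the stop flag is checked between slices.
    while not stop.is_set():
        try:
            seen.append(events.get(timeout=listen_s))
        except queue.Empty:
            continue


def run_demo():
    events = queue.Queue()
    stop = threading.Event()
    seen = []
    worker = threading.Thread(target=event_worker, args=(events, stop, seen))
    worker.start()
    events.put("GPU_RESET")   # hypothetical event name
    time.sleep(0.2)           # give the worker time to pick up the event
    started = time.monotonic()
    stop.set()                # stands in for the escape key press
    worker.join()
    return seen, time.monotonic() - started
```

With a short slice the worker exits within roughly one slice of the stop request, while events that arrive during the loop are still captured.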
gabrpham fbb1071046 [SWDEV-448738] Added rocmsmi extremum command as 'set -L'
Change-Id: I997c630bd20cc61673813a2301eb5e3002619a32
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>

Change-Id: Ifa884303f9a0fa058af093a23f5be449bba54f29


[ROCm/amdsmi commit: b7f779182d]
2024-09-18 14:51:01 -04:00
Juan Castillo 487fd5e1fd [SWDEV-482966/ SWDEV-482967] Removing pytest dependency + install path change
Signed-off-by: Juan Castillo <juan.castillo@amd.com>
Change-Id: I7aace93fcad18d67443e6849c10a1fbbc65d0fa8
Signed-off-by: Juan Castillo <juan.castillo@amd.com>


[ROCm/amdsmi commit: ac593f9fa0]
2024-09-18 00:27:00 -04:00
gabrpham d04eadec17 Removed _validate_positive function and replaced with _positive_int or _not_negative_int as appropriate
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: I01effcdf9bae31fd8bc926c5d4bdf58274838618


[ROCm/amdsmi commit: 0d4b332fe4]
2024-09-17 18:37:16 -04:00
Maisam Arif 53130a9c3c Fixed amdsmi_get_utilization_count() wrapper generation
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: Ifd59fca042c4b3b0fc53e100b6892c6b4f7b3e95


[ROCm/amdsmi commit: 639daa3d90]
2024-09-17 16:34:42 -04:00