Commit Graph

760 Commits

Author SHA1 Message Date
Su, Daniel 2ae703fed6 External CI: enable trigger for amd-mainline (#30)
[ROCm/rocm_smi_lib commit: 172707cbd3]
2025-03-26 08:24:51 -05:00
Castillo, Juan 3aa80ec0e4 SWDEV-518214: GPU Metrics 1.8 (#31)
* SWDEV-518214: GPU Metrics 1.8 (#31)

- Updates:
    - Adding the following metrics to allow new calculations for violation status:
        - Per XCP metrics gfx_below_host_limit_ppt_acc
        - Per XCP metrics gfx_below_host_limit_thm_acc
        - Per XCP metrics gfx_low_utilization_acc
        - Per XCP metrics gfx_below_host_limit_total_acc
    - Increasing available JPEG engines to 40. Current ASICs may not support all 40. These will be indicated as UINT16_MAX or N/A in CLI.

Signed-off-by: Juan Castillo <juan.castillo@amd.com>
Co-authored-by: Charis Poag <Charis.Poag@amd.com>

[ROCm/rocm_smi_lib commit: f69e65f7bd]
2025-03-20 18:07:32 -05:00
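The sentinel convention described in the GPU Metrics 1.8 entry above (unsupported JPEG engine slots reported as UINT16_MAX and shown as N/A) can be sketched as follows. This is a minimal illustration, not the library's code: the function name `format_jpeg_activity` and the sample readings are hypothetical.

```python
UINT16_MAX = 0xFFFF  # value the entry above says marks an unsupported slot


def format_jpeg_activity(values):
    """Map raw per-engine readings to display strings; the sentinel becomes 'N/A'."""
    return ["N/A" if v == UINT16_MAX else str(v) for v in values]


# Hypothetical reading: 40 slots, of which only the first 4 engines exist.
readings = [12, 0, 7, 3] + [UINT16_MAX] * 36
shown = format_jpeg_activity(readings)
```

A reading of 0 is kept as a real value; only the exact sentinel is mapped to N/A.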
Zhang, Ava 6075f89576 Merge branch 'amd-mainline' into amd-staging
[ROCm/rocm_smi_lib commit: 7327e645c6]
2025-03-17 08:58:09 +08:00
Mallya, Ameya Keshava 19a9ff813d Added release trigger for further releases
[ROCm/rocm_smi_lib commit: 8fe9882c49]
2025-03-14 14:06:21 -07:00
Arif, Maisam 9ae7a0b7b1 [SWDEV-517717] Maintenance Mode Notice (#20)
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>

[ROCm/rocm_smi_lib commit: 1416c2043d]
2025-03-09 14:23:33 -05:00
Kanangot Balakrishnan, Bindhiya 165cf24119 SWDEV-510419: Restore compute partition after memory partition test (#15)
Memory partition test was changing original compute partition based
on default compute mode. Corrected this to set back to original
compute partition.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>

[ROCm/rocm_smi_lib commit: d8de415960]
2025-03-09 14:23:33 -05:00
Charis Poag 5fc11e8325 [SWDEV-514998/SWDEV-511662] Fix tests for Guest and BM with static CPX config
Guest: Tests needed to account for not supporting changing compute
partitions.

BM: Tests need to account for invalid responses from Driver (due to
static CPX config).


[ROCm/rocm_smi_lib commit: 967493c39a]
2025-03-09 14:23:22 -05:00
Galantsev, Dmitrii 00ff814afc [SWDEV-508785] Bump version number to 7.6.0
Change-Id: I084f139802f73311f15c68f94bc98f631c7f2bd8
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 9c82706fc1]
2025-03-09 14:23:22 -05:00
Charis Poag 8e841f22ac [SWDEV-504146] Fix Device Name
Changes:
  - Fixed Device Name (market name)
  - Added new API rsmi_dev_market_name_get()
  - Updated tests
  - Updated amdgpu_drm.h to match latest mainline kernel
  - Fixed subsystem ID to only show hex value (not subsystem name)
  - rocm_smi_lib now has a recommended requirement for libdrm
Change-Id: Ic438529e16c8c3dbbdd620da664918148c40c997


[ROCm/rocm_smi_lib commit: b951a65cf2]
2025-03-09 14:23:22 -05:00
Galantsev, Dmitrii 83f16ffa06 Fix warnings on CXX/linker flags (#12)
1) When `clang` is used as system compiler, libraries were built without respecting LDFLAGS. For example, this affected LTO flags, if any (and it only affected clang, not gcc).

2) Linker flags are registered as CXX flags, which produces warnings during compilation:
```
clang++: warning: -Wl,-z,noexecstack: 'linker' input unused [-Wunused-command-line-argument]
clang++: warning: -Wl,-znoexecheap: 'linker' input unused [-Wunused-command-line-argument]
clang++: warning: -Wl,-z,relro: 'linker' input unused [-Wunused-command-line-argument]
clang++: warning: -Wl,-z,now: 'linker' input unused [-Wunused-command-line-argument]
```

3) Clang does not support `-Wtrampolines` flag:
```
warning: unknown warning option '-Wtrampolines' [-Wunknown-warning-option]
```

4) No linkers support `noexecheap` anymore. The `noexecheap` linker flag was part of the PaX patches to GNU ld, [which were dropped in 2017](https://www.gentoo.org/support/news-items/2017-08-19-hardened-sources-removal.html). Now ld/ld.lld/ld.gold don't support it, and protection of the heap is managed by the NX bit. Therefore every linker produces this warning:
```
ld.lld: warning: unknown -z value: noexecheap
```

Closes #210.

Co-authored-by: Sv. Lockal <lockalsash@gmail.com>

[ROCm/rocm_smi_lib commit: 59cbeb57d1]
2025-03-09 14:23:22 -05:00
Johar, Adel 957699f7ba Docs: Fix broken links, warnings and use automodule (#11)
- Fixes the broken links in rocm_smi.h
- Uses automodule instead of autofunction in docs/reference/python_api.rst
- Fixes some warnings during docs build
- Update some of the versions in requirements.txt

[ROCm/rocm_smi_lib commit: fc61e40506]
2025-03-09 14:23:22 -05:00
Mallya, Ameya Keshava d86308f726 Added !verify trigger
[ROCm/rocm_smi_lib commit: a938e743f0]
2025-03-09 14:23:22 -05:00
Kanangot Balakrishnan, Bindhiya 7bbbaeb020 [SWDEV-481004] Fix for incorrect gfx_version number (#8)
The target_graphics_version was not formatted properly and was
showing incorrect Target Name. Corrected this by formatting
major, minor and revision numbers.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>

[ROCm/rocm_smi_lib commit: 6337f7b05b]
2025-03-09 14:23:22 -05:00
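One plausible reading of "formatting major, minor and revision numbers" in the gfx_version entry above is sketched below. It assumes the version arrives as a single decimal-encoded integer (major * 10000 + minor * 100 + revision) and that minor and revision are rendered in hex; both the encoding and the helper name `format_gfx_version` are assumptions for illustration, not taken from the commit.

```python
def format_gfx_version(encoded: int) -> str:
    """Split an assumed decimal-encoded version into major/minor/revision
    and render it as a gfx target name (minor and revision in hex)."""
    major = encoded // 10000
    minor = (encoded // 100) % 100
    revision = encoded % 100
    return f"gfx{major}{minor:x}{revision:x}"


# Hypothetical inputs under the assumed encoding.
names = [format_gfx_version(v) for v in (90010, 90402, 110000)]
```

Under that assumption, a revision of 10 renders as `a`, which is why the fields have to be split before formatting rather than printed as one number.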
Mallya, Ameya Keshava 6af7a0590d Fixed Workflow for updated KWS structure
[ROCm/rocm_smi_lib commit: 685c057acb]
2025-03-09 14:23:22 -05:00
Mallya, Ameya Keshava 7bfad6e604 Create kws-caller.yml
[ROCm/rocm_smi_lib commit: d5cbe04ecf]
2025-03-09 14:23:22 -05:00
Galantsev, Dmitrii 89913b7899 [SWDEV-495169] Update ROCm SMI CLI and Error handling (#3)
Issues include:

Update ROCm SMI displaying None or Not Supported to N/A
Update ROCm SMI displaying err msg to instead log err

Signed-off-by: Juan Castillo <juan.castillo@amd.com>
Change-Id: I1a2ce6e4f329666b5666664a7d7b4475d6c1cbc7

[ROCm/rocm_smi_lib commit: 898ae4ffc1]
2025-03-09 14:23:22 -05:00
James Xu 0918fa7fd4 [SWDEV-501108] Update Doxygen note on rsmi_dev_pci_id_get
- To address https://github.com/ROCm/rocm_smi_lib/issues/208
where use of fake BDFs for partitions can cause confusion. This note
is already in the comments of the function definition, but was not
updated in the function declaration.
- Fix broken formatting for the location table for PCIE coordinate fields
- Tracked in SWDEV-501108

Change-Id: Ic85439866cb836bb43acc52314a7f1d026c3215d


[ROCm/rocm_smi_lib commit: fdc8d73c64]
2025-03-09 14:23:21 -05:00
Choudhary, Rahul 42cc253bff Create rocm_ci_caller.yml init file to call shared workflow
[ROCm/rocm_smi_lib commit: 78f66f81de]
2025-03-09 14:23:21 -05:00
gabrpham c20d2d30ec Fixed reset event issues
Issues include:
	SWDEV-480250
	SWDEV-480255
	SWDEV-480248

Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: Icf12211e4b136f26fce18f09a7bf8b7e9cd20691


[ROCm/rocm_smi_lib commit: 76ac0808fe]
2025-03-09 14:23:21 -05:00
Charis Poag 125bdaf4f5 [SWDEV-496693] GPU metrics 1.7
Changes:
    - Added new GPU metrics:
      1) XGMI link status - Up/Down; 1 = up; 0 = down
      2) Graphics clocks below host limit (per XCP)
         accumulators -> used to help calculate a violation status
      3) VRAM max bandwidth at max memory clock
    - Updated rocm-smi --showmetrics to include new metrics.
    Units/values reflect as indicated by driver, may differ
    from AMD SMI or other ROCm SMI interfaces which
    use these fields.
    - An N/A field means the device does not support providing this
    data.

Change-Id: I17b313345f15070a76b3a30dd8d5645d212d601b
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: 88a7e4b8ad]
2025-03-09 14:23:21 -05:00
Charis Poag 4451b18408 [SWDEV-475712] Fix MI2x target_graphics_version
Removed correcting target_graphics_version by
product name. Instead detected target_graphics_version which
needs to be corrected -> populate accordingly.

Change-Id: I90765c87e0629daea5c732dace8acfd17e8c62c7
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: d2efac3d93]
2025-03-09 14:23:21 -05:00
Poag, Charis 66d66a872d [SWDEV-514998/SWDEV-511662] Fix tests for Guest and BM with static CPX config (#26)
Guest: Tests needed to account for not supporting changing compute
partitions.

BM: Tests need to account for invalid responses from Driver (due to
static CPX config).

Change-Id: I09ccee981c6b73684b64e5053068920a6c1b6439

Signed-off-by: Charis Poag <Charis.Poag@amd.com>

[ROCm/rocm_smi_lib commit: 23e945c6b3]
2025-03-09 14:08:02 -05:00
Poag, Charis ef21f5d254 Revert "[SWDEV-514998/SWDEV-511662] Fix tests for Guest and BM with static CPX config" (#25)
* Revert - this reverts commit c03341cc02efb70c35c4e96ff4fc3e6c53f5be9d.

* Revert "[SWDEV-514998/SWDEV-511662] Fix tests for Guest and BM with static CPX config"

This reverts commit 9bd169da4801c32f7c48f83cb70f790faa0dca96.

[ROCm/rocm_smi_lib commit: 08fee73075]
2025-03-09 14:08:02 -05:00
Arif, Maisam cefb538d80 [SWDEV-517717] Maintenance Mode Notice (#20)
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>

[ROCm/rocm_smi_lib commit: d953b2ce66]
2025-03-09 14:08:02 -05:00
Kanangot Balakrishnan, Bindhiya c648438732 SWDEV-510419: Restore compute partition after memory partition test (#15)
Memory partition test was changing original compute partition based
on default compute mode. Corrected this to set back to original
compute partition.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>

[ROCm/rocm_smi_lib commit: 1d6b8d9422]
2025-03-09 14:08:02 -05:00
Charis Poag 4d47e514f3 [SWDEV-514998/SWDEV-511662] Fix tests for Guest and BM with static CPX config
Guest: Tests needed to account for not supporting changing compute
partitions.

BM: Tests need to account for invalid responses from Driver (due to
static CPX config).


[ROCm/rocm_smi_lib commit: 1dd9ca9df4]
2025-03-09 14:07:51 -05:00
Galantsev, Dmitrii bf7a2eab27 [SWDEV-508785] Bump version number to 7.6.0
Change-Id: I084f139802f73311f15c68f94bc98f631c7f2bd8
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: ceb47b0b04]
2025-02-21 14:46:56 -06:00
Charis Poag 7b867182f3 [SWDEV-504146] Fix Device Name
Changes:
  - Fixed Device Name (market name)
  - Added new API rsmi_dev_market_name_get()
  - Updated tests
  - Updated amdgpu_drm.h to match latest mainline kernel
  - Fixed subsystem ID to only show hex value (not subsystem name)
  - rocm_smi_lib now has a recommended requirement for libdrm
Change-Id: Ic438529e16c8c3dbbdd620da664918148c40c997


[ROCm/rocm_smi_lib commit: 6a5e94c451]
2025-02-19 08:49:50 -06:00
Galantsev, Dmitrii 809ef97d3c Fix warnings on CXX/linker flags (#12)
1) When `clang` is used as system compiler, libraries were built without respecting LDFLAGS. For example, this affected LTO flags, if any (and it only affected clang, not gcc).

2) Linker flags are registered as CXX flags, which produces warnings during compilation:
```
clang++: warning: -Wl,-z,noexecstack: 'linker' input unused [-Wunused-command-line-argument]
clang++: warning: -Wl,-znoexecheap: 'linker' input unused [-Wunused-command-line-argument]
clang++: warning: -Wl,-z,relro: 'linker' input unused [-Wunused-command-line-argument]
clang++: warning: -Wl,-z,now: 'linker' input unused [-Wunused-command-line-argument]
```

3) Clang does not support `-Wtrampolines` flag:
```
warning: unknown warning option '-Wtrampolines' [-Wunknown-warning-option]
```

4) No linkers support `noexecheap` anymore. The `noexecheap` linker flag was part of the PaX patches to GNU ld, [which were dropped in 2017](https://www.gentoo.org/support/news-items/2017-08-19-hardened-sources-removal.html). Now ld/ld.lld/ld.gold don't support it, and protection of the heap is managed by the NX bit. Therefore every linker produces this warning:
```
ld.lld: warning: unknown -z value: noexecheap
```

Closes #210.

Co-authored-by: Sv. Lockal <lockalsash@gmail.com>

[ROCm/rocm_smi_lib commit: 4dbc2b6d57]
2025-02-04 22:16:57 -06:00
Johar, Adel 0c358ecffe Docs: Fix broken links, warnings and use automodule (#11)
- Fixes the broken links in rocm_smi.h
- Uses automodule instead of autofunction in docs/reference/python_api.rst
- Fixes some warnings during docs build
- Update some of the versions in requirements.txt

[ROCm/rocm_smi_lib commit: e5c5a1d5b7]
2025-01-29 08:06:09 -06:00
Mallya, Ameya Keshava 0fc5d97026 Added !verify trigger
[ROCm/rocm_smi_lib commit: 2764498bbf]
2025-01-28 20:11:45 -08:00
Kanangot Balakrishnan, Bindhiya d2461a186b [SWDEV-481004] Fix for incorrect gfx_version number (#8)
The target_graphics_version was not formatted properly and was
showing incorrect Target Name. Corrected this by formatting
major, minor and revision numbers.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>

[ROCm/rocm_smi_lib commit: 94dca7073b]
2025-01-21 15:46:56 -06:00
Mallya, Ameya Keshava e2520ad547 Fixed Workflow for updated KWS structure
[ROCm/rocm_smi_lib commit: aefc865bf4]
2025-01-17 08:26:52 -08:00
Mallya, Ameya Keshava 49b96ca627 Create kws-caller.yml
[ROCm/rocm_smi_lib commit: c6ce9a5aa0]
2025-01-15 11:12:25 -08:00
Galantsev, Dmitrii bc13dfe3c8 [SWDEV-495169] Update ROCm SMI CLI and Error handling (#3)
Issues include:

Update ROCm SMI displaying None or Not Supported to N/A
Update ROCm SMI displaying err msg to instead log err

Signed-off-by: Juan Castillo <juan.castillo@amd.com>
Change-Id: I1a2ce6e4f329666b5666664a7d7b4475d6c1cbc7

[ROCm/rocm_smi_lib commit: 55ee3cc442]
2025-01-14 17:15:18 -06:00
James Xu 84400150b4 [SWDEV-501108] Update Doxygen note on rsmi_dev_pci_id_get
- To address https://github.com/ROCm/rocm_smi_lib/issues/208
where use of fake BDFs for partitions can cause confusion. This note
is already in the comments of the function definition, but was not
updated in the function declaration.
- Fix broken formatting for the location table for PCIE coordinate fields
- Tracked in SWDEV-501108

Change-Id: Ic85439866cb836bb43acc52314a7f1d026c3215d


[ROCm/rocm_smi_lib commit: 67a0de4279]
2025-01-14 15:49:55 -06:00
Choudhary, Rahul d748d8b247 Create rocm_ci_caller.yml init file to call shared workflow
[ROCm/rocm_smi_lib commit: 3c01c13dfd]
2025-01-07 11:53:58 -08:00
gabrpham a62f424b90 Fixed reset event issues
Issues include:
	SWDEV-480250
	SWDEV-480255
	SWDEV-480248

Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: Icf12211e4b136f26fce18f09a7bf8b7e9cd20691


[ROCm/rocm_smi_lib commit: 6f51cd651e]
2024-12-30 13:12:46 -05:00
Charis Poag 7b3c814501 [SWDEV-496693] GPU metrics 1.7
Changes:
    - Added new GPU metrics:
      1) XGMI link status - Up/Down; 1 = up; 0 = down
      2) Graphics clocks below host limit (per XCP)
         accumulators -> used to help calculate a violation status
      3) VRAM max bandwidth at max memory clock
    - Updated rocm-smi --showmetrics to include new metrics.
    Units/values reflect as indicated by driver, may differ
    from AMD SMI or other ROCm SMI interfaces which
    use these fields.
    - An N/A field means the device does not support providing this
    data.

Change-Id: I17b313345f15070a76b3a30dd8d5645d212d601b
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: 4de2168866]
2024-12-15 16:48:13 -05:00
Charis Poag 5d7555e586 [SWDEV-475712] Fix MI2x target_graphics_version
Removed correcting target_graphics_version by
product name. Instead detected target_graphics_version which
needs to be corrected -> populate accordingly.

Change-Id: I90765c87e0629daea5c732dace8acfd17e8c62c7
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: 8488518b1c]
2024-12-08 22:01:43 -06:00
Charis Poag 72fd1821b0 Merge amd-staging into amd-master 20241125
Change-Id: I801dcda853066d8d2e19a8727b2b07dcafc253b4
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: 562575c73d]
2024-11-25 08:39:32 -06:00
Charis Poag 06fead7e41 [SWDEV-499029] Fix unable to change memory partition modes
Changes:
  * [API] Removed checking board name, fixes for other MI ASICs
  * [CLI] Increased progress bar to change memory partition modes
    to 140 seconds, since driver reload is variable per system

Change-Id: Ifcaf40d28b4adf5eaa800c9e3748d33749dc414a
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: d04cec7f1d]
2024-11-22 20:19:29 -05:00
Zhang Ava c0c3e61cfb Merge amd-staging into amd-master 20241114
Signed-off-by: Zhang Ava <niandong.zhang@amd.com>
Change-Id: I97175c160d157e6f8ad0d94b65d2b6f2a2384949


[ROCm/rocm_smi_lib commit: c827c54093]
2024-11-15 11:38:47 +08:00
gabrpham 21d3a831d7 [SWDEV-478748] Changing PCIE Read/Write message TEST FAILURE to WARNING
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: I534a94b358f7fddbe3c11d249c6e090cf3fa121e


[ROCm/rocm_smi_lib commit: 5428d29b19]
2024-11-13 15:05:26 -06:00
Peter Park 23190fd3d9 [SWDEV-479054] update doc for rsmi_compute_process_info_get to note 2-step usage
Change-Id: I81608f807ab679a27be12be591193712d81232bd
Signed-off-by: Peter Park <peter.park@amd.com>


[ROCm/rocm_smi_lib commit: c3f1d2baf1]
2024-11-13 12:52:18 -05:00
Charis Poag ff46ab4258 Merge amd-staging into amd-master 20241112
Change-Id: I3fba6fb940aa19532037e2125fd1837de4d3f282
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: 99c1b5a0df]
2024-11-12 16:43:50 -06:00
Charis Poag 2258c26c53 [SWDEV-488276/SWDEV-497613] Update memory partition set functionality
Changes:
  - Added warning screen to ROCm SMI users
    setting memory partition
  - Added new API (rsmi_dev_memory_partition_capabilities_get)
    to retrieve memory partition capabilities
    (What users can set memory partition modes to)
  - Increased time-bar for CLI sets display to 40 seconds
  - API now waits until the driver reloads with SYSFS files active
  - [SWDEV-475712] [CLI/API] Fixed target_graphics_version field
    not properly displaying for MI2x or Navi 3x ASICs.
  - Updated tests

Change-Id: Iaf89d1b7ad9ceb449b289bc82ea198fe3b23992e
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: 46902274b6]
2024-11-12 12:18:44 -04:00
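The "waits until the driver reloads with SYSFS files active" behavior in the entry above amounts to polling with a deadline. A minimal sketch follows, assuming the caller knows which paths must reappear. The helper name `wait_for_paths`, the polling interval, and the default timeout are hypothetical and do not reflect the library's actual implementation.

```python
import os
import time


def wait_for_paths(paths, timeout_s=40.0, poll_s=0.5):
    """Poll until every path exists or the timeout elapses.

    Returns True if all paths were seen, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if all(os.path.exists(p) for p in paths):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


# The current directory always exists, so this returns immediately.
ready = wait_for_paths([os.getcwd()], timeout_s=1.0)
# A path that never appears gives up once the (zero) timeout has passed.
timed_out = wait_for_paths(["/nonexistent/hypothetical/node"], timeout_s=0.0)
```

Using a monotonic clock keeps the deadline stable even if the wall clock is adjusted while the driver reloads.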
Peter Park 8af04d3531 Update changelog fmt to internal standard
Change-Id: Icdb7eb59c6770f46ddae401f23b84cd06e6d3b09


[ROCm/rocm_smi_lib commit: 568cc6e7c7]
2024-11-08 16:20:49 -05:00
Zhang Ava 2a5e1a98bb Merge amd-staging into amd-master 20241106
Signed-off-by: Zhang Ava <niandong.zhang@amd.com>
Change-Id: Ib125dee62e5a893871f5c6df7715177973361a02


[ROCm/rocm_smi_lib commit: fa2c9180d7]
2024-11-08 08:42:13 +08:00
Jorge López d51bd18649 Updates driverInitialized() to support amdgpu built as module as well as kernel built-in. Fixes ROCm/rocm_smi_lib#102 and is an updated version of ROCm/rocm_smi_lib#104
Change-Id: Icb3abe820bc67035b822358a1c04bd09a7c22b6b
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Reviewed-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rocm_smi_lib commit: 35c1d00f5a]
2024-11-05 16:30:37 -05:00
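The module-versus-built-in distinction in the driverInitialized() entry above can be illustrated with a sysfs check. This sketch assumes that a loadable module exposes an `initstate` file reading `live` once loaded, while a built-in driver has a `/sys/module/amdgpu` directory without `initstate`. The function name and parameters are hypothetical, and the actual fix may check different files.

```python
import tempfile
from pathlib import Path


def driver_initialized(sys_module_dir="/sys/module", name="amdgpu") -> bool:
    """Return True if the driver looks initialized, as a module or built-in."""
    module_dir = Path(sys_module_dir) / name
    if not module_dir.is_dir():
        return False  # no sysfs entry at all
    initstate = module_dir / "initstate"
    if initstate.is_file():
        # Loadable module: only 'live' counts as initialized.
        return initstate.read_text().strip() == "live"
    # Directory present without initstate: assumed to be built into the kernel.
    return True


# Exercise the three cases against a throwaway directory tree.
with tempfile.TemporaryDirectory() as root:
    absent = driver_initialized(root)
    (Path(root) / "amdgpu").mkdir()
    builtin = driver_initialized(root)
    (Path(root) / "amdgpu" / "initstate").write_text("coming\n")
    loading = driver_initialized(root)
    (Path(root) / "amdgpu" / "initstate").write_text("live\n")
    loaded = driver_initialized(root)
```

Taking the sysfs root as a parameter lets the check be tested without a GPU or root access.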