Commit Graph

201 Commits

Author SHA1 Message Date
Galantsev, Dmitrii 8234acd12b Azure - Switch to amd-staging branch
Change-Id: If37b4cd804e0ea50ea4031118b83090263fd39f6
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 35357d85d0]
2024-07-23 17:08:32 -05:00
Tom Rix 27f35431ea Fix build with BUILD_STANDALONE=OFF
When rdc is configured with this option
-DBUILD_STANDALONE=OFF

CMake fails with this error:
CMake Error at rdc_libs/CMakeLists.txt:106 (export):
  export given target "rdc_client" which is not built by this project.

Resolve this by using a conditional.

Change-Id: I3f6bb2946c609c7db9fc38015b7d9c8ae766f3a0
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 6762a6dd8b]
2024-07-08 12:49:09 -05:00
Galantsev, Dmitrii 970cc3e72a Update CHANGELOG.md and README.md for ROCm 6.2
Change-Id: If062cb23290469beef0b04a146c485602377be5d
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: bd9901324c]
2024-06-26 17:40:59 -05:00
Galantsev, Dmitrii 9a2806ac95 SWDEV-452795 - Disable RAS plugin, fix XGMI
The RAS plugin loaded rocm-smi, which conflicts with the amd-smi library.

The main source of grief was the map 'devInfoTypesStrings', which is
defined in both rocm-smi and amd-smi.

We assume that rocm-smi gets lazy-loaded by the RAS library and
overwrites symbols defined in amd-smi. devInfoTypesStrings in rocm-smi
contains a different number of elements, and the enums are also different.
RDC relies on amd-smi's enums.

One such enum is kDevGpuMetrics:
  rocm-smi: kDevGpuMetrics = 68
  amd-smi:  kDevGpuMetrics = 75

Example of overlapping map definitions:

  $ objdump --dynamic-syms /opt/rocm/lib/libamd_smi.so | grep devInfoTypesStrings
  00000000003c4980 g    DO .data.rel.ro 0000000000000008  Base        devInfoTypesStrings
  00000000003db830 g    DO .bss         0000000000000030  Base        _ZN3amd3smi6Device19devInfoTypesStringsE
  $ objdump --dynamic-syms /opt/rocm/lib/librocm_smi64.so  | grep devInfoTypesStrings
  00000000003dc590 g    DO .bss         0000000000000030  Base        _ZN3amd3smi6Device19devInfoTypesStringsE
  00000000003c9c68 g    DO .data.rel.ro 0000000000000008  Base        devInfoTypesStrings
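
The enum mismatch described in this message can be modelled in a few lines. This is a toy sketch, not RDC or SMI code: the two enum classes, the table contents, and the `lookup` helper are hypothetical stand-ins, and only the values 68 and 75 come from the message. It shows why a caller built against amd-smi's numbering finds nothing when the symbol resolves to a table built with rocm-smi's numbering.

```python
from enum import IntEnum


# Hypothetical stand-ins for the two libraries' enums. Only the two
# kDevGpuMetrics values (68 and 75) are taken from the commit message.
class RocmSmiDevInfo(IntEnum):
    kDevGpuMetrics = 68


class AmdSmiDevInfo(IntEnum):
    kDevGpuMetrics = 75


# Toy devInfoTypesStrings-style tables, one per library.
# The string value is illustrative only.
ROCM_SMI_TABLE = {int(RocmSmiDevInfo.kDevGpuMetrics): "gpu_metrics"}
AMD_SMI_TABLE = {int(AmdSmiDevInfo.kDevGpuMetrics): "gpu_metrics"}


def lookup(table, key):
    """Return the entry for key, or None if the table has no such index."""
    return table.get(int(key))
```

With these definitions, `lookup(AMD_SMI_TABLE, AmdSmiDevInfo.kDevGpuMetrics)` finds the entry, while the same key against `ROCM_SMI_TABLE` returns `None`, which is the kind of silent miss a clobbered symbol would produce.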

Change-Id: Ib2f2db32b6abd7ebe84e7807c25581461eb86bae
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: d85657e5f2]
2024-06-26 03:42:07 -05:00
Galantsev, Dmitrii 3132f91d38 SWDEV-468423 - Install authentication scripts
Change-Id: I4289fa546bf44861c18f71e156c84a4f7dd4a2ed
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: a885944d97]
2024-06-18 17:20:12 -05:00
Galantsev, Dmitrii b50c64b868 Use correct rocprofiler metrics
Change-Id: I26603de7425abb6588f770ed68c22e14d6d20d56
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: d4bb33d100]
2024-06-11 11:15:18 -05:00
Galantsev, Dmitrii 73948f95e2 Rewrite rocprofiler plugin
Change-Id: Ic7dd967cc60cacd2b16a465180505ea2a342fccf
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 3514225b83]
2024-06-11 03:11:15 -05:00
Galantsev, Dmitrii 29b86095ed Fix rocprofiler plugin
- Replace non-working fields with working ones
    - remove CU_OCCUPANCY completely as it isn't well supported
- Fix rocprofiler initialization with shared_ptr and rdc_module_init
- Replace env var ROCPROFILER_METRICS_PATH with ROCP_METRICS
    - ROCPROFILER_METRICS_PATH is only relevant for rocprofv2
    - ROCP_METRICS is only relevant for rocprofv1 (which we are using)

Change-Id: I21e6fa3f0e1694c38f44ca0e5659d672559f7380
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 20ca2ce574]
2024-06-06 01:51:39 -05:00
Galantsev, Dmitrii c2a75bbe4c Finalize the rocprofiler fields
Change-Id: I4ed1c4309f21bdcc7281d911663036caf5947182
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 07c414af5e]
2024-06-04 19:49:06 -05:00
Galantsev, Dmitrii f73e123900 Add GPU indexing and fix check for fields in rocprof
- Fix RUNPATH for tests

Change-Id: I79517592b49d27080a010a2e41e5878adf24a157
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: e11afbf60f]
2024-06-04 12:56:22 -05:00
Maisam Arif d9adf280cd Updated RDC to use AMD-SMI 24.6.0 structs
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I9ef0f3cb786c1238e53cf21df5c6afafac829175


[ROCm/rdc commit: 7c6bd4dc1c]
2024-05-31 10:37:39 -05:00
Galantsev, Dmitrii a80dfd4f00 Add memory bandwidth metrics
Change-Id: I310ca8af0536497be619d2bda1e540d1f11c2565
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 53033a5b77]
2024-05-17 14:55:01 -05:00
Galantsev, Dmitrii cff2ac8490 Add rocprofiler_example.cc and fix logging
Change-Id: Ib3ed8754f314edc76ea56bfec9a645d720f8926d
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: c7fcb1ad25]
2024-05-17 14:55:01 -05:00
Galantsev, Dmitrii 3e4bd50edd Azure - Add rocm-ci.yml
Change-Id: I44bf12857f894363fd8acc7d831c4cfa29fb77d9
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 7f4dd08c69]
2024-05-15 12:53:10 -05:00
Galantsev, Dmitrii 83cf97e280 Profiler - Add all required metrics
Change-Id: Iea3938df9407789c061c3a6ead9167a69069d6e6
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: c3a4c899d5]
2024-05-09 23:24:02 -05:00
randyh62 41c946a4f8 link updates, spelling
Change-Id: I71aafc2a0145d139c5c9ca6cb53214c77d88acc5
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 383c0b19e8]
2024-05-08 18:15:38 -05:00
randyh62 56f6f3ca19 leo update
Change-Id: I34cb1cdadc1a99d0d226441f1a6b180cb8b4b258
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: eeb59ed080]
2024-05-08 18:15:28 -05:00
randyh62 3076eec4d7 doc reorganization
Change-Id: I526e5e594032299d85d995a7e6fe2d269c3621aa
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 2b815a68c8]
2024-05-08 18:15:06 -05:00
Galantsev, Dmitrii 8b317a6490 Add rocprofiler plugin
Rename ROCR -> Runtime and ROCP -> Profiler

Change-Id: If90953da8fa5d695b681813dad4a3e7ec26a9c7e
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 234b2d835b]
2024-05-07 04:39:39 -05:00
Galantsev, Dmitrii 0ba8f5cf12 Fix runpath for rdci and rdcd
Change-Id: Ic131e9a5abfdf26f2b8e78799fe0e3450171d20d
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 796435c568]
2024-05-07 04:39:39 -05:00
Galantsev, Dmitrii 24f30a6ee3 Error if power metric inaccessible
Change-Id: I359c24f24d0200181646d5a7c13a6e0e4d4958b6
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 1f5fa94132]
2024-05-07 04:39:39 -05:00
Galantsev, Dmitrii 93b990ffa0 AMDSMI - Add ring hang event
Change-Id: I84696e3cc1a4eba8de48e464f1a208ed9c6e489d
Depends-On: I2e73ba08ee0004f6f30660b2fa425ea94bafceca
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 5525bf8c86]
2024-05-03 16:45:42 -05:00
Bill(Shuzhou) Liu 79897be094 Add new XGMI and PCIE bandwidth fields from gpu_metrics
On new ASICs, RDC_EVNT_XGMI, RDC_FI_PCIE_RX and RDC_FI_PCIE_TX
are not supported. The new fields RDC_FI_XGMI and RDC_FI_PCIE_BANDWIDTH
should be used instead.

Change-Id: Iff5bbef4c07994090fa7c4e9b319966215525283


[ROCm/rdc commit: 61a75d346b]
2024-05-03 16:18:17 -04:00
Brandon Bagwell a459fe4150 Adds the ability to modify 'rdc' options
Modifying the /opt/rocm/etc/rdc file modifies RDC launch options.  If
the file doesn't exist, the service should still launch (though a new
file should likely be included with the next released package of 'rdc').
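
The "launch even if the file is missing" behaviour can be sketched as below. This is only an illustration under assumptions: the message does not say how the service reads the file or what format it uses, so the hypothetical `read_launch_options` helper treats the content as opaque text and returns an empty string when the file is absent.

```python
from pathlib import Path


def read_launch_options(path):
    """Return the options text stored at path, or "" if the file is missing.

    Hypothetical helper: the real service may read the file differently.
    """
    options_file = Path(path)
    if not options_file.is_file():
        # A missing file is not an error: launch with no extra options.
        return ""
    return options_file.read_text().strip()
```

A caller would pass `/opt/rocm/etc/rdc` and append whatever comes back to the launch command, so an empty result simply means default options.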

Change-Id: I1a1891e9c5c3e6048754eb555779a97a170754c0


[ROCm/rdc commit: de3cb36ce0]
2024-04-30 10:28:16 -05:00
Galantsev, Dmitrii f74f1684de Update kBlockNameMap
Change-Id: I096f40f2b953fad7081d4b9bc05c0291c0f8058d
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: cb87eeeae7]
2024-04-24 23:50:55 -05:00
Galantsev, Dmitrii b517730e57 CMAKE - Use ADDRESS_SANITIZER env var
Change-Id: I4727120de2f9d7bded8c24033c252ede718831fc
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 0c8827c4b7]
2024-04-24 23:04:25 -05:00
Galantsev, Dmitrii 028355dff0 SWDEV-439576 - rocmsmi -> amdsmi
- Migrate to amdsmi library
- NOTE: raslib still uses rocmsmi
- Remove unused rocmsmi service
- Remove unused RDC client code
- Remove RSMI calls from protos/rdc.proto

Change-Id: Ifc34a264c506b0ec5792307ee56b34526268762d
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 9702d0f2d7]
2024-04-09 20:19:28 -05:00
Galantsev, Dmitrii d1b8e1b484 git-blame - Ignore formatting commit
There are several ways to ignore the formatting commit:

1. Configure local project:
    git config --local blame.ignoreRevsFile .git-blame-ignore-revs

2. Run blame with an argument:
    --ignore-revs-file .git-blame-ignore-revs
example:
    git blame --ignore-revs-file .git-blame-ignore-revs rdci/src/rdci.cc

Change-Id: Ic6eaa740850d9f1462d841361480307646e46b5e
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 60467c45af]
2024-04-09 20:10:47 -05:00
Ranjith Ramakrishnan 916d40d5bf Remove hard coded ROCm path in rdc.service
The executable rdcd was using an absolute path in rdc.service. Using update-alternatives gives the flexibility to invoke the binary from anywhere and no absolute path is required.

Change-Id: I2f3d6fcbf9dd854870cfc2e00532c504ce6cd6fc


[ROCm/rdc commit: 0ca6d6fa59]
2024-04-09 10:27:19 -05:00
Galantsev, Dmitrii c314326da0 Revert "Sort the ROCr gpu index based on BDF"
Fix 'rdcd diag' compute and system tests.
This reverts commit 4acaddc32d.

Change-Id: Ia092c46649c1d6338fb96ffe7e6feba4b045f027


[ROCm/rdc commit: 662cc0f8b2]
2024-04-09 10:27:19 -05:00
Galantsev, Dmitrii 43cfd2f014 GIT - Sync dependabot settings with amdsmi
Change-Id: I9442355fa0b4a7858c4c9232631a044789166601
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: d1400df06c]
2024-04-04 17:02:05 -05:00
Galantsev, Dmitrii 53ecc0fc81 Remove -X from .hsaco files
Change-Id: I1f1b4f07eb854ce2e254564b83719be52b553b02
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 9d55c26247]
2024-03-27 20:35:08 -05:00
Galantsev, Dmitrii ce94a3df37 Update CHANGELOG.md for ROCm 6.1
Change-Id: I50fd82a14f26f0f23f3c3931e242fddf46c5bd62
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 534d00e31f]
2024-03-20 10:16:16 -05:00
Galantsev, Dmitrii 006f6b5fc7 Fix links and add certificate gen guide
Change-Id: Ieece04baade54ee3a7cde968aa08077e0d0d8391
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 67578106c4]
2024-03-19 14:41:16 -05:00
Ranjith Ramakrishnan 285cafc0df Start rdc.service after installing the rdc package
rdc.service was being started in the preinstall scripts, but it should be started only after the rdc package is installed.
Move that step to the postinstall scripts.

Change-Id: I9a8c733beea43f95474b990a35a431db287b9a8e


[ROCm/rdc commit: b09eede016]
2024-03-12 13:30:27 -07:00
Galantsev, Dmitrii f35772f2e2 Add .github/CONTRIBUTING.md
Change-Id: I7aa7381d973520a515d0539f4915ce67342a3a34
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: ba88baef9c]
2024-03-08 16:19:47 -06:00
David Galiffi 5cde62bc0e Add Doc team to CODEOWNERS file
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
Change-Id: Iad8eea0645b63bddb835ed22080facc7d25c1bc0


[ROCm/rdc commit: b34eafe45a]
2024-03-06 17:58:36 -06:00
Galantsev, Dmitrii dd257bfcac CMAKE - Find hsa-runtime64
Change-Id: Id877eb9cfcc61d81993a6a43703ef2e5f72e1e8f
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 6d5d9971c2]
2024-02-19 23:49:38 -05:00
Galantsev, Dmitrii 3c18db8861 SWDEV-444700 - CMAKE - Fix RUNPATH
These RUNPATH changes make it so libraries can be found without setting
LD_LIBRARY_PATH.

Mostly tested on installed RDC binaries and libraries. The
build binaries should also work.

Change-Id: Ifd908a5b61d24dfcbb1d08d21b4ee830156d8643
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 32806681ca]
2024-02-13 16:56:28 -06:00
Galantsev, Dmitrii 39d6f482b8 Remove unsupported rocprofiler metrics
Change-Id: If6cfbcbe018227c591733471ab203fc6675d50af
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 81e3a78b1f]
2024-02-09 15:18:54 -06:00
Galantsev, Dmitrii 9db00be1c1 README - Fix URLs and add lychee config
Use Lychee[1] to check dead links

[1] - https://github.com/lycheeverse/lychee

Change-Id: I0e8aade7879748dbcb4700a527bcae5a2c29ecb5
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 2c27473d6f]
2024-02-08 17:06:02 -06:00
Galantsev, Dmitrii d4308e5175 Upgrade gRPC v1.59.1 -> v1.61.0
Change-Id: I8a3f13dd8f264e28474bd65e92ac53f87ab7db3f
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Depends-On: Icbb7b4a580894d78d8ef992befa26ce20fcf3309


[ROCm/rdc commit: f13a1fbea8]
2024-02-06 19:39:50 -06:00
Galantsev, Dmitrii 185245cafa CMAKE: Reduce install messages size
Change-Id: I6fa7cfe986b1de702492a96bddbfd406501bba50
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: aa5448fc16]
2024-02-06 00:31:32 -06:00
Bill(Shuzhou) Liu d1efa59fe8 Fallback to junction temperature and socket power
If the card does not have edge temperature, fall back to junction
temperature. If the card only has socket power, then use socket
power instead.
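
The fallback rule can be sketched as follows. This is a minimal illustration, not the RDC implementation: the function names are hypothetical, a missing reading is modelled as `None`, and the message does not name the power reading that socket power replaces, so it is called `default_power` here as an assumption.

```python
from typing import Optional


def pick_temperature(edge: Optional[float],
                     junction: Optional[float]) -> Optional[float]:
    """Prefer the edge reading; fall back to junction if edge is missing."""
    return edge if edge is not None else junction


def pick_power(default_power: Optional[float],
               socket_power: Optional[float]) -> Optional[float]:
    """Prefer the default reading; fall back to socket power if it is missing."""
    return default_power if default_power is not None else socket_power
```

Using an explicit `is not None` check rather than truthiness keeps a legitimate reading of `0.0` from triggering the fallback.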

Change-Id: I053a67a89cf3b29a34e82123f522c08d7dd68916


[ROCm/rdc commit: 5cfe2b4169]
2024-02-05 10:10:26 -06:00
Galantsev, Dmitrii 80d3711aca Add __pycache__ to .gitignore
Change-Id: I815cf3cdb644978d959b80136ac7e95da3d2ca8d
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: adf0d7094f]
2024-01-19 09:32:35 -06:00
Galantsev, Dmitrii 4f32e14513 Rebuild librdc_ras.so
- Make librdc_ras.so executable

Change-Id: I715ef1d828fe4d0ecf63b8272ffeccbab280f9dc
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 70ada65079]
2024-01-17 15:19:14 -06:00
Galantsev, Dmitrii 703d6c0d44 Use templates for module population
Also add stddef.h workaround for old GCC.
RHEL-8 still uses GCC 8.5 and templates are not well supported.

Change-Id: Ia4dae23892ec63682ea848c46ba81de85cf6d209
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: f9e80cc37a]
2024-01-10 00:27:09 -06:00
Galantsev, Dmitrii 38c60ff90b RVS: Finish initial RVS integration
NOTE: RVS Build is disabled by default due to CI build issues.

Change-Id: I1593f0fe22075a9f86f54afa3ac151e109f1f7bd
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: eaa1862a80]
2024-01-10 00:27:04 -06:00
Galantsev, Dmitrii ea624cbb7c LINT: Add cpplint, clang-format and pre-commit support
Change-Id: I3cbb787ef27d90486b212dfb1a8c77c460acc2ac
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 434e40305d]
2024-01-09 11:37:11 -06:00
Galantsev, Dmitrii 61cf14d7cc Simplify ModuleMgr
Change-Id: I3a57876c73e50771fcedb7ca4c67d55ac406b34d
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/rdc commit: 95e057c88d]
2024-01-09 11:37:11 -06:00