- Fixes SWDEV-559349
- Fix build failure caused by correct libunwind not being found in some environments.
- Updated the `timemory` submodule to commit `24407d37ab85c46ba6c18fba9498320f825ee4e4 `.
* Use static catch2.lib instead of catch2.dll
Using catch2.dll incraeses execution time by 12x
* handle debug option for static catch2
* SWDEV-573539 - skip atomics on windows since its taking a very long time to execute
mlsejenkins needs newer cmake but compiler breaks with newer versions
so skipping on windows can be a workaround for now
---------
Co-authored-by: Joseph Macaranas <145489236+jayhawk-commits@users.noreply.github.com>
**Thread limit configuration and enforcement: **
* Added a check in `CMakeLists.txt` to ensure `ROCPROFSYS_MAX_THREADS` is at least 128, automatically setting it to 128 with a warning if a lower value is provided.
* Replaced hardcoded thread limit (`allowed_max_threads`) in `pthread_create_gotcha.cpp` with the configurable `ROCPROFSYS_MAX_THREADS` value, ensuring all runtime checks and warnings use the actual configured limit.
**Documentation improvements: **
* Updated the development guide to explain the new thread limit behavior, including how exceeding the limit is handled gracefully, how to configure it, and the build-time validation rules.
**Test updates: **
* Modified thread limit tests to use the configurable `ROCPROFSYS_MAX_THREADS` value instead of a hardcoded limit and expanded the range of tested thread values.
* Increased test timeouts to accommodate larger thread counts and ensure reliability with higher limits.
This has started failing on various developer build systems. Looking at it, it is not precisely clear how this ever worked given that nothing appears to be adding the DRM include dirs.
I'd prefer that we remove this delay loading (at least for TheRock builds where it is never needed), but in the meantime, this does fix the issue and is verified on an affected system.
Fixes https://github.com/ROCm/TheRock/issues/2744
* SWDEV-508225 - do not assert() after calling digestFatBinary() if it fails. Otherwise this causes assertions to trigger easily in systems that have an APU and a discrete GPU and the code was compiled for the discrete one
* SWDEV-508225 - fix that when using a non-existent ordinal in HIP_VISIBLE_DEVICES, getCurrentArch() would crash
fix: Add gpu_metrics 1.0 support which is still used by some hardware
Code changes related to the following:
* APIs
* Unit tests
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
* Add support for get and set APIs for CPUISOFreqPolicy and DFCState Control
- Add support for get and set APIs for CPUISOFreqPolicy and DFCState Control
in AMD SMI and also in the CLI tool
* CHANGELOG.md file updated
* SWDEV-562837: Update amdsmi-py-api.md as per the new APIs
Updated amdsmi-py-api.md as per the new APIs added.
---------
Signed-off-by: Soumya <sranjanr@amd.com>
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Co-authored-by: Saka Sitharammurthy <SitharamMurthy.Saka@amd.com>
* SWDEV-569319 Replace ScopedAcquire with stdcpp wrappers
* Remove KernelMutex and KernelSharedMutex abstractions with std::mutex and std::shared_mutex
* Replaced unique_locks with lock_guards
* More changes
* Replace new and deletes with smart pointers
* Replaced some more with shared ptrs
* Replacements with smart pointers - pt 2
* missed change
* Remove MFMA functionality in rocflop sample since its not supported in MI50
* Add gfx arc based support for MFMA and SMFMAC in rocflop.cpp
* Add --int32 usage doc
* Address review comments
* Fix buffer tracing synchronization lock
- PR #529 (in rocprofiler-sdk-internal) introduced waiting on the syncer flag when emplacing in a buffer to prevent the overwriting buffer records currently being processed in a buffer flush callback
- The above fix introduced a block on the both buffers when a buffer flush callback was being executed instead of a block on the buffer being flushed.
* Add rocpd tests for duplicate records
* Address code review comments
* Update rocprofiler workflows to use new runner naming for mi325
* Add input options to workflow_dispatch for rocprofiler-systems CI workflow
* Update runner name on therock-ci-linux.yml as well
`tests/rocprofiler/rocprofiler.cpp` uses `std::string` without including `<string>` directly.
This works with libstdc++ due to transitive includes, but fails with libc++.
Closes#1240
* Fix merging logic for multi process
* Fix dispatch id reset logic in case of rocpd format
* Fix kernel id reset logic in case of csv format
* Revert correlation logic change in csv format
* Do inner join instead of left join
* Added tool for dumping counter and metric values
* Skip Linting
* Added support for iteration multiplexing
* Remove subparser and supress compute options
* Specify output dir
* Add kernel info
* csv name change
* Added comments
* Support dispatch id-less dataframes
* Formatting fix
* Add default for path
* Print help with no args
* Support only single workload
Fixed incorrect error code expectation in FrequenciesRead
test when calling amdsmi_get_gpu_pci_bandwidth() with nullptr
parameter.
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
* Faster counter accuracy testing
* Better handle SPI_CSN_* metrics for lesser than MI350 series
* Use metric filtering to collect only relevant counters for comparison
* Ensure all workload folders are deleted after testing is completed
* Dont use clean_existing=False
* Add manual test for all counter accuracy
* Test env. vars. in rocprofiler-sdk backend
* Improve rocprofiler-sdk backend test case to check for env. vars. and
ensure we do not overwrite irrelevant env. vars.
* Remove unnecessary usage of ROCPROF_INDIVIDUAL_XCC_MODE env. var.
* Formatting fixes
* Test fixes
* Remove redundant code in tests
* Remove usage of utils_mod and use utils instead, this prevents
duplicate imports
* Fix for multi process workload profiling
Native counter collection tool updates:
* Do not dump empty counter data for a process
* Use PID instead of UUID for dumped csv files to facilitate correlation
* Handle merging multiple pairs of rocpd (from sdk tool) and csv (from
native tool) files
* Handle merging multiple pairs of csv (from sdk tool) and csv (from
native tool) files
Rocpd output format updates:
* Merge multiple rocpd databases into a single csv
* Reset dispatch id and kernel id for unique dispatches and unique
kernels respectively
* Retain multiple rocpd databases per run for multi process workloads
* Add test case for multiprocess profiling using rocflop workload
* Add rocflop
* Fix native counter csv to rocprofv3 csv conversion
* Use kernel_id instead of dispatch_id to correlate native counter csv
and kernel trace csv
* python formatting using ruff 0.14 instead of 0.13
## Motivation
When profiling multi-process applications where a parent process sends SIGKILL to child processes, the termination can occur before the profiler has a chance to flush collected data. This PR introduces a configurable delay before SIGKILL signals are forwarded, allowing profiling data to be captured before process termination. This is workaround.
## Technical Details
- Added new configuration setting `ROCPROFSYS_KILL_DELAY` (default: 0 seconds) to specify a delay before SIGKILL signals are forwarded to other processes
- Implemented `kill_gotcha` component that intercepts the `kill()` system call
- The gotcha only delays SIGKILL signals sent to external processes (pid > 0 and not self)
- Integrated `kill_gotcha_t` into the `preinit_bundle_t` for early initialization