Jonathan R. Madsen 73ff4f2502 Update HSA async copy active signals handling (#732)
* Enable INFO logging on retried CI jobs

* Update lib/rocprofiler-sdk/async_copy.cpp

- rework active_signals
  - make hsa_signal_t member variable
  - remove sync from destructor
  - replace _is_set with atomic counter
  - use a 30-second timeout on hsa_signal_wait
  - switch from relaxed to scacquire/screlease memory ordering
- improve logging and error handling
- destroy the HSA signal held by active_signals in async_fini
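The move from a boolean `_is_set` flag to an atomic counter with acquire/release ordering and a bounded wait can be sketched in portable C++. This is a minimal illustration under stated assumptions, not the rocprofiler-sdk implementation. `active_counter`, `add`, `remove`, and `wait_for_zero` are hypothetical names, and the polling loop stands in for `hsa_signal_wait_scacquire` with a timeout. The counter tracks in-flight copies. Release ordering on the modifications pairs with an acquire load in the waiter, so whatever a completing thread wrote before `remove()` is visible to a thread that sees the count reach zero.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical stand-in for the "active signals" bookkeeping: a counter of
// in-flight async copies replaces a single boolean "_is_set" flag.
class active_counter
{
public:
    // Called when an async copy is launched.
    void add() { m_count.fetch_add(1, std::memory_order_acq_rel); }

    // Called from the completion callback; the release half publishes the
    // callback's prior writes to any thread that later acquires the count.
    void remove() { m_count.fetch_sub(1, std::memory_order_acq_rel); }

    // Acquire load pairs with the acq_rel read-modify-writes above.
    int64_t count() const { return m_count.load(std::memory_order_acquire); }

    // Poll until every copy has completed or the timeout expires.
    // Returns true if the counter reached zero before the deadline.
    bool wait_for_zero(std::chrono::milliseconds timeout) const
    {
        const auto deadline = std::chrono::steady_clock::now() + timeout;
        while(count() > 0)
        {
            if(std::chrono::steady_clock::now() >= deadline) return false;
            std::this_thread::yield();
        }
        return true;
    }

private:
    std::atomic<int64_t> m_count{0};
};
```

A caller mirroring the commit's 30-second limit would call `wait_for_zero(std::chrono::seconds(30))` and log an error instead of blocking forever when it returns `false`. Unlike a single flag, a counter stays correct when several copies are outstanding at once.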

* Update lib/rocprofiler-sdk/async_copy.cpp

- active_signals::create
- set the initial signal value to 1 instead of the completion signal's value
- change the condition that triggers the signal callback
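Starting the signal at 1 and firing the callback once it drops below 1 can be modeled with a plain atomic. This is a hypothetical mock, not the HSA API. In HSA itself, a handler can be registered with `hsa_amd_signal_async_handler` using a condition such as `HSA_SIGNAL_CONDITION_LT` and a compare value. The commit message does not say which condition was chosen, so the "value < 1" trigger below is an assumption. `mock_signal`, `should_fire`, and `complete_and_dispatch` are illustrative names.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical mock of a completion signal; not the HSA signal type.
struct mock_signal
{
    std::atomic<int64_t> value{1};  // created with initial value 1
};

// Assumed trigger condition: fire once the value drops below 1.
bool should_fire(int64_t observed) { return observed < 1; }

// Simulate the copy engine decrementing the signal on completion, then the
// runtime invoking the callback if the condition holds.
// Returns true if the callback ran.
bool complete_and_dispatch(mock_signal& sig, const std::function<void(int64_t)>& callback)
{
    const int64_t observed = sig.value.fetch_sub(1, std::memory_order_acq_rel) - 1;
    if(should_fire(observed))
    {
        callback(observed);
        return true;
    }
    return false;
}
```

With a fixed initial value of 1, the trigger no longer depends on whatever value the caller's completion signal happened to hold. A single decrement on completion is enough to satisfy the condition.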

* Update tests/counter-collection/validate.py

* Update lib/rocprofiler-sdk/async_copy.cpp

- improved logging
- fix hsa_signal_wait_scacquire_fn check

* Cleanup tests/lib/transpose/transpose.cpp

- remove huge comment block

* Appears to be working on MI200

Dependency Versions:

clr: f7b1398361  - compile mode: release

hsa-runtime: 4cd6c62f25dbbdbaa8580dd4ad8f388c98c508da - compile mode: RelWithDebug

* Update source/lib/rocprofiler-sdk/hsa/async_copy.cpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Format fix

---------

Co-authored-by: Benjamin Welton <bewelton@amd.com>
Co-authored-by: Ammar ELWazir <ammar.elwazir@amd.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Ammar ELWazir <aelwazir@hpe6u-21.amd.com>

[ROCm/rocprofiler-sdk commit: 8c5399a68a]
2024-04-09 08:31:08 -05:00