* Enhance logging in NCCL initialization
Log the comm object and default channels together, which is convenient for debugging.
* Add opCount to collDevWork and update increment logic
Added opCount to collDevWork and incremented it when proxyOpQueue is empty (e.g., for intra-node comms)
* Clarify opCount increment logic in enqueue.cc
Updated comment to clarify incrementing opCount for intranode communications.
* Refactor NCCL_INIT logging format
Updated logging format for NCCL_INIT to improve clarity.
* Remove duplicate INFO logging in init.cc
* attach: rocprofv3-attach py improvements
- Handle error status during detachment
- Add detection and error for changing rocprofv3 configuration on reattachment
- Add and improve console messages during attachment and detachment
- Documentation update pass
* attach: fix test permissions
- Test is now skipped if insufficient permissions detected
- Should fix test (for now) in Azure CI pipeline
- Add more extensive permission checking for the tests
- Add default parameters to prevent running rm -rf on a root directory
- Make use of the previously unused LOG_LEVEL parameter
* Introduce HsaKFDContext structure and infrastructure for multiple KFD contexts, enabling
independent contexts within a single process.
* Refactor core components (queue, event, FMM, topology) to be context-aware,
using explicit HsaKFDContext parameters instead of global state.
* Replace global hsakmt_kfd_fd with context-specific file descriptors, ensuring full context isolation.
* Maintain backward compatibility by redirecting legacy APIs to use the primary context.
This refactoring establishes a foundation for multi-context support while preserving existing functionality.
Signed-off-by: Junhua Shen <Junhua.Shen@amd.com>
* clr: Adjust call to ICmdBuffer::CmdCopyMemoryToImage for PAL >= 955
Starting with version 955, PAL adds a new argument to
ICmdBuffer::CmdCopyMemoryToImage. Adjust the call site to account
for this.
* clr: Handle new GpuUtil::TraceSessionState cases for PAL >= 939
Starting with PAL API version 939, GpuUtil::TraceSessionState changes its
possible values. Adjust for it.
* clr: require PAL version 954
Bump the required PAL version to 954, as this is required for proper
debugger support.
* Support Windows HANDLE in interop_map_buffer
* Refactored Windows HANDLE in interop_map_buffer
* ROCr System Dependent Handle Type
* Fix for ROCr Handle Conversion Bug
* Remove Windows Header
Remove libamdhsacode/win32/elf.h due to license restrictions.
Separate Linux coredump implementation because we do not have the ELF
definitions on Windows.
Co-authored-by: JeniferC99 <150404595+JeniferC99@users.noreply.github.com>
* Fix for SWDEV-552584
Two calls to ompt_callback_task_scheduled were issued for the same
prior task. One of them was ompt_task_complete, which causes
internal storage to be released and a pointer to be zeroed. The other
was ompt_task_early_fulfill, which attempted to reference the
pointer. The callbacks could come in any order as they were
from different threads, thus causing a null pointer
dereference on occasion. The code was changed to do nothing
for the early_fulfill. Additional null pointer checks were
added.
* formatting
* Update ompt.cpp
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
* Add clean up of buffered_storage files
* Add step to workflows to test for remaining temp files after tests
* Applied suggestions from code review
* add deletion of all cache files
---------
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
- Integrate rocprofiler-systems with rocprofiler-sdk-rocpd to fetch schema
- If rocprofiler-sdk-rocpd is not available, use embedded schema files. This provides rocpd format support even when ROCm is not available
- Detect in CMake whether the rocprofiler-sdk-rocpd package is available (and valid), and build the database class accordingly
- Update embedded schema that is used as a fallback.
- Update some validation tests to account for schema changes.
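The fallback described in the list above can be sketched roughly as follows. This is only an illustration of the selection logic, not the actual CMake or C++ change: the function name `resolve_schema_dir`, its parameters, and the assumption that schema files use a `.sql` extension are all hypothetical.

```python
from pathlib import Path


def resolve_schema_dir(installed_dir, embedded_dir):
    """Pick the directory to load schema files from.

    Hypothetical helper: prefers schema files shipped by an installed
    package and falls back to embedded copies when they are missing.
    """
    if installed_dir is not None:
        candidate = Path(installed_dir)
        # Assumption: a usable package directory contains at least one .sql file.
        if candidate.is_dir() and any(candidate.glob("*.sql")):
            return candidate
    # Fallback: embedded schema files, usable even without the package installed.
    return Path(embedded_dir)
```

Passing `None` for `installed_dir` models the case where the package was not detected at all.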
We were not handling the case where the eval result is None, e.g. some
columns have a peak value that is unused, so we use 'None', which
evaluates to the None object.
Return an empty string in this case.
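A minimal sketch of the behavior described above, assuming the value comes from evaluating an expression string. The function name `format_eval_result` and the `variables` parameter are hypothetical and do not reflect the actual code touched by this change.

```python
def format_eval_result(expression, variables=None):
    """Evaluate an expression string and return its display text.

    Hypothetical helper: the literal string 'None' evaluates to the
    None object, which is mapped to an empty string.
    """
    # Builtins are disabled here only to keep the sketch self-contained;
    # this is not a security boundary.
    result = eval(expression, {"__builtins__": {}}, dict(variables or {}))
    if result is None:
        return ""
    return str(result)
```

For example, `format_eval_result("None")` gives an empty string, while `format_eval_result("peak * 2", {"peak": 21})` gives `"42"`.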
Disable fine-grain and coarse-grain memory tests until a fix is
available in ROCm 7.1 and/or our CI image. Otherwise we might miss other
errors due to constant CI failures.
[ROCm/rocshmem commit: 4fc5541d78]
* Update rocprofiler_config_interfaces.cmake to use different elf naming
* try out conditional for libelf
* run cmake-format to fix formatting issue
* Remove libelf.patch file from therock-ci-windows.yml
* Remove libelf patch from therock-ci-linux.yml as well
A copy-paste mistake in a previous commit caused source and dest to
be reversed. Correct the source and dest params.
Fixes: e8a7371007
Signed-off-by: Allen Hubbe <allen.hubbe@amd.com>
[ROCm/rocshmem commit: e2dcf99456]