ldconfig is run during rocm-opencl package installation.
Installing libamdocl.so in /opt/rocm-xxx/lib exposes all ROCm libraries when /opt/rocm/lib is added to ldconfig.
To prevent this, libamdocl.so is now installed in /opt/rocm-xxx/lib/opencl.
ldconfig will use the updated path, limiting exposure to only libamdocl.so library.
Co-authored-by: raramakr <raramakr@amd.com>
* clr: Adjust call to ICmdBuffer::CmdCopyMemoryToImage for PAL >= 955
PAL starting versino 955 adds a new argument to
ICmdBuffer::CmdCopyMemoryToImage. Adjust teh callsite to account
fort his.
* clr: Handle new GpuUtil::TraceSessionState cases for PAL >= 939
Starting PAL API version 939, GpuUtil::TraceSessionState changes its
possible values. Adjust for it.
* clr: require PAL version 954
Bump the PAL required vesion to 954, as this is required for proper
debugger support.
* Support Windows HANDLE in interop_map_buffer
* Refactored Windows HANDLE in interop_map_buffer
* ROCr System Dependent Handle Type
* Fix for ROCr Handle Conversion Bug
* Remove Windows Header
* clr: SWDEV-547890 - Maintain an MQD for the emulated AQL queue
To simplify the shader debugger implementation, maintain the relevant
parts of the emulated AQL queue's MQD (amd_queue_t): read_dispatch_id,
write_dispatch_id, compute_tmpring_size.
With this MQD, the shader debugger can handle the emulated AQL queue
the same way it does the real AQL queue, no specialization is required.
* clr: SWDEV-547890 - Conservatively update the MQD's read_dispatch_id
The read_dispatch_id cannot be smaller than the current aql_packet_id
- hsa_queue.size for the debugger to work correctly.
The read_dispatch_id really should be updated when the CmdBuf is marked
as complete. Left a FIXME to address it in a future commit.
---------
Co-authored-by: Laurent Morichetti <laurent.morichetti@amd.com>
* clr: SWDEV-547890 - Maintain an MQD for the emulated AQL queue
To simplify the shader debugger implementation, maintain the relevant
parts of the emulated AQL queue's MQD (amd_queue_t): read_dispatch_id,
write_dispatch_id, compute_tmpring_size.
With this MQD, the shader debugger can handle the emulated AQL queue
the same way it does the real AQL queue, no specialization is required.
* clr: SWDEV-547890 - Conservatively update the MQD's read_dispatch_id
The read_dispatch_id cannot be smaller than the current aql_packet_id
- hsa_queue.size for the debugger to work correctly.
The read_dispatch_id really should be updated when the CmdBuf is marked
as complete. Left a FIXME to address it in a future commit.
* SWDEV-533237 Add initial support for hipOccupancyAvailableDynamicSMemPerBlock API
* SWDEV-533237 Add hipOccupancyAvailableDynamicSMemPerBlock wrapper for nvidia
* SWDEV-533237 Add implementation of hipOccupancyAvailableDynamicSMemPerBlock API
* SWDEV-533237 Add LDSAlignment field in Isa table
---------
Co-authored-by: Rahul Manocha <rmanocha@amd.com>
* SWDEV-546311 - implement hipKernelGetLibrary & hipLibraryEnumerateKernels API
* Fix for LibraryEnumerateKernel and KernelGetName
* Update Enumerate Kernels to handle 0 numKernels
* Minor fixes to function names
* fix error checking in internal function
* Update changelog for new apis
---------
Co-authored-by: Rahul Manocha <rmanocha@amd.com>
* SWDEV-560065 - Revert "SWDEV-555484 - Invalidate capturing stream only for null/legacy stream. (#1032)"
This reverts commit 99613f1009.
* SWDEV-560065 - Revert "SWDEV-542700 - Return an error if stream capture is attempted on the null stream while a stream capture is active. (#450)"
This reverts commit 0647cf1d28.
* SWDEV-555347 - Remove lock contention in async events loop
* SWDEV-555347 - Introduce Pool of AsyncEventItems
* create generic mempool for AsyncEventItem
* Use BaseShared allocate and free for async event pool
---------
Co-authored-by: Rahul Manocha <rmanocha@amd.com>
1. Create a set of mini numa interface.
In Linux, the interface is based on system call rather than libnuma.
In Windows, the interface can also work, but the policy class is dummy.
Different from Linux, Windows doesn't provide numactl tool or numa lib to setup numa policy, thus
the default policy is followed in Windows, that is, using the closest host numa node to allocate
pinned host memory in hipHostMalloc().
To get the closest host numa node of a GPU device, you need query the new attribute
hipDeviceAttributeHostNumaId. Then you can create a thread with CPU affinity on the numa node.
For example, reference the test in hip-tests/catch/perftests/memory/hipPerfHostNumaAllocWin.cc.
2. Remove pfnSetThreadGroupAffinity and pfnGetNumaNodeProcessorMaskEx as the functions have been exposed since Win7 and Win server 2008.
3. Other minor fixes.