These will print compiler warnings if used, so we can weed them out
before removing.
Also add a default flags argument for hipHostAlloc in the C++ function
headers, so you can replace hipMallocHost(&ptr, size) with hipHostAlloc(&ptr, size).
[ROCm/hip commit: cea37c3e91]
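A minimal sketch of the two ideas in the commit above: a deprecation attribute that makes the compiler warn at each use, and a defaulted trailing flags parameter that lets two-argument calls compile. The names mockHostAlloc, mockMallocHost, and kHostAllocDefault are hypothetical stand-ins, not the HIP declarations; the real signatures and flag values are in the HIP headers.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical flag value, used only for this illustration.
constexpr unsigned int kHostAllocDefault = 0;

// Stand-in for a hipHostAlloc-style function. The trailing `flags`
// parameter has a default, so callers may omit it.
inline int mockHostAlloc(void** ptr, std::size_t size,
                         unsigned int flags = kHostAllocDefault) {
    (void)flags;               // flags are ignored in this sketch
    *ptr = std::malloc(size);  // plain malloc stands in for pinned host memory
    return *ptr ? 0 : 1;       // 0 = success, non-zero = failure
}

// Stand-in for the deprecated entry point. Any use triggers a compiler
// warning, which is how remaining callers can be found before removal.
[[deprecated("use mockHostAlloc instead")]]
inline int mockMallocHost(void** ptr, std::size_t size) {
    return mockHostAlloc(ptr, size);
}
```

With the default in place, a call site migrates by changing only the function name: mockMallocHost(&p, n) becomes mockHostAlloc(&p, n).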
- Move staging buffer locks inside the staging buffer code.
- Remove the dedicated per-device completion_signal and per-device lock;
instead allocate a signal from the per-stream pool. This eliminates
the lock and allows more concurrency.
- Remove the HIP_DISABLE_BIDIR_MEMCPY switch.
[ROCm/hip commit: 0af4d3623f]
- refactor staging buffer to operate on hsa* data structures not
hc::accelerator.
- use hsa_memory_allocate to allocate staging buffers rather than
am_alloc.
- Refactor device reset with single member function. Don't reallocate
staging buffers on reset.
- Properly track dependencies based on command type. Add new deps for
H2D and D2D rather than overloading H2D.
[ROCm/hip commit: 7d500599fa]
Still #include staging_buffer.cpp into hip_hcc.cpp.
Directed tests compile hip_hcc to static library and use the library.
[ROCm/hip commit: 28ee7aff71]
- Control with HIP_DB=mask (env var). See src/hip_hcc.cpp for mask
values:
#define DB_API 0 /* 0x01 - shortcut to enable HIP_TRACE_API on single switch */
#define DB_SYNC 1 /* 0x02 - trace synchronization pieces */
#define DB_MEM 2 /* 0x04 - trace memory allocation / deallocation */
#define DB_COPY1 3 /* 0x08 - trace memory copy commands */
#define DB_SIGNAL 4 /* 0x10 - trace signal pool commands */
- Combine with HIP_TRACE to see debug with API trace.
- Use colors to distinguish different flows of debug.
- Add define COMPILE_DB_TRACE to allow removing all debug at compile-time
[ROCm/hip commit: aa03e1264c]
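The DB_* values listed in the commit above are bit positions, and the hex value in each comment is the corresponding mask bit (1 << position). The sketch below shows one way such a mask could be parsed and tested. The helper names parseDbMask, dbEnabled, and dbMaskFromEnv are hypothetical, and the actual parsing in src/hip_hcc.cpp may differ.

```cpp
#include <cstdlib>

// Bit positions copied from the commit message above.
#define DB_API    0  /* 0x01 */
#define DB_SYNC   1  /* 0x02 */
#define DB_MEM    2  /* 0x04 */
#define DB_COPY1  3  /* 0x08 */
#define DB_SIGNAL 4  /* 0x10 */

// Parse a mask string such as "0x0c" or "12". Base 0 lets strtoul accept
// both hexadecimal and decimal input. A null string yields an empty mask.
inline unsigned parseDbMask(const char* text) {
    return text ? static_cast<unsigned>(std::strtoul(text, nullptr, 0)) : 0u;
}

// True if the flow at bit position `bit` is enabled in `mask`.
inline bool dbEnabled(unsigned mask, int bit) {
    return (mask & (1u << bit)) != 0;
}

// Read the mask from the HIP_DB environment variable (unset means 0).
inline unsigned dbMaskFromEnv() {
    return parseDbMask(std::getenv("HIP_DB"));
}
```

Under these assumptions, HIP_DB=0x0c would enable the DB_MEM and DB_COPY1 flows and leave the others off.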
On the HIP path, the property is obtained by calling hsa_iterate_agents and counting the devices of type HSA_DEVICE_TYPE_GPU.
P.S.
On multi-board systems there may be problems detecting which board a GPU is plugged into (not tested).
[ROCm/hip commit: 57e212606d]
- add API to add / remove user-pointers from the tracker.
- test for thread-safety with MultiThreadtest_2 - rapid
insertions/removal.
- add mutex to provide thread-safety.
- rename tracker interface to "memtracker_..." for consistency.
- add am_memtracker_reset, connect to hipDeviceReset.
[ROCm/hip commit: de45e2291e]
Tracks device where memory is allocated, pinned-host or device, and
more.
Uses memory-range-based lookups - so pointers that exist anywhere in
the range of hostPtr + size will find the associated AmPointerInfo.
The insertions and lookups use a self-balancing binary tree and
should support O(logN) lookup speed.
[ROCm/hip commit: 4ee2a5229b]
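The range-based lookup described in the commit above can be sketched with std::map, which guarantees logarithmic insertion and lookup. This is an illustration, not the tracker's implementation: PointerInfo and RangeTracker are hypothetical names, the fields are a guess at a subset of AmPointerInfo, and the sketch assumes tracked ranges do not overlap and treats each range as half-open, [base, base + size).

```cpp
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical stand-in for AmPointerInfo.
struct PointerInfo {
    std::size_t size;    // length of the allocation in bytes
    int device;          // device where the memory was allocated
    bool isPinnedHost;   // true for pinned-host, false for device memory
};

class RangeTracker {
public:
    // Record an allocation keyed by its base address.
    void insert(const void* base, std::size_t size, int device, bool pinned) {
        ranges_[toAddr(base)] = PointerInfo{size, device, pinned};
    }

    // Remove the allocation that starts at `base`; true if one was tracked.
    bool remove(const void* base) {
        return ranges_.erase(toAddr(base)) > 0;
    }

    // Find the allocation containing `p`, or nullptr if there is none.
    const PointerInfo* find(const void* p) const {
        const std::uintptr_t addr = toAddr(p);
        auto it = ranges_.upper_bound(addr);   // first range starting after addr
        if (it == ranges_.begin()) return nullptr;
        --it;                                  // last range starting at or before addr
        return (addr - it->first < it->second.size) ? &it->second : nullptr;
    }

private:
    static std::uintptr_t toAddr(const void* p) {
        return reinterpret_cast<std::uintptr_t>(p);
    }
    std::map<std::uintptr_t, PointerInfo> ranges_;  // ordered by base address
};
```

Keying the map by base address means an interior pointer is resolved with a single upper_bound call followed by one bounds check.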
Device property MaxSharedMemoryPerMultiprocessor is set equal to totalGlobalMem (HIP path).
Reason: MaxSharedMemoryPerMultiprocessor should be the same as the group memory size. Group memory is not paged out, so physical memory size = total shared memory size = group region size. The NVCC path is untouched: CUDA's device property MaxSharedMemoryPerMultiprocessor is reported.
hipify is updated as well.
[ROCm/hip commit: ea8f99702d]