diff --git a/docs/markdown/hip_programming_guide.md b/docs/markdown/hip_programming_guide.md
index d5549659da..e9d25b06bc 100644
--- a/docs/markdown/hip_programming_guide.md
+++ b/docs/markdown/hip_programming_guide.md
@@ -60,9 +60,10 @@ HIP supports Stream Memory Operations to enable direct synchronization between N
 hipStreamWriteValue64
 
 Note, CPU access to the semaphore's memory requires volatile keyword to disable CPU compiler's optimizations on memory access.
-
 For more details, please check the documentation HIP-API.pdf.
 
+Please note, HIP streams do not guarantee concurrency on AMD hardware when multiple (at least 6) long-running streams execute concurrently and hipStreamSynchronize(nullptr) is used for synchronization.
+
 ### Coherency Controls
 ROCm defines two coherency options for host memory:
 - Coherent memory : Supports fine-grain synchronization while the kernel is running.  For example, a kernel can perform atomic operations that are visible to the host CPU or to other (peer) GPUs.  Synchronization instructions include threadfence_system and C++11-style atomic operations.
@@ -130,7 +131,10 @@ The link here(https://github.com/ROCm-Developer-Tools/HIP/blob/main/tests/src/hi
 
 ## Device-Side Malloc
 
-HIP-Clang currently doesn't supports device-side malloc and free.
+HIP-Clang now supports device-side malloc and free.
+This implementation does not require the use of `hipDeviceSetLimit(hipLimitMallocHeapSize, value)`, nor does it respect any value set through it. The heap is fully dynamic and can grow until the available free memory on the device is consumed.
+
+The test code at https://github.com/ROCm-Developer-Tools/HIP/blob/develop/tests/src/deviceLib/hipDeviceMalloc.cpp shows how to implement an application that uses malloc and free in device kernels.
 
 ## Use of Long Double Type
 
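
For reviewers who want to try the device-side malloc change locally, here is a minimal sketch of a kernel that allocates and frees per-thread scratch memory. It is not taken from the linked test file. The names `scratch_sum`, `scratch_kernel`, `run_scratch_sums`, and the `HOST_DEVICE` macro are hypothetical, and the host-only fallback under `#else` is an assumption added so that the snippet also builds with a plain C++ compiler when `hipcc` is not available.

```cpp
#include <cstdlib>
#include <vector>

#if defined(__HIPCC__)
#include <hip/hip_runtime.h>
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE  // assumption: plain C++ build, host-only fallback
#endif

// Per-thread work: allocate n ints of scratch space with malloc, fill them
// with tid + i, sum them, and release the buffer with free.
// Assumes n > 0. Returns -1 if the allocation fails.
HOST_DEVICE inline int scratch_sum(int tid, int n) {
    int* scratch = static_cast<int*>(malloc(n * sizeof(int)));
    if (scratch == nullptr) {
        return -1;
    }
    int sum = 0;
    for (int i = 0; i < n; ++i) {
        scratch[i] = tid + i;
        sum += scratch[i];
    }
    free(scratch);
    return sum;
}

#if defined(__HIPCC__)
// Each thread calls malloc/free in device code. No hipDeviceSetLimit call
// is made beforehand, in line with the patch description.
__global__ void scratch_kernel(int* out, int n) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    out[tid] = scratch_sum(tid, n);
}
#endif

// Runs scratch_sum for `threads` threads (a single block, so `threads` is
// assumed to fit in one block) and returns the per-thread results.
// Returns an empty vector if a HIP call fails.
std::vector<int> run_scratch_sums(int threads, int n) {
    std::vector<int> host(threads);
#if defined(__HIPCC__)
    int* dev = nullptr;
    if (hipMalloc(&dev, threads * sizeof(int)) != hipSuccess) {
        return {};
    }
    scratch_kernel<<<1, threads>>>(dev, n);
    // hipMemcpy blocks until the kernel has finished.
    hipError_t err = hipMemcpy(host.data(), dev, threads * sizeof(int),
                               hipMemcpyDeviceToHost);
    hipFree(dev);
    if (err != hipSuccess) {
        return {};
    }
#else
    for (int t = 0; t < threads; ++t) {
        host[t] = scratch_sum(t, n);
    }
#endif
    return host;
}
```

Calling `run_scratch_sums(4, 4)` should give one sum per thread. A result of `-1` in any slot means the allocation failed for that thread, so callers should check for it before using the value.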