Added initial draft for performance optimizations, started with unpinned memory transfers
Change-Id: Icbce2aec347d015bc66cc0c08f6193057bf36b4c
이 커밋은 다음에 포함됨:
@@ -0,0 +1,42 @@
|
||||
# HIP Performance Optimizations
|
||||
|
||||
Please note that this document lists possible ways for experimenting with HIP stack to gain performance. Performance may vary from platform to platform.
|
||||
|
||||
### Unpinned Memory Transfer Optimizations
|
||||
|
||||
#### On Small BAR Setup
|
||||
|
||||
There are two possible ways to transfer data from Host to Device (H2D) and Device to Host(D2H)
|
||||
* Using Staging Buffers
|
||||
* Using PinInPlace
|
||||
|
||||
#### On Large BAR Setup
|
||||
|
||||
There are two possible ways to transfer data from Host to Device (H2D)
|
||||
* Using Staging Buffers
|
||||
* Using PinInPlace
|
||||
* Direct Memcpy
|
||||
|
||||
And there are two possible ways to transfer data from Device to Host (D2H)
|
||||
* Using Staging Buffers
|
||||
* Using PinInPlace
|
||||
|
||||
Some GPUs may not be able to directly access host memory, and in these cases we need to
|
||||
stage the copy through an optimized pinned staging buffer, to implement H2D and D2H copies.The copy is broken into buffer-sized chunks to limit the size of the buffer and also to provide better performance by overlapping the CPU copies with the DMA copies.
|
||||
|
||||
PinInPlace is another algorithm which pins the host memory "in-place", and copies it with the DMA
|
||||
engine.
|
||||
|
||||
By default staging buffers are used for unpinned memory transfers, however other ways can be used by enabling few environment variables (so no need to build the code again!!!)
|
||||
|
||||
Following environment variables can be used:
|
||||
|
||||
- HIP_PININPLACE - This environment variable forces the use of PinInPlace logic for all unpinned memory copies
|
||||
|
||||
- HIP_OPTIMAL_MEM_TRANSFER- This environment variable enables a hybrid memory copy logic based on thresholds. These thresholds can be managed with following environment variables:
|
||||
- HIP_H2D_MEM_TRANSFER_THRESHOLD_STAGING_OR_PININPLACE - Threshold in bytes for H2D copy. For sizes smaller than threshold staging buffers logic would be used else PinInPlace logic.
|
||||
- HIP_H2D_MEM_TRANSFER_THRESHOLD_DIRECT_OR_STAGING - Threshold in bytes for H2D copy. For sizes smaller than threshold direct copy logic would be used else staging buffers logic.
|
||||
- HIP_D2H_MEM_TRANSFER_THRESHOLD - Threshold in bytes for D2H copy. For sizes smaller than threshold staging buffer logic would be used else PinInPlace logic.
|
||||
|
||||
|
||||
|
||||
새 이슈에서 참조
사용자 차단