Added initial draft for performance optimizations, started with unpinned memory transfers

Change-Id: Icbce2aec347d015bc66cc0c08f6193057bf36b4c
이 커밋은 다음에 포함됨:
Rahul Garg
2016-08-22 11:37:37 +05:30
부모 a362489b27
커밋 778bb2e78a
+42
파일 보기
@@ -0,0 +1,42 @@
# HIP Performance Optimizations
Please note that this document lists possible ways for experimenting with HIP stack to gain performance. Performance may vary from platform to platform.
### Unpinned Memory Transfer Optimizations
#### On Small BAR Setup
There are two possible ways to transfer data from Host to Device (H2D) and Device to Host(D2H)
* Using Staging Buffers
* Using PinInPlace
#### On Large BAR Setup
There are two possible ways to transfer data from Host to Device (H2D)
* Using Staging Buffers
* Using PinInPlace
* Direct Memcpy
And there are two possible ways to transfer data from Device to Host (D2H)
* Using Staging Buffers
* Using PinInPlace
Some GPUs may not be able to directly access host memory, and in these cases we need to
stage the copy through an optimized pinned staging buffer, to implement H2D and D2H copies.The copy is broken into buffer-sized chunks to limit the size of the buffer and also to provide better performance by overlapping the CPU copies with the DMA copies.
PinInPlace is another algorithm which pins the host memory "in-place", and copies it with the DMA
engine.
By default staging buffers are used for unpinned memory transfers, however other ways can be used by enabling few environment variables (so no need to build the code again!!!)
Following environment variables can be used:
- HIP_PININPLACE - This environment variable forces the use of PinInPlace logic for all unpinned memory copies
- HIP_OPTIMAL_MEM_TRANSFER- This environment variable enables a hybrid memory copy logic based on thresholds. These thresholds can be managed with following environment variables:
- HIP_H2D_MEM_TRANSFER_THRESHOLD_STAGING_OR_PININPLACE - Threshold in bytes for H2D copy. For sizes smaller than threshold staging buffers logic would be used else PinInPlace logic.
- HIP_H2D_MEM_TRANSFER_THRESHOLD_DIRECT_OR_STAGING - Threshold in bytes for H2D copy. For sizes smaller than threshold direct copy logic would be used else staging buffers logic.
- HIP_D2H_MEM_TRANSFER_THRESHOLD - Threshold in bytes for D2H copy. For sizes smaller than threshold staging buffer logic would be used else PinInPlace logic.