Commit graph

30 commits

Author SHA1 Message Date
gilbertlee-amd 096defc1cd [TransferBench] Adding #CUs / RRLW mode to p2p benchmark (#464)
[ROCm/rccl commit: 1c7ef1b790]
2021-11-08 14:36:04 -07:00
gilbertlee-amd bf024320e4 [TransferBench] Changing default per block multiple to 256B, adding BLOCK_BYTES env var (#446)
[ROCm/rccl commit: 18246fc191]
2021-10-25 11:23:29 -06:00
gilbertlee-amd b795cc090b TransferBench p2p benchmark mode (#444)
* [TransferBench] Adding a p2p benchmark mode
* [TransferBench] Switching to using single sync mode by default (USE_SINGLE_SYNC=1)

[ROCm/rccl commit: 550d732d6c]
2021-10-21 15:28:16 -06:00
gilbertlee-amd fe4285d002 [TransferBench] Adding comment echoing to help distinguish tests (#438)
[ROCm/rccl commit: f6b7ac693e]
2021-10-13 14:56:57 -06:00
gilbertlee-amd ad1a620333 [TransferBench] Adding shared memory per threadblock env var. Defaulting to 1 threadblock per CU (#436)
[ROCm/rccl commit: 269f07fbc3]
2021-10-12 09:32:54 -06:00
gilbertlee-amd 227848b70f [TransferBench] Adding ability to specify suffix for numBytes (#435)
[ROCm/rccl commit: aa917c3fc8]
2021-10-08 16:36:19 -06:00
gilbertlee-amd fef14c1b73 [TransferBench] Fixing advanced config, adding new all-1-hop sample test (#433)
* [TransferBench] Fixing advanced config, adding new all-1-hop sample test

[ROCm/rccl commit: e506d14d18]
2021-10-07 15:57:21 -06:00
gilbertlee-amd 06b0e1c4e2 [TransferBench] ConfigFile parsing fixes, adding additional info (#422)
* [TransferBench] Adding GPU to NUMA distance detection, parsing fixes, config file generation fix

* [TransferBench] Fixing up NUMA node detection by filtering pools

[ROCm/rccl commit: 51d64894ff]
2021-09-07 15:28:16 -06:00
gilbertlee-amd b0c3a1790f [TransferBench] Removing dependency on hip_fp16 header, fixing swapped output CSV header (#416)
[ROCm/rccl commit: 1ed272e5f0]
2021-08-04 10:53:41 -06:00
gilbertlee-amd f2a72b1e0b [TransferBench] Fixing a typo in TransferBench usage example (#401)
[ROCm/rccl commit: 2b0b608270]
2021-06-22 17:08:57 -06:00
gilbertlee-amd 01a8efbb76 [TransferBench] Adding ability to specify source data pattern (#394)
* [TransferBench] Adding ability to specify source data pattern

[ROCm/rccl commit: ff413be933]
2021-06-15 08:41:57 -06:00
gilbertlee-amd 60c74f63fa [TransferBench] Restore some previous fixes - memory leak, PCIe address (#314)
[ROCm/rccl commit: 62e0447e9a]
2021-02-01 09:48:09 -07:00
gilbertlee-amd c570f09681 [TransferBench] Fixing bug with fine-grained memory allocation (#311)
* Fixing bug with fine-grained memory

[ROCm/rccl commit: 41c35dad48]
2020-12-15 17:37:31 -07:00
gilbertlee-amd 5155abb250 [TransferBench] Adding ability to perform CPU-executed copies, various upgrades (#309)
* Adding CPU based execution, fixing typos, adding Fine-grained mem
* Exposing sampling factor when generating range of data sizes
* Refactoring how Links are launched, now once per thread
* Documentation updates

[ROCm/rccl commit: ae0c4092c7]
2020-12-11 10:21:14 -07:00
gilbertlee-amd 9b48f92d72 [TransferBench] Support multiple of 4 byte sizes, changing default GPU timing mechanism (#307)
* Changing default timing mechanism, adjusting CPU bandwidth calc, adding flag to use combined timing
* Adding support for smaller transfers (byte size must be multiple of 4 instead of 128)

[ROCm/rccl commit: b80ae551b1]
2020-12-04 14:57:13 -07:00
gilbertlee-amd 2931959e6e Adding output to CSV, removing OpenMP, decreasing default numBytes to 64MB, adding aggregate stats (#290)
[ROCm/rccl commit: bfab1d3592]
2020-10-27 09:00:33 -06:00
gilbertlee-amd a062c80298 [TransferBench] Displaying PCIe Bus ID (#288)
* Adding PCIe BusID per GPU in topology display

[ROCm/rccl commit: 61e1a71d14]
2020-10-21 16:13:36 -06:00
gilbertlee-amd 0282595de5 TransferBench Typo. Pinned host memory uses C not P (#286)
[ROCm/rccl commit: 769418c5c7]
2020-10-21 12:05:38 -06:00
gilbertlee-amd 5ca117d7cd New TransferBench features (#273)
* Upgrading TransferBench to support pinned CPU memory, expanding functionality, cleaning up env vars

[ROCm/rccl commit: ee262819a7]
2020-09-25 12:20:48 -06:00
gilbertlee-amd 3e4ddd065b Upgrading various TransferBench features (#257)
[ROCm/rccl commit: ec9af40fcd]
2020-08-19 09:47:19 -06:00
gilbertlee-amd 1a9b00a7fd Fixes to make TransferBench compile for hipclang (#254)
[ROCm/rccl commit: c985478133]
2020-08-13 12:25:28 -06:00
Gilbert Lee eebc6f2844 Adding option to re-use streams instead of re-creating per topology
[ROCm/rccl commit: 339bf9ff19]
2020-04-23 15:53:40 +00:00
Aaron Enye Shi bfbfe370c3 Fix HIP-Clang build with HSA headers
HIP-Clang does not include these HSA headers, and they need to be explicitly added in RCCL.


[ROCm/rccl commit: a95090d981]
2020-04-03 17:58:23 -04:00
Stanley Tsang e5419407c4 Updating copyright notices for 2020.
[ROCm/rccl commit: 20fa04d9b6]
2020-01-29 15:28:08 -08:00
Gilbert Lee 5783917a75 Changing single sync mode to time all iterations instead of just last
[ROCm/rccl commit: e5074ce94d]
2019-12-20 17:08:39 -08:00
gilbertlee-amd a461b6d139 Adding new sleep after sync capability for data fabric profiling (#162)
Fixing missing header include for ROCM 3.0 changes

[ROCm/rccl commit: 2f4269d06d]
2019-12-12 15:20:54 -07:00
gilbertlee-amd 22cbbb9004 Adding interactive mode for profiling purposes (#150)
[ROCm/rccl commit: fd94f4fa25]
2019-11-05 17:10:16 -07:00
gilbertlee-amd f9ef1553aa Single Sync Timing mode (#144)
* Adding single sync timing mode to emulate timing reported by rccl-prim-test / rccl-tests
* Adding duration / overhead info


[ROCm/rccl commit: 2f9edd2432]
2019-11-01 10:18:25 -06:00
Gilbert Lee a99accb2cb Adding ability to switch between fine/coarse grain destination GPU memory
Adding ability to switch between memset/memcpy


[ROCm/rccl commit: 648c1ee7cc]
2019-10-29 12:00:32 -06:00
gilbertlee-amd 8645391260 Adding TransferBench tool (#113)
* Adding standalone TransferBench tool

[ROCm/rccl commit: b8cf48fc16]
2019-08-07 17:21:41 -06:00