corey-derochie-amd
62a6a07d49
Replaced ROCmSoftwarePlatform and RadeonOpenCompute links with ROCm links. ( #1125 )
[ROCm/rccl commit: 503a472a25 ]
2024-03-25 16:29:13 -06:00
gilbertlee-amd
5459f1c3f6
Fixing formatting for copywrite ( #638 )
[ROCm/rccl commit: 10dbd2a452 ]
2022-10-19 13:43:21 -06:00
gilbertlee-amd
0ca30fb88a
Updating files for missing licenses ( #637 )
[ROCm/rccl commit: ebb8b5bf63 ]
2022-10-14 13:49:16 -06:00
gilbertlee-amd
9225ea766e
Removing TransferBench from tools ( #632 )
Point to new TransferBench repo
[ROCm/rccl commit: bd7d589446 ]
2022-09-30 11:53:32 -06:00
gilbertlee-amd
c6804778d1
[TransferBench] Syncing with TransferBench v1.02 ( #541 )
[ROCm/rccl commit: 685bcea127 ]
2022-04-27 20:43:24 -06:00
gilbertlee-amd
e61ff3ce37
Transfer bench single stream mode ( #531 )
- Adding single stream mode
- Removing some unused env vars
- Adding output to CSV mode for p2p benchmark, topology listing modes
[ROCm/rccl commit: def6832287 ]
2022-04-08 15:20:55 -06:00
gilbertlee-amd
4c32c51772
Adding explicit request for coarse-grained host memory due to changes in HipHostMalloc ( #517 )
[ROCm/rccl commit: 2d558c9abc ]
2022-03-25 13:05:07 -06:00
gilbertlee-amd
9c3189589f
[TransferBench] Fix for cases with subsets of configured numa nodes ( #495 )
[ROCm/rccl commit: f3c2cafd9d ]
2022-02-07 12:16:19 -07:00
gilbertlee-amd
b2deea27f5
TransferBench: Adding ability to reindex GPUs based on PCIe address ( #494 )
[ROCm/rccl commit: 84d5fce7dd ]
2022-02-02 08:51:41 -07:00
gilbertlee-amd
a6a9ba1b78
[TransferBench] Updating for 2.11.4. Decoupling from RCCL kernel ( #485 )
[ROCm/rccl commit: 2530a2f084 ]
2022-01-05 16:33:25 -07:00
Wenkai Du
618fbfc644
Merge remote-tracking branch 'origin/develop' into 2.11.4
[ROCm/rccl commit: 434ecb0e1f ]
2022-01-03 09:54:16 -08:00
gilbertlee-amd
ef1cbb03d2
[TransferBench] Adding more preset benchmarks to filter read mode, cpu vs gpu pairs ( #477 )
[ROCm/rccl commit: 1157c2edfe ]
2021-11-24 18:05:37 -07:00
Wenkai Du
f8bd2d0cfa
Merge remote-tracking branch 'nccl/master' into develop
[ROCm/rccl commit: 3a919c1f49 ]
2021-11-11 14:22:12 -08:00
gilbertlee-amd
096defc1cd
[TransferBench] Adding #CUs / RRLW mode to p2p benchmark ( #464 )
[ROCm/rccl commit: 1c7ef1b790 ]
2021-11-08 14:36:04 -07:00
gilbertlee-amd
bf024320e4
[TransferBench] Changing default per block multiple to 256B, adding BLOCK_BYTES env var ( #446 )
[ROCm/rccl commit: 18246fc191 ]
2021-10-25 11:23:29 -06:00
gilbertlee-amd
b795cc090b
TransferBench p2p benchmark mode ( #444 )
* [TransferBench] Adding a p2p benchmark mode
* [TransferBench] Switching to using single sync mode by default (USE_SINGLE_SYNC=1)
[ROCm/rccl commit: 550d732d6c ]
2021-10-21 15:28:16 -06:00
gilbertlee-amd
fe4285d002
[TransferBench] Adding comment echoing to help distinguish tests ( #438 )
[ROCm/rccl commit: f6b7ac693e ]
2021-10-13 14:56:57 -06:00
gilbertlee-amd
ad1a620333
[TransferBench] Adding shared memory per threadblock env var. Defaulting to 1 threadblock per CU ( #436 )
[ROCm/rccl commit: 269f07fbc3 ]
2021-10-12 09:32:54 -06:00
gilbertlee-amd
227848b70f
[TransferBench] Adding ability to specify suffix for numBytes ( #435 )
[ROCm/rccl commit: aa917c3fc8 ]
2021-10-08 16:36:19 -06:00
gilbertlee-amd
fef14c1b73
[TransferBench] Fixing advanced config, adding new all-1-hop sample test ( #433 )
* [TransferBench] Fixing advanced config, adding new all-1-hop sample test
[ROCm/rccl commit: e506d14d18 ]
2021-10-07 15:57:21 -06:00
Gilbert Lee
5be5b37e19
[TransferBench] Update to 2.10.3
[ROCm/rccl commit: 68ec3f84e6 ]
2021-08-02 05:53:20 -05:00
gilbertlee-amd
06b0e1c4e2
[TransferBench] ConfigFile parsing fixes, adding additional info ( #422 )
* [TransferBench] Adding GPU to NUMA distance detection, parsing fixes, config file generation fix
* [TransferBench] Fixing up NUMA node detection by filtering pools
[ROCm/rccl commit: 51d64894ff ]
2021-09-07 15:28:16 -06:00
gilbertlee-amd
b0c3a1790f
[TransferBench] Removing dependency on hip_fp16 header, fixing swapped output CSV header ( #416 )
[ROCm/rccl commit: 1ed272e5f0 ]
2021-08-04 10:53:41 -06:00
gilbertlee-amd
f2a72b1e0b
[TransferBench] Fixing a typo in TransferBench usage example ( #401 )
[ROCm/rccl commit: 2b0b608270 ]
2021-06-22 17:08:57 -06:00
gilbertlee-amd
0a636f20a3
[TransferBench] Switching from little-endian fill pattern to big-endian ( #399 )
[ROCm/rccl commit: 720374a767 ]
2021-06-21 14:28:51 -06:00
gilbertlee-amd
01a8efbb76
[TransferBench] Adding ability to specify source data pattern ( #394 )
* [TransferBench] Adding ability to specify source data pattern
[ROCm/rccl commit: ff413be933 ]
2021-06-15 08:41:57 -06:00
Gilbert Lee
b954d85935
[TransferBench] Fixing some merge issues
[ROCm/rccl commit: f372c53d52 ]
2021-02-05 16:46:20 +00:00
Wenkai Du
ae5779702a
Merge remote-tracking branch 'origin/develop' into 2.8.3
[ROCm/rccl commit: ab1e7a0318 ]
2021-02-04 20:02:34 -05:00
Gilbert Lee
1643d05c75
[TransferBench] Updating for 2.8.3
[ROCm/rccl commit: 9ce203dd0a ]
2021-02-04 18:58:25 +00:00
gilbertlee-amd
60c74f63fa
[TransferBench] Restore some previous fixes - memory leak, PCIe address ( #314 )
[ROCm/rccl commit: 62e0447e9a ]
2021-02-01 09:48:09 -07:00
gilbertlee-amd
c570f09681
[TransferBench] Fixing bug with fine-grained memory allocation ( #311 )
* Fixing bug with fine-grained memory
[ROCm/rccl commit: 41c35dad48 ]
2020-12-15 17:37:31 -07:00
gilbertlee-amd
5155abb250
[TransferBench] Adding ability to perform CPU-executed copies, various upgrades ( #309 )
* Adding CPU based execution, fixing typos, adding Fine-grained mem
* Exposing sampling factor when generating range of data sizes
* Refactoring how Links are launched, now once per thread
* Documentation updates
[ROCm/rccl commit: ae0c4092c7 ]
2020-12-11 10:21:14 -07:00
gilbertlee-amd
9b48f92d72
[TransferBench] Support multiple of 4 byte sizes, changing default GPU timing mechanism ( #307 )
* Changing default timing mechanism, adjusting CPU bandwidth calc, adding flag to use combined timing
* Adding support for smaller transfers (byte size must be multiple of 4 instead of 128)
[ROCm/rccl commit: b80ae551b1 ]
2020-12-04 14:57:13 -07:00
gilbertlee-amd
2931959e6e
Adding output to CSV, removing OpenMP, decreasing default numBytes to 64MB, adding aggregate stats ( #290 )
[ROCm/rccl commit: bfab1d3592 ]
2020-10-27 09:00:33 -06:00
gilbertlee-amd
a062c80298
[TransferBench] Displaying PCIe Bus ID ( #288 )
* Adding PCIe BusID per GPU in topology display
[ROCm/rccl commit: 61e1a71d14 ]
2020-10-21 16:13:36 -06:00
gilbertlee-amd
0282595de5
TransferBench Typo. Pinned host memory uses C not P ( #286 )
[ROCm/rccl commit: 769418c5c7 ]
2020-10-21 12:05:38 -06:00
gilbertlee-amd
5ca117d7cd
New TransferBench features ( #273 )
* Upgrading TransferBench to support pinned CPU memory, expanding functionality, cleaning up env vars
[ROCm/rccl commit: ee262819a7 ]
2020-09-25 12:20:48 -06:00
gilbertlee-amd
3e4ddd065b
Upgrading various TransferBench features ( #257 )
[ROCm/rccl commit: ec9af40fcd ]
2020-08-19 09:47:19 -06:00
gilbertlee-amd
1a9b00a7fd
Fixes to make TransferBench compile for hipclang ( #254 )
[ROCm/rccl commit: c985478133 ]
2020-08-13 12:25:28 -06:00
Gilbert Lee
eebc6f2844
Adding option to re-use streams instead of re-creating per topology
[ROCm/rccl commit: 339bf9ff19 ]
2020-04-23 15:53:40 +00:00
Aaron Enye Shi
bfbfe370c3
Fix HIP-Clang build with HSA headers
HIP-Clang does not include these HSA headers, and they need to be explicitly added in RCCL.
[ROCm/rccl commit: a95090d981 ]
2020-04-03 17:58:23 -04:00
Stanley Tsang
e5419407c4
Updating copyright notices for 2020.
[ROCm/rccl commit: 20fa04d9b6 ]
2020-01-29 15:28:08 -08:00
Gilbert Lee
5783917a75
Changing single sync mode to time all iterations instead of just last
[ROCm/rccl commit: e5074ce94d ]
2019-12-20 17:08:39 -08:00
gilbertlee-amd
a461b6d139
Adding new sleep after sync capability for data fabric profiling ( #162 )
Fixing missing header include for ROCM 3.0 changes
[ROCm/rccl commit: 2f4269d06d ]
2019-12-12 15:20:54 -07:00
gilbertlee-amd
22cbbb9004
Adding interactive mode for profiling purposes ( #150 )
[ROCm/rccl commit: fd94f4fa25 ]
2019-11-05 17:10:16 -07:00
gilbertlee-amd
f9ef1553aa
Single Sync Timing mode ( #144 )
* Adding single sync timing mode to emulate timing reported by rccl-prim-test / rccl-tests
* Adding duration / overhead info
[ROCm/rccl commit: 2f9edd2432 ]
2019-11-01 10:18:25 -06:00
Gilbert Lee
a99accb2cb
Adding ability to switch between fine/coarse grain destination GPU memory
Adding ability to switch between memset/memcpy
[ROCm/rccl commit: 648c1ee7cc ]
2019-10-29 12:00:32 -06:00
gilbertlee-amd
8645391260
Adding TransferBench tool ( #113 )
* Adding standalone TransferBench tool
[ROCm/rccl commit: b8cf48fc16 ]
2019-08-07 17:21:41 -06:00