Wenkai Du
37f7eec6b7
Change network plugin name to librccl-net.so ( #266 )
...
[ROCm/rccl commit: d871fceb54 ]
2020-09-18 13:23:30 -07:00
Wenkai Du
f0a303664e
Limit P2P channels on Rome
...
[ROCm/rccl commit: 42955f5f4f ]
2020-09-17 17:20:32 -07:00
lijietang
f6b08ca547
Add rccl bw test script in tools ( #255 )
...
[ROCm/rccl commit: bbe233f8c1 ]
2020-09-11 16:59:03 +08:00
Stanley Tsang
209133fadf
Adding the ability to force install dependencies (namely gtest); gtest library installation fix for centos ( #265 )
...
* Adding the ability to force install dependencies (namely gtest); gtest library installation fix for centos
* Removing potentially unnecessary dependencies from install script
[ROCm/rccl commit: 8c90aefb6d ]
2020-09-10 17:27:22 -06:00
Wenkai Du
a3402d6aeb
Merge pull request #262 from wenkaidu/alignment
...
Make data alignment requirements match the ISA manual
[ROCm/rccl commit: 60819dcf8d ]
2020-09-08 10:40:42 -07:00
Stanley Tsang
818b44e27d
Adding XNACK flags. ( #264 )
...
* Adding XNACK flags.
[ROCm/rccl commit: f2e5db7bf7 ]
2020-09-08 11:36:30 -06:00
Aaron Enye Shi
0a3a397481
Add RCCL Static Lib Creation with -fgpu-rdc
...
RCCL uses -fgpu-rdc to compile its source objects. When linking
the RCCL static library, the link and archive step must go through
hipcc and use the flag --emit-static-lib. When compiling
UnitTests, librccl.a must be consumed through -l and -L.
[ROCm/rccl commit: 958b213428 ]
2020-09-03 11:25:41 -04:00
Wenkai Du
09639a5d54
Fix broken profiling build ( #263 )
...
[ROCm/rccl commit: e2042ccf8a ]
2020-09-02 15:39:52 -07:00
Wenkai Du
81bf52ddee
gtest: add alltoallv test
...
[ROCm/rccl commit: b163a8898f ]
2020-09-02 21:28:32 +00:00
Wenkai Du
cfa1228504
Make data alignment requirements match the ISA manual
...
From https://developer.amd.com/wp-content/resources/Vega_Shader_ISA.pdf
8.1.7. Alignment
For Dword or larger reads or writes, the two LSBs of the byte-address
are ignored, thus forcing Dword alignment.
[ROCm/rccl commit: 4751992231 ]
2020-09-01 21:21:58 +00:00
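The alignment rule quoted in commit cfa1228504 above can be illustrated with a short sketch. This is not RCCL code: the helper names and sample addresses are hypothetical, and the only fact taken from the commit message is that the two least-significant bits of the byte address are ignored for Dword or larger accesses.

```python
DWORD_BYTES = 4  # one Dword is 4 bytes, so the low two address bits select a byte within it


def effective_dword_address(byte_address: int) -> int:
    """Clear the two LSBs, mimicking hardware that ignores them (hypothetical helper)."""
    return byte_address & ~(DWORD_BYTES - 1)


def is_dword_aligned(byte_address: int) -> bool:
    """True when the address already sits on a 4-byte boundary (hypothetical helper)."""
    return byte_address & (DWORD_BYTES - 1) == 0


if __name__ == "__main__":
    # Sample addresses are arbitrary illustration values.
    for addr in (0x1000, 0x1001, 0x1002, 0x1003, 0x1004):
        print(hex(addr), "->", hex(effective_dword_address(addr)), is_dword_aligned(addr))
```

Under this rule, a misaligned Dword access is silently redirected to the enclosing 4-byte boundary, which is presumably why software-side alignment checks need to agree with the hardware behaviour.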
Wenkai Du
778ab61097
Fix incorrect threads split in sendrecv ( #261 )
...
[ROCm/rccl commit: 4180e6409e ]
2020-08-31 17:33:22 -07:00
Wenkai Du
03bb6bcb54
Increase minimal channels for gfx908 ( #259 )
...
[ROCm/rccl commit: c5cbece6d0 ]
2020-08-26 11:40:11 -07:00
Wenkai Du
0898fea746
Only use software barrier for synchronization ( #258 )
...
[ROCm/rccl commit: b0919dc46c ]
2020-08-25 13:16:34 -07:00
Wenkai Du
5f49a0e088
Add NPS4 support on some models ( #256 )
...
* Add NPS4 support on some models
* Add XML models
[ROCm/rccl commit: 391bbf3f1e ]
2020-08-19 11:03:20 -07:00
gilbertlee-amd
3e4ddd065b
Upgrading various TransferBench features ( #257 )
...
[ROCm/rccl commit: ec9af40fcd ]
2020-08-19 09:47:19 -06:00
Wenkai Du
3d5fb8142e
Add another Rome model ( #249 )
...
* Add another Rome model
* Add gfx908 4P3L models and support
* Revert "Use cached value for detecting GDR support only once"
This reverts commit 0108a1219d.
* Skip using ibverb for GPU direct RDMA detection
* Fine tune one Rome model
[ROCm/rccl commit: a51e4071e3 ]
2020-08-17 10:51:02 -07:00
gilbertlee-amd
1a9b00a7fd
Fixes to make TransferBench compile for hipclang ( #254 )
...
[ROCm/rccl commit: c985478133 ]
2020-08-13 12:25:28 -06:00
saadrahim
67bb880b8b
Adding gfx908 to CI ( #253 )
...
[ROCm/rccl commit: 6d8e19929c ]
2020-08-13 11:07:33 -06:00
Wenkai Du
f242a2f0b0
Collect gcnArch and hipDeviceArch_t in XML ( #252 )
...
[ROCm/rccl commit: 7e3d8a31cc ]
2020-08-12 15:48:38 -07:00
saadrahim
f309fb5b29
Cleaning up CI code by removing overrides ( #251 )
...
[ROCm/rccl commit: 50af2e9b66 ]
2020-08-12 12:38:10 -06:00
Wenkai Du
e5ec2d94d5
Merge pull request #248 from wenkaidu/2.7.8
...
2.7.8
[ROCm/rccl commit: 066223333d ]
2020-08-11 08:20:37 -07:00
Wenkai Du
14ad6ff3b4
Merge remote-tracking branch 'nccl/master' into 2.7.8
...
[ROCm/rccl commit: 7e3f841fab ]
2020-08-10 16:11:00 +00:00
Wenkai Du
26c540abb8
Merge pull request #247 from wenkaidu/rome
...
Additional Rome models support
[ROCm/rccl commit: 3c46cb8ad4 ]
2020-08-07 10:56:12 -07:00
MurtadhaAldallal
f1373612b0
Update rccl_prim_test.cpp ( #246 )
...
Adding doublelocalcopy operation and freeing buffer memory at end.
DoubleLocalCopy Patch Added
[ROCm/rccl commit: 390c63cf0d ]
2020-08-07 08:20:14 -07:00
Wenkai Du
c9815aaa36
Add more Rome 4P2H models
...
[ROCm/rccl commit: 09ef75656a ]
2020-08-06 18:20:02 +00:00
Stanley Tsang
bbc4b72ebe
Adding static library building option. ( #244 )
...
* Adding static library building option.
* Disabling running tests for static build
* Removing static packaging in CI
Co-authored-by: Saad Rahim <saad.rahim@amd.com>
[ROCm/rccl commit: c5d4d9eb76 ]
2020-08-06 11:19:43 -06:00
saadrahim
e5432857db
Download GTest if not found in system ( #237 )
...
Co-authored-by: Stanley Tsang <stanley.tsang@amd.com>
[ROCm/rccl commit: 0dc019e35f ]
2020-08-06 09:36:58 -06:00
Jack Snyder
dca19952fd
Setting type when gpu sub node is discovered
...
[ROCm/rccl commit: de49a77074 ]
2020-08-05 13:39:23 -07:00
Eric Badger
d6a78cb1c7
Don't require NIC devices to have specific PCI class
...
If a PCI node is the parent of a NIC, treat it as such, regardless of
the PCI class code for the device. This allows non-traditional devices
to act as NICs via the net plugin mechanism.
For consistency, treat GPUs similarly.
[ROCm/rccl commit: 700c0e0f24 ]
2020-08-05 12:46:29 -07:00
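The classification rule described in commit d6a78cb1c7 above can be sketched as follows. This is an illustration only, not the RCCL implementation: the XML layout, the element names (`pci`, `nic`, `gpu`), the sample bus IDs and class codes, and the function `classify_pci_node` are all assumptions made for the example. The one idea taken from the commit message is that a PCI node is typed by what it is the parent of, not by its PCI class code.

```python
import xml.etree.ElementTree as ET


def classify_pci_node(pci: ET.Element) -> str:
    """Type a PCI node by its children, ignoring its class attribute (hypothetical helper)."""
    if pci.find("nic") is not None:
        return "NIC"
    if pci.find("gpu") is not None:
        return "GPU"
    return "PCI"


# Hypothetical topology fragment; bus IDs and class codes are illustrative values.
SAMPLE = """
<system>
  <pci busid="0000:41:00.0" class="0x020000"><nic/></pci>
  <pci busid="0000:42:00.0" class="0x120000"><nic/></pci>
  <pci busid="0000:43:00.0" class="0x030000"><gpu/></pci>
  <pci busid="0000:40:00.0" class="0x060400"/>
</system>
"""

if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    for pci in root.iter("pci"):
        print(pci.get("busid"), pci.get("class"), classify_pci_node(pci))
```

In this sketch the second device is treated as a NIC even though its class code is not a network-controller class, which is the situation the commit message describes for non-traditional devices exposed through the net plugin mechanism.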
Wenkai Du
5f96f13e75
Allow ring setup through NCCL_RINGS to facilitate testing
...
[ROCm/rccl commit: 5b03132ace ]
2020-08-04 21:07:00 +00:00
Wenkai Du
22a6211eaf
Improve 4P2H topology on Rome ( #243 )
...
1. Use bi-directional rings
2. GPU search is sorted by PCI device ID to get consistent results
[ROCm/rccl commit: d1e20b4c5e ]
2020-07-28 14:21:44 -07:00
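Point 2 of commit 22a6211eaf above (sorting the GPU search so that results are consistent) can be sketched like this. The sketch assumes that the "PCI device ID" is a bus address in the usual domain:bus:device.function form; the function names and sample addresses are hypothetical and are not taken from RCCL.

```python
def parse_bus_id(bus_id: str) -> tuple:
    """Split 'dddd:bb:dd.f' into integers (domain, bus, device, function)."""
    domain, bus, dev_fn = bus_id.split(":")
    device, function = dev_fn.split(".")
    return (int(domain, 16), int(bus, 16), int(device, 16), int(function, 16))


def sort_gpus(bus_ids):
    """Return the GPUs in a fixed order that does not depend on discovery order."""
    return sorted(bus_ids, key=parse_bus_id)


if __name__ == "__main__":
    # Hypothetical discovery order; a second run could enumerate the same GPUs differently.
    discovered = ["0000:c3:00.0", "0000:23:00.0", "0000:a3:00.0", "0000:03:00.0"]
    print(sort_gpus(discovered))
```

Sorting on a stable hardware key means any search that starts from this list visits GPUs in the same order on every run, whatever order enumeration happened to return.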
David Addison
4acaffebb3
2.7.8-1
...
Fix collective mismatch error when using ncclSend/ncclRecv
[ROCm/rccl commit: 033d799524 ]
2020-07-27 16:34:09 -07:00
Wenkai Du
487f93b83f
Topology tuning for 4P2H on Rome ( #242 )
...
* Topology tuning for 4P2H on Rome
* Use ncclTopoIdToIndex
[ROCm/rccl commit: e7a10aa0e4 ]
2020-07-27 11:53:57 -07:00
Wenkai Du
49c667ac8a
ib-test: support multiple channels ( #241 )
...
[ROCm/rccl commit: 8d5fb920b6 ]
2020-07-27 11:03:12 -07:00
Sourav Chakraborty
e55a3f20ba
add 4 node 8P6L 1 NIC 2nd Hive model
...
[ROCm/rccl commit: 2475daafee ]
2020-07-22 16:27:15 +00:00
Sourav Chakraborty
b3306d1c13
simplify model definitions in topo expl
...
[ROCm/rccl commit: db55afb014 ]
2020-07-22 16:05:53 +00:00
Wenkai Du
f604fc774e
Add 8P6L multi-node models ( #239 )
...
[ROCm/rccl commit: d5f90e19b5 ]
2020-07-21 14:10:36 -07:00
Stanley Tsang
56d8c7c893
Adding better naming to unit tests for filtering; adding short and full unit test suites ( #235 )
...
[ROCm/rccl commit: 684f3e6af4 ]
2020-07-21 12:19:47 -06:00
Wenkai Du
cb80d93b11
Fix RCCL build package name ( #236 )
...
[ROCm/rccl commit: 35c5a7fe45 ]
2020-07-20 14:43:00 -07:00
saadrahim
f80136bbba
Changing GTest inclusion in cmake to use find_package ( #234 )
...
* GTest is used via find_package. No longer downloaded in cmake.
* Adding error handling
[ROCm/rccl commit: 99a491273f ]
2020-07-15 20:51:48 -06:00
saadrahim
b600c00292
Changing dependency to hip-rocclr ( #228 )
...
[ROCm/rccl commit: 7f93aa7e53 ]
2020-07-14 17:49:56 -06:00
Wenkai Du
3e2c9054cd
Change default channels duplication for chordal ring ( #233 )
...
[ROCm/rccl commit: ab787c767e ]
2020-07-14 15:16:50 -07:00
gilbertlee-amd
b83acc8032
Removing UnitTest as install, removing unused env var ( #231 )
...
[ROCm/rccl commit: f87ba17737 ]
2020-07-10 09:30:28 -06:00
Wenkai Du
47dfed5f01
Revert "Split primitive class to smaller structures" ( #230 )
...
This reverts commit 622b49e80a.
[ROCm/rccl commit: 5215130168 ]
2020-07-08 11:06:50 -07:00
Wenkai Du
4a3b58ac3a
Match RCCL package name to API version ( #229 )
...
[ROCm/rccl commit: 1addf4f196 ]
2020-07-07 13:30:39 -07:00
Riatre Foo
e791f8c8c7
Fix build action order
...
Add $(INCTARGETS) to build dependencies of %.o and $(DEVICELIB).
As there were no dep files during the first build, Make may kick off source
compilation before nccl.h got generated, which leads to occasional build
failures on systems with high core count. The build failure could be
reproduced reliably with a `sleep 5` in $(INCDIR)/nccl.h rule.
[ROCm/rccl commit: 2d8601701d ]
2020-07-07 10:20:51 -07:00
Stanley Tsang
3dd54437fe
Adding appropriate references in rccl-prim-test ( #227 )
...
Adding appropriate references to rccl-prim-test.
[ROCm/rccl commit: 9bd4c14603 ]
2020-07-06 10:15:03 -06:00
Wenkai Du
c7805f224e
Merge remote-tracking branch 'nccl/master' into develop
...
[ROCm/rccl commit: da3b197d6c ]
2020-07-01 16:51:25 -07:00
Wenkai Du
2f99c7bbad
topo_expl: each rank needs to have its own memory for graphs ( #225 )
...
[ROCm/rccl commit: d3548cc474 ]
2020-07-01 15:11:02 -07:00
Wenkai Du
e8da2a0da6
topo_expl: fix broken build ( #224 )
...
[ROCm/rccl commit: a6be82f5ab ]
2020-06-30 11:11:23 -07:00