Commit graph

407 commits

Author SHA1 Message Date
Wenkai Du 37f7eec6b7 Change network plugin name to librccl-net.so (#266)
[ROCm/rccl commit: d871fceb54]
2020-09-18 13:23:30 -07:00
Wenkai Du f0a303664e Limit P2P channels on Rome
[ROCm/rccl commit: 42955f5f4f]
2020-09-17 17:20:32 -07:00
lijietang f6b08ca547 Add rccl bw test script in tools (#255)
[ROCm/rccl commit: bbe233f8c1]
2020-09-11 16:59:03 +08:00
Stanley Tsang 209133fadf Adding the ability to force install dependencies (namely gtest); gtest library installation fix for centos (#265)
* Adding the ability to force install dependencies (namely gtest); gtest library installation fix for centos

* Removing potentially unnecessary dependencies from install script

[ROCm/rccl commit: 8c90aefb6d]
2020-09-10 17:27:22 -06:00
Wenkai Du a3402d6aeb Merge pull request #262 from wenkaidu/alignment
Make data alignment requirements matching ISA manual

[ROCm/rccl commit: 60819dcf8d]
2020-09-08 10:40:42 -07:00
Stanley Tsang 818b44e27d Adding XNACK flags. (#264)
* Adding XNACK flags.

[ROCm/rccl commit: f2e5db7bf7]
2020-09-08 11:36:30 -06:00
Aaron Enye Shi 0a3a397481 Add RCCL Static Lib Creation with -fgpu-rdc
RCCL uses -fgpu-rdc to compile its source objects. When linking
the RCCL static library, the link and archive step must go through
hipcc and use the flag --emit-static-lib. When compiling
UnitTests, the librccl.a must be consumed through -l and -L.


[ROCm/rccl commit: 958b213428]
2020-09-03 11:25:41 -04:00
Wenkai Du 09639a5d54 Fix broken profiling build (#263)
[ROCm/rccl commit: e2042ccf8a]
2020-09-02 15:39:52 -07:00
Wenkai Du 81bf52ddee gtest: add alltoallv test
[ROCm/rccl commit: b163a8898f]
2020-09-02 21:28:32 +00:00
Wenkai Du cfa1228504 Make data alignment requirements matching ISA manual
From https://developer.amd.com/wp-content/resources/Vega_Shader_ISA.pdf

8.1.7. Alignment
For Dword or larger reads or writes, the two LSBs of the byte-address
are ignored, thus forcing Dword alignment.


[ROCm/rccl commit: 4751992231]
2020-09-01 21:21:58 +00:00
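The rule quoted from section 8.1.7 of the Vega ISA manual in commit cfa1228504 can be illustrated in a few lines of C. This is a minimal sketch: `effective_dword_address` and `is_dword_aligned` are hypothetical helpers, not RCCL functions, and they model only the quoted sentence (the hardware ignoring the two LSBs of the byte address), not RCCL's actual alignment checks.

```c
#include <stdint.h>

/* Hypothetical helper: for a Dword (4-byte) or larger access, the two
 * least significant bits of the byte address are ignored, so the access
 * is performed at the enclosing 4-byte boundary. */
static uint64_t effective_dword_address(uint64_t byte_addr) {
    return byte_addr & ~(uint64_t)0x3; /* clear bits 0 and 1 */
}

/* Hypothetical helper: returns 1 when a Dword access at byte_addr is
 * performed at exactly the requested address (already 4-byte aligned). */
static int is_dword_aligned(uint64_t byte_addr) {
    return (byte_addr & 0x3) == 0;
}
```

Under this model, a Dword access requested at 0x1006 is actually performed at 0x1004, which is why data touched by Dword or wider accesses needs to be at least 4-byte aligned to be read or written where the caller intended.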
Wenkai Du 778ab61097 Fix incorrect threads split in sendrecv (#261)
[ROCm/rccl commit: 4180e6409e]
2020-08-31 17:33:22 -07:00
Wenkai Du 03bb6bcb54 Increase minimal channels for gfx908 (#259)
[ROCm/rccl commit: c5cbece6d0]
2020-08-26 11:40:11 -07:00
Wenkai Du 0898fea746 Only use software barrier for synchronization (#258)
[ROCm/rccl commit: b0919dc46c]
2020-08-25 13:16:34 -07:00
Wenkai Du 5f49a0e088 Add NPS4 support on some models (#256)
* Add NPS4 support on some models

* Add XML models

[ROCm/rccl commit: 391bbf3f1e]
2020-08-19 11:03:20 -07:00
gilbertlee-amd 3e4ddd065b Upgrading various TransferBench features (#257)
[ROCm/rccl commit: ec9af40fcd]
2020-08-19 09:47:19 -06:00
Wenkai Du 3d5fb8142e Add another Rome model (#249)
* Add another Rome model

* Add gfx908 4P3L models and support

* Revert "Use cached value for detecting GDR support only once"

This reverts commit 0108a1219d.

* Skip using ibverb for GPU direct RDMA detection

* Fine tune one Rome model

[ROCm/rccl commit: a51e4071e3]
2020-08-17 10:51:02 -07:00
gilbertlee-amd 1a9b00a7fd Fixes to make TransferBench compile for hipclang (#254)
[ROCm/rccl commit: c985478133]
2020-08-13 12:25:28 -06:00
saadrahim 67bb880b8b Adding gfx908 to CI (#253)
[ROCm/rccl commit: 6d8e19929c]
2020-08-13 11:07:33 -06:00
Wenkai Du f242a2f0b0 Collect gcnArch and hipDeviceArch_t in XML (#252)
[ROCm/rccl commit: 7e3d8a31cc]
2020-08-12 15:48:38 -07:00
saadrahim f309fb5b29 Cleaning up CI code by removing overrides (#251)
[ROCm/rccl commit: 50af2e9b66]
2020-08-12 12:38:10 -06:00
Wenkai Du e5ec2d94d5 Merge pull request #248 from wenkaidu/2.7.8
2.7.8

[ROCm/rccl commit: 066223333d]
2020-08-11 08:20:37 -07:00
Wenkai Du 14ad6ff3b4 Merge remote-tracking branch 'nccl/master' into 2.7.8
[ROCm/rccl commit: 7e3f841fab]
2020-08-10 16:11:00 +00:00
Wenkai Du 26c540abb8 Merge pull request #247 from wenkaidu/rome
Additional Rome models support

[ROCm/rccl commit: 3c46cb8ad4]
2020-08-07 10:56:12 -07:00
MurtadhaAldallal f1373612b0 Update rccl_prim_test.cpp (#246)
Adding doublelocalcopy operation and freeing buffer memory at end.
DoubleLocalCopy Patch Added

[ROCm/rccl commit: 390c63cf0d]
2020-08-07 08:20:14 -07:00
Wenkai Du c9815aaa36 Add more Rome 4P2H models
[ROCm/rccl commit: 09ef75656a]
2020-08-06 18:20:02 +00:00
Stanley Tsang bbc4b72ebe Adding static library building option. (#244)
* Adding static library building option.

* Disabling running tests for static build

* Removing static packaging in CI

Co-authored-by: Saad Rahim <saad.rahim@amd.com>

[ROCm/rccl commit: c5d4d9eb76]
2020-08-06 11:19:43 -06:00
saadrahim e5432857db Download GTest if not found in system (#237)
Co-authored-by: Stanley Tsang <stanley.tsang@amd.com>

[ROCm/rccl commit: 0dc019e35f]
2020-08-06 09:36:58 -06:00
Jack Snyder dca19952fd Setting type when gpu sub node is discovered
[ROCm/rccl commit: de49a77074]
2020-08-05 13:39:23 -07:00
Eric Badger d6a78cb1c7 Don't require NIC devices to have specific PCI class
If a PCI node is the parent of a NIC, treat it as such, regardless of
the PCI class code for the device. This allows non-traditional devices
to act as NICs via the net plugin mechanism.

For consistency, treat GPUs similarly.


[ROCm/rccl commit: 700c0e0f24]
2020-08-05 12:46:29 -07:00
Wenkai Du 5f96f13e75 Allow setup ring through NCCL_RINGS to facilitate testing
[ROCm/rccl commit: 5b03132ace]
2020-08-04 21:07:00 +00:00
Wenkai Du 22a6211eaf Improve 4P2H topology on Rome (#243)
1. Use bi-directional rings
2. GPU search is sorted by PCI device ID to get consistent results

[ROCm/rccl commit: d1e20b4c5e]
2020-07-28 14:21:44 -07:00
David Addison 4acaffebb3 2.7.8-1
Fix collective mismatch error when using ncclSend/ncclRecv


[ROCm/rccl commit: 033d799524]
2020-07-27 16:34:09 -07:00
Wenkai Du 487f93b83f Topology tuning for 4P2H on Rome (#242)
* Topology tuning for 4P2H on Rome

* Use ncclTopoIdToIndex

[ROCm/rccl commit: e7a10aa0e4]
2020-07-27 11:53:57 -07:00
Wenkai Du 49c667ac8a ib-test: support multiple channels (#241)
[ROCm/rccl commit: 8d5fb920b6]
2020-07-27 11:03:12 -07:00
Sourav Chakraborty e55a3f20ba add 4 node 8P6L 1 NIC 2nd Hive model
[ROCm/rccl commit: 2475daafee]
2020-07-22 16:27:15 +00:00
Sourav Chakraborty b3306d1c13 simplify model definitions in topo expl
[ROCm/rccl commit: db55afb014]
2020-07-22 16:05:53 +00:00
Wenkai Du f604fc774e Add 8P6L multi-node models (#239)
[ROCm/rccl commit: d5f90e19b5]
2020-07-21 14:10:36 -07:00
Stanley Tsang 56d8c7c893 Adding better naming to unit tests for filtering; adding short and full unit test suites (#235)
[ROCm/rccl commit: 684f3e6af4]
2020-07-21 12:19:47 -06:00
Wenkai Du cb80d93b11 Fix RCCL build package name (#236)
[ROCm/rccl commit: 35c5a7fe45]
2020-07-20 14:43:00 -07:00
saadrahim f80136bbba Changing GTest inclusion in cmake to use find_package (#234)
* GTest is used via find_package. No longer downloaded in cmake.

* Adding error handling

[ROCm/rccl commit: 99a491273f]
2020-07-15 20:51:48 -06:00
saadrahim b600c00292 Changing dependency to hip-rocclr (#228)
[ROCm/rccl commit: 7f93aa7e53]
2020-07-14 17:49:56 -06:00
Wenkai Du 3e2c9054cd Change default channels duplication for chordal ring (#233)
[ROCm/rccl commit: ab787c767e]
2020-07-14 15:16:50 -07:00
gilbertlee-amd b83acc8032 Removing UnitTest as install, removing unused env var (#231)
[ROCm/rccl commit: f87ba17737]
2020-07-10 09:30:28 -06:00
Wenkai Du 47dfed5f01 Revert "Split primitive class to smaller structures" (#230)
This reverts commit 622b49e80a.

[ROCm/rccl commit: 5215130168]
2020-07-08 11:06:50 -07:00
Wenkai Du 4a3b58ac3a Match RCCL package name to API version (#229)
[ROCm/rccl commit: 1addf4f196]
2020-07-07 13:30:39 -07:00
Riatre Foo e791f8c8c7 Fix build action order
Add $(INCTARGETS) to build dependencies of %.o and $(DEVICELIB).
As there were no dep files during the first build, Make may kick off source
compilation before nccl.h got generated, which leads to occasional build
failures on systems with high core count. The build failure could be
reproduced reliably with a `sleep 5` in $(INCDIR)/nccl.h rule.


[ROCm/rccl commit: 2d8601701d]
2020-07-07 10:20:51 -07:00
Stanley Tsang 3dd54437fe Adding appropriate references in rccl-prim-test (#227)
Adding appropriate references to rccl-prim-test.

[ROCm/rccl commit: 9bd4c14603]
2020-07-06 10:15:03 -06:00
Wenkai Du c7805f224e Merge remote-tracking branch 'nccl/master' into develop
[ROCm/rccl commit: da3b197d6c]
2020-07-01 16:51:25 -07:00
Wenkai Du 2f99c7bbad topo_expl: each rank needs to have its own memory for graphs (#225)
[ROCm/rccl commit: d3548cc474]
2020-07-01 15:11:02 -07:00
Wenkai Du e8da2a0da6 topo_expl: fix broken build (#224)
[ROCm/rccl commit: a6be82f5ab]
2020-06-30 11:11:23 -07:00