500 Commits

Author SHA1 Message Date
Wenkai Du dcad0ef7cb Fix incorrect pointer checking for scatter and gather (#285) 2020-10-19 13:27:09 -07:00
gilbertlee-amd 9b3f762b68 Removing unnecessary flags from CI (#278)
* Removing unnecessary flags from CI

* Re-adding HSA_FORCE_FINE_GRAIN_PCIE in CI
2020-10-19 13:08:24 -06:00
saadrahim 49aa6d7afe Updating copyright for documentation (#282) 2020-10-19 13:07:15 -06:00
Wenkai Du a7deecb104 Merge pull request #279 from wenkaidu/nccl_sync
Sync up with latest NCCL master branch
2020-10-16 11:21:35 -07:00
Eiden Yoshida 205b5507b4 Update sramecc and xnack to ANY (#284)
Co-authored-by: Tony <Tony.Tye@amd.com>
Co-authored-by: Wenkai Du <Wenkai.Du@amd.com>
2020-10-16 00:25:18 -06:00
Wenkai Du c835d8263a Merge remote-tracking branch 'nccl/master' into nccl_sync 2020-10-15 18:42:38 -04:00
gilbertlee-amd 84a2541e01 Revert "Initial support for clique-based kernels (#276)" (#280)
This reverts commit 2b8184808d.
2020-10-15 11:30:18 -07:00
Sylvain Jeaugey 0e14394c5f Fix affinity move 2020-10-13 16:58:05 -07:00
Sylvain Jeaugey c6dbdb0084 Make sure proxy threads inherit the CPU affinity. 2020-10-13 16:37:52 -07:00
Wenkai Du 33babcb5e2 Update Rome single node models (#277) 2020-10-13 13:33:09 -07:00
gilbertlee-amd 2b8184808d Initial support for clique-based kernels (#276)
* Initial support for clique-based kernels
2020-10-13 11:22:04 -06:00
Wenkai Du ae008fd2db Rework Rome detection and add multiple network ports models (#274)
* Rework Rome detection and add multiple network ports models

* Remove unused opCount in p2p transport
2020-10-07 13:37:36 -07:00
Wenkai Du 88a062342b Don't download GTest unless building unit test (#275) 2020-10-02 15:25:40 -07:00
Wenkai Du b871ea3c0c Add Alltoallv RCCL kernel implementation (#269)
* Add alltoallv API and implementation

* Extend Rome P2P channel limit to multinode and alltoall kernels

* topo_expl: fix compilation and sync up with main

* gtest: use RCCL alltoallv API

* Code review changes
2020-09-30 16:25:36 -07:00
nunnikri aa985bfb7e SWDEV-253325 : Changing amdgpu-target to cuda-gpu-arch (#268) 2020-09-25 15:44:56 -06:00
Stanley Tsang acca2ae20a Updating inline asm to not require explicit L1 cache invalidation (#270) 2020-09-25 13:46:26 -06:00
gilbertlee-amd ee262819a7 New TransferBench features (#273)
* Upgrading TransferBench to support pinned CPU memory, expanding functionality, cleaning up env vars
2020-09-25 12:20:48 -06:00
gilbertlee-amd 01bd2573db Changes to topology based on XGMI (#272)
* Alterations to topology search to improve XGMI-enabled nodes
2020-09-25 12:20:09 -06:00
Wenkai Du 44fcde7835 Ensure all ranks on same send/receive or alltoall kernel path (#271) 2020-09-24 08:25:04 -07:00
Wenkai Du d871fceb54 Change network plugin name to librccl-net.so (#266) 2020-09-18 13:23:30 -07:00
Wenkai Du 45a8f09e97 Merge pull request #267 from wenkaidu/p2p
Limit P2P channels on Rome
2020-09-18 11:35:35 -07:00
Wenkai Du 42955f5f4f Limit P2P channels on Rome 2020-09-17 17:20:32 -07:00
lijietang bbe233f8c1 Add rccl bw test script in tools (#255) 2020-09-11 16:59:03 +08:00
Stanley Tsang 8c90aefb6d Adding the ability to force install dependencies (namely gtest); gtest library installation fix for centos (#265)
* Adding the ability to force install dependencies (namely gtest); gtest library installation fix for centos

* Removing potentially unnecessary dependencies from install script
2020-09-10 17:27:22 -06:00
Wenkai Du 60819dcf8d Merge pull request #262 from wenkaidu/alignment
Make data alignment requirements matching ISA manual
2020-09-08 10:40:42 -07:00
Stanley Tsang f2e5db7bf7 Adding XNACK flags. (#264)
* Adding XNACK flags.
2020-09-08 11:36:30 -06:00
Aaron Enye Shi 958b213428 Add RCCL Static Lib Creation with -fgpu-rdc
RCCL uses -fgpu-rdc to compile its source objects. When linking
the RCCL static library, the link and archive step must go through
hipcc and use the flag --emit-static-lib. When compiling
UnitTests, the librccl.a must be consumed through -l and -L.
2020-09-03 11:25:41 -04:00
Wenkai Du e2042ccf8a Fix broken profiling build (#263) 2020-09-02 15:39:52 -07:00
Wenkai Du b163a8898f gtest: add alltoallv test 2020-09-02 21:28:32 +00:00
Wenkai Du 4751992231 Make data alignment requirements matching ISA manual
From https://developer.amd.com/wp-content/resources/Vega_Shader_ISA.pdf

8.1.7. Alignment
For Dword or larger reads or writes, the two LSBs of the byte-address
are ignored, thus forcing Dword alignment.
2020-09-01 21:21:58 +00:00
Wenkai Du 4180e6409e Fix incorrect threads split in sendrecv (#261) 2020-08-31 17:33:22 -07:00
Wenkai Du c5cbece6d0 Increase minimal channels for gfx908 (#259) 2020-08-26 11:40:11 -07:00
Wenkai Du b0919dc46c Only use software barrier for synchronization (#258) 2020-08-25 13:16:34 -07:00
Wenkai Du 391bbf3f1e Add NPS4 support on some models (#256)
* Add NPS4 support on some models

* Add XML models
2020-08-19 11:03:20 -07:00
gilbertlee-amd ec9af40fcd Upgrading various TransferBench features (#257) 2020-08-19 09:47:19 -06:00
Wenkai Du a51e4071e3 Add another Rome model (#249)
* Add another Rome model

* Add gfx908 4P3L models and support

* Revert "Use cached value for detecting GDR support only once"

This reverts commit 67c8e72ce3.

* Skip using ibverb for GPU direct RDMA detection

* Fine tune one Rome model
2020-08-17 10:51:02 -07:00
gilbertlee-amd c985478133 Fixes to make TransferBench compile for hipclang (#254) 2020-08-13 12:25:28 -06:00
saadrahim 6d8e19929c Adding gfx908 to CI (#253) 2020-08-13 11:07:33 -06:00
Wenkai Du 7e3d8a31cc Collect gcnArch and hipDeviceArch_t in XML (#252) 2020-08-12 15:48:38 -07:00
saadrahim 50af2e9b66 Cleaning up CI code by removing overrides (#251) 2020-08-12 12:38:10 -06:00
Wenkai Du 066223333d Merge pull request #248 from wenkaidu/2.7.8
2.7.8
2020-08-11 08:20:37 -07:00
Wenkai Du 7e3f841fab Merge remote-tracking branch 'nccl/master' into 2.7.8 2020-08-10 16:11:00 +00:00
Wenkai Du 3c46cb8ad4 Merge pull request #247 from wenkaidu/rome
Additional Rome models support
2020-08-07 10:56:12 -07:00
MurtadhaAldallal 390c63cf0d Update rccl_prim_test.cpp (#246)
Adding doublelocalcopy operation and freeing buffer memory at end.
DoubleLocalCopy Patch Added
2020-08-07 08:20:14 -07:00
Wenkai Du 09ef75656a Add more Rome 4P2H models 2020-08-06 18:20:02 +00:00
Stanley Tsang c5d4d9eb76 Adding static library building option. (#244)
* Adding static library building option.

* Disabling running tests for static build

* Removing static packaging in CI

Co-authored-by: Saad Rahim <saad.rahim@amd.com>
2020-08-06 11:19:43 -06:00
saadrahim 0dc019e35f Download GTest if not found in system (#237)
Co-authored-by: Stanley Tsang <stanley.tsang@amd.com>
2020-08-06 09:36:58 -06:00
Jack Snyder de49a77074 Setting type when gpu sub node is discovered 2020-08-05 13:39:23 -07:00
Sylvain Jeaugey 3d63f89068 Merge pull request #364 from badgerious/net-class
Add GPUs and NICs based on XML sub tags instead of PCI class.
2020-08-05 12:52:38 -07:00
Eric Badger 700c0e0f24 Don't require NIC devices to have specific PCI class
If a PCI node is the parent of a NIC, treat it as such, regardless of
the PCI class code for the device. This allows non-traditional devices
to act as NICs via the net plugin mechanism.

For consistency, treat GPUs similarly.
2020-08-05 12:46:29 -07:00