rocm-systems

Author	SHA1	Message	Date
David Addison	4acaffebb3	2.7.8-1 Fix collective mismatch error when using ncclSend/ncclRecv [ROCm/rccl commit: `033d799524`]	2020-07-27 16:34:09 -07:00
Sylvain Jeaugey	89072c82e5	2.7.3-1 Add support for A100 GPU and related platforms. Add support for CUDA 11. Add support for send/receive operations (beta). [ROCm/rccl commit: `5949d96f36`]	2020-06-08 09:31:44 -07:00
Sylvain Jeaugey	40adc74496	2.6.4-1 Add support for network collectives. Add support for XML topology dump/injection. Add text values for GDR and P2P Levels, including "NVL". Add speed detection for PCI, InfiniBand and Ethernet cards. Add CPU detection for ARM and AMD CPUs. Add support for adaptive routing on InfiniBand. Change NET plugin API to v3: merge PCI path and GPU pointer capability into a single structure and add other properties. [ROCm/rccl commit: `b221128eca`]	2020-03-20 14:58:36 -07:00
Christian Sigg	ff74ebdcea	Fix clang build (#274). The attribute is called `optnone`, not `noopt`. [ROCm/rccl commit: `3899f6e0f2`]	2019-12-09 09:31:13 -08:00
Sylvain Jeaugey	e5a17ee58d	Fix clang compilation [ROCm/rccl commit: `aa15dfb29c`]	2019-12-06 09:55:54 -08:00
Christian Sigg	4984d5ce0b	Fix clang build (#271). Clang doesn't understand `optimize("O0")`. It has `noopt`, which GCC doesn't understand. Wrap the difference in a macro. [ROCm/rccl commit: `8c564e9b57`]	2019-12-06 09:14:55 -08:00
Sylvain Jeaugey	71560fd67b	2.5.6-1 (#255) Add LL128 Protocol. Rewrite the topology detection and tree/ring creation (#179). Improve tree performance by sending/receiving from different GPUs. Add model-based tuning to switch between the different algorithms and protocols. Rework P2P/SHM detection in containers (#155, #248). Detect duplicated devices and return an error (#231). Add tuning for GCP. [ROCm/rccl commit: `299c554dcc`]	2019-11-19 14:57:39 -08:00
David Addison	d57c0b0f92	Updated PR#196 to use a common hash function [ROCm/rccl commit: `fad079a8ae`]	2019-08-14 10:08:39 -07:00
David Addison	bb5b11fa23	Merge branch 'shm' of git://github.com/lowintelligence/nccl into lowintelligence-shm [ROCm/rccl commit: `01d1836668`]	2019-08-14 09:45:45 -07:00
Ke Wen	3c13a4d1bb	Merge branch 'master' into HEAD [ROCm/rccl commit: `8e04d80382`]	2019-06-25 13:39:08 -07:00
Ke Wen	b91d8170f8	2.4.8-1 Fix #209: improve socket transport performance. Split transfers over multiple sockets. Launch multiple threads to drive sockets. Detect AWS NICs and set nsockets/nthreads accordingly. [ROCm/rccl commit: `7c72dee660`]	2019-06-25 13:22:47 -07:00
Felix Abecassis	d2f579ba8b	Fix out-of-bounds read in ncclStrToCpuset (#233). The affinityStr string was not null-terminated but was passed to strlen(3). Signed-off-by: Felix Abecassis <fabecassis@nvidia.com> [ROCm/rccl commit: `37e4f8729e`]	2019-06-21 10:25:08 +02:00
David Addison	17c8317cb1	NCCL 2.4.6-1 Added detection of IBM/Power NVLink bridge device. Add NUMA support to PCI distance calculations. Added NCCL_IGNORE_CPU_AFFINITY env var. Fix memory leaks; GithubIssue#180. Compiler warning fix; GithubIssue#178. Replace non-standard variable length arrays; GithubIssue#171. Fix Tree+Shared Memory crash; GithubPR#185. Fix LL cleanup hang during long-running DL jobs. Fix NCCL_RINGS environment variable handling. Added extra checks to catch repeat calls to ncclCommDestroy(); GithubIssue#191. Improve bootstrap socket connection reliability at scale. Fix hostname hashing issue; GithubIssue#187. Code cleanup to rename all non-device files from .cu to .cc. [ROCm/rccl commit: `f40ce73e89`]	2019-04-05 13:05:45 -07:00

13 Commits