Commit graph

199 Commits

Author SHA1 Message Date
Wenkai Du c424979c14 ll_latency_test: fix time calculation (#825)
* ll_latency_test: fix time calculation

* Measure time after barrier

* Read time stamp only from thread 0
2023-07-27 09:04:35 -07:00
Wenkai Du 1c1ec096e2 tools: Add LL latency test (#820)
* Add LL latency test

* Correct name in usage
2023-07-25 20:08:04 -07:00
Bertan Dogancay 8bab4f04b7 Implement RCCL Replayer (#817)
* Implement RCCL Replayer
2023-07-24 16:26:22 -06:00
Wenkai Du a7fcd58a97 Enable gfx94x (#808) (#816)
(cherry picked from commit 94da229a7788d74685d1591a4e75a8341de64f41)
2023-07-21 07:31:27 -07:00
Ziyue Yang b1cddcaf9a Add GPU P2P ping-pong latency test tool (#804)
* Add GPU P2P ping-pong latency test tool

* Address comments

* Fix IPC issue in gfx94x
2023-07-14 07:41:29 -07:00
Wenkai Du f41ea11444 rccl-prim-test: calculate iterations' standard deviation (#803)
* rccl-prim-test: calculate iterations' standard deviation

* Add default ring configuration for gfx940

* Use hipDeviceMallocUncached on gfx94x
2023-07-13 11:05:50 -07:00
Wenkai Du 43f13cd25a rccl-prim-test: calculate throughput standard deviations (#802) 2023-07-12 10:04:40 -07:00
Wenkai Du abd0615351 Merge remote-tracking branch 'nccl/master' into develop 2023-06-26 22:51:56 +00:00
Bertan Dogancay 0c77c66221 Disable Colltrace for --fast option (#778)
* Disable Colltrace for --fast option

* Limit nprocs for CI
2023-06-21 14:16:09 -06:00
Bertan Dogancay f35777e9b0 improve compilation time and create timetrace plot (#773)
* improve compilation time and create time-trace plot

* set default value for nproc
2023-06-14 09:17:51 -06:00
akolliasAMD 9cdac774ea Wall clock update and npkit trace script Update (#771)
* changed builtin clock to wall_clock64
* updated npkit_Trace_generator to the new version of npkit
2023-06-07 17:47:10 -06:00
gilbertlee-amd 20b567caac Updating NOTICES.txt and LICENSE.txt (#770) 2023-06-07 09:45:03 -06:00
Wenkai Du 3af90902c8 Add NCCL_NCHANNELS_PER_PEER override (#767)
Also fix topol_expl build issue
2023-06-06 08:41:38 -07:00
akolliasAMD 2b1efa9e9a added time results on npkit generator (#749) 2023-05-30 12:57:25 -06:00
akolliasAMD c88475462b added modified npkit_trace_generator.py to scripts (#738)
* added modified npkit_trace_generator.py to scripts
2023-05-09 10:11:35 -06:00
Wenkai Du addbf4bd90 rccl-prim-test: minor update (#718) 2023-04-03 07:30:04 -07:00
Ziyue Yang e3b2342f39 MSCCL: Improve executor and integrate scheduler (#694)
* MSCCL: improve executor and add scheduler for testing

* Use external scheduler

* Fix cmake error

* Address comments

* Fix thread safe issue

* Make MSCCL lifecycle APIs thread safe

* Make MSCCL internal scheduler aware of topology hint

* Revise error message
2023-03-14 14:34:25 -07:00
Wenkai Du e1cb45ff22 Merge remote-tracking branch 'nccl/master' into HEAD 2023-02-04 01:44:43 +00:00
Wenkai Du a0dd8e0b84 topo_expl: fix broken build by adding hipify steps (#670) 2023-01-06 07:29:40 -08:00
Ziyue Yang adafc0f759 Add MSCCL Support (#658)
* Add MSCCL support

* Add alignment and message size checking

* Fix nRanks checking, in-place and out-of-place tests and group call handling

* Fix hipGraph unit test

* Change MSCCL init warning to INFO

* Revise license info
2022-12-12 15:51:04 -08:00
gilbertlee-amd faed69f9fc Graph unit tests (#656)
* Adding hipGraph unit tests
2022-12-01 10:28:42 -07:00
Wenkai Du 94ad7f6f51 Update tuning table and fix topo_expl 2022-11-07 18:24:24 +00:00
Wenkai Du 4f0e223db4 Merge remote-tracking branch 'nccl/master' into develop 2022-10-20 15:41:29 +00:00
Wenkai Du fc554a2428 topo_expl: fix compilation error (#639) 2022-10-19 14:19:50 -07:00
gilbertlee-amd 10dbd2a452 Fixing formatting for copywrite (#638) 2022-10-19 13:43:21 -06:00
gilbertlee-amd ebb8b5bf63 Updating files for missing licenses (#637) 2022-10-14 13:49:16 -06:00
gilbertlee-amd bd7d589446 Removing TransferBench from tools (#632)
Point to new TransferBench repo
2022-09-30 11:53:32 -06:00
Wen-Heng (Jack) Chung 84054c3b30 Tweak unroll factors. 2022-09-22 13:03:04 -05:00
Gilbert Lee 009e79623f Merge branch 'develop' into 2.13.4 2022-09-09 23:07:04 +00:00
gilbertlee-amd dd56135a9a Updating stream caching (#614)
- Adding non-captured hipStream for use in setup
2022-09-09 16:30:15 -06:00
gilbertlee-amd 65d78e9a1d GraphBench (#613)
Adding simple GraphBench tool for comparing RCCL hipGraph performance
2022-09-09 12:12:25 -06:00
Wenkai Du a79d9e3586 Merge remote-tracking branch 'nccl/master' into develop 2022-09-09 16:05:38 +00:00
akolliasAMD 06bce9d0c9 added stream synch after hipMemset (#609) 2022-08-30 16:18:37 -06:00
arvindcheru 2cb2f9493a HIP Path default updated to ROCM_PATH (reorg path) (#592)
Updated default path for hip to ROCM_PATH (/opt/rocm instead of /opt/rocm/hip) as per new/current structure.
2022-08-04 13:38:41 -04:00
Edgar 0336ffdf70 Introduce multi-rank support per device.
This is a single commit of the source code changes required to
introduce support for multiple ranks per device.
A new interface (ncclCommRankInitMulti) has to be used to make use of
this new feature.
2022-06-10 14:23:12 +00:00
Wenkai Du ef499c4810 Add another Rome model (#553)
* Add another Rome model

* Add option to force enable intranet on single node

* Limit p2p channels to number of ranks

* Refine p2p channels handling
2022-05-31 11:31:30 -07:00
Wenkai Du c5b77121f0 Update Rome model (#552) 2022-05-26 09:59:23 -07:00
akolliasAMD 98f0809a39 Added creation of new tree and added switch for using treesplit for specific cases (#551) 2022-05-25 18:55:14 -04:00
Wenkai Du 283dc86a73 Refine and add new Rome models (#548) 2022-05-17 08:23:59 -07:00
gilbertlee-amd 685bcea127 [TransferBench] Syncing with TransferBench v1.02 (#541) 2022-04-27 20:43:24 -06:00
Wenkai Du 063da25563 topo_expl: fix build and add tuning support (#539) 2022-04-26 15:40:07 -07:00
Wenkai Du d28e1cb44f Merge remote-tracking branch 'nccl/master' into develop 2022-04-18 11:15:25 -07:00
Wenkai Du 2151c79d14 Add new Rome model (#536) 2022-04-13 11:45:40 -07:00
Wenkai Du ba4c165bf3 Add new Rome model (#535) 2022-04-12 13:27:32 -07:00
gilbertlee-amd def6832287 Transfer bench single stream mode (#531)
- Adding single stream mode
- Removing some unused env vars
- Adding output to CSV mode for p2p benchmark, topology listing modes
2022-04-08 15:20:55 -06:00
Wenkai Du bbe780ca6c Support multiple tuning tables (#522)
* Support multiple tuning tables

* [UnitTests] Skip managed memory testing
2022-03-31 17:09:21 -07:00
gilbertlee-amd 2d558c9abc Adding explicit request for coarse-grained host memory due to changes in HipHostMalloc (#517) 2022-03-25 13:05:07 -06:00
Wenkai Du cd17cf6dce Update Rome model matching and add new models (#516)
* Update Rome model matching and add new models

* Add missing file

* Models update
2022-03-21 10:54:40 -07:00
Ziyue Yang b569c0a1db Add Pivot AllToAll algorithm for Rome model (#503)
* add a2a pivot interface

* remove debug info

* address comments

* fix bug

* remove custom script

* address comments

* fix bug
2022-02-20 21:09:47 -08:00
gilbertlee-amd f3c2cafd9d [TransferBench] Fix for cases with subsets of configured numa nodes (#495) 2022-02-07 12:16:19 -07:00