* Support gfx950 in topo_expl
* Fix dependencies and fetch fmt from sources
* Remove third_party folder in make clean
* Add empty target when fmt is found
* Add MI350 example
* Update README.md
---------
Co-authored-by: isaki001 <ioannissakiotis@gmail.com>
[ROCm/rccl commit: dfad51e3c9]
* Internal RCCL/NCCL functionality exposed when RCCL_EXPOSE_STATIC is enabled
* Algo/protocol/max channels can be obtained with the new RCCL API
* Introduce rccl_static and rccl_static_inline macros to work around invisible functions in core source files like enqueue.cc
* Add usage example in topo-explorer tool
[ROCm/rccl commit: 82afb2bcfe]
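One way the rccl_static and rccl_static_inline macros described above could work is sketched below. This is an assumption for illustration, not the definitions from the RCCL sources: the macro bodies and the two example helpers (exampleInternalHelper, exampleClamp) are hypothetical. The idea is that functions keep internal linkage by default and become visible to other translation units only when RCCL_EXPOSE_STATIC is defined.

```cpp
// Hypothetical sketch; the real macro definitions in RCCL may differ.
#ifdef RCCL_EXPOSE_STATIC
  // Drop internal linkage so the function can be called from other files.
  #define rccl_static
  #define rccl_static_inline inline
#else
  // Default build: keep the functions private to their translation unit.
  #define rccl_static static
  #define rccl_static_inline static inline
#endif

// Illustrative helper that would normally be file-local.
rccl_static int exampleInternalHelper(int nChannels) {
  // Fall back to a single channel when the request is not positive.
  return nChannels > 0 ? nChannels : 1;
}

// Illustrative inline helper: clamp v into the range [lo, hi].
rccl_static_inline int exampleClamp(int v, int lo, int hi) {
  return v < lo ? lo : (v > hi ? hi : v);
}
```

With this layout, a tool such as topo-explorer would only need to be compiled with RCCL_EXPOSE_STATIC defined to link against the otherwise hidden helpers.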
* Fix compiler issues due to broken compatibility
* Fix segfault, pass rank instead of busid, and add a pointer to cover a new algorithm
[ROCm/rccl commit: aace4e27f8]
* Remove gfx940 and gfx941
* Update "gfx94" to "gfx942" in init.cc
* Update remaining "gfx94" references to "gfx942"
* Update filenames and variables from gfx940 to gfx942
---------
Co-authored-by: akolliasAMD <akollias@amd.com>
[ROCm/rccl commit: 6505639cf4]
* Add Topologies for 16-GPU gfx942 SuperNode
- Add GigaIO topologies to tools/topo_expl for dev and testing
- Add GigaIO Columba 16 GPU romeModel and adjust topology
matching algorithm in rome_models for 16 GPU system
- Fix bug which failed to match Rome Model when using subsets
of system resources (i.e. ROCR_VISIBLE_DEVICES is set)
- Fixes for topo_expl
* Fix bug with 1H16P
[ROCm/rccl commit: a05329bd0d]
* Add another rome model and override
* Fix bug
* Fix typo
* Add ring
* Update ring
* Fix model matching
* Clean up
* Reverse rings for NCCL_RINGS input
* Only reverse NCCL_RINGS for ring graph
* Fix mapping issue when using NCCL_RINGS
* Add NCCL_RINGS_REMAP to handle inconsistent net names
[ROCm/rccl commit: 532b70afb6]
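The ring reversal mentioned above can be sketched as follows. This is a minimal illustration, not the RCCL implementation: the input format (ranks separated by spaces, rings separated by '|') is a simplifying assumption, and parseRings and reverseRings are hypothetical names.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Split a ring description into rings of rank indices.
// Assumed format (illustrative only): ranks separated by spaces, rings by '|'.
std::vector<std::vector<int>> parseRings(const std::string& text) {
  std::vector<std::vector<int>> rings;
  std::stringstream all(text);
  std::string one;
  while (std::getline(all, one, '|')) {
    std::stringstream ringStream(one);
    std::vector<int> ring;
    int rank;
    while (ringStream >> rank) ring.push_back(rank);
    if (!ring.empty()) rings.push_back(ring);  // skip empty segments
  }
  return rings;
}

// Reverse the traversal order of every ring in place.
void reverseRings(std::vector<std::vector<int>>& rings) {
  for (auto& ring : rings) std::reverse(ring.begin(), ring.end());
}
```

Under these assumptions, parsing "0 1 2 3" and reversing it yields the ring 3 2 1 0, so the same set of ranks is visited in the opposite direction.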
- Modifies the ring creation algorithm to be friendlier to rail-optimized topologies (should not affect classic fabric topologies)
[ROCm/rccl commit: 4cb62f999a]
* initial checkin
* resolve cr comments
* resolve the build issue
* fix the data correctness issue
* update fp8 header file and update the unit test for fp8 support
* remove fp16 from fp8 headers
* fix ut issue and catch up with the latest code from develop
* update according to cr comments
* update ut according to cr comments
* update num floats for each SumPostDiv from 4 to 6
* update fp8 header file name
* fix the typo
[ROCm/rccl commit: 6777e65c1d]
Use CMake to determine whether we are compiling against a version of ROCm that supports gcnArchName, and handle architecture checking accordingly. The change includes a few helper functions as drop-in replacements for the functionality that previously relied on gcnArch, sometimes to enable flags and sometimes to set frequencies.
[ROCm/rccl commit: e58ec78d35]
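A drop-in helper of the kind described above might look like the sketch below. It is an assumption for illustration only: baseArchName and isArch are hypothetical names, not the helpers added by the commit. It relies on the fact that a gcnArchName string can carry feature suffixes after the architecture token (for example "gfx90a:sramecc+:xnack-"), so the token before the first ':' is compared rather than the full string.

```cpp
#include <string>

// Hypothetical helper: return the base architecture token of a
// gcnArchName-style string, e.g. "gfx90a:sramecc+:xnack-" -> "gfx90a".
std::string baseArchName(const std::string& gcnArchName) {
  // find() returns npos when there is no ':', and substr(0, npos)
  // then returns the whole string unchanged.
  return gcnArchName.substr(0, gcnArchName.find(':'));
}

// Hypothetical helper: exact match on the base architecture, so that
// "gfx94" does not accidentally match "gfx942".
bool isArch(const std::string& gcnArchName, const std::string& arch) {
  return baseArchName(gcnArchName) == arch;
}
```

Callers that previously compared the integer gcnArch value could then use a string check such as isArch(name, "gfx942") when deciding which flags or frequencies to apply.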