* Support gfx950 in topo_expl
* Fix dependencies and fetch fmt from sources
* Remove third_party folder in make clean
* Add empty target when fmt is found
* Add MI350 example
* Update README.md
---------
Co-authored-by: isaki001 <ioannissakiotis@gmail.com>
* Internal RCCL/NCCL functionality exposed when RCCL_EXPOSE_STATIC is enabled
* Algo/protocol/max channels can be obtained with the new RCCL API
* Introduce rccl_static and rccl_static_inline macros to work around invisible functions in core source files like enqueue.cc
* Add usage example in topo-explorer tool
* initial checkin
* resolve cr comments
* resolve the build issue
* fix the data correctless issue
* update fp8 header file and update the unit test for fp8 support
* remove fp16 from fp8 headers
* fix ut issue and catch up the latest code from develop
* udate according to cr comments
* update ut according to cr comments
* update num floats for each SumPostDiv from 4 to 6
* update fp8 header file name
* fix the typo
This is a single commit of the source code changes required to
introduce support for multiple ranks per device.
A new interface (ncclCommRankInitMulti) has to be used to make use of
this new feature.
* Add alltoallv API and implementation
* Extend Rome P2P channel limit to multinode and alltoall kernels
* topo_expl: fix compilation and sync up with main
* gtest: use RCCL alltoallv API
* Code review changes