* Add initial commit to increase tb size to 512
* Fix LL perf issue when subset of NCCL_MAX_NTHREADS is used
Add a constant to the barrier_generic logic to keep it from using the fallback logic when nthreads < NCCL_MAX_NTHREADS and nthreads == blockDim.x
* Adjust nthreads for LL
* Opt threads for reduce_scatter upper small range
* Add macro for single node
* Restrict MSCCL to 256 threads to prevent mem access fault
* Support pre-MI350 compatibility
* Partially refactor threadblock size override
* Use const macros instead of numerals
* opt out of unused function
[ROCm/rccl commit: 12f51ba8bf]
Added json and csv output format support for newly
added xgmi link_status. Aligned legend.
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
* SWDEV-533237 Add initial support for hipOccupancyAvailableDynamicSMemPerBlock API
* SWDEV-533237 Add hipOccupancyAvailableDynamicSMemPerBlock wrapper for nvidia
* SWDEV-533237 Add implementation of hipOccupancyAvailableDynamicSMemPerBlock API
* SWDEV-533237 Add LDSAlignment field in Isa table
---------
Co-authored-by: Rahul Manocha <rmanocha@amd.com>
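
The kind of arithmetic an occupancy query such as hipOccupancyAvailableDynamicSMemPerBlock has to perform can be sketched on the host as below. This is only an illustration under stated assumptions, not the code from this change: the helper name `availableDynamicLdsPerBlock`, its parameters, and the order of the rounding and subtraction steps are all hypothetical, and the real runtime may account for additional limits.

```cpp
#include <cstddef>

// Hypothetical helper (not part of HIP): estimates how much dynamic LDS each
// block could use if `numBlocks` blocks are to be resident on one compute unit.
//   ldsPerCu          - total LDS available on a compute unit, in bytes (assumed input)
//   staticLdsPerBlock - LDS the kernel already uses statically, in bytes (assumed input)
//   ldsAlignment      - LDS allocation granularity, in bytes (assumed input)
std::size_t availableDynamicLdsPerBlock(std::size_t ldsPerCu,
                                        std::size_t staticLdsPerBlock,
                                        std::size_t ldsAlignment,
                                        int numBlocks) {
  if (numBlocks <= 0 || ldsAlignment == 0) {
    return 0;  // nothing sensible to report for invalid inputs
  }
  // Split the compute unit's LDS evenly across the requested resident blocks.
  std::size_t perBlock = ldsPerCu / static_cast<std::size_t>(numBlocks);
  // Round down to the allocation granularity so the budget is actually allocatable.
  perBlock -= perBlock % ldsAlignment;
  // Whatever the kernel uses statically is not available for dynamic use.
  if (perBlock <= staticLdsPerBlock) {
    return 0;
  }
  return perBlock - staticLdsPerBlock;
}
```

With illustrative inputs of 65536 bytes of LDS, 1024 bytes of static usage, a 512-byte granularity and 3 resident blocks, the helper returns 20480 bytes.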
* Add examples to tools folder
* Correct P2P memory access section
* Sync porting guide
* Add HIP Graph tutorial
* Add hint about using amdgpu-dkms for IPC API
* Add a few more env variables
* Added amd-smi set --pcie command
* Removed current pcie level due to it not being static
* Added pcie information to static --bus
---------
Signed-off-by: Pham, Gabriel <Gabriel.Pham@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
[ROCm/amdsmi commit: 9e3537d778]
- **Added evicted_time metric for kfd processes**.
  - Time that queues are evicted on a GPU in milliseconds
  - Added to CLI in `amd-smi monitor -q` and `amd-smi process`
  - Added to C API and Python API:
    - amdsmi_get_gpu_process_list()
    - amdsmi_get_gpu_compute_process_info()
    - amdsmi_get_gpu_compute_process_info_by_pid()
---------
Signed-off-by: Pryor, Adam <Adam.Pryor@amd.com>
[ROCm/amdsmi commit: 2144cfbba4]
Somehow the test for whether MPI support was requested stopped
working, although no obvious code change can be located.
Make the if-statement more stringent by explicitly testing whether
USE_MPI_SUPPORT is "ON".
[ROCm/rocshmem commit: c0285ac0ce]
* Implements casting key loads and stores to address_space(1) so that vector global load and store instructions are emitted by the compiler instead of more costly flat loads and stores
* Removes nontemporality from some key stores for gfx950.
[ROCm/rccl commit: e69b11eba5]
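
A minimal sketch of the address-space cast described above, assuming a clang-based AMDGPU device compile. The macro names `GLOBAL_AS` and `HOST_DEVICE` and the helpers `loadGlobal` and `storeGlobal` are hypothetical and are not the names used in RCCL. On any non-AMDGPU target the macro expands to nothing, so the helpers reduce to ordinary loads and stores.

```cpp
#include <cstdint>

// address_space(1) is the global address space on AMDGPU. Only apply the
// attribute when compiling device code for that target (assumption: clang
// defines __AMDGCN__ there); elsewhere fall back to a plain pointer.
#if defined(__AMDGCN__)
#define GLOBAL_AS __attribute__((address_space(1)))
#else
#define GLOBAL_AS
#endif

// Let the same helpers build under hipcc and under a plain host compiler.
#if defined(__HIPCC__)
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical helper: load a scalar through a global-address-space pointer.
// The caller must guarantee that `p` really points to global memory.
template <typename T>
HOST_DEVICE inline T loadGlobal(const T* p) {
  // A C-style cast is used because the cast changes the pointer's address space.
  return *(const GLOBAL_AS T*)p;
}

// Hypothetical helper: store a scalar through a global-address-space pointer.
template <typename T>
HOST_DEVICE inline void storeGlobal(T* p, T v) {
  *(GLOBAL_AS T*)p = v;
}
```

The cast is only valid when the pointer is known to refer to global memory. Applying it to a pointer into LDS or private memory would be incorrect, which is presumably why the change limits it to key loads and stores.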
* Updated CODEOWNERS to instead use RCCL-Reviewers team
* Apply suggestion from @nileshnegi
Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com>
---------
Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com>
[ROCm/rccl commit: f290e302d3]