rocm-systems/tests/functional_tests/tester_arguments.cpp
Anatolii Rozanov d0c8380650 Add host API for *_on_stream operations (#340)
* Add functional test for barrier_all_on_stream

* Add rocshmem_barrier_all_on_stream support for GDA and RO backends

Implements rocshmem_barrier_all_on_stream operation for
GPU Direct Access and Reverse Offload backends.

Previously, rocshmem_barrier_all_on_stream was only supported by the IPC backend.

* Add functional test for rocshmem_broadcastmem_on_stream

* Add host-side rocshmem_broadcastmem_on_stream API

Implement stream-based broadcast collective operation

- Add rocshmem_broadcastmem_on_stream host API and kernel implementation
- Add functional test TeamBroadcastmemOnStreamTester with multi-stream
  support and correctness verification
- Use per-workgroup contexts to avoid contention across parallel streams

API:
rocshmem_broadcastmem_on_stream(team, dest, source, nelems, pe_root, stream)

* Add functional test for rocshmem_getmem_on_stream

* Add host-side rocshmem_getmem_on_stream API

Implement stream-based point-to-point RMA get operation

- Add rocshmem_getmem_on_stream host API and kernel implementation
- Support for asynchronous getmem operations on HIP streams
- Add backend support for GDA, RO, and IPC contexts
- Use work-group collective getmem for efficient memory transfer

API:
rocshmem_getmem_on_stream(dest, source, nelems, pe, stream)

(AI Assist)

* Add host-side rocshmem_putmem_on_stream API

- Add rocshmem_putmem_on_stream for asynchronous remote writes
- Support for concurrent RMA operations on HIP streams
- Add backend support for GDA, RO, and IPC contexts
- Use work-group device collective operation

API:
rocshmem_putmem_on_stream(dest, source, bytes, pe, stream)

(AI Assist)

* Add functional test for rocshmem_putmem_on_stream

* Add host-side rocshmem_putmem_signal_on_stream API

Enables asynchronous putmem operations with signaling on HIP streams.

The implementation includes:
- Kernel wrapper rocshmem_putmem_signal_kernel
- Host interface putmem_signal_on_stream method
- Context layer support across all backends (IPC, GDA, RO)
- Public API

Function signature:
void rocshmem_putmem_signal_on_stream(void *dest, const void *source,
                                      size_t bytes, uint64_t *sig_addr,
                                      uint64_t signal, int sig_op,
                                      int pe, hipStream_t stream);

* Add functional test for rocshmem_putmem_signal_on_stream

* Add host-side rocshmem_signal_wait_until_on_stream API

Enables asynchronous signal wait operations on HIP streams.

The implementation includes:
- Kernel wrapper rocshmem_signal_wait_until_kernel
- Host interface signal_wait_until_on_stream method
- Context layer support across all backends (IPC, GDA, RO)
- Native uint64_t support in wait_until API (generated from P2P_SYNC.py)

Function signature:
void rocshmem_signal_wait_until_on_stream(uint64_t *sig_addr, int cmp,
                                          uint64_t cmp_value,
                                          hipStream_t stream);

(AI Assist)

* Add functional test for rocshmem_signal_wait_until_on_stream

* Add documentation for stream API functions

This commit adds API documentation for the following host-side
stream functions:

- rocshmem_barrier_all_on_stream (collective routines)
- rocshmem_broadcastmem_on_stream (collective routines)
- rocshmem_getmem_on_stream (RMA operations)
- rocshmem_putmem_on_stream (RMA operations)
- rocshmem_putmem_signal_on_stream (signaling operations)
- rocshmem_signal_wait_until_on_stream (point-to-point sync)

The documentation includes function signatures, parameter descriptions,
and detailed explanations of asynchronous behavior and stream handling.

(AI Assist)

* Rename "bytes" -> "nelems"

* Add "_TEST_" to the variables used in tests

* Remove incorrect hipStreamDefault usage

hipStreamDefault is not the default stream; it is a stream flag.

If stream == nullptr, just pass it through to the kernel launch, which then runs on the default (null) stream.
2025-12-09 08:55:46 -06:00


/******************************************************************************
* Copyright (c) Advanced Micro Devices, Inc. All rights reserved.
*
* SPDX-License-Identifier: MIT
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to
* deal in the Software without restriction, including without limitation the
* rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
* sell copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
* IN THE SOFTWARE.
*****************************************************************************/
#include "tester_arguments.hpp"
#include <cstdlib>
#include <iostream>
#include <string>
#include <rocshmem/rocshmem.hpp>
#include "tester.hpp"

using namespace rocshmem;
TesterArguments::TesterArguments(int argc, char *argv[]) {
  // Returns the value that follows the flag at argv[idx] and advances idx to
  // it. A value-taking flag given as the last argument is a usage error.
  auto next_value = [&](int &idx) -> const char * {
    if (idx + 1 >= argc) {
      show_usage(argv[0]);
      exit(-1);
    }
    return argv[++idx];
  };

  for (int i = 1; i < argc; i++) {
    std::string arg = argv[i];
    if (arg == "-t") {
      num_threads = atoi(next_value(i));
    } else if (arg == "-w") {
      num_wgs = atoi(next_value(i));
    } else if (arg == "-s") {
      max_msg_size = atoll(next_value(i));
    } else if (arg == "-a") {
      algorithm = atoi(next_value(i));
    } else if (arg == "-z") {
      wg_size = atoi(next_value(i));
    } else if (arg == "-c") {
      coal_coef = atoi(next_value(i));
    } else if (arg == "-o") {
      op_type = atoi(next_value(i));
    } else if (arg == "-ta") {
      thread_access = atoi(next_value(i));
    } else if (arg == "-x") {
      shmem_context = atoi(next_value(i));
    } else if (arg == "-m") {
      // Parse the value that follows "-m" (not "-m" itself) and keep it only
      // if it names a valid AddrMode; out-of-range values are ignored.
      int atomics_addr_mode = atoi(next_value(i));
      if (atomics_addr_mode >= static_cast<int>(AddrMode::PerGrid) &&
          atomics_addr_mode <= static_cast<int>(AddrMode::PerBlock)) {
        addr_mode = static_cast<AddrMode>(atomics_addr_mode);
      }
    } else if (arg == "-n") {
      loop = atoi(next_value(i));
      loop_large = loop;
    } else if (arg == "-nloop") {
      loop = atoi(next_value(i));
    } else if (arg == "-nlarge") {
      loop_large = atoi(next_value(i));
    } else if (arg == "-nskip") {
      skip = atoi(next_value(i));
    } else {
      show_usage(argv[0]);
      exit(-1);
    }
  }

  // Pin or bound the message-size range for tests that need it.
  TestType type = static_cast<TestType>(algorithm);
  switch (type) {
    case AMO_FAddTestType:
    case AMO_AddTestType:
    case AMO_SetTestType:
    case AMO_SwapTestType:
    case AMO_FetchAndTestType:
    case AMO_AndTestType:
    case AMO_FetchOrTestType:
    case AMO_OrTestType:
    case AMO_FetchXorTestType:
    case AMO_XorTestType:
    case AMO_FCswapTestType:
    case AMO_CswapTestType:
    case AMO_FIncTestType:
    case AMO_IncTestType:
    case AMO_FetchTestType:
    case BarrierAllTestType:
    case WAVEBarrierAllTestType:
    case WGBarrierAllTestType:
    case TeamBarrierTestType:
    case TeamWAVEBarrierTestType:
    case TeamWGBarrierTestType:
    case BarrierAllOnStreamTestType:
    case SyncAllTestType:
    case WAVESyncAllTestType:
    case WGSyncAllTestType:
    case TeamSyncTestType:
    case SignalWaitUntilOnStreamTestType:
      min_msg_size = 8;
      max_msg_size = 8;
      break;
    case PingPongTestType:
    case ShmemPtrTestType:
      min_msg_size = 4;
      max_msg_size = 4;
      break;
    case RandomAccessTestType:
    case TeamAlltoallmemOnStreamTestType:
    case TeamBroadcastmemOnStreamTestType:
      min_msg_size = 4;
      break;
    case TeamFCollectTestType:
    case TeamAllToAllTestType:
    case TeamBroadcastTestType:
      min_msg_size = 8;
      break;
    case TeamCtxInfraTestType:
    case TeamCtxInfraTestSingleType:
    case TeamCtxInfraTestBlockType:
    case TeamCtxInfraTestOddEvenType:
      max_msg_size = min_msg_size;
      break;
    case PutNBIMRTestType:
      min_msg_size = max_msg_size;
      break;
    case PTestType:
    case GTestType:
      min_msg_size = 1;
      max_msg_size = 1;
      break;
    default:
      break;
  }
}

void TesterArguments::show_usage(std::string executable_name) {
  std::cout << "Usage: " << executable_name << std::endl;
  std::cout << "\t-t <number of rocshmem service threads>\n";
  std::cout << "\t-w <number of workgroups>\n";
  std::cout << "\t-s <maximum message size (in bytes)>\n";
  std::cout << "\t-a <algorithm number to test>\n";
  std::cout << "\t-z <workgroup size>\n";
  std::cout << "\t-c <coalescing coefficient>\n";
  std::cout << "\t-o <operation type for the random_access test>\n";
  std::cout << "\t-ta <number of threads accessing the communication>\n";
  std::cout << "\t-x <shmem context>\n";
  std::cout << "\t-m <atomics address mode>\n";
  std::cout << "\t-n <count>  Set both loop and loop_large counts\n";
  std::cout << "\t-nloop <count>  Set loop count\n";
  std::cout << "\t-nlarge <count>  Set loop_large count\n";
  std::cout << "\t-nskip <count>  Set skip/warmup count\n";
}

void TesterArguments::get_arguments() {
  numprocs = rocshmem_n_pes();
  myid = rocshmem_my_pe();

  TestType type = static_cast<TestType>(algorithm);

  // Most tests require exactly two PEs. Tests that support an arbitrary
  // number of PEs are listed below and exempted from that check.
  bool requires_two_pes = true;
  switch (type) {
    // Collective/barrier tests: support any number of PEs
    case BarrierAllTestType:
    case WAVEBarrierAllTestType:
    case WGBarrierAllTestType:
    case SyncAllTestType:
    case WAVESyncAllTestType:
    case WGSyncAllTestType:
    case TeamSyncTestType:
    case TeamWAVESyncTestType:
    case TeamWGSyncTestType:
    case TeamAllToAllTestType:
    case TeamFCollectTestType:
    case TeamReductionTestType:
    case TeamBroadcastTestType:
    case PingAllTestType:
    case TeamBarrierTestType:
    case TeamWAVEBarrierTestType:
    case TeamWGBarrierTestType:
    case TeamCtxInfraTestBlockType:
    case TeamCtxInfraTestOddEvenType:
    // On-stream tests: support any number of PEs
    case TeamAlltoallmemOnStreamTestType:
    case BarrierAllOnStreamTestType:
    case TeamBroadcastmemOnStreamTestType:
    case GetmemOnStreamTestType:
    case PutmemOnStreamTestType:
    case PutmemSignalOnStreamTestType:
    case SignalWaitUntilOnStreamTestType:
      requires_two_pes = false;
      break;
    default:
      break;
  }

  if (requires_two_pes && numprocs != 2) {
    if (myid == 0) {
      std::cerr << "This test requires exactly two processes, but got "
                << numprocs << "\n";
    }
    exit(-1);
  }
}
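The parsing pattern used by the `TesterArguments` constructor, where a flag is followed by its value in the next `argv` slot, can be sketched as a standalone, testable function. This version checks that a value exists before reading it and validates the `-m` value against the enum's range before casting. Everything here is hypothetical: `ParsedArgs`, `parse_args` and `AddrModeSketch` are illustrative names, the defaults are made up, `PerWave` is an invented middle enumerator, and a `std::vector<std::string>` replaces `argc`/`argv` so the function can be exercised without a process launch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for the tester's AddrMode; the real enumerators and
// their numeric values are defined in the tester headers and may differ.
enum class AddrModeSketch { PerGrid = 0, PerWave = 1, PerBlock = 2 };

// Hypothetical subset of the tester's options, with made-up defaults.
struct ParsedArgs {
  int num_wgs = 1;
  long long max_msg_size = 1 << 20;
  AddrModeSketch addr_mode = AddrModeSketch::PerGrid;
};

// Parses "-flag value" pairs. Returns std::nullopt for an unknown flag or a
// flag with no value, where the real tester prints usage and exits.
std::optional<ParsedArgs> parse_args(const std::vector<std::string> &args) {
  ParsedArgs out;
  for (std::size_t i = 0; i < args.size(); ++i) {
    const std::string &flag = args[i];
    // Every flag in this sketch takes exactly one value: make sure it exists
    // before reading it instead of indexing past the end.
    if (i + 1 >= args.size()) {
      return std::nullopt;
    }
    const std::string &value = args[++i];
    if (flag == "-w") {
      out.num_wgs = std::atoi(value.c_str());
    } else if (flag == "-s") {
      out.max_msg_size = std::atoll(value.c_str());
    } else if (flag == "-m") {
      // Read the value after "-m" and accept it only if it is in range.
      int mode = std::atoi(value.c_str());
      if (mode >= static_cast<int>(AddrModeSketch::PerGrid) &&
          mode <= static_cast<int>(AddrModeSketch::PerBlock)) {
        out.addr_mode = static_cast<AddrModeSketch>(mode);
      }
    } else {
      return std::nullopt;  // unknown flag
    }
  }
  return out;
}
```

Returning `std::optional` instead of calling `exit` keeps the sketch side-effect free; a real `main` would map `std::nullopt` to printing usage and exiting with a failure code, as the tester does.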