* add relaxed_ordering option
Add an environment variable that controls whether the
IBV_ACCESS_RELAXED_ORDERING flag is set when registering memory with the
ibv_reg_mr* functions.
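A minimal sketch of how the flag could be folded into the access mask, not the code from this change. The constants are stand-ins so the sketch builds without <infiniband/verbs.h>, and the helper names and the "1" spelling are hypothetical. The environment variable name is passed in as a parameter because it is not given here.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Stand-ins for the libibverbs ibv_access_flags values (assumption: real code
// takes these from <infiniband/verbs.h> instead of redefining them).
constexpr unsigned kLocalWrite = 1u << 0;        // IBV_ACCESS_LOCAL_WRITE
constexpr unsigned kRemoteWrite = 1u << 1;       // IBV_ACCESS_REMOTE_WRITE
constexpr unsigned kRemoteRead = 1u << 2;        // IBV_ACCESS_REMOTE_READ
constexpr unsigned kRemoteAtomic = 1u << 3;      // IBV_ACCESS_REMOTE_ATOMIC
constexpr unsigned kRelaxedOrdering = 1u << 20;  // IBV_ACCESS_RELAXED_ORDERING

// Hypothetical helper: true only when the named variable is set to "1".
inline bool env_flag_enabled(const char* name) {
  assert(name != nullptr);
  const char* value = std::getenv(name);
  return value != nullptr && std::strcmp(value, "1") == 0;
}

// Build the access mask that would be passed to an ibv_reg_mr* call.
inline unsigned mr_access_flags(bool relaxed_ordering) {
  unsigned flags = kLocalWrite | kRemoteWrite | kRemoteRead | kRemoteAtomic;
  if (relaxed_ordering) {
    flags |= kRelaxedOrdering;  // only added when the option is enabled
  }
  return flags;
}
```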
* missed a spot
* add support for compiling all backends
Also include the logic to select a backend, either from an explicit user
request or through heuristics.
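The selection order can be sketched as below. The backend names (IPC, RO, GDA) come from this log; the Probe fields, the accepted strings, and the fallback order are assumptions for illustration and may not match the real heuristics.

```cpp
#include <cassert>
#include <optional>
#include <string>

enum class Backend { IPC, RO, GDA };

// Hypothetical probe results that a heuristic might consult.
struct Probe {
  bool single_node;      // all PEs share one node
  bool gda_nic_present;  // a GDA-capable NIC was found
};

// Hypothetical parser for a user request; unknown strings yield no request.
inline std::optional<Backend> parse_backend(const std::string& text) {
  if (text == "ipc") return Backend::IPC;
  if (text == "ro") return Backend::RO;
  if (text == "gda") return Backend::GDA;
  return std::nullopt;
}

// An explicit request always wins; otherwise fall back to the heuristic.
inline Backend select_backend(const std::optional<Backend>& requested,
                              const Probe& probe) {
  if (requested) return *requested;
  if (probe.single_node) return Backend::IPC;
  if (probe.gda_nic_present) return Backend::GDA;
  return Backend::RO;
}
```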
* checkpoint for compiling all backends
* final checkpoint
All tests seem to pass when compiling all three backends simultaneously
and forcing the use of any of the three backends.
* update PR to new envvar system
* Revamp findibverbs to find ionic again
* gda ionic: rename ionic_sq_buf ionic_cq_buf
Avoid duplicating member names used by mlx5 gda.
Signed-off-by: Allen Hubbe <allen.hubbe@amd.com>
* gda: move spin lock to util.hpp
Move spin lock out of ionic gda to util.hpp.
Signed-off-by: Allen Hubbe <allen.hubbe@amd.com>
* gda ionic: assume latest fwabi changes
There is no firmware abi compatibility in this ionic gda code yet, so
assume we are using the latest firmware abi as of now.
Signed-off-by: Allen Hubbe <allen.hubbe@amd.com>
* gda ionic: allow doorbell with incomplete wqes
Use spin lock to ensure doorbell is only written with an increasing
producer index. Ring the doorbell after this wave has initialized its
wqes. Wqes of other waves might not be fully initialized, but firmware
will not process them until the phase/color flag is updated in the
respective wqes.
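A host-side sketch of the locking rule described above, not the device code. The 16-bit index width, the Doorbell struct, and the write counter that stands in for the MMIO write are assumptions.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

class SpinLock {
 public:
  void lock() {
    while (flag_.test_and_set(std::memory_order_acquire)) {
      // busy-wait until the holder clears the flag
    }
  }
  void unlock() { flag_.clear(std::memory_order_release); }

 private:
  std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Hypothetical doorbell state; `writes` stands in for the MMIO write.
struct Doorbell {
  SpinLock lock;
  std::uint16_t last_rung = 0;  // last producer index written
  std::uint64_t writes = 0;     // number of doorbell writes issued
};

// True if index a is ahead of index b in a wrapping 16-bit sequence.
inline bool ahead_of(std::uint16_t a, std::uint16_t b) {
  return static_cast<std::int16_t>(static_cast<std::uint16_t>(a - b)) > 0;
}

// Write the doorbell only if prod_idx advances past the last value written,
// so a late caller with a stale index cannot move the doorbell backwards.
inline bool ring_doorbell(Doorbell& db, std::uint16_t prod_idx) {
  db.lock.lock();
  const bool advanced = ahead_of(prod_idx, db.last_rung);
  if (advanced) {
    db.last_rung = prod_idx;
    ++db.writes;
  }
  db.lock.unlock();
  return advanced;
}
```

The wrapping comparison keeps the rule valid when the producer index rolls over.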
Signed-off-by: Allen Hubbe <allen.hubbe@amd.com>
* gda ionic: poll cq for additional completions
Keep polling the cq beyond the minimum number of completions that this
wave of threads needs to make progress, as long as the cq is not empty.
This is part of wave-optimized cq polling: at the expense of one wave
polling additional completions, it was observed that nearly all other
waves avoid taking the cq lock at all.
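The difference between the two polling policies can be sketched on the host with a queue standing in for the cq. The Cq type and the drain_extra switch are hypothetical, and the cq lock that the real code holds while polling is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Host-side stand-in for a completion queue; each entry is one completion.
struct Cq {
  std::deque<int> pending;
};

// Consume completions that are already available. With drain_extra == false
// the loop stops once `needed` completions are consumed; with drain_extra ==
// true it keeps going until the cq is empty, so later callers find less work.
inline std::size_t poll_cq(Cq& cq, std::size_t needed, bool drain_extra) {
  std::size_t consumed = 0;
  while (!cq.pending.empty() && (drain_extra || consumed < needed)) {
    cq.pending.pop_front();
    ++consumed;
  }
  return consumed;
}
```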
Signed-off-by: Allen Hubbe <allen.hubbe@amd.com>
* gda: max_rd_atomic in rts transition
In modify_qp(RTS), specify max_rd_atomic, not max_dest_rd_atomic.
When max_rd_atomic is not specified (so it is effectively zero), the local
nic may get stuck transmitting the first read or atomic request, because
even one read or atomic request exceeds an initiator depth of zero.
Signed-off-by: Allen Hubbe <allen.hubbe@amd.com>
* gda ionic: allow specifying traffic class
Allow specifying a traffic class. The network might have a specific
traffic class configured as no-drop, for example.
Co-authored-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com>
Signed-off-by: Allen Hubbe <allen.hubbe@amd.com>
* gda ionic: tweak uxdma assignment
The ideal arrangement will have an equal number of QPs active on each
uxdma pipeline.
Pre-rebase, the better arrangement for rocshmem functional test
benchmarks was [0, 1], [1, 0], [0, 1], [1, 0], ...
Now, following changes that add 'ROCSHMEM_GDA_ALTERNATE_QP_PORTS=1' by
default, the better arrangement is [0, 1], [0, 1], [0, 1], [0, 1], ...
Signed-off-by: Allen Hubbe <allen.hubbe@amd.com>
---------
Signed-off-by: Allen Hubbe <allen.hubbe@amd.com>
Co-authored-by: Aurelien Bouteiller <abouteil@amd.com>
Co-authored-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com>
* Add environment variable configuration infrastructure
- Namespace rocshmem::envvar
- Track all config env vars in per-category lists
- Remove duplicates from list of allowed env var types
- Reject negative inputs for unsigned integer types
- Accept empty strings for std::string
- Print error source location using C++20 std::source_location
- Unit tests
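Two of the parsing rules listed above can be sketched as follows. These helpers are hypothetical and are not the rocshmem::envvar API; they only show why negative input needs an explicit check and how an empty string differs from an unset variable.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical helper: parse a decimal unsigned integer, rejecting negatives.
inline std::optional<std::uint64_t> parse_unsigned(const std::string& text) {
  // strtoull accepts a leading '-' and negates the result in unsigned
  // arithmetic, so "-1" would silently become a huge value. Reject it first.
  if (text.empty() || text.find('-') != std::string::npos) {
    return std::nullopt;
  }
  errno = 0;
  char* end = nullptr;
  const unsigned long long value = std::strtoull(text.c_str(), &end, 10);
  if (errno == ERANGE || end == text.c_str() || *end != '\0') {
    return std::nullopt;  // overflow, no digits, or trailing garbage
  }
  return static_cast<std::uint64_t>(value);
}

// Hypothetical helper: for strings, an empty value is valid; only an unset
// variable (null pointer from getenv) yields no value.
inline std::optional<std::string> parse_string(const char* raw) {
  if (raw == nullptr) return std::nullopt;
  return std::string(raw);
}
```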
* Port environment variables
- ROCSHMEM_UNIQUEID_WITH_MPI
- ROCSHMEM_RO_DISABLE_IPC
- ROCSHMEM_BOOTSTRAP_TIMEOUT
- ROCSHMEM_BOOTSTRAP_HOSTID
- ROCSHMEM_BOOTSTRAP_SOCKET_IFNAME
- ROCSHMEM_RO_PROGRESS_DELAY
- ROCSHMEM_BOOTSTRAP_SOCKET_FAMILY
- ROCSHMEM_MAX_NUM_CONTEXTS
+ Merge the independent per-backend copies into a single variable
that is used by all three backends (IPC, RO, GDA).
+ Set default to 32 (for GDA); prior default for IPC and RO was 1024.
- ROCSHMEM_MAX_NUM_HOST_CONTEXTS
- ROCSHMEM_MAX_WF_BUFFERS
- ROCSHMEM_SQ_SIZE
- ROCSHMEM_RO_NET_CPU_QUEUE
+ Renamed from RO_NET_CPU_QUEUE
+ Change env var input type to bool, default to false
+ Invert code logic: previously, setting RO_NET_CPU_QUEUE to anything
would /disable/ a variable gpu_queue, which defaulted to true.
The variable is now named config::ro::net_cpu_queue,
with all prior checks for gpu_queue inverted.
- ROCSHMEM_USE_IB_HCA
- ROCSHMEM_HEAP_SIZE
+ Defaults to 1L << 30 i.e. 1 GiB,
from default heap size in memory/heap_memory.hpp.
- ROCSHMEM_MAX_NUM_TEAMS
+ Unlike other env vars, this can be referenced from devices.
+ Function currently narrows from size_t to int: uses need to be audited
for safety and correctness in using size_t directly.
- ROCSHMEM_GDA_ALTERNATE_QP_PORTS
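The inverted ROCSHMEM_RO_NET_CPU_QUEUE logic and the defaults listed above can be sketched as follows. The function names, the Defaults struct, and the accepted boolean spellings are assumptions; only the default values and the old and new semantics come from this log.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Old behavior: gpu_queue defaulted to true and was disabled by *any* value
// of RO_NET_CPU_QUEUE, including "0".
inline bool legacy_gpu_queue(const char* ro_net_cpu_queue) {
  return ro_net_cpu_queue == nullptr;
}

// New behavior: a real boolean that defaults to false. The accepted
// spellings here ("1", "true") are an assumption.
inline bool net_cpu_queue(const char* value) {
  if (value == nullptr) return false;
  const std::string text(value);
  return text == "1" || text == "true";
}

// Defaults named in the list above, gathered in a hypothetical struct.
struct Defaults {
  std::size_t heap_size = std::size_t{1} << 30;  // 1 GiB
  std::size_t max_num_contexts = 32;             // shared by IPC, RO, GDA
  bool ro_net_cpu_queue = false;
};
```

A caller that used to test `gpu_queue` would now test `!net_cpu_queue`, which is the inversion applied to all prior checks.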
* New env var ROCSHMEM_DEBUG
- Debug levels:
+ NONE
+ VERSION
+ WARN
+ INFO
+ TRACE
- Currently unused - will be added later
- Mirrors RCCL debug control
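Since the variable is not consumed yet, the following is only a sketch of how the listed levels might be parsed and compared. The level names come from the list above; the helper names, the exact-match parsing, and the assumption that the levels are ordered by increasing verbosity are illustrative.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Levels in the order listed above; assumed to increase in verbosity.
enum class DebugLevel { NONE, VERSION, WARN, INFO, TRACE };

// Hypothetical parser: exact, case-sensitive match on the level name.
inline std::optional<DebugLevel> parse_debug_level(const std::string& text) {
  if (text == "NONE") return DebugLevel::NONE;
  if (text == "VERSION") return DebugLevel::VERSION;
  if (text == "WARN") return DebugLevel::WARN;
  if (text == "INFO") return DebugLevel::INFO;
  if (text == "TRACE") return DebugLevel::TRACE;
  return std::nullopt;
}

// A message is emitted when its level is at or below the configured level.
inline bool should_log(DebugLevel configured, DebugLevel message) {
  return message != DebugLevel::NONE &&
         static_cast<int>(message) <= static_cast<int>(configured);
}
```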
* Remove rocshmem::rocshmem_env_config
* Change interface for GetClosestNicToGpu
to accept const char** instead of char**:
the pointed-to string does not need to be modified
- Files were not audited for includes of util.hpp that existed only for env vars
---------
Signed-off-by: Omri Mor <Omri.Mor@amd.com>