* add gfx1100 support
Add support for Radeon 7900 GPUs (RX and PRO) and the 7800 PRO.
I considered adding gfx1101 and gfx1102 GPUs as well, but those are lower-end models that are less likely to be used for compute-intensive jobs. In addition, I do not have access to them to test the support.
* update WF_SIZE for different options
Radeon systems use a warp size of 32, unlike current Instinct systems,
which use a warp size of 64. For the device side, a gfx-specific ifdef
is sufficient. For the host side, we need to query the device
properties.
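A minimal sketch of the two approaches described above, not the code from this commit. The names `SKETCH_WF_SIZE`, `sketch_host_wf_size`, and `sketch_wavefronts_per_block` are hypothetical. The sketch assumes that the compiler defines `__gfx1100__` during device compilation for that target, and that the host queries the `warpSize` field through `hipGetDeviceProperties`. The HIP path is only compiled under hipcc, so the snippet also builds with a plain C++ compiler.

```cpp
#include <cassert>

#if defined(__HIPCC__)
#include <hip/hip_runtime.h>
#endif

// Device side: a compile-time choice keyed on the gfx target.
// Assumption: __gfx1100__ is defined only when compiling device code
// for gfx1100; every other target keeps the Instinct value of 64.
#if defined(__gfx1100__)
#define SKETCH_WF_SIZE 32
#else
#define SKETCH_WF_SIZE 64
#endif

// Host side: the target macro is not visible here, so ask the runtime.
// Returns `fallback` if HIP is unavailable or the query fails.
inline int sketch_host_wf_size(int device_id, int fallback) {
#if defined(__HIPCC__)
  hipDeviceProp_t prop;
  if (hipGetDeviceProperties(&prop, device_id) == hipSuccess) {
    return prop.warpSize;
  }
#endif
  (void)device_id;  // unused when built without HIP
  return fallback;
}

// Example consumer: number of wavefronts needed to cover a thread block,
// rounded up so a partial wavefront still counts.
inline int sketch_wavefronts_per_block(int block_threads, int wf_size) {
  assert(wf_size > 0);
  return (block_threads + wf_size - 1) / wf_size;
}
```

With this split, a 256-thread block maps to 8 wavefronts on a warp size of 32 and to 4 on a warp size of 64, which is the kind of difference the functional and unit tests have to account for.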
* adjust functional tests to wf_size of 32
* update unit tests to handle wf_size of 32
* address reviewer comments
[ROCm/rocshmem commit: d0c2845031]
* Allocate default context buffers and initialize queue for management
- Allocated the status flag, g return, and atomic return buffers for
the default context.
- Initialized `AtomicWFQueueProxy` instances to manage these buffers
efficiently for concurrent access.
* Update `BlockHandle` with default context buffers
* Add default context flag and update buffer retrieval functions
- Added a flag to distinguish the default context from other contexts.
- Modified the return buffer functions and the `get_status_flag` function to
accommodate the default context.
* Add default context primitive tests
- Cover the get, put, get_nbi, put_nbi, g, and p APIs.
[ROCm/rocshmem commit: 867519e1d0]