* Added the ability to compile for the local GPU via an environment variable
* Added gfx950 by default only on ROCm 7.0 and above
* Updated docs
* Removed xnack+ on specific gfx targets
---------
Co-authored-by: Yiltan Hassan Temucin <yiltan.temucin@amd.com>
[ROCm/rocshmem commit: be630d9b93]