1. Create a small set of NUMA interfaces.
On Linux, the interface is built on system calls rather than on libnuma.
On Windows, the interface also works, but the policy class is a dummy.
Unlike Linux, Windows provides no numactl tool or NUMA library for setting a NUMA policy, so
the default policy applies on Windows: hipHostMalloc() allocates pinned host memory from the
closest host NUMA node.
To find the host NUMA node closest to a GPU device, query the new attribute
hipDeviceAttributeHostNumaId. You can then create a thread with CPU affinity to that NUMA node.
For an example, see the test in hip-tests/catch/perftests/memory/hipPerfHostNumaAllocWin.cc.
2. Remove pfnSetThreadGroupAffinity and pfnGetNumaNodeProcessorMaskEx, because both functions have been available since Windows 7 and Windows Server 2008 R2.
3. Other minor fixes.
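
As a rough illustration of items 1 and 2, the sketch below pins the calling thread to the CPUs of one host NUMA node. It is not the code from this change or from hipPerfHostNumaAllocWin.cc. The helper names ParseCpuList and PinCurrentThreadToNumaNode are hypothetical, and the node id is assumed to come from querying hipDeviceAttributeHostNumaId with hipDeviceGetAttribute(), which is left out so the example builds without the HIP runtime. The Windows branch calls GetNumaNodeProcessorMaskEx and SetThreadGroupAffinity directly; the Linux branch is an assumed analogue that reads the node's CPU list from sysfs.

```cpp
// Sketch only: pin the calling thread to the CPUs of one host NUMA node.
// `node` is assumed to come from
//   hipDeviceGetAttribute(&node, hipDeviceAttributeHostNumaId, device);
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

#ifdef _WIN32
#include <windows.h>
#else
#include <sched.h>
#include <fstream>
#endif

// Parse a Linux cpulist string such as "0-3,8-11" into CPU ids.
// Returns an empty vector if the text is malformed.
std::vector<int> ParseCpuList(const std::string& list) {
  std::vector<int> cpus;
  std::stringstream ss(list);
  std::string range;
  while (std::getline(ss, range, ',')) {
    if (range.empty()) continue;
    const auto dash = range.find('-');
    try {
      const int first = std::stoi(range.substr(0, dash));
      const int last = (dash == std::string::npos)
                           ? first
                           : std::stoi(range.substr(dash + 1));
      for (int cpu = first; cpu <= last; ++cpu) cpus.push_back(cpu);
    } catch (const std::exception&) {
      return {};
    }
  }
  return cpus;
}

// Hypothetical helper: returns true if the calling thread was pinned.
bool PinCurrentThreadToNumaNode(int node) {
  if (node < 0) return false;
#ifdef _WIN32
  // Both functions are called directly rather than through pfn* pointers.
  GROUP_AFFINITY affinity = {};
  if (!GetNumaNodeProcessorMaskEx(static_cast<USHORT>(node), &affinity)) {
    return false;
  }
  return SetThreadGroupAffinity(GetCurrentThread(), &affinity, nullptr) != 0;
#else
  // Assumed Linux analogue: read the node's CPUs from sysfs.
  std::ifstream file("/sys/devices/system/node/node" + std::to_string(node) +
                     "/cpulist");
  std::string line;
  if (!file || !std::getline(file, line)) return false;
  cpu_set_t set;
  CPU_ZERO(&set);
  int count = 0;
  for (int cpu : ParseCpuList(line)) {
    if (cpu >= 0 && cpu < CPU_SETSIZE) {
      CPU_SET(cpu, &set);
      ++count;
    }
  }
  if (count == 0) return false;
  // A pid of 0 applies the mask to the calling thread.
  return sched_setaffinity(0, sizeof(set), &set) == 0;
#endif
}
```

A worker thread would call PinCurrentThreadToNumaNode(node) as its first step and only then allocate with hipHostMalloc() and run its measurements, so that the thread is already on the intended node when the pinned memory is allocated and used. A false return means the thread was left unpinned and should be treated as a fallback case, not as an error in the allocation itself.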
* SWDEV-546485 Port and clean up for hipPerfBufferCopyRectSpeed
* SWDEV-546485 Port and clean up for hipPerfDevMemReadSpeed
* SWDEV-546485 Port and clean up for hipPerfDevMemWriteSpeed
* SWDEV-546485 Port and clean up for hipPerfHostNumaAlloc
* SWDEV-546485 Port and clean up for hipPerfMemcpy
* SWDEV-546485 Port and clean up for hipPerfMemMallocCpyFree
* SWDEV-546485 Port and clean up for hipPerfMemset
* SWDEV-546485 Port and clean up for hipPerfSampleRate
* SWDEV-546485 Port and clean up for hipPerfSharedMemReadSpeed
* SWDEV-546485 Ported and fixed up segfault for hipPerfMemFill
* SWDEV-545485 Returning to unedited stage
[ROCm/hip-tests commit: 04469c0cde]