SWDEV-198863 - Options for hip-clang-vdi path to provide the chicken bits, or functional equivalents to HCC_DB (phase 1)
1. The log macros are turned off in release builds, so the log functions have zero impact on release builds.
2. The log macros have level, mask, and condition controls, which gives us more control to avoid log flooding.
I also adjusted some existing logs to use the new log functions:
1. To exercise and test the new log functions.
2. To improve performance slightly.
3. The change is mainly for HIP-ROCM; we can move more logs over in later phases for PAL or ORCA.
4. I made these log features unavailable in release builds. We can revert to the old log functions for release builds on a case-by-case basis.
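A minimal sketch of how a log macro with level, mask, and condition control could compile away in release builds. It is not the code from debug.hpp: the names SKETCH_LOG_COND, SKETCH_RELEASE_BUILD, logEnabled, logLevel, and logMask are hypothetical, and only the LOG_LEVEL and GPU_LOG_MASK variable names come from the test steps in this changelist.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// Hypothetical helpers; only the environment variable names come from this changelist.
inline int logLevel() {
  const char* s = std::getenv("LOG_LEVEL");
  return s ? std::atoi(s) : 0;  // 0 means logging is off
}

inline uint32_t logMask() {
  const char* s = std::getenv("GPU_LOG_MASK");
  return s ? static_cast<uint32_t>(std::strtoul(s, nullptr, 10)) : 0u;
}

// A message is emitted when its level does not exceed the current level
// and its category bit is present in the current mask.
inline bool logEnabled(int level, uint32_t category, int curLevel, uint32_t curMask) {
  assert(level >= 1 && "level 0 is reserved for 'logging off'");
  return level <= curLevel && (category & curMask) != 0;
}

#ifdef SKETCH_RELEASE_BUILD
// Release build: the macro expands to nothing, so its arguments are never evaluated.
#define SKETCH_LOG_COND(level, category, cond, ...) ((void)0)
#else
#define SKETCH_LOG_COND(level, category, cond, ...)                           \
  do {                                                                        \
    if ((cond) && logEnabled((level), (category), logLevel(), logMask())) {   \
      std::fprintf(stderr, __VA_ARGS__);                                      \
      std::fputc('\n', stderr);                                               \
    }                                                                         \
  } while (0)
#endif
```

A call such as SKETCH_LOG_COND(3, 0x2u, size > 0, "alloc %zu bytes", size) would print only in a non-release build, and only when the level, mask, and condition all allow it. A real implementation would likely cache the environment values instead of reading them on every call.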
Tests:
1. http://ocltc.amd.com:8111/viewModification.html?modId=128289&personal=true&tab=vcsModificationBuilds
http://ocltc.amd.com:8111/viewModification.html?modId=128358&personal=true&tab=vcsModificationBuilds
2. release build, run hip program: there were no logs.
3. fastdebug build, run hip program,
export LOG_LEVEL=3
export GPU_LOG_MASK=4294967295
There were a lot of logs.
4. fastdebug build, run hip program,
export LOG_LEVEL=2
export GPU_LOG_MASK=4294967295
There were no logs.
5. fastdebug build, run hip program,
export LOG_LEVEL=3
export GPU_LOG_MASK=4294967294
There were far fewer logs.
6. fastdebug build, run hip program,
export LOG_LEVEL=3
export GPU_LOG_MASK=47102
There were even fewer logs. The logs were as expected according to the mask.
7. Tested steps 2 to 6 similarly on Windows and Linux.
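For reference, the decimal mask values used in the test steps decode as follows: 4294967295 is 0xFFFFFFFF (all 32 bits set), 4294967294 is 0xFFFFFFFE (bit 0 cleared), and 47102 is 0xB7FE (bits 1 to 10, 12, 13, and 15 set). The helper below is a small sketch for checking such values; the function names are hypothetical, and this changelist does not state which log category each bit stands for.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// True when the given bit position (0 = least significant) is set in the mask.
inline bool bitIsSet(uint32_t mask, int bit) {
  assert(bit >= 0 && bit < 32);
  return (mask & (1u << bit)) != 0;
}

// Returns every set bit position of a 32-bit mask in ascending order.
inline std::vector<int> setBits(uint32_t mask) {
  std::vector<int> bits;
  for (int i = 0; i < 32; ++i) {
    if (bitIsSet(mask, i)) bits.push_back(i);
  }
  return bits;
}
```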
ReviewBoard: http://ocltc.amd.com/reviews/r/18215
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_internal.hpp#46 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_memory.cpp#82 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_stream.cpp#26 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hiprtc_internal.hpp#2 edit
... //depot/stg/opencl/drivers/opencl/api/opencl/amdocl/cl_svm.cpp#29 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/comgrctx.cpp#6 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/devkernel.cpp#29 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/devprogram.cpp#68 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.cpp#137 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#91 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/command.cpp#100 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/commandqueue.cpp#32 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/runtime.cpp#40 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/debug.hpp#10 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#323 edit
SWDEV-184710 - Support hipLaunchCooperativeKernelMultiDevice()
- Add support for multi grid launch in hip
- Detect the new hidden argument and pass the required information for the kernel launch
- Memory for synchronization is allocated as a single object and then the offset for each GPU is found
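The single-allocation scheme can be sketched as below: one buffer is sized for all GPUs, and each GPU receives an offset into it. This is an illustration only. SyncLayout, makeSyncLayout, the per-device slot size, and the alignment are assumptions and not the values used by the runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout of one shared synchronization buffer split into per-GPU slots.
struct SyncLayout {
  std::size_t totalSize;             // size of the single allocation
  std::vector<std::size_t> offsets;  // offsets[i] is the start of GPU i's slot
};

// Rounds value up to the next multiple of alignment.
inline std::size_t alignUp(std::size_t value, std::size_t alignment) {
  assert(alignment > 0);
  return (value + alignment - 1) / alignment * alignment;
}

// Computes the total size and the per-GPU offsets for numDevices equal slots.
inline SyncLayout makeSyncLayout(std::size_t numDevices, std::size_t bytesPerDevice,
                                 std::size_t alignment) {
  const std::size_t stride = alignUp(bytesPerDevice, alignment);
  SyncLayout layout{numDevices * stride, {}};
  for (std::size_t i = 0; i < numDevices; ++i) {
    layout.offsets.push_back(i * stride);
  }
  return layout;
}
```

Allocating once and handing out offsets keeps all synchronization state in one object, so a single base address plus an offset per GPU is enough to describe it at launch time.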
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_module.cpp#44 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/device.hpp#343 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/devkernel.cpp#25 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/devkernel.hpp#17 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palkernel.cpp#82 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.cpp#136 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.hpp#42 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#90 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.hpp#30 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/command.cpp#99 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/command.hpp#97 edit
SWDEV-204782 - store extra information per HSA queue
The new struct QueueInfo is used to store metadata about each HSA
queue. For hostcall, this structure will eventually contain a pointer to
the hostcall buffer allocated to each HSA queue.
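A rough sketch of per-queue metadata keyed by the queue address. Only the struct name QueueInfo and the planned hostcall buffer pointer come from the description above; the refCount field, the QueueTable alias, and retainQueue are hypothetical, and const void* stands in for an HSA queue pointer.

```cpp
#include <cassert>
#include <map>

struct QueueInfo {
  int refCount = 0;                // hypothetical bookkeeping field
  void* hostcallBuffer = nullptr;  // placeholder for the future hostcall buffer pointer
};

// One metadata entry per queue, keyed by the queue's address.
using QueueTable = std::map<const void*, QueueInfo>;

// Records one more user of the queue and returns its metadata entry.
inline QueueInfo& retainQueue(QueueTable& table, const void* queue) {
  assert(queue != nullptr);
  QueueInfo& info = table[queue];  // default-constructs the entry on first use
  ++info.refCount;
  return info;
}
```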
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.cpp#135 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.hpp#41 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#88 edit
SWDEV-189650 - [HIP-CLANG][HIP/VDI/PAL] Hangs on test hip_threadfence_system
1. In HIP + VDI + ROCm, allow SVM atomics on VEGA10 and later ASICs. GFX8 (Tonga) was already enabled.
2. In the HIP + VDI + PAL Linux driver, allow SVM atomics on VEGA10 and later ASICs.
Tests:
1. In HIP + VDI + ROCm, hip_threadfence_system test passed.
2. In HIP + VDI + PAL + Linux, hip_threadfence_system test passed.
3. OpenCL + PAL, clinfo and ocltest runtime tests pass.
4. OpenCL + ROCM, clinfo and ocltest runtime tests pass.
5. Windows 10, VEGA 10, clinfo and ocltest runtime tests pass. hip_threadfence_system test passed by skipping the test.
Teamcity presubmission test:
http://ocltc.amd.com:8111/viewModification.html?modId=127083&personal=true&tab=vcsModificationBuilds
http://ocltc.amd.com:8111/viewModification.html?modId=127076&personal=true&tab=vcsModificationBuilds
ReviewBoard: http://ocltc.amd.com/reviews/r/18077/
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_memory.cpp#73 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.cpp#171 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.cpp#80 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.hpp#31 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.cpp#134 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocmemory.cpp#44 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#320 edit
SWDEV-193423 - HIP/VDI - Support for lazy hsa queue creation
- Add queue pool support for HSA HW queues. GPU_MAX_HW_QUEUES controls the pool size. The current default value is 4 (the number of active pipes on GPU).
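A sketch of one possible lazy pool policy, assuming queues are created only on demand up to the pool size and shared once the pool is full. The class LazyQueuePool and its selection policy are assumptions for illustration. Queue indices stand in for real HSA queues, and the pool size would come from GPU_MAX_HW_QUEUES.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical pool: tracks how many users share each lazily created queue.
class LazyQueuePool {
 public:
  explicit LazyQueuePool(std::size_t maxQueues) : maxQueues_(maxQueues) {
    assert(maxQueues > 0);
  }

  // Returns the index of the queue the caller should use.
  std::size_t acquire() {
    // 1. Prefer a queue that already exists but currently has no users.
    for (std::size_t i = 0; i < users_.size(); ++i) {
      if (users_[i] == 0) {
        users_[i] = 1;
        return i;
      }
    }
    // 2. Otherwise create a new queue while the pool is below its limit.
    if (users_.size() < maxQueues_) {
      users_.push_back(1);
      return users_.size() - 1;
    }
    // 3. Pool is full: share the queue with the fewest users.
    std::size_t best = 0;
    for (std::size_t i = 1; i < users_.size(); ++i) {
      if (users_[i] < users_[best]) best = i;
    }
    ++users_[best];
    return best;
  }

  // Drops one user from the given queue; the queue itself stays in the pool.
  void release(std::size_t index) {
    assert(index < users_.size() && users_[index] > 0);
    --users_[index];
  }

  // Number of queues created so far (never exceeds the pool size).
  std::size_t created() const { return users_.size(); }

 private:
  std::size_t maxQueues_;
  std::vector<unsigned> users_;
};
```

With a pool size of 4, the first four acquire() calls each create a queue, and a fifth call reuses an existing one instead of creating more hardware queues.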
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.cpp#132 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.hpp#38 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#81 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.hpp#24 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#314 edit
SWDEV-144570 - Adding entries on to P2P Access devices in RocM, to create deviceMemories_ for P2P devices too.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.cpp#129 edit
SWDEV-180872 - Runtime support changes for Cooperative Group Features
- Initial implementation of the core functionality. Disabled by default. Use GPU_ENABLE_COOP_GROUPS=1 to enable the feature.
- Runtime uses device queue for cooperative executions with a synchronization on the launched queue.
- The current implementation is a pure runtime change and works only if a single app uses this feature. No ROCr/KFD support was added or tested.
- Only inline assembler was tested
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_device.cpp#20 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_device_runtime.cpp#15 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_hcc.def.in#15 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_hcc.map.in#17 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_module.cpp#28 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_platform.cpp#32 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/device.hpp#338 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#606 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.hpp#171 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palblit.cpp#31 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palblit.hpp#9 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.cpp#142 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.hpp#39 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palschedcl.cpp#6 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#135 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.hpp#61 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocblit.cpp#32 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocblit.hpp#12 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.cpp#127 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.hpp#37 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocschedcl.cpp#3 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#75 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.hpp#23 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/command.cpp#94 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/command.hpp#92 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#311 edit
SWDEV-189140 - Add P2P support in PAL path
- PAL requires a P2P resource to be opened on the device that uses it. Add a new interface to open the resource.
- Add hidden P2P device object creation to amd::Memory. It can be activated with an OCL context that has a single device.
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/opencl/amdocl/cl_p2p_amd.cpp#2 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/device.hpp#337 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.cpp#134 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palmemory.cpp#25 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.cpp#74 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.hpp#28 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palsettings.cpp#80 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palsettings.hpp#23 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#133 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.cpp#126 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/command.cpp#93 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/memory.cpp#136 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/memory.hpp#109 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#306 edit
SWDEV-168145 - Add ECC target feature to OpenCL runtime
- Hard-code the SRAM ECC target feature for now, since ROCr disables sram-ecc reporting via ISA until HCC is fixed.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.cpp#123 edit
SWDEV-180407 - Observed failure while running OCL 2.0 conformance API : min_max_device_version
- Revert CL1739455 to use OCL version 1.2 as the default, which avoids this issue for the ROCm 2.2 release.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.cpp#117 edit
SWDEV-134107 - Add support for respecting target's xnack setting
- Enable the XNACK feature for all APU systems and remove the xnackEnabled_ field from the AMDDeviceInfo struct.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/device.hpp#332 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdefs.hpp#23 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.cpp#116 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.cpp#98 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocsettings.cpp#41 edit
SWDEV-178313 - Properly enable OpenCL 2.0 on ROCm/LC path for Vega10+.
OPENCL_VERSION_STR is 2.1, but we only enable 2.0 since we don't have compiler support for 2.1.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.cpp#115 edit
SWDEV-178313 - Enable OpenCL 2.0 on ROCm/LC path for Vega10+
Doorbell self-ring doesn't work for Fiji, so we enable 2.0 only for Vega10+ for now.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.cpp#114 edit
SWDEV-172202 - Back out changelist 1730757.
Failure in OCLDynamic tests in various TC Sanity tests.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.cpp#111 edit
SWDEV-79445 - Back out changelist 1722556
- More changes are necessary on ROCm backend to support a dynamic switch between HSAIL and LC
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.cpp#107 edit
SWDEV-79445 - OCL generic changes and code clean-up
- Allow a ROCM build within the same workspace as PAL. Please note that the ROCM default path in this case will be HSAIL.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.cpp#105 edit
SWDEV-161959 - [ROCm QA][RAVEN] QCDGPU-S test is having ERROR -61: (clCreateKernel failed) on RAVEN
SWDEV-161983 - [ROCm QA][RAVEN] Cachebench test is failing with CL_INVALID_BUFFER_SIZE issue
SWDEV-161978 - [ROCm QA][RAVEN] PCIeBW is failing on -with error : 61, OpenCL error creating buffer !
SWDEV-161962 - [ROCm QA][RAVEN] rodinia->nw test has ERROR: clCreateBuffer input_item_set (size:67125249) => -61
- Make the global memory size at least 1 GB. This avoids issues or regressions if the sysconf API misbehaves.
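The lower bound can be sketched as a simple clamp. The function name clampGlobalMemSize is hypothetical, and reading the 1 GB floor as 2^30 bytes is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Assumed interpretation of the 1 GB floor: 2^30 bytes.
constexpr uint64_t kOneGiB = 1ull << 30;

// Returns the reported size, raised to the floor if the OS query
// produced zero or an implausibly small value.
inline uint64_t clampGlobalMemSize(uint64_t reportedBytes, uint64_t floorBytes = kOneGiB) {
  assert(floorBytes > 0);
  return std::max(reportedBytes, floorBytes);
}
```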
ReviewBoardURL = http://ocltc.amd.com/reviews/r/15660/
Tests:
1. ocltst -m oclruntime.so -A oclruntime.exclude - PASS except SVM test (non regression)
2. TeamCity presubmission test (OpenCL) - PASS
3. Run test qcdgpu-s.sh : PASS
4. Run test cachebench-ocl : PASS
5. Run test PCIeBandwidth -c 0 -g 0 : PASS
6. Run test Rodinia/opencl/nw/run : PASS
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.cpp#98 edit
SWDEV-161959 - [ROCm QA][RAVEN] QCDGPU-S test is having ERROR -61: (clCreateKernel failed) on RAVEN
SWDEV-161983 - [ROCm QA][RAVEN] Cachebench test is failing with CL_INVALID_BUFFER_SIZE issue
SWDEV-161978 - [ROCm QA][RAVEN] PCIeBW is failing on -with error : 61, OpenCL error creating buffer !
SWDEV-161962 - [ROCm QA][RAVEN] rodinia->nw test has ERROR: clCreateBuffer input_item_set (size:67125249) => -61
- Set the global memory size to half of the system physical memory size on APUs for ROCm.
- As in the current dGPU calculation, the GPU_SINGLE_ALLOC_PERCENT environment variable can be used to adjust the maximum memory allocation size.
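A sketch of the sizing arithmetic, assuming the physical memory size is queried through sysconf with _SC_PHYS_PAGES and _SC_PAGESIZE (available on Linux with glibc). The function names are hypothetical, and the percentage passed to maxAllocSize would come from GPU_SINGLE_ALLOC_PERCENT, whose default is not stated here.

```cpp
#include <cassert>
#include <cstdint>
#include <unistd.h>

// Total physical memory in bytes; returns 0 if either sysconf query fails.
inline uint64_t systemPhysicalMemory() {
  const long pages = sysconf(_SC_PHYS_PAGES);
  const long pageSize = sysconf(_SC_PAGESIZE);
  if (pages <= 0 || pageSize <= 0) return 0;
  return static_cast<uint64_t>(pages) * static_cast<uint64_t>(pageSize);
}

// On an APU, report half of the system physical memory as global memory.
inline uint64_t apuGlobalMemSize(uint64_t physicalBytes) {
  return physicalBytes / 2;
}

// Largest single allocation as a percentage of global memory.
// Dividing first avoids overflow for large sizes, at the cost of rounding down.
inline uint64_t maxAllocSize(uint64_t globalBytes, unsigned percent) {
  assert(percent <= 100);
  return globalBytes / 100 * percent;
}
```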
ReviewBoardURL = http://ocltc.amd.com/reviews/r/15659/
Tests:
1. ocltst -m oclruntime.so -A oclruntime.exclude - PASS except SVM test (non regression)
2. TeamCity presubmission test (OpenCL) - PASS
http://ocltc.amd.com:8111/viewModification.html?modId=106628&personal=true&init=1&tab=vcsModificationBuilds
3. Run test qcdgpu-s.sh : PASS
4. Run test cachebench-ocl : PASS
5. Run test PCIeBandwidth -c 0 -g 0 : PASS
6. Run test Rodinia/opencl/nw/run : PASS
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.cpp#97 edit