SWDEV-245290 / SWDEV-246220 [HIPPerf] Port OCLPerfDevMemWriteSpeed/OCLPerfDevMemReadSpeed into hip performance subtests
Change-Id: I5dc323c75cebbc17596dcb4ed9492e18c5246868
hipSetDevice was not used correctly when allocating on multiple devices in a multi-GPU (mGPU) setup.
As a result, hipMalloc was called on the same device from multiple threads, leading to out-of-memory errors on devices with less memory.
Change-Id: I0e5b1bc028b9ecb11bd40c3a5edf715f8bd721ff
Tests the heq2, hne2, hle2, hge2, hlt2, hgt2 APIs for functionality
and NaN handling
SWDEV-238517 for enhancing hip unit tests
Change-Id: I88a9a8ead0d00a1261f3d650361d655f2f397e48
-Scenario-1:: Verifies that callbacks registered through
hipStreamAddCallback() execute in the order in which they
were queued in their respective streams:
hipStreamACb_AltEnqueue.cpp
-Scenario-2:: Tests whether the host thread continues with the next
command after hipStreamAddCallback() or waits for the
callback to finish. Ideally, the host thread should not
wait for the callback to finish:
hipStreamACb_ThrdBehaviour.cpp
-Scenario-3:: Streams are launched on individual GPUs, each with a
different kernel. Verifies that all queued kernels are
executed before the callback is hit: hipStreamACb_MStrm_Mgpu.cpp
-Scenario-4:: Checks that callbacks execute in the same order in which
they were added, and that the number of callbacks
executed equals the number of callbacks added:
hipStreamACb_order.cpp
-Scenario-5:: Checks whether hipStreamSynchronize() takes less time
than the Callback() function launched through
hipStreamAddCallback():
hipStreamACb_StrmSyncTiming.cpp
-Scenario-6:: Checks that the runtime behaves correctly when
hipStreamAddCallback() is called multiple times back to
back: hipStreamACb_MultiCalls.cpp
-Scenario-7:: Checks the behaviour of HIP when hipStreamAddCallback()
is called multiple times from multiple threads:
hipStreamACb_MultiThread.cpp
(Currently disabled)
SWDEV-238517 for enhancing hip unit tests
Change-Id: I9c7b7df6766c728b2b201df18726b9fbdd434c06
This makes hipLaunchKernelGGL take a variable argument list, which is
expanded before being passed to hipLaunchKernelGGLInternal.
This is different from 961717879d.
We try to accommodate the case where a kernel template has multiple
type parameters.
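Why the variable argument list matters for multi-parameter templates can be sketched on the host with standard C++ only. This is an illustration under stated assumptions, not the actual HIP macro: launchInternal, LAUNCH, scaleIndex and runExample are hypothetical names, and the "kernel" is an ordinary function called once per simulated thread index.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for hipLaunchKernelGGLInternal: calls the "kernel"
// once per simulated global thread index, on the host.
template <typename Kernel, typename... Args>
void launchInternal(Kernel kernel, int numBlocks, int threadsPerBlock,
                    Args... args) {
    for (int idx = 0; idx < numBlocks * threadsPerBlock; ++idx) {
        kernel(idx, args...);
    }
}

// The preprocessor splits macro arguments at every top-level comma and does
// not treat <...> as brackets. With a fixed first parameter, an argument such
// as scaleIndex<int, double> would be cut into "scaleIndex<int" and
// " double>". A fully variadic macro pastes all pieces back together, commas
// included, so the compiler sees the original template-id again.
#define LAUNCH(...) launchInternal(__VA_ARGS__)

// Hypothetical "kernel" template with two type parameters.
template <typename T, typename U>
void scaleIndex(int idx, T* out, U factor) {
    out[idx] = static_cast<T>(idx * factor);
}

// Launches 2 blocks of 2 threads and returns the four results.
std::vector<int> runExample() {
    std::vector<int> out(4, 0);
    LAUNCH(scaleIndex<int, double>, 2, 2, out.data(), 1.5);
    return out;
}
```

Calling runExample() fills the vector with each index multiplied by 1.5 and truncated to int. The design point is that the macro never has to understand the template argument list; it only forwards tokens.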
Change-Id: I87577d402c92b0f3b51e298f8293f4065e1f6de8
1. hipMemcpyWithStream with one stream
2. hipMemcpyWithStream with two streams
3. hipMemcpyWithStream multi-GPU with one stream
4. hipMemcpyWithStream with kind DtoH
5. hipMemcpyWithStream with kind HtoH
6. hipMemcpyWithStream with kind DtoD
7. hipMemcpyWithStream with Default kind
8. hipMemcpyWithStream with Default kind on the same device
9. hipMemcpyWithStream with kind DtoD on the same device
SWDEV-238517 for enhancing hip unit tests
Change-Id: I5f55a12bdd7c8d28fcb06db94a491c2ad5ee3004
This makes hipLaunchKernelGGL take a variable argument list, which is
expanded before being passed to hipLaunchKernelGGLInternal.
Change-Id: Id76e2bf91acd5d68f56a24fc39f219f2eeb06d33
- Test with one and two streams
- Test multi-GPU (one stream per GPU)
- Test D-D (on the same device and on different devices). This can expose
issues when the devices are on the same or different root complexes.
- Test H-D/Default
SWDEV-238517 for enhancing hip unit tests
Change-Id: I8031a7eebe2f9c8c0e0996e2c7accb09ac0b96d4
1. Added a multithreaded, multi-GPU hipModuleLaunchKernel scenario.
2. Removed the hipCtxCreate API from the earlier test, as it is deprecated.
SWDEV-238517 for enhancing hip unit tests
Change-Id: Id102d80887b6ff61a59938dbeb9fa2a26a3275b2
Similar to HCC, link with compiler-rt to support __fp16 and _Float16 type conversions in ONNX models. This should resolve SWDEV-238491.
Change-Id: Iad8dcff568831719f501f562a04023326ae8036c
When the original size is divided across all GPUs, rounding can
occur, causing incorrect validation. Readjust the final value
for comparison to the new size accordingly.
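The adjustment described above can be sketched as follows. This is a minimal host-only illustration, assuming the rounding comes from truncating integer division; SplitSize and splitAcrossGpus are hypothetical names, not identifiers from the test.

```cpp
#include <cassert>
#include <cstddef>

// Result of splitting a total size evenly across GPUs.
struct SplitSize {
    std::size_t perGpu;         // elements each GPU actually processes
    std::size_t adjustedTotal;  // total actually processed across all GPUs
};

// Assumption: the split uses integer division, which drops any remainder.
// numGpus must be greater than zero.
SplitSize splitAcrossGpus(std::size_t originalSize, std::size_t numGpus) {
    std::size_t perGpu = originalSize / numGpus;
    // Validation must compare against perGpu * numGpus, not originalSize,
    // because the dropped remainder is never processed by any GPU.
    return SplitSize{perGpu, perGpu * numGpus};
}
```

For example, splitting 1000 elements across 3 GPUs gives 333 per GPU, so the value to validate against is 999 rather than 1000. When the size divides evenly, the adjusted total equals the original size and nothing changes.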
Change-Id: I9b42149e33dfcb328de7419e546a0202a69a8610
This technique should never be used; the functionality should only be
accessed through __builtins.
There is currently no builtin for groupstaticsize. I left ds_swizzle
as is because, for some reason, it switches to the builtin depending on
whether __HCC__ is defined.
Change-Id: If1e1394221dba83ea4add6db5e94d6b715552044
Support performance tests while keeping the direct test commands unchanged.
To build the performance tests, run "make build_perf".
To run all performance tests, run "make perf".
To run specific tests, for example, run
/usr/bin/ctest -C performance -R performance_tests/perfDispatch --verbose
To run an individual test, for example, run
performance_tests/memory/hipPerfMemMallocCpyFree
Change-Id: I168c1b9ef1ec21b392d48648d0c71e8fbd37d57b