For addMarker, assume T1 comes in first and enqueues a command C1.
Before T1 grabs event_::lock_ it gets preempted. At this point,
T2 comes in, enqueues C2, grabs lock_ and updates event_. Now T1
wakes up and overwrites the event with the older command C1.
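
One way to close this kind of window is to hold the lock across both the
enqueue and the event update. The sketch below illustrates that idea only;
it is not necessarily what this change does. Queue, Command and Event are
hypothetical stand-ins, not the real hip::Event or ROCclr types.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical stand-in for an enqueued command; seq is its enqueue order.
struct Command {
  std::uint64_t seq;
};

// Hypothetical stand-in for a queue that numbers commands as they arrive.
class Queue {
 public:
  Command enqueue() { return Command{next_seq_.fetch_add(1) + 1}; }

 private:
  std::atomic<std::uint64_t> next_seq_{0};
};

// Hypothetical stand-in for the event that tracks its latest command.
class Event {
 public:
  // Enqueue and record under one lock, so a second thread cannot enqueue a
  // newer command and record it between these two steps.
  void addMarker(Queue& queue) {
    std::lock_guard<std::mutex> guard(lock_);
    Command c = queue.enqueue();
    // Invariant the race would break: the event never moves backwards.
    assert(c.seq > latest_.seq);
    latest_ = c;
  }

  std::uint64_t latestSeq() const {
    std::lock_guard<std::mutex> guard(lock_);
    return latest_.seq;
  }

 private:
  mutable std::mutex lock_;
  Command latest_{0};
};
```

With this shape, two back-to-back addMarker() calls on the same event always
leave it pointing at the second command, whichever thread is preempted.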
Change-Id: Ia423782b23026302c40976385623cfdede32d70b
Add device_id_ in hip::event to match CUDA behaviour in
hipEventQuery() and hipEventRecord().
Enable hipEventElapsedTime test on AMD platform.
Work around a sporadic crash of the hipEventIpc test caused by
a bug in event IPC.
Add missing hipEventDestroy() in some event tests.
Fix some logic errors in the code.
Fix typo in comment.
Change-Id: I9ec74c475161b3e31df48d193449023e921f2924
HIP assumes that image width is in bytes, but OCL/ROCclr assumes that
it's in pixels. AtoD/DtoA need to account for this.
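
A minimal sketch of the unit conversion described above, assuming the width
in bytes is a whole multiple of the pixel size. widthBytesToPixels and
elementSizeBytes are hypothetical names used for illustration; they are not
taken from the HIP or ROCclr sources.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: converts an image width given in bytes (the HIP
// convention described above) to a width in pixels (the OCL/ROCclr one).
// elementSizeBytes is the size of one pixel, e.g. 4 for a single float
// channel or 16 for four float channels.
inline std::size_t widthBytesToPixels(std::size_t widthBytes,
                                      std::size_t elementSizeBytes) {
  assert(elementSizeBytes != 0);
  // A row must hold a whole number of pixels.
  assert(widthBytes % elementSizeBytes == 0);
  return widthBytes / elementSizeBytes;
}
```

For example, a 256-byte row of 4-byte pixels is 64 pixels wide.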
Change-Id: I275bd41d8b03e141caaf951bc6b714e51ca72dfc
[dtest] Additional tests for Memcpy
APIs tested:
hipMemcpy, hipMemcpyAsync,
hipMemcpyHtoD, hipMemcpyHtoDAsync,
hipMemcpyDtoH, hipMemcpyDtoHAsync,
hipMemcpyDtoD, hipMemcpyDtoDAsync
1: The aim of this test case is to cover all
the negative test cases for the 8 hipMemcpy APIs.
2: This test launches NUM_THREADS threads.
Each thread in turn tests the working of
the 8 hipMemcpy APIs.
3: This test case verifies the working of the
hipMemcpy APIs for a range of memory sizes, from
the smallest one-unit transfer up to 1GB.
Change-Id: If5c99527a78e817bafab2e1bd9b686a9ff916184
On Windows there's something fundamentally broken about redirecting IO
into a file and then restoring said IO to its original state. Even
though no syscalls would fail, the output would sometimes either go to
the CLI or straight up nowhere.
Simply using pipes instead of a temporary file magically resolves the
above issue ¯\_(ツ)_/¯
Unfortunately the max pipe size on Linux is 1 MB, which is not enough to
store all the data printed by the kernel. This leads to a soft hang in
vprintf().
Stick to using a temporary file on Linux, but switch to pipes on
Windows. Slightly refactor the CaptureStream struct to accommodate this
difference.
Change-Id: Id8e68f150df47815a4f652ee2bcd6cfb7c3e3bac
The following snippets behave differently depending on the platform.
printf("%p", 0x123abc);
Linux -> 0x123abc
Windows -> 123ABC
printf("%p", nullptr);
Linux -> (nil)
Windows -> 0000000000000000
The %p specifier is implementation-defined according to the C spec, so we
need to adjust the reference string to be correct on Windows.
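
An alternative to hard-coding one reference string per platform is to build
it with the host C library's own %p. This is only a sketch, and it assumes
the output under test is formatted by that same C runtime, which this
message does not state. hostPointerString is a hypothetical helper, not
part of the existing tests.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a pointer the way this platform's printf
// does, so the expected string follows the implementation-defined %p.
inline std::string hostPointerString(const void* p) {
  char buf[64];
  // %p expects a void*, so drop the const qualifier explicitly.
  int n = std::snprintf(buf, sizeof(buf), "%p", const_cast<void*>(p));
  assert(n > 0 && static_cast<std::size_t>(n) < sizeof(buf));
  return std::string(buf, static_cast<std::size_t>(n));
}
```

A test could then compare the captured output against
hostPointerString(ptr) rather than against a literal.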
Change-Id: I7059fa0f6cde611718bd76655637670fcbccf43c
The current callback implementation in ROCclr is
based on the OCL specification. If the same command in HIP
gets multiple callbacks, ROCclr processes them in
reverse order. Using a unique marker for each callback
prevents this.
Add a dependency wait for callbacks, since the HSA signal callback
doesn't guarantee ordering.
Change-Id: I9d514734e258312fe9a74d48132361eb17c52d67
With direct dispatch, enqueue occurs before the callback update, and
it can't be tracked in the device backend.
Change-Id: Ie8793e3ddb68cc5bb36348f7a8dcdbdc87a2487c
Some upstream projects load hip multiple times, so the ROCclr static lib
gets linked to hip::amdhip64 many times. This causes unexpected
issues, such as in hipBLAS. This patch prevents that from happening.
Change-Id: I6bb27659f74371dae6e59c59fd6bb2022cc062ff
The flags in NVCC_OPTIONS also need to be passed to the linker, because
compiler and linker flags are mixed in NVCC_OPTIONS.
Change-Id: I3db37b962808566ea145e3cbdefa66d373e2d360