Commit graph

40 commits

Author SHA1 Message Date
Payam f134b90199 SWDEV-257937 - ROC_BARRIER_SYNC fix for missing SDMA flush
Change-Id: I93e8902bfcb16bac8ea594e16ea397b1ceafbd79
2020-12-15 00:54:33 -05:00
Saleel Kudchadker 59c6cb0268 Use barrier packets for event profiling
Use barrier packets for every profile marker that gets submitted
and use the completion signal to get GPU ts. This gives most accurate
dispatch time. Club cache flushes with profile marker if there is a
pending dispatch that needs cache flush. This optimization saves on
extra barrier and helps wall time

Change-Id: Ib62d6d7aabf4743827b561be6c9c5afa813203da
2020-12-03 13:45:14 -05:00
German Andryeyev bd340d8cbf Correct reported info in ROC profiler
OCL can't distinguish different copy types, but ROC profiler
expects SDMA transfer visibility. Add extra code to detect
a transfer with the host memory and substitute OCL command

Change-Id: I5290acd0e10bc082e00c1d4ae1474a075de7f165
2020-10-23 18:29:48 -04:00
German Andryeyev a5661192b6 Reduce the number of allocated signals
Enable this optimization when the barrier is disabled, since
reuse requires a signal wait.
Use the size of pending AQL signals as the size of signal pool.

Change-Id: I2754a0f8b67e19d2601c58945e10fdf0e8be1624
2020-10-15 16:39:33 -04:00
German Andryeyev d9397590de Add option to skip AQL barrier
The change reuses HSA signals for dispatches as a wait signal.
Skipping the barrier requires disabling the L2 cache for sysmem
allocations and extra tracking for HDP access with the large bar.
ROC_BARRIER_SYNC=0 activates the new logic. Barrier sync is
still used by default.
ROC_ACTIVE_WAIT=1 enables unconditional active wait in ROCr.
The change also consolidated ROCr wait logic under single function.

Change-Id: I6bd1be30aa88258da1b1f9de319ef5a45852afd8
2020-10-06 08:37:12 -04:00
German Andryeyev af8426b0e4 Revert "Reduce the default size of the signal pool"
This reverts commit e68d671a51.

Reason for revert: a regression

Change-Id: I78180ba011f45af9a4cce110b14f379aa10f7d3a
2020-10-01 09:56:05 -04:00
Aryan Salmanpour 2e199bd492 Fix a crash when printf used in a kernel launched on a stream with custom CU mask
SWDEV-249719 - root cause: queues with custom CU mask are not inserted
into queuePool_ (i.e., queue of reusable HSA queues) of ROC device class
causing a crash when creating hostcall buffers for printf

Change-Id: Ieee7005d9a5a30b3113394ce23ee65927126d0d6
2020-09-25 09:25:19 -04:00
German Andryeyev e68d671a51 Reduce the default size of the signal pool
Implement dynamic signal pool grow per allocated queue

Change-Id: Ie8b17937d72c29cc49e59639c4b2023ea984b14c
2020-09-09 09:53:52 -04:00
Alex Xie 2c2665665d SWDEV-250136 - [LNX][Navi21][OCL over ROCr] OpenCL-GL sharing failed
Change-Id: Id61f649f035964d14f6399dbea03137c11f8eaea
2020-09-06 10:40:56 -04:00
Jason Tang 19d1497fa2 SWDEV-239502 - fix interop regression
When header==0, the legitimate packet->header is wiped out, so also add an assert.

Change-Id: I6b3037d4618719262b0d7c1792bd54f768a63660
2020-08-25 18:11:18 -04:00
Aryan Salmanpour d2b9d267b2 SWDEV-248499 Fix a crash when printf is used with cooperative kernels
root cause - cooperative queue is not inserted into queuePool_ (HSA queues) of ROC device class causing a crash when creating hostcall buffers for printf

Change-Id: I3f9aceb4e5fe6a7c7a2a549a4bb0a3511fe02799
2020-08-25 16:51:34 -04:00
German Andryeyev 6e69258b69 Enable prefetch async functionality
Fix a typo with the name define, when compilation wasn't enabled.
Force CPU prefetch if system was forced in runtime

Change-Id: Id4b578f9fa44a45426fdb5d8ecb1da803aa42313
2020-08-13 11:09:10 -04:00
Jason Tang 152a2dfb5a SWDEV-247463 - Fix regression: ocltst segfaults
Change-Id: Iadb55ba45d6c8ade0757fd970ac4c6cde1805de3
2020-08-09 11:28:09 -04:00
German Andryeyev 0dc47d55d2 Sync the current queue for P2P staging
P2P staging uses device queues for transfer, hence the current
queue must be in sync

Change-Id: I8372a60590eed9dde62cb4c67ef4df5df82a8e8d
2020-08-07 14:36:50 -04:00
Saleel Kudchadker ec73340348 Add Queue profiling param and toggle for HIP
Use signal timestamps if NDRange command takes forceProfile flag.

Change-Id: Ib7f187d781fd78a7346818afb3344a9378f4c104
2020-08-06 03:09:53 -04:00
Jason Tang 8ef5da00c7 SWDEV-246687 - Do not use std::vector reference as class member cuMask_
The current implementation creates a default reference on the stack and assigns it to class member cuMask_, so whenever the content of the stack changes, cuMask_ would change.

Change-Id: Iefab63c335d504b83c4ae90bd34ae76c6afb8f3c
2020-08-05 16:57:36 -04:00
German Andryeyev 91a25df04f Process cache coherency before mem dependency tracker
Optimization to remove extra syncs uncovered a bug in the cache
coherency layer, where the runtime could lose track of a mem address
if the coherency layer performed a sync.

Change-Id: I25647cfa4a4be9cdbd8577ff076a740bbdac79c8
2020-08-04 16:33:18 -04:00
Tao Sang fdef6f722f Apply constexpr on global constant variables
When HIP_ENABLE_DEFERRED_LOADING=0, many global variables will be
referenced but they are not initialized at that early stage. The patch
uses constexpr to initialize global constant variables at compile
time.

Change-Id: I9d538b7abc6a0ce700ec3332b97fc144db5fc1ef
2020-07-22 22:14:13 -04:00
Jatin Chaudhary 48690f29e9 Adding AnyOrder Flag
Change-Id: I6baaef42b98adfbc8cf2605e175ec007e008045f
2020-07-22 00:25:04 -04:00
Matt Arsenault 5577eabcea Fix -Wmissing-braces
Change-Id: I2394b6923c789f36e72242f4b196844cc0ee90ba
2020-07-15 16:51:03 -04:00
Jatin Chaudhary cd1e364911 Replacing deprecated HSA API calls with newer ones
Change-Id: Iebe2c00e717ab0e47c61611752b717966c719994
2020-07-08 00:32:24 -04:00
Aryan Salmanpour 4a901f3dd3 Always print error message with the returned error code before abort
Change-Id: I8479abc586937a50c90b2785c4ce7364e6e9732b
2020-07-07 16:28:30 -04:00
Dittakavi Satyanvesh 7a3b8c6dd2 SWDEV-240566 adds error message before abort
Change-Id: I4dbd089daa5e6fde5e8722dc2395225dd822561c
2020-06-22 10:12:49 -04:00
German Andryeyev c5afd5d412 Initial HMM support
- Expose ROCclr interfaces for HIP usage
- ROCr interfaces aren't available in staging, thus control the
build with AMD_HMM_SUPPORT define

Change-Id: Iadc2bcc230e78d3b0dc22b235189c8cc80843446
2020-06-12 09:06:07 -04:00
Saleel Kudchadker 2b771d2f5f Add logging support for AQL packet
Use AMD_LOG_LEVEL=4 and AMD_LOG_MASK=8 to print AQL log
explicitly
Change-Id: I4209d91b460e64be44261d3ab773580067e47c29
2020-06-10 14:04:47 -07:00
German Andryeyev 86e0f337fc Add the sync of the current queue
Make sure runtime waits for the current queue before
synching with device queue

Change-Id: I753b6fc0bb15a3a3d4bf03fef1152842550850c0
2020-06-05 11:57:59 -04:00
Aryan Salmanpour b5552aa97f Add support for setting queue priority for ROCm backend
Change-Id: I67ed5a6868af79538f7f4522d8d11c043cdf3c1e
2020-06-04 20:16:32 -04:00
German Andryeyev 2ce6bbebc4 Fix async mem clear
Optimization for the fence release removed a sync for mem fill.
Add simple const buffer management for the filled pattern to avoid
pattern overwriting with the async fills.

Change-Id: I63773ac09ceec31d5396d24570e4647ff096326b
2020-05-20 11:13:41 -04:00
Aryan Salmanpour fed94b8604 Add support for setting CU mask on ROCclr for ROCm backend
Change-Id: I0dbe2eeb33467fc0f24b26929119c10e9b455da7
2020-05-15 14:23:43 -04:00
Christophe Paquot 6a5af4056e Use system scope for packet following sdma copies
SWDEV-234947
SWDEV-236298
Instead of forcing a barrier packet, just inject system scope on the next packet.

Change-Id: If9bcee23e08dfe5db731235e2fcb30582cbd4c1c
2020-05-15 12:20:06 -04:00
German Andryeyev 7302ebcfbc Optimize synch operations
- Stall the queue only for HSA copy operations

Change-Id: Ia3debcc0f36284c5f8cd2776d31674f3aeed04ea
2020-04-30 11:17:48 -04:00
Christophe Paquot b54c3f7db9 Couple of cleanups.
Remove queue limitation since we loop through HW queues now.
Add a DevLogError if we fail to create the hsa_queue. A ticket showed a regression there.

Change-Id: I4f58e405f88e75600a762f6d6352838c969cdb5e
2020-04-29 09:18:07 -07:00
German Andryeyev 89133a7301 SWDEV-232807
[ROCm][TCT][HIP] cooperative stream test case is failing.

Make sure lockXfer() in the blit manager returns a valid value.
Port the latest PAL backend logic into the ROCr backend.
This change doesn't fix the issue reported in the ticket.

Change-Id: I54101a824f49a2dcfbbf5414cb5b3af41745306d
2020-04-23 15:01:02 -04:00
Michael LIAO 97f55b5c7f [vdi] Add device assertion support.
- Once device assertion occurs, abort the host execution as well.
- TODO: This is the initial support. As we need to drain hostcall queue
  to ensure device assertion message being flushed out, hostcall
  listener needs an interface to explicitly drain its queue.

Change-Id: I8a04400aa7109bfd054ae5777c41a4abbf0db4a9
2020-04-22 10:03:55 -04:00
Alex Xie 43b9863e17 SWDEV-229731 - [Lnx][Rocm][Navi]Support images in full Opencl Conformance tests
1. Enable pitch workaround
2. When we use copy image, we don't need to create the custom pitch image
3. wrtBackImageBuffer_ stores device memory object, not amd image object.

Tests:
conformance kernel read / write test pass with this code change.

Change-Id: I7dca3127adde6ac83e78dd270a2256ebed55c60d
2020-04-04 09:43:03 -04:00
German Andryeyev 7ef8dfdfe7 SWDEV-184709 - support hipLaunchCooperativeKernel()
Add ROCr cooperative queue allocation

Change-Id: I1384482692f4080d31255b09e0f68a21ccad3da8
2020-03-30 16:09:09 -04:00
German Andryeyev 374f612b7c SWDEV-193956
[hipclang-vdi-rocm][perf]~45% to 50% of Performance drop on
rocBLAS_int8 test

- Enable AMD_OPT_FLUSH optimization by default to match HCC
- Disable CPU writes to GPU memory on boards with large bar,
because it requires HDP flush tracking.
- Enable L2 cache on kernel arguments, because L2 will be
invalidated on memory reuse.

Change-Id: I124cf250bdd4d19c523ce542c163813828f8fbdc
2020-02-18 14:26:00 -05:00
Laurent Morichetti d9d9c69399 Replace cl_* integral types with standard types.
cl_bool -> bool
cl_int -> int32_t
cl_uint -> uint32_t
cl_long -> int64_t
cl_ulong -> uint64_t
cl_float -> float
cl_double -> double
cl_bitfield -> uint64_t

Change-Id: I840c8993b55f98f5b745d21e27f5f28233647a58
2020-02-12 13:16:06 -08:00
Laurent Morichetti b4c6143a2f Update copyright info
Change-Id: Ia4f9ff0f5f873b4223a8cca154188bb0d2f1abba
2020-02-04 09:26:14 -08:00
Laurent Morichetti 20c7173849 Merge branch 'origin/pghafari/vdi-prototype' into lmoriche/amd-master
Change-Id: Id3b833d405596735becb3346f3b08c6da57033fe
2020-01-30 20:12:13 -08:00
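
Note on commit 8ef5da00c7 (cuMask_): the message describes a reference member bound to a default object on the stack. The following is only a minimal sketch of that class of bug and of the by-value fix, not the actual ROCclr code. The names Queue, cuMask() and makeQueueWithLocalMask() are hypothetical and chosen for illustration.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a queue class that carries a CU mask.
//
// Problematic pattern (shown as a comment only, since using it is undefined
// behaviour):
//   const std::vector<std::uint32_t>& cuMask_;  // reference member
//   Queue(const std::vector<std::uint32_t>& m = std::vector<std::uint32_t>())
//       : cuMask_(m) {}
// The default argument is a temporary that is destroyed when the constructor
// call ends, so cuMask_ is left referring to dead stack storage.
class Queue {
 public:
  // Take the mask by value and move it into the member: the object owns it.
  explicit Queue(std::vector<std::uint32_t> cuMask = {})
      : cuMask_(std::move(cuMask)) {}

  const std::vector<std::uint32_t>& cuMask() const { return cuMask_; }

 private:
  std::vector<std::uint32_t> cuMask_;  // owned copy, not a reference
};

// Builds a queue from a mask that lives only in this function's stack frame.
// With an owned member the returned queue stays valid after `local` is gone.
Queue makeQueueWithLocalMask() {
  std::vector<std::uint32_t> local = {0xFFFFu, 0x00FFu};
  return Queue(local);
}
```

Storing the vector by value costs one copy or move at construction, and in exchange the member's lifetime no longer depends on the caller's stack.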