
57 Commits

Author SHA1 Message Date
Saleel Kudchadker b2b8545eb6 SWDEV-260345 - Manage constant buffer for blit
- Leverage managed buffer that would use chunks for fill pattern. Use a
different chunk for the next fill to avoid wait

Change-Id: I254483c867e112f66564ffd8f55e0a605d8896c9


[ROCm/clr commit: 175ad024d3]
2022-07-12 12:41:02 -04:00
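
The chunking idea in the commit above can be sketched as a small ring of fixed-size chunks inside one buffer, where each fill writes its pattern into the next chunk so that a fill still in flight is not overwritten. This is only an illustration under stated assumptions: the class `ChunkedConstantBuffer`, its method names, and the host-side `std::vector` storage are hypothetical and are not taken from the ROCclr sources.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical ring of fixed-size chunks carved out of one staging buffer.
// Host memory stands in for the device-visible constant buffer.
class ChunkedConstantBuffer {
 public:
  ChunkedConstantBuffer(std::size_t chunkSize, std::size_t chunkCount)
      : chunkSize_(chunkSize), chunkCount_(chunkCount),
        storage_(chunkSize * chunkCount), next_(0) {}

  // Copies the pattern into the next chunk and reports its byte offset.
  // Returns false if the pattern does not fit into a single chunk.
  bool writePattern(const void* pattern, std::size_t size, std::size_t* offset) {
    if (size > chunkSize_) return false;
    *offset = next_ * chunkSize_;
    std::memcpy(storage_.data() + *offset, pattern, size);
    // Rotate so that the following fill lands in a different chunk.
    next_ = (next_ + 1) % chunkCount_;
    return true;
  }

  const unsigned char* data() const { return storage_.data(); }

 private:
  std::size_t chunkSize_;
  std::size_t chunkCount_;
  std::vector<unsigned char> storage_;
  std::size_t next_;
};
```

A real implementation would still have to synchronize once the ring wraps around onto a chunk that an earlier fill may be reading; the sketch leaves that out.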
Saleel Kudchadker 7fd80925cd SWDEV-335626 - Use ROCr copy for IPC
Detect IPC buffer and use ROCr copy api instead of blit

Change-Id: Ie6bdd6fc45dbd7457611011d81570b53d5fd5276


[ROCm/clr commit: faaa41aab8]
2022-07-08 13:32:19 -04:00
Ajay 9fcc7a7219 SWDEV-332522 - streamOpsWrite & streamOpsWait to accept memory offset
Change-Id: I4b6ecb4d80c093d038d86616a637c4bb465ae24e


[ROCm/clr commit: d2f837d25f]
2022-04-25 14:59:36 -04:00
Jason Tang 7bdbf61a9d SWDEV-324411 - Use blit kernel for copyBufferRect if atomic is not supported
Change-Id: I2e110fd3418117ee9c7ede379244d2c6c4f248b7


[ROCm/clr commit: ed7737564e]
2022-04-24 11:41:16 -04:00
kjayapra-amd 31c0525344 SWDEV-305527 - Changes to handle memset blit kernel that takes width, height and depth. This also fixes SWDEV-317261.
Change-Id: Ic85f63a95d9d8f48884fc8c7fd95cbb496dfbbca


[ROCm/clr commit: 7fb80a027a]
2022-03-31 09:02:33 -04:00
Satyanvesh Dittakavi acfa45bd5c SWDEV-326397 - P2P copies to take SDMA path if there is no pending dispatch
Change-Id: I50cfb8d77f7882151a20a1de7aaf5219b1695b7d


[ROCm/clr commit: c1b95b09bf]
2022-03-29 14:59:11 +00:00
German Andryeyev a7be0eb56a SWDEV-316824 - Fix P2P compute copy path
Use device memory object for the GPU VA address look-up.

Change-Id: I76bf58b29205f7b3ba1bf68e9fcca69421267203


[ROCm/clr commit: 3fd4a67670]
2022-02-15 13:20:13 -05:00
Satyanvesh Dittakavi 85c2cac111 SWDEV-306939 - Fix vdi errors/warnings by CppCheck
Change-Id: I56d910f8363787f1050d5d7e8064ed553c5827fd


[ROCm/clr commit: e20dd61932]
2022-01-12 00:22:16 -05:00
German Andryeyev 5ad02b78c4 SWDEV-305016 - Improve MGPU scaling in Tensorflow
Add a threshold for ROCR/SDMA P2P transfers. ROCR copy path
requires extra barriers in compute for synchronization. That costs
extra performance with tiny transfers.
Reduce active wait time to 10us. Tensorflow uses extra thread
per GPU with constant hipEventQuery() calls. Longer active waits
in ROCr affect CPU performance.

Change-Id: I9020358438615fa2d4617f862f00a562f0a588e7


[ROCm/clr commit: 008133cf41]
2021-12-08 11:59:37 -05:00
kjayapra-amd f75cfb049a SWDEV-312822 - Fix the globalWorkSize to count sizeof(var) elements instead of bytes.
Change-Id: Ic6b2bbb2e8d4cb6aa8d906d4b93cd06a176160d8


[ROCm/clr commit: d4ad981c0c]
2021-11-29 17:36:11 -05:00
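
As a rough illustration of the distinction in the commit above: if each work-item writes one element of the fill pattern, the global work size is the byte count divided by the element size, not the byte count itself. The helper name `fillWorkItems` is hypothetical, and the sketch assumes the fill size is a multiple of the element size.

```cpp
#include <cstddef>

// Hypothetical helper: number of work-items for a fill of `fillBytes`
// when every work-item writes one element of `elemSize` bytes.
// Assumes fillBytes is a multiple of elemSize; returns 0 for elemSize == 0.
constexpr std::size_t fillWorkItems(std::size_t fillBytes, std::size_t elemSize) {
  return elemSize == 0 ? 0 : fillBytes / elemSize;
}
```

Under this assumption, a 4096-byte fill with 4-byte elements needs 1024 work-items; launching 4096 would write past the end of the buffer.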
kjayapra-amd f74515778c SWDEV-312822 - Revert "SWDEV-310187 - Change flag to keep track of aligned sizes instead of expanded patterns."
This reverts commit 7220267211.

Change-Id: I022c2a8375f9929e9723cec66e1e0b960263fc39


[ROCm/clr commit: 2e9bc8f793]
2021-11-28 23:39:40 -05:00
German Andryeyev b0b0c3049f SWDEV-313126 - Use data() method for the base array address
Reference for the first element can trigger an assert with
_GLIBCXX_ASSERTIONS build

Change-Id: I59c63c052831307edfe5dcc6384798a43e9596dd


[ROCm/clr commit: 6f2e7c3199]
2021-11-26 09:51:57 -05:00
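
The difference the commit above relies on can be shown in a few lines. Taking a reference to the first element of an empty `std::vector` is undefined behavior, and libstdc++ built with `_GLIBCXX_ASSERTIONS` aborts on it, whereas `data()` is always valid to call. The helper name `baseAddress` is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: base address of a vector's storage.
template <typename T>
const T* baseAddress(const std::vector<T>& v) {
  // v.data() may be called on an empty vector; &v[0] may not,
  // and a _GLIBCXX_ASSERTIONS build of libstdc++ aborts on it.
  return v.data();
}
```

For a non-empty vector both forms yield the same pointer, so switching to `data()` changes nothing in the common case.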
kjayapra-amd 7220267211 SWDEV-310187 - Change flag to keep track of aligned sizes instead of expanded patterns.
Change-Id: I763feda8688bb1b7b11033a2a8cba0f69f07167d


[ROCm/clr commit: 8307886644]
2021-11-19 10:32:40 -05:00
Bing Ma 213b5dffd1 SWDEV-306602 - [SANITIZER_AMDGPU] Force copyBuffer to use ROCr functions when ASAN is ON
Change-Id: I04a4cdd5ab8c5543f2a0f08c139c45ac7aebe64a


[ROCm/clr commit: 02f939a40d]
2021-10-14 12:55:27 -04:00
kjayapra-amd 6f62f832cb SWDEV-232903 - Move hipmemset Dword optimization to ROCclr.
Change-Id: I3eae61720cbc6364f1aaac4865bfd8b6ded08097


[ROCm/clr commit: 88ed58735d]
2021-10-13 11:32:15 -04:00
Jason Tang e4db6ef66a SWDEV-306697 - Fix OCLGlobalOffset segfaults
If we don't create the __amd_rocclr_gwsInit kernel, we still want
to create the rest of the image related blit kernels.

Change-Id: I8bc4645f9f9116eeecbb8b22e981ac4d520f3121


[ROCm/clr commit: 55a0cf0b0c]
2021-10-12 15:13:28 -04:00
kjayapra-amd cfb15c6c5d SWDEV-294420 - Ignore Image blit kernels if image instructions are not supported.
Change-Id: I145172672b0b032aa722649b0c4ca9267e3e5c85


[ROCm/clr commit: 7413b7f79b]
2021-10-05 18:12:44 -04:00
Sourabh 936e0836a8 SWDEV-292525 - [vdi] Path to streamOps shaders
Implementation to use a blit kernel to perform
a hipStreamWait/write instead of an AQL packet.

Change-Id: I462671ed5cec37144dfe97ff66439249196117c1


[ROCm/clr commit: cbb8d82bdb]
2021-09-27 13:59:35 -04:00
Saleel Kudchadker 36ec8c8871 SWDEV-297448 - Add 64bit and 16bit write support
For the fillBuffer shader, if there are two 32bit writes to a MMIO
register, it can get dropped. It has to be a single 64bit write.
Add optimization to fillBuffer to write 64bit and 16bit writes.

Change-Id: I3aa78e027898f8ae01e9c8f09004615673720c2b


[ROCm/clr commit: 21ba34d0fe]
2021-09-08 12:30:04 -04:00
Sarbojit Sarkar 45953e81dd SWDEV-300655 - Added thread ID to hip trace
Change-Id: I9234d4ec93e7687cd0a5d1bd930bd4f80936311b


[ROCm/clr commit: 42d33029dc]
2021-09-06 00:22:42 -04:00
agunashe 49f0546637 SWDEV-293742 - Update copyright end year VDI repo
Change-Id: I69d2fea4a7a43adf96ccea794270e4af991c5261


[ROCm/clr commit: d96481fb36]
2021-08-22 23:56:07 -07:00
German Andryeyev 3e36acd579 SWDEV-278894 - Use GPU waits for HIP events
Save HW events in amd::Event.
Use HW events for synchronization

Change-Id: I98cf9c2d0ec3c7fcaf254b749ac6c568d7270ae0


[ROCm/clr commit: fa2e154a8b]
2021-05-25 13:41:15 -04:00
Saleel Kudchadker 6c1f022834 SWDEV-280773 - Additional logging for signals
Cleanup new lines in debug log

Change-Id: I6862c332eb9457b51e23cf4e9db9ba3f870d0c39


[ROCm/clr commit: 42b8236f93]
2021-04-30 15:05:57 -07:00
Saleel Kudchadker 6c304e4027 SWDEV-276120 - Remove support for barrier sync
ROC_BARRIER_SYNC will not work with direct dispatch.
Remove and cleanup.

Change-Id: I81368b2e65039477bd0343bb92708dab48867db6


[ROCm/clr commit: aa38af8c96]
2021-04-07 17:08:39 -04:00
Ravi C Akkenapally 6629930067 SWDEV-179105 - Stream Operations: Add support for Wait and Write
Change-Id: Ibffa1d6d573826b64763da280074a77271d66808


[ROCm/clr commit: 0a5f9a3b10]
2021-02-15 17:02:38 -08:00
Payam 72b49f0800 SWDEV-257937 - Updated fix for ROC_BARRIER_SYNC=0
Change-Id: I7e28e541b654db57fb0890d7dbb7519cfb2d93db


[ROCm/clr commit: a2e0b0495c]
2021-02-11 14:01:45 -05:00
Saleel Kudchadker 7a08212ce1 SWDEV-257787 - Add log for tracking copy signals
Change-Id: I713e8463916a85a634a1ec2309bbd46a11c461a8


[ROCm/clr commit: 629a2d8ef3]
2021-01-28 13:25:49 -05:00
German Andryeyev f96e973378 SWDEV-257787 - Add engine tracking per signal
- The logic will trace compute, sdma read/write operations and
apply signals when necessary
- ROC_CPU_WAIT_FOR_SIGNAL, ROC_SYSTEM_SCOPE_SIGNAL
and ROC_SKIP_COPY_SYNC were added to control the tracking

Change-Id: I9e8e6174c63bf7784f7ab00964e2918c8667d364


[ROCm/clr commit: dbc7abaecf]
2021-01-25 12:34:45 -05:00
German Andryeyev 1d26696235 SWDEV-257787 - Reset active signal if ROCR call failed
- If ROCR fails the call for some reason, the signal will
become invalid and can hang on a wait. The logic will reset the
active signal in such cases

Change-Id: Ia131420200f1bbd7c9a162b8f1b06db8cecf41c6


[ROCm/clr commit: ce2e5eba6b]
2021-01-21 17:29:34 -05:00
German Andryeyev 1086195745 SWDEV-268381 - Enable wait on CPU before SDMA transfer
- There is a performance regression with a HW wait for HSA signal
on ROCr async operation. For now move the logic back to CPU wait.

- Fix profiling issue with multiple HSA signals per single timestamp
object. Some copies require multiple ROCR calls and if profiling is
required, then the execution time is derived from all used signals.

Change-Id: Id003e4abb8c2de378eedc152a7e389500fc6f4ce


[ROCm/clr commit: 5a8946190a]
2021-01-19 18:24:21 -05:00
Tony Tye 902cf1a239 Update code object handling for GSL, PAL and ROCm
- Correct GSL path to report targets using the TargetID syntax.

- Correct GSL path to check compatibility of code objects when
  loading.

- Add concept of a device ISA and create a registry used by ROCm,
  PAL and GSL.

- Support XNACK and SRAMECC target features consistently for PAL and ROCm.

- Correct logic for NullDevices and asserts to avoid memory corruption.

- Allow all NullDevices to be created for HIP.

- Numerous other code improvements.

Change-Id: I40abf3d2b22249c1492d1af5919665f8184f4e0e


[ROCm/clr commit: c7e8d91e14]
2021-01-14 11:11:51 -05:00
German Andryeyev 30cf81fc93 Add HSA signal global tracking logic.
Implement the global class for signals tracking per device queue.
Switch to the new tracking mechanism.

Change-Id: I3c4dda04b34e6d18d6a95510d84102909633b415


[ROCm/clr commit: 8698aeef0d]
2021-01-08 12:57:33 -05:00
German Andryeyev 5bc740fc5e Update comments in the code
Make sure the comments in the code match the actual behavior.
HDP read has internal HDP read cache and doesn't use L2.

Change-Id: I667a4643b0e0d6529008f5e1a0a3269456c55b4e


[ROCm/clr commit: d524514f6a]
2020-12-17 09:43:23 -05:00
Payam 53d3c09599 SWDEV-257937 - ROC_BARRIER_SYNC fix for missing SDMA flush
Change-Id: I93e8902bfcb16bac8ea594e16ea397b1ceafbd79


[ROCm/clr commit: f134b90199]
2020-12-15 00:54:33 -05:00
German Andryeyev f7cf40fc02 Add L2 flush/invalidate after CPU copy
CPU read updates L2 with the latest values and requires
invalidation after, because SDMA doesn't use L2 and data can become
out of sync.

Change-Id: I98d1c91ca78a103fa5409e638f97485d62d5b11e


[ROCm/clr commit: 18a821acde]
2020-12-11 23:05:49 -05:00
German Andryeyev 7df8e0bcb3 Correct reported info in ROC profiler
OCL can't distinguish different copy types, but ROC profiler
expects SDMA transfer visibility. Add extra code to detect
a transfer with the host memory and substitute OCL command

Change-Id: I5290acd0e10bc082e00c1d4ae1474a075de7f165


[ROCm/clr commit: bd340d8cbf]
2020-10-23 18:29:48 -04:00
German Andryeyev 2c21a44b40 Add option to skip AQL barrier
The change reuses HSA signals for dispatches as a wait signal.
Skipping the barrier requires disabling L2 cache for sysmem
allocations and extra tracking for HDP access with the large bar.
ROC_BARRIER_SYNC=0 activates the new logic. Barrier sync is
still used by default.
ROC_ACTIVE_WAIT=1 enables unconditional active wait in ROCr.
The change also consolidated ROCr wait logic under single function.

Change-Id: I6bd1be30aa88258da1b1f9de319ef5a45852afd8


[ROCm/clr commit: d9397590de]
2020-10-06 08:37:12 -04:00
Alex Xie e0cb881d91 SWDEV-249516 - [Lnx][Navi][rocm]conformance image read write tests data error
Change-Id: Ie1c4fda953198b49ed66fea9da23e62c686d9cea


[ROCm/clr commit: 7e8f7b5927]
2020-09-01 17:20:58 -04:00
Tao Sang 44eb207f8d Apply constexpr on global constant variables
When HIP_ENABLE_DEFERRED_LOADING=0, many global variables will be
referenced before they are initialized. The patch
uses constexpr to initialize global constant variables at compile
time.

Change-Id: I9d538b7abc6a0ce700ec3332b97fc144db5fc1ef


[ROCm/clr commit: fdef6f722f]
2020-07-22 22:14:13 -04:00
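
A minimal sketch of the technique named in the commit above: a `constexpr` global must have a constant initializer, so its value is fixed at compile time and cannot be observed uninitialized by code that runs early during static initialization. The constant names below are hypothetical and are not taken from the patch.

```cpp
#include <cstddef>

// Hypothetical constants; the names are illustrative only.
constexpr std::size_t Ki = 1024;
constexpr std::size_t Mi = Ki * Ki;

// constexpr forces constant initialization: the compiler rejects any
// initializer that would need to run code at program start-up.
constexpr std::size_t kStagingBufferSize = 4 * Mi;

static_assert(kStagingBufferSize == 4194304, "evaluated at compile time");
```

A plain `const` global whose initializer calls a non-constexpr function would instead be initialized at run time, in an order that is unspecified across translation units.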
Jatin Chaudhary 260a83c546 Replacing deprecated HSA API calls with newer ones
Change-Id: Iebe2c00e717ab0e47c61611752b717966c719994


[ROCm/clr commit: cd1e364911]
2020-07-08 00:32:24 -04:00
Vlad Sytchenko 0b3fc7bc5d Fix some -Wunused-but-set-variable warnings
Change-Id: I281583b5abdfc09d5dd8b7dfb20b8821581db193


[ROCm/clr commit: 5b9af8f28d]
2020-06-15 17:51:01 -04:00
German Andryeyev e20f40119c Fix async mem clear
Optimization for the fence release removed a sync for mem fill.
Add simple const buffer management for the filled pattern to avoid
pattern overwriting with the async fills.

Change-Id: I63773ac09ceec31d5396d24570e4647ff096326b


[ROCm/clr commit: 2ce6bbebc4]
2020-05-20 11:13:41 -04:00
Jason Tang f94e958680 Add major/minor/stepping to device layer
Change-Id: If82ea55a46b166b243a98089a6e9c40ccfdb479f


[ROCm/clr commit: cd2a713d63]
2020-05-17 12:57:34 -04:00
Christophe Paquot 23e520003c Use system scope for packet following sdma copies
SWDEV-234947
SWDEV-236298
Instead of forcing a barrier packet, just inject system scope on the next packet.

Change-Id: If9bcee23e08dfe5db731235e2fcb30582cbd4c1c


[ROCm/clr commit: 6a5af4056e]
2020-05-15 12:20:06 -04:00
Christophe Paquot b6ecb9ce82 Add gpu().hasPendingDispatch() in the SDMA path
SWDEV-234947

Change-Id: I8aa501f8755d136708b0d12ee3c30229c238660d


[ROCm/clr commit: 2a02026696]
2020-05-08 18:19:51 -04:00
Michael LIAO b785d25506 Clear executable permission.
Change-Id: Ia0d363b1ba89d7947e5b5a55cb67edba86f0515e


[ROCm/clr commit: 503ef06555]
2020-05-07 10:38:58 -04:00
Alex Xie 1e75874295 SWDEV-234684 - hipmemcpy optimization does not work in tests
Change-Id: I899d172c5b2af88c796fe9a36f97d15ac45caf94


[ROCm/clr commit: bfbc8cd09b]
2020-05-05 15:58:03 -04:00
German Andryeyev 6610976c7d Optimize synch operations
- Stall the queue only for HSA copy operations

Change-Id: Ia3debcc0f36284c5f8cd2776d31674f3aeed04ea


[ROCm/clr commit: 7302ebcfbc]
2020-04-30 11:17:48 -04:00
Alex Xie a97194485a SWDEV-232894 Port hipMemcpy optimizations from HCC to VDI
Apply the optimization to OpenCL too.
Clean up some unnecessary checks.

Change-Id: I840261fe35baeeadeba7388e86779d482f509aad


[ROCm/clr commit: 6c5a42b33c]
2020-04-30 11:06:28 -04:00
Saleel Kudchadker 6b7c6748b1 Add a threshold for forcing ROCr to take blit path
This workaround is to avoid performance penalty of SDMA engine
taking a while to clock up from a lower DPM state. Add env var
GPU_FORCE_BLIT_COPY_SIZE (1024 by default for HIP in KB). Forcing
Src and Dst agent to be amdgpu makes ROCr take blit copy path for
what otherwise should have been SDMA copy

Change-Id: I222f687155f86000d17d66d25182e490b6710463


[ROCm/clr commit: 5f64e6e7ad]
2020-04-28 17:11:24 -04:00
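
The threshold described in the commit above could be modeled roughly as follows. Only the variable name `GPU_FORCE_BLIT_COPY_SIZE`, its unit (KB), and the default of 1024 come from the commit message. The function names are hypothetical, and the direction of the comparison (copies at or below the threshold take the blit path) is an assumption based on the stated goal of avoiding SDMA clock-up latency.

```cpp
#include <cstddef>
#include <cstdlib>

// Reads a size threshold in KB from the environment. Falls back to
// defaultKB when the variable is unset, empty, or has trailing
// non-numeric characters.
std::size_t thresholdKBFromEnv(const char* name, std::size_t defaultKB) {
  const char* value = std::getenv(name);
  if (value == nullptr || *value == '\0') return defaultKB;
  char* end = nullptr;
  unsigned long long parsed = std::strtoull(value, &end, 10);
  return (*end == '\0') ? static_cast<std::size_t>(parsed) : defaultKB;
}

// Assumption: copies at or below the threshold are forced onto the blit
// path, larger ones are left to the SDMA engine.
bool shouldForceBlit(std::size_t copyBytes, std::size_t thresholdKB) {
  return copyBytes <= thresholdKB * 1024;
}
```

With the default of 1024 KB, a 512 KB copy would be forced onto the blit path in this model, while a 2 MB copy would not.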