rocm-systems

Author	SHA1	Message	Date
Saleel Kudchadker	b2b8545eb6	SWDEV-260345 - Manage constant buffer for blit - Leverage managed buffer that would use chunks for fill pattern. Use a different chunk for the next fill to avoid wait Change-Id: I254483c867e112f66564ffd8f55e0a605d8896c9 [ROCm/clr commit: `175ad024d3`]	2022-07-12 12:41:02 -04:00
Saleel Kudchadker	7fd80925cd	SWDEV-335626 - Use ROCr copy for IPC Detect IPC buffer and use ROCr copy api instead of blit Change-Id: Ie6bdd6fc45dbd7457611011d81570b53d5fd5276 [ROCm/clr commit: `faaa41aab8`]	2022-07-08 13:32:19 -04:00
Ajay	9fcc7a7219	SWDEV-332522 - streamOpsWrite & streamOpsWait to accept memory offset Change-Id: I4b6ecb4d80c093d038d86616a637c4bb465ae24e [ROCm/clr commit: `d2f837d25f`]	2022-04-25 14:59:36 -04:00
Jason Tang	7bdbf61a9d	SWDEV-324411 - Use blit kernel for copyBufferRect if atomic is not supported Change-Id: I2e110fd3418117ee9c7ede379244d2c6c4f248b7 [ROCm/clr commit: `ed7737564e`]	2022-04-24 11:41:16 -04:00
kjayapra-amd	31c0525344	SWDEV-305527 - Changes to handle memset blit kernel that takes width, height and depth. This also fixes SWDEV-317261. Change-Id: Ic85f63a95d9d8f48884fc8c7fd95cbb496dfbbca [ROCm/clr commit: `7fb80a027a`]	2022-03-31 09:02:33 -04:00
Satyanvesh Dittakavi	acfa45bd5c	SWDEV-326397 - P2P copies to take SDMA path if there is no pending dispatch Change-Id: I50cfb8d77f7882151a20a1de7aaf5219b1695b7d [ROCm/clr commit: `c1b95b09bf`]	2022-03-29 14:59:11 +00:00
German Andryeyev	a7be0eb56a	SWDEV-316824 - Fix P2P compute copy path Use device memory object for the GPU VA address look-up. Change-Id: I76bf58b29205f7b3ba1bf68e9fcca69421267203 [ROCm/clr commit: `3fd4a67670`]	2022-02-15 13:20:13 -05:00
Satyanvesh Dittakavi	85c2cac111	SWDEV-306939 - Fix vdi errors/warnings by CppCheck Change-Id: I56d910f8363787f1050d5d7e8064ed553c5827fd [ROCm/clr commit: `e20dd61932`]	2022-01-12 00:22:16 -05:00
German Andryeyev	5ad02b78c4	SWDEV-305016 - Improve MGPU scaling in Tensorflow Add a threshold for ROCR/SDMA P2P transfers. ROCR copy path requires extra barriers in compute for synchronization. That costs extra performance with tiny transfers. Reduce active wait time to 10us. Tensorflow uses extra thread per GPU with constant hipEventQuery() calls. Longer active waits in ROCr affect CPU performance. Change-Id: I9020358438615fa2d4617f862f00a562f0a588e7 [ROCm/clr commit: `008133cf41`]	2021-12-08 11:59:37 -05:00
kjayapra-amd	f75cfb049a	SWDEV-312822 - Fix the globalWorkSize to number of sizeof(var) instead of bytes. Change-Id: Ic6b2bbb2e8d4cb6aa8d906d4b93cd06a176160d8 [ROCm/clr commit: `d4ad981c0c`]	2021-11-29 17:36:11 -05:00
kjayapra-amd	f74515778c	SWDEV-312822 - Revert "SWDEV-310187 - Change flag to keep track of aligned sizes instead of expanded patterns." This reverts commit `7220267211`. Change-Id: I022c2a8375f9929e9723cec66e1e0b960263fc39 [ROCm/clr commit: `2e9bc8f793`]	2021-11-28 23:39:40 -05:00
German Andryeyev	b0b0c3049f	SWDEV-313126 - Use data() method for the base array address Reference for the first element can trigger an assert with _GLIBCXX_ASSERTIONS build Change-Id: I59c63c052831307edfe5dcc6384798a43e9596dd [ROCm/clr commit: `6f2e7c3199`]	2021-11-26 09:51:57 -05:00
kjayapra-amd	7220267211	SWDEV-310187 - Change flag to keep track of aligned sizes instead of expanded patterns. Change-Id: I763feda8688bb1b7b11033a2a8cba0f69f07167d [ROCm/clr commit: `8307886644`]	2021-11-19 10:32:40 -05:00
Bing Ma	213b5dffd1	SWDEV-306602 - [SANITIZER_AMDGPU] Force copyBuffer to use ROCr functions when ASAN is ON Change-Id: I04a4cdd5ab8c5543f2a0f08c139c45ac7aebe64a [ROCm/clr commit: `02f939a40d`]	2021-10-14 12:55:27 -04:00
kjayapra-amd	6f62f832cb	SWDEV-232903 - Move hipmemset Dword optimization to ROCclr. Change-Id: I3eae61720cbc6364f1aaac4865bfd8b6ded08097 [ROCm/clr commit: `88ed58735d`]	2021-10-13 11:32:15 -04:00
Jason Tang	e4db6ef66a	SWDEV-306697 - Fix OCLGlobalOffset segfaults If we don't create the __amd_rocclr_gwsInit kernel, we still want to create the rest of the image related blit kernels. Change-Id: I8bc4645f9f9116eeecbb8b22e981ac4d520f3121 [ROCm/clr commit: `55a0cf0b0c`]	2021-10-12 15:13:28 -04:00
kjayapra-amd	cfb15c6c5d	SWDEV-294420 - Ignore Image blit kernels if image instructions are not supported. Change-Id: I145172672b0b032aa722649b0c4ca9267e3e5c85 [ROCm/clr commit: `7413b7f79b`]	2021-10-05 18:12:44 -04:00
Sourabh	936e0836a8	SWDEV-292525 - [vdi] Path to streamOps shaders Implementation to use a blit kernel to perform a hipStreamWait/write instead of an AQL packet. Change-Id: I462671ed5cec37144dfe97ff66439249196117c1 [ROCm/clr commit: `cbb8d82bdb`]	2021-09-27 13:59:35 -04:00
Saleel Kudchadker	36ec8c8871	SWDEV-297448 - Add 64bit and 16bit write support For the fillBuffer shader, if there are two 32bit writes to a MMIO register, it can get dropped. It has to be a single 64bit write. Add optimization to fillBuffer to write 64bit and 16bit writes. Change-Id: I3aa78e027898f8ae01e9c8f09004615673720c2b [ROCm/clr commit: `21ba34d0fe`]	2021-09-08 12:30:04 -04:00
Sarbojit Sarkar	45953e81dd	SWDEV-300655 - Added thread ID to hip trace Change-Id: I9234d4ec93e7687cd0a5d1bd930bd4f80936311b [ROCm/clr commit: `42d33029dc`]	2021-09-06 00:22:42 -04:00
agunashe	49f0546637	SWDEV-293742 - Update copyright end year VDI repo Change-Id: I69d2fea4a7a43adf96ccea794270e4af991c5261 [ROCm/clr commit: `d96481fb36`]	2021-08-22 23:56:07 -07:00
German Andryeyev	3e36acd579	SWDEV-278894 - Use GPU waits for HIP events Save HW events in amd::Event. Use HW events for synchronization Change-Id: I98cf9c2d0ec3c7fcaf254b749ac6c568d7270ae0 [ROCm/clr commit: `fa2e154a8b`]	2021-05-25 13:41:15 -04:00
Saleel Kudchadker	6c1f022834	SWDEV-280773 - Additional logging for signals Cleanup new lines in debug log Change-Id: I6862c332eb9457b51e23cf4e9db9ba3f870d0c39 [ROCm/clr commit: `42b8236f93`]	2021-04-30 15:05:57 -07:00
Saleel Kudchadker	6c304e4027	SWDEV-276120 - Remove support for barrier sync ROC_BARRIER_SYNC will not work with direct dispatch. Remove and cleanup. Change-Id: I81368b2e65039477bd0343bb92708dab48867db6 [ROCm/clr commit: `aa38af8c96`]	2021-04-07 17:08:39 -04:00
Ravi C Akkenapally	6629930067	SWDEV-179105 - Stream Operations: Add support for Wait and Write Change-Id: Ibffa1d6d573826b64763da280074a77271d66808 [ROCm/clr commit: `0a5f9a3b10`]	2021-02-15 17:02:38 -08:00
Payam	72b49f0800	SWDEV-257937 - Updated fix for ROC_BARRIER_SYNC=0 Change-Id: I7e28e541b654db57fb0890d7dbb7519cfb2d93db [ROCm/clr commit: `a2e0b0495c`]	2021-02-11 14:01:45 -05:00
Saleel Kudchadker	7a08212ce1	SWDEV-257787 - Add log for tracking copy signals Change-Id: I713e8463916a85a634a1ec2309bbd46a11c461a8 [ROCm/clr commit: `629a2d8ef3`]	2021-01-28 13:25:49 -05:00
German Andryeyev	f96e973378	SWDEV-257787 - Add engine tracking per signal - The logic will trace compute, sdma read/write operations and apply signals when necessary - ROC_CPU_WAIT_FOR_SIGNAL, ROC_SYSTEM_SCOPE_SIGNAL and ROC_SKIP_COPY_SYNC were added to control the tracking Change-Id: I9e8e6174c63bf7784f7ab00964e2918c8667d364 [ROCm/clr commit: `dbc7abaecf`]	2021-01-25 12:34:45 -05:00
German Andryeyev	1d26696235	SWDEV-257787 - Reset active signal if ROCR call failed - ROCR fails the call for some reason, then the signal will become invalid and can hang on a wait. The logic will reset the active signal in such cases Change-Id: Ia131420200f1bbd7c9a162b8f1b06db8cecf41c6 [ROCm/clr commit: `ce2e5eba6b`]	2021-01-21 17:29:34 -05:00
German Andryeyev	1086195745	SWDEV-268381 - Enable wait on CPU before SDMA transfer - There is a performance regression with a HW wait for HSA signal on ROCr async operation. For now move the logic back to CPU wait. - Fix profiling issue with multiple HSA signal per single timestamp object. Some copies require multiple ROCR calls and if profiling is required, then the execution time is derived from all used signals. Change-Id: Id003e4abb8c2de378eedc152a7e389500fc6f4ce [ROCm/clr commit: `5a8946190a`]	2021-01-19 18:24:21 -05:00
Tony Tye	902cf1a239	Update code object handling for GSL, PAL and ROCm - Correct GSL path to report targets using the TargetID syntax. - Correct GSL path to check compatibility of code objects when loading. - Add concept of a device ISA and create a registry used by ROCm, PAL and GSL. - Support XNACK and SRAMECC target features consistently for PAL and ROCm. - Correct logic for NullDevices and asserts to avoid memory corruption. - Allow all NullDevices to be created for HIP. - Numerous other code improvements. Change-Id: I40abf3d2b22249c1492d1af5919665f8184f4e0e [ROCm/clr commit: `c7e8d91e14`]	2021-01-14 11:11:51 -05:00
German Andryeyev	30cf81fc93	Add HSA signal global tracking logic. Implement the global class for signals tracking per device queue. Switch to the new tracking mechanism. Change-Id: I3c4dda04b34e6d18d6a95510d84102909633b415 [ROCm/clr commit: `8698aeef0d`]	2021-01-08 12:57:33 -05:00
German Andryeyev	5bc740fc5e	Update comments in the code Make sure the comments in the code match the actual behavior. HDP read has internal HDP read cache and doesn't use L2. Change-Id: I667a4643b0e0d6529008f5e1a0a3269456c55b4e [ROCm/clr commit: `d524514f6a`]	2020-12-17 09:43:23 -05:00
Payam	53d3c09599	SWDEV-257937 - ROC_BARRIER_SYNC fix for missing SDMA flush Change-Id: I93e8902bfcb16bac8ea594e16ea397b1ceafbd79 [ROCm/clr commit: `f134b90199`]	2020-12-15 00:54:33 -05:00
German Andryeyev	f7cf40fc02	Add L2 flush/invalidate after CPU copy CPU read updates L2 with the latest values and requires invalidation after, because SDMA doesn't use L2 and data can become out of sync. Change-Id: I98d1c91ca78a103fa5409e638f97485d62d5b11e [ROCm/clr commit: `18a821acde`]	2020-12-11 23:05:49 -05:00
German Andryeyev	7df8e0bcb3	Correct reported info in ROC profiler OCL can't distinguish different copy types, but ROC profiler expects SDMA transfer visibility. Add extra code to detect a transfer with the host memory and substitute OCL command Change-Id: I5290acd0e10bc082e00c1d4ae1474a075de7f165 [ROCm/clr commit: `bd340d8cbf`]	2020-10-23 18:29:48 -04:00
German Andryeyev	2c21a44b40	Add option to skip AQL barrier The change reuses HSA signals for dispatches as a wait signal. Skipping the barrier requires to disable L2 cache for sysmem allocations and extra tracking for HDP access with the large bar. ROC_BARRIER_SYNC=0 activates the new logic. Barrier sync is still used by default. ROC_ACTIVE_WAIT=1 enables unconditional active wait in ROCr. The change also consolidated ROCr wait logic under single function. Change-Id: I6bd1be30aa88258da1b1f9de319ef5a45852afd8 [ROCm/clr commit: `d9397590de`]	2020-10-06 08:37:12 -04:00
Alex Xie	e0cb881d91	SWDEV-249516 - [Lnx][Navi][rocm]conformance image read write tests data error Change-Id: Ie1c4fda953198b49ed66fea9da23e62c686d9cea [ROCm/clr commit: `7e8f7b5927`]	2020-09-01 17:20:58 -04:00
Tao Sang	44eb207f8d	Apply constexpr on global constant variables When HIP_ENABLE_DEFERRED_LOADING=0, many global variables will be referenced but they are not initialized at that early time. The patch will use constexpr to initialize global constant variables at compile time. Change-Id: I9d538b7abc6a0ce700ec3332b97fc144db5fc1ef [ROCm/clr commit: `fdef6f722f`]	2020-07-22 22:14:13 -04:00
Jatin Chaudhary	260a83c546	Replacing deprecated HSA API calls with newer ones Change-Id: Iebe2c00e717ab0e47c61611752b717966c719994 [ROCm/clr commit: `cd1e364911`]	2020-07-08 00:32:24 -04:00
Vlad Sytchenko	0b3fc7bc5d	Fix some -Wunused-but-set-variable warnings Change-Id: I281583b5abdfc09d5dd8b7dfb20b8821581db193 [ROCm/clr commit: `5b9af8f28d`]	2020-06-15 17:51:01 -04:00
German Andryeyev	e20f40119c	Fix async mem clear Optimization for the fence release removed a sync for mem fill. Add simple const buffer management for the filled pattern to avoid pattern overwriting with the async fills. Change-Id: I63773ac09ceec31d5396d24570e4647ff096326b [ROCm/clr commit: `2ce6bbebc4`]	2020-05-20 11:13:41 -04:00
Jason Tang	f94e958680	Add major/minor/stepping to device layer Change-Id: If82ea55a46b166b243a98089a6e9c40ccfdb479f [ROCm/clr commit: `cd2a713d63`]	2020-05-17 12:57:34 -04:00
Christophe Paquot	23e520003c	Use system scope for packet following sdma copies SWDEV-234947 SWDEV-236298 Instead of forcing a barrier packet, just inject system scope on the next packet. Change-Id: If9bcee23e08dfe5db731235e2fcb30582cbd4c1c [ROCm/clr commit: `6a5af4056e`]	2020-05-15 12:20:06 -04:00
Christophe Paquot	b6ecb9ce82	Add gpu().hasPendingDispatch() in the SDMA path SWDEV-234947 Change-Id: I8aa501f8755d136708b0d12ee3c30229c238660d [ROCm/clr commit: `2a02026696`]	2020-05-08 18:19:51 -04:00
Michael LIAO	b785d25506	Clear executable permission. Change-Id: Ia0d363b1ba89d7947e5b5a55cb67edba86f0515e [ROCm/clr commit: `503ef06555`]	2020-05-07 10:38:58 -04:00
Alex Xie	1e75874295	SWDEV-234684 - hipmemcpy optimization does not work in tests Change-Id: I899d172c5b2af88c796fe9a36f97d15ac45caf94 [ROCm/clr commit: `bfbc8cd09b`]	2020-05-05 15:58:03 -04:00
German Andryeyev	6610976c7d	Optimize synch operations - Stall the queue only for HSA copy operations Change-Id: Ia3debcc0f36284c5f8cd2776d31674f3aeed04ea [ROCm/clr commit: `7302ebcfbc`]	2020-04-30 11:17:48 -04:00
Alex Xie	a97194485a	SWDEV-232894 Port hipMemcpy optimizations from HCC to VDI Apply the optimization to change for OpenCL too. Clean up some unnecessary checks. Change-Id: I840261fe35baeeadeba7388e86779d482f509aad [ROCm/clr commit: `6c5a42b33c`]	2020-04-30 11:06:28 -04:00
Saleel Kudchadker	6b7c6748b1	Add a threshold for forcing ROCr to take blit path This workaround is to avoid performance penalty of SDMA engine taking a while to clock up from a lower DPM state. Add env var GPU_FORCE_BLIT_COPY_SIZE (1024 by default for HIP in KB). Forcing Src and Dst agent to be amdgpu makes ROCr take blit copy path for what otherwise should have been SDMA copy Change-Id: I222f687155f86000d17d66d25182e490b6710463 [ROCm/clr commit: `5f64e6e7ad`]	2020-04-28 17:11:24 -04:00

57 Commits