66 Commits

Author SHA1 Message Date
jatang b798c85272 SWDEV-380792 - Fix floating point exception when maxEngineClockFrequency_ is 0
Change-Id: Ic443ceae586c4c84995ed2abef9bd7f32f8b60f9
2023-02-07 11:43:10 -05:00
German Andryeyev b23c759746 SWDEV-372790 - Copy AQL packet from runtime setup
Scheduler in device queue requires relaunching itself. Make sure
scheduler uses exactly the same AQL packet as the host launch.

Change-Id: I4eb03c4c91bf2408a6d4607731f081a2e2c2c8ae
2023-01-24 10:25:45 -05:00
Jaydeep Patel 1e4a4162ff SWDEV-378157 - Correct log message
Change-Id: I6297693f67ae78a8874b976ac03353a81b728b1d
2023-01-23 12:06:18 -05:00
Saleel Kudchadker 033d4c0463 SWDEV-345213 - Fix staged line-by-line copy path
- Address an old bug in offset calculation that was causing out-of-bounds
access.
- Improve logging

Change-Id: Iebdf34dddaa5e987cc72184a2152918adc6a96e0
2023-01-16 11:04:30 -05:00
Anusha GodavarthySurya 274f2de391 SWDEV-364576 - initialize device malloc heap state using blit kernel
Change-Id: I5d0172aff7d2c04b322a4d828b8a2b438158b80f
2023-01-07 06:53:53 +00:00
Jaydeep Patel 070ae4e6d4 SWDEV-374370 - Propagate element size to blit kernel.
Change-Id: I06d1ae6feebd238e9a63c617eb4c4dcf485d9ee0
2022-12-26 09:33:50 +00:00
Saleel Kudchadker e0384f9f6b SWDEV-373334 - Use copyMetadata for blit decisions
- Check isAsync flag for small host copies on large bar as it synchronizes
- Use CopyEngine Preference hint if HMM is enabled.

Change-Id: I1ffc4b2604ed03cf5979cdc454178648c5ae5cba
2022-12-15 17:09:02 -05:00
Ioannis Assiouras 72b45e2a1f SWDEV-369581 - Convey copy API metadata to ROCclr
Change-Id: I569462d6d268700d419510255e201bf7d80d6714
2022-12-09 00:27:15 -05:00
Saleel Kudchadker feca11d5e3 SWDEV-301667 - Improve logging
Change-Id: Ifa6da876b85cb503967cf09aac6d477b10db8e63
2022-11-04 18:23:18 -04:00
Saleel Kudchadker 175ad024d3 SWDEV-260345 - Manage constant buffer for blit
- Leverage managed buffer that would use chunks for fill pattern. Use a
different chunk for the next fill to avoid wait

Change-Id: I254483c867e112f66564ffd8f55e0a605d8896c9
2022-07-12 12:41:02 -04:00
Saleel Kudchadker faaa41aab8 SWDEV-335626 - Use ROCr copy for IPC
Detect IPC buffer and use ROCr copy api instead of blit

Change-Id: Ie6bdd6fc45dbd7457611011d81570b53d5fd5276
2022-07-08 13:32:19 -04:00
Ajay d2f837d25f SWDEV-332522 - streamOpsWrite & streamOpsWait to accept memory offset
Change-Id: I4b6ecb4d80c093d038d86616a637c4bb465ae24e
2022-04-25 14:59:36 -04:00
Jason Tang ed7737564e SWDEV-324411 - Use blit kernel for copyBufferRect if atomic is not supported
Change-Id: I2e110fd3418117ee9c7ede379244d2c6c4f248b7
2022-04-24 11:41:16 -04:00
kjayapra-amd 7fb80a027a SWDEV-305527 - Changes to handle memset blit kernel that takes width, height and depth. This also fixes SWDEV-317261.
Change-Id: Ic85f63a95d9d8f48884fc8c7fd95cbb496dfbbca
2022-03-31 09:02:33 -04:00
Satyanvesh Dittakavi c1b95b09bf SWDEV-326397 - P2P copies to take SDMA path if there is no pending dispatch
Change-Id: I50cfb8d77f7882151a20a1de7aaf5219b1695b7d
2022-03-29 14:59:11 +00:00
German Andryeyev 3fd4a67670 SWDEV-316824 - Fix P2P compute copy path
Use device memory object for the GPU VA address look-up.

Change-Id: I76bf58b29205f7b3ba1bf68e9fcca69421267203
2022-02-15 13:20:13 -05:00
Satyanvesh Dittakavi e20dd61932 SWDEV-306939 - Fix vdi errors/warnings by CppCheck
Change-Id: I56d910f8363787f1050d5d7e8064ed553c5827fd
2022-01-12 00:22:16 -05:00
German Andryeyev 008133cf41 SWDEV-305016 - Improve MGPU scaling in Tensorflow
Add a threshold for ROCR/SDMA P2P transfers. ROCR copy path
requires extra barriers in compute for synchronization. That costs
extra performance with tiny transfers.
Reduce active wait time to 10us. Tensorflow uses extra thread
per GPU with constant hipEventQuery() calls. Longer active waits
in ROCr affect CPU performance.

Change-Id: I9020358438615fa2d4617f862f00a562f0a588e7
2021-12-08 11:59:37 -05:00
kjayapra-amd d4ad981c0c SWDEV-312822 - Fix the globalWorkSize to number of sizeof(var) instead of bytes.
Change-Id: Ic6b2bbb2e8d4cb6aa8d906d4b93cd06a176160d8
2021-11-29 17:36:11 -05:00
kjayapra-amd 2e9bc8f793 SWDEV-312822 - Revert "SWDEV-310187 - Change flag to keep track of aligned sizes instead of expanded patterns."
This reverts commit 8307886644.

Change-Id: I022c2a8375f9929e9723cec66e1e0b960263fc39
2021-11-28 23:39:40 -05:00
German Andryeyev 6f2e7c3199 SWDEV-313126 - Use data() method for the base array address
Reference for the first element can trigger an assert with
_GLIBCXX_ASSERTIONS build

Change-Id: I59c63c052831307edfe5dcc6384798a43e9596dd
2021-11-26 09:51:57 -05:00
kjayapra-amd 8307886644 SWDEV-310187 - Change flag to keep track of aligned sizes instead of expanded patterns.
Change-Id: I763feda8688bb1b7b11033a2a8cba0f69f07167d
2021-11-19 10:32:40 -05:00
Bing Ma 02f939a40d SWDEV-306602 - [SANITIZER_AMDGPU] Force copyBuffer to use ROCr functions when ASAN is ON
Change-Id: I04a4cdd5ab8c5543f2a0f08c139c45ac7aebe64a
2021-10-14 12:55:27 -04:00
kjayapra-amd 88ed58735d SWDEV-232903 - Move hipmemset Dword optimization to ROCclr.
Change-Id: I3eae61720cbc6364f1aaac4865bfd8b6ded08097
2021-10-13 11:32:15 -04:00
Jason Tang 55a0cf0b0c SWDEV-306697 - Fix OCLGlobalOffset segfaults
If we don't create the __amd_rocclr_gwsInit kernel, we still want
to create the rest of the image related blit kernels.

Change-Id: I8bc4645f9f9116eeecbb8b22e981ac4d520f3121
2021-10-12 15:13:28 -04:00
kjayapra-amd 7413b7f79b SWDEV-294420 - Ignore Image blit kernels if image instructions are not supported.
Change-Id: I145172672b0b032aa722649b0c4ca9267e3e5c85
2021-10-05 18:12:44 -04:00
Sourabh cbb8d82bdb SWDEV-292525 - [vdi] Path to streamOps shaders
Implementation to use a blit kernel to perform
a hipStreamWait/write instead of an AQL packet.

Change-Id: I462671ed5cec37144dfe97ff66439249196117c1
2021-09-27 13:59:35 -04:00
Saleel Kudchadker 21ba34d0fe SWDEV-297448 - Add 64bit and 16bit write support
For the fillBuffer shader, if there are two 32bit writes to a MMIO
register, it can get dropped. It has to be a single 64bit write.
Add optimization to fillBuffer to write 64bit and 16bit writes.

Change-Id: I3aa78e027898f8ae01e9c8f09004615673720c2b
2021-09-08 12:30:04 -04:00
Sarbojit Sarkar 42d33029dc SWDEV-300655 - Added thread ID to hip trace
Change-Id: I9234d4ec93e7687cd0a5d1bd930bd4f80936311b
2021-09-06 00:22:42 -04:00
agunashe d96481fb36 SWDEV-293742 - Update copyright end year VDI repo
Change-Id: I69d2fea4a7a43adf96ccea794270e4af991c5261
2021-08-22 23:56:07 -07:00
German Andryeyev fa2e154a8b SWDEV-278894 - Use GPU waits for HIP events
Save HW events in amd::Event.
Use HW events for synchronization

Change-Id: I98cf9c2d0ec3c7fcaf254b749ac6c568d7270ae0
2021-05-25 13:41:15 -04:00
Saleel Kudchadker 42b8236f93 SWDEV-280773 - Additional logging for signals
Cleanup new lines in debug log

Change-Id: I6862c332eb9457b51e23cf4e9db9ba3f870d0c39
2021-04-30 15:05:57 -07:00
Saleel Kudchadker aa38af8c96 SWDEV-276120 - Remove support for barrier sync
ROC_BARRIER_SYNC will not work with direct dispatch.
Remove and cleanup.

Change-Id: I81368b2e65039477bd0343bb92708dab48867db6
2021-04-07 17:08:39 -04:00
Ravi C Akkenapally 0a5f9a3b10 SWDEV-179105 - Stream Operations: Add support for Wait and Write
Change-Id: Ibffa1d6d573826b64763da280074a77271d66808
2021-02-15 17:02:38 -08:00
Payam a2e0b0495c SWDEV-257937 - Updated fix for ROC_BARRIER_SYNC=0
Change-Id: I7e28e541b654db57fb0890d7dbb7519cfb2d93db
2021-02-11 14:01:45 -05:00
Saleel Kudchadker 629a2d8ef3 SWDEV-257787 - Add log for tracking copy signals
Change-Id: I713e8463916a85a634a1ec2309bbd46a11c461a8
2021-01-28 13:25:49 -05:00
German Andryeyev dbc7abaecf SWDEV-257787 - Add engine tracking per signal
- The logic will trace compute, sdma read/write operations and
apply signals when necessary
- ROC_CPU_WAIT_FOR_SIGNAL, ROC_SYSTEM_SCOPE_SIGNAL
and ROC_SKIP_COPY_SYNC were added to control the tracking

Change-Id: I9e8e6174c63bf7784f7ab00964e2918c8667d364
2021-01-25 12:34:45 -05:00
German Andryeyev ce2e5eba6b SWDEV-257787 - Reset active signal if ROCR call failed
- If ROCR fails the call for some reason, the signal will
become invalid and can hang on a wait. The logic will reset the
active signal in such cases

Change-Id: Ia131420200f1bbd7c9a162b8f1b06db8cecf41c6
2021-01-21 17:29:34 -05:00
German Andryeyev 5a8946190a SWDEV-268381 - Enable wait on CPU before SDMA transfer
- There is a performance regression with a HW wait for HSA signal
on ROCr async operation. For now move the logic back to CPU wait.

- Fix profiling issue with multiple HSA signal per single timestamp
object. Some copies require multiple ROCR calls and if profiling is
required, then the execution time is derived from all used signals.

Change-Id: Id003e4abb8c2de378eedc152a7e389500fc6f4ce
2021-01-19 18:24:21 -05:00
Tony Tye c7e8d91e14 Update code object handling for GSL, PAL and ROCm
- Correct GSL path to report targets using the TargetID syntax.

- Correct GSL path to check compatibility of code objects when
  loading.

- Add the concept of a device ISA and create a registry used by ROCm,
  PAL and GSL.

- Support XNACK and SRAMECC target features consistently for PAL and ROCm.

- Correct logic for NullDevices and asserts to avoid memory corruption.

- Allow all NullDevices to be created for HIP.

- Numerous other code improvements.

Change-Id: I40abf3d2b22249c1492d1af5919665f8184f4e0e
2021-01-14 11:11:51 -05:00
German Andryeyev 8698aeef0d Add HSA signal global tracking logic.
Implement the global class for signals tracking per device queue.
Switch to the new tracking mechanism.

Change-Id: I3c4dda04b34e6d18d6a95510d84102909633b415
2021-01-08 12:57:33 -05:00
German Andryeyev d524514f6a Update comments in the code
Make sure the comments in the code match the actual behavior.
HDP read has internal HDP read cache and doesn't use L2.

Change-Id: I667a4643b0e0d6529008f5e1a0a3269456c55b4e
2020-12-17 09:43:23 -05:00
Payam f134b90199 SWDEV-257937 - ROC_BARRIER_SYNC fix for missing SDMA flush
Change-Id: I93e8902bfcb16bac8ea594e16ea397b1ceafbd79
2020-12-15 00:54:33 -05:00
German Andryeyev 18a821acde Add L2 flush/invalidate after CPU copy
CPU read updates L2 with the latest values and requires
invalidation after, because SDMA doesn't use L2 and data can become
out of sync.

Change-Id: I98d1c91ca78a103fa5409e638f97485d62d5b11e
2020-12-11 23:05:49 -05:00
German Andryeyev bd340d8cbf Correct reported info in ROC profiler
OCL can't distinguish different copy types, but ROC profiler
expects SDMA transfer visibility. Add extra code to detect
a transfer with the host memory and substitute OCL command

Change-Id: I5290acd0e10bc082e00c1d4ae1474a075de7f165
2020-10-23 18:29:48 -04:00
German Andryeyev d9397590de Add option to skip AQL barrier
The change reuses HSA signals for dispatches as a wait signal.
Skipping the barrier requires disabling the L2 cache for sysmem
allocations and extra tracking for HDP access with the large bar.
ROC_BARRIER_SYNC=0 activates the new logic. Barrier sync is
still used by default.
ROC_ACTIVE_WAIT=1 enables unconditional active wait in ROCr.
The change also consolidated ROCr wait logic under single function.

Change-Id: I6bd1be30aa88258da1b1f9de319ef5a45852afd8
2020-10-06 08:37:12 -04:00
Alex Xie 7e8f7b5927 SWDEV-249516 - [Lnx][Navi][rocm]conformance image read write tests data error
Change-Id: Ie1c4fda953198b49ed66fea9da23e62c686d9cea
2020-09-01 17:20:58 -04:00
Tao Sang fdef6f722f Apply constexpr on global constant variables
When HIP_ENABLE_DEFERRED_LOADING=0, many global variables will be
referenced before they are initialized. The patch
uses constexpr to initialize global constant variables at compile
time.

Change-Id: I9d538b7abc6a0ce700ec3332b97fc144db5fc1ef
2020-07-22 22:14:13 -04:00
Jatin Chaudhary cd1e364911 Replacing deprecated HSA API calls with newer ones
Change-Id: Iebe2c00e717ab0e47c61611752b717966c719994
2020-07-08 00:32:24 -04:00
Vlad Sytchenko 5b9af8f28d Fix some -Wunused-but-set-variable warnings
Change-Id: I281583b5abdfc09d5dd8b7dfb20b8821581db193
2020-06-15 17:51:01 -04:00