Commit Graph

135 Commits

Author SHA1 Message Date
German 5ed568998f SWDEV-349794 - Fix time accumulation
If the execution command was split into multiple HW operations, the runtime has to accumulate the time for all of them.

Change-Id: Iaba31e96250918d8190bf63adb4c07730fdfefbf


[ROCm/clr commit: 24f5362296]
2022-08-24 09:53:54 -04:00
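The accumulation described in this commit can be sketched as follows. This is a minimal illustration, not the runtime's profiling code: it assumes each HW operation reports a (start, end) timestamp pair in nanoseconds, and the function name and tuple layout are hypothetical.

```python
def accumulated_time_ns(operations):
    """Sum the durations of all HW operations a command was split into.

    `operations` is a list of (start_ns, end_ns) pairs. This layout is an
    assumption made for illustration only.
    """
    return sum(end - start for start, end in operations)


# A command split into three HW operations with idle gaps between them.
ops = [(100, 150), (200, 260), (300, 310)]
total = accumulated_time_ns(ops)
```

Summing per-operation durations excludes the gaps between operations, whereas taking only one operation's timestamps would under-report the command.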
Maneesh Gupta 92f6e1a0d2 SWDEV-350289 - Fix build warnings due to file re-org
Change-Id: I0066fa163b9f25fdde4c5b3baed1ef0654390c06


[ROCm/clr commit: 289062682a]
2022-08-10 03:05:56 -04:00
Sarbojit Sarkar a0981a092b SWDEV-343921 - added Max stack size
Change-Id: I5c1a088e05215ca951afc9d92f8d298c5e3a65f1


[ROCm/clr commit: 27a08a132f]
2022-08-02 07:13:18 -04:00
German Andryeyev 110e3e68a0 SWDEV-340703 - Use different status value for the callback event
Change-Id: Ida725df53abfbf348b18e24c19edf011dc9192dd


[ROCm/clr commit: 6844b8c7e0]
2022-06-30 11:03:02 -04:00
Saleel Kudchadker b3ad41f6e4 SWDEV-335780 - Indicate if handler is queued
Maintain status of handler callback. For event records we no longer
submit callbacks to reduce the load on the async handler thread. However
without a callback we leak command memory/decrement refcounts. Indicate
status of the handler which we can use to queue a callback when
finish is called.

Change-Id: I89fd02f3d047a0e8162664ee17581a14795f1928


[ROCm/clr commit: 5df34a2f7a]
2022-06-14 20:55:06 -04:00
German Andryeyev acf2856677 SWDEV-339296 - Delay hidden heap allocation till the usage
Move hidden heap creation to the kernel launch to make sure it's
allocated on the actual first usage.

Change-Id: I1b65a82fc06d9129ed45a69765bf14ea3d945b04


[ROCm/clr commit: 4975f69337]
2022-06-14 12:18:34 -04:00
Sarbojit Sarkar ee5bcf6444 SWDEV-331066 - support for LimitStackSize
Change-Id: Ie6ae74f008b4f72de83663194aafb0ebdddfc8b6


[ROCm/clr commit: 51a00aeefe]
2022-05-19 00:24:06 -04:00
kjayapra-amd ae0b32126b SWDEV-331355 - Fixing the surface object on fillMemory function call.
Change-Id: Ieaa359ea8f31b0251d54b720469cdefde202579f


[ROCm/clr commit: 643ee46f28]
2022-05-04 14:24:03 -04:00
Saleel Kudchadker d9c2aee526 SWDEV-334152 - Set release as systemscope
Set release scope as system for dispatch AQL when events are passed to
hip*LaunchKernelGGL*

Change-Id: I93b91591e0ab023f1ecc5247f7905eca26147358


[ROCm/clr commit: 02566677cf]
2022-04-29 13:19:29 -04:00
German Andryeyev d5bc650de9 SWDEV-307184 - Fix a regression from dafc64ea
Disable hostcall buffer in OCL for now. COv5 can add hostcallbuffer
metadata for an unknown reason. OCL may fail the buffer allocation
and kernel launch.

Change-Id: I34a6a45bac86c57422b764c0d69760c96920d6c5


[ROCm/clr commit: 934149ff0a]
2022-04-28 11:57:48 -04:00
Ajay 9fcc7a7219 SWDEV-332522 - streamOpsWrite & streamOpsWait to accept memory offset
Change-Id: I4b6ecb4d80c093d038d86616a637c4bb465ae24e


[ROCm/clr commit: d2f837d25f]
2022-04-25 14:59:36 -04:00
Jason Tang 7bdbf61a9d SWDEV-324411 - Use blit kernel for copyBufferRect if atomic is not supported
Change-Id: I2e110fd3418117ee9c7ede379244d2c6c4f248b7


[ROCm/clr commit: ed7737564e]
2022-04-24 11:41:16 -04:00
sdashmiz dafc64ea0a SWDEV-204804 - Detecting PCIe atomic support
- Check PCIe atomic support for printf functionality
- If it is not enabled, printf won't work

Signed-off-by: sdashmiz <shadi.dashmiz@amd.com>
Change-Id: Ib366e8e71772b02210c4a830bca4bd8cc7a11664


[ROCm/clr commit: 15f1632dfa]
2022-04-22 08:53:16 -04:00
Julia Jiang 1320312a62 SWDEV-330164 - Fix in conformance svm_enqueue_api crash
Change-Id: I12eca6ca3e8d722b7534047fca79b289604aa2b0


[ROCm/clr commit: b1611e0123]
2022-04-20 13:20:18 -04:00
Saleel Kudchadker b306843e26 SWDEV-332512 - Signal pool changes
Create a new signal if the next set of signals are busy

Change-Id: I5108e68c88fe41e3a45bad4495ebdf3742e76dcd


[ROCm/clr commit: 9ec8a7306d]
2022-04-18 15:58:38 -04:00
Saleel Kudchadker cad3dfe4ec SWDEV-301667 - Separate scope from marker_ts_
Change-Id: I19f4d394e898bfb8c9d9a2c2edf9d5bf5def3b08


[ROCm/clr commit: b6cbfaf499]
2022-04-16 19:26:31 -04:00
German Andryeyev 4b4137ae63 SWDEV-332512 - Add ROC_SIGNAL_POOL_SIZE
Default value is 32 HSA signals in the pool.

Change-Id: Icb69413d3ff6ef228d9a9e22fd024e72c6d8ebe4


[ROCm/clr commit: 7975a07112]
2022-04-14 17:32:00 -04:00
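A setting such as ROC_SIGNAL_POOL_SIZE with a default of 32 can be illustrated with a small helper that reads an integer from the environment. This is only a sketch: the helper name, the `env` parameter, and the fallback-on-invalid-input behavior are assumptions, and the runtime uses its own flag mechanism.

```python
import os


def read_int_setting(name, default, env=None):
    """Read an integer setting from the environment, or return `default`.

    `env` lets callers pass a plain dict instead of os.environ. Invalid
    values fall back to the default (an assumption of this sketch).
    """
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None:
        return default
    try:
        return int(raw)
    except ValueError:
        return default


# 32 is the default pool size stated in the commit message.
pool_size = read_int_setting("ROC_SIGNAL_POOL_SIZE", 32, env={})
```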
Saleel Kudchadker 3d0100c5ab SWDEV-301667 - Add cache state for a device
- Add a global cache state for a device to indicate scopes of submitted
AQL packets
- Remove scopes for TS marker if hipEventReleaseToDevice is passed. Set
env ROC_EVENT_NO_FLUSH=1 to use NOP AQL for event records.
By default, caches are flushed with a system-scope release.
- Calling finish() should check whether caches are flushed and, if not,
queue a marker

Change-Id: Ibbbdbb1cd7ac61cb35649169212142545be159e0


[ROCm/clr commit: 8eeaa998c0]
2022-04-12 12:27:31 -04:00
Maxime Chambonnet 38928e85c1 SWDEV-1 - ROC CLR typos
This is cherry-picked from this github issue:
https://github.com/ROCm-Developer-Tools/ROCclr/issues/28

Change-Id: I236f4f25a2dabe05883159af0fab0bad06ab0fd0


[ROCm/clr commit: d45794e985]
2022-04-11 14:24:39 -04:00
German Andryeyev 4715a87d44 SWDEV-307184 - Report 1 for unused dimensions
Remove assert for kernel arg size, because COv5 reports a value
bigger than the actual usage in most cases

Change-Id: I8e15bc45a9e21b58a5894f9977511ca84408ce61


[ROCm/clr commit: 2be0b1e612]
2022-04-08 13:43:37 -04:00
kjayapra-amd ba0119e933 SWDEV-331104 - Size passed to fillBuffer should not be 0.
Change-Id: Ifbc6047fafa0e55b5ab956cf3b7254c7e20b1e88


[ROCm/clr commit: b3b88ef926]
2022-04-08 09:29:55 -04:00
German Andryeyev e09245ceae SWDEV-307184 - Move local size calculation
With COv5 local size calculation must occur before
runtime programs kernel arguments

Change-Id: I0726c6529bde69b8fcf5360aa83986cf84e04168


[ROCm/clr commit: caa6110c29]
2022-04-05 11:19:51 -04:00
kjayapra-amd 2ab9ef0915 SWDEV-325776 - Adding device release scope for kernel dispatch packet
Change-Id: I8ea763f4c0239c410143b748c05822e9f6694412
(cherry picked from commit ec4894f8a27a3330b895a0ded385ab96f5ef242d)


[ROCm/clr commit: 378a427d8c]
2022-04-01 08:17:29 -04:00
kjayapra-amd 31c0525344 SWDEV-305527 - Changes to handle memset blit kernel that takes width, height and depth. This also fixes SWDEV-317261.
Change-Id: Ic85f63a95d9d8f48884fc8c7fd95cbb496dfbbca


[ROCm/clr commit: 7fb80a027a]
2022-03-31 09:02:33 -04:00
Saleel Kudchadker f99304adcd SWDEV-322225 - Use numa_allocate_bitmask
- Fix a crash with AMD_CPU_AFFINITY=1, as numa_bitmask_alloc isn't the
right API to allocate a bitmask
- Do not set affinity for the ROCr thread. It worsens performance rather
than improving it.
- Fix regression from my previous change for event handler.

Change-Id: I3ea75adc2a6333f29752283eddd5b555e9b58cc5


[ROCm/clr commit: 802c2c8a9f]
2022-03-26 13:24:51 -04:00
Saleel Kudchadker 4dbec887a2 SWDEV-301667 - Selectively queue handler
- Queue handler for hipEventRecord(aka marker_ts_) only if there is a
callback associated with it.

Change-Id: I8a9877ae0e342556053abbaacc9510744a8e772a


[ROCm/clr commit: 3c3c0ca4c5]
2022-03-24 19:46:28 -04:00
German Andryeyev 7d5ed33e8f SWDEV-307185 - Create heap for device memory allocator
Pass the allocated heap with the kernel arguments

Change-Id: Icdec09b7f937845c39e21cbca7071dc3ba791af9


[ROCm/clr commit: 7b114a2b8b]
2022-03-04 00:44:41 -05:00
German Andryeyev c52280ae72 SWDEV-323702 - Use active queue for transfer
Pass the active queue for transfers in the cache coherency layer.
That allows the device transfer queue to be used only when
the active queue isn't available, because using the device
transfer queue from another active queue may cause a deadlock.

Change-Id: Ifbe7e0303b77dbf6eeda3939ffbc25a3df7472de


[ROCm/clr commit: 95d55fdfa8]
2022-02-18 09:10:53 -05:00
German Andryeyev 7c8a7ddf5e SWDEV-323364 - Fix a typo
Change-Id: I2031296ab9451342d5930b8b2d3d2e6277946647


[ROCm/clr commit: fbf531398a]
2022-02-17 20:50:29 -05:00
Saleel Kudchadker de9a5438b9 SWDEV-322605 - Fix infinite loop condition
If the reported GlobalMemCacheLine is 0, the runtime may run into an
infinite loop because KernelSegmentAlignment is chosen as the size of the
cache line.

Change-Id: Ide547940cc0407f16fab10ee210b4fd3ae4eaafc


[ROCm/clr commit: 041ddc0c1c]
2022-02-16 13:16:18 -05:00
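The failure mode in this commit comes from stepping through memory by an alignment of 0, which never advances. One way to guard against it is sketched below. The fallback value of 64 bytes and both function names are assumptions for illustration; the commit message does not say how the runtime handles the zero case.

```python
def safe_alignment(cache_line_size, fallback=64):
    """Return a usable alignment; fall back when the reported size is 0.

    The fallback of 64 is a hypothetical choice for this sketch.
    """
    return cache_line_size if cache_line_size > 0 else fallback


def aligned_offsets(total_size, cache_line_size):
    """List the aligned offsets inside a buffer of `total_size` bytes.

    Stepping by an alignment of 0 would loop forever, so the alignment is
    validated before the loop starts.
    """
    step = safe_alignment(cache_line_size)
    offsets = []
    offset = 0
    while offset < total_size:
        offsets.append(offset)
        offset += step
    return offsets
```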
German Andryeyev bd96ef9a34 SWDEV-307184 - Add support for the new metadata
Metadata in Codeobject version 5 is the extension of CO3 and CO4.
Add the detection of the new fields and program them in
the setup of the kernel arguments.

Change-Id: I27e58df77320ad00f4f16d35912668db803826af


[ROCm/clr commit: be6a06384e]
2022-02-07 14:05:58 -05:00
Satyanvesh Dittakavi 85c2cac111 SWDEV-306939 - Fix vdi errors/warnings by CppCheck
Change-Id: I56d910f8363787f1050d5d7e8064ed553c5827fd


[ROCm/clr commit: e20dd61932]
2022-01-12 00:22:16 -05:00
Saleel Kudchadker 500f6a6513 SWDEV-313306 - Fix Co-operative groups dtests
Add a state indicator to retain ExternalSignals when needed.
Co-operative group launch uses external signals to indicate a dependency
to the next command.

Change-Id: I6d0daa006e2377c3bbf4aeca0fd5b63c7ac8fbbb


[ROCm/clr commit: 1fbd75b825]
2021-12-17 12:41:37 -08:00
Saleel Kudchadker 42625f0527 SWDEV-313306 - Clear external signals
The crash occurred because the external signal structure was stale even
after the command was destroyed; the wait was skipped due to a
missing check.
Detect external signals and dispatch a barrier in ReleaseGpuMemoryFence.
Also clear external_signals_ at ProfilingBegin.

Change-Id: I991387edcfe928b511bf5e780988ee131321ed5a


[ROCm/clr commit: 3239222516]
2021-12-13 23:03:33 -08:00
German Andryeyev 5ad02b78c4 SWDEV-305016 - Improve MGPU scaling in Tensorflow
Add a threshold for ROCR/SDMA P2P transfers. ROCR copy path
requires extra barriers in compute for synchronization. That costs
extra performance with tiny transfers.
Reduce active wait time to 10us. Tensorflow uses extra thread
per GPU with constant hipEventQuery() calls. Longer active waits
in ROCr affect CPU performance.

Change-Id: I9020358438615fa2d4617f862f00a562f0a588e7


[ROCm/clr commit: 008133cf41]
2021-12-08 11:59:37 -05:00
German Andryeyev 861b9fb84c SWDEV-294669 - Avoid stall when the new signal was created
Stall in the host thread could occur earlier than the app expects.
Make sure the runtime can grow the signals to the queue size without
any stall. Also adding a new signal to the end of the pool could
break the dependency chain on signal reuse. The new logic will
insert the new signal after current to keep the chain intact.

Change-Id: I9c90b98515907db8b677528263c3e88cd9581a14


[ROCm/clr commit: 102c19adf3]
2021-11-29 10:08:06 -05:00
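The insertion rule from this commit (grow the pool up to a limit, and place the new signal right after the current one rather than at the end) can be modeled with a toy ring. The class, its fields, and the dict-based signal representation are all hypothetical; real HSA signals and the actual wait logic are not modeled.

```python
class SignalPool:
    """Toy ring of signals; each signal is a dict with a 'busy' flag."""

    def __init__(self, initial_size, max_size):
        self.signals = [{"id": i, "busy": False} for i in range(initial_size)]
        self.max_size = max_size
        self.current = 0
        self.next_id = initial_size

    def acquire(self):
        """Return the next signal, growing the pool instead of stalling."""
        nxt = (self.current + 1) % len(self.signals)
        if self.signals[nxt]["busy"] and len(self.signals) < self.max_size:
            # Insert right after the current signal so the order in which
            # the existing signals are reused stays unchanged.
            new_signal = {"id": self.next_id, "busy": False}
            self.next_id += 1
            self.signals.insert(self.current + 1, new_signal)
            nxt = self.current + 1
        # At max_size the caller would have to wait on the busy signal;
        # waiting is not modeled in this sketch.
        self.current = nxt
        signal = self.signals[nxt]
        signal["busy"] = True
        return signal


pool = SignalPool(initial_size=2, max_size=4)
acquired_ids = [pool.acquire()["id"] for _ in range(4)]
```

In this model the third and fourth requests find the next signal busy, so two new signals are inserted directly after the current position and the older signals keep their relative order.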
German Andryeyev b0b0c3049f SWDEV-313126 - Use data() method for the base array address
Reference for the first element can trigger an assert with
_GLIBCXX_ASSERTIONS build

Change-Id: I59c63c052831307edfe5dcc6384798a43e9596dd


[ROCm/clr commit: 6f2e7c3199]
2021-11-26 09:51:57 -05:00
German Andryeyev c116411e00 SWDEV-294669 - Avoid queue drain
Use slot wait logic for direct dispatch

Change-Id: I431ba1418eb4aa066b9881934f4055b3d338ce3a


[ROCm/clr commit: 8e4101b4fd]
2021-11-18 13:06:12 -05:00
kjayapra-amd 2fdfb47092 SWDEV-309657 - Align Virtual queue size to sizeof(uint64_t).
Change-Id: Ia55d7316693bd13938875ce53f7849d5eb658e8c


[ROCm/clr commit: 7e32d6d909]
2021-11-12 10:35:36 -05:00
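Aligning a size to sizeof(uint64_t), which is 8 bytes, is commonly done by rounding up to the next multiple. The helper below is a generic sketch; whether the runtime rounds up or down is an assumption here, and the function name is hypothetical.

```python
def align_up(value, alignment):
    """Round `value` up to the nearest multiple of `alignment`."""
    if alignment <= 0:
        raise ValueError("alignment must be positive")
    return (value + alignment - 1) // alignment * alignment


UINT64_SIZE = 8  # sizeof(uint64_t) in bytes
queue_size = align_up(13, UINT64_SIZE)
```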
German Andryeyev 7821cddb3e SWDEV-257789 - Initial change to skip kernel arg copy
The optimization is controlled with ROCR_SKIP_KERNEL_ARG_COPY.
This is an initial check-in for experiments. Extra changes are
necessary for full support:
- handle graph capture with the original sysmem alloc
- avoid memobject references, otherwise there is a race condition with
reuse of the arg buffer
- remove arg setup from hip

Change-Id: Ib0af710f93e79834711fa4049a7c66093711e68b


[ROCm/clr commit: 7e12cf6318]
2021-10-28 20:35:35 -04:00
German Andryeyev d8201bc1ce SWDEV-303567 - Add chunks for the pool of kernel arguments
The kernel arg pool will be divided into 8 chunks to avoid long stalls
when the pool is reused.

Change-Id: I228e6ca1c09e428c1775f1e5b685220a9a5d71af


[ROCm/clr commit: f78b3a8919]
2021-10-26 16:31:37 -04:00
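The idea of dividing the kernel argument pool into 8 chunks can be sketched with a simple bump allocator. When the allocator wraps into a chunk, only work that used that one chunk has to be finished, not work that used the whole pool. The class, its fields, and the `waits` bookkeeping are hypothetical and exist only to make the behavior visible.

```python
class ChunkedArgPool:
    """Toy kernel-argument pool split into equally sized chunks."""

    def __init__(self, pool_size, num_chunks=8):
        self.chunk_size = pool_size // num_chunks
        self.num_chunks = num_chunks
        self.chunk = 0    # index of the chunk being filled
        self.offset = 0   # write offset inside that chunk
        self.waits = []   # chunks that required a wait (illustration only)

    def allocate(self, size):
        """Return the pool offset for `size` bytes of kernel arguments."""
        if size > self.chunk_size:
            raise ValueError("allocation larger than one chunk")
        if self.offset + size > self.chunk_size:
            # Move to the next chunk. Only earlier users of that single
            # chunk must have completed before it is overwritten.
            self.chunk = (self.chunk + 1) % self.num_chunks
            self.offset = 0
            self.waits.append(self.chunk)
        start = self.chunk * self.chunk_size + self.offset
        self.offset += size
        return start


arg_pool = ChunkedArgPool(pool_size=1024)
first = arg_pool.allocate(100)
second = arg_pool.allocate(100)
```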
Sarbojit Sarkar 548bcfb23b SWDEV-306302 - Fix for OCLCreateImage test failure
Change-Id: I781504bd1ff599ed75c5ea730be03b71f69761b2


[ROCm/clr commit: c06c9f7b93]
2021-10-07 19:52:58 +00:00
German Andryeyev 51f7944fcb SWDEV-303567 - Increase the size of AQL queue
ROC_AQL_QUEUE_SIZE will control the size of AQL queue.
The current default value is 4096.

Change-Id: Icd2a4ee3ba554c06aa05b08defd922d2c63e43fd


[ROCm/clr commit: 7fe696b6ef]
2021-10-06 08:27:36 -04:00
Sarbojit Sarkar c053c7d17c SWDEV-301823 - Optimize hipMemset2D/3D
Change-Id: Ibe560149a263c2ac6b08e4eb1a1d331d2aeac78c


[ROCm/clr commit: 22a847f3ce]
2021-09-27 14:10:06 -04:00
Sourabh 936e0836a8 SWDEV-292525 - [vdi] Path to streamOps shaders
Implementation to use a blit kernel to perform
a hipStreamWait/write instead of an AQL packet.

Change-Id: I462671ed5cec37144dfe97ff66439249196117c1


[ROCm/clr commit: cbb8d82bdb]
2021-09-27 13:59:35 -04:00
German Andryeyev 28c4d9c0df SWDEV-294669 - Keep one more slot for HW processing
The original logic left only one slot for HW processing in the queue.
For some reason there is a race condition on CPU overwrite of the slot
before the current active one. The workaround is to also avoid the slot
before the current active one, in case HW processing of it is unfinished.

Change-Id: I565495a8feeaedffc9fc8a505edbee5ff5816975


[ROCm/clr commit: 65ddfcc6a8]
2021-09-13 13:56:05 -04:00
Jason Tang e94aec09bd SWDEV-1 - Some 'delete' clean up
Change-Id: I02564f0f0e349375bde1471e9f82df268703367b


[ROCm/clr commit: 73967c3b17]
2021-09-09 12:12:40 -04:00
Sarbojit Sarkar 45953e81dd SWDEV-300655 - Added thread ID to hip trace
Change-Id: I9234d4ec93e7687cd0a5d1bd930bd4f80936311b


[ROCm/clr commit: 42d33029dc]
2021-09-06 00:22:42 -04:00
Satyanvesh Dittakavi c4bba2456b SWDEV-298985 - hipMemPrefetchAsync should prefetch the data to the specified destination device
Pass the device agent specified by the user to the ROCr API instead of passing the device agent attached to the specified stream.

Change-Id: I86c98935b9dc404eaa6d47ccdd082a8c3678fb36


[ROCm/clr commit: 169cc857fd]
2021-08-27 05:12:07 -04:00
Saleel Kudchadker 96f2bdd6ce SWDEV-297448 - Improve logging
Print non-pointer kernel args
Change-Id: Ice0dbc894aae1430ac085df319f4b91dfa21665a


[ROCm/clr commit: 75fea4dca6]
2021-08-25 15:46:06 -07:00