rocm-systems

Author	SHA1	Message	Date
German	5ed568998f	SWDEV-349794 - Fix time accumulation If the execution command had a split into multiple HW operations, then runtime has to accumulate time for all operations Change-Id: Iaba31e96250918d8190bf63adb4c07730fdfefbf [ROCm/clr commit: `24f5362296`]	2022-08-24 09:53:54 -04:00
Maneesh Gupta	92f6e1a0d2	SWDEV-350289 - Fix build warnings due to file re-org Change-Id: I0066fa163b9f25fdde4c5b3baed1ef0654390c06 [ROCm/clr commit: `289062682a`]	2022-08-10 03:05:56 -04:00
Sarbojit Sarkar	a0981a092b	SWDEV-343921 - added Max stack size Change-Id: I5c1a088e05215ca951afc9d92f8d298c5e3a65f1 [ROCm/clr commit: `27a08a132f`]	2022-08-02 07:13:18 -04:00
German Andryeyev	110e3e68a0	SWDEV-340703 - Use different status value for the callback event Change-Id: Ida725df53abfbf348b18e24c19edf011dc9192dd [ROCm/clr commit: `6844b8c7e0`]	2022-06-30 11:03:02 -04:00
Saleel Kudchadker	b3ad41f6e4	SWDEV-335780 - Indicate if handler is queued Maintain status of handler callback. For event records we no longer submit callbacks to reduce the load on the async handler thread. However without a callback we leak command memory/decrement refcounts. Indicate status of the handler which we can use to queue a callback when finish is called. Change-Id: I89fd02f3d047a0e8162664ee17581a14795f1928 [ROCm/clr commit: `5df34a2f7a`]	2022-06-14 20:55:06 -04:00
German Andryeyev	acf2856677	SWDEV-339296 - Delay hidden heap allocation till the usage Move hidden heap creation to the kernel launch to make sure it's allocated on the actual first usage. Change-Id: I1b65a82fc06d9129ed45a69765bf14ea3d945b04 [ROCm/clr commit: `4975f69337`]	2022-06-14 12:18:34 -04:00
Sarbojit Sarkar	ee5bcf6444	SWDEV-331066 - support for LimitStackSize Change-Id: Ie6ae74f008b4f72de83663194aafb0ebdddfc8b6 [ROCm/clr commit: `51a00aeefe`]	2022-05-19 00:24:06 -04:00
kjayapra-amd	ae0b32126b	SWDEV-331355 - Fixing the surface object on fillMemory function call. Change-Id: Ieaa359ea8f31b0251d54b720469cdefde202579f [ROCm/clr commit: `643ee46f28`]	2022-05-04 14:24:03 -04:00
Saleel Kudchadker	d9c2aee526	SWDEV-334152 - Set release as systemscope Set release scope as system for dispatch AQL when events are passed to hipLaunchKernelGGL Change-Id: I93b91591e0ab023f1ecc5247f7905eca26147358 [ROCm/clr commit: `02566677cf`]	2022-04-29 13:19:29 -04:00
German Andryeyev	d5bc650de9	SWDEV-307184 - Fix a regression from `dafc64ea` Disable hostcall buffer in OCL for now. COv5 can add hostcallbuffer metadata for unknown reason. OCL may fail the buffer allocation and kernel launch. Change-Id: I34a6a45bac86c57422b764c0d69760c96920d6c5 [ROCm/clr commit: `934149ff0a`]	2022-04-28 11:57:48 -04:00
Ajay	9fcc7a7219	SWDEV-332522 - streamOpsWrite & streamOpsWait to accept memory offset Change-Id: I4b6ecb4d80c093d038d86616a637c4bb465ae24e [ROCm/clr commit: `d2f837d25f`]	2022-04-25 14:59:36 -04:00
Jason Tang	7bdbf61a9d	SWDEV-324411 - Use blit kernel for copyBufferRect if atomic is not supported Change-Id: I2e110fd3418117ee9c7ede379244d2c6c4f248b7 [ROCm/clr commit: `ed7737564e`]	2022-04-24 11:41:16 -04:00
sdashmiz	dafc64ea0a	SWDEV-204804 - Detecing pcie atomic support - check pcie atomci support for printf functionality - if not enabled printf wont work Signed-off-by: sdashmiz <shadi.dashmiz@amd.com> Change-Id: Ib366e8e71772b02210c4a830bca4bd8cc7a11664 [ROCm/clr commit: `15f1632dfa`]	2022-04-22 08:53:16 -04:00
Julia Jiang	1320312a62	SWDEV-330164 - Fix in conformance svm_enqueue_api crash Change-Id: I12eca6ca3e8d722b7534047fca79b289604aa2b0 [ROCm/clr commit: `b1611e0123`]	2022-04-20 13:20:18 -04:00
Saleel Kudchadker	b306843e26	SWDEV-332512 - Signal pool changes Create a new signal if the next set of signals are busy Change-Id: I5108e68c88fe41e3a45bad4495ebdf3742e76dcd [ROCm/clr commit: `9ec8a7306d`]	2022-04-18 15:58:38 -04:00
Saleel Kudchadker	cad3dfe4ec	SWDEV-301667 - Separate scope from marker_ts_ Change-Id: I19f4d394e898bfb8c9d9a2c2edf9d5bf5def3b08 [ROCm/clr commit: `b6cbfaf499`]	2022-04-16 19:26:31 -04:00
German Andryeyev	4b4137ae63	SWDEV-332512 - Add ROC_SIGNAL_POOL_SIZE Default value is 32 HSA signals in the pool. Change-Id: Icb69413d3ff6ef228d9a9e22fd024e72c6d8ebe4 [ROCm/clr commit: `7975a07112`]	2022-04-14 17:32:00 -04:00
Saleel Kudchadker	3d0100c5ab	SWDEV-301667 - Add cache state for a device - Add a global cache state for a device to indicate scopes of submitted AQL packets - Remove scopes for TS marker if hipEventReleaseToDevice is passed. Set env ROC_EVENT_NO_FLUSH=1 to use NOP AQL for event records. It would flush caches by default with system scope release. - Calling finish() should ensure if caches are flushed, if not queue a marker Change-Id: Ibbbdbb1cd7ac61cb35649169212142545be159e0 [ROCm/clr commit: `8eeaa998c0`]	2022-04-12 12:27:31 -04:00
Maxime Chambonnet	38928e85c1	SWDEV-1 - ROC CLR typos This is cherry-picked from this github issue: https://github.com/ROCm-Developer-Tools/ROCclr/issues/28 Change-Id: I236f4f25a2dabe05883159af0fab0bad06ab0fd0 [ROCm/clr commit: `d45794e985`]	2022-04-11 14:24:39 -04:00
German Andryeyev	4715a87d44	SWDEV-307184 - Report 1 for unused dimensions Remove assert for kernel arg size, because COv5 reports a value bigger than the actual usage in the most of cases Change-Id: I8e15bc45a9e21b58a5894f9977511ca84408ce61 [ROCm/clr commit: `2be0b1e612`]	2022-04-08 13:43:37 -04:00
kjayapra-amd	ba0119e933	SWDEV-331104 - Size passed to fillBuffer should not be 0. Change-Id: Ifbc6047fafa0e55b5ab956cf3b7254c7e20b1e88 [ROCm/clr commit: `b3b88ef926`]	2022-04-08 09:29:55 -04:00
German Andryeyev	e09245ceae	SWDEV-307184 - Move local size calculation With COv5 local size calculation must occur before runtime programs kernel arguments Change-Id: I0726c6529bde69b8fcf5360aa83986cf84e04168 [ROCm/clr commit: `caa6110c29`]	2022-04-05 11:19:51 -04:00
kjayapra-amd	2ab9ef0915	SWDEV-325776 - Adding device release scope for kernel dispatch packet Change-Id: I8ea763f4c0239c410143b748c05822e9f6694412 (cherry picked from commit ec4894f8a27a3330b895a0ded385ab96f5ef242d) [ROCm/clr commit: `378a427d8c`]	2022-04-01 08:17:29 -04:00
kjayapra-amd	31c0525344	SWDEV-305527 - Changes to handle memset blit kernel that takes width, height and depth. This also fixes SWDEV-317261. Change-Id: Ic85f63a95d9d8f48884fc8c7fd95cbb496dfbbca [ROCm/clr commit: `7fb80a027a`]	2022-03-31 09:02:33 -04:00
Saleel Kudchadker	f99304adcd	SWDEV-322225 - Use numa_allocate_bitmask - Fix a crash with AMD_CPU_AFFINITY=1 as numa_bitmask_alloc isnt the right api to allocate bitmask - Do not set affinity for ROCr thread. It worsens performance rather than any improvement. - Fix regression from my previous change for event handler. Change-Id: I3ea75adc2a6333f29752283eddd5b555e9b58cc5 [ROCm/clr commit: `802c2c8a9f`]	2022-03-26 13:24:51 -04:00
Saleel Kudchadker	4dbec887a2	SWDEV-301667 - Selectively queue handler - Queue handler for hipEventRecord(aka marker_ts_) only if there is a callback associated with it. Change-Id: I8a9877ae0e342556053abbaacc9510744a8e772a [ROCm/clr commit: `3c3c0ca4c5`]	2022-03-24 19:46:28 -04:00
German Andryeyev	7d5ed33e8f	SWDEV-307185 - Create heap for device memory allocator Pass the allocated heap with the kernel arguments Change-Id: Icdec09b7f937845c39e21cbca7071dc3ba791af9 [ROCm/clr commit: `7b114a2b8b`]	2022-03-04 00:44:41 -05:00
German Andryeyev	c52280ae72	SWDEV-323702 - Use active queue for transfer Pass active queue for transfers in the cache coherency layer. That will allow to use device transfer queue only for cases when active queue isn't available, because using device transfer queue from another active queue may cause a deadlock Change-Id: Ifbe7e0303b77dbf6eeda3939ffbc25a3df7472de [ROCm/clr commit: `95d55fdfa8`]	2022-02-18 09:10:53 -05:00
German Andryeyev	7c8a7ddf5e	SWDEV-323364 - Fix a typo Change-Id: I2031296ab9451342d5930b8b2d3d2e6277946647 [ROCm/clr commit: `fbf531398a`]	2022-02-17 20:50:29 -05:00
Saleel Kudchadker	de9a5438b9	SWDEV-322605 - Fix infinite loop condition If GlobalMemCacheLine reported is 0, runtime may run into an infinite loop as the KernelSegmentAlignment is chosen as size of the cache line. Change-Id: Ide547940cc0407f16fab10ee210b4fd3ae4eaafc [ROCm/clr commit: `041ddc0c1c`]	2022-02-16 13:16:18 -05:00
German Andryeyev	bd96ef9a34	SWDEV-307184 - Add support for the new metadata Metadata in Codeobject version 5 is the extension of CO3 and CO4. Add the detection of the new fields and program them in the setup of the kernel arguments. Change-Id: I27e58df77320ad00f4f16d35912668db803826af [ROCm/clr commit: `be6a06384e`]	2022-02-07 14:05:58 -05:00
Satyanvesh Dittakavi	85c2cac111	SWDEV-306939 - Fix vdi errors/warnings by CppCheck Change-Id: I56d910f8363787f1050d5d7e8064ed553c5827fd [ROCm/clr commit: `e20dd61932`]	2022-01-12 00:22:16 -05:00
Saleel Kudchadker	500f6a6513	SWDEV-313306 - Fix Co-operative groups dtests Add a state indicator to retain ExternalSignals when needed. Co-operative group launch uses external signals to indicate a dependency to the next command. Change-Id: I6d0daa006e2377c3bbf4aeca0fd5b63c7ac8fbbb [ROCm/clr commit: `1fbd75b825`]	2021-12-17 12:41:37 -08:00
Saleel Kudchadker	42625f0527	SWDEV-313306 - Clear external signals Crash was due to the fact that external signal structure was stale even after destroyign the command. That is because we skipped wait due to a missing check. Detect external signals and dispatch a barrier in ReleaseGpuMemoryFence. Also clear external_signals_ at ProfilingBegin. Change-Id: I991387edcfe928b511bf5e780988ee131321ed5a [ROCm/clr commit: `3239222516`]	2021-12-13 23:03:33 -08:00
German Andryeyev	5ad02b78c4	SWDEV-305016 - Improve MGPU scaling in Tensorflow Add a threshold for ROCR/SDMA P2P transfers. ROCR copy path requires extra barriers in compute for synchronization. That costs extra performance with tiny transfers. Reduce active wait time to 10us. Tensorflow uses extra thread per GPU with constant hipEventQuery() calls. Longer active waits in ROCr affect CPU performance. Change-Id: I9020358438615fa2d4617f862f00a562f0a588e7 [ROCm/clr commit: `008133cf41`]	2021-12-08 11:59:37 -05:00
German Andryeyev	861b9fb84c	SWDEV-294669 - Avoid stall when the new signal was created Stall in the host thread could occur earlier than the app expects. Make sure rutnime can grow the signals to the queue size without any stall. Also adding a new signal to the end of the pool could break the dependency chain on signal reuse. The new logic will insert the new signal after current to keep the chain intact. Change-Id: I9c90b98515907db8b677528263c3e88cd9581a14 [ROCm/clr commit: `102c19adf3`]	2021-11-29 10:08:06 -05:00
German Andryeyev	b0b0c3049f	SWDEV-313126 - Use data() method for the base array address Reference for the first element can trigger an assert with _GLIBCXX_ASSERTIONS build Change-Id: I59c63c052831307edfe5dcc6384798a43e9596dd [ROCm/clr commit: `6f2e7c3199`]	2021-11-26 09:51:57 -05:00
German Andryeyev	c116411e00	SWDEV-294669 - Avoid queue drain Use slot wait logic for direct dispatch Change-Id: I431ba1418eb4aa066b9881934f4055b3d338ce3a [ROCm/clr commit: `8e4101b4fd`]	2021-11-18 13:06:12 -05:00
kjayapra-amd	2fdfb47092	SWDEV-309657 - Align Virtual queue size to sizeof(uint64_t). Change-Id: Ia55d7316693bd13938875ce53f7849d5eb658e8c [ROCm/clr commit: `7e32d6d909`]	2021-11-12 10:35:36 -05:00
German Andryeyev	7821cddb3e	SWDEV-257789 - Initial change to skip kernel arg copy The optimization is controlled with ROCR_SKIP_KERNEL_ARG_COPY. This is initial check-in for experiments. Extra changes are necessary for full support: - handle graph capture with the original sysmem alloc - avoid memobject references, otherwise there is a race condition with reusage of the arg buffer - Remove arg setup from hip Change-Id: Ib0af710f93e79834711fa4049a7c66093711e68b [ROCm/clr commit: `7e12cf6318`]	2021-10-28 20:35:35 -04:00
German Andryeyev	d8201bc1ce	SWDEV-303567 - Add chunks for the pool of kernel arguments The kernel arg pool will be divided into 8 chunks to avoid long stalls, when the pool will be reused. Change-Id: I228e6ca1c09e428c1775f1e5b685220a9a5d71af [ROCm/clr commit: `f78b3a8919`]	2021-10-26 16:31:37 -04:00
Sarbojit Sarkar	548bcfb23b	SWDEV-306302 - Fix for OCLCreateImage test failure Change-Id: I781504bd1ff599ed75c5ea730be03b71f69761b2 [ROCm/clr commit: `c06c9f7b93`]	2021-10-07 19:52:58 +00:00
German Andryeyev	51f7944fcb	SWDEV-303567 - Increase the size of AQL queue ROC_AQL_QUEUE_SIZE will control the size of AQL queue. The current sefault value is 4096. Change-Id: Icd2a4ee3ba554c06aa05b08defd922d2c63e43fd [ROCm/clr commit: `7fe696b6ef`]	2021-10-06 08:27:36 -04:00
Sarbojit Sarkar	c053c7d17c	SWDEV-301823 - Optimize hipMemset2D/3D Change-Id: Ibe560149a263c2ac6b08e4eb1a1d331d2aeac78c [ROCm/clr commit: `22a847f3ce`]	2021-09-27 14:10:06 -04:00
Sourabh	936e0836a8	SWDEV-292525 - [vdi] Path to streamOps shaders Implementation to use a blit kernel to perform a hipStreamWait/write instead of an AQL packet. Change-Id: I462671ed5cec37144dfe97ff66439249196117c1 [ROCm/clr commit: `cbb8d82bdb`]	2021-09-27 13:59:35 -04:00
German Andryeyev	28c4d9c0df	SWDEV-294669 - Keep one more slot for HW processing The original logic left only one slot for HW processing in the queue. For some reason there is a race condition on CPU overwrite of the slot before the current active. The workaround is to avoid the previous to the current active slot for possible unfinished HW processing. Change-Id: I565495a8feeaedffc9fc8a505edbee5ff5816975 [ROCm/clr commit: `65ddfcc6a8`]	2021-09-13 13:56:05 -04:00
Jason Tang	e94aec09bd	SWDEV-1 - Some 'delete' clean up Change-Id: I02564f0f0e349375bde1471e9f82df268703367b [ROCm/clr commit: `73967c3b17`]	2021-09-09 12:12:40 -04:00
Sarbojit Sarkar	45953e81dd	SWDEV-300655 - Added thread ID to hip trace Change-Id: I9234d4ec93e7687cd0a5d1bd930bd4f80936311b [ROCm/clr commit: `42d33029dc`]	2021-09-06 00:22:42 -04:00
Satyanvesh Dittakavi	c4bba2456b	SWDEV-298985 - hipMemPrefetchAsync should prefetch the data to the specified destination device Pass the device agent specified by the user to the ROCr api instead of passing the device agent attached to the specified stream Change-Id: I86c98935b9dc404eaa6d47ccdd082a8c3678fb36 [ROCm/clr commit: `169cc857fd`]	2021-08-27 05:12:07 -04:00
Saleel Kudchadker	96f2bdd6ce	SWDEV-297448 - Improve logging Print non pointer kernel args Change-Id: Ice0dbc894aae1430ac085df319f4b91dfa21665a [ROCm/clr commit: `75fea4dca6`]	2021-08-25 15:46:06 -07:00

135 commits