rocm-systems

Autor	SHA1	Wiadomość	Data
kjayapra-amd	f613be3d81	SWDEV-413997 - Cleanup fixes for Virtual Memory Management. Change-Id: I9a4a4d9087b5daf15e3ba31e786d34db431212a1 [ROCm/clr commit: `74ffc5f0d5`]	2024-04-22 10:58:06 -04:00
German Andryeyev	a2ffb2ad40	SWDEV-440746 - Release last command on terminate Change-Id: Ib6a9b8fc9a8692eb17b39b854cefd92c6b59733f [ROCm/clr commit: `0ccdb3e160`]	2024-04-22 09:57:38 -04:00
German Andryeyev	74d80fb509	SWDEV-440746 - Remove obsolete code The "optimized" version of memcpy is outdated and was used in win32 only. Change-Id: I7f2e0e9051e37cec95438266824b5b0025c324c6 [ROCm/clr commit: `7448113cfc`]	2024-04-22 09:56:42 -04:00
kjayapra-amd	2335c92a1a	SWDEV-455041 - Continue processing fat binary even if other code object bundle processing fail. Change-Id: Iea553ab0265c08341f915644075ce2b6ed9b3200 [ROCm/clr commit: `863c56262e`]	2024-04-20 14:25:49 -04:00
Rakesh Roy	cb1f02f4f7	SWDEV-453180 - Add UUID support for HIP_VISIBLE_DEVICES on Windows - UUID needs to be specified in the format GPU-<body>, <body> encodes UUID as a 16 chars - Convert set UUID in HIP_VISIBLE_DEVICES to device index internally - Then use existing device index logic for HIP_VISIBLE_DEVICES Change-Id: I654f492a49cd4d7a9b7339360ab558165240caa5 [ROCm/clr commit: `fb217fa9e0`]	2024-04-20 02:39:19 -04:00
German Andryeyev	02d949ec38	SWDEV-440746 - Wait for signal before release Change-Id: I9e2aefdbcbba153c7f1080d80aab7a345eaf1eb4 [ROCm/clr commit: `329ba271fa`]	2024-04-19 18:33:28 -04:00
German Andryeyev	00fb29aab6	SWDEV-353281 - Reuse timestamp on memory reuse Mempool may reuse memory without a wait. Hence, the timestamp has to be preserved and can't be destroyed. Change-Id: I6f095f44afa69887a4b7aeb3b329804aedd96f3e [ROCm/clr commit: `ffb516db3e`]	2024-04-19 18:00:29 -04:00
German Andryeyev	7eaba0bd33	SWDEV-440746 - Don't set CL_SUBMITTED twice Change-Id: I9ba34454f7487d6bc0d398b322a147cbac6c6443 [ROCm/clr commit: `fd81490bb8`]	2024-04-19 17:36:51 -04:00
Satyanvesh Dittakavi	2e58154d80	SWDEV-446610 - Attribute HIP_POINTER_ATTRIBUTE_SYNC_MEMOPS should return the correct value Change-Id: Ieced2ee61bba28f2d1df96893a661287b0a5c7b7 [ROCm/clr commit: `8f7acbdadb`]	2024-04-19 14:40:09 -04:00
Ioannis Assiouras	2f430138c5	SWDEV-451594 - Implement Readback and Avoid HDP Flush workaround for device kernel args Change-Id: I6d41a089a17f55306e7ff402588a1e831b20a7a7 [ROCm/clr commit: `bf74ef4025`]	2024-04-19 09:29:20 -04:00
Anusha GodavarthySurya	c7022ad6ce	SWDEV-455869 - Revert "SWDEV-410751 - Consider null amd::memory is invalid." This reverts commit `60a3077ed4`. Change-Id: I26c4b3c74b2861afc17f979492d025b59d4388ab [ROCm/clr commit: `e829ef68e4`]	2024-04-19 00:54:26 -04:00
kjayapra-amd	71aa6ff3a0	SWDEV-413997 - VMM IPC implementation for Linux. Change-Id: Icfeb83ca51e96be35abb67a94d6e3e1a1ca5a934 [ROCm/clr commit: `56ebf5157a`]	2024-04-18 11:28:13 -04:00
Anusha GodavarthySurya	b77908a819	SWDEV-450053 - Handle MemcpyNodeSetParamsTo/FromSymbol negative parameters For all windows allocation on SVM memory tagged with flag ROCCLR_MEM_INTERPROCESS. hipHostMalloc validation is based on flag. So remove ROCCLR_MEM_INTERPROCESS before check. Change-Id: I823bbf228d9a4a9acb4abffc01ac6b3f544c6e12 [ROCm/clr commit: `8179fa98a2`]	2024-04-18 05:39:35 -04:00
Jaydeep Patel	3dae883e73	SWDEV-453535 - Capture hipMemset3DAsync. Change-Id: I517c2557573db258b3e3e353f02f6a56652b0fde [ROCm/clr commit: `12e0bdcd32`]	2024-04-18 00:05:45 -04:00
Jaydeep Patel	24bb38acb8	SWDEV-455346 - End wait if HostcallListener terminates. Change-Id: I21ec8eadb189147c579ec65acf68de40d604686b [ROCm/clr commit: `8942939fac`]	2024-04-18 00:04:00 -04:00
German Andryeyev	562f3ef098	SWDEV-440746 - Fix the hostcall buffer creation Avoid a deadlock on the host call buffer creation. Since the buffer will be allocated in the queue thread, then use direct device memory allocation skipping the global context lock. Change-Id: I09b55ee03bb42ab5d320c152b52a8c842c5fdcc1 [ROCm/clr commit: `62559a6e5a`]	2024-04-17 12:37:23 -04:00
sdashmiz	d2b2b2c5fa	SWDEV-441603 - Correct dst device Signed-off-by: sdashmiz <shadi.dashmiz@amd.com> Change-Id: Ie60aa598dd73df66cdf02c1d96daf2dfccba7a59 [ROCm/clr commit: `d511e57257`]	2024-04-17 09:21:06 -04:00
Jatin Chaudhary	5a7b0c50ad	SWDEV-379007 - fix bool check for fp8_fnuz for fnuz nums zero val is 0x00, -0 i.e. 0x80 would be a NaN Change-Id: Ibdc4fb4b9fb307b5952434f08d45a8ddd6262db8 [ROCm/clr commit: `d7b0d78fad`]	2024-04-17 05:31:21 -04:00
Sameer Sahasrabuddhe	c44b565943	SWDEV-454959 - ignore the upper half of the mask in wave32 mode Change-Id: If027dd8cbe5cbe142fff353cb72c16f08e9aea8e [ROCm/clr commit: `03562a2547`]	2024-04-17 10:12:57 +05:30
pghafari	e3266682d0	SWDEV-455699 - removing HW_REG_TRAPSTS for gfx12 Change-Id: I5f8b030eefdb37d3d51da3e135e5aa0f18ad9018 [ROCm/clr commit: `5ddca5854c`]	2024-04-16 19:46:21 -04:00
Jatin Chaudhary	80f02a1534	SWDEV-379007 - use avx instruction for bf16 cvt AMD CPUs have had avx512_bf16 support for quite some time now (from consumer Ryzen 7000 series to enterprise grade CPUs). This patch should allow users to use the hardware bf16 unit when running the __host__ variants of the function. This can be enabled via `hipcc ... -mavx512vl -mavx512bf16`. Change-Id: I67c377afc95ddfe8d45a048dce078a247d4a1878 [ROCm/clr commit: `49349f168c`]	2024-04-16 18:35:08 -04:00
German Andryeyev	7f195e2996	SWDEV-444670 - Enable teardown class Force implicit runtime teardown with a global destructor. Change-Id: Iabe63dedf5b94fefc98668585c45a61607120669 [ROCm/clr commit: `c95a75a2bf`]	2024-04-16 12:00:06 -04:00
kjayapra-amd	854cbe1d24	SWDEV-422580 - Adding back the pcie.function to PCI address string in hipGetDevicePCIBusId. Change-Id: I932724cc872d7ae2643ce6ac2924901cb49cd7ad [ROCm/clr commit: `a1e0970d6d`]	2024-04-16 07:28:48 -04:00
Jatin Chaudhary	5e42260071	SWDEV-379007 initial implementation of fp8 header Change-Id: Id9a5a85641882961e4d860a815217c641e6f3387 [ROCm/clr commit: `ca07f59fb1`]	2024-04-16 05:37:59 -04:00
Sourabh Betigeri	26b7b7b951	SWDEV-453577 - Fixes to account for right CU count based on WGP or CU mode Change-Id: Ib9739f9917bc6ff69cc76f444d909311922ebc1e [ROCm/clr commit: `fcfe2ec88b`]	2024-04-15 11:53:43 -04:00
kjayapra-amd	867769b0ec	SWDEV-413997 - Fixing alignment validation check for power of 2 instead of granularity factor. Change-Id: I1e0db6e0628c09d26850e5a0339e2a4660442db8 [ROCm/clr commit: `00ddc3e284`]	2024-04-15 09:45:29 -04:00
kjayapra-amd	79b3fc1fb8	SWDEV-413997 - Read Access can be valid now that ROCr takes care of access. Change-Id: Iecda74ca0207c95d3fbed8b4e15c8c4c5895d939 [ROCm/clr commit: `815e450cfd`]	2024-04-15 06:00:14 -04:00
Rakesh Roy	f7dc86bdf4	SWDEV-453180 - Add UUID support for HIP_VISIBLE_DEVICES on Linux - UUID is Ascii string with a maximum of 21 chars which uniquely identifies a GPU - Convert set UUID in HIP_VISIBLE_DEVICES to device index internally - Then use existing device index logic for HIP_VISIBLE_DEVICES Change-Id: I8cab4fe42459f8209b97f909300789e6e687b9ac [ROCm/clr commit: `52db98edd9`]	2024-04-13 22:07:19 -04:00
kjayapra-amd	765e6f5d2f	SWDEV-413997 - Fixing multiple device cases. Change-Id: I10ad3fbfca887e92cd81f68392fa1acf753cbd2b [ROCm/clr commit: `d52d16c8e6`]	2024-04-13 06:14:03 -04:00
German Andryeyev	27dec1c62d	SWDEV-455254 - Reduce blit kernels signature Remove offset from blit kernels, since it can be applied in setup. Change-Id: I06b585068d68a0ee8e125ddf46a36fccb372f30d [ROCm/clr commit: `7de7da4016`]	2024-04-12 14:45:55 -04:00
taosang2	cbbf49f5e7	SWDEV-424956 - Fix half vector printf issue Refactor PrintfDbg::outputArgument() to remove potential risk. Fix half vector printf issue on all devices. Fix FEAT-56794 as well. Change-Id: Iae39359d2128588def2e43d77fe58e868b8e71ff [ROCm/clr commit: `35c80dd482`]	2024-04-12 14:25:44 -04:00
Jaydeep Patel	b855d3f1ed	SWDEV-436754 - Use glbctx instead so that ref count increments for multi devices and chunk decommit gets delayed. Change-Id: Ia4b0d5fbfa8f198776e52d14de8b22c6942f740d [ROCm/clr commit: `d52168b46d`]	2024-04-12 00:04:34 -04:00
German Andryeyev	f29d608ca3	SWDEV-455254 - Add kernel arg optimization Add kernel arguments optimization into blit path. Enabled by default on MI300. Change-Id: I2694a81b90d48ad07d86dfe4c0c64fe187bada8e [ROCm/clr commit: `f0c7ecf617`]	2024-04-10 18:08:37 -04:00
kjayapra-amd	ed9c629ad6	SWDEV-446298 - Adding error code to the logs on p2p hsa api failure. Change-Id: Ic41b1ad1b64cca0e31986337a83a5146d52a7328 [ROCm/clr commit: `2b8634bada`]	2024-04-10 06:00:00 -04:00
Jatin Chaudhary	565f915ebe	SWDEV-379007 - add __hip_bfloat16_raw types This also brings bfloat16 implementation closer to CUDA's. Change-Id: I23f381141faacd6537923ae9b88ada4d661db496 [ROCm/clr commit: `481912a1fd`]	2024-04-09 05:32:13 -04:00
Saleel Kudchadker	4285981222	SWDEV-301667 - Fix SDMA mask reuse If we are using the mask returned by getLastUsedSdmaEngine() then we need to apply the SDMA Read/Write mask to it before using with HSA copy_on_engine API. Change-Id: I6e5dc6c187eeb3c61ee159e9d2a0fa7b4737c06e [ROCm/clr commit: `3f0bcf7834`]	2024-04-08 15:42:52 -04:00
Sourabh Betigeri	7cc5dd56ab	SWDEV-451964 - Limit gpu single allocation percentage for gfx940 only Change-Id: Iadcdadd734e7aeeb23742e426353defa972d3ad5 [ROCm/clr commit: `dbac2976e4`]	2024-04-05 09:43:42 -04:00
Ioannis Assiouras	78008c05c5	SWDEV-453301 - Remove the option to write multiple packets in dispatchGenericAqlPacket Dispatching multiple packets with ring the doorbell once is not supported by the lower layers Change-Id: I7665a2dcdd4ef9e47dadfe410180fed64c5a4ee0 [ROCm/clr commit: `d7f352dbed`]	2024-04-05 05:28:10 -04:00
Rakesh Roy	9630492a1d	SWDEV-450361 - Add nullptr validation for waitStream - Application is passing null for parameter stream in API hipStreamWaitEvent - When event stream isn't capturing and event is not recorded, causes segfault because we are accessing deviceId() from waitStream Change-Id: I8b87ffd6f234677f68b66dcb7ef44b2ff04a7c91 [ROCm/clr commit: `880f1f0049`]	2024-04-04 02:07:18 -04:00
cadolphe	e89ea8577d	SWDEV-446726 - Disable large bar for 32 bit windows When large bar is enabled, persistent memory leads to overallocation for 32 bit architecture. Change-Id: Iae39359d8128588de02e42d77fe58e868b8e71fd [ROCm/clr commit: `bc80802c1a`]	2024-04-03 15:36:41 -04:00
cadolphe	02330a5324	SWDEV-443537 - fix make build warning message Add cltrace compile definition for CL_TARGET_OPENCL_VERSION to OpenCL 2.2 Change-Id: Ie868ab0a6e86951afc6d07da58be942c3b736d15 [ROCm/clr commit: `f7b1398361`]	2024-04-02 16:42:01 -04:00
cadolphe	8f185a24d9	SWDEV-451687 - Fix alloc message values in AMD_LOG_LEVEL for 32 bit Change-Id: Icbe67024297c92bf59139b6a2ccd2ba3674f60b1 [ROCm/clr commit: `411960a131`]	2024-04-01 13:32:20 -04:00
Anusha GodavarthySurya	ff29b47bb8	SWDEV-452787 - correct hipDrvGraphAddMemcpyNode check Change-Id: Id58f982edd4f17d675f7a0f61a9b4dea0baebd9b [ROCm/clr commit: `ea4f09e8c0`]	2024-03-29 00:56:12 -04:00
Anusha GodavarthySurya	ae296c8fad	SWDEV-443567 - SWDEV-436126 - Fix Prohibited and Unhandled Operations during capture => hipDeviceSynchronize is not allowed during capture. => hipEventSynchronize during capture should return hipErrorCapturedEvent error => hipEventQuery during capture should return hipErrorCapturedEvent error hipStreamSynchronize, hipEventSynchronize, hipStreamWaitEvent, hipStreamQuery For Side Stream(Stream that is not currently under capture): => If current thread is capturing in relaxed mode, calls are allowed => If any stream in current/concurrent thread is capturing in global mode, calls are not allowed => If any stream in current thread is capturing in ThreadLocal mode, calls are not allowed For Stream that is currently under capture => calls are not allowed => Any call that is not allowed during capture invalidates the capture sequence => It is invalid to call synchronous APIs during capture. Synchronous APIs, such as hipMemcpy(), enqueue work to the legacy stream and synchronize it before returning. Change-Id: I201c6e63e1a5d93fd416a3b520264c0fdbe31237 [ROCm/clr commit: `19b4660cbb`]	2024-03-28 22:10:31 -04:00
German Andryeyev	4f123e9c10	SWDEV-449922 - Remove a global lock for queues on wait The new logic has a lock for PAL call and doesn't require the lock for queues. Change-Id: I61b67c3c4abd2ede44809de1d6beed756766032e [ROCm/clr commit: `2f3ad43c4a`]	2024-03-28 18:17:46 -04:00
German Andryeyev	31ace540fa	SWDEV-449922 - Remove per queue residency update alwaysResident setting doesn't require per queue residency tracking. Thus, the logic can be skipped to avoid the lock of queues. Change-Id: Ib5cff5b79d3ecb8c2f2eb2565cf069f9a69438b0 [ROCm/clr commit: `95e3958748`]	2024-03-28 16:16:37 -04:00
Jaydeep Patel	21a5b16faa	SWDEV-450636 - Hostcall Listen thread cause seg fault due to main thread is being destroied by app/test's unload libamdhip64.so call. Change-Id: I8d4a8d8b6801d9f6eb745c45adf831597def0cb5 [ROCm/clr commit: `4761ecbb15`]	2024-03-27 00:10:55 -04:00
Saleel Kudchadker	f3aedfbec0	SWDEV-301667 - Create TS for each node recorded in graph - Create a vector to allow multiple TS to be stored in Command. - This would mean we dont wait for entire batch in Accumulate command to finish when we exhaust signals. - Reduce the number of signals created at init to 64. This min value may still need to be tuned but the KFD allows max of 4094 interrupt signals per device. - Store kernel names whenever they are available and not just when profiling. If we dynamically enable profiling like for Torch, a crash can happen if hipGraphInstantiate wasnt included in Torch profile scope beacuse we previously entered kernel names only when profiler is attached. Change-Id: I34e7881a25bbc763f82fdeb3408a8ea58e1ec006 [ROCm/clr commit: `c157bfb202`]	2024-03-26 14:47:24 -04:00
Rakesh Roy	150144a05f	SWDEV-445096 - Fix -O0 crash in OpenCL tests - With https://gerrit-git.amd.com/c/lightning/ec/llvm-project/+/1002628 applied, at -O0 Kernel::dynamicParallelism() returns true but virtual queue isn't created - This causes segfault inside VirtualGPU::submitKernelInternal() when getVQVirtualAddress() is called Change-Id: Ia7af042adad2329e870c142caaac3e8fa886f8b8 [ROCm/clr commit: `d1fff7cea2`]	2024-03-26 11:42:33 -04:00
Jaydeep Patel	c4239ed8e8	SWDEV-446992 - Request can be for bytes OR dptr OR both. Change-Id: Ib479c744b90125b74d99cbf18b7f4b8cf765bf1c [ROCm/clr commit: `09328f45b3`]	2024-03-26 11:10:24 -04:00

1 2 3 4 5 ...

12156 Commity