Граф коммитов

12149 Коммитов

Автор SHA1 Сообщение Дата
German Andryeyev 7eaba0bd33 SWDEV-440746 - Don't set CL_SUBMITTED twice
Change-Id: I9ba34454f7487d6bc0d398b322a147cbac6c6443


[ROCm/clr commit: fd81490bb8]
2024-04-19 17:36:51 -04:00
Satyanvesh Dittakavi 2e58154d80 SWDEV-446610 - Attribute HIP_POINTER_ATTRIBUTE_SYNC_MEMOPS should return the correct value
Change-Id: Ieced2ee61bba28f2d1df96893a661287b0a5c7b7


[ROCm/clr commit: 8f7acbdadb]
2024-04-19 14:40:09 -04:00
Ioannis Assiouras 2f430138c5 SWDEV-451594 - Implement Readback and Avoid HDP Flush workaround for device kernel args
Change-Id: I6d41a089a17f55306e7ff402588a1e831b20a7a7


[ROCm/clr commit: bf74ef4025]
2024-04-19 09:29:20 -04:00
Anusha GodavarthySurya c7022ad6ce SWDEV-455869 - Revert "SWDEV-410751 - Consider null amd::memory is invalid."
This reverts commit 60a3077ed4.

Change-Id: I26c4b3c74b2861afc17f979492d025b59d4388ab


[ROCm/clr commit: e829ef68e4]
2024-04-19 00:54:26 -04:00
kjayapra-amd 71aa6ff3a0 SWDEV-413997 - VMM IPC implementation for Linux.
Change-Id: Icfeb83ca51e96be35abb67a94d6e3e1a1ca5a934


[ROCm/clr commit: 56ebf5157a]
2024-04-18 11:28:13 -04:00
Anusha GodavarthySurya b77908a819 SWDEV-450053 - Handle MemcpyNodeSetParamsTo/FromSymbol negative parameters
For all windows allocation on SVM memory tagged with flag ROCCLR_MEM_INTERPROCESS.
hipHostMalloc validation is based on flag. So remove ROCCLR_MEM_INTERPROCESS before check.

Change-Id: I823bbf228d9a4a9acb4abffc01ac6b3f544c6e12


[ROCm/clr commit: 8179fa98a2]
2024-04-18 05:39:35 -04:00
Jaydeep Patel 3dae883e73 SWDEV-453535 - Capture hipMemset3DAsync.
Change-Id: I517c2557573db258b3e3e353f02f6a56652b0fde


[ROCm/clr commit: 12e0bdcd32]
2024-04-18 00:05:45 -04:00
Jaydeep Patel 24bb38acb8 SWDEV-455346 - End wait if HostcallListener terminates.
Change-Id: I21ec8eadb189147c579ec65acf68de40d604686b


[ROCm/clr commit: 8942939fac]
2024-04-18 00:04:00 -04:00
German Andryeyev 562f3ef098 SWDEV-440746 - Fix the hostcall buffer creation
Avoid a deadlock on the host call buffer creation. Since the buffer will be
allocated in the queue thread, then use direct device memory allocation
skipping the global context lock.

Change-Id: I09b55ee03bb42ab5d320c152b52a8c842c5fdcc1


[ROCm/clr commit: 62559a6e5a]
2024-04-17 12:37:23 -04:00
sdashmiz d2b2b2c5fa SWDEV-441603 - Correct dst device
Signed-off-by: sdashmiz <shadi.dashmiz@amd.com>
Change-Id: Ie60aa598dd73df66cdf02c1d96daf2dfccba7a59


[ROCm/clr commit: d511e57257]
2024-04-17 09:21:06 -04:00
Jatin Chaudhary 5a7b0c50ad SWDEV-379007 - fix bool check for fp8_fnuz
for fnuz nums zero val is 0x00, -0 i.e. 0x80 would be a NaN

Change-Id: Ibdc4fb4b9fb307b5952434f08d45a8ddd6262db8


[ROCm/clr commit: d7b0d78fad]
2024-04-17 05:31:21 -04:00
Sameer Sahasrabuddhe c44b565943 SWDEV-454959 - ignore the upper half of the mask in wave32 mode
Change-Id: If027dd8cbe5cbe142fff353cb72c16f08e9aea8e


[ROCm/clr commit: 03562a2547]
2024-04-17 10:12:57 +05:30
pghafari e3266682d0 SWDEV-455699 - removing HW_REG_TRAPSTS for gfx12
Change-Id: I5f8b030eefdb37d3d51da3e135e5aa0f18ad9018


[ROCm/clr commit: 5ddca5854c]
2024-04-16 19:46:21 -04:00
Jatin Chaudhary 80f02a1534 SWDEV-379007 - use avx instruction for bf16 cvt
AMD CPUs have had avx512_bf16 support for quite some time now (from
consumer Ryzen 7000 series to enterprise grade CPUs). This
patch should allow users to use the hardware bf16 unit when running the
__host__ variants of the function. This can be enabled via `hipcc ...
-mavx512vl -mavx512bf16`.

Change-Id: I67c377afc95ddfe8d45a048dce078a247d4a1878


[ROCm/clr commit: 49349f168c]
2024-04-16 18:35:08 -04:00
German Andryeyev 7f195e2996 SWDEV-444670 - Enable teardown class
Force implicit runtime teardown with a global destructor.

Change-Id: Iabe63dedf5b94fefc98668585c45a61607120669


[ROCm/clr commit: c95a75a2bf]
2024-04-16 12:00:06 -04:00
kjayapra-amd 854cbe1d24 SWDEV-422580 - Adding back the pcie.function to PCI address string in hipGetDevicePCIBusId.
Change-Id: I932724cc872d7ae2643ce6ac2924901cb49cd7ad


[ROCm/clr commit: a1e0970d6d]
2024-04-16 07:28:48 -04:00
Jatin Chaudhary 5e42260071 SWDEV-379007 initial implementation of fp8 header
Change-Id: Id9a5a85641882961e4d860a815217c641e6f3387


[ROCm/clr commit: ca07f59fb1]
2024-04-16 05:37:59 -04:00
Sourabh Betigeri 26b7b7b951 SWDEV-453577 - Fixes to account for right CU count based on WGP or CU mode
Change-Id: Ib9739f9917bc6ff69cc76f444d909311922ebc1e


[ROCm/clr commit: fcfe2ec88b]
2024-04-15 11:53:43 -04:00
kjayapra-amd 867769b0ec SWDEV-413997 - Fixing alignment validation check for power of 2 instead of granularity factor.
Change-Id: I1e0db6e0628c09d26850e5a0339e2a4660442db8


[ROCm/clr commit: 00ddc3e284]
2024-04-15 09:45:29 -04:00
kjayapra-amd 79b3fc1fb8 SWDEV-413997 - Read Access can be valid now that ROCr takes care of access.
Change-Id: Iecda74ca0207c95d3fbed8b4e15c8c4c5895d939


[ROCm/clr commit: 815e450cfd]
2024-04-15 06:00:14 -04:00
Rakesh Roy f7dc86bdf4 SWDEV-453180 - Add UUID support for HIP_VISIBLE_DEVICES on Linux
- UUID is Ascii string with a maximum of 21 chars which uniquely identifies a GPU
- Convert set UUID in HIP_VISIBLE_DEVICES to device index internally
- Then use existing device index logic for HIP_VISIBLE_DEVICES

Change-Id: I8cab4fe42459f8209b97f909300789e6e687b9ac


[ROCm/clr commit: 52db98edd9]
2024-04-13 22:07:19 -04:00
kjayapra-amd 765e6f5d2f SWDEV-413997 - Fixing multiple device cases.
Change-Id: I10ad3fbfca887e92cd81f68392fa1acf753cbd2b


[ROCm/clr commit: d52d16c8e6]
2024-04-13 06:14:03 -04:00
German Andryeyev 27dec1c62d SWDEV-455254 - Reduce blit kernels signature
Remove offset from blit kernels, since it can be applied in setup.

Change-Id: I06b585068d68a0ee8e125ddf46a36fccb372f30d


[ROCm/clr commit: 7de7da4016]
2024-04-12 14:45:55 -04:00
taosang2 cbbf49f5e7 SWDEV-424956 - Fix half vector printf issue
Refactor PrintfDbg::outputArgument() to remove potential risk.
Fix half vector printf issue on all devices.
Fix FEAT-56794 as well.

Change-Id: Iae39359d2128588def2e43d77fe58e868b8e71ff


[ROCm/clr commit: 35c80dd482]
2024-04-12 14:25:44 -04:00
Jaydeep Patel b855d3f1ed SWDEV-436754 - Use glbctx instead so that ref count increments for multi devices and chunk decommit gets delayed.
Change-Id: Ia4b0d5fbfa8f198776e52d14de8b22c6942f740d


[ROCm/clr commit: d52168b46d]
2024-04-12 00:04:34 -04:00
German Andryeyev f29d608ca3 SWDEV-455254 - Add kernel arg optimization
Add kernel arguments optimization into blit path.
Enabled by default on MI300.

Change-Id: I2694a81b90d48ad07d86dfe4c0c64fe187bada8e


[ROCm/clr commit: f0c7ecf617]
2024-04-10 18:08:37 -04:00
kjayapra-amd ed9c629ad6 SWDEV-446298 - Adding error code to the logs on p2p hsa api failure.
Change-Id: Ic41b1ad1b64cca0e31986337a83a5146d52a7328


[ROCm/clr commit: 2b8634bada]
2024-04-10 06:00:00 -04:00
Jatin Chaudhary 565f915ebe SWDEV-379007 - add __hip_bfloat16_raw types
This also brings bfloat16 implementation closer to CUDA's.

Change-Id: I23f381141faacd6537923ae9b88ada4d661db496


[ROCm/clr commit: 481912a1fd]
2024-04-09 05:32:13 -04:00
Saleel Kudchadker 4285981222 SWDEV-301667 - Fix SDMA mask reuse
If we are using the mask returned by getLastUsedSdmaEngine() then we
need to apply the SDMA Read/Write mask to it before using with HSA
copy_on_engine API.

Change-Id: I6e5dc6c187eeb3c61ee159e9d2a0fa7b4737c06e


[ROCm/clr commit: 3f0bcf7834]
2024-04-08 15:42:52 -04:00
Sourabh Betigeri 7cc5dd56ab SWDEV-451964 - Limit gpu single allocation percentage for gfx940 only
Change-Id: Iadcdadd734e7aeeb23742e426353defa972d3ad5


[ROCm/clr commit: dbac2976e4]
2024-04-05 09:43:42 -04:00
Ioannis Assiouras 78008c05c5 SWDEV-453301 - Remove the option to write multiple packets in dispatchGenericAqlPacket
Dispatching multiple packets with ring the doorbell once is not supported by the lower layers

Change-Id: I7665a2dcdd4ef9e47dadfe410180fed64c5a4ee0


[ROCm/clr commit: d7f352dbed]
2024-04-05 05:28:10 -04:00
Rakesh Roy 9630492a1d SWDEV-450361 - Add nullptr validation for waitStream
- Application is passing null for parameter stream in API hipStreamWaitEvent
- When event stream isn't capturing and event is not recorded, causes segfault because we are accessing deviceId() from waitStream

Change-Id: I8b87ffd6f234677f68b66dcb7ef44b2ff04a7c91


[ROCm/clr commit: 880f1f0049]
2024-04-04 02:07:18 -04:00
cadolphe e89ea8577d SWDEV-446726 - Disable large bar for 32 bit windows
When large bar is enabled, persistent memory leads to overallocation for 32 bit architecture.

Change-Id: Iae39359d8128588de02e42d77fe58e868b8e71fd


[ROCm/clr commit: bc80802c1a]
2024-04-03 15:36:41 -04:00
cadolphe 02330a5324 SWDEV-443537 - fix make build warning message
Add cltrace compile definition for CL_TARGET_OPENCL_VERSION to OpenCL 2.2

Change-Id: Ie868ab0a6e86951afc6d07da58be942c3b736d15


[ROCm/clr commit: f7b1398361]
2024-04-02 16:42:01 -04:00
cadolphe 8f185a24d9 SWDEV-451687 - Fix alloc message values in AMD_LOG_LEVEL for 32 bit
Change-Id: Icbe67024297c92bf59139b6a2ccd2ba3674f60b1


[ROCm/clr commit: 411960a131]
2024-04-01 13:32:20 -04:00
Anusha GodavarthySurya ff29b47bb8 SWDEV-452787 - correct hipDrvGraphAddMemcpyNode check
Change-Id: Id58f982edd4f17d675f7a0f61a9b4dea0baebd9b


[ROCm/clr commit: ea4f09e8c0]
2024-03-29 00:56:12 -04:00
Anusha GodavarthySurya ae296c8fad SWDEV-443567 - SWDEV-436126 - Fix Prohibited and Unhandled Operations during capture
=> hipDeviceSynchronize is not allowed during capture.
=> hipEventSynchronize during capture should return hipErrorCapturedEvent error
=> hipEventQuery during capture should return hipErrorCapturedEvent error
hipStreamSynchronize, hipEventSynchronize, hipStreamWaitEvent, hipStreamQuery
For Side Stream(Stream that is not currently under capture):
=> If current thread is capturing in relaxed mode, calls are allowed
=> If any stream in current/concurrent thread is capturing in global mode, calls are not allowed
=> If any stream in current thread is capturing in ThreadLocal mode, calls are not allowed
For Stream that is currently under capture
=> calls are not allowed
=> Any call that is not allowed during capture invalidates the capture sequence
=> It is invalid to call synchronous APIs during capture. Synchronous APIs,
such as hipMemcpy(), enqueue work to the legacy stream and synchronize it before returning.

Change-Id: I201c6e63e1a5d93fd416a3b520264c0fdbe31237


[ROCm/clr commit: 19b4660cbb]
2024-03-28 22:10:31 -04:00
German Andryeyev 4f123e9c10 SWDEV-449922 - Remove a global lock for queues on wait
The new logic has a lock for PAL call and doesn't require the lock for queues.

Change-Id: I61b67c3c4abd2ede44809de1d6beed756766032e


[ROCm/clr commit: 2f3ad43c4a]
2024-03-28 18:17:46 -04:00
German Andryeyev 31ace540fa SWDEV-449922 - Remove per queue residency update
alwaysResident setting doesn't require per queue residency tracking.
Thus, the logic can be skipped to avoid the lock of queues.

Change-Id: Ib5cff5b79d3ecb8c2f2eb2565cf069f9a69438b0


[ROCm/clr commit: 95e3958748]
2024-03-28 16:16:37 -04:00
Jaydeep Patel 21a5b16faa SWDEV-450636 - Hostcall Listen thread cause seg fault due to main thread is being destroied by app/test's unload libamdhip64.so call.
Change-Id: I8d4a8d8b6801d9f6eb745c45adf831597def0cb5


[ROCm/clr commit: 4761ecbb15]
2024-03-27 00:10:55 -04:00
Saleel Kudchadker f3aedfbec0 SWDEV-301667 - Create TS for each node recorded in graph
- Create a vector to allow multiple TS to be stored in Command.
- This would mean we dont wait for entire batch in Accumulate command
to finish when we exhaust signals.
- Reduce the number of signals created at init to 64. This min value
may still need to be tuned but the KFD allows max of 4094 interrupt
signals per device.
- Store kernel names whenever they are available and not just when
profiling. If we dynamically enable profiling like for Torch, a crash
can happen if hipGraphInstantiate wasnt included in Torch profile scope
beacuse we previously entered kernel names only when profiler is
attached.

Change-Id: I34e7881a25bbc763f82fdeb3408a8ea58e1ec006


[ROCm/clr commit: c157bfb202]
2024-03-26 14:47:24 -04:00
Rakesh Roy 150144a05f SWDEV-445096 - Fix -O0 crash in OpenCL tests
- With https://gerrit-git.amd.com/c/lightning/ec/llvm-project/+/1002628 applied, at -O0 Kernel::dynamicParallelism() returns true but virtual queue isn't created
- This causes segfault inside VirtualGPU::submitKernelInternal() when getVQVirtualAddress() is called

Change-Id: Ia7af042adad2329e870c142caaac3e8fa886f8b8


[ROCm/clr commit: d1fff7cea2]
2024-03-26 11:42:33 -04:00
Jaydeep Patel c4239ed8e8 SWDEV-446992 - Request can be for bytes OR dptr OR both.
Change-Id: Ib479c744b90125b74d99cbf18b7f4b8cf765bf1c


[ROCm/clr commit: 09328f45b3]
2024-03-26 11:10:24 -04:00
kjayapra-amd 0f8a9567e6 SWDEV-413997 - Save hsa_handle as ptr in hipMemCreate path.
Change-Id: Ica32017ef7b00326dfb6d1f604e126d40ad5b786


[ROCm/clr commit: 5cbd74b554]
2024-03-26 10:24:29 -04:00
kjayapra-amd a17f9d2d75 SWDEV-448288 - Remove references to deprecated llvm references.
Change-Id: I7b58177c41dc0c6c59813977cb90e65a6cb3be72


[ROCm/clr commit: 5b28e386f6]
2024-03-26 10:23:18 -04:00
Anusha GodavarthySurya cf03256284 SWDEV-301667 - Disable HostBlit copy for HIP correct if check
Change-Id: I33d1359d5e4c871f63350d8300f726e039664d86


[ROCm/clr commit: 7f84df9f74]
2024-03-26 02:18:51 -04:00
German Andryeyev e46ab0bff5 SWDEV-353281 - Change pool type for graphs
Under ROCr physical allocations don't have initial VA and require extra
flag in ROCclr. Add an option to have a mempool of physical allocations.

Change-Id: I4d062fe0dd8113d4eaf6e8b51749ed56d8701d1e


[ROCm/clr commit: f296159f62]
2024-03-25 10:21:05 -04:00
Ioannis Assiouras 50ecfb39c9 SWDEV-451736 - Revert "SWDEV-444670 - Register the Runtime::tearDown function to be called at exit"
This reverts commit 8392200773.

Change-Id: Ib9cb1cc0c3903bfba56c9a5d05ae8afe96be583a


[ROCm/clr commit: 51926b6b6b]
2024-03-23 07:05:02 -04:00
Julia Jiang 08e18077cc Revert "SWDEV-444098 - remove rocm-ocl-icd packaging"
This reverts commit 6c46d696ae.

Reason for revert: due to windows staging branch using Opencl-icd-loader master

Change-Id: I9cca7564a21de1733665a34da6f0322aa3b886e7


[ROCm/clr commit: f44ca70bb7]
2024-03-22 16:59:23 -04:00
Jaydeep Patel 1bbbff9dd9 SWDEV-452299 - Pass dst pitch while capturing hipMemcpyParam2DAsync & elementSize should be 1 as width is in bytes while capturing hipMemset2DAsync.
Change-Id: I8f9122a30cba0a07c097dfd7609432090caab142


[ROCm/clr commit: 0be92b8f09]
2024-03-21 12:49:34 -04:00