Commit graph

13402 commits

Author SHA1 Message Date
Li, Todd tiantuo 04dc7ca51f SWDEV-508980 - [6.4 Preview] fix hipDeviceSetCacheConfig during stream capture
Change-Id: I8e89774a8163fdc120155f742606ee2c0aa7103b


[ROCm/clr commit: 9faaf20aae]
2025-02-22 01:05:28 -05:00
Li, Todd tiantuo 82f78ce187 SWDEV-510271 - [6.4 Preview] fix hipCreateSurfaceObject & hipDestroySurfaceObject during stream capture
Change-Id: I19e149549c271d847f52b72e04cb2427ca194b24


[ROCm/clr commit: c07468e53c]
2025-02-22 01:04:35 -05:00
Ioannis Assiouras 8d29fb9e6d SWDEV-509788 - Use stream memory operation in hipStreamWaitEvent
This change removes the stream callback from hipStreamWaitEvent and
uses a stream memory wait operation instead. This allows the
hipStreamWaitEvent to be non-blocking on the host.

Change-Id: Ie5530febda5a5bcb5daa0db8a01249d6b137fd43


[ROCm/clr commit: 721c5800ca]
2025-02-21 11:46:09 -05:00
Julia Jiang 1495cc77eb SWDEV-513294 - fix regression on SVM sub-test failure in Conformance
Change-Id: Ic2449dd34a9cd2b623d5f8fbe89fd042566a56e3


[ROCm/clr commit: b7eaec76fc]
2025-02-20 15:40:23 -05:00
kjayapra-amd 010253430f SWDEV-516303 - Remove SDMA retainer logic to select the engine.
Change-Id: I818129444131825cdb87e06cb495afa3e5cdb683


[ROCm/clr commit: 1f583a6870]
2025-02-20 11:34:38 -05:00
German Andryeyev a7f3ad7867 SWDEV-515356 - Make the round-robin queue selection
- Add custom compare to the map of queues, which will help with
 the round-robin selection

Change-Id: Ie67a820bfb1a5b484a1b3edced967eed94228bb8


[ROCm/clr commit: ba8e740be4]
2025-02-20 11:09:54 -05:00
German Andryeyev f9d9b2c441 SWDEV-497841 - Add virtual memory heap
Add initial implementation of virtual memory heap with
dynamic virtual memory mapping support for memory pools.
DEBUG_HIP_MEM_POOL_VMHEAP controls the new method.

Change-Id: I8dc5be2e0f34ab472f1800f43bb6243639a5e500


[ROCm/clr commit: 296dce5570]
2025-02-20 10:55:49 -05:00
German Andryeyev 6f2a603277 SWDEV-497619 - Allocate extra space in CB
Compute doesn't support IB chaining, but RGP may collect
perf counters, which require more space in CB.
Increase CB size if RGP is enabled.

Change-Id: Iaa0a620ead8541a679b0dfe5e5711af5afdba545


[ROCm/clr commit: 63cf3057ba]
2025-02-20 10:40:09 -05:00
Jimbo Xie 8a42a52d0f SWDEV-477219 - implement hipEventRecordWithFlags
Change-Id: Icf07e85fc8c15f921f6e7c9fbd31dd3856dc988b


[ROCm/clr commit: 7a4a22d454]
2025-02-19 13:53:00 -05:00
Jatin Chaudhary 16f9dbff6c SWDEV-511239 - make fp8 standalone host compilable
- Use correct header in device_library_decl
- use std:: instead of __hip_internal:: for host compilation
- hide device specific stuff behind __clang__ and __HIP__ check

Change-Id: I2f3647e00555ed0e79f9954a459c41394c3cd49b


[ROCm/clr commit: c3f49c8788]
2025-02-18 19:07:45 -05:00
Jatin Chaudhary 508d043176 SWDEV-515255 - do not free bitcode object before code gen
- Also add a cache, which allows compiled code objects to be reused
  instead of compiling again. This should improve performance on
  multi-GPU systems.

Change-Id: Ib135d616c076b77f8aaf28de275d408b38021d89


[ROCm/clr commit: 0391aec14a]
2025-02-18 12:39:31 -05:00
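The code object cache mentioned in commit 508d043176 can be pictured with a small C++ sketch. It is not the clr implementation: `CodeObjectCache`, the `Compiler` callback, and the choice of (bitcode, ISA name) as the cache key are all assumptions made for illustration. The idea shown is only that several devices with the same ISA can share one compiled object instead of compiling it again.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Stand-in for a compiled binary (hypothetical type, not a clr type).
using CodeObject = std::vector<char>;

// Hypothetical cache: one compiled object per (bitcode, ISA) pair.
class CodeObjectCache {
 public:
  using Compiler =
      std::function<CodeObject(const std::string& bitcode, const std::string& isa)>;

  explicit CodeObjectCache(Compiler compile) : compile_(std::move(compile)) {
    assert(compile_);  // a compile callback is required
  }

  // Returns the cached object, compiling only on the first request for a key.
  std::shared_ptr<const CodeObject> get(const std::string& bitcode,
                                        const std::string& isa) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto key = std::make_pair(bitcode, isa);
    auto it = cache_.find(key);
    if (it != cache_.end()) return it->second;  // reuse, no recompilation
    std::shared_ptr<const CodeObject> obj =
        std::make_shared<CodeObject>(compile_(bitcode, isa));
    cache_.emplace(std::move(key), obj);
    return obj;
  }

 private:
  Compiler compile_;
  std::mutex mutex_;
  std::map<std::pair<std::string, std::string>,
           std::shared_ptr<const CodeObject>> cache_;
};
```

In this sketch the lock is held while compiling, which keeps the code short but serializes compilation; a real runtime would likely want finer-grained locking.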
Tim Gu 8fcbc2acfe SWDEV-502248 - Parse file path with space characters
Signed-off-by: Tim Gu <Tim.Gu@amd.com>
Change-Id: I67fb9cf5559c9c06f24627a1b25fec3e89b2d1cf


[ROCm/clr commit: 84a867fb73]
2025-02-18 10:31:21 -05:00
agunashe 52a1f5dbf7 SWDEV-507967 - Deprecate gfx9, gfx8, gfx7 on Windows
PAL_CLIENT_INTERFACE_MAJOR_VERSION from 872 --> 910

Change-Id: I03dfa2924ccdae4c2f13f09d5f34ee58298e1343


[ROCm/clr commit: ea804e16f8]
2025-02-17 02:59:41 -05:00
Anusha GodavarthySurya c6bea0ea59 SWDEV-469422 - hipgraph remove static typecast to parent
Change-Id: I339250cfd26a7c04543722a82301acbb41c7d5d7


[ROCm/clr commit: 199e464402]
2025-02-14 11:09:32 -05:00
David Salinas e2da5772ff Deprecate roc-obj* tooling
- make Perl packages RECOMMENDS/SUGGESTS for hip-dev
  - update CHANGE log

  SWDEV-511528 - TECH Remove ROCM Perl dependency - hip-dev
  SWDEV-333176 - Shift functionality of 'roc-obj-*' perl scripts into llvm-objdump

Change-Id: Iec3ba245848781f95c825f0d37aff4b4fb54f5e4


[ROCm/clr commit: c942833b34]
2025-02-13 11:42:57 -05:00
Vladana Stojiljkovic 7078aab436 SWDEV-510059 - Format CU mask properly
Change-Id: I80e94b4f3ea25f6988fc06d83aeb398e81ccddd1


[ROCm/clr commit: 061c5d877f]
2025-02-13 11:02:56 -05:00
harkgill-amd cac2e94141 Specify C++ language mode for warning post amdgpu-arch failure
Change-Id: I55bf6734a1e8dc06dd0a1ee12086b7667332206f


[ROCm/clr commit: 935b538261]
2025-02-13 09:40:13 -05:00
Aidan Belton-Schure 4b4a35b86b SWDEV-508279 - Improve HIP event profiling
There are 2 functional changes to this patch:
* Use GPU timing for internal markers for HIP.
* Measure CPU time closer to GPU timer, to reduce delta between GPU/CPU timestamp measurements.

There are some smaller non-functional updates:
* waifForFence -> waitForFence typo
* Remove unused drmProfiling

Change-Id: I4c5fa600a842ab60e454888779edcac8449a902a


[ROCm/clr commit: 179801a750]
2025-02-13 04:15:40 -05:00
Jatin Chaudhary 5725b99619 SWDEV-474146 - use __bf16 to do operations
Change-Id: I568dfa97238fd760f5362a8e560c33402f96cff3


[ROCm/clr commit: c23913f6e7]
2025-02-12 07:03:05 -05:00
Jatin Chaudhary db2a3214c4 SWDEV-504769 - Allow hipEvent_t to record on hipStreamLegacy
Change-Id: Ib86412255adad172598620ea81214e5eb56020ea


[ROCm/clr commit: e560d94d2c]
2025-02-12 07:02:35 -05:00
Ioannis Assiouras a349b23474 SWDEV-514686 - Fixed hipEventSynchronize/hipStreamWaitEvent for IPC events
Resolved an issue where hipEventSynchronize and hipStreamWaitEvent APIs
did not function correctly for events created with the hipEventInterprocess flag.
The bug caused the event to be incorrectly marked as "recorded,"
leading to these APIs failing to wait for the event as expected.

Change-Id: Ic9fdfaab2393beb93d6e0b83661545e902a63499


[ROCm/clr commit: 1cdfbfd270]
2025-02-11 18:43:06 -05:00
kjayapra-amd 1f648c7d94 SWDEV-511672 - Special case the Remote USWC memory usage for HIP, if the alloc size is large.
Change-Id: I524c1402b249cedfd58b56f494caa2ac057e1623


[ROCm/clr commit: cf6aabb823]
2025-02-11 06:42:18 -05:00
Saleel Kudchadker 71e1a0b10d SWDEV-504494 - Further copy improvements
- Fix regression for D2H pinned copies which adds systemscope release.
- Skip cpu wait for D2H unpinned copies as we can pass the signal of the
  barrier to rocr copy.
- Fix an old bug in sdmaEngineRetainCount_ logic
- Improve logging

Change-Id: If074bddb05564b15949b0d5f9bf12acd3692174e


[ROCm/clr commit: 4c95ee5e1e]
2025-02-11 00:55:52 -05:00
victzhan 7cd780c1cb SWDEV-485042 - Remove -I option passed into comgr when file type is not FILE_TYPE_ASM_TEXT
Change-Id: If8e469f881651f7b3dae364e8182ef1ba6f3a0d1


[ROCm/clr commit: ca35d93672]
2025-02-10 11:47:04 -05:00
Ioannis Assiouras eb77b9aba6 SWDEV-508435 - Use the stream of the src/dst image memory object in A2H and H2A commands
Change-Id: I9b776a54760a4633d5f84cf7b467d2d3ba8cbdde


[ROCm/clr commit: a8edb8d467]
2025-02-07 13:38:31 -05:00
taosang2 f84a8e62d3 SWDEV-446880 - Make ocltst MemoryInfo pass in EMU
Make ocltst -m tests/ocltst/liboclruntime.so -t OCLMemoryInfo
pass in emu where GPU memory is very big.

Cherry pick
  https://gerrit-git.amd.com/c/compute/ec/clr/+/1014858

Change-Id: I0228c5e87ce7c366983fd4af71c25e7f8161c2c7


[ROCm/clr commit: de83d7a6ae]
2025-02-07 09:16:24 -05:00
Satyanvesh Dittakavi 8daab29f7f SWDEV-477584 - hipExtGetLastError should return the immediate previous API error
hipGetLastError should return the error by any of the previous APIs
in the same host thread to match the CUDA behavior, whereas
hipExtGetLastError will return the error by the immediate previous API.
This Ext API was added earlier to facilitate the existing HIP apps which
are following the current behavior of hipGetLastError

Change-Id: I61e95b1fc136cc761e2434e02187b7ed2598b733


[ROCm/clr commit: 4b443f8133]
2025-02-06 23:30:48 -05:00
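The difference between the two error queries described in commit 8daab29f7f can be shown with a toy model in plain C++. This is not HIP code and not the clr implementation: `ErrorTracker`, `Status`, and `record()` are hypothetical names, the model does not treat the query functions themselves as API calls, and resetting the stored value on read is an assumption of the model. A real runtime would keep this state per host thread.

```cpp
#include <cassert>

// Hypothetical status codes for the toy model.
enum class Status { Success, InvalidValue, OutOfMemory };

// Toy model of the two error queries; one instance stands for one host thread.
class ErrorTracker {
 public:
  // Every modeled API call reports its result here.
  Status record(Status s) {
    immediate_ = s;                       // always overwritten
    if (s != Status::Success) last_ = s;  // kept until it is read
    return s;
  }

  // Models hipGetLastError: error from any earlier call; reading clears it.
  Status getLastError() {
    Status s = last_;
    last_ = Status::Success;
    return s;
  }

  // Models hipExtGetLastError: result of the immediately preceding call only.
  Status extGetLastError() {
    Status s = immediate_;
    immediate_ = Status::Success;
    return s;
  }

 private:
  Status last_ = Status::Success;
  Status immediate_ = Status::Success;
};

// Usage: a failing call followed by a successful one.
inline void demo() {
  ErrorTracker t;
  t.record(Status::InvalidValue);  // earlier call fails
  t.record(Status::Success);       // most recent call succeeds
  assert(t.extGetLastError() == Status::Success);    // immediate previous call
  assert(t.getLastError() == Status::InvalidValue);  // any previous call
  assert(t.getLastError() == Status::Success);       // the read cleared it
}
```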
Ioannis Assiouras 6a00aa8d61 SWDEV-508435 - Added a fix for double free of hsaImageObject
Change-Id: I9397f7c9dbbad7c249b359155df312cb920eba6c


[ROCm/clr commit: d05ecea253]
2025-02-05 22:21:24 +00:00
Ioannis Assiouras c0b728fcad SWDEV-513323 - Fix for BatchMemOp on devices with no image support
BatchMemop should be positioned before the image support kernels
because the total number of kernels is determined by BlitLinearTotal,
when there is no image support on the device.

Change-Id: I8e53caf744ba54259ac04bad1762eef21806f3f2


[ROCm/clr commit: 3e01da3dac]
2025-02-05 04:45:22 -05:00
Anusha GodavarthySurya 5535f15104 SWDEV-469422 - hipGraph move to classes from structs
Change-Id: I0f9c8ef1161c0c92ebe0cce6844b2feacfee83f5


[ROCm/clr commit: 32e5b00c30]
2025-02-05 00:33:41 -05:00
taosang2 27e87ccca6 SWDEV-513458 - Add gfx950 target ID
Add gfx950 target ID

Cherry-picked
https://gerrit-git.amd.com/c/compute/ec/clr/+/997678
https://gerrit-git.amd.com/c/compute/ec/clr/+/1063519

Change-Id: I0228c5e87ceec366983fd4afb1c25e7f8161c2c2


[ROCm/clr commit: 29cc394510]
2025-02-04 18:30:23 -05:00
Steven Chung 5513df58eb SWDEV-496674 - Convert non-templated typedefs to templates for consistent mangling
Change-Id: I952d15f20afc85c0118403f82e75360197049ef5


[ROCm/clr commit: 782976f5c2]
2025-02-04 16:37:00 -05:00
kjayapra-amd 892d7bb064 SWDEV-488290 - Remove Stream to Engine logic and rely on engine query status HSA API.
Change-Id: I469ab6679360c8ee8d4ee515678a8aa8d4578ebf


[ROCm/clr commit: cc62a82347]
2025-02-04 13:00:16 -05:00
Ajay cb281e23cd SWDEV-485453 - add hipcc dependency to hip-dev
Change-Id: I607fc7c3b3a2137835cb2fb8eeb23d3daed51c91


[ROCm/clr commit: 25572c2efc]
2025-02-04 11:29:59 -05:00
Rahul Manocha 4cbfbe2112 SWDEV-511855 - Fix hipMemcpyPeer to support stream capture checks
Change-Id: I7797f069b3ed4240b6785e82da7494a97b4843c6


[ROCm/clr commit: 81051f3520]
2025-02-04 11:22:35 -05:00
Aidan Belton-Schure 33b4f178c0 SWDEV-443561 - Add tools dispatch table
Change-Id: I3445554e486ab7b94592571f52c1530cb918d021


[ROCm/clr commit: 152cee3737]
2025-02-04 04:57:38 -05:00
Juan Manuel Martinez Caamaño 5356f13902 SWDEV-132637: Remove OpenCL cl_khr_depth_images workaround that is not needed anymore
The cl_khr_depth_images associated macro definition is defined twice in
the compiler: in opencl-c.h and automatically by the compiler deduced
from the cl-ext list. These two co-exist and there is no need to remove
cl_khr_depth_images from the cl-ext list.

If we remove cl_khr_depth_images from the cl-ext list, and we do not
include opencl-c.h the macro is not defined.

This fixes conformance test ./test_compiler compiler_defines_for_extensions
when using Comgr with -include opencl-c-base.h -fdeclare-opencl-builtins
without including opencl-c.h.

Before we got the error `ERROR: Supported extension cl_khr_depth_images
not defined in kernel`

This change is needed to eventually get rid of the opencl-c.pch that is embedded in comgr, and that makes implementing a compilation cache in comgr hard.

Change-Id: I76497874ebe7163966420d4ac23a0788b93a36fd


[ROCm/clr commit: 8c9e6d0fa5]
2025-02-04 03:14:31 -05:00
Jacob Lambert 2bd527c676 SWDEV-387063 - Use clang default for C++ version
Instead of enforcing c++14 here, we can instead use the current
clang default

Change-Id: Ib0a178a53c1377f2910edf6fab82b2bac6567ac7


[ROCm/clr commit: 33e48b9629]
2025-02-03 11:07:52 -05:00
Jimbo Xie cc229f251f SWDEV-504383 - Cleaned up kForcedTimeout10us and removed IsHwEventReadyForcedWait
Also removed active_wait_timeout

Change-Id: I7a429f003c09a4df267b5c0983050704260094c6


[ROCm/clr commit: 4872b420c9]
2025-01-31 14:40:18 -05:00
taosang2 40df900647 SWDEV-501963 - Add missing codes for gfx950
Cherry-pick https://gerrit-git.amd.com/c/compute/ec/clr/+/1162997

Change-Id: I6b3c6bf55c61cffd43cd6f17b75998f751b75723


[ROCm/clr commit: 32daa8f384]
2025-01-31 14:34:49 -05:00
taosang2 af99b5d52d FEAT-56803 - Fix ocltst slow issues
Fix very slow issues of two ocltst test cases.

Cherry pick
 https://gerrit-git.amd.com/c/compute/ec/clr/+/1009383

Change-Id: I0228c5e87cdec366993fd4afb1c25e7f8161c2c5


[ROCm/clr commit: 4ec274c7d4]
2025-01-31 10:45:43 -05:00
Anusha GodavarthySurya 837f7ca08c SWDEV-489084 - Avoid creating internal stream when graph has single branch
Change-Id: I9371d44481257069bb51c0217a57f97d803589c4


[ROCm/clr commit: b385992f94]
2025-01-31 00:16:57 -05:00
kjayapra-amd 712987ed08 SWDEV-509280 - Combine multiple definitions of callbackQueue into a single function.
Change-Id: Ibbb56136bec2beed71c202d75e8aec9e82640a4e


[ROCm/clr commit: 0324014710]
2025-01-30 15:58:11 -05:00
Jatin Chaudhary f8421ce480 SWDEV-508617 - There is no NaN for E4M3 and FNUZ
Change-Id: I330b041019990231c098073f94d9d40a3c13ba76


[ROCm/clr commit: 1fdbf35d14]
2025-01-30 11:48:34 -05:00
Saleel Kudchadker d0656c944b SWDEV-504494 - Resolve signal dependencies
- Resolve signal dependencies for barrier value packet if there are > 1
  dependent signals. Barrier Value packet accounts for only 1 dep signal
- Better log

Change-Id: Ia506ad5d80b91d598f92e7b539f41756e9b4b64b


[ROCm/clr commit: 2d450e8b06]
2025-01-29 19:49:02 +00:00
Jatin Chaudhary 992b5fd009 SWDEV-507817 - fix the return type of one of the atomicMin variants
Change-Id: I9915eb174d5677e21adbabae5819c9e306338ab3


[ROCm/clr commit: e6fb89190a]
2025-01-29 11:52:19 -05:00
Jimbo Xie 0a30936c67 SWDEV-510869 - add gfx1153 id
Change-Id: I36d39a1db2392990ad9b01d70676c3c986435707


[ROCm/clr commit: 4abedf2a0e]
2025-01-28 18:15:46 -05:00
Saleel Kudchadker 21ae9ef25e SWDEV-508225 - Improve fat binary handling
Change-Id: I78a9951f2f4c4c743c1205b1e40aac215054e27d


[ROCm/clr commit: 08af3eb484]
2025-01-28 14:38:21 -05:00
German Andryeyev ae379965dd SWDEV-459826 - Add a crash dump for a failed queue
The logic can analyze the AQL queue state and
find a failed AQL packet with the kernel's name

Change-Id: I1a478fa2c25462cd07a194784958bdf22454b897


[ROCm/clr commit: ea0b092af8]
2025-01-28 14:27:46 -05:00
Tao Sang 7803594aea SWDEV-458943 - Add fast path in wait()
wait() is redesigned with two paths:
fast path: Use spinlock to wait for notify signal. If the
 signal hasn't been received for some loops, go to slow path.
slow path: Use condition_variable's wait().

Improve monitor wrapper for better performance.

Fix some bugs left from name removing patch.

Change-Id: I893a8353121a25d11e37c8e631caf31cc1fc1f24


[ROCm/clr commit: f2ff56af9c]
2025-01-28 12:19:55 -05:00
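The two-path wait() described in commit 7803594aea can be sketched in standard C++. This is not the clr monitor code: the class name `TwoPhaseEvent`, the `spin_limit` parameter and its default, and the `std::this_thread::yield()` call inside the spin loop are assumptions made for illustration. The sketch only shows the general shape: poll a flag for a bounded number of iterations, then fall back to a condition variable.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical two-phase waiter; names are illustrative, not taken from clr.
class TwoPhaseEvent {
 public:
  // Sets the flag and wakes any thread parked on the slow path.
  void notify() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      signaled_.store(true, std::memory_order_release);
    }
    cv_.notify_all();
  }

  // Returns true if the signal was seen while spinning (fast path),
  // false if the caller went through the condition variable (slow path).
  bool wait(int spin_limit = 1000) {
    assert(spin_limit >= 0);
    // Fast path: poll the flag for a bounded number of loops.
    for (int i = 0; i < spin_limit; ++i) {
      if (signaled_.load(std::memory_order_acquire)) return true;
      std::this_thread::yield();
    }
    // Slow path: block until notify() has set the flag.
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return signaled_.load(std::memory_order_acquire); });
    return false;
  }

 private:
  std::atomic<bool> signaled_{false};
  std::mutex mutex_;
  std::condition_variable cv_;
};
```

Because notify() sets the flag while holding the mutex, a waiter that gives up spinning just before the signal arrives still sees the flag in the predicate check and does not miss the wake-up.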