Commit graph

13390 commits

Author SHA1 Message Date
agunashe ea804e16f8 SWDEV-507967 - Deprecate gfx9, gfx8, gfx7 on Windows
PAL_CLIENT_INTERFACE_MAJOR_VERSION from 872 --> 910

Change-Id: I03dfa2924ccdae4c2f13f09d5f34ee58298e1343
2025-02-17 02:59:41 -05:00
Anusha GodavarthySurya 199e464402 SWDEV-469422 - hipgraph remove static typecast to parent
Change-Id: I339250cfd26a7c04543722a82301acbb41c7d5d7
2025-02-14 11:09:32 -05:00
David Salinas c942833b34 Deprecate roc-obj* tooling
- Make Perl packages RECOMMENDS/SUGGESTS for hip-dev
- Update CHANGELOG

  SWDEV-511528 - TECH Remove ROCM Perl dependency - hip-dev
  SWDEV-333176 - Shift functionality of 'roc-obj-*' perl scripts into llvm-objdump

Change-Id: Iec3ba245848781f95c825f0d37aff4b4fb54f5e4
2025-02-13 11:42:57 -05:00
Vladana Stojiljkovic 061c5d877f SWDEV-510059 - Format CU mask properly
Change-Id: I80e94b4f3ea25f6988fc06d83aeb398e81ccddd1
2025-02-13 11:02:56 -05:00
harkgill-amd 935b538261 Specify C++ language mode for warning post amdgpu-arch failure
Change-Id: I55bf6734a1e8dc06dd0a1ee12086b7667332206f
2025-02-13 09:40:13 -05:00
Aidan Belton-Schure 179801a750 SWDEV-508279 - Improve HIP event profiling
This patch makes two functional changes:
* Use GPU timing for internal HIP markers.
* Measure CPU time closer to the GPU timer read, to reduce the delta between GPU and CPU timestamps.

There are also some smaller non-functional updates:
* Fix typo: waifForFence -> waitForFence
* Remove unused drmProfiling

Change-Id: I4c5fa600a842ab60e454888779edcac8449a902a
2025-02-13 04:15:40 -05:00
Jatin Chaudhary c23913f6e7 SWDEV-474146 - use __bf16 to do operations
Change-Id: I568dfa97238fd760f5362a8e560c33402f96cff3
2025-02-12 07:03:05 -05:00
Jatin Chaudhary e560d94d2c SWDEV-504769 - Allow hipEvent_t to record on hipStreamLegacy
Change-Id: Ib86412255adad172598620ea81214e5eb56020ea
2025-02-12 07:02:35 -05:00
Ioannis Assiouras 1cdfbfd270 SWDEV-514686 - Fixed hipEventSynchronize/hipStreamWaitEvent for IPC events
Resolved an issue where hipEventSynchronize and hipStreamWaitEvent APIs
did not function correctly for events created with the hipEventInterprocess flag.
The bug caused the event to be incorrectly marked as "recorded,"
leading to these APIs failing to wait for the event as expected.

Change-Id: Ic9fdfaab2393beb93d6e0b83661545e902a63499
2025-02-11 18:43:06 -05:00
kjayapra-amd cf6aabb823 SWDEV-511672 - Special case the Remote USWC memory usage for HIP, if the alloc size is large.
Change-Id: I524c1402b249cedfd58b56f494caa2ac057e1623
2025-02-11 06:42:18 -05:00
Saleel Kudchadker 4c95ee5e1e SWDEV-504494 - Further copy improvements
- Fix a regression for D2H pinned copies; the fix adds a system-scope release.
- Skip the CPU wait for D2H unpinned copies, since the barrier's signal can be
  passed to the ROCr copy.
- Fix an old bug in the sdmaEngineRetainCount_ logic.
- Improve logging.

Change-Id: If074bddb05564b15949b0d5f9bf12acd3692174e
2025-02-11 00:55:52 -05:00
victzhan ca35d93672 SWDEV-485042 - Remove -I option passed into comgr when file type is not FILE_TYPE_ASM_TEXT
Change-Id: If8e469f881651f7b3dae364e8182ef1ba6f3a0d1
2025-02-10 11:47:04 -05:00
Ioannis Assiouras a8edb8d467 SWDEV-508435 - Use the stream of the src/dst image memory object in A2H and H2A commands
Change-Id: I9b776a54760a4633d5f84cf7b467d2d3ba8cbdde
2025-02-07 13:38:31 -05:00
taosang2 de83d7a6ae SWDEV-446880 - Make ocltst MemoryInfo pass in EMU
Make ocltst -m tests/ocltst/liboclruntime.so -t OCLMemoryInfo
pass in EMU, where GPU memory is very large.

Cherry-pick
  https://gerrit-git.amd.com/c/compute/ec/clr/+/1014858

Change-Id: I0228c5e87ce7c366983fd4af71c25e7f8161c2c7
2025-02-07 09:16:24 -05:00
Satyanvesh Dittakavi 4b443f8133 SWDEV-477584 - hipExtGetLastError should return the immediate previous API error
hipGetLastError should return the error from any previous API call
in the same host thread, matching CUDA behavior, whereas
hipExtGetLastError returns the error from the immediately preceding API call.
This Ext API was added earlier to support existing HIP applications that
rely on the current behavior of hipGetLastError.

Change-Id: I61e95b1fc136cc761e2434e02187b7ed2598b733
2025-02-06 23:30:48 -05:00
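A minimal C++ model of the two semantics described above, for illustration only. The Err enum, the sketch namespace and every function name are hypothetical stand-ins, not the HIP runtime's internals. It assumes every API call funnels its result through one recording function, and it assumes the Ext variant does not reset its stored value.

```cpp
#include <cassert>

// Hypothetical error codes standing in for hipError_t values.
enum class Err { Success = 0, InvalidValue = 1, OutOfMemory = 2 };

namespace sketch {
// Per-host-thread state: both queries are scoped to the calling thread.
thread_local Err sticky_error = Err::Success;     // last failure from any API call
thread_local Err immediate_error = Err::Success;  // result of the most recent API call

// Every (hypothetical) API call reports its result here before returning.
inline Err record_result(Err result) {
  if (result != Err::Success) sticky_error = result;  // a success does not clear a pending error
  immediate_error = result;
  return result;
}

// hipGetLastError-like: error from any previous call in this thread, then reset.
inline Err get_last_error() {
  Err e = sticky_error;
  sticky_error = Err::Success;
  return e;
}

// hipExtGetLastError-like: only the immediately preceding call's result (assumed not to reset).
inline Err ext_get_last_error() { return immediate_error; }
}  // namespace sketch
```

In this model, a failing call followed by a successful one leaves the Ext query reporting success, while the sticky query still reports the earlier failure exactly once.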
Ioannis Assiouras d05ecea253 SWDEV-508435 - Added a fix for double free of hsaImageObject
Change-Id: I9397f7c9dbbad7c249b359155df312cb920eba6c
2025-02-05 22:21:24 +00:00
Ioannis Assiouras 3e01da3dac SWDEV-513323 - Fix for BatchMemOp on devices with no image support
BatchMemOp must be positioned before the image-support kernels
because, when the device has no image support, the total number of
kernels is determined by BlitLinearTotal.

Change-Id: I8e53caf744ba54259ac04bad1762eef21806f3f2
2025-02-05 04:45:22 -05:00
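A sketch of why the position matters, assuming the blit kernels live in one enum where BlitLinearTotal marks the end of the kernels that need no image support. Only BlitLinearTotal and BatchMemOp come from the commit message; the other enumerators and both helper functions are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical kernel list; only BlitLinearTotal and BatchMemOp come from the commit message.
enum BlitKernel : std::size_t {
  FillBuffer,
  CopyBuffer,
  BatchMemOp,                    // must precede BlitLinearTotal to be built without image support
  BlitLinearTotal,               // count of kernels that need no image support
  CopyImage = BlitLinearTotal,   // image kernels start here
  FillImage,
  BlitTotal
};

// Number of kernels the runtime would build for a device.
constexpr std::size_t kernel_count(bool image_support) {
  return image_support ? BlitTotal : BlitLinearTotal;
}

// A kernel is available only if its index falls inside the built range.
constexpr bool is_available(BlitKernel k, bool image_support) {
  return static_cast<std::size_t>(k) < kernel_count(image_support);
}
```

Had BatchMemOp been placed after the image kernels, its index would fall outside the range built on devices without image support.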
Anusha GodavarthySurya 32e5b00c30 SWDEV-469422 - hipGraph move to classes from structs
Change-Id: I0f9c8ef1161c0c92ebe0cce6844b2feacfee83f5
2025-02-05 00:33:41 -05:00
taosang2 29cc394510 SWDEV-513458 - Add gfx950 target ID

Cherry-picked
https://gerrit-git.amd.com/c/compute/ec/clr/+/997678
https://gerrit-git.amd.com/c/compute/ec/clr/+/1063519

Change-Id: I0228c5e87ceec366983fd4afb1c25e7f8161c2c2
2025-02-04 18:30:23 -05:00
Steven Chung 782976f5c2 SWDEV-496674 - Convert non-templated typedefs to templates for consistent mangling
Change-Id: I952d15f20afc85c0118403f82e75360197049ef5
2025-02-04 16:37:00 -05:00
kjayapra-amd cc62a82347 SWDEV-488290 - Remove Stream to Engine logic and rely on engine query status HSA API.
Change-Id: I469ab6679360c8ee8d4ee515678a8aa8d4578ebf
2025-02-04 13:00:16 -05:00
Ajay 25572c2efc SWDEV-485453 - add hipcc dependency to hip-dev
Change-Id: I607fc7c3b3a2137835cb2fb8eeb23d3daed51c91
2025-02-04 11:29:59 -05:00
Rahul Manocha 81051f3520 SWDEV-511855 - Fix hipMemcpyPeer to support stream capture checks
Change-Id: I7797f069b3ed4240b6785e82da7494a97b4843c6
2025-02-04 11:22:35 -05:00
Aidan Belton-Schure 152cee3737 SWDEV-443561 - Add tools dispatch table
Change-Id: I3445554e486ab7b94592571f52c1530cb918d021
2025-02-04 04:57:38 -05:00
Juan Manuel Martinez Caamaño 8c9e6d0fa5 SWDEV-132637: Remove OpenCL cl_khr_depth_images workaround that is not needed anymore
The macro associated with cl_khr_depth_images is defined in two places:
in opencl-c.h, and automatically by the compiler from the cl-ext list.
These two definitions coexist, so there is no need to remove
cl_khr_depth_images from the cl-ext list.

If we remove cl_khr_depth_images from the cl-ext list and do not
include opencl-c.h, the macro is not defined.

This fixes the conformance test ./test_compiler compiler_defines_for_extensions
when using Comgr with -include opencl-c-base.h -fdeclare-opencl-builtins
without including opencl-c.h.

Before this change we got the error `ERROR: Supported extension cl_khr_depth_images
not defined in kernel`

This change is needed to eventually get rid of the opencl-c.pch embedded in comgr, which makes implementing a compilation cache in comgr hard.

Change-Id: I76497874ebe7163966420d4ac23a0788b93a36fd
2025-02-04 03:14:31 -05:00
Jacob Lambert 33e48b9629 SWDEV-387063 - Use clang default for C++ version
Instead of enforcing C++14 here, use the current
clang default.

Change-Id: Ib0a178a53c1377f2910edf6fab82b2bac6567ac7
2025-02-03 11:07:52 -05:00
Jimbo Xie 4872b420c9 SWDEV-504383 - Cleaned up kForcedTimeout10us and removed IsHwEventReadyForcedWait
Also removed active_wait_timeout

Change-Id: I7a429f003c09a4df267b5c0983050704260094c6
2025-01-31 14:40:18 -05:00
taosang2 32daa8f384 SWDEV-501963 - Add missing codes for gfx950
Cherry-pick https://gerrit-git.amd.com/c/compute/ec/clr/+/1162997

Change-Id: I6b3c6bf55c61cffd43cd6f17b75998f751b75723
2025-01-31 14:34:49 -05:00
taosang2 4ec274c7d4 FEAT-56803 - Fix ocltst slow issues
Fix two ocltst test cases that were very slow.

Cherry-pick
  https://gerrit-git.amd.com/c/compute/ec/clr/+/1009383

Change-Id: I0228c5e87cdec366993fd4afb1c25e7f8161c2c5
2025-01-31 10:45:43 -05:00
Anusha GodavarthySurya b385992f94 SWDEV-489084 - Avoid creating internal stream when graph has single branch
Change-Id: I9371d44481257069bb51c0217a57f97d803589c4
2025-01-31 00:16:57 -05:00
kjayapra-amd 0324014710 SWDEV-509280 - Combine multiple definitions of callbackQueue into a single function.
Change-Id: Ibbb56136bec2beed71c202d75e8aec9e82640a4e
2025-01-30 15:58:11 -05:00
Jatin Chaudhary 1fdbf35d14 SWDEV-508617 - There is no NaN for E4M3 and FNUZ
Change-Id: I330b041019990231c098073f94d9d40a3c13ba76
2025-01-30 11:48:34 -05:00
Saleel Kudchadker 2d450e8b06 SWDEV-504494 - Resolve signal dependencies
- Resolve signal dependencies for the barrier value packet when there is
  more than one dependent signal, since the barrier value packet accounts
  for only one dependent signal.
- Improve logging.

Change-Id: Ia506ad5d80b91d598f92e7b539f41756e9b4b64b
2025-01-29 19:49:02 +00:00
Jatin Chaudhary e6fb89190a SWDEV-507817 - fix the return type of one of the atomicMin variants
Change-Id: I9915eb174d5677e21adbabae5819c9e306338ab3
2025-01-29 11:52:19 -05:00
Jimbo Xie 4abedf2a0e SWDEV-510869 - add gfx1153 id
Change-Id: I36d39a1db2392990ad9b01d70676c3c986435707
2025-01-28 18:15:46 -05:00
Saleel Kudchadker 08af3eb484 SWDEV-508225 - Improve fat binary handling
Change-Id: I78a9951f2f4c4c743c1205b1e40aac215054e27d
2025-01-28 14:38:21 -05:00
German Andryeyev ea0b092af8 SWDEV-459826 - Add a crash dump for a failed queue
The new logic analyzes the AQL queue state and
finds the failed AQL packet along with the kernel's name.

Change-Id: I1a478fa2c25462cd07a194784958bdf22454b897
2025-01-28 14:27:46 -05:00
Tao Sang f2ff56af9c SWDEV-458943 - Add fast path in wait()
wait() is redesigned with two paths:
fast path: Use a spinlock to wait for the notify signal. If the
 signal has not been received after a number of iterations, fall back to the slow path.
slow path: Use condition_variable's wait().

Improve the monitor wrapper for better performance.

Fix some bugs left over from the name-removal patch.

Change-Id: I893a8353121a25d11e37c8e631caf31cc1fc1f24
2025-01-28 12:19:55 -05:00
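A minimal spin-then-block waiter illustrating the two-path design described above. The class name, the spin limit and the auto-reset behavior (wait() consumes the signal) are assumptions for illustration, not the CLR monitor's actual code. The spin loop stands in for the spinlock fast path; notify() publishes the signal under the mutex so a waiter on the slow path cannot miss the wake-up.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical spin-then-block waiter; names and spin limit are illustrative.
class SpinThenBlockWaiter {
 public:
  explicit SpinThenBlockWaiter(int spin_limit = 1000) : spin_limit_(spin_limit) {}

  // Fast path: spin on an atomic flag. Slow path: block on a condition_variable.
  void wait() {
    for (int i = 0; i < spin_limit_; ++i) {
      if (signaled_.exchange(false, std::memory_order_acquire)) return;  // signal consumed while spinning
      std::this_thread::yield();
    }
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return signaled_.exchange(false, std::memory_order_acquire); });
  }

  // Set the flag under the mutex, then wake a blocked waiter if there is one.
  void notify() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      signaled_.store(true, std::memory_order_release);
    }
    cv_.notify_one();
  }

 private:
  const int spin_limit_;
  std::atomic<bool> signaled_{false};
  std::mutex mutex_;
  std::condition_variable cv_;
};
```

A spin limit of 0 skips the fast path entirely, which is a simple way to exercise the slow path in isolation.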
Saleel Kudchadker d208e8052f SWDEV-504494 - Set active engine for SDMA
Change-Id: I4cec84e71903c5813a7063e8b9ff1ea4473f4720
2025-01-27 17:54:36 -05:00
Gerardo Hernandez b073063612 SWDEV-510589 - Use libgcc1 package (on Debian 10 only)
Change-Id: Ibe945e366468a84fd717e0e425cfaf7dab5a99c4
2025-01-27 11:02:30 -05:00
Marko Arandjelovic 269ec54252 SWDEV-489619 - Added checks for memcpy capture path
Change-Id: I0e156099282f0b6393bcbcee2e9b96c31034a851
2025-01-27 03:51:34 -05:00
Jacob Lambert 1fc7c6bb9a SWDEV-360440 - Prepare CLR CMake for Comgr V3 transition
Change-Id: Ia279928fd3549a45bae561d0d2d8fcf110d8c245
2025-01-27 01:09:23 -05:00
Ioannis Assiouras 21c223f8df SWDEV-510319 - Fixed random segfaults in graph tests
This change fixes random segfaults in graph tests that
appeared after the change that made internal callbacks non-blocking.
The callback thread that decrements the GraphExec ref count
may now run after runtime shutdown. This can cause a segfault
because the hip::device accessed in the GraphExec destructor
has already been destroyed during runtime shutdown. This patch ensures
that the hip::device object stays alive until after the
callback thread completes.

Change-Id: I75a6ac01f27a0b2250bbd10ed389ebfb322927af
2025-01-25 09:54:15 -05:00
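One way to express "keep the device alive until the callback completes" is shared ownership. The sketch below assumes that approach; Device, GraphExec and shutdown_with_pending_callback are hypothetical stand-ins for hip::device and the runtime's GraphExec, and the patch itself may use a different mechanism. Whichever thread drops the last reference destroys the device, so the destructor running on the callback thread never sees a dangling device.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <thread>
#include <utility>

// Hypothetical stand-in for hip::device; counts live instances (C++17 inline static).
struct Device {
  static inline std::atomic<int> live_count{0};
  int id;
  explicit Device(int i) : id(i) { ++live_count; }
  ~Device() { --live_count; }
};

// Hypothetical stand-in for GraphExec: its destructor touches the device.
struct GraphExec {
  GraphExec(std::shared_ptr<Device> d, int* out) : device(std::move(d)), observed_id(out) {}
  ~GraphExec() { *observed_id = device->id; }  // safe: GraphExec co-owns the device
  std::shared_ptr<Device> device;
  int* observed_id;
};

// Runtime "shutdown" drops its reference while a callback thread still holds the GraphExec.
inline int shutdown_with_pending_callback() {
  auto device = std::make_shared<Device>(7);
  int observed = -1;
  auto exec = std::make_unique<GraphExec>(device, &observed);
  std::thread callback([exec = std::move(exec)]() mutable {
    exec.reset();  // last GraphExec reference released on the callback thread
  });
  device.reset();   // shutdown's reference goes away, possibly before the callback runs
  callback.join();  // join makes the write to observed visible here
  return observed;
}
```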
Sourabh Betigeri c460b0541b SWDEV-502219 - Adds validity checks for negative parameters passed
Change-Id: Ib8a531533306a27143d74b81c074de81051eb896
2025-01-24 16:32:29 -05:00
Saleel Kudchadker 9b7e0ad48a SWDEV-510186 - Improve logging of kernel names
- Demangle kernel names in logs

Change-Id: I9aa58e8c109becb45ef7fc747d991bd657c4190a
2025-01-24 11:43:02 -05:00
zichguan-amd 272ef9a7bf SWDEV-509518 - Allow LLVM_ROOT and Clang_ROOT to be used with find_program
Fixes #123. find_program doesn't follow CMP0074 and thus ignores LLVM_ROOT and Clang_ROOT. This change adds LLVM_ROOT and Clang_ROOT to the search path of find_program for llvm-mc and clang in hiprtc to mimic the previous add_package behaviour.
Caveat: CMake-specific variables like CMAKE_PREFIX_PATH take precedence over paths specified with HINTS for find_program. There is no way to change this ordering unless we skip CMake-specific variables altogether using NO_CMAKE_PATH and NO_CMAKE_ENVIRONMENT_PATH.

Change-Id: I1fedb60cda09744416e19b3c6e3e0c5c9045f8e7
2025-01-23 11:50:36 -05:00
taosang2 799e54aa0d SWDEV-507969 - Fix wrong VGPRs for some devices
Change-Id: Ia8fc19564272e2c7171d991376bf896a99085a97
2025-01-22 10:11:47 -05:00
Jaydeep Patel 57df1b348f SWDEV-508982 - [6.4 Preview] - Handle hipMemPoolCreate, hipMemPoolDestroy & hipDeviceSetMemPool during stream capture.
Change-Id: Ia195442041803896df814798c3d2053c0ba7770c
2025-01-22 05:28:47 -05:00
Jatin Chaudhary bd7d40a4d8 SWDEV-491248 - Fix build_mask
thread_rank() returns the thread's index within the block. Limit the range
to the current warp size.

Change-Id: Ib5c9831236096485cf99ba7ab0b911a3b10de31c
2025-01-22 04:46:01 -05:00
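A hypothetical sketch of why the range must be limited: a per-warp lane mask built directly from a block-wide rank can shift past 64 bits, which is undefined behavior in C++. The function name and the tile-based layout are assumptions for illustration, not the actual cooperative-groups build_mask.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: lane mask of the tile that contains a thread.
// block_thread_rank is the index within the whole block; only its lane
// inside the current warp matters for a per-warp mask.
inline std::uint64_t build_tile_mask(unsigned block_thread_rank, unsigned tile_size,
                                     unsigned warp_size) {
  unsigned lane = block_thread_rank % warp_size;         // limit the range to the current warp
  unsigned tile_start = (lane / tile_size) * tile_size;  // first lane of this thread's tile
  std::uint64_t ones = (tile_size >= 64) ? ~0ull : ((1ull << tile_size) - 1);
  return ones << tile_start;                             // tile_start < 64, so the shift is defined
}
```

Without the modulo, rank 100 with 32-wide tiles would compute a tile start of 96 and shift a 64-bit value by 96 bits.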
Jaydeep Patel b4df9fb6ec SWDEV-457316 - Use phy memory obj stored in user data instead of querying from memObjs.
Change-Id: Id837eb00195d88b50904441f01cf8153fa752ecd
2025-01-21 22:05:14 -05:00