13497 Commits

Author SHA1 Message Date
Kudchadker, Saleel 072fb0804e SWDEV-521647 - Fix tracking of hw_event (#206)
- When a command may have two packets (like the device heap
  initializer) and there is no signal on the main kernel packet, the
  tracking was broken because it set the command's HW event to the
  first packet's signal.
- If no completion signal is attached to the second packet, clear
  the HW event for the command.
2025-04-25 08:46:44 -07:00
Kudchadker, Saleel ce24936970 SWDEV-510186 - Improve logging (#220)
- Print all arguments in logs; this is useful for debugging
2025-04-25 08:40:31 -07:00
Li, Todd tiantuo 95cdc83eaf SWDEV-511055 - fix HIP PAL memory allocation workaround for APU (#40) 2025-04-24 15:07:16 -07:00
Sang, Tao 1113eff3f9 SWDEV-493275 - Support scratch limit (#20)
Support programmatic query and change of scratch limit on
AMD devices.

Change-Id: Id5da355a77366f97868e462847f3916e87fd2af6
2025-04-24 17:15:25 -04:00
Critchley, Paul 4f2a4b12a9 SWDEV-527731 - [Ubertrace] OpenCL driver reports wrong Instrumentation API Version (#211) 2025-04-24 14:06:17 -07:00
Godavarthy Surya, Anusha e5ce544c45 SWDEV-469423 - hipStreamEndCapture graph* can be nullptr (#170) 2025-04-24 13:57:09 +05:30
Hila, Nino 38d48c9a7d Add palamida.yml (#215) 2025-04-23 13:15:09 -07:00
Sang, Tao 27aad09bd4 SWDEV-518831 - fix streams' sync issue in mthreads (#123)
1. Fix sync issue between the null stream and non-null streams in
multithreaded use.
2. Remove assert(GetSubmissionBatch() == nullptr) as it
is invalid in multithreaded use.
3. Update getActiveQueues() to handle the state of
being terminated.
2025-04-23 15:08:07 -04:00
Sang, Tao 78f92901d8 SWDEV-516050 - Fix monitor hang in OCL (#75)
Fix monitor hang in CTS integer_ops.
Improve notify().
Does not affect notifyAll() or HIP in direct
dispatch mode.

Change-Id: I95a458358e1cab9c76aefde117db09cdbd1fd3af
2025-04-23 14:34:53 -04:00
Xie, Jiabao(Jimbo) 9a8c9e70b2 SWDEV-441487 - add gfx1150/1 support to amd-staging clr (#182)
Co-authored-by: Jimbo Xie <jiabaxie@amd.com>
2025-04-23 20:43:03 +05:30
GunaShekar, Ajay 64d6f5714a SWDEV-523281 - CHANGELOG.md and negative test return values : hipLaunchKernelEx, hipLaunchKernelExC, hipDrvLaunchKernelEx (#155) 2025-04-22 21:47:37 +05:30
Andryeyev, German a5c860f3b0 SWDEV-497841 - Enable memory manager by default (#149) 2025-04-22 21:20:37 +05:30
Andryeyev, German a3effa16f1 SWDEV-523300 - Add the new option to build HIP (#179)
Add the new CMake option AMD_COMPUTE_WIN to build HIP on Windows
from the public GitHub. AMD_COMPUTE_WIN should point to a special
repo with the PAL static libs.
2025-04-22 21:05:04 +05:30
Hernandez, Gerardo 1a8d766836 SWDEV-420237 - Fix reduce sync operations when masks are divergent (#181)
Do not use __ockl_activelane_u32() to calculate the index of the lane within the mask, as that does not work with divergent masks that have other bits set before the associated lane.
2025-04-22 19:47:58 +05:30
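The difference can be shown on the host with plain bit arithmetic. This is a minimal sketch, not the code from the commit: `count_bits` and `rank_in_mask` are hypothetical helpers, and it assumes that `__ockl_activelane_u32()` ranks a lane among the lanes that are currently active rather than among the bits of the user-supplied mask.

```cpp
#include <cassert>
#include <cstdint>

// Count the set bits in v (clears the lowest set bit each iteration).
inline unsigned count_bits(std::uint64_t v) {
  unsigned n = 0;
  while (v != 0) {
    v &= v - 1;
    ++n;
  }
  return n;
}

// Hypothetical helper: index of `lane` among the bits set in `mask`,
// i.e. the number of set bits strictly below position `lane`.
inline unsigned rank_in_mask(std::uint64_t mask, unsigned lane) {
  assert(lane < 64);  // shifting by 64 or more would be undefined
  const std::uint64_t below = (std::uint64_t{1} << lane) - 1;  // bits [0, lane)
  return count_bits(mask & below);
}
```

With a sync mask of 0xF (lanes 0 to 3) while only lanes 2 and 3 are active (0xC), lane 2 has rank 2 in the sync mask but rank 0 among the active lanes, so the two indices disagree once the masks diverge.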
Godavarthy Surya, Anusha bf28bbd9ab SWDEV-508538 - Optimize mem access and pack structure (#71)
Change-Id: Ib05b8891a6d228fc3266918a000d332fddc7438b
2025-04-21 13:43:25 +05:30
Brzak, Branislav 99142c3dd9 SWDEV-526612 - Add missing copyright notices (#201) 2025-04-18 20:54:27 +05:30
Ramirez, Lucas d020598a0f SWDEV-524612 - Consider "1" a truthy value for WGPMode (#187)
The compiler currently serializes the workgroup_processor_mode COMGR metadata boolean field as "0"/"1" instead of "false"/"true". Consider "1" a truthy value during parsing.
2025-04-17 11:50:07 +02:00
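A minimal sketch of the parsing rule described in this commit, assuming the metadata value arrives as a string. The function names are hypothetical and not taken from the CLR sources.

```cpp
#include <cassert>
#include <string>

// Hypothetical parser: accept both serializations of a true boolean,
// "true" and "1". Everything else ("false", "0", unknown text) is false.
inline bool parse_metadata_bool(const std::string& text) {
  return text == "true" || text == "1";
}

// Small self-check listing the spellings mentioned in the commit message.
inline void check_parse_metadata_bool() {
  assert(parse_metadata_bool("true"));
  assert(parse_metadata_bool("1"));
  assert(!parse_metadata_bool("false"));
  assert(!parse_metadata_bool("0"));
}
```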
Brzak, Branislav d00b2a0953 SWDEV-525423 - In COMGR Loader don't open file if image is already mapped (#193) 2025-04-16 11:00:54 +02:00
Arandjelovic, Marko 5fe080fd67 SWDEV-523137 - function ptrs should match across all devices (#171) 2025-04-16 10:35:48 +02:00
Andryeyev, German 3fd7650fe3 SWDEV-459758 - Pass workgroup size explicitly (#185)
It's easier for the compiler to move explicit kernel arguments into user SGPRs.
2025-04-15 15:22:15 -04:00
Chaudhary, Jatin Jaikishan 5d638d831c SWDEV-512924 - add fp4 API (#52)
* Remove C-style include guard

* clean up issues in the PR
2025-04-15 17:53:50 +01:00
Xie, Pengda e92ea151b2 SWDEV-518317 - Remove Redundant Error Message in removeFatBinary (#164) 2025-04-15 09:00:39 -07:00
Andryeyev, German f6c804edc0 SWDEV-526836 - Switch PAL backend to CmdReleaseThenAcquire() (#175) 2025-04-15 11:49:53 -04:00
Chaudhary, Jatin Jaikishan fcaefe97b8 SWDEV-509213 - make cmake_minimum_required consistent across clr (#51)
Change-Id: Ib0b1df7af8984a37d6bf7ca68ec99597d5978821
2025-04-15 15:23:41 +05:30
Chaudhary, Jatin Jaikishan 588cf0fc69 SWDEV-520627 - include warp functions header for warpSize (#177)
Change-Id: Id3fff8f2722d521071ef0ff71b09fc365ef6fa82
2025-04-15 14:40:27 +05:30
Chaudhary, Jatin Jaikishan 07e57a1f0d SWDEV-517941 - use device bitcode before spirv (#95)
Also add the flag HIP_FORCE_SPIRV_CODEOBJECT as an override that forces
the use of SPIRV.

* use cache for already compiled code objects

* address review comments and use the two spirv isa names
2025-04-14 23:40:52 +01:00
Six, Lancelot 7b72c1b786 SWDEV-517078 - Update 2nd level trap handlers (#148)
* SWDEV-517078 - Maintain the trap handler ABI version in CLR

The trap handler ABI version is communicated to the debugger using
the r_version field in the r_debug structure.  This structure is
an external dependency, which makes it complicated to keep the trap
handler source (in CLR) and the ABI version number (external dependency)
in sync.

This patch proposes to patch the trap handler ABI version number in
_amdgpu_r_debug before communicating it to the debugger.

We can't directly include sc's executable.hpp file in CLR as it relies
on conflicting definitions of ELF-related types, so instead we need to
rely on a priori knowledge of the r_debug structure.  Fortunately, this
structure is part of a stable ABI, so its layout is guaranteed to be
kept stable.

Update the 2nd level trap handler to follow updates from the
ROCr-runtime.  The trap handlers are stripped of parts dedicated to
architectures unsupported by CLR.

Bump the r_debug.r_version to track the ABI changes in the trap handler.
2025-04-11 18:59:54 +01:00
Milanov, Aleksandar c4fa3ef927 SWDEV-526208 - Fix miscalculation of coalesced tiled partition mask (#162) 2025-04-11 19:40:26 +02:00
Hernandez, Gerardo 66496258b4 SWDEV-521920 - Fix compilation issues introduced by the reduce sync operations - 2 (#167)
Fix PyTorch 2.5 issues by defining reduce sync operations for type __half in amd_hip_fp16.h rather than in
amd_warp_sync_functions.h, which is problematic if __half is not included before that header.
Only define types not supported by CUDA if HIP_ENABLE_EXTRA_WARP_SYNC_TYPES is defined, to avoid portability issues.
2025-04-11 17:00:59 +05:30
Yao, Longlong 0de73eeaf8 SWDEV-518966 - Avoid creating Arena Memobj for VMM pointer (#39)
Change-Id: I69c6c0a1464d01e674ac929de34ab10047012f1a

Signed-off-by: Longlong Yao <Longlong.Yao@amd.com>
2025-04-11 16:55:53 +05:30
Haehnle, Nicolai 199b0f1086 Report null stream creation failure (#152)
Explicitly nulling the pointer causes us to report the error below
instead of keeping a dangling pointer around that will most likely lead
to a subsequent segfault.
2025-04-10 11:40:05 -07:00
Jiang, Julia b44f5f9992 SWDEV-525231 - Update changelog for 6.5 feature implementations (#150) 2025-04-10 14:17:32 -04:00
Sang, Tao 6d10577761 SWDEV-521083 - Fix atomicMin/Max issues (#151)
Fix atomicMin/Max() and atomicMin/Max_system() issues on
float types.
2025-04-10 12:30:55 -04:00
Chaudhary, Jatin Jaikishan 628777b73d SWDEV-461087 - fp4/fp6/fp8 ocp headers (#41)
This now has host conversions too, which is directly from Christopher's
work on fcbx.

Signed-off-by: Christopher M. Riedl

* add const to func parameter

* do not depend on builtins, use gfx950 detection
2025-04-10 17:22:15 +01:00
Andryeyev, German 4c363df3bf SWDEV-517481 - Add more restrictions to the queue management (#168) 2025-04-10 21:51:45 +05:30
Xie, Jiabao(Jimbo) 0d6e554d92 SWDEV-524188 - Check for VRam and system RAM properly (#122)
Currently, we check whether there is enough system RAM even if we don't allocate on the host device. This is incorrect logic.
We should not check this size on Windows because PAL checks memory allocation. See SWDEV-467263.

Co-authored-by: Jimbo Xie <jiabaxie@amd.com>
2025-04-10 21:50:48 +05:30
Chaudhary, Jatin Jaikishan 5c030840d6 SWDEV-520627 - include sync_warp header instead of warp function header (#18)
Change-Id: Ic3f54b0f5bfee8565a8bbb6218fb0ccdb900c9ea
2025-04-10 21:50:25 +05:30
Andryeyev, German 94cd9bc4f7 SWDEV-525725 - Enable resource cache for SVM (#156)
- Make sure reserved_va_ is updated before svmPtr is overwritten
2025-04-10 10:54:28 -04:00
Patel, Jaydeepkumar 997519fa94 SWDEV-521262 - Adding MSVC compiler options to fix the conflict with SC module while building hip in debug. (#24) 2025-04-10 15:25:58 +05:30
Patel, Jaydeepkumar 8531cd3bbe SWDEV-508973 - If total # of threads/block is more than HW capacity, it's invalid config issue and should return invalid config error. (#25) 2025-04-10 15:16:16 +05:30
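The check described in this commit title can be sketched as follows. This is an illustration under stated assumptions, not the CLR implementation: `LaunchStatus`, `Dim3`, and `validate_block` are hypothetical names, the hardware limit is passed in as a parameter, and the sketch also treats a zero-sized dimension as an invalid configuration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical status codes for this sketch.
enum class LaunchStatus { Success, InvalidConfiguration };

// Hypothetical block-dimension triple.
struct Dim3 {
  std::uint32_t x, y, z;
};

// Reject a block whose total thread count exceeds the hardware limit.
// The product is checked stepwise so it cannot overflow 64 bits.
inline LaunchStatus validate_block(Dim3 block,
                                   std::uint64_t max_threads_per_block) {
  assert(max_threads_per_block > 0);
  if (block.x == 0 || block.y == 0 || block.z == 0) {
    return LaunchStatus::InvalidConfiguration;  // assumption of this sketch
  }
  // x * y fits in 64 bits because both factors are 32-bit values.
  const std::uint64_t xy = std::uint64_t{block.x} * block.y;
  if (xy > max_threads_per_block) {
    return LaunchStatus::InvalidConfiguration;
  }
  // xy * z > limit is equivalent to z > limit / xy for positive integers.
  if (block.z > max_threads_per_block / xy) {
    return LaunchStatus::InvalidConfiguration;
  }
  return LaunchStatus::Success;
}
```

For a limit of 1024 threads, a 32x32x1 block passes and a 32x32x2 block is rejected.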
Stojiljkovic, Vladana e91cb4f320 SWDEV-505795 - Return the same ptr from hipIpcOpenMemHandle if it is called multiple times (#93)
* Move initialization outside of if statement
2025-04-10 11:20:36 +02:00
Chaudhary, Jatin Jaikishan 5214d1ca07 SWDEV-525969 - add gfx950 to use ocp type for fp8 (#157)
* do not use __gfx94plus_clr__ macro in fp8 header
2025-04-09 16:21:39 +01:00
Stojiljkovic, Vladana bc474ea5af Set maxTexture2DLinear fields in deviceProp (#89) 2025-04-09 17:13:49 +02:00
Sang, Tao 18d191fd1d SWDEV-523824 - Fix data validation issue of rocFFT (#154)
Fix data validation issue of rocFFT when dynamic queue is on.
ReleaseHwQueue() can be called only when there is no command in HostQueue.
The check of this condition needs to be protected by a lock.
2025-04-08 20:30:06 -04:00
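The pattern named in this commit, checking a condition and acting on it under the same lock, can be sketched as below. `HostQueueSketch` and its members are hypothetical and only model a pending-command counter; the real HostQueue and ReleaseHwQueue() in CLR are not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical stand-in for a host queue that owns a hardware queue.
class HostQueueSketch {
 public:
  void submit() {
    std::lock_guard<std::mutex> guard(lock_);
    ++pending_;
  }

  void complete() {
    std::lock_guard<std::mutex> guard(lock_);
    assert(pending_ > 0);
    --pending_;
  }

  // Releases the hardware queue only if no command is pending.
  // The emptiness check and the release happen under one lock, so a
  // concurrent submit() cannot slip in between them.
  bool try_release_hw_queue() {
    std::lock_guard<std::mutex> guard(lock_);
    if (pending_ != 0 || !has_hw_queue_) {
      return false;
    }
    has_hw_queue_ = false;
    return true;
  }

 private:
  std::mutex lock_;
  std::size_t pending_ = 0;
  bool has_hw_queue_ = true;
};
```

Without the lock, a thread could observe an empty queue, another thread could submit a command, and the first thread would then release a hardware queue that is still needed. Re-acquiring a hardware queue on the next submit is not modelled in this sketch.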
Brzak, Branislav b006380ff6 SWDEV-525653 - Make hipGetDeviceProperties and hipChooseDevice use the new API (#159) 2025-04-08 18:54:05 +02:00
Patel, Jaydeepkumar 9e7248aa36 SWDEV-521011 - Allow max stack size as per ISA. (#73) 2025-04-08 10:15:38 +05:30
Andryeyev, German e974f7fde1 SWDEV-497841 - Add VmHeapArray support (#76)
Add VmHeapArray class to reduce the pressure on VA reservation, since
multiple memory pools can be active at the same time.
2025-04-03 21:04:18 +05:30
Andryeyev, German 3514f45544 SWDEV-524849 - Fix HIP error returned during capture (#141)
Always use the latest dependent nodes during hipEventRecord capture
2025-04-03 20:08:25 +05:30
Betigeri, Sourabh 8c6b90996e SWDEV-523281 - [clr] Implementation of hipLaunchKernelExC and hipDrvLaunchKernelEx API with support for cooperative launch (#92) 2025-04-03 20:10:05 +09:00
Arandjelovic, Marko 8fcaa1ca93 SWDEV-517867 - Remove invalid assert (#55)
* Remove invalid assert

* Retrigger CI

* Rebase
2025-04-03 11:14:32 +02:00