Commit Graph

12771 Commits

Author SHA1 Message Date
Sang, Tao 45b75013ec SWDEV-516050 - Fix monitor hang in OCL (#75)
Fix monitor hang in cts integer_ops.
Improve notify().
Won't affect notifyAll() and Hip in direct
dispatch mode.

Change-Id: I95a458358e1cab9c76aefde117db09cdbd1fd3af

[ROCm/clr commit: 78f92901d8]
2025-04-23 14:34:53 -04:00
Xie, Jiabao(Jimbo) c7737558a4 SWDEV-441487 - add gfx1150/1 support to amd-staging clr (#182)
Co-authored-by: Jimbo Xie <jiabaxie@amd.com>

[ROCm/clr commit: 9a8c9e70b2]
2025-04-23 20:43:03 +05:30
GunaShekar, Ajay 34d3f7022b SWDEV-523281 - CHANGELOG.md and negative test return values : hipLaunchKernelEx, hipLaunchKernelExC, hipDrvLaunchKernelEx (#155)
[ROCm/clr commit: 64d6f5714a]
2025-04-22 21:47:37 +05:30
Andryeyev, German f8344154a0 SWDEV-497841 - Enable memory manager by default (#149)
[ROCm/clr commit: a5c860f3b0]
2025-04-22 21:20:37 +05:30
Andryeyev, German 0569c7713c SWDEV-523300 - Add the new option to build HIP (#179)
Add the new cmake option AMD_COMPUTE_WIN to build HIP on Windows
from the public github. AMD_COMPUTE_WIN should point to a special
repo with the PAL static libs

[ROCm/clr commit: a3effa16f1]
2025-04-22 21:05:04 +05:30
Hernandez, Gerardo ba5a9a5395 SWDEV-420237 - Fix reduce sync operations when masks are divergent (#181)
Do not use __ockl_activelane_u32() to calculate the index of the lane within the mask, as that would not work with divergent masks that have other bits on before the associated lane.

[ROCm/clr commit: 1a8d766836]
2025-04-22 19:47:58 +05:30
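The commit message above says that a lane's index within a mask cannot be derived from the set of currently active lanes when the mask is divergent. A minimal host-side sketch of the alternative, counting only the mask bits below the lane, might look like the following. The function names `popcount64` and `lane_rank_in_mask` are hypothetical and are not taken from the CLR sources; the actual device code may differ.

```cpp
#include <cassert>
#include <cstdint>

// Portable population count: number of set bits in v.
inline unsigned popcount64(std::uint64_t v) {
  unsigned n = 0;
  while (v != 0) {
    v &= v - 1;  // clear the lowest set bit
    ++n;
  }
  return n;
}

// Hypothetical helper: position of `lane` among the lanes named in `mask`,
// computed from the mask alone rather than from the active-lane set.
inline unsigned lane_rank_in_mask(std::uint64_t mask, unsigned lane) {
  // The lane must be a valid wavefront lane and must belong to the mask.
  assert(lane < 64 && ((mask >> lane) & 1u) != 0);
  const std::uint64_t below = (std::uint64_t{1} << lane) - 1;  // bits [0, lane)
  return popcount64(mask & below);
}
```

With the mask `0x2C` (lanes 2, 3 and 5), lane 5 gets rank 2 regardless of which other lanes happen to be executing.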
Godavarthy Surya, Anusha 41c4bea0f5 SWDEV-508538 - Optimize mem access and pack structure (#71)
Change-Id: Ib05b8891a6d228fc3266918a000d332fddc7438b

[ROCm/clr commit: bf28bbd9ab]
2025-04-21 13:43:25 +05:30
Brzak, Branislav 5df4734f71 SWDEV-526612 - Add missing copyright notices (#201)
[ROCm/clr commit: 99142c3dd9]
2025-04-18 20:54:27 +05:30
Ramirez, Lucas 0a45aa85c5 SWDEV-524612 - Consider "1" a truthy value for WGPMode (#187)
The compiler currently serializes the workgroup_processor_mode COMGR metadata boolean field as "0"/"1" instead of "false"/"true". Consider "1" a truthy value during parsing.

[ROCm/clr commit: d020598a0f]
2025-04-17 11:50:07 +02:00
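As an illustration of the parsing rule described in the commit message above, a tolerant boolean parser could accept both spellings. This is only a sketch: `parse_metadata_bool` is a hypothetical name, and the real COMGR metadata handling in CLR is not shown in this log.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: treat both "true" and "1" as truthy, so metadata
// serialized as "0"/"1" parses the same way as "false"/"true".
// Any other text, including "0" and "false", is treated as false.
inline bool parse_metadata_bool(const std::string& text) {
  return text == "true" || text == "1";
}
```

Anything other than the two truthy spellings falls back to false, which keeps the earlier behaviour for "false" and "0".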
Brzak, Branislav 6ae4c7278e SWDEV-525423 - In COMGR Loader don't open file if image is already mapped (#193)
[ROCm/clr commit: d00b2a0953]
2025-04-16 11:00:54 +02:00
Arandjelovic, Marko 963de868ae SWDEV-523137 - function ptrs should match across all devices (#171)
[ROCm/clr commit: 5fe080fd67]
2025-04-16 10:35:48 +02:00
Andryeyev, German a9df586812 SWDEV-459758 - Pass workgroup size explicitly (#185)
It's easier for compiler to move explicit kernel arguments into user SGPRs

[ROCm/clr commit: 3fd7650fe3]
2025-04-15 15:22:15 -04:00
Chaudhary, Jatin Jaikishan 7257b705ce SWDEV-512924 - add fp4 API (#52)
* Remove C-style include guard

* clean up issues in the PR


[ROCm/clr commit: 5d638d831c]
2025-04-15 17:53:50 +01:00
Xie, Pengda 9c1d61fbb0 SWDEV-518317 - Remove Redundant Error Message in removeFatBinary (#164)
[ROCm/clr commit: e92ea151b2]
2025-04-15 09:00:39 -07:00
Andryeyev, German db3c5f87ea SWDEV-526836 - Switch PAL backend to CmdReleaseThenAcquire() (#175)
[ROCm/clr commit: f6c804edc0]
2025-04-15 11:49:53 -04:00
Chaudhary, Jatin Jaikishan c72604a2af SWDEV-509213 - make cmake_minimum_required consistent across clr (#51)
Change-Id: Ib0b1df7af8984a37d6bf7ca68ec99597d5978821

[ROCm/clr commit: fcaefe97b8]
2025-04-15 15:23:41 +05:30
Chaudhary, Jatin Jaikishan 7c3dcb707e SWDEV-520627 - include warp functions header for warpSize (#177)
Change-Id: Id3fff8f2722d521071ef0ff71b09fc365ef6fa82

[ROCm/clr commit: 588cf0fc69]
2025-04-15 14:40:27 +05:30
Chaudhary, Jatin Jaikishan e9e207d7b0 SWDEV-517941 - use device bitcode before spirv (#95)
Also add flag: HIP_FORCE_SPIRV_CODEOBJECT to allow override to force use
SPIRV.

* use cache for already compiled code objects

* address review comments and use the two spirv isa names

[ROCm/clr commit: 07e57a1f0d]
2025-04-14 23:40:52 +01:00
Six, Lancelot b389f97d73 SWDEV-517078 - Update 2nd level trap handlers (#148)
* SWDEV-517078 - Maintain the trap handler ABI version in CLR

The trap handler ABI version is communicated to the debugger using
the r_version field in the r_debug structure.  This structure is
an external dependency, which makes it complicated to keep the trap
handler source (in CLR) and the ABI version number (external dependency)
in sync.

This patch proposes to patch the trap handler ABI version number in
_amdgpu_r_debug before communicating it to the debugger.

We can't directly include sc's executable.hpp file in CLR as it relies
on conflicting definition of ELF related types, so instead we need to
rely on a-priori knowledge on the r_debug structure.  Fortunately, this
structure is part of a stable ABI, so its layout is guaranteed to be
kept stable.

Update the 2nd level trap handler to follow updates from the
ROCr-runtime.  The trap handlers are stripped from parts dedicated to
architectures unsupported by CLR.

Bump the r_debug.r_version to track the ABI changes in the trap handler.

[ROCm/clr commit: 7b72c1b786]
2025-04-11 18:59:54 +01:00
Milanov, Aleksandar c83df8a653 SWDEV-526208 - Fix miscalculation of coalesced tiled partition mask (#162)
[ROCm/clr commit: c4fa3ef927]
2025-04-11 19:40:26 +02:00
Hernandez, Gerardo 9264e97cbb SWDEV-521920 - Fix compilation issues introduced by the reduce sync operations - 2 (#167)
Fix pytorch 2.5 issues, by defining reduce sync operations for type __half in amd_hip_fp16.h and not in
amd_warp_sync_functions.h which is problematic in case __half does not get included before that header.
Only define types not supported by cuda if HIP_ENABLE_EXTRA_WARP_SYNC_TYPES is defined, to avoid portability issues

[ROCm/clr commit: 66496258b4]
2025-04-11 17:00:59 +05:30
Yao, Longlong 6015dda120 SWDEV-518966 - Avoid creating Arena Memobj for VMM pointer (#39)
Change-Id: I69c6c0a1464d01e674ac929de34ab10047012f1a

Signed-off-by: Longlong Yao <Longlong.Yao@amd.com>

[ROCm/clr commit: 0de73eeaf8]
2025-04-11 16:55:53 +05:30
Haehnle, Nicolai b3d1ac7232 Report null stream creation failure (#152)
Explicitly nulling the pointer causes us to report the error below
instead of keeping a dangling pointer around that will most likely lead
to a subsequent segfault.

[ROCm/clr commit: 199b0f1086]
2025-04-10 11:40:05 -07:00
Jiang, Julia 2a39c6a782 SWDEV-525231 - Update changelog for 6.5 feature implementations (#150)
[ROCm/clr commit: b44f5f9992]
2025-04-10 14:17:32 -04:00
Sang, Tao 929209b988 SWDEV-521083 - Fix atomicMin/Max issues (#151)
Fix atomicMin/Max(), atomicMin/Max_system() issue on
float types.

[ROCm/clr commit: 6d10577761]
2025-04-10 12:30:55 -04:00
Chaudhary, Jatin Jaikishan 665e88008b SWDEV-461087 - fp4/fp6/fp8 ocp headers (#41)
This now has host conversions too, which is directly from Christopher's
work on fcbx.

Signed-off-by: Christopher M. Riedl

* add const to func parameter

* do not depend on builtins, use gfx950 detection

[ROCm/clr commit: 628777b73d]
2025-04-10 17:22:15 +01:00
Andryeyev, German c50f85df20 SWDEV-517481 - Add more restrictions to the queue management (#168)
[ROCm/clr commit: 4c363df3bf]
2025-04-10 21:51:45 +05:30
Xie, Jiabao(Jimbo) 88841d1dee SWDEV-524188 - Check for VRam and system RAM properly (#122)
Currently, we check if there's enough system RAM even if we don't allocate on host device. This is incorrect logic.
We should not check for this size on windows because PAL checks for memory allocation. See SWDEV-467263.

Co-authored-by: Jimbo Xie <jiabaxie@amd.com>

[ROCm/clr commit: 0d6e554d92]
2025-04-10 21:50:48 +05:30
Chaudhary, Jatin Jaikishan 07c7d3e860 SWDEV-520627 - include sync_warp header instead of warp function header (#18)
Change-Id: Ic3f54b0f5bfee8565a8bbb6218fb0ccdb900c9ea

[ROCm/clr commit: 5c030840d6]
2025-04-10 21:50:25 +05:30
Andryeyev, German 90e3d2619a SWDEV-525725 - Enable resource cache for SVM (#156)
- Make sure reserved_va_ updated before svmPtr overwrite

[ROCm/clr commit: 94cd9bc4f7]
2025-04-10 10:54:28 -04:00
Patel, Jaydeepkumar c2ed585737 SWDEV-521262 - Adding MSVC compiler options to fix the conflict with SC module while building hip in debug. (#24)
[ROCm/clr commit: 997519fa94]
2025-04-10 15:25:58 +05:30
Patel, Jaydeepkumar dea880f9da SWDEV-508973 - If total # of threads/block is more than HW capacity, it's invalid config issue and should return invalid config error. (#25)
[ROCm/clr commit: 8531cd3bbe]
2025-04-10 15:16:16 +05:30
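The rule in the commit title above (a block with more threads than the hardware supports is an invalid configuration) can be sketched as a simple pre-launch check. All names here (`LaunchStatus`, `Dim3`, `validate_block`) are hypothetical stand-ins rather than HIP or CLR identifiers, and rejecting zero-sized dimensions is an added assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical status codes standing in for the runtime's error values.
enum class LaunchStatus { Success, InvalidConfiguration };

// Hypothetical stand-in for a three-dimensional block size.
struct Dim3 {
  std::uint32_t x, y, z;
};

// Reject a block whose total thread count exceeds the device limit.
inline LaunchStatus validate_block(Dim3 block, std::uint32_t max_threads_per_block) {
  assert(max_threads_per_block > 0);
  // Assumption: a zero-sized dimension is also an invalid configuration.
  if (block.x == 0 || block.y == 0 || block.z == 0) {
    return LaunchStatus::InvalidConfiguration;
  }
  // Multiply in 64 bits; x * y cannot overflow, and checking it first
  // bounds the second product as well.
  const std::uint64_t xy = std::uint64_t{block.x} * block.y;
  if (xy > max_threads_per_block || xy * block.z > max_threads_per_block) {
    return LaunchStatus::InvalidConfiguration;
  }
  return LaunchStatus::Success;
}
```

For a limit of 1024 threads, a 32 x 32 x 1 block passes while a 32 x 32 x 2 block is rejected.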
Stojiljkovic, Vladana 81a566e397 SWDEV-505795 - Return the same ptr from hipIpcOpenMemHandle if it is called multiple times (#93)
* SWDEV-505795 - Return the same ptr from hipIpcOpenMemHandle if it is called multiple times

* Move initialization outside of if statement

[ROCm/clr commit: e91cb4f320]
2025-04-10 11:20:36 +02:00
Chaudhary, Jatin Jaikishan b0d9ea7454 SWDEV-525969 - add gfx950 to use ocp type for fp8 (#157)
* do not use __gfx94plus_clr__ macro in fp8 header


[ROCm/clr commit: 5214d1ca07]
2025-04-09 16:21:39 +01:00
Stojiljkovic, Vladana ef52738425 Set maxTexture2DLinear fields in deviceProp (#89)
[ROCm/clr commit: bc474ea5af]
2025-04-09 17:13:49 +02:00
Sang, Tao 60a1e6dbc1 SWDEV-523824 - Fix data validation issue of rocFFT (#154)
Fix data validation issue of rocFFT when dynamic queue is on.
ReleaseHwQueue() can be called only when there is no command in HostQueue.
The check of that condition needs to be protected by a lock.

[ROCm/clr commit: 18d191fd1d]
2025-04-08 20:30:06 -04:00
Brzak, Branislav d4275741ba SWDEV-525653 - Make hipGetDeviceProperties and hipChooseDevice use the new API (#159)
[ROCm/clr commit: b006380ff6]
2025-04-08 18:54:05 +02:00
Patel, Jaydeepkumar 2f3bc7f01c SWDEV-521011 - Allow max stack size as per ISA. (#73)
[ROCm/clr commit: 9e7248aa36]
2025-04-08 10:15:38 +05:30
Andryeyev, German 4c9cc6ba30 SWDEV-497841 - Add VmHeapArray support (#76)
Add VmHeapArray class to reduce the pressure on VA reservation, since
multiple memory pools can be active at the same time.

[ROCm/clr commit: e974f7fde1]
2025-04-03 21:04:18 +05:30
Andryeyev, German 3ceab5ba02 SWDEV-524849 - Fix HIP error returned during capture (#141)
Always use the latest dependent nodes during hipEventRecord capture

[ROCm/clr commit: 3514f45544]
2025-04-03 20:08:25 +05:30
Betigeri, Sourabh 487ede31a9 SWDEV-523281 - [clr] Implementation of hipLaunchKernelExC and hipDrvLaunchKernelEx API with support for cooperative launch (#92)
[ROCm/clr commit: 8c6b90996e]
2025-04-03 20:10:05 +09:00
Arandjelovic, Marko 1c83314659 SWDEV-517867 - Remove invalid assert (#55)
* Remove invalid assert

* Retrigger CI

* Rebase

[ROCm/clr commit: 8fcaa1ca93]
2025-04-03 11:14:32 +02:00
Patel, Jaydeepkumar b217d3a4e6 SWDEV-508632 - Align address to 2 MBs for hidden heap allocation. (#29)
[ROCm/clr commit: b5c9cbc236]
2025-04-02 16:33:29 +05:30
Mallya, Ameya Keshava 29be7230eb fixed syntax to mainline
[ROCm/clr commit: 98f1db181c]
2025-04-01 09:51:41 -07:00
Mallya, Ameya Keshava f117699bef !verify functionality
[ROCm/clr commit: ae1d0ef8a1]
2025-03-31 13:14:08 -07:00
Mallya, Ameya Keshava 594c7e6704 Adding KWS check for amd-mainline
[ROCm/clr commit: 24184e151c]
2025-03-28 08:05:47 -07:00
MartinezFernandez, Juan 966157cd5b Remove PCH code: the code related to PCH is dead and not used (#66)
cherry-pick of compute/ec/clr/+/1184122

Co-authored-by: Juan Manuel Martinez Caamaño <juamarti@amd.com>

[ROCm/clr commit: f580632174]
2025-03-28 10:36:19 +01:00
Sang, Tao d49a2a51d6 SWDEV-508863 - Support generic target in compressed fatbin (#44)
[ROCm/clr commit: 8d90b44a1b]
2025-03-27 20:13:51 +05:30
GunaShekar, Ajay aaba454bfc SWDEV-523853 - Use RecordRenderOps instead of RecordRenderOp (#97)
[ROCm/clr commit: 686dd56a4e]
2025-03-26 09:28:40 +05:30
Belton-Schure, Aidan e27e3eb66a SWDEV-515426 - Use RAII classes for comgr (#28)
Change-Id: I9f6005542cc88f1e16e22741dcc0ce904fdaa2b0

[ROCm/clr commit: ded41058a0]
2025-03-25 20:10:44 +05:30