Revision graph

12359 Revisions

Author SHA1 Message Date
kjayapra-amd 30609c2e65 SWDEV-465509 - Save the handle type during import function.
Change-Id: If069abb6cd474a7b071617757041402b53575414


[ROCm/clr commit: 8ddb023512]
2024-08-16 10:55:36 -04:00
Ioannis Assiouras 1110a2f345 SWDEV-470372 - Un-deprecate hipHostAlloc, comply with cuda and introduce hipHostAlloc flags
Change-Id: I8165342825dfe07b6e9edc492d0166d0a03be62d


[ROCm/clr commit: 1e4c60f286]
2024-08-15 18:25:22 -04:00
Ioannis Assiouras 3cc948278e SWDEV-449052 - Fix hipMemcpyParam2D when source or destination pitch is set to zero
When source or destination pitch is set to zero in hip_Memcpy2D struct
it should default to WidthInBytes + [src/dst]XInBytes

Change-Id: Id57b53cab40ba72ced231258da9356554c4868c3


[ROCm/clr commit: 7a1e818c82]
2024-08-14 04:46:41 -04:00
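The defaulting rule described in this commit message can be sketched as follows. This is a minimal Python model, not the HIP runtime code; the function name `effective_pitch` and its parameters are hypothetical, chosen to mirror the `WidthInBytes` and `[src/dst]XInBytes` fields named above.

```python
def effective_pitch(pitch: int, width_in_bytes: int, x_in_bytes: int) -> int:
    """Return the pitch to use for one side (source or destination) of a 2D copy.

    Hypothetical helper: models only the rule stated in the commit message,
    namely that a zero pitch defaults to WidthInBytes + XInBytes.
    """
    if pitch == 0:
        return width_in_bytes + x_in_bytes
    return pitch


# A zero pitch falls back to the row width plus the X offset.
print(effective_pitch(0, width_in_bytes=256, x_in_bytes=16))    # 272
# An explicit pitch is left unchanged.
print(effective_pitch(512, width_in_bytes=256, x_in_bytes=16))  # 512
```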
amd-jmacaran 5dac731f29 SWDEV-458516 - External CI: Align with branch naming convention.
Change-Id: Ie1d874742b804f02ceda68064fa54f5d59c092b7


[ROCm/clr commit: cd4ed0916b]
2024-08-13 11:47:11 -04:00
Satyanvesh Dittakavi 6907974f90 SWDEV-473942 - SWDEV-431367 - Correct atomicMax(_system) and atomicMin(_system)
- Fixes the -0.0 and +0.0 comparison. For atomicMax, if the value at the
address is -0.0 and val is +0.0, gfx90a's unsafe atomics will swap
them. The CAS loop should behave consistently with this.

- The _system variants of atomicMax and atomicMin produce
incorrect output. Updated these to use an implementation similar to
atomicMax and atomicMin.

Change-Id: I20df36ee29ae0434a6b564f2ba71193fe41cfa59


[ROCm/clr commit: d69cc35750]
2024-08-13 10:38:50 -04:00
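The signed-zero issue in the first bullet can be illustrated on the host. An ordinary IEEE 754 comparison treats -0.0 and +0.0 as equal, so a compare-and-swap loop that swaps only when `val > current` would leave -0.0 in place, whereas the message says the gfx90a hardware atomic swaps them. The sketch below is a plain Python model under that assumption; `sign_aware_max` is a hypothetical name and not the function in the HIP headers.

```python
import math


def sign_aware_max(current: float, val: float) -> float:
    """Max of two floats that orders -0.0 below +0.0.

    Hypothetical model of the value a CAS loop should store so that it
    matches the swap behavior described in the commit message.
    """
    if current == 0.0 and val == 0.0:
        # Both are zeros; a plain comparison cannot tell them apart,
        # so pick the one with the positive sign bit if there is one.
        if math.copysign(1.0, val) > math.copysign(1.0, current):
            return val
        return current
    return val if val > current else current


# A plain comparison sees no difference between the two zeros.
print(-0.0 == 0.0)
# The sign-aware version prefers +0.0 over -0.0.
print(math.copysign(1.0, sign_aware_max(-0.0, 0.0)))
```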
Satyanvesh Dittakavi 6bd51db0b1 SWDEV-475185 - Handle device id for hipStreamLegacy
Change-Id: Ib56e6edb77a923f3f9738df64cb9d9ef0b4ba564


[ROCm/clr commit: aa6d07518f]
2024-08-12 09:59:17 -04:00
Ajay 3d22e51806 SWDEV-471863 - APU: device allocation greater than invisible memory
Change-Id: I37f1769873ac7dcbb3cfa51fd815ee1e2123aeae


[ROCm/clr commit: ec0971dd08]
2024-08-09 14:29:18 -04:00
Rahul Manocha 436271e407 SWDEV-468039 - FP8 host only conversion support on mi200
Change-Id: I0891f42d1b7c0d94d099fe26df5db3eff64ba564


[ROCm/clr commit: 39bbc0341d]
2024-08-07 20:51:00 -04:00
Jaydeep Patel c1f83df84c SWDEV-474937 - Fix race condition between main and work thread on windows.
Change-Id: I4d6b9de41d0e5a39094eb3babe47dffde72e0587


[ROCm/clr commit: 912de7ab44]
2024-08-07 14:29:14 -04:00
Alex Xie a381538161 SWDEV-444098 remove "rocm-ocl-icd" package
This is the first step to remove rocm-ocl-icd.
We don't build amd icd after this commit.
We still need to remove header files usage in future steps.

Change-Id: Ic4ac5476180f9ef2ce87b62891c08b28d6c9bfd2


[ROCm/clr commit: 5f775b8b7f]
2024-08-07 11:29:41 -04:00
Jaydeep Patel 12eea11370 SWDEV-457316 - Release graph exec before stream gets deleted.
Release the graph exec after the wait completes and before deleting the hip::stream object
during stream destroy.

Change-Id: I1d68aa8d844f7d3af330c6d09c44af07f8553551


[ROCm/clr commit: 8e80429b87]
2024-08-06 00:39:37 -04:00
Jaydeep Patel 82474ca1db SWDEV-465220 - Validate stream on which Kernel is planned to be launched.
Change-Id: I34c679bd888c275584c11ad3e8346d4d542976f9


[ROCm/clr commit: b0047d690a]
2024-08-06 00:31:22 -04:00
Jaydeep Patel 2aafd5a30c SWDEV-457316 - Multiple graph execs can exist for a given stream.
Change-Id: I0f1b184eb63e0432119d62f094637d375a3d4e55


[ROCm/clr commit: d954eb64db]
2024-08-06 00:31:04 -04:00
Jaydeep Patel c51153f759 SWDEV-470886 - Add maybe_undef attribute to shfl device functions because not all lanes of a wave define var, and the compiler needs to know this.
Change-Id: I3a683887e033305ac55362f356838b491a6d50f2


[ROCm/clr commit: 6344ddb2f3]
2024-08-05 00:53:13 -04:00
German Andryeyev 35c7a87014 SWDEV-470612 - Add the optimized multistream path
- Added the optimized multistream path in graph execution. It uses a fixed number of async streams in the execution
- Optimize the launch latency, where command
creation and execution are done at the same time
- Optimize the scheduling to use fewer barriers and waiting signals if
the same queue can be detected
- The new path is controlled by the DEBUG_HIP_FORCE_GRAPH_QUEUES
environment variable, where 0 will use the original path and any other
value will force the number of asynchronous queues for execution
- DEBUG_HIP_FORCE_ASYNC_QUEUE can force single queue async
execution in graphs (applicable for Navi families only)

Change-Id: I7eb40bc15c45f508d6911868a6f6d4c3598d380e


[ROCm/clr commit: 9db52f9a46]
2024-08-02 14:19:44 -04:00
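How the `DEBUG_HIP_FORCE_GRAPH_QUEUES` setting selects a path, as described in this commit message, can be sketched like this. It is a Python model only: `select_graph_path` and the returned path labels are hypothetical, and the handling of an unset variable is an assumption, since the message does not say what the default is.

```python
import os
from typing import Mapping, Optional, Tuple


def select_graph_path(env: Mapping[str, str] = os.environ) -> Tuple[str, Optional[int]]:
    """Return (path, forced_queue_count) for graph execution.

    Hypothetical model: 0 selects the original path; any other value
    selects the optimized path with that many asynchronous queues.
    """
    raw = env.get("DEBUG_HIP_FORCE_GRAPH_QUEUES")
    if raw is None:
        # Assumption: the commit message does not state the default.
        return ("runtime-default", None)
    count = int(raw)
    if count == 0:
        return ("original", None)
    return ("optimized-multistream", count)


print(select_graph_path({"DEBUG_HIP_FORCE_GRAPH_QUEUES": "0"}))
print(select_graph_path({"DEBUG_HIP_FORCE_GRAPH_QUEUES": "4"}))
```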
Anusha GodavarthySurya 31927fefd6 SWDEV-468424 - Add support to capture multiple AQL Packets
=> Added support to capture multiple AQL Packets.
=> Added Interface to callback to hip runtime from rocclr to allocate
kernel args from the graph kernel arg pool.
=> Enabled Support to capture memset node.

Change-Id: I7e1c2ba06927459e024653058af142bd82192c43


[ROCm/clr commit: bd3a35bde1]
2024-08-01 23:55:51 -04:00
Julia Jiang 6fa0563ed6 SWDEV-436608 - Un-deprecate hipHostAlloc()
Change-Id: I71393e6f536ba84b4e172acf54ba4f72350e2ae8


[ROCm/clr commit: e988e5e448]
2024-08-01 11:45:20 -04:00
Marko Arandjelovic 7cd2515908 SWDEV-465204 - Fix hipModuleLaunchKernel data validation
Change-Id: I129f265a5eb79d0a13da4f12e78e06ba307b17ee


[ROCm/clr commit: be5f097e8e]
2024-08-01 05:09:23 -04:00
Vladana Stojiljkovic 91800f18ea SWDEV-465142 - Copy memAllocNodePtrs_ when cloning graph
Change-Id: I5a0907e59397e71b44db59c44b551b74a6e59ba0


[ROCm/clr commit: d62c1dea72]
2024-08-01 11:05:06 +02:00
Vladana Stojiljkovic 6fcb2c655f SWDEV-475127 - Check if hipBindTextureToArray parameters are null before dereferencing them
Change-Id: Id0173faff0a385d1665194c9033083ef9b2c48b5


[ROCm/clr commit: d7b07b94a0]
2024-08-01 05:01:55 -04:00
Ioannis Assiouras 42e8d3c894 SWDEV-476460 - Fix for a race condition in SysmemPool::Alloc
Change-Id: Ia94709e68b236c9460589963c0f09ec1f481c306


[ROCm/clr commit: 8e137e8702]
2024-08-01 04:22:26 -04:00
Marko Arandjelovic 92ddb8e242 SWDEV-475114 - Prevent segfaults in hipBindTexture
Change-Id: I050f36a5c74a5d4542155040ccce043fee6b73ad


[ROCm/clr commit: b3153a5f41]
2024-07-31 16:57:57 -04:00
Marko Arandjelovic 662ba4701c SWDEV-461791 make memcpy synchronous for D2D if src&dst ptrs have SYNC_MEMOPS attribute
Change-Id: I603081d21e5eb3c73111845e350d8fa2ba5a7733


[ROCm/clr commit: 7d0ff387e9]
2024-07-30 11:46:55 -04:00
Ioannis Assiouras 19d16561a4 SWDEV-472309 - Ensure static maps are destroyed after __hipUnregisterFatBinary
hipDeviceSynchronize called from __hipUnregisterFatBinary
accesses static maps and monitors. This change ensures these objects
are not destroyed before __hipUnregisterFatBinary is called.
Additionally it disables the teardown process for static build.

Change-Id: I46b58641d60efcf6637a8e99cdd786ffe9e2c77d


[ROCm/clr commit: 9b33db9b24]
2024-07-30 10:26:59 -04:00
Sourabh Betigeri f518648183 SWDEV-365151 - Fix fns32 to do 32-bit computation and add a wrapper to ease porting from CUDA
Change-Id: I0b5a9ca11c98f8c1c40cfba7f4e057bfda2d756e


[ROCm/clr commit: 7298b80112]
2024-07-26 11:20:14 -04:00
Saleel Kudchadker abfe135e4f SWDEV-475341 - Fix stream resolution for graph launches
This issue was caused by incorrect usage of the getStream call.
If we get the null stream first, then typecast it and call
getStream again, we lose the advantage of simply passing "nullptr" to
indicate the NULL stream. Thus we enter the waitActiveStream call and add
barriers to sync across streams.

Change-Id: I94dc4e3ec927295b9e1ab6dee4b37d7d3e00b0cc


[ROCm/clr commit: cda4b7db1c]
2024-07-25 19:38:23 -04:00
Sourabh Betigeri 5b0cc86295 SWDEV-475394 - Fix for the return type to be in-line with CUDA
Change-Id: I7c833571d47b4e86a86e4a0095b61947d16ecab6


[ROCm/clr commit: 4fbd7abbb2]
2024-07-25 16:33:33 -04:00
Saleel Kudchadker 16920809d7 SWDEV-301667 - Refactor Blit force env var
Change-Id: I5344ac2e6442cd8f526118e688f1b1412cc5b45a


[ROCm/clr commit: d379f4efd0]
2024-07-25 15:15:10 -04:00
Rahul Manocha de67a2a1dc SWDEV-468039 - FP8 OCP headers
Change-Id: Iecd32c5a0357781da07395d32f894415954b7b22


[ROCm/clr commit: 353f15afa6]
2024-07-25 12:42:23 -04:00
German Andryeyev fffd8d8190 SWDEV-470612 - Avoid copying a vector on the stack
Change-Id: Ia5fc7d1f77d2519dedeedb2c82c26efebb03d1d3


[ROCm/clr commit: c6b8d69158]
2024-07-25 10:09:19 -04:00
German Andryeyev 9d1d3a6493 SWDEV-470612 - Avoid processing internal signals
If only external signals were provided, then just process them
without adding internal signals

Change-Id: Iaefd65d0f8b0a64b9f6a864a9bd73de20a29dfa4


[ROCm/clr commit: 18187cd8fe]
2024-07-25 10:08:16 -04:00
taosang2 47dcfbae6b SWDEV-458943 - make new AMD_MONITOR on
make DEBUG_CLR_USE_STDMUTEX_IN_AMD_MONITOR be true

Change-Id: I1d21378ff462478d3238d71e4e2a1a7d6b9167ac


[ROCm/clr commit: f8598dabb0]
2024-07-24 14:29:27 -04:00
Maneesh Gupta c17cffba63 SWDEV-459583 - Fix codeowners file
Change-Id: Ib03328a7fb13375fa44626a40202b1eeb177b8b5


[ROCm/clr commit: 66943288a5]
2024-07-24 08:20:37 +00:00
German Andryeyev cd3b700351 SWDEV-469602 - Force unaligned memory mode with RDP
Change-Id: I770f3dc8dde49d8e4ecdf5c38819e44df3960bce


[ROCm/clr commit: 1bac09ea20]
2024-07-23 18:31:52 -04:00
cadolphe a82e0fe333 SWDEV-462404 - Fix num_mip_levels for 1D Buffer
Update the num_mip_levels field to better align with the OpenCL specification, which states that mip-mapped images cannot be created for CL_MEM_OBJECT_IMAGE1D_BUFFER images. Added a check for the miplevels value used for the clCreateImage call.

Change-Id: I82a25b83ef0637a877409572b7976d9e4413dfac


[ROCm/clr commit: 21a1c9075a]
2024-07-23 11:16:38 -04:00
Jatin Chaudhary 0193d66679 SWDEV-466747 - add shfl functions in bfloat16
Change-Id: Ide7d7e1d449783cced8867abf43ff45f5bce113a


[ROCm/clr commit: e43176bde9]
2024-07-23 10:51:02 -04:00
taosang2 f33637b1c6 SWDEV-474091 - Fix sporadic crash in streamcallback test
Also in the scope of SWDEV-467540.
Fix sporadic crash in Unit_hipStreamAddCallback_MultipleThreads by
deferring release() of block_command.
The test invokes 1000 threads on the same stream, so there
is a chance of freeing block_command too early in the original code.
By deferring release() of block_command we can make sure block_command
is always valid while calling block_command->notifyCmdQueue().

Change-Id: I31555ee18e6958e34b89f04181867fa4e932a38c


[ROCm/clr commit: e3ef19e22a]
2024-07-23 10:24:10 -04:00
kjayapra-amd 30a4b9e316 SWDEV-460948 - Remove dflock, since kernel arguments are part of command now.
Change-Id: I6b5a229307b41bd24ffa0bc172c64ad1154df474


[ROCm/clr commit: 9c03f85f46]
2024-07-22 16:02:01 -04:00
Anusha GodavarthySurya da132a2e28 SWDEV-468424 - Fix kernelArg mgr release and clear commands after capture
Creation of ReferenceCountedObject will increase reference count by 1.
Clear the commands from Node after capture so that they won't be referenced later.

Change-Id: I1cc4085939cf65218ec2aa2e25ab6d737f7cacd3


[ROCm/clr commit: 6ae5d6896c]
2024-07-22 05:16:12 +00:00
Anusha GodavarthySurya 7985a72073 SWDEV-468424 - hipgraph capture memset node
Capture AQL packets during GraphInstantiation and enqueue AQL packets during graph launch.

Added support to capture single graph memset node.
Capture support for memset node is currently disabled.
Memset capture will be enabled when capture for multiple packets is supported.

Change-Id: I14dfbc41731025cc3a548a730558915def3fa384


[ROCm/clr commit: 346da4bb40]
2024-07-19 23:52:50 -04:00
German Andryeyev 7363b984c1 SWDEV-470585 - Disable double copy in HIP
- HIP path doesn't support resource tracking. Thus, double copy can't be enabled,
because it requires resource tracking.

Change-Id: I0f9c4e185b5b2d2b1abde041fca21bb099db9ccd


[ROCm/clr commit: 4c763e45a1]
2024-07-19 18:32:34 -04:00
kjayapra-amd cf28e2b27a SWDEV-439234 - Implement Set/Get Access APIs in PAL/Windows.
Change-Id: I997c330880da70c5128b187e1ef4d9c449218880


[ROCm/clr commit: 11817b4405]
2024-07-19 10:42:41 -04:00
Jatin Chaudhary 6240b203dc SWDEV-466747 - optimize conversions for bfloat16 operations
Since we made the members public, we can optimize some operations which
do not require redundant conversions to half_raw types.

Change-Id: I31555ef18e695d8e24b89f0418187fa4e932a38a


[ROCm/clr commit: 6a655a77e7]
2024-07-17 18:37:25 -04:00
Maneesh Gupta d007896179 SWDEV-472433 - Update year in license
Change-Id: I61a8cf5f361504989a754ed44247c6c02e857a89


[ROCm/clr commit: 375089876a]
2024-07-17 05:14:20 -04:00
Jatin Chaudhary 2418a0aa68 SWDEV-467414 - add sharedMemPerBlockOptin = sharedMemPerBlock
On some platforms a user can ask for extended shared memory for a
particular kernel in some cases. This feature does not exist in HIP at
the moment, so we are setting it to sharedMemPerBlock, which is the
maximum a user can expect for their kernels.

Change-Id: I81005cf0d1c9fb941e77d34fb8385241ffe5bdd0


[ROCm/clr commit: 4b95e7bc87]
2024-07-16 11:00:29 -04:00
kjayapra-amd 573dfa21e1 SWDEV-460113 - Remove the ufd print.
Change-Id: If0d64ea4b6662493784c040aa1ceffafc8efa1c3


[ROCm/clr commit: a5664fc93f]
2024-07-16 10:39:16 -04:00
kjayapra-amd a064de92b8 SWDEV-464828 - Initial implementation of VMM IPC on PAL/Windows.
Change-Id: I3d5e148fad9105704db6724b00df06bef4fc9d2f


[ROCm/clr commit: e7a7feb273]
2024-07-16 10:38:35 -04:00
Satyanvesh Dittakavi da5bff9464 SWDEV-471935 - Destroy hsa queues with cumask set
Fixes the memory leak with hipExtStreamCreateWithCUMask API.
hsa queues with cumask set are not reused and are created
every time the API is called, but these queues were not being
destroyed during hipStreamDestroy, causing a memory leak.

Change-Id: Ibfbe019bbd73604e98eca80461efe53fa64bb701


[ROCm/clr commit: 191869b252]
2024-07-16 10:02:42 -04:00
Anusha GodavarthySurya b6d82323e9 SWDEV-468424 - Refactor kernel arg
For refactoring of childGraph to have its own graphExec,
kernelArgs needs to be separated from the graphExec object.
All the childNodes that are part of the graph should share the same kernelArg pool.
Otherwise we end up creating multiple device kernel arg memory chunks
for a single graphExec.

Change-Id: I4029a46ebc1fa112d87df64ab1fecbf288fabe5e


[ROCm/clr commit: 35079e834e]
2024-07-16 08:38:44 -04:00
Marko Arandjelovic 2e7581a69a SWDEV-441296 - Align hipTexObjectCreate error handling to CUDA
Change-Id: I9ff01c22f14344e0e82e473104d6930e9fa5ff77


[ROCm/clr commit: 7d3c0c5e10]
2024-07-15 15:51:41 -04:00