rocm-systems

Author	SHA1	Message	Date
German Andryeyev	bb1295bcdf	SWDEV-547108 - Fix compilation errors under Windows (#1085 ) Also correct AQL print under Windows	2025-09-26 09:42:50 -04:00
Rahul Manocha	2bc561d404	SWDEV-557057 - fix datatype for hipMemcpy3DBatchAsync (#1114 ) Co-authored-by: Rahul Manocha <rmanocha@amd.com>	2025-09-25 13:53:23 -07:00
Godavarthy Surya, Anusha	fb72d7f851	SWDEV-524746 - Part-II Add multi device support for hip graph. Updated kernel arg manager for each device (#813 ) - Updated kernel arg manager to support allocating kernel args on multiple devices for single graph. - Updated AQL path to capture on the device where graph node is added. Co-authored-by: Anusha GodavarthySurya <Anusha.GodavarthySurya@amd.com>	2025-09-25 20:38:18 +05:30
MachineTom	4a31affb76	Users/taosang/SWDEV-510994 - Refactor atomics header and tests (#902 ) * SWDEV-550626 - Refactor atomics header and tests 1. Introduce __HIP_ATOMIC_BACKWARD_COMPAT. By default, __HIP_ATOMIC_BACKWARD_COMPAT=1 is defined so that HIP atomic functions keep the old assumptions. Users who want the new behavior (by default, assume memory that is neither fine-grained nor remote) can define __HIP_ATOMIC_BACKWARD_COMPAT=0. 2. Use __HIP_ATOMIC_BACKWARD_COMPAT_MEMORY to replace the original __HIP_FINE_GRAINED_MEMORY in the atomic header, and apply __HIP_FINE_GRAINED_MEMORY to all atomicXXX_system() functions to prevent failures on memory allocated by hipHostMalloc(). 3. Replace HIP_TEST_FINE_GRAINED_MEMORY with HIP_TEST_ATOMIC_BACKWARD_COMPAT_MEMORY in hip-tests. 4. Fix negative test errors. Fix a managed memory test error on memory order. Some other minor changes. As a result, all originally disabled tests are enabled. 5. Add more atomics tests in some cases. 6. Reduce test time in each case; reduce the iteration count to 1 for tests that take too long. 8. Put common code into hip_test_common.hh	2025-09-25 10:58:59 -04:00
Ioannis Assiouras	c53bdb9643	SWDEV-556866 - Added missing include of rocrctx.hpp in rocurilocator (#1094 )	2025-09-24 06:44:02 +01:00
SaleelK	34b9184686	clr: Fix memory corruption for memset nodes (#1068 ) * Detect graph capture and use graph kernelarg memory for FillBuffer pattern	2025-09-23 17:17:33 -07:00
Ioannis Assiouras	97bc3af918	SWDEV-550882 - Add support for hipIpcMemLazyEnablePeerAccess (#817 )	2025-09-23 00:05:51 +01:00
Ajay GunaShekar	0118184d22	SWDEV-554678 - Navi44 on windows (#936 ) * SWDEV-554678 - Navi44 on windows * SWDEV-554678 - Navi44 in palsettings	2025-09-22 08:52:41 -07:00
Shadi Dashmiz	9b350754cc	SWDEV-555084: Fix the python script (#996 ) - no need to manually update the newly generated hip_prof_str.h Signed-off-by: shadi <shadi.dashmiz@amd.com>	2025-09-22 08:41:19 -04:00
MachineTom	25922d08c3	SWDEV-539145 - Return error when ext_fine_grain_pool unavailable (#877 ) Return error when ext_fine_grain_pool is unavailable for hipHostMallocUncached, hipHostAllocUncached and hipExtHostRegisterUncached. Disable related tests on Navi4x where ext_fine_grain_pool is unavailable	2025-09-21 19:25:28 -04:00
MachineTom	c6c2fa212c	SWDEV-1 Fix a VGPR bug (#1000 ) Fix a VGPR bug introduced by a previous patch: SWDEV-546223 - Get image support info from ISA meta	2025-09-21 19:23:12 -04:00
Todd tiantuo Li	7137c7f3d8	SWDEV-541478 - return hipSuccess for hipTexObjectCreate TypePitch2D with zero width or height (#712 )	2025-09-19 20:48:01 -07:00
Stella Laurenzo	2e93b9f6cb	[clr] Only enable comgr dynamic loading if it is a shared lib. (#1065 ) Previously, we enabled dynamic loading mode if BUILD_SHARED_LIBS was set, but this is not correct. We should only load dynamically if the amd_comgr library itself is shared. Background: we have a configuration where we use a statically linked comgr stub in order to achieve LLVM isolation (it dynamically loads comgr and the compiler into a dedicated link namespace) in an otherwise dynamically linked clr.	2025-09-19 16:10:15 -07:00
Jatin Chaudhary	e79eaaa8a5	SWDEV-546287 - Implement hipLibrary load/unload (#975 )	2025-09-19 22:23:49 +01:00
JonathanLichtnerAMD	f31afe1d20	[HIP CLR] Make hipMemPtrGetInfo consistent with malloc and hipMalloc (#1005 ) hipMemPtrGetInfo was returning the error hipErrorInvalidValue if it was called on a nullptr. However, this does not match the malloc convention where a nullptr has size zero; for example, malloc_usable_size() returns zero if called on a nullptr. This commit changes hipMemPtrGetInfo to set the size to zero and return hipSuccess when called with a nullptr. (This also fits with hipMalloc and hipFree usage, since hipMalloc of size zero results in a nullptr, and hipFree of a nullptr is successful.)	2025-09-19 12:53:41 -06:00
Julia Jiang	1c10592be2	SWDEV-546376 - Fix CTS profiling failure (#976 )	2025-09-19 13:38:28 -04:00
German Andryeyev	ea89ddd589	SWDEV-547108 - Add dll loader for Windows build (#1004 ) The ROCR backend build will be enabled by default on Windows. It requires the DLL loader until the ROCR DLL is always available on Windows for every configuration.	2025-09-19 11:25:30 -04:00
Godavarthy Surya, Anusha	538528d1e5	SWDEV-548417 - Fix Memleaks in Graph (#973 ) Commands enqueued on the graph internal stream were not released. Add the stream during graphExec release. Co-authored-by: Rahul Manocha <rmanocha@amd.com>	2025-09-19 17:45:01 +05:30
Godavarthy Surya, Anusha	ce560304a8	SWDEV-548417 - Fix Memleaks in Graph (#713 ) Co-authored-by: Anusha GodavarthySurya <Anusha.GodavarthySurya@amd.com>	2025-09-19 17:39:36 +05:30
Jaydeep	9f5b390db4	SWDEV-555484 - getQueueId uses the hsa_queue's id, which is not necessarily bounded by GPU_MAX_HW_QUEUES; accessing the array beyond its size therefore causes data corruption. (#1040 )	2025-09-19 14:31:27 +05:30
Jaydeep	99613f1009	SWDEV-555484 - Invalidate capturing stream only for null/legacy stream. (#1032 )	2025-09-19 14:31:17 +05:30
German Andryeyev	f3d672d507	SWDEV-552741 - Exclude OCLGetQueueThreadID from ocl tests (#1024 ) The test uses an AMD OCL extension to check the queue thread ID, but there is no queue thread with DD	2025-09-18 18:28:51 -04:00
SaleelK	149dc17c90	clr: Optimize doorbell ring (#1030 ) Lay foundation to batch packets efficiently for graphs Dynamically copy packets with max threshold set with DEBUG_HIP_GRAPH_BATCH_SIZE, if not stagger packet copy with pow2 Default threshold for DEBUG_HIP_GRAPH_BATCH_SIZE is 256 If TS are not collected for a signal for reuse, create a new signal. This can potentially increase signal footprint if the handler doesn't run fast enough.	2025-09-18 15:02:10 -07:00
Ioannis Assiouras	5ac163a811	SWDEV-548770 - Added system scope acquire for all packets in gfx12 (#966 )	2025-09-18 14:33:17 +01:00
lancesix	45b48fb987	SWDEV-555043 - Do not wait on signal if gpu in error state (#1023 ) During a process tear-down we wait on all signals before releasing them: VirtualGPU::HwQueueTracker::~HwQueueTracker() { for (auto& signal : signal_list_) { CpuWaitForSignal(signal); signal->release(); } [...] } In the case where we exit the process after a GPU error that did not cause an abort (ulimit -c == 0), waiting for the signal can be skipped. With the device in the error state, no progress is made, and the signal is probably never going to be modified again: inline bool WaitForSignal(hsa_signal_t signal, bool active_wait = false, bool yield = false) { [...] if (HIP_SKIP_ABORT_ON_GPU_ERROR && amd::Device::IsGPUInError()) { ClPrint(amd::LOG_ERROR, amd::LOG_SIG, "Device not Stable, while waiting for Signal =" "(0x%lx) for %d ns", signal.handle, kTimeout4Secs); return true; } [...] } However, after calling CpuWaitForSignal, when calling "release", we can end up in a signal dtor which also tries to wait on the signal. Because the GPU is in the error state, we never receive the signal, and the process hangs during tear-down. This happens with the ProfilingSignal dtor: ProfilingSignal::~ProfilingSignal() { if (signal_.handle != 0) { if (hsa_signal_load_relaxed(signal_) > 0) { LogError("Runtime shouldn't destroy a signal that is still busy!"); if (hsa_signal_wait_scacquire(signal_, HSA_SIGNAL_CONDITION_LT, kInitSignalValueOne, kUnlimitedWait, HSA_WAIT_STATE_BLOCKED) != 0) { } } hsa_signal_destroy(signal_); } } This dtor should check that the GPU is not in the error state before trying to wait, which is what this patch implements. Bug: SWDEV-555043 Bug: SWDEV-553435 Bug: SWDEV-553679 Bug: SWDEV-555119	2025-09-18 14:32:04 +01:00
Ioannis Assiouras	5c1eebab84	SWDEV-543723 - Change agentInfo parameter in hostAlloc to void* (#995 )	2025-09-18 11:43:15 +01:00
Julia Jiang	5db71b8e4c	SWDEV-551652 - Adding one change in 7.0 changelog (#960 ) Co-authored-by: Istvan Kiss <istvan.kiss@amd.com>	2025-09-17 09:22:26 -07:00
systems-assistant[bot]	0018a4e70c	SWDEV-541623 - cuda parity hipLaunchCooperativeKernelMultiDevice and hipExtLaunchMultiKernelMultiDevice (#415 ) * SWDEV-541623 - cuda parity hipLaunchCooperativeKernelMultiDevice and hipExtLaunchMultiKernelMultiDevice numDevices does not match the system devices * SWDEV-541623 - enable Unit_hipExtLaunchMultiKernelMultiDevice_Negative_MultiKernelSameDevice --------- Co-authored-by: agunashe <ajay.gunashekar@amd.com>	2025-09-17 08:33:59 -07:00
SaleelK	ec5e9673ad	clr: Use current device copy engine for inter-dev copy (#945 ) * For inter-device copies always use the SDMA engine of current device * ROCr uses srcAgent SDMA engine, and it could be a remote device	2025-09-16 12:56:07 -07:00
systems-assistant[bot]	d5fc1b3703	SWDEV-548838 Add local and global fence support for barrier function (#437 ) * SWDEV-548838 Add local and global fence support for barrier function The original barrier function didn't distinguish between local and global scope. There was only __CLK_LOCAL_MEM_FENCE, which triggers both local and global fences. This commit introduces __CLK_LOCAL_MEM_FENCE and __CLK_GLOBAL_MEM_FENCE that properly distinguish the scopes. --------- Co-authored-by: Tim <Tim.Gu@Amd.com> Co-authored-by: systems-assistant[bot] <systems-assistant[bot]@users.noreply.github.com> Co-authored-by: Tim Gu <timgu102@amd.com>	2025-09-16 14:20:57 -04:00
AidanBeltonS	bf662640ee	SWDEV-539805, SWDEV-553860 - Resolve GCC clang ABI mismatch and check vector alignment (#909 ) * SWDEV-539805 - Add checks for vector alignment and size * SWDEV-553860 - Alter alignment for gcc * SWDEV-553860 - Align fallback method * SWDEV-553860 - Alter alignment requirement	2025-09-16 17:10:14 +01:00
harkgill-amd	d1b2b5ed44	Fix grid_group::group_dim to return grid_dim and not block_dim (#823 ) * Fix grid_group::group_dim to return grid_dim and not block_dim * Add unit test for grid_group.group_dim() * Fix unit test errors * Skip group_dim() assertions for base_type test	2025-09-15 09:42:55 -04:00
systems-assistant[bot]	c85200fc42	SWDEV-541096 - add hipEventWaitDefault and hipEventWaitExternal flags (#507 ) Co-authored-by: Li, Todd tiantuo <Toddtiantuo.Li@amd.com>	2025-09-11 14:50:55 -07:00
Jatin Chaudhary	3742814d82	SWDEV-553757 - add __HIP__ and __clang__ check for __shfl functions (#872 )	2025-09-11 21:57:39 +01:00
systems-assistant[bot]	3e1e2408a9	SWDEV-541427 - Fix forked stream joining to parent stream that is not origin stream (BeginCaptureStream) (#449 ) Co-authored-by: Anusha GodavarthySurya <Anusha.GodavarthySurya@amd.com> Co-authored-by: systems-assistant[bot] <systems-assistant[bot]@users.noreply.github.com> Co-authored-by: Godavarthy Surya, Anusha <agodavar@amd.com>	2025-09-11 16:57:33 +05:30
systems-assistant[bot]	0647cf1d28	SWDEV-542700 - Return an error if stream capture is attempted on the null stream while a stream capture is active. (#450 ) Co-authored-by: Anusha GodavarthySurya <Anusha.GodavarthySurya@amd.com> Co-authored-by: systems-assistant[bot] <systems-assistant[bot]@users.noreply.github.com> Co-authored-by: Godavarthy Surya, Anusha <agodavar@amd.com>	2025-09-11 16:57:22 +05:30
Ioannis Assiouras	35629e433d	SWDEV-546146 - Added support for hipMemLocationTypeHost in hipMemSetAccess (#682 )	2025-09-10 23:06:20 +01:00
Joseph Macaranas	dd1a2dbf8a	Fix LICENSE path for opencl build (#939 )	2025-09-10 17:54:22 -04:00
Julia Jiang	8bc97e3273	SWDEV-551652 - Adding changelog for HIP 7.0.2 (#849 )	2025-09-10 09:22:40 -07:00
Joseph Macaranas	696881ae82	LICENSE clean up (#919 ) - Clean up and standardization of MIT licenses after discussion with legal team. - Update README.md with blurb for top-level files. - MIT License explicitly mentioned for relevant projects. - Removal of years. - Copyright attribution should be to `Advanced Micro Devices, Inc.` and not `AMD ROCm(TM) Software` - Removal of `All rights reserved.` - Reduce line width of the text for readability. - Add clear visual separators for additional licenses. - Convert text files to markdown format for aforementioned separators. - Update build scripts to point to renamed files. - Fixed SMI doc references Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>	2025-09-10 12:06:14 -04:00
Godavarthy Surya, Anusha	1be5c9870a	SWDEV-524745 - Part-I Add multi device support for hip graph. Update nodes with DevId. (#812 ) - The graph nodes have been updated to capture the device ID from the capture stream or the current device when explicitly added. - Update the device ID for the memcpy node, ensuring that the device where the memory is allocated is taken into account for H2D and D2H pinned operations. Co-authored-by: Anusha GodavarthySurya <Anusha.GodavarthySurya@amd.com>	2025-09-10 11:35:25 +05:30
systems-assistant[bot]	75602772aa	SWDEV-538606 - Handle updateStreams from multiple threads (#505 )	2025-09-10 11:24:52 +05:30
SaleelK	c8e91b3f3e	clr: Fix condition for taking shader path (#884 ) * SWDEV-551080 * Fix condition for taking shader path, the size check was moved incorrectly * Also account for a bitmask returned for preferred engines	2025-09-09 13:13:29 -07:00
systems-assistant[bot]	d341a6263a	Put safeguard to avoid defining target more than once authored-by: Mathieu Taillefumier <mathieu.taillefumier@free.fr>	2025-09-09 13:51:15 +01:00
Satyanvesh Dittakavi	85065dab32	SWDEV-550521 - Add the JIT options for HIPRTC linker APIs (#762 ) * SWDEV-550521 - Add the JIT options for HIPRTC linker APIs * Address review comments about using C++ datatypes	2025-09-09 12:24:08 +05:30
Ioannis Assiouras	4c6fce8ba0	SWDEV-546223 - Remove comgr query for image support from windows path (#861 )	2025-09-09 07:54:48 +05:30
SaleelK	e197aa83ba	SWDEV-543723 - Execute permission for kernArg buf (#728 ) - Refactor deviceLocalAlloc arguments - Refactor hostAlloc code to have a cleaner interface - The kernel args buffer needs the execute flag set, because CP enforces this on certain newer HW.	2025-09-08 12:21:30 -07:00
vstojilj	f17e332fe0	Release graph if hipStreamEndCapture fails (#738 )	2025-09-08 16:32:03 +02:00
Todd tiantuo Li	c8ecf77a94	Update dispatch table to move 7.1 new APIs under HIP_RUNTIME_API_TABLE_STEP_VERSION 14 (#790 )	2025-09-05 14:14:43 -07:00
Jimbo	3d9d35a1f8	SWDEV-553375 - Allow hipMemAllocationTypeUncached in hipMemGetAllocationGranularity (#847 )	2025-09-05 10:31:20 -04:00


13076 Commits