Commit Graph

13076 Commits

Author SHA1 Message Date
German Andryeyev bb1295bcdf SWDEV-547108 - Fix compilation errors under Windows (#1085)
Also correct AQL print under Windows
2025-09-26 09:42:50 -04:00
Rahul Manocha 2bc561d404 SWDEV-557057 - fix for datatype for hipMemcpy3DBatchAsync (#1114)
Co-authored-by: Rahul Manocha <rmanocha@amd.com>
2025-09-25 13:53:23 -07:00
Godavarthy Surya, Anusha fb72d7f851 SWDEV-524746 - Part-II Add multi device support for hip graph. Updated kernel arg manager for each device (#813)
- Updated kernel arg manager to support allocating kernel args on multiple devices for single graph.
- Updated AQL path to capture on the device where graph node is added.

Co-authored-by: Anusha GodavarthySurya <Anusha.GodavarthySurya@amd.com>
2025-09-25 20:38:18 +05:30
MachineTom 4a31affb76 Users/taosang/SWDEV-510994 - Refactor atomics header and tests (#902)
* SWDEV-550626 - Refactor atomics header and tests

1. Introduce __HIP_ATOMIC_BACKWARD_COMPAT.
By default we define __HIP_ATOMIC_BACKWARD_COMPAT=1 so that
HIP atomic functions keep the old assumptions. Users who
want the new behavior, which assumes no fine-grained and
no remote memory by default, can define
__HIP_ATOMIC_BACKWARD_COMPAT=0.

2. Use __HIP_ATOMIC_BACKWARD_COMPAT_MEMORY to replace the
original __HIP_FINE_GRAINED_MEMORY in the atomic header,
and apply __HIP_FINE_GRAINED_MEMORY to all
atomicXXX_system() functions to prevent failures on memory
allocated by hipHostMalloc().

3. Replace HIP_TEST_FINE_GRAINED_MEMORY with
HIP_TEST_ATOMIC_BACKWARD_COMPAT_MEMORY in hip-tests.

4. Fix negative test errors.
    Fix managed memory test error on memory order.
    Make some other minor changes.
    As a result, all originally disabled tests are enabled.

5. Add more atomics tests in some cases.

6. Reduce test time in each case.
     Reduce iteration number to 1 for tests that take too long.

8. Put common code into hip_test_common.hh
2025-09-25 10:58:59 -04:00
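
The opt-out switch described in item 1 of the atomics commit above can be pictured with a small plain C++ sketch. `DEMO_ATOMIC_BACKWARD_COMPAT` and `demo_assumes_fine_grained` are hypothetical stand-ins, not names or logic from the actual HIP header; the sketch only assumes that a macro defaulting to 1 keeps the old assumptions unless a build defines it to 0.

```cpp
// Hypothetical stand-in for a compatibility switch; not the real HIP macro.
// A build could pass -DDEMO_ATOMIC_BACKWARD_COMPAT=0 to opt in to the new behavior.
#ifndef DEMO_ATOMIC_BACKWARD_COMPAT
#define DEMO_ATOMIC_BACKWARD_COMPAT 1
#endif

// True when the old assumptions stay in effect, that is, memory
// may be fine-grained or remote (assumed meaning for this sketch).
constexpr bool demo_assumes_fine_grained() {
  return DEMO_ATOMIC_BACKWARD_COMPAT != 0;
}
```

With no definition supplied on the command line, `demo_assumes_fine_grained()` evaluates to true, which matches the backward-compatible default the commit describes.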
Ioannis Assiouras c53bdb9643 SWDEV-556866 - Added missing include of rocrctx.hpp in rocurilocator (#1094) 2025-09-24 06:44:02 +01:00
SaleelK 34b9184686 clr: Fix memory corruption for memset nodes (#1068)
* Detect graph capture and use graph kernelarg memory for FillBuffer pattern
2025-09-23 17:17:33 -07:00
Ioannis Assiouras 97bc3af918 SWDEV-550882 - Add support for hipIpcMemLazyEnablePeerAccess (#817) 2025-09-23 00:05:51 +01:00
Ajay GunaShekar 0118184d22 SWDEV-554678 - Navi44 on windows (#936)
* SWDEV-554678 - Navi44 on windows

* SWDEV-554678 - Navi44 in palsettings
2025-09-22 08:52:41 -07:00
Shadi Dashmiz 9b350754cc SWDEV-555084: Fix the python script (#996)
- No need to manually update the newly generated hip_prof_str.h

Signed-off-by: shadi <shadi.dashmiz@amd.com>
2025-09-22 08:41:19 -04:00
MachineTom 25922d08c3 SWDEV-539145 - Return error when ext_fine_grain_pool unavailable (#877)
Return error when ext_fine_grain_pool is unavailable for
hipHostMallocUncached, hipHostAllocUncached and
hipExtHostRegisterUncached.
Disable related tests on Navi4x where
ext_fine_grain_pool is unavailable
2025-09-21 19:25:28 -04:00
MachineTom c6c2fa212c SWDEV-1 Fix a bug of VGPRs (#1000)
Fix a VGPR bug introduced by a previous patch:
SWDEV-546223 - Get image support info from ISA meta
2025-09-21 19:23:12 -04:00
Todd tiantuo Li 7137c7f3d8 SWDEV-541478 - return hipSuccess for hipTexObjectCreate TypePitch2D with zero width or height (#712) 2025-09-19 20:48:01 -07:00
Stella Laurenzo 2e93b9f6cb [clr] Only enable comgr dynamic loading if it is a shared lib. (#1065)
Previously, we enabled dynamic loading mode if BUILD_SHARED_LIBS was set, but this is not correct. We should only load dynamically if the amd_comgr library itself is shared.

Background: we have a configuration where we use a statically linked comgr stub in order to achieve LLVM isolation (it dynamically loads the comgr and compiler into a dedicated link namespace) in an otherwise dynamically linked clr.
2025-09-19 16:10:15 -07:00
Jatin Chaudhary e79eaaa8a5 SWDEV-546287 - Implement hipLibrary load/unload (#975) 2025-09-19 22:23:49 +01:00
JonathanLichtnerAMD f31afe1d20 [HIP CLR] Make hipMemPtrGetInfo consistent with malloc and hipMalloc (#1005)
hipMemPtrGetInfo was returning the error hipErrorInvalidValue if it
was called on a nullptr.  However, this does not match the malloc
convention where a nullptr has size zero;  for example,
malloc_usable_size() returns zero if called on a nullptr.

This commit changes hipMemPtrGetInfo to set the size to zero and
return hipSuccess when called with a nullptr.  (This also fits with
hipMalloc and hipFree usage, since hipMalloc of size zero results in a
nullptr, and hipFree of a nullptr is successful.)
2025-09-19 12:53:41 -06:00
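
The nullptr convention described in the hipMemPtrGetInfo commit above can be sketched in plain C++. This is a minimal illustration under stated assumptions, not the runtime's implementation: `DemoStatus`, `DemoAllocTable`, and `demo_ptr_get_info` are hypothetical names, and the allocation table is a simple map standing in for whatever bookkeeping the runtime uses.

```cpp
#include <cstddef>
#include <unordered_map>

// Hypothetical status codes, for illustration only.
enum class DemoStatus { Success, InvalidValue };

// Maps a base pointer to the size recorded when it was allocated.
using DemoAllocTable = std::unordered_map<const void*, std::size_t>;

// Reports the size of an allocation. A nullptr is treated as a valid
// zero-sized allocation, following the malloc convention the commit cites.
DemoStatus demo_ptr_get_info(const DemoAllocTable& table, const void* ptr,
                             std::size_t* size) {
  if (size == nullptr) return DemoStatus::InvalidValue;
  if (ptr == nullptr) {
    *size = 0;
    return DemoStatus::Success;
  }
  auto it = table.find(ptr);
  if (it == table.end()) return DemoStatus::InvalidValue;  // unknown pointer
  *size = it->second;
  return DemoStatus::Success;
}
```

Treating nullptr as a zero-sized allocation lets callers query any pointer that an allocate/free pair can legitimately produce without special-casing the empty allocation.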
Julia Jiang 1c10592be2 SWDEV-546376 - Fix CTS profiling failure (#976) 2025-09-19 13:38:28 -04:00
German Andryeyev ea89ddd589 SWDEV-547108 - Add dll loader for Windows build (#1004)
The build of the ROCR backend will be enabled by default on Windows.
It requires the DLL loader until the ROCR DLL is always available on Windows for any configuration.
2025-09-19 11:25:30 -04:00
Godavarthy Surya, Anusha 538528d1e5 SWDEV-548417 - Fix Memleaks in Graph (#973)
Commands enqueued on the graph's internal stream were not released; release the stream during graphExec release.

Co-authored-by: Rahul Manocha <rmanocha@amd.com>
2025-09-19 17:45:01 +05:30
Godavarthy Surya, Anusha ce560304a8 SWDEV-548417 - Fix Memleaks in Graph (#713)
Co-authored-by: Anusha GodavarthySurya <Anusha.GodavarthySurya@amd.com>
2025-09-19 17:39:36 +05:30
Jaydeep 9f5b390db4 SWDEV-555484 - getQueueId uses the hsa_queue id, which is not necessarily bounded by GPU_MAX_HW_QUEUES, so indexing the array past its size causes data corruption. (#1040) 2025-09-19 14:31:27 +05:30
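
The class of bug named in the getQueueId commit above, an unbounded id used to index a fixed-size array, can be illustrated with a short plain C++ sketch. The names `kDemoMaxHwQueues` and `DemoQueueStats` are hypothetical, and the bounds check shown here is only one possible guard; the commit does not say how the actual fix works.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical table size; stands in for a limit such as GPU_MAX_HW_QUEUES.
constexpr std::size_t kDemoMaxHwQueues = 4;

// Per-queue counters stored in a fixed-size array.
struct DemoQueueStats {
  std::array<std::uint32_t, kDemoMaxHwQueues> dispatches{};

  // Records one dispatch for the given queue id. Ids at or above the table
  // size are rejected instead of writing past the end of the array.
  bool record(std::uint64_t queue_id) {
    if (queue_id >= kDemoMaxHwQueues) return false;
    ++dispatches[static_cast<std::size_t>(queue_id)];
    return true;
  }
};
```

Without the range check, an id such as 9 would write outside `dispatches` and silently corrupt neighboring memory, which is the failure mode the commit title describes.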
Jaydeep 99613f1009 SWDEV-555484 - Invalidate capturing stream only for null/legacy stream. (#1032) 2025-09-19 14:31:17 +05:30
German Andryeyev f3d672d507 SWDEV-552741 - Exclude OCLGetQueueThreadID from ocl tests (#1024)
The test uses an AMD OCL extension to check the queue thread ID, but there is no queue thread with DD.
2025-09-18 18:28:51 -04:00
SaleelK 149dc17c90 clr: Optimize doorbell ring (#1030)
* Lay foundation to batch packets efficiently for graphs
* Dynamically copy packets with max threshold set with
  DEBUG_HIP_GRAPH_BATCH_SIZE; if not, stagger packet copy with pow2
* Default threshold for DEBUG_HIP_GRAPH_BATCH_SIZE is 256
* If TS are not collected for a signal for reuse, create a new signal.
  This can potentially increase signal footprint if the handler doesn't run
  fast enough.
2025-09-18 15:02:10 -07:00
Ioannis Assiouras 5ac163a811 SWDEV-548770 - Added system scope acquire for all packets in gfx12 (#966) 2025-09-18 14:33:17 +01:00
lancesix 45b48fb987 SWDEV-555043 - Do not wait on signal if gpu in error state (#1023)
During a process tear-down we wait on all signals before releasing them:

    VirtualGPU::HwQueueTracker::~HwQueueTracker() {
      for (auto& signal : signal_list_) {
        CpuWaitForSignal(signal);
        signal->release();
      }
      [...]
    }

In the case where we exit the process after a GPU error that did not
cause an abort (ulimit -c == 0), waiting for the signal can be skipped.
With the device in the error state, no progress is made, and the signal
is probably never going to be modified again:

    inline bool WaitForSignal(hsa_signal_t signal, bool active_wait = false, bool yield = false) {
          [...]
          if (HIP_SKIP_ABORT_ON_GPU_ERROR && amd::Device::IsGPUInError()) {
            ClPrint(amd::LOG_ERROR, amd::LOG_SIG,
                    "Device not Stable, while waiting for Signal ="
                    "(0x%lx) for %d ns",
                    signal.handle, kTimeout4Secs);
            return true;
          }
          [...]
    }

However, after calling CpuWaitForSignal, when calling "release", we can
end up in a signal dtor which also tries to wait on the signal.  Because
the GPU is in the error state, we never receive the signal, and hang the
process during tear down.  This happens with the ProfilingSignal dtor:

    ProfilingSignal::~ProfilingSignal() {
      if (signal_.handle != 0) {
        if (hsa_signal_load_relaxed(signal_) > 0) {
          LogError("Runtime shouldn't destroy a signal that is still busy!");
          if (hsa_signal_wait_scacquire(signal_, HSA_SIGNAL_CONDITION_LT, kInitSignalValueOne,
                                        kUnlimitedWait, HSA_WAIT_STATE_BLOCKED) != 0) {
          }
        }
        hsa_signal_destroy(signal_);
      }
    }

This dtor should check that the GPU is not in the error state before
trying to wait, which is what this patch implements.

Bug: SWDEV-555043
Bug: SWDEV-553435
Bug: SWDEV-553679
Bug: SWDEV-555119
2025-09-18 14:32:04 +01:00
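
The guard described in the commit above (skip the wait in the signal dtor when the GPU is in the error state) can be sketched with mock types in plain C++. This is not the patch itself: `demo_gpu_in_error` and `DemoSignal` are hypothetical stand-ins for `amd::Device::IsGPUInError()` and `ProfilingSignal`, and the blocking HSA wait is replaced by a counter so the behavior can be observed.

```cpp
#include <atomic>

// Hypothetical stand-in for the runtime's global "GPU is in error" state.
inline std::atomic<bool>& demo_gpu_in_error() {
  static std::atomic<bool> flag{false};
  return flag;
}

// Mock signal whose destructor waits only when waiting can succeed.
class DemoSignal {
 public:
  DemoSignal(int value, int* waits) : value_(value), waits_(waits) {}

  ~DemoSignal() {
    // A busy signal is waited on only if the GPU can still make progress;
    // with the device in the error state the wait would never return.
    if (value_ > 0 && !demo_gpu_in_error().load()) {
      wait_until_idle();
    }
  }

 private:
  // A real implementation would block here; the mock just counts the call.
  void wait_until_idle() {
    ++*waits_;
    value_ = 0;
  }

  int value_;
  int* waits_;
};
```

Destroying a busy `DemoSignal` increments the wait counter while the flag is false and leaves it untouched once the flag is set, which mirrors the tear-down behavior the commit message aims for.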
Ioannis Assiouras 5c1eebab84 SWDEV-543723 - Change agentInfo parameter in hostAlloc to void* (#995) 2025-09-18 11:43:15 +01:00
Julia Jiang 5db71b8e4c SWDEV-551652 - Adding one change in 7.0 changelog (#960)
Co-authored-by: Istvan Kiss <istvan.kiss@amd.com>
2025-09-17 09:22:26 -07:00
systems-assistant[bot] 0018a4e70c SWDEV-541623 - cuda parity hipLaunchCooperativeKernelMultiDevice and hipExtLaunchMultiKernelMultiDevice (#415)
* SWDEV-541623 - cuda parity hipLaunchCooperativeKernelMultiDevice and hipExtLaunchMultiKernelMultiDevice

numDevices does not match the system devices

* SWDEV-541623 -  enable Unit_hipExtLaunchMultiKernelMultiDevice_Negative_MultiKernelSameDevice

---------

Co-authored-by: agunashe <ajay.gunashekar@amd.com>
2025-09-17 08:33:59 -07:00
SaleelK ec5e9673ad clr: Use current device copy engine for inter-dev copy (#945)
* For inter-device copies always use the SDMA engine of current device
* ROCr uses srcAgent SDMA engine, and it could be a remote device
2025-09-16 12:56:07 -07:00
systems-assistant[bot] d5fc1b3703 SWDEV-548838 Add local and global fence support for barrier function (#437)
* SWDEV-548838 Add local and global fence support for barrier function

The original barrier function didn't distinguish between local and global scope. There was only __CLK_LOCAL_MEM_FENCE, which triggered both local and global fences. This commit introduces __CLK_LOCAL_MEM_FENCE and __CLK_GLOBAL_MEM_FENCE that properly distinguish the scopes.

---------

Co-authored-by: Tim <Tim.Gu@Amd.com>
Co-authored-by: systems-assistant[bot] <systems-assistant[bot]@users.noreply.github.com>
Co-authored-by: Tim Gu <timgu102@amd.com>
2025-09-16 14:20:57 -04:00
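
The scope split described in the barrier commit above can be sketched as a pair of bit flags in plain C++. The flag values and the names `kDemoLocalMemFence`, `kDemoGlobalMemFence`, `DemoFencePlan`, and `demo_plan_barrier` are hypothetical; the real constants live in the device headers and may be defined differently.

```cpp
#include <cstdint>

// Hypothetical flag values for this sketch only.
constexpr std::uint32_t kDemoLocalMemFence = 1u << 0;
constexpr std::uint32_t kDemoGlobalMemFence = 1u << 1;

// Which fences a barrier call should issue.
struct DemoFencePlan {
  bool local;
  bool global;
};

// Maps the requested flags to fences, one scope per bit, so a local-only
// request no longer implies a global fence.
constexpr DemoFencePlan demo_plan_barrier(std::uint32_t flags) {
  return DemoFencePlan{(flags & kDemoLocalMemFence) != 0,
                       (flags & kDemoGlobalMemFence) != 0};
}
```

A caller that needs both scopes passes `kDemoLocalMemFence | kDemoGlobalMemFence`, while a caller that only synchronizes local memory avoids the cost of the global fence.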
AidanBeltonS bf662640ee SWDEV-539805, SWDEV-553860 - Resolve GCC clang ABI mismatch and check vector alignment (#909)
* SWDEV-539805 - Add checks for vector alignment and size

* SWDEV-553860 - Alter alignment for gcc

* SWDEV-553860 - Align fallback method

* SWDEV-553860 - Alter alignment requirement
2025-09-16 17:10:14 +01:00
harkgill-amd d1b2b5ed44 Fix grid_group::group_dim to return grid_dim and not block_dim (#823)
* Fix grid_group::group_dim to return grid_dim and not block_dim

* Add unit test for grid_group.group_dim()

* Fix unit test errors

* Skip group_dim() assertions for base_type test
2025-09-15 09:42:55 -04:00
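
The group_dim fix in the commit above can be pictured with a host-side model in plain C++. `DemoDim3` and `DemoGridGroup` are hypothetical types for illustration, not the cooperative groups API; the sketch assumes only what the commit title states, namely that a grid-level group reports the grid dimensions rather than the block dimensions.

```cpp
// Hypothetical three-component extent, similar in spirit to dim3.
struct DemoDim3 {
  unsigned x, y, z;
};

// Hypothetical model of a grid-wide cooperative group.
struct DemoGridGroup {
  DemoDim3 grid_dim;   // number of blocks in the grid
  DemoDim3 block_dim;  // number of threads in each block

  // For a grid-level group, the group dimensions are the grid dimensions.
  DemoDim3 group_dim() const { return grid_dim; }
};
```

For a launch of 8 x 2 x 1 blocks with 64 threads each, `group_dim()` in this model yields 8 x 2 x 1, whereas the pre-fix behavior described in the commit would have reported the 64 x 1 x 1 block shape.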
systems-assistant[bot] c85200fc42 SWDEV-541096 - add hipEventWaitDefault and hipEventWaitExternal flags (#507)
Co-authored-by: Li, Todd tiantuo <Toddtiantuo.Li@amd.com>
2025-09-11 14:50:55 -07:00
Jatin Chaudhary 3742814d82 SWDEV-553757 - add __HIP__ and __clang__ check for __shfl functions (#872) 2025-09-11 21:57:39 +01:00
systems-assistant[bot] 3e1e2408a9 SWDEV-541427 - Fix forked stream joining to parent stream that is not origin stream (BeginCaptureStream) (#449)
Co-authored-by: Anusha GodavarthySurya <Anusha.GodavarthySurya@amd.com>
Co-authored-by: systems-assistant[bot] <systems-assistant[bot]@users.noreply.github.com>
Co-authored-by: Godavarthy Surya, Anusha <agodavar@amd.com>
2025-09-11 16:57:33 +05:30
systems-assistant[bot] 0647cf1d28 SWDEV-542700 - Return an error if stream capture is attempted on the null stream while a stream capture is active. (#450)
Co-authored-by: Anusha GodavarthySurya <Anusha.GodavarthySurya@amd.com>
Co-authored-by: systems-assistant[bot] <systems-assistant[bot]@users.noreply.github.com>
Co-authored-by: Godavarthy Surya, Anusha <agodavar@amd.com>
2025-09-11 16:57:22 +05:30
Ioannis Assiouras 35629e433d SWDEV-546146 - Added support for hipMemLocationTypeHost in hipMemSetAccess (#682) 2025-09-10 23:06:20 +01:00
Joseph Macaranas dd1a2dbf8a Fix LICENSE path for opencl build (#939) 2025-09-10 17:54:22 -04:00
Julia Jiang 8bc97e3273 SWDEV-551652 - Adding changelog for HIP 7.0.2 (#849) 2025-09-10 09:22:40 -07:00
Joseph Macaranas 696881ae82 LICENSE clean up (#919)
- Clean up and standardization of MIT licenses after discussion with legal team.
- Update README.md with blurb for top-level files.
- MIT License explicitly mentioned for relevant projects.
- Removal of years.
- Copyright attribution should be to `Advanced Micro Devices, Inc.` and not `AMD ROCm(TM) Software`
- Removal of `All rights reserved.`
- Reduce line width of the text for readability.
- Add clear visual separators for additional licenses.
- Convert text files to markdown format for aforementioned separators.
- Update build scripts to point to renamed files.
- Fixed SMI doc references

Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
2025-09-10 12:06:14 -04:00
Godavarthy Surya, Anusha 1be5c9870a SWDEV-524745 - Part-I Add multi device support for hip graph. Update nodes with DevId. (#812)
- The graph nodes have been updated to capture the device ID from the capture stream or the current device when explicitly added.
- Update the device ID for the memcpy node, ensuring that the device where the memory is allocated is taken into account for H2D and D2H pinned operations.

Co-authored-by: Anusha GodavarthySurya <Anusha.GodavarthySurya@amd.com>
2025-09-10 11:35:25 +05:30
systems-assistant[bot] 75602772aa SWDEV-538606 - Handle updateStreams from multiple threads (#505) 2025-09-10 11:24:52 +05:30
SaleelK c8e91b3f3e clr: Fix condition for taking shader path (#884)
* SWDEV-551080
* Fix condition for taking shader path; the size check was moved
  incorrectly
* Also account for a bitmask returned for preferred engines
2025-09-09 13:13:29 -07:00
systems-assistant[bot] d341a6263a Put safeguard to avoid defining target more than once
authored-by: Mathieu Taillefumier <mathieu.taillefumier@free.fr>
2025-09-09 13:51:15 +01:00
Satyanvesh Dittakavi 85065dab32 SWDEV-550521 - Add the JIT options for HIPRTC linker APIs (#762)
* SWDEV-550521 - Add the JIT options for HIPRTC linker APIs

* Address review comments about using C++ datatypes
2025-09-09 12:24:08 +05:30
Ioannis Assiouras 4c6fce8ba0 SWDEV-546223 - Remove comgr query for image support from windows path (#861) 2025-09-09 07:54:48 +05:30
SaleelK e197aa83ba SWDEV-543723 - Execute permission for kernArg buf (#728)
- Refactor deviceLocalAlloc arguments
- Refactor hostAlloc code, have cleaner interface
- The kernel args buffer needs to have the execute flag set, as CP enforces this on
  certain newer HW.
2025-09-08 12:21:30 -07:00
vstojilj f17e332fe0 Release graph if hipStreamEndCapture fails (#738) 2025-09-08 16:32:03 +02:00
Todd tiantuo Li c8ecf77a94 Update dispatch table to move 7.1 new APIs under HIP_RUNTIME_API_TABLE_STEP_VERSION 14 (#790) 2025-09-05 14:14:43 -07:00
Jimbo 3d9d35a1f8 SWDEV-553375 - Allow hipMemAllocationTypeUncached in hipMemGetAllocationGranularity (#847) 2025-09-05 10:31:20 -04:00