The runtime can't assign internal HSA signals to HIP events, because
the HIP application can destroy the HIP stream, or the signal may be
reused internally. Switch to global HSA signals for HIP events.
Change-Id: Ieaea2d6b039e492b2e7c5112782a8f4e601e50a1
If an AMD event contains a reference to a HW event, the runtime can
check or wait on the HW event directly. The CPU status update will
occur later, after the HSA signal callback, but that does not affect
the result.
Change-Id: I591391a953bbdba6a25ac07e2cd98aeb17cd4596
Switch HSA_AMD_SVM_ATTRIB_READ_ONLY to
HSA_AMD_SVM_ATTRIB_READ_MOSTLY to match CUDA. The new attribute
was just exposed in ROCr/KFD.
Change-Id: I2ee522d33c347ba52a4e272d2cd7f67960490cf7
Use HSA_AMD_SVM_ATTRIB_AGENT_ACCESSIBLE flag for the initial
allocation instead of HSA_AMD_SVM_ATTRIB_AGENT_ACCESSIBLE_IN_PLACE.
Change-Id: Ia52fe205563df1ea916dc2dc81e749e11c16f83d
hipIpcOpenMemHandle should return a device pointer that corresponds
to the base pointer of the original allocation, even if a pointer at
an offset into that allocation was passed to hipIpcGetMemHandle.
Change-Id: I99c0553e8c67c15b5fed880b6a4c74bce39c3aee
In addition to removing the HSAIL logic from the ROCm backend, guard all
of the HSAIL includes in the common layer behind the WITH_COMPILER_LIB
define. This avoids including HSAIL headers when building without
HSAIL support.
In common logic replace the use of the aclType enum with the new
Program::file_type_t enum. This is essentially a local copy of the HSAIL
enum to avoid including any HSAIL headers.
Change-Id: Ica0651d1b29dfccc255cc584eb82a5cb35e1b520
HIP requires the AccessedBy query to return results for all devices,
but ROCr can process only one device per query. Hence send a query for
each available device and accumulate the results in the runtime.
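The accumulation described above can be sketched roughly as follows.
This is only an illustration: QueryOne and accessedBy are hypothetical
names, and the callback stands in for whatever single-device ROCr
query the runtime actually issues.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for a single-device query: returns true when the
// memory range is marked as accessed by the given device.
using QueryOne = std::function<bool(int deviceId)>;

// Issues one query per available device and accumulates the devices that
// report access, which is the shape of result the HIP query expects.
inline std::vector<int> accessedBy(const std::vector<int>& devices,
                                   const QueryOne& queryOne) {
  std::vector<int> result;
  for (int id : devices) {
    if (queryOne(id)) {
      result.push_back(id);
    }
  }
  return result;
}
```

The cost is one query per device rather than one query in total, which
is the trade-off the one-device-per-query limit imposes.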
Change-Id: I082f9adb8e31c775a8ad1bf7a5af37440ef4bd16
This change unifies the hostcall implementation for all the backends,
by pushing the common logic to the device layer. This is done by
replacing the use of hsa_signal_t with device::Signal (a light wrapper
around it).
Change-Id: I7b6fca7930b5a0b199da5d85e2e048354cc04e7b
- Avoid GPU wait on the marker submission and update the command
batch after HSA signal callback upon HSA barrier completion.
Change-Id: I5c1c97212aefc2ae4b99aa9e2a81627ee9a38c1c
Remove targetId_, gfxipMajor_, gfxipMinor_ and gfxipStepping_ from
device::Info as they are now available in device::Isa.
Change-Id: I381b1d4798ebf50655740e004a01ac7f86dbf668
- Correct GSL path to report targets using the TargetID syntax.
- Correct GSL path to check compatibility of code objects when
loading.
- Add the concept of a device ISA and create a registry used by ROCm,
PAL and GSL.
- Support XNACK and SRAMECC target features consistently for PAL and ROCm.
- Correct logic for NullDevices and asserts to avoid memory corruption.
- Allow all NullDevices to be created for HIP.
- Numerous other code improvements.
Change-Id: I40abf3d2b22249c1492d1af5919665f8184f4e0e
Use strncpy instead of strcpy to ensure the arrays will not be
overflowed. Copy at most one byte less than the size of the char array
so that a NUL character remains at the end even if the copy is
truncated, provided the original object is zeroed memory.
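A minimal sketch of the pattern, assuming a zero-initialized struct
with a fixed-size char field. DeviceInfoSketch and setName are
hypothetical names used for illustration, not the runtime's types.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical record with a fixed-size name field.
struct DeviceInfoSketch {
  char name_[8];
};

// Copies src into the fixed-size field, truncating if needed. At most
// sizeof(name_) - 1 bytes are written, so the last byte is never touched
// and stays NUL as long as the struct was zeroed beforehand.
inline void setName(DeviceInfoSketch& info, const char* src) {
  std::strncpy(info.name_, src, sizeof(info.name_) - 1);
}
```

Note that strncpy alone does not append a terminator when the source
is longer than the count, which is why the zeroed destination matters.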
Change-Id: I00f7679630cf28dcb9a51cb0aba2810a4f4c72b9
roc::Settings and pal::Settings are derived classes. Allocate them as
their derived class, then assign that to the base class member to avoid
the need for a static_cast.
Use device::settings to access the Settings consistently.
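The allocation pattern can be sketched as below. The class layout here
is an assumption made for illustration (the base is taken to be
device::Settings, and create(), init(), DeviceSketch and the member
names are hypothetical); only the allocate-as-derived, store-as-base
idea comes from this change.

```cpp
#include <cassert>
#include <memory>
#include <utility>

namespace device {
// Assumed base class holding backend-independent settings.
struct Settings {
  virtual ~Settings() = default;
  bool common_ = false;
};
}  // namespace device

namespace roc {
// Assumed derived class with backend-specific setup.
struct Settings : device::Settings {
  bool create() {
    rocOnly_ = 1;
    common_ = true;
    return true;
  }
  int rocOnly_ = 0;
};
}  // namespace roc

class DeviceSketch {
 public:
  bool init() {
    // Allocate as the derived type so derived-only calls need no cast.
    auto s = std::make_unique<roc::Settings>();
    if (!s->create()) return false;
    // Implicit derived-to-base conversion on assignment.
    settings_ = std::move(s);
    return true;
  }
  const device::Settings& settings() const { return *settings_; }

 private:
  std::unique_ptr<device::Settings> settings_;
};
```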
Change-Id: I0f85157962fbf6fed176da0caf83b723bcbe1452
- Add assertions to enforce that objects are of the correct kind and
have been allocated.
- Make destructors check if objects have been allocated before
deleting.
- Operations that require a non-NullDevice return failure if given a
NullDevice.
- Use static_cast rather than reinterpret_cast when coercing from a
base class to a derived class.
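A rough sketch of the last two items, using a hypothetical class
hierarchy (Device, NullDevice, RealDevice and queueCount are
illustrative names and do not mirror the runtime's actual layout):

```cpp
#include <cassert>

// Hypothetical hierarchy for illustration only.
struct Device {
  virtual ~Device() = default;
  virtual bool isNull() const { return false; }
};

struct NullDevice : Device {
  bool isNull() const override { return true; }
};

struct RealDevice : Device {
  int queues_ = 4;
};

// An operation that needs a real device: it reports failure (-1) for a
// NullDevice, and otherwise downcasts with static_cast after the kind
// check has established the dynamic type.
inline int queueCount(const Device& dev) {
  if (dev.isNull()) {
    return -1;
  }
  return static_cast<const RealDevice&>(dev).queues_;
}
```

Unlike reinterpret_cast, static_cast is checked against the class
hierarchy at compile time and applies any pointer adjustment needed
for the base subobject.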
Change-Id: I02ee0ea9d7982fd7ca29d49c9b02cfae111b7127
- Correct spelling mistakes or wording in comments.
- Add missing line separators.
- Add missing comments for namespace closing brace.
Change-Id: If09cdd38aa088b0f68f750dfdef81351eb8c4935
HMM with xnack enabled should automatically update page tables,
but currently it does not. For now, the runtime will force a page
table update on all devices unconditionally.
Change-Id: Idfa6e1c145e6c114856214dce042b8a8349e5c58
Implement the global class for signals tracking per device queue.
Switch to the new tracking mechanism.
Change-Id: I3c4dda04b34e6d18d6a95510d84102909633b415
When the OCL ROCr backend performs CL_MEM_COPY_HOST_PTR, it may attempt
to access the amd::Memory object it is currently creating, which is
not ready yet. The logic creates a temporary dummy object to perform
the copy transfer. This change makes sure the runtime skips allocating
the same device::Memory object a second time.
Change-Id: I14c6a00a3941fdcaa6aea299e9f096e4c3f5cadf
ROCr is now reporting the actual HW addressing limits for HIP, so
OpenCL will have to impose a lower limit.
Change-Id: I60c2ce27ed1d1f45f16fb76438965a236ba872c6