Support has been added to query the following
HSA_AMD_INFO_GET_CLOCK_COUNTERS agent info exposed through the hsa api
in rocr, rather than the user having to make a direct IOCTL call
through the kernel driver.
Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
[ROCm/ROCR-Runtime commit: e97d06530e]
Extend hsa_amd_vmem_address_reserve/hsa_amd_vmem_address_reserve_align
to support HSA_AMD_VMEM_ADDRESS_NO_REGISTER flag. This allocation can be
used to reserve virtual address ranges that can later be used by
hsa_amd_svm_attributes_set for SVM based memory allocations.
[ROCm/ROCR-Runtime commit: b3c48cc68c]
scratch_cache.h includes amd_gpu_agent.h which then again includes
scratch_cache.h, this has now been fixed removing the unecessary
header include.
Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
[ROCm/ROCR-Runtime commit: 06efa50c09]
scratch_backing_memory_byte_size was originally removed, and then put
back in e130172218. This was because it
was used by rocgdb. rocgdb code has been updated to not use this field.
Bumped _amdgpu_r_debug for the ABI change.
[ROCm/ROCR-Runtime commit: 3c0af843e3]
Cast range->x and range->y to uint64_t before performing multiplication
Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>
[ROCm/ROCR-Runtime commit: 77b86ca908]
The original version of hsa_amd_portable_export_dmabuf() did not
consider the conditions under which a dmabuf could be shared.
In the new version (hsa_amd_portable_export_dmabuf_v2()), the caller
can specify the flag HSA_AMD_DMABUF_MAPPING_TYPE_PCIE, which means they
want to share the dmabuf over PCIe. In that case, the new code will check
that if it is a PCIe GPU and it is not in a XGMI Hive then if
large-BAR is not supported, we will return an error.
[ROCm/ROCR-Runtime commit: 3a9d14bb66]
Its safer to have the integer literal explicitly be an unsigned long
in this expression as that's what the type of the errorCode variable
resolves to, preventing any overflow errors.
Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
[ROCm/ROCR-Runtime commit: dce52be686]
Moved the Call to pthread_mutex_lock to an else statement for better
code readibility.
Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
[ROCm/ROCR-Runtime commit: 1635746a9c]
Because eventDescrp->mutex is a non-recursive lock attempting to
acquire the lock with pthread_mutex_lock can cause the system to hang
indefinitely if the lock was already previously aquired with the
preceeding call to pthread_mutex_trylock.
Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
[ROCm/ROCR-Runtime commit: a97b7df4b9]
Refactor variable assignments to use std::move() where appropriate.
Updat function headers to accept parameters by const& where appropriate.
Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>
[ROCm/ROCR-Runtime commit: f6c8cbd293]
Changed variable assignments to use std::move() where appropriate.
Changed function headers to pass string arguments by reference where appropriate.
Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>
[ROCm/ROCR-Runtime commit: ae6851dbb4]
Changed variable assignments to use std::move() where appropriate.
Revert change in amd_kfd_driver.cpp.
Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>
[ROCm/ROCR-Runtime commit: a945b5d493]
allocated memory was previously not freed in the event of an error
with rwlock initialization.
Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
[ROCm/ROCR-Runtime commit: 293092f32f]
For L2 Cache and above, we report the total amount of cache for the
whole partition, so we add up the L2 Cache entry for each partition.
[ROCm/ROCR-Runtime commit: fc561ff37a]
ROCr initially had a bug where memory allocations that were not 4K
aligned were internally 4K aligned but ROCr would not keep track
of user-requested size. This would cause some pointer_info queries
to fail, but HIP was already aligning the buffer sizes for IPC
requests. For backward compatibility accross 2 minor versions,
we allowed IPC look-ups to be both aligned and un-aligned.
Removing this check as this 4 minor versions have been released
since then.
[ROCm/ROCR-Runtime commit: d52f1d0453]
gfx9-generic cannot support sramecc- and sramecc+.
sramecc feature is only configurable on gfx906.
The code object produced for gfx9-generic can be loaded on both
gfx906 with any sramecc setting, compiler will produce the isa
that will correctly work on both (EF_AMDGPU_FEATURE_SRAMECC_ANY_V4).
[ROCm/ROCR-Runtime commit: b7361c5ee4]
gfx9-generic cannot support sramecc- and sramecc+.
sramecc feature is only configurable on gfx906.
The code object produced for gfx9-generic can be loaded on both
gfx906 with any sramecc setting, compiler will produce the isa
that will correctly work on both (EF_AMDGPU_FEATURE_SRAMECC_ANY_V4).
[ROCm/ROCR-Runtime commit: 3e99bb6150]
On large BAR systems, for small-sized code-objects, we get performance
using direct memcpy due to latencies when doing the blit-copy.
[ROCm/ROCR-Runtime commit: da2607024b]
Invalidate only the address range that covers the newly copied
code-object. This avoids invalidating I$ for old code objects and thus
might increase I$ hit rate.
[ROCm/ROCR-Runtime commit: e969e01f54]
When compiling with -O0, some compilers generate a xchg instruction for
the __atomic_store(...) built-in. Using xchg on MMIO memory is
undefined-behavior and may be ignored on certain CPUs.
[ROCm/ROCR-Runtime commit: f011a9506d]
For native and DTIF backends, unify to use HSAKMT_CALL(...) to call
hsaKmt APIs.
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: David Yat Sin <David.YatSin@amd.com>
[ROCm/ROCR-Runtime commit: 7ba77fb193]
Using HSA_ENABLE_DTIF to control dtif/native thunk code path
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: David Yat Sin <David.YatSin@amd.com>
[ROCm/ROCR-Runtime commit: 166b0fa45a]