Consolidate the supported device ID lists scattered across topology, queue,
and pmc into one place: topology.
Change-Id: If035cf8e4a6fc6caff6c94ec627647cfb11c3d79
[ROCm/ROCR-Runtime commit: 4827b09119]
Although the S_IWOTH flag is passed to open(), the lock file is created
without write permission for others, so other users fail to open it with
O_RDWR. The cause is the default umask, which masks off S_IWOTH. This patch
temporarily sets the umask to S_IXOTH, which clears only a permission that
others don't need and leaves S_IWOTH intact, and then restores the original
umask after the file is opened.
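
The umask handling described above can be sketched roughly as follows. The
helper name open_shared_lock_file and the exact mode bits are assumptions
made for illustration, not the code in this patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create (or open) a lock file that other users can
 * open with O_RDWR, regardless of the caller's default umask.
 */
static int open_shared_lock_file(const char *path)
{
	/* Read/write for user, group and others (assumed mode). */
	const mode_t mode = S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP |
			    S_IROTH | S_IWOTH;
	mode_t old_mask;
	int fd;

	assert(path != NULL);

	/* Mask off only S_IXOTH so that S_IWOTH survives open(). */
	old_mask = umask(S_IXOTH);
	fd = open(path, O_CREAT | O_RDWR, mode);
	/* Restore the caller's umask whether or not open() succeeded. */
	umask(old_mask);

	return fd;
}
```

Because umask() is process-wide, other threads that create files while the
temporary mask is in place would also be affected; this sketch does not
address that.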
Change-Id: I8a239e1566ce0b0b18821913385f239db7c3588e
[ROCm/ROCR-Runtime commit: 1a8a9cb57b]
StartTrace and StopTrace send ioctl requests to enable/disable performance
counters. QueryTrace reads the counter from the perf_event fd.
Change-Id: Ibf79675bc23fcf129371bfd100f8e262121bc684
[ROCm/ROCR-Runtime commit: e17c67f049]
Unless HSA_USERPTR_FOR_PAGED_MEM is explicitly set, don't use userptr
for all paged memory. This will also allow us to work around some 4.9
issues, and then we can explicitly set HSA_USERPTR_FOR_PAGED_MEM for
all usage once those issues are resolved.
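
An opt-in environment switch of this kind could be read as in the sketch
below. The helper names and the rule that an unset, empty, or "0" value means
"disabled" are assumptions; the message above only says the variable must be
explicitly set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical rule: NULL (unset), "" and "0" all mean "disabled". */
static bool flag_value_enabled(const char *val)
{
	if (val == NULL || val[0] == '\0')
		return false;
	return strcmp(val, "0") != 0;
}

/* Hypothetical helper: look up an opt-in flag in the environment. */
static bool env_flag_enabled(const char *name)
{
	assert(name != NULL);
	return flag_value_enabled(getenv(name));
}
```

A caller would then take the userptr path for paged memory only when
env_flag_enabled("HSA_USERPTR_FOR_PAGED_MEM") returns true.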
Change-Id: I25ce22b73ae6e93f1567f2318d9d2b47d4a44e69
[ROCm/ROCR-Runtime commit: c991951288]
The control stack memory for CWSR is allocated in the kernel together with
the MQD allocation.
Change-Id: Ib1c0ab9402df3431e9555649394320380d6c6dd8
Signed-off-by: shaoyun.liu <shaoyun.liu@amd.com>
[ROCm/ROCR-Runtime commit: 116e5c5e8b]
On SOC15 chips, the ABI for the create_queue ioctl is changed to
allow doorbell allocation independent of the queue ID. This is
necessary to accommodate doorbell routing to specific engines in
the BIF.
Change-Id: Ie98d0a758758149dd5fc09ae088afccc29904124
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
[ROCm/ROCR-Runtime commit: 7de66d149b]
On gfx900, all doorbells and SDMA WPTRs must be 64-bit.
Change-Id: I9b922e16442e967599ae3c928308451d5cc470b3
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
[ROCm/ROCR-Runtime commit: d7063dd102]
Use KFD_IOC_ALLOC_MEM_FLAGS_COHERENT when allocating fine-grained
memory and doorbell BOs so that they will be mapped with MTYPE_UC
on GFX9 hardware.
Change-Id: I51adf45b13105f479e6bcdaf54955b467920ee9a
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
[ROCm/ROCR-Runtime commit: 8cb89b6926]
This is in preparation for gfx900, which uses 64-bit doorbells. We
maintain the same number of doorbells per process by making the
doorbell page size bigger.
KFD will need to implement the same rule.
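
The sizing rule can be illustrated with a small calculation. The figure of
1024 doorbells per process is an assumed value for this sketch; the real
limit is whatever KFD defines.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed per-process doorbell count, for illustration only. */
#define DOORBELLS_PER_PROCESS 1024u

/* Doorbell page size needed to hold the same number of doorbells for a
 * given doorbell width in bytes (4 for 32-bit, 8 for 64-bit).
 */
static size_t doorbell_page_size(size_t doorbell_bytes)
{
	assert(doorbell_bytes == 4 || doorbell_bytes == 8);
	return (size_t)DOORBELLS_PER_PROCESS * doorbell_bytes;
}
```

Under that assumption, 32-bit doorbells fit in a 4096-byte page, while 64-bit
doorbells need 8192 bytes for the same doorbell count.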
Change-Id: I3c4110869b191b83943b5a390a48edfc94d941d8
[ROCm/ROCR-Runtime commit: 48207af92a]
Existing code uses lockf to ensure exclusive PMC access for one process and
one TraceId. However, the Thunk spec says hsaKmtPmcAcquireTraceAccess grants
exclusive access to the defined set of counters, not exclusive access for
one process or one TraceId. Multiple counter sets belonging to multiple
TraceIds are allowed if they meet the concurrent access limit evaluated by
the hardware/driver.
Change-Id: I59cacb855a707fe326a4070452fcbbd3c95ac223
[ROCm/ROCR-Runtime commit: 1025579c0b]
Existing code assumes all counters sent to hsaKmtPmcRegisterTrace belong
to one PMC block and that this block is SQ. This patch handles the case
where counters are in different blocks, and removes the hard-coded SQ. In
fact, SQ is non-privileged, so the user shouldn't even use SQ counters to
register/release a trace. This patch also ignores non-privileged blocks,
as the HSA Thunk spec describes.
This patch also records counter information in the trace structure so
AcquireTrace can get the counter information using that TraceId.
Change-Id: Ifa5741050553d4615baab01f7485a9e09435b019
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
[ROCm/ROCR-Runtime commit: cb60c5f18a]
Implement two new APIs for cross-memory read and write operations.
- hsaKmtProcessVMRead
- hsaKmtProcessVMWrite
Add the new ioctls necessary for the above APIs.
Change-Id: I0c153e3b4e1f32b7a8b102ad5c774d9ae9bfc2fa
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
[ROCm/ROCR-Runtime commit: e79521b556]
events.c and queues.c were accidentally changed to mode 755 by change
fc70f0c30976f4021f7d763bfc10d76a76029553. Change them back.
Change-Id: If51c0b91139afc23e9051cf94c83d61fc20297e6
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
[ROCm/ROCR-Runtime commit: b3c3f7bae1]
When a new enough microcode build is running, use a vendor AQL packet
to submit the PM4 IB.
Change-Id: Icd3e2b322c418477420ba4a29f4455ce340ef0d2
[ROCm/ROCR-Runtime commit: 4d62b9482a]
This avoids unnecessary evictions and failed restores due to the
munmap of userptr BOs that are just about to be freed.
Change-Id: Icf2f0b73991455556a201c54c05ea7e20af80f47
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
[ROCm/ROCR-Runtime commit: 74ebfca9f0]
Should point directly to amd_queue_t.write_dispatch_id. Only noticeable
with HWS enabled, which is not yet stable.
Change-Id: I169906d45225379a3ca2729ff04d298fdbb9a9fb
[ROCm/ROCR-Runtime commit: 28f51d5808]
Add IOMMUv2 to blocks returned by hsaKmtPmcGetCounterProperties(). IOMMU
information is read from sysfs.
Change-Id: I3a1c6f902f947913570a78700fc0ffc444e1dd72
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
[ROCm/ROCR-Runtime commit: 9dadac6dc9]
The Thunk follows the Linux kernel coding convention of using tabs instead
of spaces.
Change-Id: I4eddcfa9a0513f16c869d9cc63f9f1dae0c39f83
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
[ROCm/ROCR-Runtime commit: d4dbf562a9]
Add gfx803 10/11 device IDs that were recently added to KFD.
Change-Id: Id40b117ae47bacedefa6e333fdfdf58dea92cd2d
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
[ROCm/ROCR-Runtime commit: 72b842a6dc]
- Build system fixes
- No user-mode high-precision timer by default, use clock_gettime
- Use C11 aligned_alloc pending C++17 std::aligned_alloc
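
The clock_gettime fallback named in the list above might look like the
following sketch. The helper name and the choice of CLOCK_MONOTONIC are
assumptions; the list does not say which clock the runtime selects.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: current timestamp in nanoseconds, read through
 * clock_gettime instead of a user-mode high-precision timer.
 */
static uint64_t monotonic_time_ns(void)
{
	struct timespec ts;
	int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

	assert(rc == 0);
	(void)rc; /* keep release builds (NDEBUG) warning-free */

	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
```

Two successive calls return non-decreasing values, so the difference between
them can be used directly as an elapsed time.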
Change-Id: I268365bdfd11d1e817a89584b9e086ee5b86e1dc
[ROCm/ROCR-Runtime commit: 9e575ea96a]
Gfx9 requires a monotonic write pointer and doorbell.
Count fields are 1-based compared with 0-based pre-Gfx9.
- Restructure implementation to use monotonic ring indices
- Remove redundant submission size checks (handled by AcquireWriteAddress)
- Unify copy/fill per-command limit (documentation is unclear)
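
Monotonic ring indices, as in the first item above, can be sketched as
follows. The struct and function names are hypothetical, the ring size is
assumed to be a power of two, and a reservation that would straddle the end
of the buffer is not handled here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ring state: indices only ever grow and are reduced to a
 * buffer offset when the buffer is actually accessed.
 */
struct ring {
	uint64_t write_index;	/* monotonic, in bytes */
	uint64_t read_index;	/* monotonic, in bytes */
	uint32_t size_bytes;	/* assumed to be a power of two */
};

/* Convert a monotonic index into an offset inside the ring buffer. */
static uint32_t ring_offset(const struct ring *r, uint64_t index)
{
	return (uint32_t)(index & (r->size_bytes - 1u));
}

/* Bytes still available between the write and read indices. */
static uint64_t ring_free_bytes(const struct ring *r)
{
	return r->size_bytes - (r->write_index - r->read_index);
}

/* Reserve space for one command. Returns 1 and stores the buffer offset
 * on success, or 0 if the ring does not have enough free space.
 */
static int ring_reserve(struct ring *r, uint32_t bytes, uint32_t *offset)
{
	assert(r->size_bytes != 0 &&
	       (r->size_bytes & (r->size_bytes - 1u)) == 0);

	if (bytes > ring_free_bytes(r))
		return 0;

	*offset = ring_offset(r, r->write_index);
	r->write_index += bytes;	/* never wrapped back to zero */
	return 1;
}
```

Since write_index is never wrapped, the same value can be written as a
monotonic write pointer or doorbell, and only buffer accesses apply the mask.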
Change-Id: I57c1675221d2e63aa319fee700d9951671e1bd65
[ROCm/ROCR-Runtime commit: 1cd46afe6d]
Note: The implementation is the same as the 1.0 APIs for now.
A follow-up change will provide the complete implementation.
Change-Id: Ife633f74ff27eee0bb9b0c46952cf5233b0114e8
[ROCm/ROCR-Runtime commit: a324f21a46]