They should start at H1
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
Change-Id: Id11a2599c4609255a1a9916f70b58adc41cdddb4
[ROCm/ROCR-Runtime commit: f94c1794bb]
Add the HsaMemFlags Contiguous bit so that hsaKmtAllocMemory can allocate
contiguous VRAM, to support RDMA devices with limited scatter-gather
capability.
Check KFD ioctl minor version >= 17.
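A minimal sketch of how a caller might request contiguous VRAM only when the KFD is new enough. `DemoMemFlags`, its field layout, and `BuildVramFlags` are hypothetical stand-ins, not the real `HsaMemFlags` definition. The only rule taken from this commit is "KFD ioctl minor version >= 17".

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for HsaMemFlags; field names and widths are
// illustrative and do not match the real thunk header.
struct DemoMemFlags {
  uint32_t NonPaged : 1;
  uint32_t Contiguous : 1;  // request physically contiguous VRAM (RDMA-friendly)
  uint32_t Reserved : 30;   // unused bits, never set explicitly
};

// From the commit message: the Contiguous bit requires KFD ioctl minor >= 17.
constexpr uint32_t kMinKfdMinorForContiguous = 17;

// Builds flags for a VRAM allocation. Returns false when contiguous memory is
// requested but the running KFD is too old to honor it.
bool BuildVramFlags(bool want_contiguous, uint32_t kfd_minor, DemoMemFlags* out) {
  assert(out != nullptr);
  DemoMemFlags flags{};  // value-initialization zeroes every bit-field, Reserved included
  flags.NonPaged = 1;
  if (want_contiguous) {
    if (kfd_minor < kMinKfdMinorForContiguous) return false;
    flags.Contiguous = 1;
  }
  *out = flags;
  return true;
}
```

Zero-initializing the whole struct means `Reserved` never has to be assigned by name. That keeps the code valid when new flags shrink the reserved width.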
Change-Id: I0db00dad125b2b7be523f343082641f59b850423
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
[ROCm/ROCR-Runtime commit: 97497c7efc]
New HsaMemFlags flags were added and the number of reserved bits was
reduced, so initializing the reserved field now causes a value-overflow
compilation error. The reserved bits are not used, so remove the initialization.
Change-Id: I603596977dfd558ce31ead03711d7c5ce5ee5b71
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
[ROCm/ROCR-Runtime commit: e2d742ac6f]
Fix lazy pointer initialization for dedicated PC Sampling queue.
The previous implementation always created the queue when the GPU agent
was created instead of creating it on first use.
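A minimal sketch of the intended behavior, assuming a simplified, non-thread-safe lazy pointer. `LazyPtr`, `GpuAgent`, and `PcSamplingQueue` are hypothetical names, not the ROCr classes. Constructing the agent only stores a factory, and the queue is built the first time it is requested.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical, simplified lazy pointer (not thread-safe): it stores a factory
// and only runs it the first time the object is requested.
template <typename T>
class LazyPtr {
 public:
  explicit LazyPtr(std::function<T*()> factory) : factory_(std::move(factory)) {}

  bool created() const { return obj_ != nullptr; }

  T* get() {
    if (!obj_) obj_.reset(factory_());  // first use: build the object now
    return obj_.get();
  }

 private:
  std::function<T*()> factory_;
  std::unique_ptr<T> obj_;
};

struct PcSamplingQueue { int priority = 0; };  // hypothetical queue type

// Hypothetical agent: constructing it must NOT create the PC sampling queue.
struct GpuAgent {
  int queues_created = 0;
  LazyPtr<PcSamplingQueue> pcs_queue{[this] {
    ++queues_created;
    return new PcSamplingQueue{};
  }};
};
```

A production version would also need to guard `get()` against concurrent first use.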
Change-Id: Icf300f2b162e59143ba61ba182d9bee6e1308fc1
[ROCm/ROCR-Runtime commit: f2751b7030]
Fix musl libc NULL errors and unsupported pthread functions for compatibility.
Also ensure cleanup and error handling regardless of the CPU affinity override.
Fix submitted by GitHub developer AngryLoki:
https://github.com/ROCm/ROCR-Runtime/issues/181
Change-Id: Ia487315e504112be5d3370756f23f6e23b9ae4be
[ROCm/ROCR-Runtime commit: bc9cac97fe]
New hsa_amd_queue_get_info API to support:
- HSA_AMD_QUEUE_INFO_AGENT: Agent that owns the underlying HW queue
- HSA_AMD_QUEUE_INFO_DOORBELL_ID: KFD doorbell ID of the queue completion signal.
Change-Id: I98842131bcbdd08552649791a5d43e578a615808
[ROCm/ROCR-Runtime commit: d6d5786051]
When doing a coredump, we try to park the wave and save its PC in
ttmp7/ttmp11, but these registers will be overwritten by PC Sampling
requests.
Change-Id: I60fb734eb3bed4ee3cc8d8bba9ec4a527fff9671
[ROCm/ROCR-Runtime commit: 3443fdf665]
Flush is used by the client to retrieve the data currently stored in the
buffers, so that the client can get current data even when the buffers
are not yet full.
Change-Id: Ib8304dcdfb2797cb060ec72df4970d95cf6be348
[ROCm/ROCR-Runtime commit: 8abbf9475b]
Each time there is enough data to fill the client session buffer, call
the client's data-ready callback to transfer the buffer contents to the
client.
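A minimal sketch of the client-side behavior in the two commits above. The data-ready callback fires each time the session buffer fills, and a flush delivers a partially filled buffer. `Sample`, `ClientSession`, and its methods are hypothetical names, and a single-threaded model is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical sample record and session; names are illustrative only.
struct Sample { unsigned long long pc; };

class ClientSession {
 public:
  using DataReadyFn = std::function<void(const std::vector<Sample>&)>;

  ClientSession(std::size_t capacity, DataReadyFn on_ready)
      : capacity_(capacity), on_ready_(std::move(on_ready)) {
    assert(capacity_ > 0);
    buf_.reserve(capacity_);
  }

  // Append samples copied out of the device buffers; hand off each full buffer.
  void Append(const std::vector<Sample>& samples) {
    for (const Sample& s : samples) {
      buf_.push_back(s);
      if (buf_.size() == capacity_) Deliver();
    }
  }

  // Flush: deliver whatever is buffered even though the buffer is not full.
  void Flush() {
    if (!buf_.empty()) Deliver();
  }

 private:
  void Deliver() {
    on_ready_(buf_);  // client's data-ready callback
    buf_.clear();
  }

  std::size_t capacity_;
  DataReadyFn on_ready_;
  std::vector<Sample> buf_;
};
```

With a capacity of 4, appending 10 samples triggers two callbacks of 4 samples each. A following `Flush()` delivers the remaining 2.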
Change-Id: Id79775426fa6d22e00dc2ef6f55c439eacb9b2af
[ROCm/ROCR-Runtime commit: 5177d17f5d]
Retrieve data from the buffers previously set in the 2nd level trap
handler TMA. We use a double buffering mechanism to allow the 2nd level
trap handler to write to one buffer while we are copying data from the
other.
Co-authored-by: Joseph Greathouse <Joseph.Greathouse@amd.com>
Co-authored-by: James Zhu <James.Zhu@amd.com>
Change-Id: I252c381ea06b8cf927c4f9af6ea59dedc3717fbb
[ROCm/ROCR-Runtime commit: 855e454671]
Allocate required device and host buffers to be able to interact with
the 2nd level trap handler.
Change-Id: If99de5aacf956ca57ecafc7b04b797be9c9decaa
[ROCm/ROCR-Runtime commit: 8d666dea01]
Code is valid for gfx9 GPUs excluding gfx94x.
1st level trap handler will use TTMP13[22] to indicate host trap and
TTMP13[21] to indicate stochastic trap.
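The bit assignment above can be expressed as two masks. This is a sketch in C++ for readability. The real check lives in the trap handler assembly, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Bit positions from the commit message.
constexpr uint32_t kHostTrapMask = 1u << 22;        // TTMP13[22]: host trap
constexpr uint32_t kStochasticTrapMask = 1u << 21;  // TTMP13[21]: stochastic trap

// Hypothetical helpers that test the trap reason carried in TTMP13.
inline bool IsHostTrap(uint32_t ttmp13) { return (ttmp13 & kHostTrapMask) != 0; }
inline bool IsStochasticTrap(uint32_t ttmp13) {
  return (ttmp13 & kStochasticTrapMask) != 0;
}
```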
For each PC sampling method (hosttrap and stochastic), we use a double
buffering mechanism to transfer data between GPU and host.
The GPU will dump data into one buffer while CPU may be reading data
from the other buffer. There are 2 separate signals, one for each
buffer.
When signal != 0, the buffer belongs to the GPU and the GPU can write
to it. Once the buffer reaches the high watermark, the GPU sets the
signal to 0 to wake up the host, so that the host can switch the
buffers and read the data.
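A single-threaded model of the handshake described above, assuming the host polls instead of waiting on the signal. `DoubleBuffer`, `GpuWrite`, and `HostPoll` are hypothetical names. The "drop samples while the GPU does not own a buffer" policy is also an assumption, not taken from the trap handler. A signal value of 0 means the host owns the buffer, and any non-zero value means the GPU owns it.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Single-threaded model of the double-buffering handshake (illustrative only).
class DoubleBuffer {
 public:
  DoubleBuffer(std::size_t capacity, std::size_t high_watermark)
      : capacity_(capacity), watermark_(high_watermark) {
    assert(high_watermark > 0 && high_watermark <= capacity);
    for (auto& s : signal_) s.store(1);  // both buffers start out owned by the GPU
  }

  // "GPU" side: append one sample to the active buffer if the GPU still owns it.
  bool GpuWrite(uint64_t pc) {
    const int b = active_.load();
    if (signal_[b].load() == 0 || buf_[b].size() >= capacity_) return false;  // dropped
    buf_[b].push_back(pc);
    if (buf_[b].size() >= watermark_) signal_[b].store(0);  // wake the host
    return true;
  }

  // "Host" side: if the active buffer was handed over, switch buffers and drain it.
  std::vector<uint64_t> HostPoll() {
    const int b = active_.load();
    if (signal_[b].load() != 0) return {};
    active_.store(1 - b);  // the GPU now writes into the other buffer
    std::vector<uint64_t> out;
    out.swap(buf_[b]);     // read the data out of the full buffer
    signal_[b].store(1);   // return the drained buffer to the GPU
    return out;
  }

 private:
  std::size_t capacity_, watermark_;
  std::array<std::vector<uint64_t>, 2> buf_;
  std::array<std::atomic<int64_t>, 2> signal_;
  std::atomic<int> active_{0};
};
```

In the real design the host sleeps on the signal rather than polling. The GPU and host also share the buffers through device memory, not `std::vector`.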
Co-authored-by: David Yat Sin <David.YatSin@amd.com>
Change-Id: If3eb0913e52fb4788059a71e5feca334612f3d5d
[ROCm/ROCR-Runtime commit: 431a70471e]
Create a dedicated CP queue with the highest priority for PC Sampling. Reduce
the highest priority that LRTs can set through the existing API so that the PC
Sampling queue always has higher priority than any other CP
queue.
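A minimal sketch of the priority cap, assuming four illustrative levels. The enum values and `ClampClientPriority` are hypothetical, not the HSA or KFD priority constants. Client requests are clamped one level below the level reserved for the PC Sampling queue.

```cpp
#include <algorithm>
#include <cassert>

// Illustrative priority levels; the real HSA/KFD priority enums differ.
enum Priority : int { kLow = 0, kNormal = 1, kHigh = 2, kPcSampling = 3 };

// Assumption: clients may go up to one level below the PC sampling queue.
constexpr int kMaxClientPriority = kPcSampling - 1;

// Clamp a client-requested priority into the range the existing API allows.
int ClampClientPriority(int requested) {
  return std::clamp(requested, static_cast<int>(kLow), kMaxClientPriority);
}
```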
Change-Id: Ia70d74415edc83b4862a3e18dbdbd7cebe73ab47
[ROCm/ROCR-Runtime commit: a83f872a23]
Create PC Sampling APIs for the start and stop functions, and create a stub
for the flush function.
Change-Id: I7a093b29dc87e34ac06faaae6cac2be50e4663e1
[ROCm/ROCR-Runtime commit: a842247482]
Implement PC Sampling session create and destroy APIs.
Change-Id: I93370d3d01b74ee15e71b8b0e20feb8f0066a3dc
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Signed-off-by: Vladimir Indic <Vladimir.Indic@amd.com>
Change-Id: Ib0c64356a1a4616b12d5dbeebe16273fe2a84abe
[ROCm/ROCR-Runtime commit: 632f9e60f7]
Add new PC Sampling API to list the supported PC Sampling methods and
options on a specific agent. If there is already a PC Sampling session
active on this agent, the list of methods returned will be reduced to
methods that can be run simultaneously with the current active session.
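A minimal sketch of the filtering step. The commit does not say which methods can run together, so the compatibility rule is passed in as a predicate. `PcsMethod`, `PcsConfig`, and `ListSupportedConfigs` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <optional>
#include <vector>

enum class PcsMethod { kHostTrap, kStochastic };  // the two methods named in this log

struct PcsConfig { PcsMethod method; unsigned interval; };  // hypothetical option pair

// Returns the configs to report for an agent. When a session is active, only
// configs accepted by the caller-supplied compatibility rule are kept.
std::vector<PcsConfig> ListSupportedConfigs(
    std::vector<PcsConfig> all,
    const std::optional<PcsConfig>& active,
    const std::function<bool(const PcsConfig&, const PcsConfig&)>& compatible) {
  if (!active) return all;
  all.erase(std::remove_if(all.begin(), all.end(),
                           [&](const PcsConfig& c) { return !compatible(*active, c); }),
            all.end());
  return all;
}
```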
Change-Id: I42ac2b8f30d5c368faf8ed4cf37ca4134db22985
[ROCm/ROCR-Runtime commit: 295acf6b27]
Create an allocator helper function to provide fine-grained memory on
a specific agent.
Change-Id: I32ba9aceb9c9dc708b140a0c45158e6e7a018844
[ROCm/ROCR-Runtime commit: 71f1a6726c]
The ExecutePM4() function can optionally accept extra arguments for the
acquire fence scope, release fence scope, and completion signal. When
a completion signal is provided, ExecutePM4() does not wait for the
commands to complete.
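A minimal sketch of the calling convention, assuming a fake executor in which commands complete only when `RetireAll()` is called. `Pm4Executor`, `FenceScope`, and `FakeSignal` are hypothetical stand-ins. Only the rule "a completion signal means ExecutePM4() does not wait" comes from the commit.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class FenceScope { kNone, kAgent, kSystem };  // hypothetical scope values

struct FakeSignal { int64_t value = 1; };  // hypothetical completion signal

// Hypothetical stand-in for the ExecutePM4() pattern.
struct Pm4Executor {
  std::vector<FakeSignal*> pending;  // submissions the "hardware" has not retired
  int blocking_waits = 0;            // calls that waited for completion

  void ExecutePM4(const std::vector<uint32_t>& cmds,
                  FenceScope acquire = FenceScope::kSystem,
                  FenceScope release = FenceScope::kSystem,
                  FakeSignal* completion = nullptr) {
    (void)cmds; (void)acquire; (void)release;  // a real queue encodes these in the packet
    if (completion != nullptr) {
      pending.push_back(completion);  // return immediately; caller waits on the signal
      return;
    }
    ++blocking_waits;  // no signal given: block until done (modeled as instant)
  }

  // Simulate the hardware finishing every outstanding submission.
  void RetireAll() {
    for (FakeSignal* s : pending) s->value = 0;
    pending.clear();
  }
};
```

Defaulting the new parameters keeps existing call sites on the old blocking behavior.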
Change-Id: Ib2a433b7bce1cb6260be8b76fe902335bd5dfada
[ROCm/ROCR-Runtime commit: 721e56ef5c]
The hard limit for scratch is 4 GB per XCC. Add checks in case the user
specifies values exceeding this limit.
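A minimal sketch of the per-XCC check. It assumes "4 GB" means 4 GiB and that an oversized request is clamped; the runtime may instead reject such requests. `MaxScratchTotal` and `ClampScratchPerXcc` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

// From the commit message: scratch is hard-limited to 4 GB per XCC
// (interpreted here as 4 GiB).
constexpr uint64_t kScratchLimitPerXcc = 4ull << 30;

// Hypothetical helper: upper bound on scratch across all XCCs of an agent.
constexpr uint64_t MaxScratchTotal(uint32_t num_xcc) {
  return kScratchLimitPerXcc * num_xcc;
}

// Hypothetical check for a user-requested per-XCC scratch size. This sketch
// clamps the value. Returns true if the request exceeded the limit.
bool ClampScratchPerXcc(uint64_t requested, uint64_t* granted) {
  assert(granted != nullptr);
  const bool too_big = requested > kScratchLimitPerXcc;
  *granted = too_big ? kScratchLimitPerXcc : requested;
  return too_big;
}
```

For example, an 8-XCC agent could use at most 32 GiB of scratch under this limit.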
Change-Id: Ib3cade762ff66c7e7d6a2d311e482cacbcf2b0de
[ROCm/ROCR-Runtime commit: d7adc94e3f]
- Per-executable contexts should be used from now on
- Global contexts are left as is for now for backwards
compatibility and will be phased out in follow-up
patches.
Change-Id: I6291abf865c7ed24ee71f5065e539afc23f5ce64
[ROCm/ROCR-Runtime commit: b983c19729]
This reverts commit 5c520f4544c654e5f18e05cabd1c63d64473cfab.
Reason for revert: This patch introduces a synchronization-related bug in the Unit_hipGetSetDevice_MultiThreaded test case.
Change-Id: I367e4d4f1d75b21658ac1127c58982894a97cedb
[ROCm/ROCR-Runtime commit: 244ad319ac]
The build tree was missing a level of nesting, causing paths to diverge
between in-tree and out-of-tree use.
KR: Also fixed kfdtest paths
Change-Id: I8638b6d6227daabddd8eaa2aa387ba578b8dfab8
Signed-off-by: Stella Laurenzo <stellaraccident@gmail.com>
Signed-off-by: Kent Russell <kent.russell@amd.com>
[ROCm/ROCR-Runtime commit: a180fea5ad]
Temporary change to set the AllocateGTTAccess flag and node_id
on MES devices.
Change-Id: I22385d11b17b76cfb44278fa0d8a09bc8721cea6
[ROCm/ROCR-Runtime commit: efe455c2fa]
When allocating memory for the MES AQL queue structure, PreferredNode
is set to the device index of the GPU to hint where the BO
needs to be created. However, the device index must be ignored when calling
bind_mem_to_numa.
Change-Id: Iae69fe02bfd48c5a3bd495319f6f2706d6e8aea2
[ROCm/ROCR-Runtime commit: 541d0dbbae]
The function Init() called by one of the constructors of lazy_ptr is undefined.
Replace it with the reset method, which sets the object to an uninitialized state and assigns a new constructor function.
Fix submitted on GitHub by zhoumin2 - https://github.com/ROCm/ROCR-Runtime/pull/184
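A minimal sketch of the fix's idea, assuming a simplified lazy pointer. The constructor delegates to `reset()`, which clears any existing object and installs a new constructor function. `LazyPtr` here is a hypothetical illustration, not the ROCr `lazy_ptr` source.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical simplified lazy pointer, not the ROCr lazy_ptr itself.
template <typename T>
class LazyPtr {
 public:
  LazyPtr() = default;

  // The constructor delegates to reset() instead of calling a separate Init().
  explicit LazyPtr(std::function<T*()> factory) { reset(std::move(factory)); }

  // Return to the uninitialized state and install a new constructor function.
  void reset(std::function<T*()> factory = nullptr) {
    obj_.reset();
    factory_ = std::move(factory);
  }

  bool empty() const { return obj_ == nullptr; }

  T* get() {
    if (!obj_) {
      assert(factory_ && "no constructor function installed");
      obj_.reset(factory_());
    }
    return obj_.get();
  }

 private:
  std::function<T*()> factory_;
  std::unique_ptr<T> obj_;
};
```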
Change-Id: I7d906d526ce7fe7e2548b01810e6395b13497bf3
[ROCm/ROCR-Runtime commit: 00b63f7452]
Add kfdtest test cases for PC sampling.
Change-Id: I49f4f8ebfa6569803acdc7dec895c1902ce0b280
Signed-off-by: James Zhu <James.Zhu@amd.com>
[ROCm/ROCR-Runtime commit: daf99471a4]
Add PC sampling support.
Change-Id: I08199024ba5a8eb2845c048d499fc8fcd260d2e8
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Signed-off-by: James Zhu <James.Zhu@amd.com>
[ROCm/ROCR-Runtime commit: f94e2530fb]
Add PC sampling support.
Change-Id: I2c472ce00ff8648904cf7e585687e81d3f493049
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Signed-off-by: James Zhu <James.Zhu@amd.com>
[ROCm/ROCR-Runtime commit: 4f554988b6]
When GTT memory is allocated for the MES AQL queue structure, KFD creates a
GART mapping so that the memory can be accessed by MES.
Change-Id: Iae7b33d1e70861109f1551d3a71dc60dfde9de61
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
[ROCm/ROCR-Runtime commit: 9fbe853fea]
The purpose of this patch is to add KFDQMTest.QueueLatency to
kfdtest.exclude file temporarily for the following ASIC filters:
- GFX940
- GFX941
- GFX942
This test is failing due to an issue with the way it was coded,
not due to an issue with the ASICs it is now blacklisted on.
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: Ic993629a2400449f598e73fe616a4572a38e2310
[ROCm/ROCR-Runtime commit: 656234abb8]
Reduce the test case size when running on an emulator.
Also, refactor the code, as both test cases shared more than 80% of their code.
Signed-off-by: David Belanger <david.belanger@amd.com>
Change-Id: I5899ee24244a6f0aa6b56fa8a4701b0b1e344b9f
[ROCm/ROCR-Runtime commit: e738648c8f]
Reduce the number of iterations for the test case so that it runs in a
reasonable amount of time.
Signed-off-by: David Belanger <david.belanger@amd.com>
Change-Id: I19a7ec0d5f03c54d6691aae3cf7432754c7481cc
[ROCm/ROCR-Runtime commit: 66e3a09a42]
Was failing to link on AlmaLinux8.
Change-Id: Id7df245f1063c2bebd0f07efc352f1b9017eda0e
Signed-off-by: Stella Laurenzo <stellaraccident@gmail.com>
Signed-off-by: Kent Russell <kent.russell@amd.com>
[ROCm/ROCR-Runtime commit: 7c10e1e4f5]