Commit Graph

2930 Commits

Author SHA1 Message Date
David Yat Sin fa6f3477ad Remove assert for physical vs virtual memory size
On systems with more than 1 TB of memory per NUMA region, this triggers
unnecessary errors.

Change-Id: I1bc7f209b9c1739b516c9f6b0acf434488ac7b8d


[ROCm/ROCR-Runtime commit: e539c8dce2]
2024-04-24 08:43:23 -04:00
amd-jmacaran 278631d135 Add support for external CI builds using Azure Pipelines
Change-Id: I8f4de331f00317a959b86f7e5b7a1025ba03564b


[ROCm/ROCR-Runtime commit: 8a893ea0b8]
2024-04-23 21:10:49 -04:00
David Galiffi 495db71f32 Fixed MD linting issue regarding code blocks.
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
Change-Id: Id0467b332bf033642a2d403090ffe598e41689f5


[ROCm/ROCR-Runtime commit: a8bd453243]
2024-04-23 16:48:15 -04:00
David Galiffi 6e5eed56e4 Fixed broken link to ROCm documentation
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
Change-Id: I9a6bc0ad08a060d83fdc3a0589dfc81c68ce2b0e


[ROCm/ROCR-Runtime commit: 975c5dd24a]
2024-04-23 16:47:50 -04:00
David Galiffi bd35a68731 Fixed MD linting issue regarding headers
They should start at H1

Signed-off-by: David Galiffi <David.Galiffi@amd.com>
Change-Id: Id11a2599c4609255a1a9916f70b58adc41cdddb4


[ROCm/ROCR-Runtime commit: f94c1794bb]
2024-04-23 16:47:19 -04:00
David Galiffi 93869e57ba Update GitHub links to point to the new organization.
i.e., RadeonOpenCompute --> ROCm

Signed-off-by: David Galiffi <David.Galiffi@amd.com>
Change-Id: I6724cbcbbb525f767af297e3986cd61fa69cd49f


[ROCm/ROCR-Runtime commit: c9103a00ef]
2024-04-23 16:46:42 -04:00
Philip Yang a580869abd libhsakmt: Support contiguous VRAM allocation flag
Add HsaMemFlags Contiguous bit for hsaKmtAllocMemory to allocate
contiguous VRAM, to support RDMA device with limited scatter-gather
ability.

Check KFD ioctl minor version >= 17.

Change-Id: I0db00dad125b2b7be523f343082641f59b850423
Signed-off-by: Philip Yang <Philip.Yang@amd.com>


[ROCm/ROCR-Runtime commit: 97497c7efc]
2024-04-23 14:27:12 -04:00
Your Name f4ff320797 libhsakmt: Remove HsaMemFlag reserved bit init
New HsaMemFlag flags were added, which reduced the number of reserved
bits and caused a value overflow compilation error.

The reserved bits are not used, so remove the init.

Change-Id: I603596977dfd558ce31ead03711d7c5ce5ee5b71
Signed-off-by: Philip Yang <Philip.Yang@amd.com>


[ROCm/ROCR-Runtime commit: e2d742ac6f]
2024-04-23 14:27:12 -04:00
David Yat Sin 5bf08d0d70 Fix queue creation for PC Sampling
Fix lazy pointer initialization for dedicated PC Sampling queue.
Previous implementation would always create a queue on GPU agent
creation instead of creating the queue on first use.

Change-Id: Icf300f2b162e59143ba61ba182d9bee6e1308fc1


[ROCm/ROCR-Runtime commit: f2751b7030]
2024-04-22 19:00:48 +00:00
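
The "create on first use" behaviour that commit 5bf08d0d70 restores can be illustrated with a small sketch. This is not the runtime's own lazy_ptr; `LazyPtr`, `Factory`, and `get()` are hypothetical names chosen for illustration, and the only assumption is that the owner passes in a factory that builds the object on demand.

```cpp
#include <functional>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical lazy holder (illustration only, not ROCR's lazy_ptr).
// Constructing the holder stores the factory but does not run it.
template <typename T>
class LazyPtr {
 public:
  using Factory = std::function<std::unique_ptr<T>()>;

  explicit LazyPtr(Factory factory) : factory_(std::move(factory)) {}

  // The factory runs on the first call only; later calls reuse the object.
  // The mutex serializes concurrent first calls.
  T* get() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (!object_) object_ = factory_();
    return object_.get();
  }

 private:
  Factory factory_;
  std::mutex mutex_;
  std::unique_ptr<T> object_;
};
```

With a holder like this, an agent that never uses PC Sampling never pays for the dedicated queue, because the factory is only invoked from `get()`.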
Shweta.Khatri 4f4d215196 Fixing compilation errors related to MUSL libc
Fix Musl libc NULL errors and unsupported pthread funcs for compatibility.
Also ensures cleanup and error handling irrespective of CPU affinity override.

Fix submitted by github dev - AngryLoki
https://github.com/ROCm/ROCR-Runtime/issues/181

Change-Id: Ia487315e504112be5d3370756f23f6e23b9ae4be


[ROCm/ROCR-Runtime commit: bc9cac97fe]
2024-04-17 07:14:15 -04:00
David Yat Sin ed7cdb88e4 Adding queue information queries
New hsa_amd_queue_get_info API to support:

- HSA_AMD_QUEUE_INFO_AGENT: Agent that owns the underlying HW queue

- HSA_AMD_QUEUE_INFO_DOORBELL_ID: KFD doorbell ID of the queue
completion signal.

Change-Id: I98842131bcbdd08552649791a5d43e578a615808


[ROCm/ROCR-Runtime commit: d6d5786051]
2024-04-11 12:53:48 -04:00
David Yat Sin 395fd2d230 PC Sampling: Disable coredump when sessions active
When doing a coredump, we try to park the wave and save its PC in
ttmp7/ttmp11, but these registers will be overwritten by PC Sampling
requests.

Change-Id: I60fb734eb3bed4ee3cc8d8bba9ec4a527fff9671


[ROCm/ROCR-Runtime commit: 3443fdf665]
2024-04-11 12:53:43 -04:00
David Yat Sin 52fc6e8619 PC Sampling: Convert timestamps to system time
Convert timestamps inside samples to system time

Change-Id: I5fad9a6887fa27c0ded9aa9b5f251cba2868f88f


[ROCm/ROCR-Runtime commit: 49e56ce782]
2024-04-11 12:53:37 -04:00
David Yat Sin a84d407118 PC Sampling: Implement lost sample count
Change-Id: Idfdfbac71c1813dd7a97c301619cf8ce83713c53


[ROCm/ROCR-Runtime commit: 547c9cb143]
2024-04-11 12:53:31 -04:00
David Yat Sin 3d13db3c39 PC Sampling: Implement flush
Flush lets the client retrieve the data currently stored in the
buffers, even when the buffers are not full.

Change-Id: Ib8304dcdfb2797cb060ec72df4970d95cf6be348


[ROCm/ROCR-Runtime commit: 8abbf9475b]
2024-04-11 12:53:24 -04:00
David Yat Sin b0d88f22dd PC Sampling: Push data to PC Sampling client
Each time there is enough data to fill the client session buffer,
callback the client data ready function to transfer the buffer contents
to the client.

Change-Id: Id79775426fa6d22e00dc2ef6f55c439eacb9b2af


[ROCm/ROCR-Runtime commit: 5177d17f5d]
2024-04-11 12:53:17 -04:00
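
The push model described in commit b0d88f22dd, together with the flush behaviour from commit 3d13db3c39, can be sketched as follows. This is a simplified host-only illustration, not the runtime's implementation: `SampleSession`, `DataReadyFn`, `Push`, and `Flush` are hypothetical names, and samples are plain `int` values rather than real PC Sampling records.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical session (illustration only): accumulates samples and hands
// them to the client whenever a full client-sized buffer is available.
class SampleSession {
 public:
  using DataReadyFn = std::function<void(const std::vector<int>&)>;

  SampleSession(std::size_t client_buffer_size, DataReadyFn on_ready)
      : capacity_(client_buffer_size), on_ready_(std::move(on_ready)) {}

  // Appends one sample and invokes the callback once the buffer is full.
  void Push(int sample) {
    pending_.push_back(sample);
    if (pending_.size() >= capacity_) Deliver();
  }

  // Delivers whatever is buffered, even if the buffer is not full.
  void Flush() {
    if (!pending_.empty()) Deliver();
  }

 private:
  void Deliver() {
    on_ready_(pending_);
    pending_.clear();
  }

  std::size_t capacity_;
  DataReadyFn on_ready_;
  std::vector<int> pending_;
};
```

Pushing seven samples into a session with a buffer size of three would trigger the callback twice, and a following `Flush()` would deliver the one remaining sample.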
David Yat Sin d8b7c22609 PC Sampling: Retrieve data from trap handler
Retrieve data from the buffers previously set in the 2nd level trap
handler TMA. We use a double buffering mechanism to allow the 2nd level
trap handler to write to one buffer while we are copying data from the
other.

Co-authored by: Joseph Greathouse <Joseph.Greathouse@amd.com>
Co-authored by: James Zhu <James.Zhu@amd.com>

Change-Id: I252c381ea06b8cf927c4f9af6ea59dedc3717fbb


[ROCm/ROCR-Runtime commit: 855e454671]
2024-04-11 12:53:12 -04:00
David Yat Sin b11294e555 PC Sampling: Update 2nd level trap handler
Update 2nd level trap handler when PC Sampling is enabled

Change-Id: I95bf2bca8057d2f8313923c7f012f033e12ccc3a


[ROCm/ROCR-Runtime commit: efdb72fd71]
2024-04-11 12:53:06 -04:00
David Yat Sin bb10ff65c2 PC Sampling: Allocate resources to retrieve data from trap handler
Allocate required device and host buffers to be able to interact with
the 2nd level trap handler.

Change-Id: If99de5aacf956ca57ecafc7b04b797be9c9decaa


[ROCm/ROCR-Runtime commit: 8d666dea01]
2024-04-11 12:53:00 -04:00
Joseph Greathouse 5c61936483 PC Sampling: Add gfx9 2nd trap handler for PC Sampling
Code is valid for gfx9 GPUs excluding gfx94x.

1st level trap handler will use TTMP13[22] to indicate host trap and
TTMP13[21] to indicate stochastic trap.

For each PC sampling method (hosttrap and stochastic), we use a double
buffering mechanism to transfer data between GPU and host.
The GPU will dump data into one buffer while CPU may be reading data
from the other buffer. There are 2 separate signals, one for each
buffer.
When signal != 0, the buffer belongs to the GPU and the GPU can write
to it. Once the buffer has reached the high watermark, the GPU will
set the signal to 0 to wake up the host and so that the host can try
to switch the buffers and read the data.

Co-authored-by: David Yat Sin <David.YatSin@amd.com>
Change-Id: If3eb0913e52fb4788059a71e5feca334612f3d5d


[ROCm/ROCR-Runtime commit: 431a70471e]
2024-04-11 12:52:54 -04:00
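
The signal-based double buffering described in commit 5c61936483 can be approximated on the host with a small simulation. The real producer is the GPU trap handler; here both sides are ordinary C++ functions, and `SampleBuffer`, `DoubleBuffer`, `Produce`, and `Consume` are hypothetical names. The sketch also assumes a single producer and a single consumer, and that a sample arriving while the active buffer is waiting for the host is simply dropped.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <utility>
#include <vector>

// Host-side simulation only (illustration, not the trap handler code).
struct SampleBuffer {
  std::atomic<int> signal{1};  // != 0: owned by the producer; 0: ready for the host
  std::vector<int> data;
};

class DoubleBuffer {
 public:
  explicit DoubleBuffer(std::size_t high_watermark) : watermark_(high_watermark) {}

  // Producer side: write into the active buffer and release it at the
  // high watermark. Returns false if the sample had to be dropped because
  // the active buffer has not been drained by the host yet.
  bool Produce(int sample) {
    SampleBuffer& buf = buffers_[active_.load(std::memory_order_acquire)];
    if (buf.signal.load(std::memory_order_acquire) == 0) return false;
    buf.data.push_back(sample);
    if (buf.data.size() >= watermark_) {
      buf.signal.store(0, std::memory_order_release);  // wake up the host
    }
    return true;
  }

  // Host side: if the active buffer was released, switch buffers, drain
  // the released one, and hand it back to the producer.
  std::vector<int> Consume() {
    const std::size_t full = active_.load(std::memory_order_acquire);
    SampleBuffer& buf = buffers_[full];
    if (buf.signal.load(std::memory_order_acquire) != 0) return {};
    active_.store(1 - full, std::memory_order_release);
    std::vector<int> out = std::move(buf.data);
    buf.data.clear();
    buf.signal.store(1, std::memory_order_release);
    return out;
  }

 private:
  std::size_t watermark_;
  std::atomic<std::size_t> active_{0};
  std::array<SampleBuffer, 2> buffers_;
};
```

The point of the handshake is that each buffer has exactly one owner at a time: the producer only writes while the signal is non-zero, and the host only reads after the signal has dropped to zero.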
David Yat Sin c3f9368b8f PC Sampling: Create dedicated CP queue
Create dedicated CP queue with highest priority for PC Sampling. Reduce
the highest priority that LRT's can set for existing API so that PC
Sampling queue will always have highest priority over any other CP
queues

Change-Id: Ia70d74415edc83b4862a3e18dbdbd7cebe73ab47


[ROCm/ROCR-Runtime commit: a83f872a23]
2024-04-11 12:52:48 -04:00
David Yat Sin bcdecc7ff4 PC Sampling: Add start stop and flush APIs
Create PC Sampling APIs for the start and stop functions, and create a
stub for the flush function.

Change-Id: I7a093b29dc87e34ac06faaae6cac2be50e4663e1


[ROCm/ROCR-Runtime commit: a842247482]
2024-04-11 12:52:42 -04:00
David Yat Sin 566e2c60fd PC Sampling: Add create and destroy APIs
Implement PC Sampling session create and destroy APIs.

Change-Id: I93370d3d01b74ee15e71b8b0e20feb8f0066a3dc

Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Signed-off-by: Vladimir Indic <Vladimir.Indic@amd.com>
Change-Id: Ib0c64356a1a4616b12d5dbeebe16273fe2a84abe


[ROCm/ROCR-Runtime commit: 632f9e60f7]
2024-04-11 12:52:35 -04:00
David Yat Sin 0a4415f202 PC Sampling: API to list supported configurations
Add new PC Sampling API to list the supported PC Sampling methods and
options on a specific agent. If there is already a PC Sampling session
active on this agent, the list of methods returned will be reduced to
methods that can be run simultaneously with the current active session.

Change-Id: I42ac2b8f30d5c368faf8ed4cf37ca4134db22985


[ROCm/ROCR-Runtime commit: 295acf6b27]
2024-04-11 12:52:30 -04:00
David Yat Sin 8165c03e7b PC Sampling: Create PC Sampling interfaces
Create new interface group for PC Sampling

Change-Id: I59b4cfe9f8d1ae313dc28be1d2ed49f750d8212b


[ROCm/ROCR-Runtime commit: 0bc244e10a]
2024-04-11 12:52:23 -04:00
David Yat Sin b79e044711 PC Sampling: Update public headers for new APIs
Change-Id: Ib9987efdb41d5f6d203e7e86f9b26809d020e04e


[ROCm/ROCR-Runtime commit: 6a7122b183]
2024-04-11 12:52:16 -04:00
David Yat Sin bc77ec44f6 Create fine-grained allocator
Create allocator helper function to provide fine-grained memory on
a specific agent.

Change-Id: I32ba9aceb9c9dc708b140a0c45158e6e7a018844


[ROCm/ROCR-Runtime commit: 71f1a6726c]
2024-04-11 12:52:10 -04:00
David Yat Sin 6b38aecae9 Extend ExecutePM4() to accept completion signal and fences
ExecutePM4() function can optionally accept extra arguments for
acquire fence scope, release fence scope and completion signal. When
a completion signal is provided, ExecutePM4() does not wait for the
commands to complete.

Change-Id: Ib2a433b7bce1cb6260be8b76fe902335bd5dfada


[ROCm/ROCR-Runtime commit: 721e56ef5c]
2024-04-11 12:51:52 -04:00
David Yat Sin f3db911e3c Add limit checks for HSA_SINGLE_SCRATCH_LIMIT
The hard limit for scratch is 4GB per XCC. Add checks in case the user
specifies a value exceeding this limit.

Change-Id: Ib3cade762ff66c7e7d6a2d311e482cacbcf2b0de


[ROCm/ROCR-Runtime commit: d7adc94e3f]
2024-04-11 14:03:25 +00:00
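
A limit check of the kind described in commit f3db911e3c might look like the sketch below. This is an assumption-laden illustration rather than the runtime's code: `ClampScratchLimit` and `kScratchHardLimitPerXcc` are hypothetical names, 4GB is taken to mean 4 GiB, and the total ceiling is assumed to be the per-XCC limit multiplied by the number of XCCs.

```cpp
#include <algorithm>
#include <cstdint>

// 4 GiB per XCC, following the commit message (GiB is an assumption).
constexpr std::uint64_t kScratchHardLimitPerXcc = 4ULL << 30;

// Hypothetical helper: clamps a user-requested scratch limit in bytes
// (for example, a value parsed from HSA_SINGLE_SCRATCH_LIMIT) to the
// assumed hardware ceiling for a device with xcc_count XCCs.
constexpr std::uint64_t ClampScratchLimit(std::uint64_t requested_bytes,
                                          std::uint32_t xcc_count) {
  return std::min(requested_bytes, kScratchHardLimitPerXcc * xcc_count);
}
```

Under these assumptions, a request of 8 GiB on a single-XCC device would be reduced to 4 GiB, while the same request on a four-XCC device would pass through unchanged.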
Konstantin Zhuravlyov 45eafcf4ea loader: allow but skip static relocations for code object v2+
Change-Id: I4ae14cb5e740d7d45810b75038b15a0b94d2bf0b


[ROCm/ROCR-Runtime commit: 08c94463de]
2024-04-09 11:39:18 -04:00
Konstantin Zhuravlyov ae24ca1528 Switch to per-executable contexts in the loader
- Per-executable contexts should be used from now on
  - Global contexts are left as is for now for backwards
    compatibility and will be phased out in follow up
    patches.

Change-Id: I6291abf865c7ed24ee71f5065e539afc23f5ce64


[ROCm/ROCR-Runtime commit: b983c19729]
2024-04-09 10:31:51 -04:00
Shweta Khatri 89e51e1f63 Revert "Use HybridMutex for IPC locks"
This reverts commit 5c520f4544c654e5f18e05cabd1c63d64473cfab.

Reason for revert: This patch is introducing a synchronization related bug in Unit_hipGetSetDevice_MultiThreaded testcase.

Change-Id: I367e4d4f1d75b21658ac1127c58982894a97cedb


[ROCm/ROCR-Runtime commit: 244ad319ac]
2024-04-02 12:27:55 -04:00
Stella Laurenzo 2fea138040 Properly nest build time headers to match arrangement at install time.
The build tree was missing a level of nesting, causing diversions based
on in-tree/out-of-tree use.
KR: Also fixed kfdtest paths

Change-Id: I8638b6d6227daabddd8eaa2aa387ba578b8dfab8
Signed-off-by: Stella Laurenzo <stellaraccident@gmail.com>
Signed-off-by: Kent Russell <kent.russell@amd.com>


[ROCm/ROCR-Runtime commit: a180fea5ad]
2024-04-01 17:10:40 -04:00
David Yat Sin b19929b090 Temporary: Set AllocateGTTAccess and node_id for MES
Temporary change to set the AllocateGTTAccess flag and node_id
on MES devices.

Change-Id: I22385d11b17b76cfb44278fa0d8a09bc8721cea6


[ROCm/ROCR-Runtime commit: efe455c2fa]
2024-03-29 19:38:19 +00:00
David Yat Sin 78ad630632 Set NUMA region to 0 when using GTTAccess flag
When allocating memory for MES AQL queue structure, the PreferredNode
is set to the device index of GPU to hint the location where the BO
needs to be created. But we need to ignore the device index when calling
bind_mem_to_numa.

Change-Id: Iae69fe02bfd48c5a3bd495319f6f2706d6e8aea2


[ROCm/ROCR-Runtime commit: 541d0dbbae]
2024-03-29 17:17:56 +00:00
Konstantin Zhuravlyov 2a7fb7a808 Add R_AMDGPU_ABS32 support
Change-Id: I0ee0302d919ede44765adf02eab15015573efef2


[ROCm/ROCR-Runtime commit: 9e8f185397]
2024-03-26 18:47:29 -04:00
Konstantin Zhuravlyov 853ccdecbb Add dynamic relocation types (NFC)
Change-Id: I1b443003077ba241f34444da293e362266c2ae92


[ROCm/ROCR-Runtime commit: c5e74b7d0a]
2024-03-26 18:47:05 -04:00
Konstantin Zhuravlyov ec66509986 Rename existing relocation types to legacy/v1 (NFC)
Change-Id: Ided7f656c34131b8067a19c0d3b2955fc8823628


[ROCm/ROCR-Runtime commit: b2c32ad6cb]
2024-03-26 18:46:50 -04:00
Shweta.Khatri 565dbac2d4 Replace lazy_ptr's Init() with reset() method
The function Init() called by one of the constructors of lazy_ptr is undefined.
Replacing it with the reset() method sets the object to an uninitialized state and assigns a new constructor function.

Fix submitted on github by zhoumin2 - https://github.com/ROCm/ROCR-Runtime/pull/184

Change-Id: I7d906d526ce7fe7e2548b01810e6395b13497bf3


[ROCm/ROCR-Runtime commit: 00b63f7452]
2024-03-26 15:07:34 -04:00
Mihai Preda 69c283ef3d Free the malloc'ed event on page alloc failure
Change-Id: I009a9a6e2f67545c51470e86eac1adb78d6181b4
Signed-off-by: Mihai Preda <preda@users.noreply.github.com>
Signed-off-by: Kent Russell <kent.russell@amd.com>


[ROCm/ROCR-Runtime commit: fa0c182325]
2024-03-26 09:57:26 -04:00
James Zhu 8fd711ac59 kfdtest: add kfdtest test cases for pc sampling
Add kfdtest test cases for pc sampling.

Change-Id: I49f4f8ebfa6569803acdc7dec895c1902ce0b280
Signed-off-by: James Zhu <James.Zhu@amd.com>


[ROCm/ROCR-Runtime commit: daf99471a4]
2024-03-26 02:06:19 -04:00
David Yat Sin 19c7b5b4f6 libhsakmt: add PC sampling support
Add pc sampling support.

Change-Id: I08199024ba5a8eb2845c048d499fc8fcd260d2e8
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Signed-off-by: James Zhu <James.Zhu@amd.com>


[ROCm/ROCR-Runtime commit: f94e2530fb]
2024-03-25 15:53:38 +00:00
Shweta.Khatri cef0e09844 Convert some comments to Doxygen-style comments
hsa_ext_amd.h - Fix provided by github developer - Mátyás Aradi
Github request - https://github.com/ROCm/ROCR-Runtime/pull/187

Change-Id: I63e4175caebd10be0151f21bd5f048dd011aaf06


[ROCm/ROCR-Runtime commit: 02a40e9272]
2024-03-25 11:47:14 -04:00
David Yat Sin a88194702c libhsakmt: Update Linux header for PC Sampling
Add pc sampling support

Change-Id: I2c472ce00ff8648904cf7e585687e81d3f493049
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Signed-off-by: James Zhu <James.Zhu@amd.com>


[ROCm/ROCR-Runtime commit: 4f554988b6]
2024-03-22 15:06:10 +00:00
Philip Yang 96206ef888 libhsakmt: Add memory alloc flag GTTAccess
To allocate GTT memory for MES AQL queue structure, KFD will create GART
mapping for the memory to be accessed by MES.

Change-Id: Iae7b33d1e70861109f1551d3a71dc60dfde9de61
Signed-off-by: Philip Yang <Philip.Yang@amd.com>


[ROCm/ROCR-Runtime commit: 9fbe853fea]
2024-03-20 10:41:51 -04:00
Ori Messinger 6d04547b22 kfdtest: Exclude KFDQMTest.QueueLatency test
The purpose of this patch is to add KFDQMTest.QueueLatency to
kfdtest.exclude file temporarily for the following ASIC filters:
  -GFX940
  -GFX941
  -GFX942

This test is failing due to an issue with the way it was coded,
not due to an issue with the ASICs it is now blacklisted on.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: Ic993629a2400449f598e73fe616a4572a38e2310


[ROCm/ROCR-Runtime commit: 656234abb8]
2024-03-19 14:37:49 -04:00
David Belanger ddcccab5ba kfdtest: Update QueuePriority* tests for emulation
Reduce test case size if running on emulator.
Also, refactor the code, as both test cases shared more than 80% of their code.

Signed-off-by: David Belanger <david.belanger@amd.com>
Change-Id: I5899ee24244a6f0aa6b56fa8a4701b0b1e344b9f


[ROCm/ROCR-Runtime commit: e738648c8f]
2024-03-19 11:30:05 -04:00
David Belanger dd4079f8f1 kfdtest: Update PM4EventInterrupt for emulation
Reduce the number of iterations so the test case runs in a reasonable
amount of time.

Signed-off-by: David Belanger <david.belanger@amd.com>
Change-Id: I19a7ec0d5f03c54d6691aae3cf7432754c7481cc


[ROCm/ROCR-Runtime commit: 66e3a09a42]
2024-03-19 11:29:38 -04:00
David Yat Sin 7382f5b5b3 Fix uninitialized variables
Change-Id: Ie5da4547fa764e55162aff287cbb338ed4324093


[ROCm/ROCR-Runtime commit: 9d842dd1d8]
2024-03-14 15:20:56 -04:00
Xinmudotmoe 3b6146e7db Update doorbell->size to support large default PAGE_SIZE kernels
Change-Id: I6fbe10ae3309f2f935d6782366cedb56dc1438c3
Signed-off-by: Xinmudotmoe <xinmu@xinmu.moe>
Signed-off-by: Kent Russell <kent.russell@amd.com>


[ROCm/ROCR-Runtime commit: 0c06bec272]
2024-03-13 09:18:42 -04:00