It seems that the Release() function is unreliable and can cause segfaults.
This is a temporary work-around until the Release() function is fixed.
Change-Id: I95470a800c6153673e4b8f4fe46a646903325074
[ROCm/ROCR-Runtime commit: ac5fb8be9e]
If the pthread_attr_setaffinity_np function exists, use it instead of
pthread_setaffinity_np, as pthread_setaffinity_np seems to fail to set
the affinity settings on some systems.
Change-Id: Icd8b17039699ac10d9cd5c4dbb6ac44630673949
[ROCm/ROCR-Runtime commit: 57b93e02a4]
Bumping HSA_AMD_INTERFACE_VERSION_MINOR version to 5 to account for
previously added GPU agent query: HSA_AMD_AGENT_INFO_MEMORY_PROPERTIES
Change-Id: Ic8cfdcfb7bad6f3d1e0b3d68f505a62074fc26b9
[ROCm/ROCR-Runtime commit: b6829f7a72]
Support contiguous physical memory allocation flag. Allocations with
this flag will have contiguous physical memory. This is dependent on KFD
support for this flag and the AllocateKfdMemory(..) function call will
fail when it is not supported.
Change-Id: I6c51c8b061f7b026fdcc2aa2c37c74ecc13d95b6
[ROCm/ROCR-Runtime commit: 9af225e1b1]
On systems with more than 1 TB of memory per NUMA region, this triggers
unnecessary errors.
Change-Id: I1bc7f209b9c1739b516c9f6b0acf434488ac7b8d
[ROCm/ROCR-Runtime commit: e539c8dce2]
Fix lazy pointer initialization for dedicated PC Sampling queue.
Previous implementation would always create a queue on GPU agent
creation instead of creating the queue on first use.
Change-Id: Icf300f2b162e59143ba61ba182d9bee6e1308fc1
[ROCm/ROCR-Runtime commit: f2751b7030]
Fix musl libc NULL errors and unsupported pthread functions for
compatibility. Also ensure cleanup and error handling irrespective of
the CPU affinity override.
Fix submitted by GitHub user AngryLoki:
https://github.com/ROCm/ROCR-Runtime/issues/181
Change-Id: Ia487315e504112be5d3370756f23f6e23b9ae4be
[ROCm/ROCR-Runtime commit: bc9cac97fe]
New hsa_amd_queue_get_info API to support:
- HSA_AMD_QUEUE_INFO_AGENT: Agent that owns the underlying HW queue
- HSA_AMD_QUEUE_INFO_DOORBELL_ID: KFD doorbell ID of the queue
  completion signal.
Change-Id: I98842131bcbdd08552649791a5d43e578a615808
[ROCm/ROCR-Runtime commit: d6d5786051]
When doing a coredump, we try to park the wave and save its PC in
ttmp7/ttmp11, but these registers will be overwritten by PC Sampling
requests.
Change-Id: I60fb734eb3bed4ee3cc8d8bba9ec4a527fff9671
[ROCm/ROCR-Runtime commit: 3443fdf665]
Flush lets the client retrieve the data currently stored in the
buffers, even when the buffers are not full.
Change-Id: Ib8304dcdfb2797cb060ec72df4970d95cf6be348
[ROCm/ROCR-Runtime commit: 8abbf9475b]
Each time there is enough data to fill the client session buffer,
call the client's data-ready callback to transfer the buffer contents
to the client.
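
The data-ready behavior described above can be sketched as follows. This
is a minimal illustration, not the ROCr implementation: the class name
SessionBufferSketch, the callback signature, and the byte-vector staging
area are all assumptions made for the example. A Flush() method is
included to show how partially filled data could be delivered on request.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical staging buffer; all names here are illustrative only.
class SessionBufferSketch {
 public:
  using DataReadyFn = std::function<void(const uint8_t* data, size_t size)>;

  SessionBufferSketch(size_t client_buffer_size, DataReadyFn data_ready)
      : capacity_(client_buffer_size), data_ready_(std::move(data_ready)) {}

  // Stage new bytes and invoke the callback once per full client buffer.
  void Append(const uint8_t* data, size_t size) {
    pending_.insert(pending_.end(), data, data + size);
    while (capacity_ > 0 && pending_.size() >= capacity_) {
      data_ready_(pending_.data(), capacity_);
      pending_.erase(pending_.begin(),
                     pending_.begin() + static_cast<std::ptrdiff_t>(capacity_));
    }
  }

  // Deliver whatever is staged, even if it does not fill a client buffer.
  void Flush() {
    if (pending_.empty()) return;
    data_ready_(pending_.data(), pending_.size());
    pending_.clear();
  }

 private:
  size_t capacity_;
  DataReadyFn data_ready_;
  std::vector<uint8_t> pending_;
};
```

With a capacity of 4, appending 3 bytes triggers no callback, appending 7
more triggers two callbacks of 4 bytes each, and Flush() then delivers
the remaining 2 bytes.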
Change-Id: Id79775426fa6d22e00dc2ef6f55c439eacb9b2af
[ROCm/ROCR-Runtime commit: 5177d17f5d]
Retrieve data from the buffers previously set in the 2nd level trap
handler TMA. We use a double buffering mechanism to allow the 2nd level
trap handler to write to one buffer while we are copying data from the
other.
Co-authored-by: Joseph Greathouse <Joseph.Greathouse@amd.com>
Co-authored-by: James Zhu <James.Zhu@amd.com>
Change-Id: I252c381ea06b8cf927c4f9af6ea59dedc3717fbb
[ROCm/ROCR-Runtime commit: 855e454671]
Allocate required device and host buffers to be able to interact with
the 2nd level trap handler.
Change-Id: If99de5aacf956ca57ecafc7b04b797be9c9decaa
[ROCm/ROCR-Runtime commit: 8d666dea01]
Code is valid for gfx9 GPUs excluding gfx94x.
1st level trap handler will use TTMP13[22] to indicate host trap and
TTMP13[21] to indicate stochastic trap.
For each PC sampling method (hosttrap and stochastic), we use a double
buffering mechanism to transfer data between GPU and host.
The GPU will dump data into one buffer while CPU may be reading data
from the other buffer. There are 2 separate signals, one for each
buffer.
When signal != 0, the buffer belongs to the GPU and the GPU can write
to it. Once the buffer has reached the high watermark, the GPU sets
the signal to 0 to wake up the host so that the host can try to switch
the buffers and read the data.
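
The ownership handoff described above can be modeled on the host alone
as in the sketch below. It is not the trap handler or ROCr code: the
names DoubleBufferSketch, Write() and Drain() are hypothetical, the
"GPU" side is an ordinary function call, and dropping samples while the
active buffer waits for the host is an assumption made to keep the
example short.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

// One buffer plus its signal. signal != 0: the writer (GPU side) owns it.
struct SampleBufferSketch {
  std::atomic<int> signal{1};
  std::vector<uint64_t> data;
};

// Hypothetical host-side model of the two-buffer scheme.
class DoubleBufferSketch {
 public:
  explicit DoubleBufferSketch(size_t high_watermark)
      : watermark_(high_watermark) {}

  // Writer side: store a sample in the active buffer. At the high
  // watermark, set the signal to 0 to hand the buffer to the host.
  // Returns false if the sample was dropped (assumed policy).
  bool Write(uint64_t sample) {
    SampleBufferSketch& buf = buffers_[active_.load(std::memory_order_acquire)];
    if (buf.signal.load(std::memory_order_acquire) == 0) return false;
    buf.data.push_back(sample);
    if (buf.data.size() >= watermark_) {
      buf.signal.store(0, std::memory_order_release);
    }
    return true;
  }

  // Host side: if the active buffer was handed over, switch the writer
  // to the other buffer, take the data, and return ownership.
  std::vector<uint64_t> Drain() {
    const size_t full = active_.load(std::memory_order_acquire);
    SampleBufferSketch& buf = buffers_[full];
    if (buf.signal.load(std::memory_order_acquire) != 0) return {};
    active_.store(full ^ 1, std::memory_order_release);
    std::vector<uint64_t> out;
    out.swap(buf.data);
    buf.signal.store(1, std::memory_order_release);
    return out;
  }

 private:
  size_t watermark_;
  std::atomic<size_t> active_{0};
  std::array<SampleBufferSketch, 2> buffers_;
};
```

With a watermark of 2, the second Write() hands the buffer to the host,
and Drain() returns both samples and redirects later writes to the other
buffer.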
Co-authored-by: David Yat Sin <David.YatSin@amd.com>
Change-Id: If3eb0913e52fb4788059a71e5feca334612f3d5d
[ROCm/ROCR-Runtime commit: 431a70471e]
Create a dedicated CP queue with the highest priority for PC Sampling.
Reduce the highest priority that LRTs can set through the existing API
so that the PC Sampling queue always has higher priority than any other
CP queue.
Change-Id: Ia70d74415edc83b4862a3e18dbdbd7cebe73ab47
[ROCm/ROCR-Runtime commit: a83f872a23]
Create the PC Sampling start and stop APIs, and add a stub for the
flush function.
Change-Id: I7a093b29dc87e34ac06faaae6cac2be50e4663e1
[ROCm/ROCR-Runtime commit: a842247482]
Implement PC Sampling session create and destroy APIs.
Change-Id: I93370d3d01b74ee15e71b8b0e20feb8f0066a3dc
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Signed-off-by: Vladimir Indic <Vladimir.Indic@amd.com>
Change-Id: Ib0c64356a1a4616b12d5dbeebe16273fe2a84abe
[ROCm/ROCR-Runtime commit: 632f9e60f7]
Add new PC Sampling API to list the supported PC Sampling methods and
options on a specific agent. If there is already a PC Sampling session
active on this agent, the list of methods returned will be reduced to
methods that can be run simultaneously with the current active session.
Change-Id: I42ac2b8f30d5c368faf8ed4cf37ca4134db22985
[ROCm/ROCR-Runtime commit: 295acf6b27]
Create allocator helper function to provide fine-grained memory on
a specific agent.
Change-Id: I32ba9aceb9c9dc708b140a0c45158e6e7a018844
[ROCm/ROCR-Runtime commit: 71f1a6726c]
ExecutePM4() function can optionally accept extra arguments for
acquire fence scope, release fence scope, and completion signal. When
a completion signal is provided, ExecutePM4() does not wait for the
commands to complete.
Change-Id: Ib2a433b7bce1cb6260be8b76fe902335bd5dfada
[ROCm/ROCR-Runtime commit: 721e56ef5c]
Enforce a hard scratch limit of 4GB per XCC, and add checks for the
case where the user specifies a value exceeding this limit.
Change-Id: Ib3cade762ff66c7e7d6a2d311e482cacbcf2b0de
[ROCm/ROCR-Runtime commit: d7adc94e3f]
- Per-executable contexts should be used from now on.
- Global contexts are left as is for now for backwards
  compatibility and will be phased out in follow-up
  patches.
Change-Id: I6291abf865c7ed24ee71f5065e539afc23f5ce64
[ROCm/ROCR-Runtime commit: b983c19729]
This reverts commit 5c520f4544c654e5f18e05cabd1c63d64473cfab.
Reason for revert: This patch introduces a synchronization-related bug
in the Unit_hipGetSetDevice_MultiThreaded test case.
Change-Id: I367e4d4f1d75b21658ac1127c58982894a97cedb
[ROCm/ROCR-Runtime commit: 244ad319ac]
Temporary change to set the AllocateGTTAccess flag and node_id
on MES devices.
Change-Id: I22385d11b17b76cfb44278fa0d8a09bc8721cea6
[ROCm/ROCR-Runtime commit: efe455c2fa]
The function Init(), called by one of the constructors of lazy_ptr, is
undefined. Replace the call with the reset() method, which sets the
object to an uninitialized state and assigns a new constructor function.
Fix submitted on GitHub by zhoumin2: https://github.com/ROCm/ROCR-Runtime/pull/184
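
As a rough illustration of the pattern, a lazily constructed pointer
whose constructor delegates to reset() might look like the sketch below.
LazyPtrSketch and its members are hypothetical names; this is not the
lazy_ptr class from ROCr, only a minimal model of the behavior the
message describes.

```cpp
#include <functional>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical stand-in for a lazily initialized pointer.
template <typename T>
class LazyPtrSketch {
 public:
  LazyPtrSketch() = default;

  // Delegates to reset() instead of a separate Init() helper.
  explicit LazyPtrSketch(std::function<T*()> ctor) { reset(std::move(ctor)); }

  // Return to the uninitialized state and install a new constructor
  // function. The object is not built until the next get().
  void reset(std::function<T*()> ctor = nullptr) {
    std::lock_guard<std::mutex> guard(lock_);
    obj_.reset();
    ctor_ = std::move(ctor);
  }

  // Build the object on first use only; later calls reuse it.
  T* get() {
    std::lock_guard<std::mutex> guard(lock_);
    if (!obj_ && ctor_) obj_.reset(ctor_());
    return obj_.get();
  }

  // True once the object has actually been constructed.
  bool created() const {
    std::lock_guard<std::mutex> guard(lock_);
    return obj_ != nullptr;
  }

 private:
  mutable std::mutex lock_;
  std::unique_ptr<T> obj_;
  std::function<T*()> ctor_;
};
```

The constructor function runs only when get() is first called, and a
later reset() discards the object so that the new function runs on the
next access.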
Change-Id: I7d906d526ce7fe7e2548b01810e6395b13497bf3
[ROCm/ROCR-Runtime commit: 00b63f7452]
- hsa_api_trace.h contains C++
- rocprofiler-sdk needs to include the table version number defines
  (*_MAJOR_VERSION and *_STEP_VERSION) for the HSA API in its public
  headers
- rocprofiler-sdk needs its public headers to be C-compatible, so
  hsa_api_trace_version.h was created
Change-Id: Ieece990b3b7775cb0446b545c9e3391c5f691c61
[ROCm/ROCR-Runtime commit: 5402842d5f]
When deferring a dmabuf export on an import call, there may be a
failure to export as the GEM object is not referenced by the kernel
mode driver. To get around this, do a non-deferred export and
immediately close the dmabuf FD to keep FD creation to a minimum.
This way, the GEM object will have a kernel mode driver reference
when a deferred export is done.
Also, a bad dmabuf FD sent over a socket may not be received by an
import reader, which can cause a hang.
Set a 10-second timer so that the importer does not block indefinitely.
Change-Id: I11a9b5ec64aa2e16fd6aecdf46c34e4eb56ccfd0
[ROCm/ROCR-Runtime commit: eb2100daad]
Extract and create a core dump ELF file from a fault event, using the
core dump front end. GFX11 is not supported.
Signed-off-by: Alex Sierra <Alex.Sierra@amd.com>
Change-Id: I5ae154e886f39ab3ce7bbae5803efb27a96c7e2e
[ROCm/ROCR-Runtime commit: cbeddf9eb6]
When inspecting waves on architectures where SPI may not initialize TTMP
registers, the debugger cannot reliably know if the trap handler was
entered and if it saved valuable information in TTMP registers.
This patch uses the status.skip_export bit (unused by the compute
shaders) to indicate that it got executed before halting a wave.
This is done except for gfx940, where ttmp11[31] can be used (as long as
TTMP registers are always initialized by SPI for this architecture). It
could be possible to be more selective, as architectures that always
initialize TTMP registers do not require this step, but always doing
it makes maintenance simpler.
Change-Id: I5c4148c78062f7ffa049ac7856c2edc82dbc77d1
[ROCm/ROCR-Runtime commit: 5d3f6a63f1]
Work around SDMA hang in non-SPX modes for non-APU devices by disabling
ganging.
The root cause of the hang has not been found.
Non-APU xGMI modes have only 1 link between socket devices anyway, so
there is likely no real system-level gain in ganging intra-socket.
Change-Id: Ia4eda2f85cbf25151d3dbcf50cc45b8b775c60e2
[ROCm/ROCR-Runtime commit: ed462035fa]
Gang items have to wait on dependency signals as well as the leader.
Copies should not start if shaders are still operating on memory
to be copied.
Change-Id: I99703b420045ebcba2c9da39ec64678129dc140f
[ROCm/ROCR-Runtime commit: ed260ea970]
This allows the VA to be recorded in ROCr so that it is not
treated as an invalid pointer in future API calls.
Change-Id: I8d1d8ef9816a984c89d30a2179b0ce8940fef1da
[ROCm/ROCR-Runtime commit: f2006d6899]
- add rocprofiler-register to CPACK_DEBIAN_BINARY_PACKAGE_DEPENDS when found
- add rocprofiler-register to CPACK_RPM_BINARY_PACKAGE_REQUIRES when found
- remove report_tool_load_failures_explicit_
- add HSA_TOOLS_DISABLE_REGISTER flag
- add HSA_TOOLS_REPORT_REGISTER_FAILURE
- use HSA_TOOLS_REPORT_REGISTER_FAILURE instead of HSA_TOOLS_REPORT_LOAD_FAILURE
- change rocprofiler-register message to not include the word "error"
Change-Id: Ib7fd7f14c42758a54c347874018281bb1b5477a6
[ROCm/ROCR-Runtime commit: 7ce263b0e4]
At hsa_shutdown(), scratch_lock_ may be gone. Blit queues don't need it.
Change-Id: Ic132ac8a6be31fb2f0623137115608b0b222f077
[ROCm/ROCR-Runtime commit: 24633c7a85]
If two attach requests to the same piece of shared memory occur,
a double export or premature dmabuf fd close can occur since the export
and close on demand calls are not atomic.
Use a reference counter on shared memory dmabuf FDs that have
already been opened to avoid this problem.
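
The reference-counting idea can be sketched as below. This is not the
ROCr code: SharedFdTableSketch, the 64-bit handle key, and the
export/close callbacks are hypothetical stand-ins for the real export
and close calls. The point shown is that lookup, export, and count
update happen under one lock.

```cpp
#include <cstdint>
#include <functional>
#include <mutex>
#include <unordered_map>

// Hypothetical table of already-exported FDs, keyed by a memory handle.
class SharedFdTableSketch {
 public:
  using ExportFn = std::function<int()>;
  using CloseFn = std::function<void(int)>;

  // Return the FD for `handle`, exporting only on the first attach.
  int Acquire(uint64_t handle, const ExportFn& export_fd) {
    std::lock_guard<std::mutex> guard(lock_);
    auto it = table_.find(handle);
    if (it != table_.end()) {
      ++it->second.refs;  // second attach reuses the open FD
      return it->second.fd;
    }
    const int fd = export_fd();
    if (fd < 0) return fd;  // export failed, nothing recorded
    table_.emplace(handle, Entry{fd, 1});
    return fd;
  }

  // Drop one reference; close the FD only when the last user is gone.
  // Returns true if the FD was closed by this call.
  bool Release(uint64_t handle, const CloseFn& close_fd) {
    std::lock_guard<std::mutex> guard(lock_);
    auto it = table_.find(handle);
    if (it == table_.end()) return false;
    if (--it->second.refs > 0) return false;
    close_fd(it->second.fd);
    table_.erase(it);
    return true;
  }

 private:
  struct Entry {
    int fd;
    int refs;
  };
  std::mutex lock_;
  std::unordered_map<uint64_t, Entry> table_;
};
```

Two Acquire() calls for the same handle perform a single export, and the
FD is closed only by the second matching Release().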
Change-Id: I14a59209c0385e32582af42a57b33b1c6838a9b1
[ROCm/ROCR-Runtime commit: 1f63ea3476]
Add rocrtst to test mapping non-contiguous memory to a
single VA range
Change-Id: Id2e57f83512f8b482456b2b1925586951ada7400
[ROCm/ROCR-Runtime commit: b77ade9c64]
Add test to verify whether GPU shaders can read memory created using VMM
APIs.
Split the VMM rocrtst tests into two separate groups: Basic and Access.
Change-Id: Iead8d46125580c71ccd582e967c8e2e891e75c5e
[ROCm/ROCR-Runtime commit: 99e31e43aa]