Implement PC Sampling session create and destroy APIs.
Change-Id: I93370d3d01b74ee15e71b8b0e20feb8f0066a3dc
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Signed-off-by: Vladimir Indic <Vladimir.Indic@amd.com>
Change-Id: Ib0c64356a1a4616b12d5dbeebe16273fe2a84abe
[ROCm/ROCR-Runtime commit: 632f9e60f7]
Add new PC Sampling API to list the supported PC Sampling methods and
options on a specific agent. If there is already a PC Sampling session
active on this agent, the list of methods returned will be reduced to
methods that can be run simultaneously with the current active session.
Change-Id: I42ac2b8f30d5c368faf8ed4cf37ca4134db22985
[ROCm/ROCR-Runtime commit: 295acf6b27]
Create an allocator helper function to provide fine-grained memory on
a specific agent.
Change-Id: I32ba9aceb9c9dc708b140a0c45158e6e7a018844
[ROCm/ROCR-Runtime commit: 71f1a6726c]
The ExecutePM4() function can optionally accept extra arguments for the
acquire fence scope, release fence scope and completion signal. When
a completion signal is provided, ExecutePM4() does not wait for the
commands to complete.
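A minimal sketch of the wait-or-return contract, written in Python rather than ROCr's C++. `execute_pm4` is a hypothetical stand-in. A `threading.Event` plays the role of the completion signal, `submit` stands in for writing PM4 packets to a queue, and the acquire/release fence scope arguments are omitted.

```python
import threading
from typing import Callable, Optional


def execute_pm4(submit: Callable[[threading.Event], None],
                completion_signal: Optional[threading.Event] = None,
                timeout: Optional[float] = None) -> bool:
    """Submit commands and block only when no completion signal is given.

    `submit` (hypothetical) queues the commands and arranges for the event
    it receives to be set once they retire. Returns True only if the
    commands are known to have completed when this function returns.
    """
    if completion_signal is not None:
        # The caller owns the signal and will wait on it; return at once.
        submit(completion_signal)
        return False
    # No signal supplied: use an internal one and wait for completion.
    done = threading.Event()
    submit(done)
    return done.wait(timeout)
```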
Change-Id: Ib2a433b7bce1cb6260be8b76fe902335bd5dfada
[ROCm/ROCR-Runtime commit: 721e56ef5c]
Enforce a hard scratch limit of 4GB per XCC, and add checks in case the user
specifies a value exceeding this limit.
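A sketch of the check. It assumes the user value is a total for the agent and that out-of-range values are clamped, although rejecting them would be an equally valid reading of the text. `check_scratch_request` and `num_xcc` are hypothetical names.

```python
# Hard scratch limit of 4GB per XCC, as stated above.
SCRATCH_LIMIT_PER_XCC = 4 * 1024**3


def check_scratch_request(requested_bytes: int, num_xcc: int) -> int:
    """Clamp a user-requested scratch size to the agent's hard limit.

    Assumption: the request covers the whole agent, so the limit scales
    with the number of XCCs.
    """
    if num_xcc < 1:
        raise ValueError("num_xcc must be >= 1")
    hard_limit = SCRATCH_LIMIT_PER_XCC * num_xcc
    return min(requested_bytes, hard_limit)
```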
Change-Id: Ib3cade762ff66c7e7d6a2d311e482cacbcf2b0de
[ROCm/ROCR-Runtime commit: d7adc94e3f]
- Per-executable contexts should be used from now on
- Global contexts are left as is for now for backwards
compatibility and will be phased out in follow up
patches.
Change-Id: I6291abf865c7ed24ee71f5065e539afc23f5ce64
[ROCm/ROCR-Runtime commit: b983c19729]
This reverts commit 5c520f4544c654e5f18e05cabd1c63d64473cfab.
Reason for revert: This patch introduces a synchronization-related bug in the Unit_hipGetSetDevice_MultiThreaded test case.
Change-Id: I367e4d4f1d75b21658ac1127c58982894a97cedb
[ROCm/ROCR-Runtime commit: 244ad319ac]
Temporary change to set the AllocateGTTAccess flag and node_id
on MES devices.
Change-Id: I22385d11b17b76cfb44278fa0d8a09bc8721cea6
[ROCm/ROCR-Runtime commit: efe455c2fa]
The function Init() called by one of the constructors of lazy_ptr is undefined.
Replace it with the reset() method, which sets the object to an uninitialized state and assigns a new constructor function.
Fix submitted on github by zhoumin2 - https://github.com/ROCm/ROCR-Runtime/pull/184
Change-Id: I7d906d526ce7fe7e2548b01810e6395b13497bf3
[ROCm/ROCR-Runtime commit: 00b63f7452]
- hsa_api_trace.h contains C++
- rocprofiler-sdk needs to include the table version number defines (*_MAJOR_VERSION and *_STEP_VERSION) for the HSA API in its public headers
- rocprofiler-sdk needs its public headers to be C-compatible, so hsa_api_trace_version.h was created
Change-Id: Ieece990b3b7775cb0446b545c9e3391c5f691c61
[ROCm/ROCR-Runtime commit: 5402842d5f]
When deferring a dmabuf export on an import call, there may be a
failure to export as the GEM object is not referenced by the kernel
mode driver. To get around this, do a non-deferred export and
immediately close the dmabuf FD to keep FD creation to a minimum.
This way, the GEM object will have a kernel mode driver reference
when a deferred export is done.
Also, a bad dmabuf FD sent over a socket may not be received by an import
reader, and this can cause a hang.
Set a 10-second timeout so that the importer does not block indefinitely.
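The bounded wait can be sketched with Python's Unix-socket FD-passing helper `socket.recv_fds` (Python 3.9+, Unix only). `receive_dmabuf_fd` is a hypothetical name. Applying the 10-second limit as a plain socket timeout is an assumption, and ROCr's own IPC handler is not shown here.

```python
import socket

# The 10-second limit described above.
IMPORT_TIMEOUT_S = 10.0


def receive_dmabuf_fd(sock: socket.socket, timeout: float = IMPORT_TIMEOUT_S) -> int:
    """Receive one file descriptor over a Unix domain socket.

    Raises socket.timeout if nothing arrives within `timeout` seconds,
    instead of blocking forever when the exporter never sends a usable FD.
    """
    sock.settimeout(timeout)
    _msg, fds, _flags, _addr = socket.recv_fds(sock, 1, 1)
    if not fds:
        raise RuntimeError("peer sent no file descriptor")
    return fds[0]
```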
Change-Id: I11a9b5ec64aa2e16fd6aecdf46c34e4eb56ccfd0
[ROCm/ROCR-Runtime commit: eb2100daad]
Extracts and creates a core dump ELF file from a fault event, using the
core dump front end. GFX11 is not supported.
Signed-off-by: Alex Sierra <Alex.Sierra@amd.com>
Change-Id: I5ae154e886f39ab3ce7bbae5803efb27a96c7e2e
[ROCm/ROCR-Runtime commit: cbeddf9eb6]
When inspecting waves on architectures where SPI may not initialize TTMP
registers, the debugger cannot reliably know if the trap handler was
entered and if it saved valuable information in TTMP registers.
This patch uses the status.skip_export bit (unused by compute
shaders) to indicate that the trap handler was executed before halting a wave.
This is done except for gfx940, where ttmp11[31] can be used (as long as
TTMP registers are always initialized by SPI for this architecture). It
would be possible to be more selective, since architectures that always
initialize TTMP registers do not require this step, but always doing
it makes maintenance simpler.
Change-Id: I5c4148c78062f7ffa049ac7856c2edc82dbc77d1
[ROCm/ROCR-Runtime commit: 5d3f6a63f1]
Work around SDMA hang in non-SPX modes for non-APU devices by disabling
ganging.
Root cause of hang not found.
Non-APU xGMI modes have only 1 link between socket devices anyway, so
there's likely no real system-level gain in ganging intra-socket.
Change-Id: Ia4eda2f85cbf25151d3dbcf50cc45b8b775c60e2
[ROCm/ROCR-Runtime commit: ed462035fa]
Gang items, not just the leader, have to wait on dependency signals.
Copies should not start if shaders are still operating on memory
to be copied.
Change-Id: I99703b420045ebcba2c9da39ec64678129dc140f
[ROCm/ROCR-Runtime commit: ed260ea970]
This allows the VA to be recorded in ROCr so that it is not
treated as an invalid pointer in future API calls.
Change-Id: I8d1d8ef9816a984c89d30a2179b0ce8940fef1da
[ROCm/ROCR-Runtime commit: f2006d6899]
- add rocprofiler-register to CPACK_DEBIAN_BINARY_PACKAGE_DEPENDS when found
- add rocprofiler-register to CPACK_RPM_BINARY_PACKAGE_REQUIRES when found
- remove report_tool_load_failures_explicit_
- add HSA_TOOLS_DISABLE_REGISTER flag
- add HSA_TOOLS_REPORT_REGISTER_FAILURE
- use HSA_TOOLS_REPORT_REGISTER_FAILURE instead of HSA_TOOLS_REPORT_LOAD_FAILURE
- changed rocprofiler-register message to not include the word "error"
Change-Id: Ib7fd7f14c42758a54c347874018281bb1b5477a6
[ROCm/ROCR-Runtime commit: 7ce263b0e4]
At hsa_shutdown(), scratch_lock_ may be gone. Blit queues don't need it.
Change-Id: Ic132ac8a6be31fb2f0623137115608b0b222f077
[ROCm/ROCR-Runtime commit: 24633c7a85]
If two attach requests to the same piece of shared memory occur,
a double export or premature dmabuf FD close can occur, since the on-demand
export and close calls are not atomic.
Use a reference counter on shared memory dmabuf FDs that have
already been opened to avoid this problem.
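A minimal sketch of the refcounted FD cache. It assumes a single lock guards both the lookup-or-export step and the decrement-or-close step. `SharedFdCache`, `export_fn` and `close_fn` are hypothetical, and the callbacks stand in for the real export and close calls.

```python
import threading
from typing import Callable, Dict, List


class SharedFdCache:
    """Refcounted cache of dmabuf FDs keyed by a shared-memory handle.

    Holding the lock across "look up or export" and "decrement or close"
    means two concurrent attaches cannot export twice, and one detach cannot
    close an FD that another attach is still using.
    """

    def __init__(self, export_fn: Callable[[int], int],
                 close_fn: Callable[[int], None]):
        self._export = export_fn  # hypothetical: handle -> new dmabuf FD
        self._close = close_fn    # hypothetical: closes a dmabuf FD
        self._lock = threading.Lock()
        self._entries: Dict[int, List[int]] = {}  # handle -> [fd, refcount]

    def acquire(self, handle: int) -> int:
        with self._lock:
            entry = self._entries.get(handle)
            if entry is None:
                # First user: export once and cache the FD.
                entry = [self._export(handle), 0]
                self._entries[handle] = entry
            entry[1] += 1
            return entry[0]

    def release(self, handle: int) -> None:
        with self._lock:
            entry = self._entries[handle]
            entry[1] -= 1
            if entry[1] == 0:
                # Last user gone: drop the entry and close the FD.
                del self._entries[handle]
                self._close(entry[0])
```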
Change-Id: I14a59209c0385e32582af42a57b33b1c6838a9b1
[ROCm/ROCR-Runtime commit: 1f63ea3476]
Add a rocrtst test for mapping non-contiguous memory to a
single VA range
Change-Id: Id2e57f83512f8b482456b2b1925586951ada7400
[ROCm/ROCR-Runtime commit: b77ade9c64]
Add test to verify whether GPU shaders can read memory created using VMM
APIs.
Split VMM rocrtst into two separate groups: Basic and Access tests
Change-Id: Iead8d46125580c71ccd582e967c8e2e891e75c5e
[ROCm/ROCR-Runtime commit: 99e31e43aa]
The wavefront size is currently only exposed as an agent-level
attribute. This is not correct: while the agent has a default
wavefront size that is usually correct, it can easily be overridden via
options like -mwavefrontsize64 on various ISAs. The wavefront size
attribute is actually more of a calling convention that is consistent
within a call graph. Because the root of each call graph is a kernel in
this architecture, we need to be able to query this on a per-kernel
basis. This information is already available in the kernel descriptor
packet, but it was not exported.
This patch adds HSA_CODE_SYMBOL_INFO_KERNEL_WAVEFRONT_SIZE as a new
option to query on the executable symbol.
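For illustration, the per-kernel value can be derived from the kernel descriptor. This sketch assumes the 64-byte AMDHSA kernel descriptor layout from the LLVM AMDGPU documentation. In that layout, the 16-bit `kernel_code_properties` field sits at byte offset 56, and bit 10 is `ENABLE_WAVEFRONT_SIZE32` (GFX10 and later; earlier targets are always wave64). `kernel_wavefront_size` is a hypothetical helper, not the ROCr implementation of the new query.

```python
import struct

# Assumed AMDHSA kernel descriptor layout (LLVM AMDGPUUsage):
# kernel_code_properties is a little-endian uint16 at byte offset 56, and
# ENABLE_WAVEFRONT_SIZE32 is bit 10 of it (only meaningful on GFX10+).
KERNEL_CODE_PROPERTIES_OFFSET = 56
ENABLE_WAVEFRONT_SIZE32_BIT = 10


def kernel_wavefront_size(descriptor: bytes) -> int:
    """Return 32 or 64 for the kernel described by a 64-byte descriptor."""
    if len(descriptor) < 64:
        raise ValueError("kernel descriptor must be 64 bytes")
    (props,) = struct.unpack_from("<H", descriptor, KERNEL_CODE_PROPERTIES_OFFSET)
    return 32 if props & (1 << ENABLE_WAVEFRONT_SIZE32_BIT) else 64
```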
Change-Id: I744815c89cc9d4c82f25479bdd48ae1f32e859ff
[ROCm/ROCR-Runtime commit: 9e26cbac14]
Instead of caching shared memory fds for export on the exporter side,
only export the FD in the async handler when requested.
The importer should request export fd closure once import is done.
Change-Id: I469e0cd1749beeb9c506c8a6461745fb039d9c3b
[ROCm/ROCR-Runtime commit: e911335cee]
ToolsApiTable's version was incorrectly default initialized to 0.
Fixes error in commit fc889669
Change-Id: I41e9301a9c33b119ee50f6164d21ddf11dc188c4
[ROCm/ROCR-Runtime commit: 8e312471dc]
Prevent the OOM killer from being triggered if all physical and swap memory gets fully used.
Change-Id: I70d558fa9c06fe6217e62d57e11aec6a089aa0bb
[ROCm/ROCR-Runtime commit: 13800cc6d5]
Adjust code to allow the use of non-contiguous chunks of memory to be
mapped within a single VA range.
Change-Id: Ida21ba202927229347b3a32d9b7106df10819cf5
[ROCm/ROCR-Runtime commit: f7de85082e]
Add tests to catch whether ROCr breaks ABI compatibility with the
hsa_amd_pointer_info API in case the hsa_amd_pointer_info struct is
extended.
Change-Id: I4e69bf30db9791e59f895b2798b87985c41242e5
[ROCm/ROCR-Runtime commit: 776da1a3f7]
Add a new tools table and functions to notify in case of an event
Change-Id: I47f0c2f3c8e02d7bcb74d649903eb4f86721c154
[ROCm/ROCR-Runtime commit: a67af3807f]
Currently, the definition of hsa_amd_memory_fault_reason_t tries to
set a constant of 0x8000_0000 by using the definition "1 << 31".
However, the 1 in this definition is a signed integer by C++ rules.
On our architectures, shifting a signed integer by 31 results in
signed integer overflow. Signed integer overflow results in
undefined behavior.
Forcing the 1 to be unsigned avoids this.
Change-Id: I860431eeede4eff29598f646abf3c1337b048d71
[ROCm/ROCR-Runtime commit: 1d6691e06b]
Fix the gang factor being overwritten with 0 if there are no xGMI SDMA
engines on the device and the gang factor is 1.
Change-Id: I041d4b4ae87fb68f224ee4dedb758c6f06c022a9
[ROCm/ROCR-Runtime commit: 1dd4a7dc18]
Users can import device memory without specifying the target node.
Dmabuf imports return a Thunk handle that's not useful for
GPU mapping calls.
Fix this by using the import node information to re-import and
map with the correct target GPU.
Also fix IPC detach calls by deregistering the Thunk handle
import immediately during attach, instead of failing to do it later
on detach, since Thunk handles aren't placed into the ROCr allocation
map.
Finally, refactor the IPC attach function for cleaner logic flow.
Change-Id: Ib2bf178110b2be98bd6917c765f724e4e613f5f2
[ROCm/ROCR-Runtime commit: a3efd13a2f]
We should also close the client side dmabuf fd after importing for target
nodes.
Change-Id: I74f61dd65bebb03dc002f5df7301efd1ef8d9603
[ROCm/ROCR-Runtime commit: 15691ae460]
Optimizations include:
- Greedy gang by placing gang leaders on the first D2D SDMA blit context
to avoid deadlocking with other gang leaders and items. Note that
this is fine since we can't avoid an oversubscription problem when
there is only 1 xGMI link anyway, so treat all xGMI links as a single
pipe for ganging.
- Non-leader gang items don't have to poll on dependency signals, so this
opens up more non-blocking SDMA channels.
- Unlock the gang lock when gangs are not needed.
- Change gang factor lookup from vector pair to map and register all
GPUs in the gang factor lookup regardless of link type so that we can take
advantage of the O(logN) direct key/value lookup time.
Fixes include:
- HSA_PAGE_SIZE_4KB was an incorrect macro to use for gang size limit.
As a result, small copies ended up ganging and hitting latency limit.
Use hardcoded 4096 bytes instead.
- Cap auxiliary gang factor to the number of non-XGMI SDMA engines.
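A sketch of how the gang factor rules fit together. Only three rules come from the text: the 4096-byte size limit, the cap to non-xGMI SDMA engines, and "never 0". The function name, the `<=` boundary at 4096 bytes, and treating a missing table entry as factor 1 are assumptions. A Python dict stands in for the C++ map.

```python
from typing import Dict, Set

# Copies no larger than this are not ganged (the <= boundary is assumed).
MIN_GANG_COPY_BYTES = 4096


def gang_factor_for_copy(peer_gpu: int, size_bytes: int,
                         gang_factors: Dict[int, int],
                         xgmi_peers: Set[int],
                         num_non_xgmi_sdma: int) -> int:
    """Return how many SDMA engines should share one device-to-device copy.

    gang_factors is assumed to hold an entry for every GPU regardless of
    link type, so the lookup is a single keyed access.
    """
    if size_bytes <= MIN_GANG_COPY_BYTES:
        return 1  # small copies would only pay the ganging latency cost
    factor = gang_factors.get(peer_gpu, 1)
    if peer_gpu not in xgmi_peers:
        # Auxiliary (non-xGMI) ganging cannot use more engines than exist.
        factor = min(factor, num_non_xgmi_sdma)
    # Never let the factor collapse to 0, e.g. with no eligible engines.
    return max(factor, 1)
```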
Change-Id: Ic23fde131502906a807134a04599aa6d012e8cbb
[ROCm/ROCR-Runtime commit: 62f3f250ce]
The old max memory search algorithm used binary search to find the
last successful memory allocation. However, each successful memory
allocation takes time, while an unsuccessful allocation returns very
quickly. Changing the search algorithm to find the first successful
memory allocation, starting from MAX and stepping down by the
granularity interval, speeds up this test.
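A sketch of the downward search. It assumes a hypothetical `try_alloc(size)` probe that frees the memory on success, and it aligns sizes down to the granularity. Every probe except the last is a fast failing allocation, which is why this beats binary search when successful allocations are the expensive case.

```python
from typing import Callable, Optional


def find_max_allocatable(try_alloc: Callable[[int], bool],
                         max_bytes: int, granularity: int) -> Optional[int]:
    """Return the largest granularity-aligned size <= max_bytes that allocates.

    Scans downward from max_bytes; the first successful probe ends the
    search. Returns None if no size succeeds.
    """
    if granularity <= 0:
        raise ValueError("granularity must be positive")
    size = (max_bytes // granularity) * granularity
    while size > 0:
        if try_alloc(size):  # hypothetical probe; frees on success
            return size
        size -= granularity
    return None
```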
Change-Id: Idada3c6f750c94f3bb223f4f3bff4e4ebd3e98f7
Signed-off-by: James Zhu <James.Zhu@amd.com>
[ROCm/ROCR-Runtime commit: caedadcc6f]
Applies the following changes:
- add version number to documentation left navigation bar and page title
- add an "About" section with a license page
- enable htmlzip, pdf, epub formats when publishing on Read the Docs
- set pdf title, author, copyright, and version
- rename .sphinx/.doxygen to sphinx/doxygen
- remove docBin from URL
- update rocm-docs-core dependency
Change-Id: I947cf32cd42d9f4e55b1ddd324ad4a7e4ba3f3e3
[ROCm/ROCR-Runtime commit: 1c6ad56dc6]