Commit Graph

747 Commits

Author SHA1 Message Date
Sean Keely 4c6ea88cf5 Add HSA_CU_MASK
New environment variable HSA_CU_MASK allows users to
specify a CU mask applied to every queue allocated on any
GPU.  hsa_amd_queue_cu_set_mask is prevented from enabling
CUs outside this mask.

A new API, hsa_amd_queue_cu_get_mask, is added to query
the current CU mask.
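A minimal sketch of the restriction described above. It assumes CU masks are held as arrays of 32-bit words with one bit per CU, and that the HSA_CU_MASK string has already been parsed (its syntax is not covered here). `RestrictCuMask` is a hypothetical helper, not ROCr's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not the ROCr implementation): clamp a CU mask requested
// through hsa_amd_queue_cu_set_mask to the process-wide mask parsed from
// HSA_CU_MASK. Bit i of word w stands for CU (32 * w + i). Words past the end
// of `global` allow no CUs. When HSA_CU_MASK is unset, the caller is assumed
// to skip this step entirely.
std::vector<uint32_t> RestrictCuMask(const std::vector<uint32_t>& requested,
                                     const std::vector<uint32_t>& global) {
  std::vector<uint32_t> result(requested.size(), 0u);
  for (std::size_t w = 0; w < requested.size(); ++w) {
    const uint32_t allowed = w < global.size() ? global[w] : 0u;
    result[w] = requested[w] & allowed;  // Never set a bit the global mask clears.
  }
  return result;
}
```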

Change-Id: I846c03a5faaca9b95067c31db84b59cc9fce2f03


[ROCm/ROCR-Runtime commit: 4455250be1]
2021-07-29 02:23:34 -05:00
Sean Keely c7606d1dfc Provide hwloc dependency.
Some distros do not provide the proper hwloc version for rocrtst.
This packages the required version.

Change-Id: Iebc68250c33f309d6b50e850a0553685bac50563


[ROCm/ROCR-Runtime commit: 2c35469617]
2021-07-26 23:56:14 -04:00
Sean Keely 127ed837dc Revert "Revert "Split packaging into binary and dev packages.""
Correct deb and rpm package conflict declarations.
hsa-ext-rocr-dev was to be replaced.  Now that two packages
replace it, remove the conflicts so that they do not block
each other.

Change-Id: If25ea6cfd3d6d00398fd0a8d179860d3a92dc907


[ROCm/ROCR-Runtime commit: 770a42cb42]
2021-07-26 20:42:25 -05:00
Sean Keely 5b9b2f2468 Revert "Split packaging into binary and dev packages."
This reverts commit d12dbc3518.

Change-Id: I33cbcffe5695c4e45ebce37ce56177006a5e0f62


[ROCm/ROCR-Runtime commit: d2ccf44085]
2021-07-26 19:23:46 -05:00
Sean Keely d12dbc3518 Split packaging into binary and dev packages.
Conform with normal packaging behavior where a binary
and its development headers are in separate packages.

Change-Id: I91c58ea271a8e1c710c213060bca6d58d69287e6


[ROCm/ROCR-Runtime commit: 2c32cbea00]
2021-07-26 17:01:36 -05:00
Sean Keely 6bef819142 Add package splitting names to PROVIDES.
Preparation for splitting the package.  The rocm-dev meta package
should be updated after this is merged and before the packages are
split, to avoid build breaks.

Change-Id: Iaad54ee72207285eaaa99e88cf1949bea7f29001


[ROCm/ROCR-Runtime commit: bea17130f7]
2021-07-23 18:33:09 -05:00
Aaron Liu eb72821574 Add gfx1035 for Yellow Carp
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Change-Id: I1e3e44352b5825fc0f249c39aed703d4990995ca


[ROCm/ROCR-Runtime commit: 4032070c3e]
2021-07-22 13:48:31 +08:00
Sean Keely 32766de851 Add support for reporting vm faults through the queue error handler.
With XNACK enabled, we can now identify the queue that generated a VM fault.
This allows users to identify which queue, and therefore which
dispatch, a VM fault came from.

Change-Id: If72ff3de05800f2b811aa7842a15eedff8b5e45a


[ROCm/ROCR-Runtime commit: 59ee761f81]
2021-07-16 18:03:26 -05:00
Laurent Morichetti 0a2ad007a4 Fix incorrect packet index in ttmp6
ttmp6.packet_index is reported as 0 for all waves, regardless of the
dispatch packet position in the queue, due to an issue in the clearing
of the previous trap_id and saved status.halt bit.

Fix TTMP6_SAVED_STATUS_HALT_MASK to be a single bit, 1<<29.
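Why the width of the mask matters, as a bit-arithmetic sketch: clearing a flag with `value & ~mask` clears every bit the mask covers. Only the bit position (29) comes from the commit; `kOverWideMask` and `ClearSavedHalt` are hypothetical, and the real definition lives in the trap handler sources. This C++ only illustrates the arithmetic.

```cpp
#include <cassert>
#include <cstdint>

// Saved status.halt flag in ttmp6, as stated in the commit: a single bit.
constexpr uint32_t TTMP6_SAVED_STATUS_HALT_MASK = 1u << 29;

// Hypothetical over-wide mask, used only to show the failure mode.
constexpr uint32_t kOverWideMask = 0x3u << 28;

// Clear the bits selected by `mask`; with a one-bit mask, the other ttmp6
// fields are left untouched.
constexpr uint32_t ClearSavedHalt(uint32_t ttmp6, uint32_t mask) {
  return ttmp6 & ~mask;
}
```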

Change-Id: Ia4934e51123a40d71de658efc387a1f3a6344f05


[ROCm/ROCR-Runtime commit: ef1955ad42]
2021-07-16 18:03:26 -05:00
Jay Cornwall b65eb065c3 Report union of wave errors as a bitmask in trap handler
Also fix incorrect PC increment on host trap.
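A sketch of reporting the union of wave errors: each wave's error bits are ORed together, so distinct errors from different waves all survive in one bitmask. The error bit names and `UnionOfWaveErrors` are hypothetical; the commit does not list the actual codes.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-wave error bits.
enum WaveError : uint32_t {
  kWaveErrorNone = 0,
  kWaveErrorMemViolation = 1u << 0,
  kWaveErrorIllegalInstruction = 1u << 1,
  kWaveErrorMathException = 1u << 2,
};

// Union of all wave errors; reporting a single value would keep only one.
uint32_t UnionOfWaveErrors(const std::vector<uint32_t>& per_wave) {
  uint32_t mask = kWaveErrorNone;
  for (uint32_t e : per_wave) mask |= e;
  return mask;
}
```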

Change-Id: Ic8bbf2b90f9f879ba62b558b909d010a8939a663


[ROCm/ROCR-Runtime commit: f3d942b67f]
2021-07-16 18:03:26 -05:00
Jay Cornwall 690aef5c9e Clear queue error code when not handling exceptions
If left non-zero, the event loop keeps reinvoking the callback,
preventing AqlQueue::ExceptionHandler from running.

Change-Id: If85fbaf62f04ffd327ecf9d649aa23afad4442ce


[ROCm/ROCR-Runtime commit: 8d4608ed0e]
2021-07-16 18:03:26 -05:00
Jay Cornwall 06cc198b57 Add new trap handler, bump debug API version
Also fix hsaKmtRuntimeEnable error handling: continue if the ioctl fails.

Change-Id: I754ccba5910ccfef6f1ada1415593ef89ce33aba


[ROCm/ROCR-Runtime commit: 7e4088309d]
2021-07-16 18:03:26 -05:00
Sean Keely 0bb9344674 Initialize new exception handler state.
Change-Id: Ibcb699760837b9ec1508d6af948a272a81ddcd02


[ROCm/ROCR-Runtime commit: 0159aea4c9]
2021-07-16 18:03:26 -05:00
Sean Keely 5c9500d50b Support debugging hw exceptions.
Change-Id: I9780147294af2e9457fa54693580735452ee2ae6


[ROCm/ROCR-Runtime commit: 206e87d28b]
2021-07-16 18:03:26 -05:00
Sean Keely 183531963c Always execute the first satisfied async signal handler.
Certain special signals do not carry their updates via their signal
value.  These signals are wrappers around special KFD events, of
which the only current instance reports VM faults.  We either
need to check each signal for this special event type or rely on
the checking done in hsa_amd_signal_wait_any.  Since there will always
be a small number of these signals, it doesn't make much sense to
penalize the performance path with this check.  Additionally, we know
that the signal indicated by hsa_amd_signal_wait_any is satisfied, so
we don't need to recheck its conditions.
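A sketch of the dispatch rule described above, under a simplified model where each handler fires when its signal value drops below a threshold. `AsyncHandler` and `DispatchHandlers` are hypothetical; the index `satisfied` stands for the one reported by hsa_amd_signal_wait_any, and that handler runs without rechecking its condition.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a registered async handler: it fires when the
// signal value satisfies `value < threshold` (one sample condition).
struct AsyncHandler {
  int64_t threshold;
  bool fired;
};

// `satisfied` is the index reported by a wait-any call. That handler runs
// unconditionally, because event-backed signals may not carry their update in
// the signal value. The remaining handlers are checked in the same pass.
// Returns the number of handlers run.
std::size_t DispatchHandlers(std::vector<AsyncHandler>& handlers,
                             const std::vector<int64_t>& values,
                             std::size_t satisfied) {
  assert(handlers.size() == values.size() && satisfied < handlers.size());
  handlers[satisfied].fired = true;  // Trust wait-any; no recheck.
  std::size_t run = 1;
  for (std::size_t i = 0; i < handlers.size(); ++i) {
    if (i == satisfied) continue;
    if (values[i] < handlers[i].threshold) {
      handlers[i].fired = true;
      ++run;
    }
  }
  return run;
}
```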

Change-Id: I9fc6298300ad543d823ecd28ca8fab4ad26c23ef


[ROCm/ROCR-Runtime commit: 3d6a18b67c]
2021-06-24 02:45:31 -05:00
Sean Keely 4f2d6f763b Correct clang build error.
Clang now warns about set-but-unused variables.  It also now
recognizes -Wno-error=unused-but-set-variable, so this patch moves
that option back to the general options list.

Change-Id: Id800e87eb688b9441b14380e2246ad586179f31a


[ROCm/ROCR-Runtime commit: 26808295f8]
2021-06-23 15:04:58 -05:00
Sean Keely d139055431 Locate kernel directory from device name.
Search child directories when locating device code.

Change-Id: I51515f002ad60878a2be0b6e9ee6416c67a1d799


[ROCm/ROCR-Runtime commit: 74bcd6ee90]
2021-06-17 22:57:21 -04:00
Sean Keely b8533eec6d Add agent info query for HSA_AMD_AGENT_INFO_SVM_DIRECT_HOST_ACCESS.
Allows determining if the host can directly access HMM memory that
is physically resident in vram.

Change-Id: Ie452eedd0e27fe1b511afd416f5a1cd01b3d84e8


[ROCm/ROCR-Runtime commit: 9e53cab613]
2021-06-17 03:45:26 -04:00
Sean Keely 0100fa968c Allocate any size vram request through the fragment allocator.
Enables the fragment allocator to handle >2MB allocations, maintaining
good TLB alignment.  Prior code contained a bug that caused the effective
API granule for vram allocations >2MB to be bumped to 2MB.
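A sketch of the granule problem, assuming a 4 KiB API granule (the commit does not state it) and 2 MiB fragment blocks. The two functions are hypothetical and only model the effective allocation size before and after the fix.

```cpp
#include <cassert>
#include <cstddef>

// Assumed sizes for illustration.
constexpr std::size_t kApiGranule = std::size_t{4} << 10;  // 4 KiB
constexpr std::size_t kBlockSize = std::size_t{2} << 20;   // 2 MiB

constexpr std::size_t AlignUp(std::size_t n, std::size_t align) {
  return (n + align - 1) / align * align;
}

// Behavior described as the bug: requests above 2 MiB were effectively
// rounded to a 2 MiB granule.
constexpr std::size_t OldEffectiveSize(std::size_t request) {
  return request > kBlockSize ? AlignUp(request, kBlockSize)
                              : AlignUp(request, kApiGranule);
}

// Intended behavior: every size keeps the small API granule.
constexpr std::size_t NewEffectiveSize(std::size_t request) {
  return AlignUp(request, kApiGranule);
}
```

For a request of 2 MiB + 4 KiB, the old rounding consumes 4 MiB, nearly doubling the footprint.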

Also adjusts the block cache's block retention heuristic so that
discarded blocks are not counted as in use.  This reduces block
retention when many large blocks or IPC allocations are in use.

Change-Id: I30bd85eb87951df822211f799d9cfe579ab109c6


[ROCm/ROCR-Runtime commit: 8adbda1c18]
2021-06-10 19:30:54 -05:00
Sean Keely 426a9f4df0 Remove unused GpuAgent.local_region_ member.
Change-Id: I99526e6b1f64e810f7fed5d922c540d252a46d80


[ROCm/ROCR-Runtime commit: 981c6bd8c3]
2021-06-07 19:59:58 -04:00
Sean Keely c35562e45b Add debugging checks for packet type in the scratch handler.
Change-Id: I84a6f18548ac39349595e3a1c8a5a9ff27d4e178


[ROCm/ROCR-Runtime commit: bd59789f0b]
2021-06-07 15:36:18 -04:00
Sean Keely 67029106d9 Limit reporting of GPU_ONLY signal waits from host.
Such waits must spin but are functionally correct.

Change-Id: I4992852f04da788495c6f566c46a3dffaf38397c


[ROCm/ROCR-Runtime commit: 3323e18f3e]
2021-06-03 15:26:40 -05:00
Sean Keely 2c4b216db6 Allow limiting debug warning messages.
Add macro debug_warning_n, which stops printing a message after
N instances.
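A possible shape for a rate-limited warning macro. `DEBUG_WARNING_N` and `WarnLimited` are hypothetical stand-ins; ROCr's actual debug_warning_n may be defined differently. Wrapping the counter in a lambda gives each call site its own static counter.

```cpp
#include <atomic>
#include <cassert>
#include <cstdio>

// Print `msg` only while fewer than `limit` warnings have been emitted for
// this counter. Returns true if the message was printed. The CAS loop stops
// incrementing at `limit`, so the counter cannot wrap around.
inline bool WarnLimited(std::atomic<unsigned>& count, unsigned limit,
                        const char* msg) {
  unsigned seen = count.load(std::memory_order_relaxed);
  while (seen < limit) {
    if (count.compare_exchange_weak(seen, seen + 1,
                                    std::memory_order_relaxed)) {
      std::fprintf(stderr, "Warning: %s\n", msg);
      return true;
    }
    // On failure, `seen` now holds the current value; re-test it.
  }
  return false;
}

// Hypothetical macro in the spirit of debug_warning_n. Each expansion creates
// a distinct lambda, so each call site gets its own static counter.
#define DEBUG_WARNING_N(limit, msg)                          \
  ([&]() -> bool {                                           \
    static std::atomic<unsigned> debug_warning_count{0};     \
    return WarnLimited(debug_warning_count, (limit), (msg)); \
  }())
```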

Change-Id: Id5f84b11eb63b3a20bd2bcb2ea8f10a066b457ef


[ROCm/ROCR-Runtime commit: ca8387768e]
2021-06-03 15:25:55 -05:00
Sean Keely 76c8067786 Improve async handler performance.
Under high async handler load, signal retention and event sorting
become bottlenecks.  This change processes more handlers in a
single pass to amortize wait_any overheads.

Change-Id: I8b276e102db647e3858e120547aa0c6fca85ab4c


[ROCm/ROCR-Runtime commit: 6b398eb72c]
2021-06-02 23:52:07 -05:00
Sean Keely 530385e8e8 Add Read Mostly attribute support.
Change-Id: Ia7c60edacb892cbf14bdb50350c0a0a627e53964


[ROCm/ROCR-Runtime commit: f6c2aa1c78]
2021-06-01 23:39:12 -05:00
Sean Keely 2cf9abaa06 Recognize gfx1034 in image device family id.
Change-Id: I2a529b5e91fae9f3697ddbccaaf0e97c87d59837


[ROCm/ROCR-Runtime commit: 7361fc18ee]
2021-05-25 16:43:20 -05:00
Chris Freehill 75190cb229 Add gfx1034 support
Change-Id: I2d4bfcb9012704daf7de10739c966827bd2a09e2


[ROCm/ROCR-Runtime commit: 8cb686fdc5]
2021-05-25 16:43:16 -05:00
Mike (Tianxin) Li f60550207c Revert "Get the size of VGPR and SGPR register file"
This reverts commit 2ae10ae479.

Change-Id: I9988218ad1d2b6182d92aad09d18a95e77e46c01


[ROCm/ROCR-Runtime commit: 36c54c63f7]
2021-05-18 15:01:30 -04:00
Mike Li 2ae10ae479 Get the size of VGPR and SGPR register file
Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>
Change-Id: Ifa515ad7e1df1dd27f25f1e919b0053049531063


[ROCm/ROCR-Runtime commit: 344ed757e0]
2021-05-13 11:54:41 -04:00
Sean Keely 9827a48f6b Update README.md
Remove references to the finalizer and images libraries.

Change-Id: Ic673da77bb13dea77b477d7bfe799fc2c028ab2a


[ROCm/ROCR-Runtime commit: 5f0e39df63]
2021-05-10 17:53:19 -05:00
Sean Keely 4cad5320c9 Correct merge error.
The old memory properties info name was still used after the
branches were removed.  This caused the CPU coarse grain pool to
be initialized with random bits.

Change-Id: I397bc5ecf09fab69bdf1d7fafadcf54d71b64070


[ROCm/ROCR-Runtime commit: 0439dc90cd]
2021-05-06 18:40:56 -05:00
Sean Keely aff2056789 Add exception forwarding to tools API callbacks.
Prevents poorly written tools that throw from tools interface
callbacks from causing ROCr to catch the exception and return a
generic error code.
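A sketch of exception forwarding at an API boundary, assuming the runtime wraps each API call in a catch-all that converts errors to a status code. `Status`, `ToolException`, `CallTool` and `ApiBoundary` are hypothetical names. The point is that exceptions tagged as coming from a tool callback are rethrown unchanged via std::rethrow_exception instead of being turned into a generic error.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>

// Hypothetical stand-in for hsa_status_t.
enum class Status { kSuccess, kErrorGeneric };

// Hypothetical wrapper marking an exception that escaped tool code.
struct ToolException {
  std::exception_ptr inner;
};

// Invoke a tool-supplied callback; anything it throws is tagged for forwarding.
inline void CallTool(const std::function<void()>& callback) {
  try {
    callback();
  } catch (...) {
    throw ToolException{std::current_exception()};
  }
}

// API boundary: runtime-internal failures become a generic error code, while
// exceptions raised inside tool callbacks are rethrown to the application.
template <typename Body>
Status ApiBoundary(Body body) {
  try {
    body();
    return Status::kSuccess;
  } catch (const ToolException& e) {
    std::rethrow_exception(e.inner);
  } catch (...) {
    return Status::kErrorGeneric;
  }
}
```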

Change-Id: I2f5bf7104dc7d4ee688eb48423c7ffdb06bd7702


[ROCm/ROCR-Runtime commit: c9ce27a640]
2021-05-04 02:14:20 -05:00
Sean Keely ed740bfebc Correct scratch in use computation.
Old logic did not consider memory held in the scratch cache to be
free when deciding whether or not to reclaim.
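A sketch of the corrected accounting under a simplified, hypothetical model: `allocated` is all scratch backing memory and `cached` is the portion sitting idle in the scratch cache. The names and the reclaim threshold are assumptions; the commit only states that cached memory should count as free.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of scratch accounting (assumes cached <= allocated).
struct ScratchUsage {
  std::size_t allocated;  // All scratch memory currently backed.
  std::size_t cached;     // Portion held idle in the scratch cache.
  std::size_t limit;      // Budget above which reclaim is considered.
};

// Memory actually in use; cached blocks are free for reuse.
constexpr std::size_t ScratchInUse(const ScratchUsage& s) {
  return s.allocated - s.cached;
}

// Reclaim only if serving `request` would exceed the budget. The old logic
// behaved as if ScratchInUse() returned s.allocated.
constexpr bool ShouldReclaim(const ScratchUsage& s, std::size_t request) {
  return ScratchInUse(s) + request > s.limit;
}
```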

Change-Id: I7f7c7549c72d743edbf7c53489fe9a453dc4177a


[ROCm/ROCR-Runtime commit: 0b7d9db964]
2021-04-22 20:07:25 -04:00
Sean Keely 13c8e534d3 Report HMM driver support status.
Implements HSA_AMD_SYSTEM_INFO_SVM_SUPPORTED.

Change-Id: If5182edcc1fa067fa514aa2c1bd326c4c42d1b64


[ROCm/ROCR-Runtime commit: ee8b1b64ad]
2021-04-21 21:44:42 -05:00
Sean Keely 1aae64e251 Revert "Revert SVM and XNACK support."
This reverts commit da41352a93.

Conflicts:
	opensrc/hsa-runtime/core/util/flag.h

Change-Id: I16daf41588e6139126d66af54b0693de2e7e39f3


[ROCm/ROCR-Runtime commit: 77046a1aaa]
2021-04-21 14:49:43 -05:00
Sean Keely 58813ab760 Ensure ROCr-created threads have no CPU affinity.
Change-Id: I53828dbaf055b65b61bdd11f0eadfcc806596821


[ROCm/ROCR-Runtime commit: 3127d1ffdc]
2021-04-19 19:47:06 -05:00
Konstantin Zhuravlyov c79bc5fb22 Update documentation of hsa_ven_amd_loader_iterate_executables
Clarify behavior of hsa_ven_amd_loader_iterate_executables during
concurrent calls of executable creation and destruction.

Change-Id: Idc3e3981d4fcc0d58d9f1b7a7578deed20aa490b


[ROCm/ROCR-Runtime commit: 1bdc2f6854]
2021-04-16 20:51:48 -04:00
Konstantin Zhuravlyov b095fec147 Expose iterator for executables
Change-Id: I0c5d39fc33c15a6eb8ee10ff181c2dcf2e042675


[ROCm/ROCR-Runtime commit: 15e54d684d]
2021-04-16 20:51:48 -04:00
Konstantin Zhuravlyov 1c7abea61a Remove loaders.c/hpp
Change-Id: Ida507c2dd2de9172f250172f9c45a639953cb412


[ROCm/ROCR-Runtime commit: e826c365ea]
2021-04-16 20:51:48 -04:00
Mengbing Wang a69a3946c9 Add allocation size limit of 1/2 vram size in rocrtstPerf.Memory_Async_Copy test.
Add a hard limit on the allocation size of 1/2 of available VRAM
to avoid allocation failures when the allocation size equals the VRAM size.

Print the block size in each round to report progress during the
long-running test.

Report skipped block sizes in the result table (if any are skipped).

Affected test:
rocrtstPerf.Memory_Async_Copy

Data Size   Avg Time(us)     Avg BW(GB/s)   MinTime(us)      Peak BW(GB/s)
128M        638759.570200    0.195692       637569.991000    0.196057
256M        1270058.822400   0.196841       1268425.758000   0.197095
Notice: Data Size larger than 512M is skipped due to hard limit of 1/2 vram size
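A sketch of how a size sweep could apply the 1/2-VRAM hard limit and collect skipped sizes for the result report. `PlanCopySizes` and `SweepPlan` are hypothetical, and treating a size exactly equal to half of VRAM as allowed is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result of planning a copy-size sweep.
struct SweepPlan {
  std::vector<std::size_t> run;      // Sizes that fit under the cap.
  std::vector<std::size_t> skipped;  // Sizes reported as skipped.
};

SweepPlan PlanCopySizes(const std::vector<std::size_t>& sizes,
                        std::size_t available_vram) {
  const std::size_t cap = available_vram / 2;  // Hard limit: 1/2 of VRAM.
  SweepPlan plan;
  for (std::size_t size : sizes) {
    (size <= cap ? plan.run : plan.skipped).push_back(size);
  }
  return plan;
}
```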

Signed-off-by: Mengbing Wang <mengbing.wang@amd.com>
Change-Id: I4c4cea74a608272cc29d222b9399af26b34d7473


[ROCm/ROCR-Runtime commit: cf10c3bc35]
2021-04-16 02:23:48 -04:00
Mike Li 3258d72d3b Get GPU cache information from KFD
Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>
Change-Id: I8dc8c97ae81c3747b7cd88cf2cdb7a9e4694a88d


[ROCm/ROCR-Runtime commit: d077606e22]
2021-04-13 10:29:34 -04:00
Tony Tye e20cccb6e4 Add support for gfx909 and gfx90c
Change-Id: I88158789cdda44a173e3ca26d2c96b8e0ea0e221


[ROCm/ROCR-Runtime commit: a97c14abea]
2021-04-08 22:37:30 +00:00
Sean Keely 2b25548eb0 Remove emulator SRAMECC override controls.
Change-Id: Iea9e7870dbf517032f34cebec673c90226b96960


[ROCm/ROCR-Runtime commit: 243e29ba8e]
2021-04-02 02:11:05 -04:00
Sean Keely da41352a93 Revert SVM and XNACK support.
KFD is not ready yet.

Change-Id: I61deb292ddb92185d33504c2115169888d56e211


[ROCm/ROCR-Runtime commit: 5bd153974d]
2021-04-02 02:10:59 -04:00
Ramesh Errabolu 29fa097a82 Override CPU-GPU link weight for Aldebaran until a proper fix is available
Change-Id: I1fbc38b788f71cc9c9fc62295223286004689bf9


[ROCm/ROCR-Runtime commit: 25f3dc305f]
2021-04-02 02:10:54 -04:00
Sean Keely dd42ca6dbe Squash merge of cfreehil/amd-temp-gfx90a onto amd-staging.
Includes some workarounds and HMM support.
Conflicts:
	opensrc/hsa-runtime/core/runtime/amd_topology.cpp
	opensrc/hsa-runtime/core/util/flag.h

Change-Id: I22976f07964a43dbb228a6231777dbd599112b8d


[ROCm/ROCR-Runtime commit: 7333c77e22]
2021-04-02 02:10:15 -04:00
Sean Keely ea1f545fcc Correct hsa_agent_iterate_isas return code for CPUs.
When no ISAs are available, no callbacks should be invoked.  This
is not an error, and the call should return success.
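A sketch of the corrected iteration contract, using hypothetical stand-ins for hsa_status_t and hsa_isa_t rather than the real HSA headers: with no ISAs, no callbacks run and the result is success; a callback's own non-success status still ends the iteration early.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for hsa_status_t and hsa_isa_t.
enum Status { kSuccess, kInfoBreak, kError };
struct Isa {
  int id;
};
using IsaCallback = Status (*)(Isa isa, void* data);

// An agent with no ISAs (e.g. a CPU agent) invokes no callbacks and still
// returns success; only a callback's non-success status stops iteration.
Status IterateIsas(const std::vector<Isa>& isas, IsaCallback callback,
                   void* data) {
  for (const Isa& isa : isas) {
    const Status status = callback(isa, data);
    if (status != kSuccess) return status;
  }
  return kSuccess;
}
```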

Change-Id: Ie4048aa8cbe5c3fdf5431f6a865021549ecf8a13


[ROCm/ROCR-Runtime commit: 4197461b7f]
2021-04-01 00:08:22 -04:00
Sean Keely 465ada0234 Block ROCm 4.1+ from running against KFD 4.0 and earlier.
SRAMECC is misreported by KFD 4.0 and earlier.  To prevent possible
corruption due to d16 instructions, deny use of gfx906 with older
KFDs and correct the misreport for gfx908.  Denial of gfx906 may be
overridden by setting HSA_IGNORE_SRAMECC_MISREPORT=1.
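A sketch of the gating decision, with hypothetical names. How ROCr detects an affected KFD is not described here, so it is passed in as `kfd_misreports_sramecc`; the override is assumed to be honored only for the exact value "1".

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical decision helper. `override_env` is the value of
// HSA_IGNORE_SRAMECC_MISREPORT, or nullptr if the variable is unset.
bool AllowAgent(const std::string& gfx_name, bool kfd_misreports_sramecc,
                const char* override_env) {
  if (!kfd_misreports_sramecc) return true;
  if (gfx_name == "gfx906") {
    // Deny by default to avoid possible d16 corruption; allow an explicit opt-in.
    return override_env != nullptr && std::strcmp(override_env, "1") == 0;
  }
  // gfx908 stays usable: its SRAMECC report is corrected rather than denied.
  return true;
}
```

A caller would pass `std::getenv("HSA_IGNORE_SRAMECC_MISREPORT")` as `override_env`.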

Change-Id: I7d5c3a716fad01c348f8b88cd508cedbf914c989


[ROCm/ROCR-Runtime commit: 45fbe5b192]
2021-04-01 00:03:32 -04:00
Cole Nelson 7cd0a8435b hsa-runtime: add ENABLE_LDCONFIG to support multi-version install
Depends-On: I58fdf1d0b4e864b5a61ffe8e335d430d424811ab
Change-Id: I0cb6f8711ea5033e84b7e45ce20e7e23d84005c3
Signed-off-by: Cole Nelson <cole.nelson@amd.com>


[ROCm/ROCR-Runtime commit: 72fa4a17fa]
2021-03-26 18:37:04 -04:00
Mengbing Wang 97918cbd7c Limit memory allocation on VRAM to 3/4 of VRAM size.
1. Because we cannot guarantee that 100% of APU VRAM is free to be allocated,
limit the allocation size to no more than 3/4 of the VRAM size.
2. Keep the old 1GB allocation limit for the dGPU case.
3. Add an alignment check for alloc_size.
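A sketch of the three rules above, assuming "1GB" means 1 GiB and that the allocation granule is supplied by the caller. `MaxAllocSize` and `IsAlignedAllocSize` are hypothetical helpers, not the rocrtst code.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kGiB = uint64_t{1} << 30;

// On an APU, allocate at most 3/4 of VRAM, since not all of it is guaranteed
// to be free (divide first to avoid overflow); on a dGPU, keep the old fixed
// 1 GiB limit.
constexpr uint64_t MaxAllocSize(bool is_apu, uint64_t vram_size) {
  return is_apu ? vram_size / 4 * 3 : kGiB;
}

// Alignment check: alloc_size must be a non-zero multiple of `granule`.
constexpr bool IsAlignedAllocSize(uint64_t alloc_size, uint64_t granule) {
  return granule != 0 && alloc_size != 0 && alloc_size % granule == 0;
}
```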

Affected tests:

rocrtstStress.Memory_Concurrent_Allocate_Test
rocrtstStress.Memory_Concurrent_Free_Test

Change-Id: Id0023de132024d02f80980ae4237d9d74d9e27d3
Signed-off-by: Mengbing Wang <mengbing.wang@amd.com>


[ROCm/ROCR-Runtime commit: d5855c1658]
2021-03-23 18:59:42 +08:00