Commit graph

419 commits

Author SHA1 Message Date
Sean Keely 2fbacccaed Correct handling of failed lazy_ptr constructors.
Constructor function must not be attempted twice even if the construction
attempt returns nullptr.

Change-Id: I75353e5e511769a96e4332f7f60887f6559c1cd5
2020-05-08 22:23:46 -04:00
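A minimal sketch of the rule stated in this message, assuming a simplified lazily constructed pointer. `LazyPtr` and its members are hypothetical names for illustration and are not ROCr's `lazy_ptr`. The point is that "already attempted" is tracked separately from "object is non-null", so a constructor that returns nullptr is not retried.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical stand-in for a lazily constructed pointer (not ROCr's lazy_ptr).
template <typename T>
class LazyPtr {
 public:
  explicit LazyPtr(std::function<T*()> ctor) : ctor_(std::move(ctor)) {}

  // Returns the object, or nullptr if the single construction attempt failed.
  T* get() {
    std::lock_guard<std::mutex> lock(mu_);
    if (!attempted_) {
      // Record the attempt before looking at the result, so that a nullptr
      // result is remembered as "failed" rather than "not yet constructed".
      attempted_ = true;
      obj_.reset(ctor_());
    }
    return obj_.get();
  }

 private:
  std::function<T*()> ctor_;
  std::unique_ptr<T> obj_;
  std::mutex mu_;
  bool attempted_ = false;
};
```

Testing `obj_ == nullptr` alone to decide whether to construct would re-run a failing constructor on every access, which is the behavior the message rules out.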
Tony bccb25fc33 Make HSA_QUEUE_TYPE_COOPERATIVE a queue type value
- Correct definition of HSA_QUEUE_TYPE_COOPERATIVE to be a queue type
  and not a bit mask.
- Correct implementation of hsa_queue_type_t to treat it as an
  enumeration type and not a bit mask. In particular
  HSA_QUEUE_TYPE_COOPERATIVE is a distinct queue type that uses the
  multi producer protocol, and is not a bit set value.

Change-Id: I9415be8853671e5511e16e306caf16020e8c84af
2020-05-07 19:24:19 -04:00
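The difference between treating a queue type as an enumeration and as a bit mask can be sketched as follows. The enum below is a local illustration with assumed names and values; it is not copied from hsa.h, and the value 3 stands for a hypothetical further queue type.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative stand-in only; consult hsa.h for the real hsa_queue_type_t.
enum QueueType : uint32_t {
  kQueueTypeMulti = 0,
  kQueueTypeSingle = 1,
  kQueueTypeCooperative = 2,
};

// Bit-mask style test: also matches any other value that has bit 1 set.
inline bool IsCooperativeBitTest(uint32_t type) {
  return (type & kQueueTypeCooperative) != 0;
}

// Enumeration style test: matches exactly one queue type.
inline bool IsCooperative(uint32_t type) {
  return type == kQueueTypeCooperative;
}

// Per the message above, the cooperative type uses the multi producer protocol.
inline bool UsesMultiProducerProtocol(uint32_t type) {
  return type == kQueueTypeMulti || type == kQueueTypeCooperative;
}
```

With these assumed values, a type of 3 passes the bit test but is correctly rejected by the equality test.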
Matt Arsenault 96d4140609 Do not use llvm-dis to pick the triple
There are a couple problems with this. First, llvm-dis is an unstable
llvm development tool and 3rd party users should generally not rely on
it. The text format is unstable, and the regex here isn't even
explicitly looking for the target triple field, so it could
accidentally find something else. Second, picking the target to
compile based on the library you are linking is a fundamentally
backwards decision. The target you're compiling for changes the
library you would want to link. The device libraries are only ever
compiled with amdgcn-amd-amdhsa. If we had a second triple, this
should be explicitly building for any it cares about.

Change-Id: I3bae8398f60f78df61ab2177aa9e83f47ec6dea4
2020-05-06 13:28:39 -04:00
Laurent Morichetti df03a377f5 Check all s_endpgm instructions
The ROCR trap handler should check for all end program instructions
and not halt on them. Mask off the imm16 before comparing the
instruction to the s_endpgm opcode.

Change-Id: I669ffc7f5b699d7daf0c8ec5761ed7bb193f07a7
2020-05-04 19:52:53 -04:00
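The masking step described in this message, written as host-side C++ for illustration rather than as trap handler assembly. The opcode value 0xBF810000 is the s_endpgm encoding quoted elsewhere in this log; the assumption that imm16 occupies the low 16 bits of the 32-bit instruction word, and the function name, are illustrative.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kSEndpgmOpcode = 0xBF810000u;  // encoding quoted in this log
constexpr uint32_t kImm16Mask = 0x0000FFFFu;      // assumed: imm16 is bits [15:0]

// True for every s_endpgm instruction, whatever its imm16 field holds.
constexpr bool IsSEndpgm(uint32_t insn) {
  return (insn & ~kImm16Mask) == kSEndpgmOpcode;
}
```

Comparing the whole word against 0xBF810000 would only recognize s_endpgm with a zero immediate.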
Sean Keely 3da81968cb Update addrlib with latest Mesa source.
Change-Id: Idd8cdaac9ad370397d62f6a32687ca7bc7d7462b
2020-05-01 20:33:09 -04:00
Sean Keely 1440da3e15 Remove dead code from image_manager_xx.cpp
Image swizzle mode will be set by the preferred surface info
function.

Change-Id: I41e639be53cafbb4db6cf15c159aa2bd457ec5be
2020-05-01 20:32:45 -04:00
Sean Keely 7e3db20826 Move Images code to hsa-runtime folder
Change-Id: I53c1845d985ac3e9708d952865009c0021f3bb4f
2020-04-30 19:35:57 -05:00
Ramesh Errabolu 1a3ee2fd03 Update Image code base to use addrlib from mesa
Change-Id: I31355d7fc3db423c16772cf105e9b6b59a3a6307
2020-04-30 19:35:56 -05:00
Laurent Morichetti 00da82f951 Add debugger support for wave halted at launch
New trap handler ABI: Record in ttmp11[8:7] the event that caused the
trap handler to be entered. We currently record 2 events, trap_raised
if an s_trap instruction was executed, or excp_raised if an exception
(MEM_VIOL or ILLEGAL_INST) was raised.

Change-Id: Ie278c8277437b3b67c2737dcd1a12fe6511df428
2020-04-29 19:29:56 -04:00
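A small sketch of how a consumer could read the two-bit field ttmp11[8:7] from a saved register value. The helper names are hypothetical, and the numeric encodings of trap_raised and excp_raised are not given in the message, so the sketch only extracts the raw field.

```cpp
#include <cassert>
#include <cstdint>

// Extracts bits [hi:lo] of value. Assumes the field is narrower than 32 bits.
constexpr uint32_t ExtractBits(uint32_t value, unsigned hi, unsigned lo) {
  return (value >> lo) & ((1u << (hi - lo + 1u)) - 1u);
}

// Raw event field recorded by the trap handler in ttmp11[8:7].
constexpr uint32_t TrapEventField(uint32_t ttmp11) {
  return ExtractBits(ttmp11, 8, 7);
}
```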
Kent Russell 33133ebd07 CMakeLists: Support static building of hsa-runtime
Remove the hard-coding of "SHARED" as the lib type, and move any
SO-specific linking to only happen if the .so exists in the first place

Change-Id: I3f0bfd5c03f19b2425423b4dc8eed8fd87acc1d6
2020-04-27 20:52:07 -04:00
Sean Keely 675f73cda9 Adapt to new LLVM location in repo build.
This will reenable incremental PSDB builds.

Change-Id: I2311c124b06b544202f7c1db31b6607f2580194e
2020-04-27 17:58:35 -04:00
Austin Kerbow 87202d4408 Update IsaRegistry for backend changes
Changes in the compiler are being made to add controls for XNACK and SRAM ECC
for all targets which can support these features. By default the conservatively
correct settings of XNACK on and SRAM ECC on will be used. This change is to
facilitate these backend updates.

Change-Id: I2fd6b6bc1d32937737e7f56d8e08c70fe781c745
2020-04-25 04:45:28 -04:00
Sean Keely 7712c7e743 Correct IPC fragment validation.
IPC create must only be used on whole ROCr allocations.
Fragments were allowing handle creation with offsets.

Change-Id: I1faa96d36bc7a6199bdc2e3ff1b8871d1a36a2fa
2020-04-24 00:08:53 -04:00
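One way to picture the whole-allocation rule, assuming a simplified record of an allocation. `Allocation` and `IsWholeAllocation` are hypothetical names, not ROCr types, and treating "whole" as "same base and same size" is an assumption drawn from the message.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical record of one runtime allocation.
struct Allocation {
  const void* base;
  std::size_t size;
};

// An IPC handle request is accepted only when it covers the allocation
// exactly: no offset into it and no partial length.
inline bool IsWholeAllocation(const Allocation& alloc, const void* ptr,
                              std::size_t len) {
  return ptr == alloc.base && len == alloc.size;
}
```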
Sean Keely 3fe891d5da Suppress Finalizer loading attempts.
This has been the default mode for a while now since we don't
distribute or build the finalizer.  Removing the attempt cleans
up debug mode messages that are causing confusion.

Change-Id: I8162c95abd5bbedaa22b90191f7a384a34c388ae
2020-04-18 00:06:42 -04:00
Sean Keely 9fe44ed675 Don't lock KFD allocated system memory.
Lock API succeeds but the GPU still faults on the address.
This should be fixed in Thunk and/or KFD as well.

Change-Id: I8b2fbcae61ab181e4fe7f0b64e43a5f0772efb24
2020-04-17 21:45:01 -04:00
Ramesh Errabolu 30f46e4e24 Stop building and packaging Tools library
Change-Id: Iee430c24e32ea7412f21564fe8970749e4954b91
2020-04-15 13:58:33 -04:00
Laurent Morichetti 5f783494f1 Return a file URI for elf images in shared objects
Iterate the loaded shared objects to see if the given elf image binary
is part of a loaded segment.

Change-Id: I074cacd99eb5b59f883f4ce2bd901e0e35a660b8
2020-04-14 15:22:43 -04:00
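Iterating the loaded shared objects to find the one whose loaded segment contains a given address can be sketched with glibc's `dl_iterate_phdr`. This is a Linux-specific illustration and not the loader's actual code; `Lookup`, `FindObject`, and `InLoadedSegment` are hypothetical names, and the construction of the file URI itself is left out.

```cpp
#include <link.h>

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical search state passed through dl_iterate_phdr.
struct Lookup {
  uintptr_t addr;
  bool found;
  std::string object_name;  // empty for the main executable on glibc
};

// Callback: stops the iteration (returns 1) at the first PT_LOAD segment
// that contains the address.
static int FindObject(dl_phdr_info* info, std::size_t, void* data) {
  auto* lookup = static_cast<Lookup*>(data);
  for (int i = 0; i < info->dlpi_phnum; ++i) {
    const ElfW(Phdr)& ph = info->dlpi_phdr[i];
    if (ph.p_type != PT_LOAD) continue;
    const uintptr_t start = info->dlpi_addr + ph.p_vaddr;
    if (lookup->addr >= start && lookup->addr < start + ph.p_memsz) {
      lookup->found = true;
      lookup->object_name = info->dlpi_name;
      return 1;
    }
  }
  return 0;
}

// Returns true if ptr lies inside a loaded segment of any loaded object.
inline bool InLoadedSegment(const void* ptr, std::string* object_name) {
  Lookup lookup{reinterpret_cast<uintptr_t>(ptr), false, std::string()};
  dl_iterate_phdr(FindObject, &lookup);
  if (lookup.found && object_name != nullptr) *object_name = lookup.object_name;
  return lookup.found;
}
```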
Nathan O 6d5781bb14 Fix hsa_amd_agents_allow_access documentation
- Update the documentation comment in hsa_ext_amd.h, which contained
   contradictory and incorrect information about an argument to the
   hsa_amd_agents_allow_access function.

Change-Id: I60b0dbbdc761078cd81906bc2c63a27d7e6b53e1
2020-04-10 18:26:13 -04:00
Ramesh Errabolu 89f7ef224c Extend Rocr Visible Devices functionality to include UUIDs
Change-Id: Ia2892e4033717556a422fe33dec0294fe2ca9e28
2020-04-09 00:42:53 -05:00
Ramesh Errabolu 45958c727d Extend ROCr to surface UUID of GPU devices that support
Change-Id: I478db68d69a01578770403fa695f9e6391637573
2020-04-08 19:19:22 -05:00
Pruthvi Madugundu 241cdfdd01 Updating the hsa include directory symlink creation
- Symlink creation is corrected only for deb packages
- It is a follow-up package of http://git.amd.com:8080/c/hsa/ec/hsa-runtime/+/334403
- configure_file() is called to update the scripts with proper cmake variable values

Signed-off-by: Pruthvi Madugundu <pruthvi.madugundu@amd.com>
Change-Id: I0e833ead265166411e83593fd57265a9ab356904
2020-04-01 02:17:11 -07:00
Sean Keely f3b532b42d Fix Debian package build on CPack 3.10.
CPack now incorrectly adds two copies of directory symlinks when
building Debian packages.  This causes dpkg to see a file conflict
and fail installing.

The correct long term solution is to remove the symlink and use a
flat directory structure.  This patch adds the symlink in the post
install script as a workaround until we can switch to flat layout.

Change-Id: I879b6cbc2661c19df3db639cb42fba0972fddb93
2020-03-20 21:35:32 -05:00
Sean Keely a1c2439213 Update asserts and comments for pointer info.
Checks for an IPC memory error and updates comments relevant
to rocr_visible_devices.

Change-Id: I9d2f2dd27f3fa04881d17387cce2692bc046edb2
2020-02-24 09:08:48 -05:00
Sean Keely 9c35780836 Report HDP registers at all times.
HDP will now be used for coarse grain kernarg so needs to be
reported without consideration of fine grain vram over pcie.

Change-Id: I648167299faa583876a3d8685c3b3c4d8d31ebf9
2020-02-24 09:08:17 -05:00
Ramesh Errabolu 627991b1c1 Update how code references publicly available ROCr headers
Change-Id: I357c51eb713a23704d4fee71081be46a73a71806
2020-02-21 20:01:11 -05:00
Sean Keely dc165c92bc Add env key HSA_NO_SCRATCH_THREAD_LIMITER.
Setting to 1 prevents the scratch handler from reducing peak occupancy.
Scratch allocations that would normally reduce peak occupancy will
instead fail.

Diagnostic for TF and PyTorch.

Change-Id: I2d7ea47077eb5cf708251c8aa3fd183ad4261be0
2020-02-21 17:09:26 -05:00
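A sketch of how a process could test such an environment key, assuming the simple convention that the value "1" enables it. `EnvFlagIsSet` is a hypothetical helper, and ROCr's own parsing of the key may differ.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// True only when the variable exists and is exactly "1" (assumed convention).
inline bool EnvFlagIsSet(const char* name) {
  const char* value = std::getenv(name);
  return value != nullptr && std::string(value) == "1";
}
```

For example, `EnvFlagIsSet("HSA_NO_SCRATCH_THREAD_LIMITER")` would be true in a process started with `HSA_NO_SCRATCH_THREAD_LIMITER=1` in its environment.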
Sean Keely 6c556002d8 Correct scratch retry logic.
scratch_used_large_ was uninitialized leading to the observed hang.
DynamicScratchHandler would wait for a large scratch release despite no
large scratch having yet been allocated.  Fixes .

The patch also removes a potential race between AddScratchNotifier and
ReleaseQueueScratch.  The race condition does not exist today since both
scratch alloc and release run on the same thread.  The changes will
prevent this potential race from manifesting if the async event handler
is ever updated to use multiple threads.

Also enhances scratch occupancy reduction reporting.  Reporting now
prints the initial request size as well as the allocated size and the
effect on occupancy this has.  Occupancy is computed in terms of the
requesting dispatch grid size so may be >100%.

Change-Id: I0fc5ee01467ff4c29bdd25d545177c97862c3bd9
2020-02-21 17:09:26 -05:00
Sean Keely d53fe07687 Insert zero sized pool on CPU agents without attached memory.
Ensures that all CPU agents will have a pool handle to allocate
system memory.  These pools will have no numa binding since the
node their owning Agent represents has no installed memory.

Change-Id: I9f72b455d633646839753c6719ff7f6a4c41f7c4
2020-02-21 17:05:10 -05:00
Saleel Kudchadker c57f3da1dc Reset link_map map in the constructor
Change-Id: I8a6ad3bc0fca790dec2992cacf9288068b3bcaa3
2020-02-19 15:29:35 -08:00
Pruthvi Madugundu e931fd424b Adding RUNPATH to find libhsakmt.so for Centos and SLES
- This new path is required when libhsaruntime.so is referred
from the top level ROCm lib directory.
- Once ROCm stack lib/lib64 structure is flattened, RUNPATH in all
the libraries needs to be updated.

Change-Id: I369131ce93e14958ec57a54701671f2bfd8d522a
Signed-off-by: Pruthvi Madugundu <pruthvi.madugundu@amd.com>
2020-02-14 17:43:29 -05:00
Sean Keely 3e9aca0f34 Support stripped binaries and remove unneeded attributes.
Attribute optimize(0) doesn't appear to be helpful.  This
prevents optimization in the function but not at call sites to the
function.  The function may still be inlined since it has no side
effect (in some cases that we currently don't support).

Having a side effect prevents a call site optimization that allows
removal of a noinline function call with no side effect.  Call site
optimization should only happen (in GCC at least) when using whole
program optimization so this may be stronger than we strictly need.

Also added _amdgpu_r_debug to the exported symbol list (global) and
switched to the standard macro for an exported symbol (HSA_API).
Without being in the global list the debugger will not find this
symbol if the binary has been stripped.

Change-Id: Ieb00175ccc55fda4491deee44711cd55b3f24aeb
2020-01-21 20:08:02 -05:00
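The interaction between noinline and side effects can be illustrated with the idiom described in the GCC documentation for the noinline attribute: an empty asm statement inside the function serves as a side effect, so calls to it are not optimized away. The function name is hypothetical and the attribute spelling is a GCC/Clang extension; this is not the ROCr source.

```cpp
#include <cassert>

// Hypothetical hook that a debugger could place a breakpoint on.
// noinline keeps the function out of line; the empty asm statement gives it
// a side effect so the compiler does not remove calls to it.
__attribute__((noinline)) void DebugBreakpointHook() {
  asm volatile("");
}
```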
Freddy Paul 4b2256ac21 ADD libhsakmt.so RUNPATH for SLES
For SLES libhsakmt.so is located in <ROCm install>/lib64

Change-Id: I038dd80b65b4a493ac37981649b02f1b35caea88
2020-01-16 22:20:14 -05:00
Laurent Morichetti 19e1fb3a4e Fix a build error when compiling with clang
Check __clang__ before __GNUC__ as clang defines both.

Change-Id: I9963f8e0665efb4cb08bd3886fb38fee42dd9861
2020-01-15 18:52:53 -08:00
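The ordering issue in this message shows up in any compiler detection chain: because clang defines `__GNUC__` as well as `__clang__`, the clang test has to come first. A minimal sketch, with a hypothetical macro and function name:

```cpp
#include <cassert>
#include <string>

// Test __clang__ first; testing __GNUC__ first would classify clang as GCC.
#if defined(__clang__)
#define EXAMPLE_COMPILER_NAME "clang"
#elif defined(__GNUC__)
#define EXAMPLE_COMPILER_NAME "gcc"
#else
#define EXAMPLE_COMPILER_NAME "unknown"
#endif

inline std::string CompilerName() { return EXAMPLE_COMPILER_NAME; }
```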
Srinivasan Subramanian 54d94d02bd Avoid shared library conflicts across multiple ROCM versions
Adding patch number based on ROCM build/release to have unique
file name for libraries across multiple versions of ROCM.

Signed-off-by: Pruthvi Madugundu <pruthvi.madugundu@amd.com>

Change-Id: I58d665b0e7d577b5bd7a6000d1202a0242672727
2020-01-15 01:08:23 -05:00
Qingchuan Shi d63886190f fix optimize(0) for clang.
Change-Id: I83bc57d42815f37445ae97bf6950147e3358ac45
2020-01-13 20:53:40 -05:00
Ramesh Errabolu be5782e198 Cpack rule encoding dependencies was broken
Change-Id: I2b8df7d255cdffd6b42713f0b59df2aeef83a607
2019-12-23 10:51:55 -06:00
Sean Keely 22a601292d Disable SDMA on gfx10.
Lack of cache controls only allows operating SDMA at
agent scope.  All copy APIs are defined at system scope so may
result in data errors.

Change-Id: I9cd10007defddcbf8feb14a2e3daa1ba17c0489f
2019-12-20 17:25:47 -06:00
Ramesh Errabolu d015d78de3 Support the building and packaging of legacy ROCr tools by itself
Change-Id: I2247bf7a46ee93495340f7b2603b09dc6b667443
2019-12-18 19:20:03 -06:00
Chris Freehill 4c22962024 Import PAL version of addrlib with initial gfx1xxx support
Change-Id: I439930a5cbf5b13a359ec164e75c6828af8c668d
2019-12-12 21:38:01 -06:00
Sean Keely 0a43a107b1 Initial GWS queue support.
Queues should transition to ref counting for all queues eventually.
That cleanup will be part of shared queue pooling support.

Change-Id: I217ff5d573156678b9559da6fb81baa8cd31c617
2019-12-09 21:21:17 -05:00
Ramesh Errabolu 144017e148 Add package dependencies for Images and Tools
Change-Id: Id77ba0e81d3b3e872153cdd7680338dd70319026
2019-12-06 16:38:21 -06:00
Sean Keely d2a50a0048 Allow disabling scratch reclaim.
Debug and RCCL NPI assist feature.

Change-Id: I2cb76f0a086fa341465df3ede26965ab713bc3b4
2019-11-20 02:41:58 -05:00
Sean Keely 35c1ffa863 Raise large scratch allocation limit for RCCL.
Temporary workaround for 2.10 release.  RCCL, compiler, or firmware
must be corrected and this code reverted before another ASIC release.

Change-Id: I27851353289b93df9acb72d28b8c6ccb9f7f7d7a
2019-11-20 02:41:27 -05:00
Laurent Morichetti 5774d9162b Fix a typo in INSN_S_ENDPGM_OPCODE encoding.
The correct s_endpgm instruction encoding is 0xBF810000.

Change-Id: I03f304762dcaced5bf3fa4f069da7a0b287d1cd2
2019-11-12 11:54:17 -08:00
Qingchuan Shi 16a20cfb8c Adding code object list in loader.
Change-Id: Iab3541287bd56276fd32615ee59fcd590de84ca0
2019-10-30 20:31:51 -04:00
Jay Cornwall 78e754935c Merge debugger trap handler into ROCr trap handler
Debugger path is taken for (trap_id >= 3) and single step exceptions.
Other traps/exceptions behave as before.

Change-Id: I276c0eb69953709968353a57717ee017d22348a2
2019-10-30 13:56:06 -04:00
Sean Keely 851ee799c4 Correct strip command.
Strip should only apply to the output target library.  Symlinks
with .so endings which will be relocated during install will cause
strip to fail, aborting the build.

Change-Id: Ieb598c2cec5277d9d14c8afa88b91ca2c7f4412d
2019-10-30 01:24:43 -05:00
Sean Keely 6c3acda664 Adapt to new versioning.
Using branch point for count since last change since we don't
have questions answered on tags yet.

Removed unused CMake files.
Restructured CMake to use the cache rather than only commandline
and be ccmake & cmake-gui friendly.
Dependency search paths are added for the Repo tree layout.

Search paths still needed for install paths.

Simplified packaging.  hsa-ext-rocr-dev package and contents now
build from the package CMake rather than being 3 separate projects.

Not applying new version number or new install paths!

Change-Id: Ibea50dc8a6ab091e91857f78833f5379a4511547
2019-10-29 04:21:17 -05:00
Jay Cornwall 906cd84186 Disable SDMA HDP flush on gfx10
Not currently functional, triggering SRBM write protection.

Change-Id: Ib0b832357e3df5a6a0d0b46648515ec9bd70f017
2019-09-14 14:08:47 -04:00
Jay Cornwall e0358d7dc2 Set MTYPE field in SDMA fence command on gfx10
This is the only SDMA command with an MTYPE field.

Change-Id: Ice146ace9c3e8e7aff038e1e004be73c070f48fe
2019-09-14 14:07:57 -04:00