- Symlink creation is corrected only for deb packages
- It is a follow-up to http://git.amd.com:8080/c/hsa/ec/hsa-runtime/+/334403
- configure_file() is called to update the scripts with proper cmake variable values
Signed-off-by: Pruthvi Madugundu <pruthvi.madugundu@amd.com>
Change-Id: I0e833ead265166411e83593fd57265a9ab356904
[ROCm/ROCR-Runtime commit: 241cdfdd01]
CPack now incorrectly adds two copies of directory symlinks when
building Debian packages. This causes dpkg to see a file conflict
and fail during installation.
The correct long term solution is to remove the symlink and use a
flat directory structure. This patch adds the symlink in the post
install script as a workaround until we can switch to flat layout.
Change-Id: I879b6cbc2661c19df3db639cb42fba0972fddb93
[ROCm/ROCR-Runtime commit: f3b532b42d]
Checks for an IPC memory error and updates comments relevant
to rocr_visible_devices.
Change-Id: I9d2f2dd27f3fa04881d17387cce2692bc046edb2
[ROCm/ROCR-Runtime commit: a1c2439213]
HDP will now be used for coarse grain kernarg so needs to be
reported without consideration of fine grain vram over pcie.
Change-Id: I648167299faa583876a3d8685c3b3c4d8d31ebf9
[ROCm/ROCR-Runtime commit: 9c35780836]
Setting to 1 prevents the scratch handler from reducing peak occupancy.
Scratch allocations that would normally reduce peak occupancy will
instead fail.
Diagnostic for TF and PyTorch.
Change-Id: I2d7ea47077eb5cf708251c8aa3fd183ad4261be0
[ROCm/ROCR-Runtime commit: dc165c92bc]
scratch_used_large_ was uninitialized leading to the observed hang.
DynamicScratchHandler would wait for a large scratch release despite no
large scratch having yet been allocated. Fixes .
The patch also removes a potential race between AddScratchNotifier and
ReleaseQueueScratch. The race condition does not exist today since both
scratch alloc and release run on the same thread. The changes will
prevent this potential race from manifesting if the async event handler
is ever updated to use multiple threads.
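A minimal sketch of the initialization fix described above, assuming a
simplified tracker class. Only the member name scratch_used_large_ comes from
this message; ScratchTracker and its methods are hypothetical and do not
reflect the actual runtime code.

```cpp
#include <cstddef>

// Hypothetical stand-in for the scratch bookkeeping. Only the member name
// scratch_used_large_ is taken from the commit message.
class ScratchTracker {
 public:
  void AllocLarge(std::size_t bytes) { scratch_used_large_ += bytes; }
  void ReleaseLarge(std::size_t bytes) { scratch_used_large_ -= bytes; }

  // A waiter should block only while large scratch is actually outstanding.
  // With an uninitialized counter this could report true before any
  // allocation had been made, which matches the hang described above.
  bool MustWaitForLargeRelease() const { return scratch_used_large_ != 0; }

 private:
  // Default member initializer: the counter starts at zero in every
  // constructor, so no code path can observe an indeterminate value.
  std::size_t scratch_used_large_ = 0;
};
```

The sketch is single threaded, like the current handler. If the async event
handler ever used multiple threads, the counter would also need a lock or an
atomic type.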
Also enhances scratch occupancy reduction reporting. Reporting now
prints the initial request size as well as the allocated size and the
effect on occupancy this has. Occupancy is computed in terms of the
requesting dispatch grid size so may be >100%.
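To illustrate why the reported occupancy can exceed 100%, here is a rough
sketch. The formula, the function name, and both parameters are assumptions
made for illustration; the runtime's actual computation may differ.

```cpp
#include <cstdint>

// Hypothetical helper, not the runtime's code.
//   waves_backed:    wavefronts the allocated scratch can back (assumed input)
//   waves_requested: wavefronts in the requesting dispatch grid (assumed input)
// The ratio is taken against the dispatch, not against the device limit, so
// a small dispatch that receives a large allocation reports more than 100.
constexpr std::uint64_t OccupancyPercent(std::uint64_t waves_backed,
                                         std::uint64_t waves_requested) {
  return waves_requested == 0 ? 0 : (100 * waves_backed) / waves_requested;
}
```

Under these assumptions, an allocation that backs 400 waves for a 100-wave
dispatch reports 400, while one that backs 50 waves reports 50.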
Change-Id: I0fc5ee01467ff4c29bdd25d545177c97862c3bd9
[ROCm/ROCR-Runtime commit: 6c556002d8]
Ensures that all CPU agents will have a pool handle to allocate
system memory. These pools will have no NUMA binding since the
node their owning Agent represents has no installed memory.
Change-Id: I9f72b455d633646839753c6719ff7f6a4c41f7c4
[ROCm/ROCR-Runtime commit: d53fe07687]
- This new path is required when libhsaruntime.so is referred
from the top level ROCm lib directory.
- Once the ROCm stack lib/lib64 structure is flattened, RUNPATH in all
the libraries needs to be updated.
Change-Id: I369131ce93e14958ec57a54701671f2bfd8d522a
Signed-off-by: Pruthvi Madugundu <pruthvi.madugundu@amd.com>
[ROCm/ROCR-Runtime commit: e931fd424b]
Attribute optimize(0) doesn't appear to be helpful. This
prevents optimization in the function but not at call sites to the
function. The function may still be inlined since it has no side
effect (in some cases that we currently don't support).
Having a side effect prevents a call site optimization that allows
removal of a noinline function call with no side effect. Call site
optimization should only happen (in GCC at least) when using whole
program optimization so this may be stronger than we strictly need.
Also added _amdgpu_r_debug to the exported symbol list (global) and
switched to the standard macro for an exported symbol (HSA_API).
Without being in the global list the debugger will not find this
symbol if the binary has been stripped.
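A minimal sketch of the general technique, assuming GCC or Clang: a noinline
function whose body is an empty volatile asm statement, which the compiler
must treat as a side effect. The macro and function names are hypothetical
and are not the names used in the runtime.

```cpp
// Hypothetical names throughout; this only illustrates the technique.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_NOINLINE __attribute__((noinline))
// Empty volatile asm with a memory clobber. It emits no instructions, but
// the compiler has to assume it has an effect, so calls to the enclosing
// function cannot be removed as dead code.
#define SKETCH_SIDE_EFFECT() __asm__ __volatile__("" ::: "memory")
#else
// Other compilers: the sketch degrades to a plain function.
#define SKETCH_NOINLINE
#define SKETCH_SIDE_EFFECT() ((void)0)
#endif

// A debugger could place a breakpoint on this function. noinline keeps a
// real call at each call site; the side effect keeps the call from being
// dropped by call-site optimization.
SKETCH_NOINLINE void sketch_debug_hook() { SKETCH_SIDE_EFFECT(); }
```

The visibility of the symbol is a separate concern. It is controlled by the
export list and the HSA_API macro mentioned above, neither of which is
modeled here.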
Change-Id: Ieb00175ccc55fda4491deee44711cd55b3f24aeb
[ROCm/ROCR-Runtime commit: 3e9aca0f34]
Adding patch number based on ROCM build/release to have unique
file name for libraries across multiple versions of ROCM.
Signed-off-by: Pruthvi Madugundu <pruthvi.madugundu@amd.com>
Change-Id: I58d665b0e7d577b5bd7a6000d1202a0242672727
[ROCm/ROCR-Runtime commit: 54d94d02bd]
Lack of cache controls allows SDMA to operate only at agent
scope. All copy APIs are defined at system scope, so this may
result in data errors.
Change-Id: I9cd10007defddcbf8feb14a2e3daa1ba17c0489f
[ROCm/ROCR-Runtime commit: 22a601292d]
Queues should transition to ref counting for all queues eventually.
That cleanup will be part of shared queue pooling support.
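For reference, a minimal sketch of the kind of reference counting this
refers to. The class and method names are hypothetical; the eventual design
for shared queue pooling may look different.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical queue wrapper that starts with one reference held by its
// creator.
class RefCountedQueue {
 public:
  // Take an additional reference, for example when the queue is shared.
  void Retain() { refs_.fetch_add(1, std::memory_order_relaxed); }

  // Drop one reference. Returns true when the caller released the last
  // reference and is therefore responsible for destroying the queue.
  bool Release() {
    return refs_.fetch_sub(1, std::memory_order_acq_rel) == 1;
  }

  std::uint32_t RefCount() const {
    return refs_.load(std::memory_order_acquire);
  }

 private:
  std::atomic<std::uint32_t> refs_{1};
};
```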
Change-Id: I217ff5d573156678b9559da6fb81baa8cd31c617
[ROCm/ROCR-Runtime commit: 0a43a107b1]
Temporary workaround for 2.10 release. RCCL, compiler, or firmware
must be corrected and this code reverted before another ASIC release.
Change-Id: I27851353289b93df9acb72d28b8c6ccb9f7f7d7a
[ROCm/ROCR-Runtime commit: 35c1ffa863]
We should be using bin/clang, not build/lightning/bin/clang, since
build/ is the project's internal build directory. This patch corrects
this where possible.
However, lightning does not install all its end user files under out/
so some headers cannot be found anywhere in out/ in an incremental build.
This header (opencl-c.h) is fetched out of the lightning source tree if
necessary.
Change-Id: I083d8b27bb39dd615fba3bb0711a789318f95e77
[ROCm/ROCR-Runtime commit: f62017f4a5]
Debugger path is taken for (trap_id >= 3) and single step exceptions.
Other traps/exceptions behave as before.
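The routing rule above can be sketched as follows. The enum, the constant,
and the function name are hypothetical; only the threshold of 3 and the
single step case come from this message.

```cpp
#include <cstdint>

// Hypothetical names; the real handler is not structured like this.
enum class TrapRoute { kDebugger, kLegacy };

// Threshold stated in the commit message: trap IDs 3 and above go to the
// debugger path.
constexpr std::uint32_t kFirstDebuggerTrapId = 3;

// Single step exceptions take the debugger path regardless of trap ID.
// Everything else keeps the previous behavior.
constexpr TrapRoute RouteTrap(std::uint32_t trap_id, bool single_step) {
  return (trap_id >= kFirstDebuggerTrapId || single_step)
             ? TrapRoute::kDebugger
             : TrapRoute::kLegacy;
}
```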
Change-Id: I276c0eb69953709968353a57717ee017d22348a2
[ROCm/ROCR-Runtime commit: 78e754935c]
Strip should only apply to the output target library. Symlinks
with .so endings which will be relocated during install will cause
strip to fail, aborting the build.
Change-Id: Ieb598c2cec5277d9d14c8afa88b91ca2c7f4412d
[ROCm/ROCR-Runtime commit: 851ee799c4]
Using the branch point for the count since last change, because we
don't have questions answered on tags yet.
Removed unused CMake files.
Restructured CMake to use the cache rather than only the command line
and to be ccmake & cmake-gui friendly.
Dependency search paths are added for the Repo tree layout.
Search paths still needed for install paths.
Simplified packaging. hsa-ext-rocr-dev package and contents now
build from the package CMake rather than being 3 separate projects.
Not applying new version number or new install paths!
Change-Id: Ibea50dc8a6ab091e91857f78833f5379a4511547
[ROCm/ROCR-Runtime commit: 6c3acda664]
- Use new buffer resource descriptor layout
- Handle wave32 scratch allocation error from CP
- Make wavefront size a property of scratch allocation requests
- Repurpose wave64-specific amd_queue_t.scratch_workitem_byte_size field
- Clear index_stride field in V# on gfx10, calculated per-dispatch by CP
Change-Id: If2acdf6430772abd4d6a8c792fc8c11260764dda
[ROCm/ROCR-Runtime commit: f8d0ccd159]
This is mostly unedited from Perforce. We will make other required
edits in future commits.
Change-Id: I55a809f2f23f03d60e4dcd1fb947ad558e737027
[ROCm/ROCR-Runtime commit: 08841faf4c]
doorbell_queue_map should always be allocated or we will need to
add branches around all accesses.
Change-Id: I994c0eaf4be62c1a4a37bd06894272dba1fc1da6
[ROCm/ROCR-Runtime commit: f9d3796db8]