Commit graph

732 commits

Author SHA1 Message Date
David Yat Sin cc48dfdbff Use mwaitx when busy-waiting signals
Use mwaitx instructions when busy waiting for signals to reduce CPU
energy usage.
This can be disabled by setting HSA_ENABLE_MWAITX=0

Change-Id: Ic207895a491b2bf6dacba47ef0921df3faad5b5a
2023-02-22 16:55:43 +00:00
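The opt-out described in this commit can be modeled as below. This is a minimal sketch, not ROCr code: the function name `use_mwaitx` and the `cpu_supports_mwaitx` parameter are hypothetical, and treating an unset variable as "enabled" is an assumption drawn from the wording "can be disabled by setting HSA_ENABLE_MWAITX=0".

```python
import os


def use_mwaitx(cpu_supports_mwaitx, environ=None):
    """Decide whether a busy-wait loop should use mwaitx (hypothetical helper)."""
    env = os.environ if environ is None else environ
    # Assumption: only the literal value "0" disables the feature;
    # an unset variable leaves it enabled.
    enabled_by_env = env.get("HSA_ENABLE_MWAITX", "1").strip() != "0"
    # The instruction is only usable when the CPU reports support for it.
    return cpu_supports_mwaitx and enabled_by_env
```

With this model, `HSA_ENABLE_MWAITX=0` turns the feature off even on a CPU that supports the instruction, and a CPU without support never uses it regardless of the variable.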
David Yat Sin 0ed1568afc Add function to parse CPUID information
Used to detect whether mwaitx instruction is supported

Change-Id: I66fe906325aa523c8815133cf782df3a17a7edab
2023-02-22 16:55:42 +00:00
Ranjith Ramakrishnan 3636d487c9 File reorg backward compatibility message changed to #error
Change-Id: I699dee834865ee573a516d58b8b8faa1da4f288a
2023-02-14 21:46:43 -08:00
Jonathan Kim 30920fc94d Add interface to DMA copy directly to a target engine.
Change-Id: Ic87cfeabb11c1a465f98f3f444d39955f5300525
2023-02-13 13:50:49 -05:00
Jonathan Kim 8f27f495c6 Make SDMA engine availability status queryable.
Report the availability of SDMA engines for memory copies.

Change-Id: Ie31b02d6b65355122bb8c98bc73700a59bee166e
2023-02-13 13:50:49 -05:00
Jonathan Kim 4f283d9bb3 Make the number of per agent SDMA engines queryable.
Change-Id: Iae1cc9b7ec783fdda05f9384f0ad0327ea1a8cc3
2023-02-13 13:50:49 -05:00
Cordell Bloor 5873a78d58 Fix static initialization order
Change-Id: I1d51e150b526d050b988fe5a422644667a561cd7
2023-02-09 13:51:08 -05:00
David Yat Sin 59685f4492 Add flag for external memory allocations
ROCr internally uses the same allocation_map_ list to track both its own
internal allocations and allocations made by users of the ROCr library. In
some edge cases, a library user would call hsa_amd_pointer_info on an
invalid pointer, but ROCr would report the pointer as valid because it
belongs to a memory range that was allocated internally within ROCr. Add a
flag to differentiate between internal and external allocations.

Change-Id: I98c52bd85f3985d1ba1b0e3101d2254b003412cf
2023-02-09 13:21:43 -05:00
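The idea behind the flag can be sketched with a small model. Everything here is illustrative: `Allocation`, `AllocationMap`, and the `external` field are hypothetical names, and the real allocation_map_ in ROCr is a C++ structure whose layout is not shown in this log.

```python
from dataclasses import dataclass


@dataclass
class Allocation:
    base: int
    size: int
    external: bool  # True when a library user requested the allocation


class AllocationMap:
    """One shared list tracking internal and user allocations (toy model)."""

    def __init__(self):
        self._allocations = []

    def add(self, base, size, external):
        self._allocations.append(Allocation(base, size, external))

    def pointer_info(self, ptr):
        """Return the owning allocation, or None for pointers a user must not see."""
        for alloc in self._allocations:
            if alloc.base <= ptr < alloc.base + alloc.size:
                # Internal ranges are tracked, but reported as invalid to users.
                return alloc if alloc.external else None
        return None
```

Without the flag, a lookup that lands inside an internal range would return that allocation and the pointer would look valid to the caller.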
Sean Keely 27596aef0c Track size of pending operations in blits.
Track and report the size, in bytes, of pending unexecuted blit
commands.  To be used in copy ganging.

Change-Id: Ia7453ff88571e927df771c6c819b73c17e67708e
2023-02-06 12:38:40 -05:00
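A rough model of the bookkeeping this commit describes is shown below. The class `BlitQueue` and its methods are hypothetical, and how copy ganging consumes the number is not stated in the log, so the sketch covers only the tracking and reporting part.

```python
from collections import deque


class BlitQueue:
    """Toy blit queue that tracks the bytes of submitted, unexecuted copies."""

    def __init__(self):
        self._pending = deque()  # sizes in bytes, oldest first
        self._pending_bytes = 0

    def submit(self, size_bytes):
        self._pending.append(size_bytes)
        self._pending_bytes += size_bytes

    def complete_oldest(self):
        """Mark the oldest command as executed and return its size (0 if idle)."""
        if not self._pending:
            return 0
        size = self._pending.popleft()
        self._pending_bytes -= size
        return size

    def pending_bytes(self):
        return self._pending_bytes
```

Keeping a running total avoids walking the command list on every query, which matters if the value is read on each copy submission.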
Konstantin Zhuravlyov f115a3505c Compile image blit kernels with code object v4
Change-Id: I4b1923fe8f22dda1277409794d0856419228eceb
2023-02-02 17:33:15 -05:00
Shweta Khatri 8aac885318 Fixes hang due to change in order of initialization of libraries
Fixes hang due to change in order of initialization of libraries
that have cyclical dependencies and they call hsa_init() during their
initialization phase.
This implementation looks for a symbol called "HSA_AMD_TOOL_PRIORITY"
across all loaded shared libraries using dynamic section entries of the
loaded lib instead of using dlopen and dlsym for the same purpose.

Change-Id: I4865f2fd18dd186ec311a432ec38fbb5583805d2
2023-01-26 01:17:22 -05:00
David Yat Sin e30be76f37 Add query for IOMMU support
Reporting whether IOMMU V2 is supported.
IOMMU V1 support is not relevant to user, so not reporting it.

Change-Id: I77389484a87a352da9c2f7b2a5d9de264f90ee53
2023-01-19 11:33:21 -05:00
David Yat Sin 722794e258 Add memory pool query to return location
Change-Id: I240b77119d7b8ccfc5ff6a3190d6669d69f243e8
2023-01-19 08:45:05 -05:00
David Yat Sin a4f898ad15 Add env variable to print image SRD contents
Add environment variable HSA_IMAGE_PRINT_SRD to print contents of SRD
registers for image functions

Change-Id: Ifb47a73dcfad8745ee7445e20de96e1021b80bd6
2023-01-13 11:01:04 -05:00
Alexander Turek f7e3782b42 isa: Add fix for hsa_isa_iterate_wavefronts always returns 64
Currently, Wavefront::GetInfo(HSA_WAVEFRONT_INFO_SIZE.. always returns
64. Instead, return the proper wavefront size based on the ISA.

For now, we return only one wavefront size per ISA, because the upper
layers provide no mechanism to determine the correct wavefront size when
several are supported. We temporarily return 32 for all gfx1xxx cards,
even though they also support 64, because kernels for gfx1xxx are compiled
for wavefront-32 by default.

Change-Id: Ic6c2917b7e6d3704daf742d243f5ec7f49430de9
2023-01-12 08:40:07 -05:00
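The selection rule in this commit message can be written down as a short sketch. The function name `reported_wavefront_size` is hypothetical, matching "gfx1xxx" as "gfx1 followed by three more characters" is an assumption about what the message means, and returning 64 for every other target reflects the older GCN-style targets rather than anything stated in the log.

```python
import re

# Assumption: "gfx1xxx" means a four-character target number starting with 1,
# e.g. gfx1030 or gfx1100.
_GFX1XXX = re.compile(r"gfx1[0-9a-f]{3}")


def reported_wavefront_size(processor):
    """Return the single wavefront size reported for a target (toy model)."""
    if _GFX1XXX.fullmatch(processor):
        # These targets also support 64, but kernels default to wavefront-32.
        return 32
    return 64
```

For example, `reported_wavefront_size("gfx1100")` yields 32 while `reported_wavefront_size("gfx90a")` yields 64.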
Shweta Khatri ed0a1be2c3 Enforce uncached memory on AllocatePCIeRW request
Change-Id: Ib5a624ab979220d50205448ef37b4550672fb97d
2023-01-11 16:52:15 -05:00
Ranjith Ramakrishnan dbf8905dd1 Revert "Remove RPATH/RUNPATH from ROCm libraries"
This reverts commit ac66865385.

Reason for revert:  is blocked due to new proposal, so reverting the changes.

Change-Id: Id9b8cc1560ba3eea6e484e67df3fdc647da9f37d
2023-01-10 13:52:02 -05:00
Shweta Khatri e72329ab76 Fixed GFX11 Texture, Buffer and Sampler Resource Descriptor definitions
Change-Id: I101806f9f91ec2ad78339dabc98375bd09946dd0
2023-01-05 15:40:47 -05:00
Ranjith Ramakrishnan 5c90c762f9 Corrected libelf package name in depends list
The libelf1 package contains libelf.so.1, so the package name was updated.
Improvement: removed the initialization of cmake_install_libdir in the source code.
The build scripts initialize the variable to "lib" and pass it as a build argument.

Change-Id: I16a8cdc4c231487410c1114b818e9d01df4854de
2022-12-15 23:30:22 -08:00
David Yat Sin 6bfe57aeb2 Add Stream Performance Monitor(SPM) APIs
Change-Id: I0d48782887814ef245b7e0182e2d5570aa8c3f50
2022-12-08 13:56:29 -05:00
David Yat Sin ecdebef0b9 Add agent info for fw and sdma ucode
Add two new agent info fields:
HSA_AMD_AGENT_INFO_UCODE_VERSION
HSA_AMD_AGENT_INFO_SDMA_UCODE_VERSION

Change-Id: I51cb853724b23a26e945e5c1ac32c16d0cb3bc31
2022-12-07 19:07:31 -05:00
raghavmedicherla 5727a10a1b [hsa-runtime] Modify elfsection checks in amd_elf_image class
Modified the if-condition checks in GElfImage::pullElf() of amd_elf_image.cpp to
check section types instead of using a string check.

Change-Id: I1ab92f0a9118fb2382652a1cc900a3150cbee2da
2022-12-05 14:42:02 -05:00
David Yat Sin e39ad34d9c Check for debug support after parsing topology
Thunk keeps an internal cache of system topology that can be used to
speed up subsequent calls to hsaKmtAcquireSystemProperties(). This cache
is cleared by calling hsaKmtReleaseSystemProperties() at the beginning
of BuildTopology().
hsaKmtRuntimeEnable() also calls hsaKmtAcquireSystemProperties() inside
Thunk. Move call to hsaKmtRuntimeEnable() after BuildTopology() so that
we can re-use Thunk's internal cache.
Parsing of topology can take ~150 ms on systems with a large number of
nodes.

Change-Id: I741709d49d67d244f5fbd707fe8f01ab923bb153
2022-12-02 11:26:00 -05:00
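The effect of the reordering can be illustrated with a toy cache. `FakeThunk` and the two `init_*` functions are hypothetical stand-ins for the calls named in the commit message; the only behavior modeled is that acquiring properties parses the topology on a cache miss and that releasing clears the cache.

```python
class FakeThunk:
    """Toy stand-in for Thunk's topology cache (not the real library)."""

    def __init__(self):
        self._cache = None
        self.parse_count = 0  # how many times the expensive parse ran

    def acquire_system_properties(self):
        if self._cache is None:
            self.parse_count += 1  # expensive topology parse
            self._cache = {"parsed": True}
        return self._cache

    def release_system_properties(self):
        self._cache = None

    def runtime_enable(self):
        # Per the commit message, this also acquires system properties.
        self.acquire_system_properties()


def build_topology(thunk):
    thunk.release_system_properties()  # clears the cache first
    thunk.acquire_system_properties()


def init_old_order(thunk):
    thunk.runtime_enable()
    build_topology(thunk)


def init_new_order(thunk):
    build_topology(thunk)
    thunk.runtime_enable()  # served from the cache built above
```

In this model the old order parses the topology twice, because the first result is thrown away when BuildTopology() releases the cache, while the new order parses it once.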
Shweta Khatri 8751e65b79 Fixed callback method for dl_iterate_phdr api which is called for each loaded shared object
Simplified the callback method. Also fixed the way loaded shared objects were appended to a string vector,
which was not being passed to this callback method.

Change-Id: I68661dd73f61a11c42fa92f670e8e7b6ffcb5711
2022-11-21 19:00:34 -05:00
Ranjith Ramakrishnan a34804ed3e Change pragma message to warning
The file reorganization feature was implemented with backward compatibility.
The backward compatibility support will be deprecated in a future release.
Changed the #pragma message to #warning for a smooth transition.

Change-Id: Ibaedc1873bc764d25f74d9ca9416077d084e332d
2022-11-17 09:38:24 -08:00
David Yat Sin b9d1ad8604 Revert "Correct limit query return type to match spec ABI."
This reverts commit 7826d4ca2d.

Changing the parameter sizes breaks backward ABI.

Change-Id: Iff14b7c11294f0931f36fcfd42fff11a492d4205
2022-11-14 19:13:58 -05:00
David Yat Sin cb71e2d715 Allow page-aligned len for ipc_memory_create
Previous versions of HIP call hsa_amd_ipc_memory_create with the len
aligned to granularity. Temporarily allow this so that we do not break
backward compatibility. This will be removed after 2 releases.

Change-Id: I6b5ac2cad5d32d62c803637cf1a2c6deebc03169
2022-11-09 15:01:47 +00:00
David Yat Sin c1e836b6ab Use paged memory for queues on MEC devices
MES devices need GART mappings and therefore need non-paged memory. But
using non-paged memory introduces performance regression where it can
take over 80 ms to see the signal changes if the memory is in the wrong
NUMA node. Currently, we cannot control NUMA affinity when allocating
non-paged memory. Use non-paged memory allocation only on devices that
have the MES scheduler.

Change-Id: Ib27fb01d75247aa4f2bb2aa4503c6af5a98afda0
2022-11-04 13:23:21 +00:00
David Yat Sin 0e4c7336ff Use os::createThread to launch SVM profiler thread
Using previous method of std::thread for SVM profiler task was causing
segfaults on thread launch on RHEL 8 if libhsa-runtime library is loaded
using dlopen.

Change-Id: Ic010cd6ae9bc6e6ed0605de02b93f6aae8ed3e97
2022-11-03 10:52:11 -04:00
Jonathan Kim f9edf73cd7 Fix doorbell offset fetch for GFX11
Transient exec usage is not required for GFX11 and will result in a NULL
return of s_sendmsg_rtn if directly returned to exec_lo.

Directly fetch and mask the doorbell ID to ttmp3 for GFX11 instead.

Change-Id: Ie17ed69d68d84ab18869b1c7871a0ed0482cd661
2022-11-02 11:55:37 -04:00
Ranjith Ramakrishnan 76cf5d2edc Add libelf-dev to package depends list
On Ubuntu, the package depends list was not showing libelf. Added it.

Change-Id: I713951bd7181f44d667561aaf437f85c6cd783b0
2022-10-31 13:07:55 -07:00
David Yat Sin b4f26534eb No-Op for allow access on imported IPC
If hsa_amd_agents_allow_access is called for an imported IPC handle,
ignore the request, as this pointer will already have been mapped to
other GPUs during IPCAttach().

Change-Id: I4bf33ed57e93b5a3ead749d4f87ab6f2750bed58
2022-10-25 22:38:47 +00:00
David Yat Sin 18547173e9 Early return for invalid pointer queries
If a user queries the pointer info on an invalid pointer,
hsaKmtQueryPointerInfo will return error or unknown pointer. The other
fields in HsaPointerInfo are invalid, so we do not return them to the
user.
Also remove the assert and return unknown pointer instead, as the
assert will not trigger in release builds.
hsaKmtQueryPointerInfo may also return unknown pointer for userptrs as
they are not always tracked by thunk. Adjusting code to still treat
these pointers as valid in this case.

Change-Id: Idf5cd8b61cd532d31b072f449839d223369bb138
2022-10-21 15:28:48 -04:00
Freddy Paul ac66865385 Remove RPATH/RUNPATH from ROCm libraries
Since all public interface libraries are present in the same folder,
RUNPATH/RPATH is not required in the library itself. The application
shall provide the required RPATH/RUNPATH to load all libraries.

Change-Id: I1d1ba920bf291eb89bd1f4c0fd0cfd80c7d739bd
2022-10-21 11:05:06 -04:00
David Belanger a0d3db6e8d Initial changes for gfx1101, based on gfx1100/gfx1102 implementation.
Change-Id: I949c1027ccabf38b4f924590e42e7327dc550f73
Signed-off-by: David Belanger <david.belanger@amd.com>
Reviewed-by: Jonathan Kim <jonathan.kim@amd.com>
2022-10-13 09:28:39 -04:00
David Yat Sin 39632a713e Use user requested size for memory fragments
Amount of memory requested by user may be aligned-up internally to
the memory pool granularity. The extra padded memory should not be
considered when validating pointers from the user. Also return the
user requested size when user queries pointer information.

Change-Id: I28b25448ea03c836b44fafdb34b7330cf6887424
2022-10-07 21:32:49 +00:00
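The distinction between the requested and the padded size can be sketched as follows. The names `Fragment`, `allocate`, `contains`, and `reported_size` are hypothetical, and the granularity in the usage note is an arbitrary example value rather than a figure from ROCr.

```python
from dataclasses import dataclass


def align_up(size, granularity):
    """Round size up to the next multiple of granularity."""
    return (size + granularity - 1) // granularity * granularity


@dataclass
class Fragment:
    base: int
    requested_size: int  # what the user asked for
    allocated_size: int  # requested size padded to the pool granularity


def allocate(base, requested_size, granularity):
    return Fragment(base, requested_size, align_up(requested_size, granularity))


def contains(fragment, ptr):
    # Validate against the requested size: the padding is not the user's memory.
    return fragment.base <= ptr < fragment.base + fragment.requested_size


def reported_size(fragment):
    # Pointer queries return the size the user requested, not the padded size.
    return fragment.requested_size
```

A 100-byte request with a granularity of 4096 occupies 4096 bytes internally, but only offsets 0 through 99 validate as belonging to the user and the reported size stays 100.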
David Yat Sin 9cb10a3dd8 Fix compile warnings and remove unused variables
Change-Id: I7acaee5e9cf218b358ffaf0e3af6067faf6f3d2a
2022-10-06 10:11:17 -04:00
Sean Keely 7826d4ca2d Correct limit query return type to match spec ABI.
Change-Id: I2eeed1f4b79d10c7d9ab0fd36c0146063053c76a
2022-10-04 01:48:26 +00:00
Jeremy Newton 1621936e32 Implement RPM Recommends for libdrm
What we want for libdrm-amdgpu is for it to be a recommended package.
Either libdrm or libdrm-amdgpu can be used, but we recommend the latter.

Using "SUGGESTS" does not seem like a strong enough requirement, but
CPACK does not support RPM recommends. It does, however, allow
customizing the RPM SPEC file template. A template can be generated by
setting:

-DCPACK_RPM_GENERATE_USER_BINARY_SPECFILE_TEMPLATE=1

This template file can then be trivially modified to add a line that
implements CPACK_RPM_PACKAGE_RECOMMENDS.

Fixes 

Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
Change-Id: I34467b1ba878827ced9b8db74977967815732552
2022-10-03 12:42:51 -04:00
David Yat Sin dd255d31b8 Fix uninitialized variable warning
Fix warning when using valgrind

Change-Id: Ie59eaa990b9b5d339a178a2c6f9f4fac0e34e925
2022-09-08 09:10:00 -04:00
Lang Yu d0e7c617df Query agent family id from roct
Add agent info query HSA_AMD_AGENT_INFO_ASIC_FAMILY_ID.
This allows us to remove the code that parses the family ID.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Change-Id: I3ac4746d3015e89b32322ebc0f8a3084f98677a4
2022-08-25 10:15:43 -04:00
David Yat Sin 0647960019 Revert "Change search path to use RPATH"
This reverts commit c904cc5856.

The change from using RUNPATH to RPATH was not approved formally.
Reverting this patch until this gets approved.

Change-Id: Ibc1a8f9d5dfa6694adacccfd9e3b0d053660e848
2022-08-23 07:28:14 -04:00
Jonathan Kim 2b75a73ce7 Report no cooperative launch support with CU masking
The allocation logic of the SPI does not take into account compute
user thread management settings for masking CUs with the exception of
skipping fully disabled SEs.  This means that occupancy limited
dispatches such as cooperative launch may over allocate onto hardware
resources that are not immediately available, resulting in a potential
barrier logic hang as occupying work groups are waiting on enqueued
work groups to reach the barrier.

Further work will have to be done to get the per-SA CU enablement count
from the KFD in order to correctly clip the cooperative CU limit based
on the CU mask, which will require breaking the current ABI.

For now, report that cooperative launch is not supported while a CU
mask has been applied to prevent potential shader hangs.

Change-Id: I8be4bb47d65ceb62d805f36ef6ef3996d756021f
2022-08-22 08:22:28 -04:00
David Yat Sin c904cc5856 Change search path to use RPATH
Change default behavior for library search to use RPATH instead of
RUNPATH.

Change-Id: I328766006d02c2a8c76a3b1e0780ae5ca678ed86
2022-08-21 19:14:27 -04:00
David Yat Sin df3fe8c2fb Add env variable to disable CPU affinity override
New environment variable HSA_OVERRIDE_CPU_AFFINITY_DEBUG to
enable/disable overriding CPU affinity.

Default value is enabled(1).

This is a temporary variable and may be removed in the future.

Change-Id: Id6a7c611730471ddc276ca333fde1e57046bf32a
2022-08-19 11:07:49 -04:00
David Yat Sin a7db31c5d1 Expose memory executable bit for SVM ranges
Add support to expose executable bit.

Change-Id: I054f5c3173822c369dd9908eec5c449459600ce1
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
2022-08-17 12:05:42 -04:00
David Yat Sin 86e4cb1ddd Add max enum value to hsa_agent_info_t
Add max enum value to force size of enum and avoid clang compile
warnings.

Change-Id: I9cdf529517cc605a5039c3a924fd718ece16029d
2022-08-10 11:11:36 -04:00
David Yat Sin 117495fe88 Fix image LUT for gfx11
For gfx11 the image type table has some different values compared to
previous ASIC families (e.g. TYPE_SRGB). Create a new LUT class to
use these new values.

Change-Id: Ifdfc6cd29bfd5f4ec2643c848fcb9986eb874f9e
2022-08-04 11:23:28 -04:00
Yifan Zhang daa01b8d57 Add gfx1103 support
This patch adds gfx1103 support

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Change-Id: I7f1d580059fcd501bce2c8fea894637960c29bc1
2022-08-04 11:23:28 -04:00
David Yat Sin 574bea4a4c Use FAMILY_GFX1103 for gfx1103
Also adding elf entry

Change-Id: Id47ec379f2880961022b4607eb7f106b7e9d7048
2022-08-04 11:23:28 -04:00