Commit Graph

47 Commits

Author SHA1 Message Date
Jason Tang 5692511e22 SWDEV-333471 - Remove HIP_FORCE_QUEUE_PROFILING
HIP_FORCE_QUEUE_PROFILING has been replaced by GPU_FORCE_QUEUE_PROFILING.

Change-Id: Ic32ecdf829a2725ace84e76abab8a81c8790e13f


[ROCm/clr commit: dd49cf0fa0]
2022-08-29 10:21:09 -04:00
Jason Tang fb753e489d SWDEV-333471 - Add GPU_FORCE_QUEUE_PROFILING
To support both hip and ocl. HIP_FORCE_QUEUE_PROFILING will be replaced with this later on.

Change-Id: I6d3514b1568ff049584ed9fd74bbdb3e4f4bf0c3


[ROCm/clr commit: d92b3a2d90]
2022-08-19 10:51:41 -04:00
Saleel Kudchadker 5ba419ac66 SWDEV-301667 - Increase kern arg pool
Change-Id: Ie4b087ae4aec08fccaaa7958cdf545e4e27ac5c1


[ROCm/clr commit: 6a77f73050]
2022-07-12 18:42:29 -04:00
German Andryeyev 6a225063d4 SWDEV-335142 - Increase max dispatch limit for capture
Change-Id: I929808476a75f4c360cd9368b777e1a0109fdb82


[ROCm/clr commit: 8e5205bb3a]
2022-05-02 15:59:25 -04:00
German Andryeyev 4b4137ae63 SWDEV-332512 - Add ROC_SIGNAL_POOL_SIZE
Default value is 32 HSA signals in the pool.

Change-Id: Icb69413d3ff6ef228d9a9e22fd024e72c6d8ebe4


[ROCm/clr commit: 7975a07112]
2022-04-14 17:32:00 -04:00
Saleel Kudchadker 3d0100c5ab SWDEV-301667 - Add cache state for a device
- Add a global cache state for a device to indicate scopes of submitted
AQL packets
- Remove scopes for TS marker if hipEventReleaseToDevice is passed. Set
env ROC_EVENT_NO_FLUSH=1 to use NOP AQL for event records.
It would flush caches by default with system scope release.
- Calling finish() should check whether caches are flushed and, if not,
queue a marker

Change-Id: Ibbbdbb1cd7ac61cb35649169212142545be159e0


[ROCm/clr commit: 8eeaa998c0]
2022-04-12 12:27:31 -04:00
Satyanvesh Dittakavi acfa45bd5c SWDEV-326397 - P2P copies to take SDMA path if there is no pending dispatch
Change-Id: I50cfb8d77f7882151a20a1de7aaf5219b1695b7d


[ROCm/clr commit: c1b95b09bf]
2022-03-29 14:59:11 +00:00
German Andryeyev 0d5a8a5b9d SWDEV-311271 - Add a key to control memory pool feature
Change-Id: Ibd929592b802e65d0e1a4fd9689050bce5059e98


[ROCm/clr commit: a02ae1b851]
2022-03-25 19:07:14 -04:00
German Andryeyev 3c4f97f66c SWDEV-286150 - Remove GSL backend
Change-Id: Iba9a997ee7d5ff6ac00d5888ff189a4514958fe9


[ROCm/clr commit: 525a1bbf1a]
2022-02-09 17:16:39 -05:00
German Andryeyev 5ad02b78c4 SWDEV-305016 - Improve MGPU scaling in Tensorflow
Add a threshold for ROCR/SDMA P2P transfers. ROCR copy path
requires extra barriers in compute for synchronization. That costs
extra performance with tiny transfers.
Reduce active wait time to 10us. Tensorflow uses extra thread
per GPU with constant hipEventQuery() calls. Longer active waits
in ROCr affect CPU performance.

Change-Id: I9020358438615fa2d4617f862f00a562f0a588e7


[ROCm/clr commit: 008133cf41]
2021-12-08 11:59:37 -05:00
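The active-wait trade-off described in the commit above can be illustrated with a spin-then-block wait. This is a minimal sketch, not the ROCr implementation: `wait_for_signal` is a hypothetical helper, a `threading.Event` stands in for an HSA signal, and only the 10us budget comes from the commit message.

```python
import threading
import time


def wait_for_signal(event, active_wait_us=10):
    """Spin for at most active_wait_us microseconds, then block.

    Hypothetical illustration: threading.Event stands in for an HSA signal.
    Returns "active" if the event was seen while spinning, else "blocked".
    """
    deadline = time.perf_counter() + active_wait_us / 1e6
    while True:
        if event.is_set():
            return "active"
        if time.perf_counter() >= deadline:
            break
    # Spinning burns CPU, so fall back to a blocking wait after the budget.
    event.wait()
    return "blocked"


already_done = threading.Event()
already_done.set()
print(wait_for_signal(already_done))

slow = threading.Event()
threading.Timer(0.05, slow.set).start()  # completes long after the 10us budget
print(wait_for_signal(slow))
```

A shorter spin budget returns the CPU to other threads sooner, which matters when a polling thread per GPU calls the wait repeatedly.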
German Andryeyev 7821cddb3e SWDEV-257789 - Initial change to skip kernel arg copy
The optimization is controlled with ROCR_SKIP_KERNEL_ARG_COPY.
This is initial check-in for experiments. Extra changes are
necessary for full support:
- handle graph capture with the original sysmem alloc
- avoid memobject references, otherwise there is a race condition with
reuse of the arg buffer
- Remove arg setup from hip

Change-Id: Ib0af710f93e79834711fa4049a7c66093711e68b


[ROCm/clr commit: 7e12cf6318]
2021-10-28 20:35:35 -04:00
German Andryeyev 51f7944fcb SWDEV-303567 - Increase the size of AQL queue
ROC_AQL_QUEUE_SIZE will control the size of AQL queue.
The current default value is 4096.

Change-Id: Icd2a4ee3ba554c06aa05b08defd922d2c63e43fd


[ROCm/clr commit: 7fe696b6ef]
2021-10-06 08:27:36 -04:00
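A setting such as ROC_AQL_QUEUE_SIZE is typically read from the environment with a fallback to the default. The sketch below assumes a hypothetical helper `aql_queue_size`; the 4096 default comes from the commit message, while the power-of-two check reflects the HSA requirement on queue sizes and is an assumption about how a runtime might validate the value, not a description of what this commit does.

```python
import os


def aql_queue_size(env=os.environ, default=4096):
    """Return the AQL queue size from ROC_AQL_QUEUE_SIZE, or the default.

    Hypothetical helper. The validation is an assumption: HSA queue sizes
    must be a power of two, so other values are rejected here.
    """
    size = int(env.get("ROC_AQL_QUEUE_SIZE", default))
    if size <= 0 or size & (size - 1):
        raise ValueError(f"queue size must be a power of two, got {size}")
    return size


print(aql_queue_size({}))
print(aql_queue_size({"ROC_AQL_QUEUE_SIZE": "8192"}))
```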
Sourabh 936e0836a8 SWDEV-292525 - [vdi] Path to streamOps shaders
Implementation to use a blit kernel to perform
a hipStreamWait/write instead of an AQL packet.

Change-Id: I462671ed5cec37144dfe97ff66439249196117c1


[ROCm/clr commit: cbb8d82bdb]
2021-09-27 13:59:35 -04:00
Saleel Kudchadker 1bf9b39cf8 SWDEV-301667 - Kern arg placement
Add an env var ROC_USE_FGS_KERNARG to toggle kernel arg placement.
By default it's in the fine-grain kernel arg segment for supported ASICs.

Change-Id: I3d57ed69a1a4db2b392b0438ead499f3ddca4716


[ROCm/clr commit: e29b9c00ee]
2021-09-02 12:36:49 -04:00
Jason Tang d1a3931d68 SWDEV-1 - Disable OpenCL support for gfx8 in ROCm path
Change-Id: Ie1e0c0d6273edf6b734909447c2a08252cba305b


[ROCm/clr commit: 7f83bcdb45]
2021-08-31 12:48:47 -04:00
vpykhtin a3b0a8aed0 SWDEV-1 - OpenCL binary substitution feature based on source program text hash matching.
This patch allows substituting the binary for an OpenCL program. It is supposed to be used as follows:

1. Run the opencl program with -save-temps.

2. Open the cl temp and find the following text in the program header:
    Hash to override:
	Source: 0xd66bcfa20e69e605
	Source + clang options: 0x656a9dd8aedcbfb6

3. Create config file (ascii text) with a pair(s):

    <hash> <path_to_binary_to_substitute>

    where hash is the hex value from step 2 (without the leading 0x). You can use either hash,
    depending on what you want to match:
	only the source text of the program, or the source text along with its clang options.

4. Set the env variable AMD_OCL_SUBST_OBJFILE to the path of your config file.

5. Rerun the opencl program.

Change-Id: I977c80fe529ea14458194918c6ddfbe2de6a8857


[ROCm/clr commit: 51cc9c2f8c]
2021-08-22 23:56:08 -07:00
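The config format from step 3 of the commit above (one `<hash> <path_to_binary_to_substitute>` pair per line) can be sketched as a small parser and lookup. This is an illustration only, not the runtime's code: `parse_subst_config` and `find_substitute` are hypothetical names, the file paths are made up, and trying the "source + clang options" hash before the source-only hash is an assumption.

```python
def parse_subst_config(text):
    """Parse '<hash> <path>' pairs (hex hash without 0x) into a dict."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        hash_hex, path = line.split(None, 1)
        table[int(hash_hex, 16)] = path
    return table


def find_substitute(table, source_hash, source_and_options_hash):
    """Return the substitute binary path, or None if neither hash matches.

    Assumption: the more specific source + options hash is tried first.
    """
    if source_and_options_hash in table:
        return table[source_and_options_hash]
    return table.get(source_hash)


# Hash values taken from the example in the commit message; paths are made up.
config = "d66bcfa20e69e605 /tmp/source_only.bin\n656a9dd8aedcbfb6 /tmp/with_options.bin\n"
table = parse_subst_config(config)
print(find_substitute(table, 0xD66BCFA20E69E605, 0x656A9DD8AEDCBFB6))
```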
agunashe 49f0546637 SWDEV-293742 - Update copyright end year VDI repo
Change-Id: I69d2fea4a7a43adf96ccea794270e4af991c5261


[ROCm/clr commit: d96481fb36]
2021-08-22 23:56:07 -07:00
Anusha GodavarthySurya a7c6b2d463 SWDEV-290901 - update ROC_ACTIVE_WAIT_TIMEOUT to 50us
Change-Id: Iba2f2bb882c4786a432a523cb0954761e5359e7f


[ROCm/clr commit: 20e2153e8b]
2021-08-22 23:56:07 -07:00
Saleel Kudchadker 9a3e9c9ad3 SWDEV-247372 - Active wait timeout env var
- Create an env var ROC_ACTIVE_WAIT_TIMEOUT to set active wait timeout
- Record profiling information if marker_ts_ property is valid.

Change-Id: If0d8aec8d9b0715027cf0f7c3dc8a4c722a6bae6


[ROCm/clr commit: b416ad7b9d]
2021-08-22 23:56:07 -07:00
Saleel Kudchadker 4b03f02a61 SWDEV-280773 - Honor CPU affinity with env var
Setting AMD_CPU_AFFINITY = 1 will make runtime honor core affinity that
the process may set. This is disabled by default as it can prevent
worker thread or any thread that runtime creates from getting scheduled
thus affecting performance.

Change-Id: Ibe4cc95e7b99caee5ce750b7bf66e09e999cc9a3


[ROCm/clr commit: 1398719b0d]
2021-05-11 18:21:56 -04:00
Vladislav Sytchenko 6ff739e446 SWDEV-1 - Remove VEGA10_ONLY macro
This is a relic of the past.

Change-Id: I888cf96368e321dcb000c1e59e9ed3f7c5dff7ab


[ROCm/clr commit: 3a12ab8fae]
2021-04-15 16:11:31 -04:00
Julia Jiang 7bb189c4c5 SWDEV-268186 - OCL ReBar optimization
Change-Id: I69d8bce8d48a5b6f94a05272c83ee91fbec1688c


[ROCm/clr commit: aef4ab1fc8]
2021-04-13 15:08:32 -04:00
Jason Tang 636bdbd0fa SWDEV-277559 - Remove AMDIL
The rest of AMDIL support will be removed along with orca backend.

Change-Id: I0462501e7147dc4b99870fd02034d0a4a0496e55


[ROCm/clr commit: 1a38be8972]
2021-04-09 14:15:15 -04:00
Sarbojit Sarkar 469f00e6f3 SWDEV-254329 - Fix for profiler ON/OFF
Change-Id: Iea72ae96ebe7ed95322dfc39d785ac326b47f6dc


[ROCm/clr commit: 14d54a7b29]
2021-03-02 02:16:14 -05:00
Saleel Kudchadker 4da1282882 SWDEV-272673 - Add changes to dump log to a file
Env var AMD_LOG_LEVEL_FILE would dump the log to file.
Change-Id: I6add4a1ae6788f376ce116797cc0573007502e73


[ROCm/clr commit: 0f14c54c04]
2021-02-15 10:28:06 -08:00
German Andryeyev f96e973378 SWDEV-257787 - Add engine tracking per signal
- The logic will trace compute, sdma read/write operations and
apply signals when necessary
- ROC_CPU_WAIT_FOR_SIGNAL, ROC_SYSTEM_SCOPE_SIGNAL
and ROC_SKIP_COPY_SYNC were added to control the tracking

Change-Id: I9e8e6174c63bf7784f7ab00964e2918c8667d364


[ROCm/clr commit: dbc7abaecf]
2021-01-25 12:34:45 -05:00
Aryan Salmanpour e8c7cf569f Add an environment variable for setting a global CU mask
Change-Id: I773b152023c7b8e1e679a42015748f9b23fd946d


[ROCm/clr commit: d03ee6eff6]
2020-11-20 10:05:09 -05:00
German Andryeyev a7adace36e Add direct dispatch simple hack for testing
The hack doesn't really track the command status. That may not be
necessary for HIP, but it will cause early resource release.

Change-Id: I791ad36dd8abd3b6b3d2c9b16a210a555c08ca64


[ROCm/clr commit: 532f0ae951]
2020-11-13 10:36:23 -05:00
Vladislav Sytchenko 3e6989c1c2 [PAL] Allow for embedding debug info into IBs
Change-Id: I4473b9c5aa36370d9af37f22a78f4414eaa21e01


[ROCm/clr commit: 2ec5a47c88]
2020-10-14 15:54:48 -04:00
Vladislav Sytchenko fcae92ce47 [PAL] Allow overriding reported asic revision
This is helpful when debugging issues on low-end ASICs. Navi14 can be emulated as Navi10. Likewise, Navi22 can be emulated as Navi21.

Change-Id: I693ffd45a5b03657822afdc872781901bc69b65c


[ROCm/clr commit: 26d1b28b16]
2020-10-13 09:36:15 -04:00
German Andryeyev 2c21a44b40 Add option to skip AQL barrier
The change reuses HSA signals for dispatches as a wait signal.
Skipping the barrier requires disabling the L2 cache for sysmem
allocations and extra tracking for HDP access with the large bar.
ROC_BARRIER_SYNC=0 activates the new logic. Barrier sync is
still used by default.
ROC_ACTIVE_WAIT=1 enables unconditional active wait in ROCr.
The change also consolidated ROCr wait logic under single function.

Change-Id: I6bd1be30aa88258da1b1f9de319ef5a45852afd8


[ROCm/clr commit: d9397590de]
2020-10-06 08:37:12 -04:00
Saleel Kudchadker 820a456980 Add Queue profiling param and toggle for HIP
Use signal timestamps if NDRange command takes forceProfile flag.

Change-Id: Ib7f187d781fd78a7346818afb3344a9378f4c104


[ROCm/clr commit: ec73340348]
2020-08-06 03:09:53 -04:00
Vlad Sytchenko ec1205b497 Revert "Added file logging for rocclr & HIP"
This reverts commit bc5075c2c5.

This change broke the legacy-complib build in p4. It seems that we can't use any flags in debug.cpp.

Change-Id: I17bb83651b85d6f415d9074634b479658fd4c3f9


[ROCm/clr commit: 20c24cae93]
2020-06-23 16:46:56 -04:00
Sarbojit Sarkar bc5075c2c5 Added file logging for rocclr & HIP
Change-Id: Ic0a54f6ee82d010b011739e0059778ed31833518


[ROCm/clr commit: 5f055d227d]
2020-06-23 04:30:36 -04:00
German Andryeyev 0a6056ac82 Initial HMM support
- Expose ROCclr interfaces for HIP usage
- ROCr interfaces aren't available in staging, thus control the
build with AMD_HMM_SUPPORT define

Change-Id: Iadc2bcc230e78d3b0dc22b235189c8cc80843446


[ROCm/clr commit: c5afd5d412]
2020-06-12 09:06:07 -04:00
Payam Ghafari a0fad24cf0 Revert "adding HIP_ENABLE_LAZY_KERNEL_LOADING flag"
This reverts commit 5540e98745.

Reason for revert: HIP_ENABLE_LAZY_KERNEL_LOADING is needed before the runtime is initialized, so this utility cannot be used

Change-Id: I49f8ddb98c9a85b9a77b8fd4b236d06b6b2b0f32


[ROCm/clr commit: ac8d1ba687]
2020-05-29 21:26:25 -04:00
German Andryeyev 3abb1347cf Revert "Revert "Reenable cooperative groups""
This reverts commit c6c208099b.

Reason for revert: <INSERT REASONING HERE>

Change-Id: I93c45fae27e0a08b199542d44fb0d65fc74ea13c


[ROCm/clr commit: fb401bfe6d]
2020-05-25 14:11:58 -04:00
Aakash Sudhanwa c6c208099b Revert "Reenable cooperative groups"
This reverts commit 7b00a525ba.

Reason for revert: <INSERT REASONING HERE>

Change-Id: I8954b37c354382804a139d80e2551c381fd9b2ed


[ROCm/clr commit: abc115bda8]
2020-05-19 18:21:48 -04:00
Jason Tang d917bbfc73 SWDEV-236894 - Rename LOG_LEVEL to AMD_LOG_LEVEL
Change-Id: Ibdfaf0fb615ac343c05d0fa3c3ace9cbb592ecf3


[ROCm/clr commit: 49224d95c7]
2020-05-19 17:32:24 -04:00
German Andryeyev 7b00a525ba Reenable cooperative groups
Change-Id: Ia43049ef550bffa6d21704dbd306ddb9c1d56af0


[ROCm/clr commit: 82dc1a6343]
2020-05-15 12:41:12 -04:00
Payam 5540e98745 adding HIP_ENABLE_LAZY_KERNEL_LOADING flag
Change-Id: Ia4425e00d97a25bcea656e2ade5cd3a5d92b4de6


[ROCm/clr commit: a3b730b595]
2020-05-13 13:06:55 -04:00
Saleel Kudchadker c38c8e5c3d Add env var to toggle large bar support in runtime
Use ROC_ENABLE_LARGE_BAR (0/1) to toggle. The support is
enabled by default.

Change-Id: I6cb93a46594cb6f5e90bf6057738330225efb553


[ROCm/clr commit: d10d691e76]
2020-05-12 13:20:06 -04:00
Saleel Kudchadker 6b7c6748b1 Add a threshold for forcing ROCr to take blit path
This workaround avoids the performance penalty of the SDMA engine
taking a while to clock up from a lower DPM state. Add env var
GPU_FORCE_BLIT_COPY_SIZE (in KB; 1024 by default for HIP). Forcing
Src and Dst agent to be amdgpu makes ROCr take the blit copy path for
what otherwise would have been an SDMA copy.

Change-Id: I222f687155f86000d17d66d25182e490b6710463


[ROCm/clr commit: 5f64e6e7ad]
2020-04-28 17:11:24 -04:00
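The size threshold from the commit above can be sketched as a simple path selection. This is a hedged illustration: `force_blit_threshold_bytes` and `choose_copy_path` are hypothetical names, and treating copies at or below the threshold as blit copies is an assumption; only the variable name, the KB unit, and the 1024 default come from the commit message.

```python
import os


def force_blit_threshold_bytes(env=os.environ, default_kb=1024):
    """Read GPU_FORCE_BLIT_COPY_SIZE (in KB) and return it in bytes."""
    return int(env.get("GPU_FORCE_BLIT_COPY_SIZE", default_kb)) * 1024


def choose_copy_path(size_bytes, threshold_bytes):
    """Pick a copy engine for a transfer of size_bytes.

    Assumption: copies at or below the threshold use the blit (shader) path,
    so small transfers do not wait for the SDMA engine to clock up.
    """
    return "blit" if size_bytes <= threshold_bytes else "sdma"


threshold = force_blit_threshold_bytes({})
print(choose_copy_path(64 * 1024, threshold))
print(choose_copy_path(8 * 1024 * 1024, threshold))
```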
German Andryeyev d8cd26eb1b SWDEV-193956
[hipclang-vdi-rocm][perf]~45% to 50% of Performance drop on
rocBLAS_int8 test

- Enable AMD_OPT_FLUSH optimization by default to match HCC
- Disable CPU writes to GPU memory on boards with large bar,
because it requires HDP flush tracking.
- Enable L2 cache on kernel arguments, because L2 will be
invalidated on memory reuse.

Change-Id: I124cf250bdd4d19c523ce542c163813828f8fbdc


[ROCm/clr commit: 374f612b7c]
2020-02-18 14:26:00 -05:00
Saleel Kudchadker 0ddfa04517 Implement HIP_HIDDEN_FREE_MEM env var
Set value to 256 MB to reflect what HIP/HCC reserves
Change-Id: Icaadf79f60d3916965ac168da237d15b975b1fe4


[ROCm/clr commit: 0730b39adb]
2020-02-14 12:57:11 -05:00
Laurent Morichetti e284923583 Update copyright info
Change-Id: Ia4f9ff0f5f873b4223a8cca154188bb0d2f1abba


[ROCm/clr commit: b4c6143a2f]
2020-02-04 09:26:14 -08:00
Laurent Morichetti 011f3e945b Merge branch 'origin/pghafari/vdi-prototype' into lmoriche/amd-master
Change-Id: Id3b833d405596735becb3346f3b08c6da57033fe


[ROCm/clr commit: 20c7173849]
2020-01-30 20:12:13 -08:00