64 Commits

Author SHA1 Message Date
Vlad Sytchenko 3a84fcd13e Handle the option USE_COMGR_LIBRARY correctly
This is a follow up to http://gerrit-git.amd.com/c/compute/ec/vdi/+/359563. The setting is now either ON or OFF, never "yes".

Change-Id: I031d013a8d239dc72ef610da81bd31b8b78a3ba8
2020-06-03 17:25:47 -04:00
Tao Sang fabfc42b68 Fix TC linux build issue due to previous Numa patch
Change-Id: I6068edaf38cac6fad187c8429707afdb727e8d41
2020-06-03 16:42:53 -04:00
Tao Sang aedb9590be Support Numa-aware cpu selection
Select cpu in terms of the smallest Numa distance for a GPU device.
This will improve performance of hipMemcpy in the mode of
hipMemcpyHostToDevice or hipMemcpyDeviceToHost for small buffer.

Change-Id: I2860f1f83b79be0dff7bf5e64cf68ab4448db0a1
2020-06-01 21:01:24 -04:00
German Andryeyev fb401bfe6d Revert "Revert "Reenable cooperative groups""
This reverts commit abc115bda8.

Reason for revert: <INSERT REASONING HERE>

Change-Id: I93c45fae27e0a08b199542d44fb0d65fc74ea13c
2020-05-25 14:11:58 -04:00
kjayapra-amd 53a890b499 SWDEV-237467 - Return proper hip error codes in case of ROCclr IPC API failures.
Change-Id: I1d018918ed71f6d80846b3017f7a15f4ab496554
2020-05-22 22:10:15 -04:00
Aryan Salmanpour fec4adfd19 check for valid queue before accessing cuMask()
Change-Id: I8d4b0dbcd097c2ec5c31dea5a3d0060f0864a7e8
2020-05-20 16:23:09 -04:00
German Andryeyev f56a052243 Add missing memory allocation in printf
Change-Id: I452b676612b54f70106e7ef1bcb5ce2baf7b3ffc
2020-05-20 14:49:59 -04:00
German Andryeyev 2ce6bbebc4 Fix async mem clear
Optimization for the fence release removed a sync for mem fill.
Add simple const buffer management for the filled pattern to avoid
pattern overwriting with the async fills.

Change-Id: I63773ac09ceec31d5396d24570e4647ff096326b
2020-05-20 11:13:41 -04:00
Chauncey Hui 0af9c06968 Modified IpcDetach to return status instead of void.
Change-Id: I68ed94b93f0383babe25eb046b4047d249a0fdc1
2020-05-20 03:38:21 -04:00
Aakash Sudhanwa abc115bda8 Revert "Reenable cooperative groups"
This reverts commit 82dc1a6343.

Reason for revert: <INSERT REASONING HERE>

Change-Id: I8954b37c354382804a139d80e2551c381fd9b2ed
2020-05-19 18:21:48 -04:00
Jason Tang cd2a713d63 Add major/minor/stepping to device layer
Change-Id: If82ea55a46b166b243a98089a6e9c40ccfdb479f
2020-05-17 12:57:34 -04:00
Aryan Salmanpour fed94b8604 Add support for setting CU mask on ROCclr for ROCm backend
Change-Id: I0dbe2eeb33467fc0f24b26929119c10e9b455da7
2020-05-15 14:23:43 -04:00
German Andryeyev 82dc1a6343 Reenable cooperative groups
Change-Id: Ia43049ef550bffa6d21704dbd306ddb9c1d56af0
2020-05-15 12:41:12 -04:00
Christophe Paquot 6a5af4056e Use system scope for packet following sdma copies
SWDEV-234947
SWDEV-236298
Instead of forcing a barrier packet, just inject system scope on the next packet.

Change-Id: If9bcee23e08dfe5db731235e2fcb30582cbd4c1c
2020-05-15 12:20:06 -04:00
Matt Arsenault 3624b8df16 Fix missing target includes for GL/EGL headers
Change-Id: I9a31eae40cb7187dd0264ad5b9577fab96464b41
2020-05-14 16:56:34 -04:00
Matt Arsenault 3a7f2e3682 Improve usage of target_include_directories
Eliminates most of the global include_directories. The install header
paths are different from the build directory, so we have to separate
those for the exported target include paths.

Change-Id: I13e4c56c1218cb31c29a316422dc5fd1d09d8b1b
2020-05-13 17:25:58 -04:00
German Andryeyev 8904848abc Set CPU access flag for SVM
Make sure all GPUs have CPU access flag for the fine grain buffer.

Change-Id: Ifc843c2807e70a271b269192ae7859205ff458f3
2020-05-13 16:05:46 -04:00
German Andryeyev d2b9a57c4f Disable cooperative groups support
Change-Id: I1b526f2228d083ecad7907a6eaf37c1dd4428277
2020-05-12 14:31:10 -04:00
Saleel Kudchadker d10d691e76 Add env var to toggle large bar support in runtime
Use ROC_ENABLE_LARGE_BAR (0/1) to toggle. The support is
enabled by default.

Change-Id: I6cb93a46594cb6f5e90bf6057738330225efb553
2020-05-12 13:20:06 -04:00
Jason Tang b4f1239f34 device/rocm: split gfxVersion to major/minor/stepping
Change-Id: I1e437eaee30794147713d9516229211670f01d90
2020-05-12 12:17:13 -04:00
German Andryeyev ae4aceb55e Make sure the list of HSA agents is valid
If HIP_VISIBLE_DEVICES is active, then make sure the list of HSA
agents contains the valid agents

Change-Id: I584aad999a230ab7f88a0cfe20dcd0abe79c43a5
2020-05-11 15:49:30 -04:00
Christophe Paquot 3ed185307e Fix cooperative flag for hsa_queue creation in case they're not available
SWDEV-233766

Change-Id: If410ecfed61f2b3bb50b847cf2ededc573139494
2020-05-11 13:40:50 -04:00
Christophe Paquot 2a02026696 Add gpu().hasPendingDispatch() in the SDMA path
SWDEV-234947

Change-Id: I8aa501f8755d136708b0d12ee3c30229c238660d
2020-05-08 18:19:51 -04:00
Michael LIAO 12fcfee41d Fix build failure.
- Also fix `-Wreorder` warning. NFC.

Change-Id: I766fdc622c9107f901a55498bdc8fef3d821d1b7
2020-05-07 10:39:10 -04:00
Michael LIAO 503ef06555 Clear executable permission.
Change-Id: Ia0d363b1ba89d7947e5b5a55cb67edba86f0515e
2020-05-07 10:38:58 -04:00
Alex Xie bfbc8cd09b SWDEV-234684 - hipmemcpy optimization does not work in tests
Change-Id: I899d172c5b2af88c796fe9a36f97d15ac45caf94
2020-05-05 15:58:03 -04:00
Saleel Kudchadker 0fbc0a895b Disable small copy optimization for now
Change-Id: Ib7a4aa676bb60940e067c985eb19070bd63b2fc2
2020-05-05 11:52:42 -04:00
German Andryeyev 7302ebcfbc Optimize synch operations
- Stall the queue only for HSA copy operations

Change-Id: Ia3debcc0f36284c5f8cd2776d31674f3aeed04ea
2020-04-30 11:17:48 -04:00
Alex Xie 6c5a42b33c SWDEV-232894 Port hipMemcpy optimizations from HCC to VDI
Apply the optimization to OpenCL too.
Clean up some unnecessary checks.

Change-Id: I840261fe35baeeadeba7388e86779d482f509aad
2020-04-30 11:06:28 -04:00
Christophe Paquot b54c3f7db9 Couple of cleanups.
Remove queue limitation since we loop through HW queues now.
Add a DevLogError if we fail to create the hsa_queue. A ticket showed a regression there.

Change-Id: I4f58e405f88e75600a762f6d6352838c969cdb5e
2020-04-29 09:18:07 -07:00
Saleel Kudchadker 5f64e6e7ad Add a threshold for forcing ROCr to take blit path
This workaround avoids the performance penalty of the SDMA engine
taking a while to clock up from a lower DPM state. Add env var
GPU_FORCE_BLIT_COPY_SIZE (in KB; 1024 by default for HIP). Forcing
the Src and Dst agents to be amdgpu makes ROCr take the blit copy path
for what otherwise would have been an SDMA copy.

Change-Id: I222f687155f86000d17d66d25182e490b6710463
2020-04-28 17:11:24 -04:00
agodavar f149fe0803 P2P staging buffer allocation when P2P is not enabled between all GPUs
SWDEV-232580 & SWDEV-232580
Allocate a P2P staging buffer when full P2P access is not available between all devices.
The P2P staging buffer will eventually be used when required.

Change-Id: If8490ba7b1c52c432c1e942ae95421b9d2ec7097
2020-04-28 07:10:57 -04:00
Alex Xie 009d0b5f55 SWDEV-232894 Port hipMemcpy optimizations from HCC to VDI
Change-Id: I6bebe9ac503a9f80d067aeea8a848409ad210338
2020-04-27 14:53:58 -04:00
German Andryeyev 082cbfa1f5 Don't attempt to reuse the cooperative queue
Change-Id: I0e98e292a562715a7b395118f899af859f3e42bb
2020-04-27 09:18:05 -04:00
Matt Arsenault c60d7d860d Add comgr macros to public definition export
This should allow the cmake build for the opencl runtime to work
without manually adding these definitions. The PAL build also adds
these as private defines in its build, so change rocm to match. This
should probably be including these in a config header to benefit other
builds, but this will at least avoid some clutter in the opencl build
for now.

Change-Id: I1044984b87ba3fc72e280e255ceea2dd9e3337ff
2020-04-24 12:12:54 -04:00
Matt Arsenault 350d54e198 Don't use include_directories for ROCR includes
Use the modern cmake, target specified method.

Change-Id: Icd7196bfccb85f255bbc01bc87c6667d961bb236
2020-04-24 11:05:40 -04:00
Matt Arsenault 83455f36c5 Modernize cmake usage for finding amd_comgr
Don't use find_path on the header, it's redundant with the interface
include directories on the imported target. Use the target specific
forms for including and linking it.

Change-Id: I3923143c992888ee7d5ee1130084ac2e5eaa0f3a
2020-04-24 11:03:27 -04:00
Matt Arsenault a36f19df51 Don't use CMAKE_SOURCE_DIR
This is almost never the correct thing to use since it breaks adding
this as a subproject build in a larger build. Switch to refer to
CMAKE_CURRENT_SOURCE_DIR, which is equivalent in a standalone build.

Change-Id: Ib8dbbc0668491f4227389b9a5b27da770b3bc5ce
2020-04-24 11:02:52 -04:00
German Andryeyev 89133a7301 SWDEV-232807
[ROCm][TCT][HIP] cooperative stream test case is failing.

Make sure lockXfer() in the blit manager returns a valid value.
Port the latest PAL backend logic into the ROCr backend.
This change doesn't fix the issue reported in the ticket.

Change-Id: I54101a824f49a2dcfbbf5414cb5b3af41745306d
2020-04-23 15:01:02 -04:00
Michael LIAO 97f55b5c7f [vdi] Add device assertion support.
- Once device assertion occurs, abort the host execution as well.
- TODO: This is the initial support. As we need to drain the hostcall queue
  to ensure the device assertion message is flushed out, the hostcall
  listener needs an interface to explicitly drain its queue.

Change-Id: I8a04400aa7109bfd054ae5777c41a4abbf0db4a9
2020-04-22 10:03:55 -04:00
kjayapra-amd 7458bf9964 SWDEV-229840 - Improve error messages on ROCCLR Layer.
Change-Id: Iab7d9156cdc206db86385aa05023a0095ed40f92
2020-04-19 20:01:49 -04:00
Matt Arsenault 55cc77d7d1 Fix -Winconsistent-missing-override warnings
Change-Id: I67d4a853045197ed28e5d616a4afc86f1d6a1d7c
2020-04-17 15:24:39 -04:00
kjayapra-amd 348bb0d59f SWDEV-229480 - Fixing type on commit: 6bd52dd052
Change-Id: I19f0bddb9db4140641c10a0e36ed4302c05efe2a
2020-04-13 11:48:35 -04:00
Vladislav Sytchenko c781f4d419 Don't call updateFreeMemory() if the allocation failed
Change-Id: I978cb2e463914f6a48b3d4a9057c0f67e7bdb646
2020-04-09 18:41:11 -04:00
kjayapra-amd 6bd52dd052 SWDEV-229480 - Fixing itoa conversion in LogError.
Change-Id: I9f11394c0e13e8c57d415c4f19fcd1de7935ef23
2020-04-09 18:30:35 -04:00
Alex Xie 43b9863e17 SWDEV-229731 - [Lnx][Rocm][Navi]Support images in full Opencl Conformance tests
1. Enable pitch workaround
2. When we use copy image, we don't need to create the custom pitch image
3. wrtBackImageBuffer_ stores device memory object, not amd image object.

Tests:
conformance kernel read / write test pass with this code change.

Change-Id: I7dca3127adde6ac83e78dd270a2256ebed55c60d
2020-04-04 09:43:03 -04:00
Alex Xie 3e247d2afd SWDEV-229731 - [Lnx][ROCm][Navi]Support images in full OpenCL conformance tests
Duplicate similar blit logic from PAL path

Tests:
1D Array image read/write tests and copy image tests passed

Change-Id: I838bbde252ad0108bfeb82c0c2b669881747c0af
2020-04-04 09:28:37 -04:00
kjayapra-amd cd7de89fc3 SWDEV-229840 - Improve Error Codes. RocKernel and RocProgram
Change-Id: I8f785308e0562a50924f8bdd02e88c92a759f01a
2020-04-02 22:30:53 -04:00
German Andryeyev 481d526859 SWDEV-184709 - support hipLaunchCooperativeKernel()
- Enable cooperative groups support, based on ROCr capability

Change-Id: I975bcea0af7865009eaed24454ce71d897ea8fc4
2020-04-01 12:13:33 -04:00
German Andryeyev 7ef8dfdfe7 SWDEV-184709 - support hipLaunchCooperativeKernel()
Add ROCr cooperative queue allocation

Change-Id: I1384482692f4080d31255b09e0f68a21ccad3da8
2020-03-30 16:09:09 -04:00