Add a hard limit of 1/2 of available VRAM on the allocation size
to avoid allocation failures when the allocation size equals the VRAM size.
Print the block size in each round to report progress for the long-running
test.
Add the block size skip info to the result table (if any sizes were skipped).
Affected test:
rocrtstPerf.Memory_Async_Copy
Data Size    Avg Time(us)      Avg BW(GB/s)    MinTime(us)       Peak BW(GB/s)
128M         638759.570200     0.195692        637569.991000     0.196057
256M         1270058.822400    0.196841        1268425.758000    0.197095
Notice: Data Size larger than 512M is skipped due to hard limit of 1/2 vram size
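
The size filtering described above can be sketched as follows. This is a minimal illustration, not the test's actual code: the helper name SizesWithinHardLimit is hypothetical, and treating a size exactly equal to the limit as allowed is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: keep only the block sizes that fit under the hard
// limit of half the available VRAM. Larger sizes are skipped.
// Assumption: a size exactly equal to the limit is still allowed.
std::vector<uint64_t> SizesWithinHardLimit(const std::vector<uint64_t>& sizes,
                                           uint64_t available_vram) {
  const uint64_t hard_limit = available_vram / 2;
  std::vector<uint64_t> kept;
  for (uint64_t size : sizes) {
    if (size <= hard_limit) kept.push_back(size);
  }
  return kept;
}
```

If the returned list is shorter than the input, the caller can print a skip notice like the one in the result table.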
Signed-off-by: Mengbing Wang <mengbing.wang@amd.com>
Change-Id: I4c4cea74a608272cc29d222b9399af26b34d7473
[ROCm/ROCR-Runtime commit: cf10c3bc35]
Includes some workarounds and HMM.
Conflicts:
opensrc/hsa-runtime/core/runtime/amd_topology.cpp
opensrc/hsa-runtime/core/util/flag.h
Change-Id: I22976f07964a43dbb228a6231777dbd599112b8d
[ROCm/ROCR-Runtime commit: 7333c77e22]
When no ISAs are available, no callbacks should be invoked. This
is not an error and should return success.
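
A minimal sketch of the intended behavior, using stand-in types rather than the real HSA API: Status, IterateIsas, and the int ISA handles are all hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Stand-in for hsa_status_t; the values here are illustrative only.
enum Status { kSuccess = 0, kError = 1 };

// Hypothetical iterator: invokes the callback once per ISA and stops at the
// first non-success status. With an empty list the loop body never runs, so
// no callback is invoked and the result is success.
Status IterateIsas(const std::vector<int>& isas,
                   const std::function<Status(int)>& callback) {
  assert(callback && "callback must be callable");
  for (int isa : isas) {
    Status status = callback(isa);
    if (status != kSuccess) return status;
  }
  return kSuccess;
}
```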
Change-Id: Ie4048aa8cbe5c3fdf5431f6a865021549ecf8a13
[ROCm/ROCR-Runtime commit: 4197461b7f]
SRAMECC is misreported in KFD 4.0 and prior. To prevent possible
corruption due to d16 instructions, deny use of gfx906 with older
KFDs and correct the misreport for gfx908. Denial of gfx906 may be
overridden by setting HSA_IGNORE_SRAMECC_MISREPORT=1.
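
The gfx906 gate can be modeled roughly as below. Only the environment variable name comes from this change; the function names are hypothetical, and how the runtime detects an affected KFD is left to the caller.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical: the override counts as set only when the value is exactly "1".
bool OverrideRequested(const char* env_value) {
  return env_value != nullptr && std::strcmp(env_value, "1") == 0;
}

// Hypothetical gate: gfx906 is allowed when the KFD reports SRAMECC
// correctly, or when the user explicitly opts in via the override.
bool AllowGfx906(bool kfd_misreports_sramecc, const char* env_value) {
  if (!kfd_misreports_sramecc) return true;
  return OverrideRequested(env_value);
}
```

A caller would pass std::getenv("HSA_IGNORE_SRAMECC_MISREPORT") as env_value.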
Change-Id: I7d5c3a716fad01c348f8b88cd508cedbf914c989
[ROCm/ROCR-Runtime commit: 45fbe5b192]
1. As we cannot guarantee that 100% of APU VRAM is free to be allocated, limit
the allocation size to no more than 3/4 of the VRAM size.
2. Keep the old 1GB allocation limit for the dGPU case.
3. Add an alignment check for alloc_size.
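
The three points above can be sketched as a single helper. This is an illustration under assumptions: the name MaxAllocSize, the granule parameter, and the choice to round down to the granule (rather than reject unaligned sizes) are not taken from the tests.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kDgpuLimit = 1ull << 30;  // the old 1GB limit

// Hypothetical helper: choose the allocation limit for the device type, then
// round it down so it is a multiple of the allocation granule.
uint64_t MaxAllocSize(uint64_t vram_size, bool is_apu, uint64_t granule) {
  // The granule must be a non-zero power of two for the mask below to work.
  assert(granule != 0 && (granule & (granule - 1)) == 0);
  const uint64_t limit = is_apu ? vram_size / 4 * 3 : kDgpuLimit;
  return limit & ~(granule - 1);
}
```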
Affected tests:
rocrtstStress.Memory_Concurrent_Allocate_Test
rocrtstStress.Memory_Concurrent_Free_Test
Change-Id: Id0023de132024d02f80980ae4237d9d74d9e27d3
Signed-off-by: Mengbing Wang <mengbing.wang@amd.com>
[ROCm/ROCR-Runtime commit: d5855c1658]
Park the wave, if it is stopped, to avoid halting it at an s_endpgm
instruction if the architecture does not support it.
Free ttmp6 by converting the dispatch_ptr into a queue packet index
(25-bit) and storing it in ttmp7[24:0].
Save the exception PC in {ttmp11[22:7], ttmp6[31:0]}.
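
The register layout can be modeled in host code as below. The real trap handler is shader assembly, so this is only a model of the bit packing. It assumes that ttmp11[22:7] holds the upper 16 bits of a 48-bit PC and ttmp6 holds the lower 32 bits; the struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

struct Ttmp {
  uint32_t ttmp6;
  uint32_t ttmp7;
  uint32_t ttmp11;
};

constexpr uint32_t kPacketIndexMask = (1u << 25) - 1;  // ttmp7[24:0]
constexpr uint32_t kPcHiShift = 7;                     // ttmp11[22:7]
constexpr uint32_t kPcHiMask = 0xFFFFu << kPcHiShift;

// Store the 25-bit queue packet index without touching ttmp7[31:25].
void StorePacketIndex(Ttmp& t, uint32_t index) {
  assert(index <= kPacketIndexMask);
  t.ttmp7 = (t.ttmp7 & ~kPacketIndexMask) | index;
}

// Split a 48-bit PC across ttmp6 (low 32 bits) and ttmp11[22:7] (high 16).
void SaveExceptionPc(Ttmp& t, uint64_t pc) {
  assert(pc < (1ull << 48));
  t.ttmp6 = static_cast<uint32_t>(pc);
  const uint32_t hi = static_cast<uint32_t>(pc >> 32);
  t.ttmp11 = (t.ttmp11 & ~kPcHiMask) | (hi << kPcHiShift);
}

// Reassemble the PC from the two registers.
uint64_t LoadExceptionPc(const Ttmp& t) {
  const uint64_t hi = (t.ttmp11 & kPcHiMask) >> kPcHiShift;
  return (hi << 32) | t.ttmp6;
}
```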
Change-Id: Iaa3c5baf5b488c0b534044d338f12bffa63ddce2
[ROCm/ROCR-Runtime commit: ea6ee0aa81]
Previously the scratch cache was not updated on IOMMUv2 systems.
This both defeated the cache and caused a segfault during scratch
release.
Change-Id: I71e81d6b642d65ca135868ff7225ea173529d458
[ROCm/ROCR-Runtime commit: 191664cd20]
hsa_amd_agent_iterate_memory_pools returns HSA_STATUS_SUCCESS even if
no memory pool is found. Add a memory pool check.
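
The shape of the added check can be sketched with stand-in types. Nothing here is the real HSA API: Status, Pool, IteratePools, and FindHostAccessiblePool are hypothetical, and the point is only that a successful iteration does not imply a pool was found.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Stand-ins for hsa_status_t values; illustrative only.
enum Status { kSuccess = 0, kInfoBreak = 1, kError = 0x1000 };

struct Pool {
  bool host_accessible;
  int id;
};

// Hypothetical iterator: stops at the first non-success callback status and
// returns success when it runs off the end, even if the list is empty.
Status IteratePools(const std::vector<Pool>& pools,
                    const std::function<Status(const Pool&)>& callback) {
  assert(callback && "callback must be callable");
  for (const Pool& pool : pools) {
    Status status = callback(pool);
    if (status != kSuccess) return status;
  }
  return kSuccess;
}

// Track whether a pool was actually found instead of trusting the status.
bool FindHostAccessiblePool(const std::vector<Pool>& pools, Pool* out) {
  assert(out != nullptr);
  bool found = false;
  Status status = IteratePools(pools, [&](const Pool& pool) {
    if (!pool.host_accessible) return kSuccess;
    *out = pool;
    found = true;
    return kInfoBreak;  // stop iterating once a match is recorded
  });
  return (status == kSuccess || status == kInfoBreak) && found;
}
```

A test can then skip or fail cleanly when the function returns false, rather than using an uninitialized pool handle.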
jenkins@jenkins-System-Product-Name:~/rocrtst_tests/gfx902$ ./rocrtst64 --gtest_filter=rocrtstFunc.MemoryAccessTests
Note: Google Test filter = rocrtstFunc.MemoryAccessTests
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from rocrtstFunc
[ RUN ] rocrtstFunc.MemoryAccessTests
#### TEST NAME ####
RocR Memory Access Tests
#### TEST DESCRIPTION ####
This series of tests check memory allocationon GPU and CPU, i.e. GPU access
to system memory and CPU access to GPU memory.
#### TEST SETUP ####
The gpu device name is gfx902
Target HW Profile is HSA_PROFILE_FULL
Test can run on any profile. OK.
#### TEST EXECUTION ####
*** Memory Subtest: CPUAccessToGPUMemoryTest in Memory Pools ***
Segmentation fault (core dumped)
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Change-Id: Ic335c4c98990b43f5d4842ab6d74855859a9048a
[ROCm/ROCR-Runtime commit: 27ae854cda]
This will create a deb and an rpm for rocrtst to make installing and
running it easier for non-ROCr devs.
Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: I506baedc1471482e5808139cab5c28ae07ac8fb1
[ROCm/ROCR-Runtime commit: 9311789398]
An APU doesn't have a non-KERNARG memory pool for the CPU agent or
a global memory pool for the GPU agent. The current setup check
fails as shown below. Change to an APU-specific check method.
[==========] Running 45 tests from 5 test cases.
[----------] Global test environment set-up.
[----------] 1 test from rocrtst
[ RUN ] rocrtst.Test_Example
#### TEST NAME ####
Test Case Example
#### TEST DESCRIPTION ####
Put a description of the test case here. Line breaks will be taken care of
on output, not here.
#### TEST SETUP ####
The gpu device name is gfx902
Target HW Profile is HSA_PROFILE_FULL
Test can run on any profile. OK.
/home/jenkins/hsa/runtime/rocrtst/common/base_rocr_utils.cc:180: Failure
Value of: rocrtst::ProcessIterateError(err)
Actual: 4096
Expected: HSA_STATUS_SUCCESS
Which is: 0
HSA_STATUS_ERROR: A generic error has occurred.
/home/jenkins/hsa/runtime/rocrtst/suites/test_common/test_case_template.cc:195: Failure
Value of: HSA_STATUS_SUCCESS
Actual: 0
Expected: err
Which is: 4096
rocrtst64: /home/jenkins/hsa/runtime/rocrtst/common/base_rocr_utils.cc:416: hsa_kernel_dispatch_packet_t* rocrtst::WriteAQLToQueue(rocrtst::BaseRocR*, uint64_t*): Assertion `test->main_queue()' failed.
../shunit2: line 977: 1382 Aborted (core dumped) ./rocrtst$ROCRTST_BLD_BITS "$ROCRTST_ARGS" --gtest_output=xml:"$gtest_xml"
failed (failed to run rocrtst)
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Change-Id: I03691bd4171b6e622231baf3dce4db2211eb47e7
[ROCm/ROCR-Runtime commit: 5977eb554f]
The legacy p2p copy path incorrectly transferred whole pages rather than
the requested size.
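
One way to picture the fix is a chunked copy where the last chunk is clamped to the bytes remaining. This is a generic sketch, not the runtime's copy path; ChunkSizes and its parameters are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: split a copy of `size` bytes into page-sized chunks,
// clamping the final chunk so the total equals the requested size exactly.
std::vector<std::size_t> ChunkSizes(std::size_t size, std::size_t page) {
  assert(page > 0);
  std::vector<std::size_t> chunks;
  for (std::size_t offset = 0; offset < size; offset += page) {
    chunks.push_back(std::min(page, size - offset));
  }
  return chunks;
}
```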
Change-Id: I9aa7337754f9e32f587a0cc5305f8ffeb6196f10
[ROCm/ROCR-Runtime commit: 34ac62274a]
Replace the stop reasons ttmp11.trap_raised and ttmp11.excp_raised
with ttmp11.wave_stopped which indicates that the trap handler has
halted the wave as the result of an event (trap, single-step or
exception).
If the wave is stopped because of a trap, also record the trap_id in
ttmp11.saved_trap_id[7:0].
Save status.halt in ttmp11.saved_status_halt, so that it can be
restored when resuming a wave (changing a wave's state from stopped to
running or single-stepping).
Change-Id: I7322f59b60e8cc1b92bf5f067dba606a3109ef49
[ROCm/ROCR-Runtime commit: 9ca79d072a]
Let ROCr recognize the new gfx10.3.3 ISA.
Change-Id: Ied23eee2752e14c19c8c0a6d7789fded9940e31e
Signed-off-by: Huang Rui <ray.huang@amd.com>
[ROCm/ROCR-Runtime commit: feeb2f62e2]
To support single stepping the instruction preceding an s_endpgm,
unwind the PC by 8 bytes and set ttmp11[9] to notify the debugger
that the wave is halted with a modified PC.
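
A host-side model of the adjustment, for illustration only. The trap handler itself is shader assembly; WaveState, kPcModifiedBit, and ParkBeforeEndpgm are hypothetical names, and only the 8-byte unwind and bit 9 of ttmp11 come from this change.

```cpp
#include <cassert>
#include <cstdint>

struct WaveState {
  uint64_t pc;
  uint32_t ttmp11;
};

constexpr uint32_t kPcModifiedBit = 1u << 9;  // ttmp11[9]

// Unwind the PC by 8 bytes and flag the change so a debugger can tell the
// reported PC is not the architectural one.
void ParkBeforeEndpgm(WaveState& wave) {
  assert(wave.pc >= 8);
  wave.pc -= 8;
  wave.ttmp11 |= kPcModifiedBit;
}

// A debugger would check the flag and add 8 to recover the original PC.
uint64_t OriginalPc(const WaveState& wave) {
  return (wave.ttmp11 & kPcModifiedBit) ? wave.pc + 8 : wave.pc;
}
```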
Bump the debug r_version for this new trap handler ABI.
Change-Id: I55e4e0d65576f92da14a336266c31c513baab547
[ROCm/ROCR-Runtime commit: 8aec53969f]
Each SE must be assigned equal numbers of slots and slots
must be assigned in units of whole groups.
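
Both constraints hold when the total slot count is a multiple of (number of SEs x group size). A minimal sketch, assuming the count is rounded down to fit; a caller sizing a buffer might round up instead. The name UsableSlots and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: round the requested slot count down so that every SE
// receives the same number of slots and that number is a whole number of
// groups.
uint64_t UsableSlots(uint64_t requested, uint32_t num_se, uint32_t group_size) {
  const uint64_t unit = static_cast<uint64_t>(num_se) * group_size;
  assert(unit > 0);
  return requested / unit * unit;
}
```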
Change-Id: I8f3677237fa6f2e2d25e3e78210c5a7a0ad792f3
[ROCm/ROCR-Runtime commit: 7bc6aac5d2]
- Use consistent naming in Isa class.
- Remove unused Isa methods.
- Simplify Isa methods.
Change-Id: I7c4045d08fbfe0d94b3181db8ebc5e5ed8c8cc82
[ROCm/ROCR-Runtime commit: 6bbf6b1c9c]
Store target ID string in isa registry and use for returning agent and
isa name.
Change-Id: I72a20d8ff963c73d86392158aff3853e4c9bfdbd
[ROCm/ROCR-Runtime commit: 853ccc762e]
- Remove gfx800, gfx804 and gfx901 as they do not exist.
- Map the V2 note record of "AMD:AMDGPU:8:0:0" to gfx802 as they are
the same target just connected to a different motherboard.
- Correct typo for supporting gfx902:xnack+.
- Support agent names with a minor or stepping version greater than 9.
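
For the last point, a sketch of name formatting that handles values above 9. It assumes the convention used by LLVM processor names such as gfx90a, where the minor and stepping versions are each printed as a single hexadecimal digit; the function name AgentName is hypothetical and this is not the runtime's implementation.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical formatter: decimal major version followed by one hex digit
// each for the minor and stepping versions.
std::string AgentName(unsigned major, unsigned minor, unsigned stepping) {
  assert(minor < 16 && stepping < 16);
  char buffer[32];
  std::snprintf(buffer, sizeof(buffer), "gfx%u%x%x", major, minor, stepping);
  return std::string(buffer);
}
```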
Change-Id: Ife933449f60ab4687e2aaab9baf4c9fc5b86339d
[ROCm/ROCR-Runtime commit: 12eb2764cd]
Add missing target names and make all parts consistent with which
targets are supported.
- Add gfx805 as a supported target.
- Add all ELF targets to generic code.
- Make offline loader match supported targets.
Change-Id: Idab4d69edc71645aecaa83aa55e29c1aeee4c1d6
[ROCm/ROCR-Runtime commit: b443397bcc]
Kernel argument size and alignment queries are not supported on
code object v3.
Change-Id: I1bdd34e2e62132f912ac39d80355efd3456df87c
[ROCm/ROCR-Runtime commit: 6182abf5e9]
Code object V2 had the ability to support the following queries:
- HSA_CODE_SYMBOL_INFO_KERNEL_KERNARG_SEGMENT_SIZE
- HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_KERNARG_SEGMENT_SIZE
- HSA_CODE_SYMBOL_INFO_KERNEL_KERNARG_SEGMENT_ALIGNMENT
- HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_KERNARG_SEGMENT_ALIGNMENT
However, code object V3 onwards cannot support these as the kernel
descriptor changed. These queries need to be deprecated.
Until then return more reasonable values:
- For kernarg alignment return 16 which is the minimum alignment
required by the HSA standard.
- For kernarg size return the field from the kernel descriptor which
is a hint. If it is 0 then the compiler is not specifying the kernarg
size, or the kernel has no kernarg.
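
The interim return values can be sketched as below. KernelDescriptor here is a one-field stand-in for the real descriptor, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Minimum kernarg alignment, as stated in this change.
constexpr uint32_t kMinKernargAlignment = 16;

// Stand-in for the kernel descriptor; only the size hint is modeled.
struct KernelDescriptor {
  uint32_t kernarg_size;  // 0: not specified by the compiler, or no kernarg
};

// The alignment query always returns the minimum alignment.
uint32_t QueryKernargAlignment() { return kMinKernargAlignment; }

// The size query returns the descriptor's hint unchanged.
uint32_t QueryKernargSize(const KernelDescriptor* descriptor) {
  assert(descriptor != nullptr);
  return descriptor->kernarg_size;
}
```

Callers should treat a size of 0 as "unknown or empty", not as proof that the kernel takes no arguments.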
Change-Id: I19ce6cd0f3658a2bf62277492f39100ea5ab4256
[ROCm/ROCR-Runtime commit: ef755e4c82]
Avoids calling into KFD to map/unmap scratch allocations for
every dispatch that uses large scratch.
Change-Id: I9fab5705251ec82b03e4f2f2ca6da7cdccabefb9
[ROCm/ROCR-Runtime commit: 27e044ae4d]
Improves HIP event performance in directed benchmarks where
clock sync latency is significant.
Change-Id: I78b724a14a8f5b6a9a2b9f4d85afe9d8b81808a6
[ROCm/ROCR-Runtime commit: 32d0fcafa9]
The modern meaning of the construct if( NOT ON ) was added in CMake 2.8,
but when cmake_minimum_required is not set in user code and no policy
level is set in the CMake config, CMake 2.8 features cannot be
used. In old CMake (the default), ON is interpreted as a variable, and
because it is not defined, it is considered false. The same is true of
OFF.
This change sets a variable as ON, so that the old CMake interpretation is
correct, and the if works as expected regardless of policy version.
Change-Id: I67d7ed4ceaf8248eeb5a1c7f54009d72313f3f5d
[ROCm/ROCR-Runtime commit: 4a35f560f6]
Package names tested OK:
hsa-rocr-dev_1.2.0.30900-crdnnv.415_amd64.deb
hsa-rocr-dev-1.2.0.30900-crdnnv.415.el7.x86_64.rpm
hsa-rocr-dev-1.2.0.30900-crdnnv.sles151.415.x86_64.rpm
http://confluence.amd.com/display/GPUCPT/Package+File+Naming
Note: rpm requires 'devel' instead of 'dev'; this is to be addressed in a
subsequent patchset.
Change-Id: Id6a422f3c335448b52c70c77ed39c9041114b80f
Signed-off-by: Cole Nelson <cole.nelson@amd.com>
[ROCm/ROCR-Runtime commit: 90f2dd5b1b]