Commit Graph

459 Commits

Author SHA1 Message Date
Oak Zeng 6da54291bf Revert "Create SDMA queue on specific engine"
This reverts commit 6087fd7bca.

Change-Id: Ia3e9db5fcba1fef80745c72c78b7c568b5c7315e
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>


[ROCm/ROCR-Runtime commit: 1923d2e335]
2019-01-21 10:37:32 -06:00
Oak Zeng c622a2d220 Revert "Add test to allocate SDMA queue on specific engine"
This reverts commit 5d15953efb.

Change-Id: I262d91afc60ba2618bf4a857f162ea5236d54131
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>


[ROCm/ROCR-Runtime commit: 742fa5d871]
2019-01-21 10:36:54 -06:00
Philip Cox fa45791c1a Initial gfx9 debugger node suspend/resume
Change-Id: I2a5dac3d02265c11f5b6985ab457e2d1caa0a033
Signed-off-by: Philip Cox <Philip.Cox@amd.com>


[ROCm/ROCR-Runtime commit: 37858f2311]
2019-01-11 09:00:54 -05:00
Philip Yang 4d5fb9f80e kfdtest: increase KFDPerformanceTest.P2PBandWidthTest timeout value
KFDPerformanceTest.P2PBandWidthTest[push, push] takes about 3 seconds
on 4 gfx906 GPUs, so the default g_TestTimeout of 2 seconds is not long
enough to wait until the sDMA queue rptr is consumed. With the kfdtest
command line option --timeout=6000 the test finishes, and the result is
reasonable: about twice that of P2PBandWidthTest[push, none]. Change the
P2PBandWidthTest wait timeout to 6 seconds.

Add a timeout argument to the functions WaitOnValue,
BaseQueue.Wait4PacketConsumption, SDMAQueue.Wait4PacketConsumption, and
PM4Queue.Wait4PacketConsumption, with g_TestTimeOut as the default value.

Change-Id: I0aa04d644339feaeea695e41647ae66568beab9e
Signed-off-by: Philip Yang <Philip.Yang@amd.com>


[ROCm/ROCR-Runtime commit: b2e026fce3]
2019-01-04 12:53:55 -05:00
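The timeout change described in commit 4d5fb9f80e can be pictured roughly as follows. This is a minimal C++ sketch, not the kfdtest source: the WaitOnValue signature, the polling loop, and the use of std::atomic are assumptions made for illustration. Only the idea of a timeout parameter that defaults to g_TestTimeOut comes from the commit message.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical stand-in for kfdtest's global default timeout, in milliseconds.
static unsigned int g_TestTimeOut = 2000;

// Polls *addr until it equals `expected` or until `timeoutMs` has elapsed.
// Returns true if the value was observed before the deadline.
// The signature is illustrative, not the actual kfdtest declaration.
bool WaitOnValue(const std::atomic<unsigned int>* addr, unsigned int expected,
                 unsigned int timeoutMs = g_TestTimeOut) {
    assert(addr != nullptr);
    const auto deadline = std::chrono::steady_clock::now() +
                          std::chrono::milliseconds(timeoutMs);
    while (addr->load(std::memory_order_acquire) != expected) {
        if (std::chrono::steady_clock::now() >= deadline) {
            return false;  // timed out; the caller decides how to report it
        }
        std::this_thread::sleep_for(std::chrono::microseconds(100));
    }
    return true;
}
```

With a signature like this, a slow test could pass 6000 explicitly while every other caller keeps the default.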
Kent Russell 0cf61d242b Add lib requirement in CMake file
Adding it to DEBIAN/control won't work, since we use CMake to build
it. Add all required packages to the CMakeLists file instead.

Change-Id: Iaf62f42e0f998d66038338fb2cf793d29c790205


[ROCm/ROCR-Runtime commit: 666f90440a]
2019-01-02 07:50:12 -05:00
Yong Zhao 5f38525112 Add -fPIC flag when building sp3 library
This allows an sp3 library built with one gcc version to be
compatible with another gcc version.

Change-Id: If67714bd63376dc781c56ed025be335fe54b2ba5
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>


[ROCm/ROCR-Runtime commit: 81b8815e1a]
2018-12-13 18:32:23 -05:00
Eric Huang 616392b642 libhsakmt: add RAS support v2
The RAS feature enable bit and error reporting are implemented in the
existing topology and event mechanisms.

v2: change library interface.

Change-Id: I75807c080b5b26e8115240b05b3d7016cb05a31a
Signed-off-by: Eric Huang <JinhuiEric.Huang@amd.com>


[ROCm/ROCR-Runtime commit: 8ee93b3187]
2018-12-13 10:17:12 -05:00
Kent Russell c0fa8baec2 kfdtest: Add gfx900/gfx906 IDs to run_kfdtest.sh
Change-Id: Ib6ee418a432d1de79e2306b54d702132de3d06c5


[ROCm/ROCR-Runtime commit: bcc348e3b9]
2018-12-12 08:38:01 -05:00
Kent Russell 1e7469c682 libhsakmt: Add new gfx900 and gfx906 GPU IDs
Change-Id: I93b2b845c3edb2da55235a56516a851145745988


[ROCm/ROCR-Runtime commit: 53439669d9]
2018-12-12 08:36:40 -05:00
Eric Huang 58c2f26d25 Revert "libhsakmt: add RAS support"
This reverts commit 56b9bb17a7.

Change-Id: I739b17e057f2a8a0f4375741955209d2477c704a
Signed-off-by: Eric Huang <JinhuiEric.Huang@amd.com>


[ROCm/ROCR-Runtime commit: 29d11d02e8]
2018-12-08 19:42:33 -05:00
Kent Russell 34e6346848 Add more SDMA-related tests to SDMA_BLACKLIST
These tests all make use of an SDMAQueue in one way or another, so add
them to the SDMA_BLACKLIST to be 100% certain

Change-Id: Ic29e073c2f46249f3e5918145b13d276aec7bb33


[ROCm/ROCR-Runtime commit: 54807526b9]
2018-12-06 14:07:50 -05:00
Kent Russell 931dd817fa Add ZeroInitializationVram test to SDMA blacklist
This test uses SDMA, so add it to the SDMA list

Change-Id: I2dc2b0c4328e38e593d455de2103ebe1ef0adbc2


[ROCm/ROCR-Runtime commit: aa7c13264a]
2018-12-06 11:14:26 -05:00
Kent Russell 4a068e18dd Temporarily remove SDMA tests from gfx906
SDMA is being flaky, so remove SDMA tests from it for now

Change-Id: Ia3612566813f925804ab90d6235520da7cc65926


[ROCm/ROCR-Runtime commit: 3a2ec0111e]
2018-12-05 08:41:16 -05:00
Kent Russell 18f1cc0e5b Remove SDMAConcurrentCopies from gfx906 execution
This is intermittently causing VM faults and excessive evictions, which
causes the rest of the tests to fail. Take it out for now until someone
can investigate

Change-Id: I9c43890bc9f03a4a31efbc18df0df5e40a232c58


[ROCm/ROCR-Runtime commit: 381dba3932]
2018-11-28 10:01:35 -05:00
Eric Huang 56b9bb17a7 libhsakmt: add RAS support
The RAS feature enable bit and error reporting are implemented in the
existing topology and event mechanisms.

Change-Id: I9b018bba80cf4a6998e42a7bff64318c689b1d2a
Signed-off-by: Eric Huang <JinhuiEric.Huang@amd.com>


[ROCm/ROCR-Runtime commit: 1fbe010354]
2018-11-23 11:42:34 -05:00
changzhu 6ab4bbe2a8 kfdtest: fix SDMACopyParams build error on redhat 7.2 in KFDTestUtilQueue.cpp
In file included from /usr/include/c++/4.8.2/algorithm:62:0,
                 from /home/jenkins/libhsakmt/tests/kfdtest/src/KFDTestUtilQueue.cpp:24:
/usr/include/c++/4.8.2/bits/stl_algo.h: In instantiation of ‘_RandomAccessIterator std::__unguarded_partition(_RandomAccessIterator, _RandomAccessIterator, const _Tp&, _Compare) [with _RandomAccessIterator = __gnu_cxx::__normal_iterator<SDMACopyParams*, std::vector<SDMACopyParams> >; _Tp = SDMACopyParams; _Compare = bool (*)(SDMACopyParams&, SDMACopyParams&)]’:
/usr/include/c++/4.8.2/bits/stl_algo.h:2296:78:   required from ‘_RandomAccessIterator std::__unguarded_partition_pivot(_RandomAccessIterator, _RandomAccessIterator, _Compare) [with _RandomAccessIterator = __gnu_cxx::__normal_iterator<SDMACopyParams*, std::vector<SDMACopyParams> >; _Compare = bool (*)(SDMACopyParams&, SDMACopyParams&)]’
/usr/include/c++/4.8.2/bits/stl_algo.h:2337:62:   required from ‘void std::__introsort_loop(_RandomAccessIterator, _RandomAccessIterator, _Size, _Compare) [with _RandomAccessIterator = __gnu_cxx::__normal_iterator<SDMACopyParams*, std::vector<SDMACopyParams> >; _Size = long int; _Compare = bool (*)(SDMACopyParams&, SDMACopyParams&)]’
/usr/include/c++/4.8.2/bits/stl_algo.h:5499:44:   required from ‘void std::sort(_RAIter, _RAIter, _Compare) [with _RAIter = __gnu_cxx::__normal_iterator<SDMACopyParams*, std::vector<SDMACopyParams> >; _Compare = bool (*)(SDMACopyParams&, SDMACopyParams&)]’
/home/jenkins/libhsakmt/tests/kfdtest/src/KFDTestUtilQueue.cpp:351:66:   required from here
/usr/include/c++/4.8.2/bits/stl_algo.h:2263:35: error: invalid initialization of reference of type ‘SDMACopyParams&’ from expression of type ‘const SDMACopyParams’
    while (__comp(*__first, __pivot))
                                   ^
/usr/include/c++/4.8.2/bits/stl_algo.h:2266:34: error: invalid initialization of reference of type ‘SDMACopyParams&’ from expression of type ‘const SDMACopyParams’
    while (__comp(__pivot, *__last))
                                  ^

Change-Id: I0fce0c7e6d0a0ce93b1e6522ee8f216615765568
Signed-off-by: changzhu <Changfeng.Zhu@amd.com>


[ROCm/ROCR-Runtime commit: c15cf2e9c3]
2018-11-21 17:23:03 +08:00
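The build log in commit 6ab4bbe2a8 shows std::sort being instantiated with a comparator of type bool (*)(SDMACopyParams&, SDMACopyParams&), which GCC 4.8.2 rejects because the pivot is passed as a const object. The commit text does not show the actual fix, so the following is only a sketch of the usual remedy, a comparator that takes const references. CopyParams and its fields are hypothetical stand-ins for SDMACopyParams.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, reduced stand-in for SDMACopyParams.
struct CopyParams {
    int node;
    int group;
};

// Taking const references lets the standard library call the comparator
// with const-qualified elements, such as the pivot in the log above.
static bool ByNodeThenGroup(const CopyParams& a, const CopyParams& b) {
    if (a.node != b.node) {
        return a.node < b.node;
    }
    return a.group < b.group;
}

void SortCopyParams(std::vector<CopyParams>& params) {
    std::sort(params.begin(), params.end(), ByNodeThenGroup);
    assert(std::is_sorted(params.begin(), params.end(), ByNodeThenGroup));
}
```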
Oak Zeng 5d15953efb Add test to allocate SDMA queue on specific engine
Change-Id: I5b5140e4119fc01db250d63cca7389cf80ec0d16
Signed-off-by: Oak Zeng <ozeng@amd.com>


[ROCm/ROCR-Runtime commit: af5b320c47]
2018-11-20 11:17:43 -05:00
shaoyunl 0ad77ef647 KFDTest: fix failure when running KFDTest on multi-GPU small bar system
On a small bar multi-GPU system, hsaKmtMemoryMapToGPU will fail due to the latest
kernel P2P sanity check. Switch to hsaKmtMemoryMapToGPUNodes to fix
the failure.

Change-Id: Id8b6329d1243df0e908cc9a171b5c7f9156f4a8b
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>


[ROCm/ROCR-Runtime commit: d8009b4fd3]
2018-11-19 16:09:31 -05:00
shaoyunl a8af1e5e56 Thunk: make scratch memory only map to its own GPU
Map scratch memory only to the GPU that was specified when the memory was allocated.

Change-Id: I788f9ef0dccb63b894a75e75cac5f94a60d7ec48
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>


[ROCm/ROCR-Runtime commit: 29b45b8c0a]
2018-11-19 10:26:31 -05:00
Oak Zeng 6087fd7bca Create SDMA queue on specific engine
Change-Id: Iece03795510d66b03324174203faa0ac9eb4fb7d
Signed-off-by: Oak Zeng <ozeng@amd.com>


[ROCm/ROCR-Runtime commit: acb80d7583]
2018-11-13 14:52:57 -05:00
Oak Zeng 6c760dcb74 Move m_Type to a local variable
The BaseQueue class already has a member function GetQueueType, so m_Type
is redundant. m_Type is only used in one function, so move it to
a local variable.

Change-Id: Ice144cf723178dd628cb49261c23d10605f9ee7d
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>


[ROCm/ROCR-Runtime commit: 8d65e72045]
2018-11-13 14:52:17 -05:00
Oak Zeng f0eb1573e6 Create SDMA queue on specific engine
Change-Id: Id651ececda55b81b45e991bd8e6616674be48d8e
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>


[ROCm/ROCR-Runtime commit: 58b95e0a9d]
2018-11-13 14:52:17 -05:00
Oak Zeng b87f8459f4 Add more SDMA queue type
Those new types are used to create SDMA queue on specific engine

Change-Id: I91c3bcc14fef7404cf42b256a18651432e171091
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>


[ROCm/ROCR-Runtime commit: 5173e71810]
2018-11-13 14:52:01 -05:00
Oak Zeng 49dbd130f5 Use latest kfd_ioctl.h file
Change-Id: Icd7da4a305581c6857e17d59fbd0c3bd5101df3b
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>


[ROCm/ROCR-Runtime commit: 055f7c9c2c]
2018-11-13 14:51:46 -05:00
Felix Kuehling 6819730ea3 libhsakmt: Distinguish EPERM and EACCES
EPERM means "operation not permitted" and is returned when CGroup
access checks fail. EACCES means "permission denied" and is returned
when the device file permission bits or access control list don't
allow access.

EPERM can fail silently, since we assume the administrator disabled
a device on purpose in the CGroup. EACCES should produce an error
message and an info message to check the device file permissions.

Change-Id: Iee4c5584c5fdc4e113c3d760dede6661097b4341
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>


[ROCm/ROCR-Runtime commit: 5e4e19d47b]
2018-11-12 17:06:18 -05:00
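The EPERM/EACCES distinction from commit 6819730ea3 can be illustrated with a short C++ sketch. The names OpenResult, ClassifyOpenError and TryOpenDevice are hypothetical and are not taken from libhsakmt. The only parts grounded in the commit message are the two errno values and the policy of staying silent on EPERM while reporting EACCES.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <fcntl.h>

// Hypothetical result type for this sketch.
enum class OpenResult { Ok, SkippedByCgroup, PermissionDenied, OtherError };

// Maps an errno value from a failed open() to the policy described above.
OpenResult ClassifyOpenError(int err) {
    switch (err) {
    case 0:      return OpenResult::Ok;
    case EPERM:  return OpenResult::SkippedByCgroup;   // cgroup check failed
    case EACCES: return OpenResult::PermissionDenied;  // file mode or ACL
    default:     return OpenResult::OtherError;
    }
}

// Tries to open a device file. On success stores the descriptor in *fdOut.
OpenResult TryOpenDevice(const char* path, int* fdOut) {
    assert(path != nullptr && fdOut != nullptr);
    const int fd = open(path, O_RDWR);
    if (fd >= 0) {
        *fdOut = fd;
        return OpenResult::Ok;
    }
    const int err = errno;  // save before any other call can overwrite it
    const OpenResult result = ClassifyOpenError(err);
    if (result == OpenResult::PermissionDenied) {
        std::fprintf(stderr, "Failed to open %s: %s\n", path, std::strerror(err));
        std::fprintf(stderr, "Check the permissions of %s\n", path);
    } else if (result == OpenResult::OtherError) {
        std::fprintf(stderr, "Failed to open %s: %s\n", path, std::strerror(err));
    }
    // EPERM is deliberately silent: the device is assumed to be disabled
    // on purpose by the administrator.
    return result;
}
```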
Mike Li b3fdcfe3b9 Changed scripts to include running kfdtest in docker container
Change-Id: I822ff4869610df6abad846542d7c290b7a5aae79


[ROCm/ROCR-Runtime commit: 3afce42b57]
2018-11-07 16:09:12 -05:00
Gang Ba 9147adc1d5 Add code to support packet capture and replay in the Thunk
This feature only supports dGPUs for now.

Change-Id: Ic766ec06892c955dd605ecc335a776335edc0df2
Signed-off-by: Gang Ba <gaba@amd.com>


[ROCm/ROCR-Runtime commit: c54c1dbdcb]
2018-10-31 16:53:46 -04:00
Harish Kasiviswanathan 278287f045 libhsakmt: Support device controller cgroup
The device whitelist controller cgroup allows tracking and enforcing open and
mknod restrictions on device files. Tasks should work only with the
/dev/dri/renderN devices that are whitelisted for their cgroup. It is not
an error condition if a certain node is not whitelisted.

Change-Id: I0b997423ccdc00aee98df5b6f04ed6794549604e
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>


[ROCm/ROCR-Runtime commit: c1994e28f0]
2018-10-30 11:31:53 -04:00
Kent Russell c3aacd8463 Specify requirement of NUMA libs for Thunk
Add the numa libs to the thunk specs for DEB/RPM, so we can remove the
manual installation requirement


Change-Id: I5aadcf581b64e9a20aee9c1e1204af4715d1e990


[ROCm/ROCR-Runtime commit: 10edccb912]
2018-10-25 07:37:07 -04:00
Philip Cox 84b9ffbbbd Fix Debug Thunk spec mismatch
Move debug trap support capabilities to their own
structure to fix thunk spec vs header mismatch.



Change-Id: I6694601bfa36097502c8ab932e082d7a4645d5b2
Signed-off-by: Philip Cox <Philip.Cox@amd.com>


[ROCm/ROCR-Runtime commit: 105edd4bb4]
2018-10-24 11:32:12 -04:00
xinhui pan 11106ed72f kfdtest: blacklist KFDQMTest.SdmaEventInterrupt
On gfx900+, the test sometimes times out due to a CP firmware bug.
Blacklist it until we address the root cause and have a fix.

Change-Id: Iff600a6f6dbd86c56e034f530484205520bced32
Signed-off-by: xinhui pan <xinhui.pan@amd.com>


[ROCm/ROCR-Runtime commit: 7a13bb4d66]
2018-10-19 15:29:54 -04:00
xinhui pan 4bf0f9f43c kfdtest: Add more debug information of sdma event interrupt test
We observe that this test fails on gfx900+. It looks like the SDMA packets are
sometimes not executed at all after we submit them.

Run it with timeout 2s on gfx900.
[ RUN      ] KFDQMTest.SdmaEventInterrupt
[----------] SDMACopyData FAIL! 1485262707170 VS 1485262747814
[----------] Event On Queue 1:0 Timeout, try to resubmit packets!
[----------] The timeout event is signaled!
[          ] Time Consumption (ns)
[          ] 1: 1859427148
[          ] 2: 680148
[          ] 3: 6370
[          ] 4: 5481
/home/pp/code/compute/libhsakmt/tests/kfdtest/src/KFDQMTest.cpp:1670: Failure
Value of: (ret)
  Actual: 31
Expected: HSAKMT_STATUS_SUCCESS
Which is: 0
[----------] SDMACopyData FAIL! 1485367669958 VS 1485367750022
[----------] Event On Queue 2:1 Timeout, try to resubmit packets!
[----------] The timeout event is signaled!
[          ] Time Consumption (ns)
[          ] 1: 1881615148
[          ] 2: 673629
[          ] 3: 6074
[          ] 4: 5481
/home/pp/code/compute/libhsakmt/tests/kfdtest/src/KFDQMTest.cpp:1670: Failure
Value of: (ret)
  Actual: 31
Expected: HSAKMT_STATUS_SUCCESS
Which is: 0
[----------] SDMACopyData FAIL! 1485427671250 VS 1485427751238
[----------] Event On Queue 2:1 Timeout, try to resubmit packets!
[----------] The timeout event is signaled!
[          ] Time Consumption (ns)
[          ] 1: 1881508777
[          ] 2: 741629
[          ] 3: 6074
[          ] 4: 5481
/home/pp/code/compute/libhsakmt/tests/kfdtest/src/KFDQMTest.cpp:1670: Failure
Value of: (ret)
  Actual: 31
Expected: HSAKMT_STATUS_SUCCESS
Which is: 0
[  FAILED  ] KFDQMTest.SdmaEventInterrupt (23675 ms)

Change-Id: I7c1b752537d89782570df20838bf976578614f75
Signed-off-by: xinhui pan <xinhui.pan@amd.com>


[ROCm/ROCR-Runtime commit: ab4610cff7]
2018-10-19 15:29:54 -04:00
Yong Zhao e3f00a21ad kfdtest: Clean up the indentations in PM4ReleaseMemoryPacket::InitPacket()
Change-Id: I7f6b08697f6a68bf8c4a388c9f1cf3c3c8e6c81f
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>


[ROCm/ROCR-Runtime commit: d7e6d4706c]
2018-10-17 14:28:15 -04:00
Yong Zhao 569bdf3c84 kfdtest: Improve the SignalEvent test
Create an extra event so that the event id to test is non zero. That
way we can be sure the context id received in kernel ISR is non zero, which
is different from the default value 0 when context id is not set at all.

Change-Id: I7e261d1bbb783d5afd15558c7ac00493b1218cef
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>


[ROCm/ROCR-Runtime commit: 77bab8596f]
2018-10-17 14:27:54 -04:00
Gang Ba 197f731fbc drm/amdkfd: Added gfx904 and gfx803 for KFD.
Change-Id: I4406dc70c776926feaecca3f2146d65259a80517
Signed-off-by: Gang Ba <gaba@amd.com>


[ROCm/ROCR-Runtime commit: 52ec7f805e]
2018-09-25 08:17:44 -04:00
Mike Li 7cd87a5590 all_gpu_id_array: Handle GPU resource management
GPU resource management can disable some of the GPU nodes.
The kernel driver may not be aware of this.
Get information about all the nodes from the kernel driver and then filter it.

Change-Id: I4eeb126a5efce2192c35f5d2b72be1811e9ded32
Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>


[ROCm/ROCR-Runtime commit: 3144a84b9a]
2018-09-24 11:38:11 -04:00
Mike Li 150eaea0af kfdtest: Handle GPU resource management
Currently the FindDRMRenderNode function accesses sysfs
directly to find the render node. That doesn't work with the
GPU resource management changes. Change the code to call hsaKmtGetNodeProperties
instead.

Change-Id: I3bb537a323bc1e8c49f38d8aabc60c13e268aecd
Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>


[ROCm/ROCR-Runtime commit: c3b47c0959]
2018-09-24 11:38:11 -04:00
Mike Li 3feaa41dd7 Output an error message only when open_drm_render_device fails unexpectedly.
Change-Id: I5b9587a8d5c7a900e9ab8611a25d0c49d34b4cef
Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>


[ROCm/ROCR-Runtime commit: f9bd960344]
2018-09-24 11:36:11 -04:00
xinhui pan 5b7d3a16c5 kfdtest: add P2POverheadTest
This measures the latency + overhead of SDMA packet
consumption on P2P.
It is similar to the QueueLatency test. In addition, it shows the queue's
overhead under different workloads in more detail.

test result on two gfx900.
[ RUN      ] KFDPerformanceTest.P2POverheadTest
[          ] Test (avg. ns) | Size	4	8	16	64	256	1024
[          ] -----------------------------------------------------------------------
[          ] [push]     [1 -> 0]	333	148	185	111	148	148
[          ] [push]     [1 -> 1]	370	222	333	74	148	111
[          ] [push]     [1 -> 2]	333	148	148	148	148	148
[          ] [push]     [2 -> 0]	111	333	259	148	148	148
[          ] [push]     [2 -> 1]	222	148	185	148	148	148
[          ] [push]     [2 -> 2]	222	111	370	111	74	148
[          ] [pull]     [1 -> 0]	370	296	296	148	185	148
[          ] [pull]     [1 -> 1]	185	333	222	148	222	148
[          ] [pull]     [1 -> 2]	222	444	259	148	185	111
[          ] [pull]     [2 -> 0]	148	148	148	148	148	148
[          ] [pull]     [2 -> 1]	148	148	148	148	148	148
[          ] [pull]     [2 -> 2]	185	148	148	74	222	296
[          ] [push|pull][1 -> 0]	1259	1222	1259	1074	1037	962
[          ] [push|pull][1 -> 1]	1037	1037	1037	740	740	1000
[          ] [push|pull][1 -> 2]	1259	1259	1296	1037	1000	1074
[          ] [push|pull][2 -> 0]	1037	1037	1037	1074	1037	1148
[          ] [push|pull][2 -> 1]	1037	1037	1037	1037	925	1074
[          ] [push|pull][2 -> 2]	666	666	740	740	703	925
[       OK ] KFDPerformanceTest.P2POverheadTest (459 ms)

Change-Id: I422263cb52f7ce184f6f1ff4466d04c239fbe9c9
Signed-off-by: xinhui pan <xinhui.pan@amd.com>


[ROCm/ROCR-Runtime commit: 918a45a430]
2018-09-24 09:28:00 -04:00
Harish Kasiviswanathan f709e5f94d Topology: Use processors available to the process
The existing call sysconf (_SC_NPROCESSORS_ONLN) provides the number of
processors available to the scheduler. When a KFD process is run under a
container environment, only a subset (cpuset) of processors are
available to the current process.

For getting CPU cache information use sched_getaffinity() to get the
number of processors available to the current process.

Change-Id: Ieac02f1f61c17e24ac34ba502968c69d3bc631cb
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>


[ROCm/ROCR-Runtime commit: fb79a0efe2]
2018-09-21 10:31:59 -04:00
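The approach described in commit f709e5f94d can be sketched in C++ as below. The function name NumProcessorsAvailable and the fallback to sysconf() are assumptions for illustration and are not taken from the libhsakmt topology code. The CPU_ZERO and CPU_COUNT macros rely on _GNU_SOURCE, which g++ defines by default on Linux.

```cpp
#include <cassert>
#include <sched.h>
#include <unistd.h>

// Returns the number of CPUs the calling process is allowed to run on.
// Inside a container restricted by a cpuset this can be smaller than
// the number of online CPUs reported by sysconf().
int NumProcessorsAvailable() {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) == 0) {
        const int count = CPU_COUNT(&set);
        assert(count > 0);  // the caller itself is running on some CPU
        return count;
    }
    // Fallback (an assumption of this sketch): use the online CPU count
    // if the affinity mask cannot be read.
    const long online = sysconf(_SC_NPROCESSORS_ONLN);
    return online > 0 ? static_cast<int>(online) : 1;
}
```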
xinhui pan c61fffa876 kfdtest: Add P2P bandwidth test
The test measures the bandwidth between GPUs. Currently we do not
take NUMA topology into account, as some products do support P2P across
PCI-e root complexes.

test result on two gfx900 system.
[ RUN      ] KFDPerformanceTest.P2PBandWidthTest
[          ] Copy from node to node by [push, NONE]
[          ] [1 -> 0] 6.13477 - 6.12695 GB/s
[          ] [1 -> 2] 3.77734 - 3.76855 GB/s
[          ] [2 -> 0] 6.67676 - 6.6543 GB/s
[          ] [2 -> 1] 6.14453 - 6.12793 GB/s
[          ] Copy from node to node by [pull, NONE]
[          ] [1 -> 0] 6.10547 - 6.08105 GB/s
[          ] [1 -> 2] 9.65527 - 9.65039 GB/s
[          ] [2 -> 0] 6.49805 - 6.4873 GB/s
[          ] [2 -> 1] 8.95508 - 8.85254 GB/s
[          ] Full duplex copy from node to node by [push|pull, NONE]
[          ] [1 -> 0] 11.0986 - 11.0986 GB/s
[          ] [1 -> 2] 7.54297 - 7.54297 GB/s
[          ] [2 -> 0] 12.0264 - 11.9639 GB/s
[          ] [2 -> 1] 12.0469 - 12.0371 GB/s
[          ] Full duplex copy from node to node by [push, push]
[          ] [1 <-> 2] 11.7324 - 11.4541 GB/s
[          ] Full duplex copy from node to node by [pull, pull]
[          ] [1 <-> 2] 11.4824 - 11.0508 GB/s
[          ] Copy from node to multiple nodes by [push, NONE]
[          ] [1 -> [0...2]] 5.625 - 5.73633 GB/s
[          ] [2 -> [0...2]] 6.45801 - 6.4707 GB/s
[          ] Copy from multiple nodes to node by [push, NONE]
[          ] [[1...2] -> 0] 12.8379 - 12.2578 GB/s

Now we can get more timestamp info like below.

Copy from node to node by [push, NONE]
[1 -> 0]
[1 : 0] #-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-##-#-#-#-#-###############################
[1 : 1] ####################################################################################################
[1 -> 2]
[1 : 0] #--#-#-#-#-#--#-#-#-#-#--#-#-#-#-#--#-#-#-#-#-#--#-#-#-#-#--#-#-#-#-#--#-#-#-#-#-#--#-#-#-#-#--#-#-######################################
[1 : 1] ##################################################################################################-#
[2 -> 0]
[2 : 0] ##-###-##-###-###-##-###-##-###-###-##-###-###-##-###-###-##-###-##-###-###-##-###-###-##-###-###-#################
[2 : 1] ###############################################################################-#############-###-##
[2 -> 1]
[2 : 0] ##-##-##-##-##-###-##-##-##-##-##-###-##-##-##-##-###-##-##-##-##-##-###-##-##-##-##-###-##-##-##-####################
[2 : 1] ################################################################################-###-############-##

[snip]

Full duplex copy from node to node by [push, push]
[1 <-> 2]
[1 : 0] #-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-#-#-#-####################################
[1 : 1] ################-###################################################-############-####-#############
[2 : 2] #-##-##-##-#-##-##-##-##-#-##-##-##-##-#-##-##-##-##-#-##-##-##-##-#-##-##-##-##-##-#-##-##-##-##-#-##-##-##-##-##-#-##################
[2 : 3] #####-######-#####-######-#####-######-#####-######-#####-######-#####-######-#####-######-#####-######-#####-#####-##
Full duplex copy from node to node by [pull, pull]
[1 <-> 2]
[1 : 0] ######################################################################-##-#-###############-####-###
[1 : 1] #-#-#-##-#-#-##-#-#-##-#-#-##-#-#-##-#-#-##-#-#-##-#-#-#-##-#-#-##-#-#-##-#-#-##-#-#-##-#-#-##-#-#-############################
[2 : 2] ##-##-##-##-###-##-##-##-##-###-##-##-##-###-##-##-##-##-###-##-##-##-###-##-##-##-##-###-##-##-##-##-###-##-##-##-###-##-##-############
[2 : 3] #-#-#-#-#-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#########-#############
Copy from node to multiple nodes by [push, NONE]
[1 -> [0...2]]
[1 : 0] #-#--#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-###############################
[1 : 1] ########################################################################################-###-###-###
[2 -> [0...2]]
[2 : 0] ##-##-##-###-##-###-##-##-###-##-###-##-##-###-##-###-##-###-##-##-###-##-###-##-##-###-##-###-##-##################
[2 : 1] -################################################################################################-##
Copy from multiple nodes to node by [push, NONE]
[[1...2] -> 0]
[1 : 0] #-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-##-#-#-#-###############################
[1 : 1] ################################################################################################-#-#
[2 : 2] ##-##-##-###-##-##-###-##-##-##-###-##-##-###-##-##-###-##-##-###-##-##-##-###-##-##-###-##-##-###-##-##################
[2 : 3] #########################-#########################-#########################-#########################
[       OK ] KFDPerformanceTest.P2PBandWidthTest (15982 ms)

Change-Id: Ia90044191d51650ccb220476d31fb317aa3ad6ce
Signed-off-by: xinhui pan <xinhui.pan@amd.com>


[ROCm/ROCR-Runtime commit: e5a541eaf2]
2018-09-19 12:03:05 +08:00
xinhui pan 6b357be502 kfdtest: add KFDTestUtilQueue
Add some infrastructure, described below.
Implement SdmaTimePacket, which records the global GPU timestamp.

Introduce the classes AsyncMPSQ and AsyncMPMQ.
AsyncMPSQ stands for async multiple packet single queue. It takes a set of
packets at creation and submits them to a GPU to run. AsyncMPMQ stands for
async multiple packet multiple queue. It manages a set of AsyncMPSQ objects
and uses a for loop to perform operations on each AsyncMPSQ.

Implement sdma_multicopy helper functions.

Change-Id: I47e1d2ca9630113b2a1d85a0055f3f8ee629fb5f
Signed-off-by: xinhui pan <xinhui.pan@amd.com>


[ROCm/ROCR-Runtime commit: f618b3f075]
2018-09-19 12:03:05 +08:00
Xiaojie Yuan ca0873a234 Use 'RecordProperty' to record performance scores
For following test cases:
- KFDQMTest.QueueLatency
- KFDQMTest.BasicCuMaskingLinear
- KFDQMTest.BasicCuMaskingEven
- KFDMemoryTest.MMBandWidth
- KFDMemoryTest.MMapLarge
- KFDMemoryTest.MMBench

v2: xml element cannot start with a number, so change the key name of
    MMBandWidth and MMBench accordingly
    xml element cannot contain whitespaces, so trim whitespaces in "VRAM  "
v3: introduce KFDLog-like way to use KFDRecord

Change-Id: Ifc3ed5657621252a7b39dccf1ef4f50a92593f77
Signed-off-by: Xiaojie Yuan <xiaojie.yuan@amd.com>


[ROCm/ROCR-Runtime commit: 247fa9f1e0]
2018-09-18 17:41:14 +08:00
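The v2 note in commit ca0873a234 mentions two constraints on the recorded key names: an XML name cannot start with a digit and cannot contain whitespace. The commit changed the affected keys by hand. The helper below is only a hypothetical C++ sketch of how such keys could be normalized automatically. MakeXmlKey and the choice of an underscore prefix are assumptions, not part of kfdtest.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: turns an arbitrary label into a key that satisfies
// the two constraints named in the commit message.
std::string MakeXmlKey(const std::string& raw) {
    std::string key;
    for (unsigned char c : raw) {
        if (!std::isspace(c)) {
            key.push_back(static_cast<char>(c));  // drop all whitespace
        }
    }
    if (!key.empty() && std::isdigit(static_cast<unsigned char>(key[0]))) {
        key.insert(key.begin(), '_');  // '_' is a valid XML name start character
    }
    assert(key.empty() || !std::isdigit(static_cast<unsigned char>(key[0])));
    return key;
}
```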
xinhui pan 175bd1ed3d kfdtest: Do not set GTEST_FLAG throw_on_failure
This change originally came from commit a505c9bb ("kfdtest: Do not set
GTEST_FLAG throw_on_failure"), but it was unintentionally reverted by
commit b86f1456 ("kfdtest: Clean up comments"). Add the change back.

Fix: b86f1456

Change-Id: Ia9e99c9ca17b99aab62b4db55017018ddae43dfb
Signed-off-by: xinhui pan <xinhui.pan@amd.com>


[ROCm/ROCR-Runtime commit: a6287ba919]
2018-09-11 10:25:56 +08:00
xinhui pan 501d3878ae kfdtest: Fix queuelatency fail issue
The timestamp written by releaseMemory packet might still not be visible
when we fetch it.
To fix this bug, use event-based wait.

Change-Id: If2324eb3b3a632c711ee4dff4d03a93d5306c289
Signed-off-by: xinhui pan <xinhui.pan@amd.com>


[ROCm/ROCR-Runtime commit: 07bd97a864]
2018-09-10 21:17:29 -04:00
Felix Kuehling c08dca02d7 libhsakmt: Fix segfault on gfx801
Handle the case that svm.dgpu_aperture does not exist in vm_find_object.

Change-Id: Ic0983d4f321f1b6248514f2fa25162976e90bd75
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>


[ROCm/ROCR-Runtime commit: be574169c1]
2018-09-10 14:39:05 -04:00
Harish Kasiviswanathan af0eadcee6 kfdtest: GetNodeIoLinkProperties: Display NodeFrom
Use the NodeFrom returned by hsaKmtGetNodeIoLinkProperties() to check
its correctness.

Change-Id: I6ce436dc7c5d5b192bee21156292bd3eff77f916
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>


[ROCm/ROCR-Runtime commit: 1fda429726]
2018-09-10 09:44:24 -04:00
Harish Kasiviswanathan a0cee77f82 Add cgroup support
Some nodes are unavailable based on the task's cgroup hierarchy. Handle
this situation by ignoring those nodes

Change-Id: I72f9e822d2ec8cf15732df95e427d5549a75b55d
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>


[ROCm/ROCR-Runtime commit: 7876bb70a9]
2018-09-06 16:56:32 -04:00
Harish Kasiviswanathan ef0dad6679 iolinks: Handle GPU resource management
With GPU resource management, some nodes are unavailable based on the
cgroup hierarchy of the task. The kernel specifies all the iolinks via
sysfs. Skip the links that are not accessible.

Also, the iolinks specified by the kernel refer to sysfs Node IDs. Map them
to the relevant user Node IDs.

v2: NodeFrom mapped from sysfs Node to User Node

Change-Id: I95312ee6ca51b89fe9e6ca2a9185c2ea1e94afc4
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>


[ROCm/ROCR-Runtime commit: 866ef20054]
2018-09-06 16:56:07 -04:00
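The two steps described in commit ef0dad6679, skipping inaccessible links and translating sysfs Node IDs to user Node IDs, can be sketched in C++ as follows. The IoLink struct, the MapIoLinks function, and the assumption that user Node IDs are simply the positions of the accessible nodes in a list are all hypothetical. The real libhsakmt data structures are not shown in the commit message.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical, reduced link record: only the two endpoints.
struct IoLink {
    uint32_t nodeFrom;
    uint32_t nodeTo;
};

// accessibleSysfsIds lists the sysfs Node IDs the process may use.
// In this sketch the user Node ID of a node is its index in that list.
std::vector<IoLink> MapIoLinks(const std::vector<uint32_t>& accessibleSysfsIds,
                               const std::vector<IoLink>& sysfsLinks) {
    std::unordered_map<uint32_t, uint32_t> sysfsToUser;
    for (uint32_t i = 0; i < accessibleSysfsIds.size(); ++i) {
        sysfsToUser[accessibleSysfsIds[i]] = i;
    }
    assert(sysfsToUser.size() == accessibleSysfsIds.size());  // IDs must be unique

    std::vector<IoLink> userLinks;
    for (const IoLink& link : sysfsLinks) {
        const auto from = sysfsToUser.find(link.nodeFrom);
        const auto to = sysfsToUser.find(link.nodeTo);
        if (from == sysfsToUser.end() || to == sysfsToUser.end()) {
            continue;  // an endpoint is not accessible: skip the link
        }
        userLinks.push_back({from->second, to->second});
    }
    return userLinks;
}
```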
Harish Kasiviswanathan b3329ec72d Replace global variable _system with g_system
Change-Id: I452090473a5b46b32204f7f916bdcfdd3e8a47bd
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>


[ROCm/ROCR-Runtime commit: f84a99e953]
2018-09-06 16:56:07 -04:00