Commit graph

2930 commits

Author SHA1 Message Date
Kent Russell 4a068e18dd Temporarily remove SDMA tests from gfx906
SDMA is being flaky, so remove SDMA tests from it for now

Change-Id: Ia3612566813f925804ab90d6235520da7cc65926


[ROCm/ROCR-Runtime commit: 3a2ec0111e]
2018-12-05 08:41:16 -05:00
Kent Russell 18f1cc0e5b Remove SDMAConcurrentCopies from gfx906 execution
This is intermittently causing VM faults and excessive evictions, which
causes the rest of the tests to fail. Take it out for now until someone
can investigate

Change-Id: I9c43890bc9f03a4a31efbc18df0df5e40a232c58


[ROCm/ROCR-Runtime commit: 381dba3932]
2018-11-28 10:01:35 -05:00
Eric Huang 56b9bb17a7 libhsakmt: add RAS support
The RAS feature-enable bit and error reporting are implemented in the
existing topology and event mechanisms.

Change-Id: I9b018bba80cf4a6998e42a7bff64318c689b1d2a
Signed-off-by: Eric Huang <JinhuiEric.Huang@amd.com>


[ROCm/ROCR-Runtime commit: 1fbe010354]
2018-11-23 11:42:34 -05:00
Ramesh Errabolu efc2ac9024 Initialize queue buffer with Invalid Pkt Headers
Change-Id: I4166f1359746ee6829b730bac2db358af72ab16e


[ROCm/ROCR-Runtime commit: 28c3f9a269]
2018-11-21 19:09:10 -05:00
Mark Searles 508124a012 Force object code v2 until v3 is supported
Change-Id: I4c2a64bf9bd515686d1f1d90aece2a9ac40e5685


[ROCm/ROCR-Runtime commit: 8ea836017a]
2018-11-21 10:06:08 -08:00
changzhu 6ab4bbe2a8 kfdtest: fix SDMACopyParams build error on redhat 7.2 in KFDTestUtilQueue.cpp
In file included from /usr/include/c++/4.8.2/algorithm:62:0,
                 from /home/jenkins/libhsakmt/tests/kfdtest/src/KFDTestUtilQueue.cpp:24:
/usr/include/c++/4.8.2/bits/stl_algo.h: In instantiation of ‘_RandomAccessIterator std::__unguarded_partition(_RandomAccessIterator, _RandomAccessIterator, const _Tp&, _Compare) [with _RandomAccessIterator = __gnu_cxx::__normal_iterator<SDMACopyParams*, std::vector<SDMACopyParams> >; _Tp = SDMACopyParams; _Compare = bool (*)(SDMACopyParams&, SDMACopyParams&)]’:
/usr/include/c++/4.8.2/bits/stl_algo.h:2296:78:   required from ‘_RandomAccessIterator std::__unguarded_partition_pivot(_RandomAccessIterator, _RandomAccessIterator, _Compare) [with _RandomAccessIterator = __gnu_cxx::__normal_iterator<SDMACopyParams*, std::vector<SDMACopyParams> >; _Compare = bool (*)(SDMACopyParams&, SDMACopyParams&)]’
/usr/include/c++/4.8.2/bits/stl_algo.h:2337:62:   required from ‘void std::__introsort_loop(_RandomAccessIterator, _RandomAccessIterator, _Size, _Compare) [with _RandomAccessIterator = __gnu_cxx::__normal_iterator<SDMACopyParams*, std::vector<SDMACopyParams> >; _Size = long int; _Compare = bool (*)(SDMACopyParams&, SDMACopyParams&)]’
/usr/include/c++/4.8.2/bits/stl_algo.h:5499:44:   required from ‘void std::sort(_RAIter, _RAIter, _Compare) [with _RAIter = __gnu_cxx::__normal_iterator<SDMACopyParams*, std::vector<SDMACopyParams> >; _Compare = bool (*)(SDMACopyParams&, SDMACopyParams&)]’
/home/jenkins/libhsakmt/tests/kfdtest/src/KFDTestUtilQueue.cpp:351:66:   required from here
/usr/include/c++/4.8.2/bits/stl_algo.h:2263:35: error: invalid initialization of reference of type ‘SDMACopyParams&’ from expression of type ‘const SDMACopyParams’
    while (__comp(*__first, __pivot))
                                   ^
/usr/include/c++/4.8.2/bits/stl_algo.h:2266:34: error: invalid initialization of reference of type ‘SDMACopyParams&’ from expression of type ‘const SDMACopyParams’
    while (__comp(__pivot, *__last))
                                  ^

Change-Id: I0fce0c7e6d0a0ce93b1e6522ee8f216615765568
Signed-off-by: changzhu <Changfeng.Zhu@amd.com>


[ROCm/ROCR-Runtime commit: c15cf2e9c3]
2018-11-21 17:23:03 +08:00
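
The build error quoted in commit 6ab4bbe2a8 comes from handing std::sort a comparator of type bool (*)(SDMACopyParams&, SDMACopyParams&): libstdc++ 4.8.2 passes a const pivot object, which cannot bind to a non-const reference. A minimal sketch of one plausible fix follows. The commit message does not show the actual change, so this is an assumption, and CopyParams, BySize, and SortBySize are hypothetical stand-ins rather than names from kfdtest.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for SDMACopyParams; the real structure's fields
// are not shown in the commit message.
struct CopyParams {
    int node;
    unsigned long size;
};

// The comparator takes const references, so it can be called with the
// const pivot object that std::sort passes internally.
bool BySize(const CopyParams& a, const CopyParams& b) {
    return a.size < b.size;
}

// Sorts the vector in ascending order of size.
void SortBySize(std::vector<CopyParams>& params) {
    std::sort(params.begin(), params.end(), BySize);
}
```

Declaring the comparator parameters const also matches the standard's requirement that the comparison must not modify the objects it is given.
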
Oak Zeng 5d15953efb Add test to allocate SDMA queue on specific engine
Change-Id: I5b5140e4119fc01db250d63cca7389cf80ec0d16
Signed-off-by: Oak Zeng <ozeng@amd.com>


[ROCm/ROCR-Runtime commit: af5b320c47]
2018-11-20 11:17:43 -05:00
shaoyunl 0ad77ef647 KFDTest: fix failure when running KFDTest on multi-GPU small-BAR systems
On small-BAR multi-GPU systems, hsaKmtMemoryMapToGPU fails due to the latest
kernel P2P sanity check. Switch to hsaKmtMemoryMapToGPUNodes to fix
the failure.

Change-Id: Id8b6329d1243df0e908cc9a171b5c7f9156f4a8b
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>


[ROCm/ROCR-Runtime commit: d8009b4fd3]
2018-11-19 16:09:31 -05:00
shaoyunl a8af1e5e56 Thunk: make scratch memory only map to its own GPU
Map scratch memory only to the GPU that was specified when the memory was allocated.

Change-Id: I788f9ef0dccb63b894a75e75cac5f94a60d7ec48
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>


[ROCm/ROCR-Runtime commit: 29b45b8c0a]
2018-11-19 10:26:31 -05:00
Sean Keely d79cd9abf3 Check max wave scratch limits.
HW has limited bits for wave scratch base address stride.  Enforcement
prevents programs with larger than supported scratch allocations from
running and clobbering neighboring scratch space.

Change-Id: I574da888e9d1d5e290a9c0025ba13b5ef9f1e5c0


[ROCm/ROCR-Runtime commit: 8e4177382a]
2018-11-16 20:59:20 -05:00
Sean Keely d5c5f476fb Disable forced explicit selection of public vs internal HSA interfaces.
Temporary change to re-enable OCL builds on TC.

Change-Id: Ia81f2f9a9dd10ae8ce9627313247a586a8711584


[ROCm/ROCR-Runtime commit: 269be0be2e]
2018-11-16 15:26:26 -06:00
Konstantin Zhuravlyov fde14b8588 Fix dynamic relocations:
- Process a dynamic relocation even if there is
    no symbol associated with it.

Change-Id: Iaefee682ee52f5acda8280e5764e6d5fd992774a


[ROCm/ROCR-Runtime commit: a447d79430]
2018-11-14 15:25:41 -05:00
Oak Zeng 6087fd7bca Create SDMA queue on specific engine
Change-Id: Iece03795510d66b03324174203faa0ac9eb4fb7d
Signed-off-by: Oak Zeng <ozeng@amd.com>


[ROCm/ROCR-Runtime commit: acb80d7583]
2018-11-13 14:52:57 -05:00
Oak Zeng 6c760dcb74 Move m_Type to a local variable
BaseQueue class has a member function GetQueueType so m_Type
is duplicated.  m_Type is only used in one function. Move it to
a local variable.

Change-Id: Ice144cf723178dd628cb49261c23d10605f9ee7d
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>


[ROCm/ROCR-Runtime commit: 8d65e72045]
2018-11-13 14:52:17 -05:00
Oak Zeng f0eb1573e6 Create SDMA queue on specific engine
Change-Id: Id651ececda55b81b45e991bd8e6616674be48d8e
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>


[ROCm/ROCR-Runtime commit: 58b95e0a9d]
2018-11-13 14:52:17 -05:00
Oak Zeng b87f8459f4 Add more SDMA queue types
These new types are used to create an SDMA queue on a specific engine.

Change-Id: I91c3bcc14fef7404cf42b256a18651432e171091
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>


[ROCm/ROCR-Runtime commit: 5173e71810]
2018-11-13 14:52:01 -05:00
Oak Zeng 49dbd130f5 Use latest kfd_ioctl.h file
Change-Id: Icd7da4a305581c6857e17d59fbd0c3bd5101df3b
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>


[ROCm/ROCR-Runtime commit: 055f7c9c2c]
2018-11-13 14:51:46 -05:00
Sean Keely 799e40f3b9 Cache KFD Events used by user allocated InterruptSignals.
Change-Id: I7f102f880fea9c78febe28cd262f93ee77f03184


[ROCm/ROCR-Runtime commit: 4e8597681b]
2018-11-12 22:37:42 -06:00
Sean Keely ed18ee7f38 Add pooling for Signal ABI blocks (SharedSignal).
Makes better use of memory and greatly reduces mmap count.

Change-Id: Ib444cd1ccd144986adbcc7cec297a966e2c08bc7


[ROCm/ROCR-Runtime commit: 8323b2e1d7]
2018-11-12 22:37:28 -06:00
Felix Kuehling 6819730ea3 libhsakmt: Distinguish EPERM and EACCES
EPERM means "operation not permitted" and is returned when CGroup
access checks fail. EACCES means "permission denied" and is returned
when the device file permission bits or access control list don't
allow access.

EPERM can fail silently, since we assume the administrator disabled
a device on purpose in the CGroup. EACCES should produce an error
message and an info message to check the device file permissions.

Change-Id: Iee4c5584c5fdc4e113c3d760dede6661097b4341
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>


[ROCm/ROCR-Runtime commit: 5e4e19d47b]
2018-11-12 17:06:18 -05:00
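
The policy described in commit 6819730ea3 can be sketched as a small errno classifier. This is an illustration only, not the libhsakmt code: OpenFailure, ClassifyOpenError, and OpenDevice are hypothetical names, and the message wording is invented for the example.

```cpp
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <fcntl.h>

// Hypothetical categories for a failed open() on a device file.
enum class OpenFailure { kSilent, kReportPermissions, kReportOther };

// EPERM: assumed to be a deliberate cgroup restriction, so stay quiet.
// EACCES: file mode bits or ACLs deny access, so tell the user.
OpenFailure ClassifyOpenError(int err) {
    switch (err) {
        case EPERM:  return OpenFailure::kSilent;
        case EACCES: return OpenFailure::kReportPermissions;
        default:     return OpenFailure::kReportOther;
    }
}

// Opens a device node and reports failures according to the policy above.
// Returns the file descriptor, or -1 on failure.
int OpenDevice(const char* path) {
    const int fd = open(path, O_RDWR | O_CLOEXEC);
    if (fd >= 0) return fd;
    const int err = errno;  // save before any other call can overwrite it
    switch (ClassifyOpenError(err)) {
        case OpenFailure::kSilent:
            break;  // device disabled on purpose; not an error for the caller
        case OpenFailure::kReportPermissions:
            std::fprintf(stderr, "Failed to open %s: %s\n", path, std::strerror(err));
            std::fprintf(stderr, "Check the permissions of the device file.\n");
            break;
        case OpenFailure::kReportOther:
            std::fprintf(stderr, "Failed to open %s: %s\n", path, std::strerror(err));
            break;
    }
    return -1;
}
```

Keeping the classification separate from the open() call makes the policy easy to check without needing a restricted device node.
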
Sean Keely 9652ba6de2 Remove legacy SVM region concept.
Also rename blit_agent to region_gpu and add comments to clarify
its role in deprecated region API support rather than to do blits.

Change-Id: I80b1043db2e1c5d40a58fc801eef70a688ea9169


[ROCm/ROCR-Runtime commit: 936ecd1885]
2018-11-09 06:27:53 -06:00
Sean Keely 3f55198dd5 Move VM fault handler init to after all devices are registered.
During registration we must not call any function that depends on registered
data as the lists are not yet complete.  This includes signal allocation since
allocating shared GPU mapped memory depends on the list of GPUs.

Change-Id: I94d59e847802c546c2a5a0d9f55fe5ac3fd1d878


[ROCm/ROCR-Runtime commit: dda9c17b45]
2018-11-09 03:10:08 -06:00
Mike Li b3fdcfe3b9 Changed scripts to include running kfdtest in docker container
Change-Id: I822ff4869610df6abad846542d7c290b7a5aae79


[ROCm/ROCR-Runtime commit: 3afce42b57]
2018-11-07 16:09:12 -05:00
Gang Ba 9147adc1d5 Add code to support packet capture and replay in the Thunk
This feature only supports dGPU for now.

Change-Id: Ic766ec06892c955dd605ecc335a776335edc0df2
Signed-off-by: Gang Ba <gaba@amd.com>


[ROCm/ROCR-Runtime commit: c54c1dbdcb]
2018-10-31 16:53:46 -04:00
Harish Kasiviswanathan 278287f045 libhsakmt: Support device controller cgroup
The device whitelist controller cgroup tracks and enforces open and
mknod restrictions on device files. Tasks should work with the
/dev/dri/renderN devices that are whitelisted for their cgroup. If a
certain node is not whitelisted, it is not an error condition.

Change-Id: I0b997423ccdc00aee98df5b6f04ed6794549604e
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>


[ROCm/ROCR-Runtime commit: c1994e28f0]
2018-10-30 11:31:53 -04:00
Kent Russell c3aacd8463 Specify requirement of NUMA libs for Thunk
Add the numa libs to the thunk specs for DEB/RPM, so we can remove the
manual installation requirement


Change-Id: I5aadcf581b64e9a20aee9c1e1204af4715d1e990


[ROCm/ROCR-Runtime commit: 10edccb912]
2018-10-25 07:37:07 -04:00
Philip Cox 84b9ffbbbd Fix Debug Thunk spec mismatch
Move debug trap support capabilities to their own
structure to fix thunk spec vs header mismatch.



Change-Id: I6694601bfa36097502c8ab932e082d7a4645d5b2
Signed-off-by: Philip Cox <Philip.Cox@amd.com>


[ROCm/ROCR-Runtime commit: 105edd4bb4]
2018-10-24 11:32:12 -04:00
Sean Keely 37aead15c7 Ensure runtime cleanup when hsa_init ref count reaches 0.
Delete the runtime object when the last hsa_shut_down occurs.

Change-Id: I2005d52d06702eaef166714fd5e471cc277924db


[ROCm/ROCR-Runtime commit: 9ec37b5103]
2018-10-22 19:32:00 -05:00
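
The behavior described in commit 37aead15c7 is a reference-counted singleton: each init call increments a counter, each shutdown decrements it, and the object is destroyed when the counter returns to 0. A minimal sketch under those assumptions is shown here. Runtime, Acquire, Release, and IsLoaded are hypothetical names and do not reflect the actual ROCR-Runtime implementation.

```cpp
#include <memory>
#include <mutex>

// Hypothetical placeholder for the runtime object.
struct Runtime {};

namespace {
std::mutex g_lock;                   // guards the two variables below
int g_ref_count = 0;                 // number of outstanding init calls
std::unique_ptr<Runtime> g_runtime;  // live only while g_ref_count > 0
}  // namespace

// Counterpart of an init call: creates the runtime on the first reference.
void Acquire() {
    std::lock_guard<std::mutex> guard(g_lock);
    if (g_ref_count++ == 0) g_runtime.reset(new Runtime());
}

// Counterpart of a shutdown call: destroys the runtime when the last
// reference is dropped. Returns false if there was nothing to release.
bool Release() {
    std::lock_guard<std::mutex> guard(g_lock);
    if (g_ref_count == 0) return false;
    if (--g_ref_count == 0) g_runtime.reset();
    return true;
}

// Reports whether the runtime object currently exists.
bool IsLoaded() {
    std::lock_guard<std::mutex> guard(g_lock);
    return g_runtime != nullptr;
}
```

Two Acquire calls followed by two Release calls leave IsLoaded() false, and a third Release returns false instead of underflowing the counter.
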
Evgeny e53c7c63c0 aqlprofile extension version check
Change-Id: If824764f199eca15a0341cdf6177d8d6353e29f3


[ROCm/ROCR-Runtime commit: d788a53972]
2018-10-22 15:36:57 -04:00
Sean Keely b8de13150b Report internal queue creation to tools.
Debug agent requires handles to internal queues for single step debugging.
Added tools only API hsa_amd_runtime_queue_create_register for reporting.

hsa_amd_runtime_queue_create_register sets a callback which is invoked
when internal queues are created.

Change-Id: Ia5190ae724fadba686c15f25b2cd085350eeff0e


[ROCm/ROCR-Runtime commit: 757502ccd6]
2018-10-20 23:12:27 -04:00
Sean Keely 5aa7af4280 Fully initialize GPU agents before loading tools.
Required because the debug agent needs the copy API and trap handler to be
initialized prior to loading.  Existing tools do not make use of internal queue
or scratch memory intercept, which is what PostToolsInit allows.

PostToolsInit() will be removed in a following cleanup change.

Change-Id: If43377843808e3eff0defd9204910a67a852902f


[ROCm/ROCR-Runtime commit: 5975c465ad]
2018-10-20 23:12:14 -04:00
Sean Keely b0013a3e4d Refactor of Runtime::CopyMemory()
Change-Id: I32a7cb24d00660ff4471d121ef7b3c2eec8fced2


[ROCm/ROCR-Runtime commit: 6852282a07]
2018-10-20 14:38:50 -04:00
xinhui pan 11106ed72f kfdtest: blacklist KFDQMTest.SdmaEventInterrupt
On gfx900+, the test sometimes times out due to a CP firmware bug.
Blacklist it until we address the root cause and have a fix.

Change-Id: Iff600a6f6dbd86c56e034f530484205520bced32
Signed-off-by: xinhui pan <xinhui.pan@amd.com>


[ROCm/ROCR-Runtime commit: 7a13bb4d66]
2018-10-19 15:29:54 -04:00
xinhui pan 4bf0f9f43c kfdtest: Add more debug information of sdma event interrupt test
We observe that this test fails on gfx900+. It looks like the SDMA packets are
sometimes not executed at all after we submit them.

Run it with timeout 2s on gfx900.
[ RUN      ] KFDQMTest.SdmaEventInterrupt
[----------] SDMACopyData FAIL! 1485262707170 VS 1485262747814
[----------] Event On Queue 1:0 Timeout, try to resubmit packets!
[----------] The timeout event is signaled!
[          ] Time Consumption (ns)
[          ] 1: 1859427148
[          ] 2: 680148
[          ] 3: 6370
[          ] 4: 5481
/home/pp/code/compute/libhsakmt/tests/kfdtest/src/KFDQMTest.cpp:1670: Failure
Value of: (ret)
  Actual: 31
Expected: HSAKMT_STATUS_SUCCESS
Which is: 0
[----------] SDMACopyData FAIL! 1485367669958 VS 1485367750022
[----------] Event On Queue 2:1 Timeout, try to resubmit packets!
[----------] The timeout event is signaled!
[          ] Time Consumption (ns)
[          ] 1: 1881615148
[          ] 2: 673629
[          ] 3: 6074
[          ] 4: 5481
/home/pp/code/compute/libhsakmt/tests/kfdtest/src/KFDQMTest.cpp:1670: Failure
Value of: (ret)
  Actual: 31
Expected: HSAKMT_STATUS_SUCCESS
Which is: 0
[----------] SDMACopyData FAIL! 1485427671250 VS 1485427751238
[----------] Event On Queue 2:1 Timeout, try to resubmit packets!
[----------] The timeout event is signaled!
[          ] Time Consumption (ns)
[          ] 1: 1881508777
[          ] 2: 741629
[          ] 3: 6074
[          ] 4: 5481
/home/pp/code/compute/libhsakmt/tests/kfdtest/src/KFDQMTest.cpp:1670: Failure
Value of: (ret)
  Actual: 31
Expected: HSAKMT_STATUS_SUCCESS
Which is: 0
[  FAILED  ] KFDQMTest.SdmaEventInterrupt (23675 ms)

Change-Id: I7c1b752537d89782570df20838bf976578614f75
Signed-off-by: xinhui pan <xinhui.pan@amd.com>


[ROCm/ROCR-Runtime commit: ab4610cff7]
2018-10-19 15:29:54 -04:00
Yong Zhao e3f00a21ad kfdtest: Clean up the indentations in PM4ReleaseMemoryPacket::InitPacket()
Change-Id: I7f6b08697f6a68bf8c4a388c9f1cf3c3c8e6c81f
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>


[ROCm/ROCR-Runtime commit: d7e6d4706c]
2018-10-17 14:28:15 -04:00
Yong Zhao 569bdf3c84 kfdtest: Improve the SignalEvent test
Create an extra event so that the event id to test is non zero. That
way we can be sure the context id received in kernel ISR is non zero, which
is different from the default value 0 when context id is not set at all.

Change-Id: I7e261d1bbb783d5afd15558c7ac00493b1218cef
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>


[ROCm/ROCR-Runtime commit: 77bab8596f]
2018-10-17 14:27:54 -04:00
Konstantin Zhuravlyov 6d5b1f0bde Loader: Update license for AMDHSAKernelDescriptor.h
Change-Id: I3a48b595ba089ca8a25f878c056b04a417a2364f


[ROCm/ROCR-Runtime commit: 509bb777e0]
2018-10-12 14:51:05 -04:00
Sean Keely 5f454d102d Use ptrinfo rather than apertures in hsa_memory_copy
Apertures now overlap with the change to 48bit addressing which
precludes using aperture checks to discover buffer ownership.
Switches to ptrinfo to decide which device owns a buffer.

This corrects faults in the legacy hsa_memory_copy api.

Change-Id: I5c7ce0216e1cdc96f836fc6fec9c3defdf4b9d90


[ROCm/ROCR-Runtime commit: 1e0d690948]
2018-10-11 13:34:53 -04:00
Konstantin Zhuravlyov dd2ab28ddb Loader: Add support for v3 object code.
Change-Id: I7215bd0c1277c2036bf0fadf5b23cb57fdf7f665


[ROCm/ROCR-Runtime commit: 386874da55]
2018-10-06 14:01:59 -04:00
Jay Cornwall e2454b084b Revert "Extend SDMA disable list until firmware stability resolved"
This reverts commit 795fd231b0.

Change-Id: I17b379e4d0e49a79dc8d4a60f01ea424fda24f02


[ROCm/ROCR-Runtime commit: f1ffbc3286]
2018-10-05 15:17:27 -04:00
Kent Russell 61249bc910 Only remove ldconf on uninstall
On update, the removal will occur AFTER the new package is installed,
due to some stupidity with how yum/rpm does things. Only remove it if
we're doing a pure uninstall

Change-Id: I4982610828d8bc1f2d8691b1e4ee1718c89413cc


[ROCm/ROCR-Runtime commit: ed9baefd75]
2018-10-03 08:10:06 -04:00
Evgeny 54428e93aa hsa_ven_amd_aqlprofile_pfn_t alias
Change-Id: Ia4a67ef0d2f8975f0e541e85c215afec76e9de5f


[ROCm/ROCR-Runtime commit: fdbe277f2a]
2018-09-26 14:10:21 -04:00
Gang Ba 197f731fbc drm/amdkfd: Added gfx904 and gfx803 for KFD.
Change-Id: I4406dc70c776926feaecca3f2146d65259a80517
Signed-off-by: Gang Ba <gaba@amd.com>


[ROCm/ROCR-Runtime commit: 52ec7f805e]
2018-09-25 08:17:44 -04:00
Mike Li 7cd87a5590 all_gpu_id_array: Handle GPU resource management
GPU Resource management can disable some of the GPU nodes.
The kernel driver might not be aware of this.
Get information about all the nodes from the kernel driver and then filter it.

Change-Id: I4eeb126a5efce2192c35f5d2b72be1811e9ded32
Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>


[ROCm/ROCR-Runtime commit: 3144a84b9a]
2018-09-24 11:38:11 -04:00
Mike Li 150eaea0af kfdtest: Handle GPU resource management
Currently the FindDRMRenderNode function accesses sysfs
directly to find the render node. That doesn't work with the
GPU management changes. Change the code to call hsaKmtGetNodeProperties
instead.

Change-Id: I3bb537a323bc1e8c49f38d8aabc60c13e268aecd
Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>


[ROCm/ROCR-Runtime commit: c3b47c0959]
2018-09-24 11:38:11 -04:00
Mike Li 3feaa41dd7 Output an error message only when open_drm_render_device failed unexpectedly.
Change-Id: I5b9587a8d5c7a900e9ab8611a25d0c49d34b4cef
Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>


[ROCm/ROCR-Runtime commit: f9bd960344]
2018-09-24 11:36:11 -04:00
xinhui pan 5b7d3a16c5 kfdtest: add P2POverheadTest
This measures the latency and overhead of SDMA packet
consumption over P2P.
It is similar to the QueueLatency test. In addition, it shows the queue's
overhead under different workloads in more detail.

test result on two gfx900.
[ RUN      ] KFDPerformanceTest.P2POverheadTest
[          ] Test (avg. ns) | Size	4	8	16	64	256	1024
[          ] -----------------------------------------------------------------------
[          ] [push]     [1 -> 0]	333	148	185	111	148	148
[          ] [push]     [1 -> 1]	370	222	333	74	148	111
[          ] [push]     [1 -> 2]	333	148	148	148	148	148
[          ] [push]     [2 -> 0]	111	333	259	148	148	148
[          ] [push]     [2 -> 1]	222	148	185	148	148	148
[          ] [push]     [2 -> 2]	222	111	370	111	74	148
[          ] [pull]     [1 -> 0]	370	296	296	148	185	148
[          ] [pull]     [1 -> 1]	185	333	222	148	222	148
[          ] [pull]     [1 -> 2]	222	444	259	148	185	111
[          ] [pull]     [2 -> 0]	148	148	148	148	148	148
[          ] [pull]     [2 -> 1]	148	148	148	148	148	148
[          ] [pull]     [2 -> 2]	185	148	148	74	222	296
[          ] [push|pull][1 -> 0]	1259	1222	1259	1074	1037	962
[          ] [push|pull][1 -> 1]	1037	1037	1037	740	740	1000
[          ] [push|pull][1 -> 2]	1259	1259	1296	1037	1000	1074
[          ] [push|pull][2 -> 0]	1037	1037	1037	1074	1037	1148
[          ] [push|pull][2 -> 1]	1037	1037	1037	1037	925	1074
[          ] [push|pull][2 -> 2]	666	666	740	740	703	925
[       OK ] KFDPerformanceTest.P2POverheadTest (459 ms)

Change-Id: I422263cb52f7ce184f6f1ff4466d04c239fbe9c9
Signed-off-by: xinhui pan <xinhui.pan@amd.com>


[ROCm/ROCR-Runtime commit: 918a45a430]
2018-09-24 09:28:00 -04:00
Scott Linder 42d4d4ebcf Apply dynamic relocations for STT_FUNC symbols
Required to support function calls through GOT table.

Change-Id: I174a0269fdd67369d38fe41855b7bd01f350b839


[ROCm/ROCR-Runtime commit: 47f0e6f7d3]
2018-09-23 21:42:32 -04:00
Harish Kasiviswanathan f709e5f94d Topology: Use processors available to the process
The existing call sysconf (_SC_NPROCESSORS_ONLN) provides the number of
processors available to the scheduler. When a KFD process is run under a
container environment, only a subset (cpuset) of processors are
available to the current process.

For getting CPU cache information use sched_getaffinity() to get the
number of processors available to the current process.

Change-Id: Ieac02f1f61c17e24ac34ba502968c69d3bc631cb
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>


[ROCm/ROCR-Runtime commit: fb79a0efe2]
2018-09-21 10:31:59 -04:00
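
The difference that commit f709e5f94d relies on can be sketched with the two Linux calls it names. This is an illustrative comparison, not the topology code itself: AvailableCpuCount and OnlineCpuCount are hypothetical helper names, and the fixed-size cpu_set_t is an assumption that holds only on systems with at most CPU_SETSIZE (1024) CPUs.

```cpp
#include <sched.h>
#include <unistd.h>

// Number of CPUs the calling process is allowed to run on, or -1 on error.
// Inside a container restricted by a cpuset this can be smaller than the
// number of online CPUs.
int AvailableCpuCount() {
    cpu_set_t mask;
    CPU_ZERO(&mask);
    // pid 0 means "the calling thread".
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0) return -1;
    return CPU_COUNT(&mask);
}

// Number of CPUs currently online in the whole system.
long OnlineCpuCount() {
    return sysconf(_SC_NPROCESSORS_ONLN);
}
```

Running the same binary under `taskset -c 0` would be one way to see the two counts diverge on a multi-core machine.
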
xinhui pan c61fffa876 kfdtest: Add P2P bandwidth test
The test measures the bandwidth between GPUs. Currently we do not
take NUMA topology into account, since some products do support P2P
across PCI-e root complexes.

test result on two gfx900 system.
[ RUN      ] KFDPerformanceTest.P2PBandWidthTest
[          ] Copy from node to node by [push, NONE]
[          ] [1 -> 0] 6.13477 - 6.12695 GB/s
[          ] [1 -> 2] 3.77734 - 3.76855 GB/s
[          ] [2 -> 0] 6.67676 - 6.6543 GB/s
[          ] [2 -> 1] 6.14453 - 6.12793 GB/s
[          ] Copy from node to node by [pull, NONE]
[          ] [1 -> 0] 6.10547 - 6.08105 GB/s
[          ] [1 -> 2] 9.65527 - 9.65039 GB/s
[          ] [2 -> 0] 6.49805 - 6.4873 GB/s
[          ] [2 -> 1] 8.95508 - 8.85254 GB/s
[          ] Full duplex copy from node to node by [push|pull, NONE]
[          ] [1 -> 0] 11.0986 - 11.0986 GB/s
[          ] [1 -> 2] 7.54297 - 7.54297 GB/s
[          ] [2 -> 0] 12.0264 - 11.9639 GB/s
[          ] [2 -> 1] 12.0469 - 12.0371 GB/s
[          ] Full duplex copy from node to node by [push, push]
[          ] [1 <-> 2] 11.7324 - 11.4541 GB/s
[          ] Full duplex copy from node to node by [pull, pull]
[          ] [1 <-> 2] 11.4824 - 11.0508 GB/s
[          ] Copy from node to multiple nodes by [push, NONE]
[          ] [1 -> [0...2]] 5.625 - 5.73633 GB/s
[          ] [2 -> [0...2]] 6.45801 - 6.4707 GB/s
[          ] Copy from multiple nodes to node by [push, NONE]
[          ] [[1...2] -> 0] 12.8379 - 12.2578 GB/s

Now we can get more timestamp info like below.

Copy from node to node by [push, NONE]
[1 -> 0]
[1 : 0] #-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-##-#-#-#-#-###############################
[1 : 1] ####################################################################################################
[1 -> 2]
[1 : 0] #--#-#-#-#-#--#-#-#-#-#--#-#-#-#-#--#-#-#-#-#-#--#-#-#-#-#--#-#-#-#-#--#-#-#-#-#-#--#-#-#-#-#--#-#-######################################
[1 : 1] ##################################################################################################-#
[2 -> 0]
[2 : 0] ##-###-##-###-###-##-###-##-###-###-##-###-###-##-###-###-##-###-##-###-###-##-###-###-##-###-###-#################
[2 : 1] ###############################################################################-#############-###-##
[2 -> 1]
[2 : 0] ##-##-##-##-##-###-##-##-##-##-##-###-##-##-##-##-###-##-##-##-##-##-###-##-##-##-##-###-##-##-##-####################
[2 : 1] ################################################################################-###-############-##

[snip]

Full duplex copy from node to node by [push, push]
[1 <-> 2]
[1 : 0] #-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-#-#-#-####################################
[1 : 1] ################-###################################################-############-####-#############
[2 : 2] #-##-##-##-#-##-##-##-##-#-##-##-##-##-#-##-##-##-##-#-##-##-##-##-#-##-##-##-##-##-#-##-##-##-##-#-##-##-##-##-##-#-##################
[2 : 3] #####-######-#####-######-#####-######-#####-######-#####-######-#####-######-#####-######-#####-######-#####-#####-##
Full duplex copy from node to node by [pull, pull]
[1 <-> 2]
[1 : 0] ######################################################################-##-#-###############-####-###
[1 : 1] #-#-#-##-#-#-##-#-#-##-#-#-##-#-#-##-#-#-##-#-#-##-#-#-#-##-#-#-##-#-#-##-#-#-##-#-#-##-#-#-##-#-#-############################
[2 : 2] ##-##-##-##-###-##-##-##-##-###-##-##-##-###-##-##-##-##-###-##-##-##-###-##-##-##-##-###-##-##-##-##-###-##-##-##-###-##-##-############
[2 : 3] #-#-#-#-#-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#########-#############
Copy from node to multiple nodes by [push, NONE]
[1 -> [0...2]]
[1 : 0] #-#--#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-###############################
[1 : 1] ########################################################################################-###-###-###
[2 -> [0...2]]
[2 : 0] ##-##-##-###-##-###-##-##-###-##-###-##-##-###-##-###-##-###-##-##-###-##-###-##-##-###-##-###-##-##################
[2 : 1] -################################################################################################-##
Copy from multiple nodes to node by [push, NONE]
[[1...2] -> 0]
[1 : 0] #-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-##-#-#-#-###############################
[1 : 1] ################################################################################################-#-#
[2 : 2] ##-##-##-###-##-##-###-##-##-##-###-##-##-###-##-##-###-##-##-###-##-##-##-###-##-##-###-##-##-###-##-##################
[2 : 3] #########################-#########################-#########################-#########################
[       OK ] KFDPerformanceTest.P2PBandWidthTest (15982 ms)

Change-Id: Ia90044191d51650ccb220476d31fb317aa3ad6ce
Signed-off-by: xinhui pan <xinhui.pan@amd.com>


[ROCm/ROCR-Runtime commit: e5a541eaf2]
2018-09-19 12:03:05 +08:00