Graphe des révisions

262 Révisions

Auteur SHA1 Message Date
Ben Sander ff2f54c1bf Add additional controls for forcing serialization and blocking.
Move HIP_COHERENT_HOST_ALLOC so it is read once at init time.
Add HIP_LAUNCH_BLOCKING_KERNELS, HIP_API_BLOCKING.
Update docs on debug and chicken bits.

Conflicts:
	src/hip_hcc.cpp
2016-12-02 18:03:59 -06:00
pensun 0dfcd3e664 Change to use produce device name by default
Change-Id: Ie2cee2a2e94a08b5874a2f5abee5d1ab6c9fdf47
2016-11-29 11:34:06 -06:00
Ben Sander ce92a53f25 Add more debug info 2016-11-26 08:56:02 -06:00
Ben Sander dec59d9909 Improve docs in some places
Change-Id: If31e84fbf0c8595ca72edb842dce7ce47783579b
2016-11-23 08:16:18 -06:00
Ben Sander b6ae6b08fb Improve debug capabilities.
Print TID mapping at init when HIP_TRACE_API=1.
Print base host/dev info from tracker during copy.

Change-Id: I84e26d7b801567e5a91baad36126fb590920ec87
2016-11-23 08:16:18 -06:00
Rahul Garg 2dcf20ac6f Removed hsaKmtReleaseSystemProperties call
Change-Id: I7cb992cccf587c333f0ca0cb518409f3944bdb06
2016-11-22 06:15:35 +05:30
Maneesh Gupta c0419cc749 Refactor for building HIP as dynamic library
Change-Id: I65a3d9d589c4fdbbdcf1611e5427224253be8260
2016-11-18 14:33:20 +05:30
Aditya Atluri 12dd9df88f Added i8 packed math intrinsics
1. Added add, sub, mul packed math i8 intrinsics
2. Removed c++ packed data structures included from HCC

Change-Id: I1d109c5ce10c48b7cd3ea059478b88fc1de78499
TODO: Add better packed data structures support
2016-11-17 01:09:12 -06:00
Maneesh Gupta 888a3528d2 Enable USE_COPY_EXT_V2 by default
Change-Id: I2c0dc80f85a0ccb5744715b5418a604e38b249ed
2016-11-15 10:42:27 +05:30
Aditya Atluri abf6872b2b fixed multi-dim module kernel launch
Change-Id: Id1d81f2375d058979ab526433f905cf0ea3d23d6
2016-11-11 12:25:23 -06:00
Ben Sander 1e5515ee9f Add option to deny peer access.
Also fix test.

Change-Id: I1b247f6c4271442b008e560669bca4daf8eb94c7
2016-11-10 23:12:48 -06:00
Ben Sander 65584e48de Use forceUnpinnedCopy to resolve P2p corner cases.
Change-Id: I2aebb419881246cebb696bec87798635bc71acc2
2016-11-10 23:12:48 -06:00
Ben Sander d3d6feb4de Enable async copy again.
Also add HIP_FORCE_SYNC_COPY chicken bit.

Change-Id: I76a385410494b99bf27305d3c08f55dd81987565
2016-11-10 23:12:48 -06:00
Ben Sander ced9d72d94 Refactor copy and P2P logic.
Prefer use of source-engine for DMA copies, even if user submits copy
in a stream attached to a different device.
The stream is now used only for synchronization, and HIP
makes the most optimal decision for which engine to perform the
copy - typically the source copy engine.

HIP now makes decision on which engine should perform the copy
and passes this to HCC using new apis.
HIP has additional information about peer
visibility and will make a decision which agent should perform
the copy .

Change-Id: I0cf4cfebeae256e6ca795f08a7ed7130f4857d1f
2016-11-10 23:12:48 -06:00
Ben Sander d728819d17 Improve Peer support and testing.
Change-Id: Icadc65988aaf145a265587ab0357c5bf4d26f3eb
2016-11-06 03:22:36 -06:00
Ben Sander 092b3dacda Set forceHostCopyEngine for other copy dirs. Support HIP_FORCE_P2P_HOST
Also: more debug for copy and P2p.

Change-Id: I87030c525410e041b2a00baaf6c68e6c0977ff42
2016-11-04 19:53:23 -05:00
Ben Sander 5d79384832 Refactor resolve-mem step1
Change-Id: I7b8b2bbb56d7b31a97b48ebd42002641cd07a460
2016-11-04 09:37:56 -05:00
Ben Sander 3f0a2b8dc1 Add debug for Peer APIs. Enable PeerMemcpy APIs by default.
Change-Id: I46e39a9e7b07686a78484c1f3b5495b08e052fbb
2016-11-04 08:51:16 -05:00
Aditya Atluri f48c53534e added inter thread data movement intrinsics
Change-Id: I2a8a8ed49429cb7f96439bd28c4b83b5142737df
2016-11-01 16:37:33 -05:00
Ben Sander 024d9ab090 Print short hipLaunchKernel correctly.
Change-Id: I6ca03d7c707cd03d6982199830213953d5855f17
2016-10-27 23:09:32 -05:00
Ben Sander bb58f4f6fc Add initial hipProfileStart/Stop
And modify sample to show how to use.
Still needs some work to understand interaction with CXL.

Change-Id: I2579824d2dd7863ea23874d34f0dabb3cb305d3e
2016-10-27 23:09:32 -05:00
Ben Sander ef8eac9b66 Add two levels of HIP_PROFILE_API (1=short,2=long)
Change-Id: I7ef98589f8731fb879db109fd573c62b489f2b61
2016-10-27 23:09:31 -05:00
Ben Sander ab1836544a Fix scoped marker so begin/end ATP timestamps correct
Change-Id: Ic944d3fc00d7bc31b756c0e6c327b99eb489537e
2016-10-27 23:09:31 -05:00
Ben Sander e9056798f6 Rename HIP_ATP_MARKER and profiling vars
HIP_PROFILE_API
HIP_DB_START_API
HIP_DB_STOP_API

Change-Id: I6c4da67212ff8217e6356a2622d4c6278a188c34
2016-10-27 23:09:31 -05:00
Ben Sander f5e8090f2f Allow HIP_DB to be number or string flags (ie HIP_DB=api+mem+sync)
Add callbacks for processing env vars.

Change-Id: I4ddf50e2da56b1dae43f50657bc693b07b23c03d
2016-10-27 23:09:31 -05:00
Ben Sander 710be682ca Add HIP_PROFILE_START_API, HIP_PROFILE_STOP_API
Refactor HIP_INIT_API to call recordApiTrace.

Change-Id: Ieff4b5018236f59e49e1b9841474440a34f821df
2016-10-27 23:09:31 -05:00
Ben Sander 739bc37503 Add per-thread API seqnum to debug
Change-Id: Ib13733a3e84cd56bae13a32bae40f936c20b7543
2016-10-27 23:09:31 -05:00
Ben Sander 346c519ace Improve HIP TID printing in debug mode.
Map long thread-id to a short one that is printed with each message.
Remove clunky stirng creation code for tid_tr.
Print TID on every message.

Change-Id: I780a91d8ce789cb4957789036b478bf5cde8c4e4
2016-10-27 23:09:31 -05:00
Aditya Atluri e1c1b4c009 reverted change for cache size query
Change-Id: I44a1f43818cd287a2a3b6265f43d183f9bd5b71c
2016-10-25 11:03:35 -05:00
Aditya Atluri 820a914b98 correct cachesize to output correct value
Change-Id: I5db031591eb718b0c12e78a35e4b19349de9526d
2016-10-25 09:33:45 -05:00
Ben Sander 203348ac9b Fix P2P for async
Also improve HIP debug message: Add more DB_COPY1 messages. memcpyStr,
expand HIP_DB bitmask.
2016-10-20 10:02:23 -05:00
Ben Sander 000d75de95 Add HIP_WAIT_MODE env var.
Also weaken cases where hipSetDeviceFlags returns hipErrorInvalidValue.

Change-Id: I7f113338be6fe498eaf1ab40fd0fd6b23849bb5e
2016-10-18 22:27:16 -05:00
Ben Sander 261ff423e1 Add hipDeviceSchedule* support to queue wait
Change-Id: Iffa7a356500b026f3737c3f5719ca9f62b10d855
2016-10-18 22:27:16 -05:00
Ben Sander d21d3ec222 Remove some TODO items
Change-Id: I7e9de2e43a8584f8dc9ee6d45c8ed00ca465f591
2016-10-18 22:27:16 -05:00
Ben Sander 9315ac1a29 Move some internal headers from "include/hip/" to src.
Change-Id: I7041bd5c803d9318979f4a7c1d658445c614691e
2016-10-18 22:27:16 -05:00
Maneesh Gupta 8471682f26 src/*: Update copyright header
Change-Id: I455f5d0d12fe9cb39a3ba873bd22b4c25ed07cbf
2016-10-15 22:55:22 +05:30
Ben Sander c54220eca9 Cleanup files from code review.
- Remove some stale code
- Update docs
- Correct define for __HIP_ARCH_HAS_GLOBAL_INT64_ATOMICS__

Change-Id: Ic5e3cdb8269b1c18f6d2693700b55e08c4d0080e
2016-10-15 11:51:20 -05:00
Ben Sander 50e0a363ce Add code to use new HCC API accelerator_view::dispatch_hsa_kernel.
Disabed by default, can enable with USE_DISPATCH_HSA_KERNEL=1

Change-Id: I7a6ba76f2bada34952ed47f5335ce695fa2faea5
2016-10-14 23:46:29 -05:00
Ben Sander 89012201c9 Fix HIP_USE_PRODUCT_NAME detection.
Change-Id: I6879ec3a11845bea66a18a9328bd4eaf54713420
2016-10-13 11:51:53 -05:00
pensun b70409f3ad Add ifdef guard for the feature requires ROCm1.3
Change-Id: I7154517c47000c37fe5eb09a3c1cf2a9aacbe27c
2016-10-13 10:57:31 -05:00
Aditya Atluri 237837d9bd added constant memory property to 16KB
Change-Id: If067b4057c2e3fc0c26cf4604a1d4fac7f139b12
2016-10-13 10:47:40 -05:00
pensun c3f375327f Change to query device name using HSA_AMD_AGENT_INFO_PRODUCT_NAME;
Note: this commit depends on ROCR runtime in ROCm 1.3 release.

Change-Id: I90385ef6d11ee8a1e8adae1d3fdf21747347544c
2016-10-12 20:01:30 -05:00
Aditya Atluri 62ec53740c Added hipDeviceGetLimit api
1. hipDeviceGetLimit API for HCC path is added
2. Test for hipDeviceGetLimit API is added
3. The feature added only supports querying heap size
4. Corrected indents for malloc and free device functions
5. Removed redundant data structures
6. Added g_heap_malloc_size to store the heap size

Change-Id: If48d1b0ce9270e994f1c542cc283ddbb14746bbb
2016-10-12 19:58:48 -05:00
Aditya Atluri d24a7ef12b added malloc and free device functions
1. Added malloc and free device functions
2. Added test which check malloc and free functions
TODO: Need to add support for multiple device. Works only on one device (multi device support id NOT available).

Change-Id: Id11fc36463915d6ad46c264d5a20c8feb2d2c17c
2016-10-12 19:08:34 -05:00
Ben Sander dee364cb08 Add DISABLE_COPY_EXT option. 2016-10-05 12:18:42 -05:00
Ben Sander 821080487a Add HIP_BLOCKING_SYNC environment var to control stream sync behavior. 2016-10-05 12:18:42 -05:00
Maneesh Gupta b951cc99ed Move include/* to include/hip/*
Change-Id: I7a7b2839b4df59c7a4c503550f99fdc9e45c0f54
2016-10-04 22:17:18 +05:30
Ben Sander 4ff6dc8f38 Refactor asyncCopy and syncCopy to fix deadlock case.
- Minimize time that locks are held.
- Eliminate copy code that locked stream and ctx at same time.
    - Stream was locked to ensure thread-safe enqueue to the queue.
    - Devices were locked to query peer-lists.

Change-Id: Ibe8880bb7fb995a3da8f90ff911f212d81525018
2016-09-27 15:45:40 -05:00
Ben Sander 6de9136002 Add debug option to print ThreadID with each message.
Also print messages with single fprintf to prevents threads from
interleaving.

Change-Id: Ib3999fe6b1e67b4a16cd7dcde82f3dfc99dd48ff
2016-09-27 15:45:40 -05:00
Ben Sander 225e37fdc9 Fix signal resource issue.
Remove memory leak with new hc::completion_future.
Implement HIP_LAUNCH_BLOCKING with queue-level wait.

Change-Id: I45975f81c4d239fdeed7776970988d28449865dc
2016-09-26 16:47:32 -05:00