Commit Graph

167 Commits

Author SHA1 Message Date
Aditya Atluri 2e754d27dc Signal Fix: Changed global signal count to per stream signal count
1. The number of kernels that can use signals is increased to 128
2. The kernel count is now specific to the stream

Change-Id: Ie6d1aa3f437aad8f08c3333fe48bd3f46e551e60
2016-07-26 14:03:51 -05:00
Aditya Atluri 524127b4a4 removed redundant signal destroy
Change-Id: Icf0cd76b2620d34c87cfb6c7a83049087c0a0bc4
2016-07-26 13:35:35 -05:00
Aditya Atluri 0232e6bbb4 Added re-fix for memcpy kernel sync
1. The patch uses HIP signal pools to sync between copy and kernel commands
2. The hsa_signal_create is removed
3. Left the redundant enqueueBarrier method just in case

Change-Id: I3dff3e8ee57fff3cd49bec802ff735ed128e5ca1
2016-07-26 09:22:59 -05:00
Rahul Garg d11d65d401 D2H and H2D unpinned memory transfer support
Change-Id: If6d6c970f435e5d917d5cc6cddc2ee2918cd1c37

Conflicts:
	src/hip_hcc.cpp
2016-07-25 14:36:07 +05:30
Aditya Atluri 1704006bed Partial fix async after kernel launch signal issue
Change-Id: Ib48d6564379160035bded9493b93663fba361710
2016-07-23 14:54:20 -05:00
Maneesh Gupta b485470819 Replace calls to ihipInit with use of HIP_INIT_API macro
Change-Id: Iabf7df79f0238a8ddffea4607fe945df36642850
2016-07-22 15:46:55 +05:30
Maneesh Gupta dffed956fb Fix using ATP markers
Change-Id: If2d04f80b580237426c569737551e2001a8cd35a
2016-07-21 16:02:51 +05:30
Aditya Atluri 77d7134619 added fix for signal overflow in kernels
Change-Id: Ie0b1f97f69b7d7b34e445f6f120472819be03a0e
2016-07-19 13:51:44 -05:00
Fan Cao dc0a787984 Replace GPU agent with CPU agent properly for memory async copy API
ihipStream_t::copySync uses the GPU agent in the memory async copy API even
if the src/dst memory does not belong to the GPU, which causes the hsa
runtime to choose a slower copy engine.

SWDEV-95191

Change-Id: If3cab3d493c0c96ed63721cdcf28247a1193887c
2016-06-30 18:23:29 +05:30
Maneesh Gupta dca8fca8eb Merge branch 'amd-master' into amd-develop 2016-06-24 21:13:11 +05:30
Rahul Garg 226aa917e7 Included code to calculate value of maxThreadsPerMultiprocessor property
Change-Id: Ie7cad7442f36a7163e715048de5a309febc28664
2016-06-24 15:10:11 +05:30
Ben Sander e27b5cc927 Grid-launch updates to 2.0 and cleanup of old.
- Use fields from GRID_LAUNCH_20 structure
  (See USE_GRID_LAUNCH_20 define, currently set to 0)
  "1" will require HCC support.
- Remove old DISABLE_GRID_LAUNCH support.

Change-Id: I584ce648d217251789a6283cf27feb24cb7dc8d1
2016-06-21 23:24:38 -05:00
Maneesh Gupta 2d50e4b9e0 default value of uninitialized dim3 elements should be 1
Change-Id: Idff38fac8dfca68f38f1714f8fdec64df2890a6a
2016-06-20 10:13:46 +05:30
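
Note on 2d50e4b9e0: the rule that unspecified dimension elements default to 1 can be illustrated with a minimal sketch. `Dim3Sketch` and `totalCount` are hypothetical names invented for this illustration; this is not the definition in the HIP headers.

```cpp
#include <cstdint>

// Hypothetical stand-in for a dim3-like launch dimension type.
// Assumption: this only illustrates the defaulting rule from the commit
// message; it is not copied from the HIP headers.
struct Dim3Sketch {
    std::uint32_t x;
    std::uint32_t y;
    std::uint32_t z;

    // Any component the caller omits defaults to 1, so the product
    // x * y * z never collapses to zero for a partially specified value.
    constexpr Dim3Sketch(std::uint32_t x_ = 1, std::uint32_t y_ = 1,
                         std::uint32_t z_ = 1)
        : x(x_), y(y_), z(z_) {}
};

// Total number of blocks (or threads) described by the three dimensions.
constexpr std::uint64_t totalCount(const Dim3Sketch& d) {
    return static_cast<std::uint64_t>(d.x) * d.y * d.z;
}
```

With these defaults, `Dim3Sketch(256)` describes a one-dimensional launch of 256 elements rather than a launch with zero-sized y and z.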
Aditya Atluri ffcfc95360 able to pass non-dim launch parm to kernel launch
Change-Id: I0411849a27efcba597a1a9aa08be179635e04988
2016-06-18 11:28:20 -05:00
Ben Sander 6a2a140f34 NVCC improvements.
- Complete translation tables for cudaError <-> hipError_t.
- Remove some odd errors that were not correctly translated or not used.
- Add HIPCHECK_API to test infrastructure.  Used for negative testing
  an API ; if a mismatch occurs it shows the expected return error
  code.  Can also print a warning rather than error.
- Enable hipMemoryAllocate on NV system, and review error codes.
- Add hipErrorName to nvcc.

Change-Id: I680427dcf32a5796d5913cf9e7f3b4c6f6b91599

Conflicts:
	tests/src/CMakeLists.txt

Bug fixes and improved docs for hipFree and hipHostFree.

    - Passing a NULL pointer initializes the runtime and returns hipSuccess
      (not an error like before).
    - add negative test for this. (hipMemoryAllocate, improved)
    - Match NVCC errors for invalid pointers, add to test.
    - Update hipFree and hipHostFree docs.
    - hipGetDevicePointer always set *devicePointer=NULL, even for
      invalid flags.
    - Gate shared memory usage on specific HCC work-week.

Change-Id: I533b4fd3280a3d6cdbf05eb768976f0c7506c012
2016-06-16 06:13:51 +05:30
Ben Sander 20043d602e Merge branch 'privatestaging' into grid_launch 2016-05-02 18:38:20 -05:00
Aditya Atluri ec8cedc70e changed to guard from hc.hpp 2016-04-27 17:46:27 -05:00
bwicakso 4aca1babe8 Merge remote-tracking branch 'refs/remotes/origin/privatestaging' into kernel_synchronization 2016-04-25 13:57:28 -05:00
Maneesh Gupta 02e6fc27f4 Merge branch 'release_0.84.00' into privatestaging
Conflicts:
	include/hcc_detail/hip_runtime.h
	src/hip_hcc.cpp
2016-04-22 10:55:58 +05:30
bwicakso 6773a64b22 Fix for kernel synchronization
The completion future of a particular kernel is lost if there are
multiple kernels in the stream. This can cause a race condition where
the signal associated with the unreferenced completion_future might get
released by hcc runtime.
2016-04-20 15:51:39 -05:00
Aditya Atluri 620c5c64e6 added support pinned dma memcpy between host and device 2016-04-20 14:21:22 -05:00
Aditya Atluri b493eac7e0 added support for __ldg 2016-04-20 12:25:40 -05:00
Ben Sander ccf2c1c323 Fix hipDeviceReset synchronization 2016-04-19 11:56:12 -05:00
Ben Sander b3c2d906db Set chicken bits to 0. 2016-04-19 11:56:12 -05:00
Ben Sander 1ac93489b9 Merge branch 'privatestaging' of https://github.com/AMDComputeLibraries/HIP-privatestaging into privatestaging 2016-04-18 21:51:13 -05:00
Ben Sander 6abfa13c34 Fixes for P2P and hipDeviceReset
- devicereset would lose track of default stream and thus subsequent
  synchronization calls might not actually synchronize.
- Also deviceReset now correctly frees streams.
- fix waits in P2P staging copy - first phase (Device0-to-Staging) must
  wait for second phase (Staging to Device1) to finish draining the
  buffer.
2016-04-18 20:49:33 -05:00
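
Note on 6abfa13c34: the ordering constraint for the staging copy (a chunk of the staging buffer may only be refilled once the previous contents have been drained) can be sketched on plain host memory. `stagedCopySketch` is a hypothetical helper written for this illustration. It is single-threaded and uses `std::memcpy` in place of device copies, so the wait is implicit in the loop order; it does not reflect the actual code in src/hip_hcc.cpp.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical helper: copy `bytes` from src to dst through a fixed-size
// staging buffer, one chunk at a time. Assumption: host memory only.
void stagedCopySketch(void* dst, const void* src, std::size_t bytes,
                      std::size_t stagingBytes) {
    if (stagingBytes == 0) {
        throw std::invalid_argument("staging buffer must not be empty");
    }
    std::vector<unsigned char> staging(stagingBytes);
    const unsigned char* in = static_cast<const unsigned char*>(src);
    unsigned char* out = static_cast<unsigned char*>(dst);

    for (std::size_t offset = 0; offset < bytes; offset += stagingBytes) {
        const std::size_t chunk = std::min(stagingBytes, bytes - offset);
        // Phase 1 (source to staging): overwrites the staging buffer.
        std::memcpy(staging.data(), in + offset, chunk);
        // Phase 2 (staging to destination): drains the staging buffer.
        // The next phase 1 starts only after this call has returned.
        std::memcpy(out + offset, staging.data(), chunk);
    }
}
```

In an asynchronous implementation the two phases would run on different engines, and the explicit wait described in the commit message would take the place of the sequential loop order used here.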
Aditya Atluri 40377fe5b1 Update hip_hcc.cpp 2016-04-18 11:36:51 -05:00
Ben Sander f9a31e28ad Move HIP_HCC define to CMake 2016-04-17 07:40:04 -05:00
Ben Sander 8d26dfcde3 Merge branch 'privatestaging' into p2p
Conflicts:
	include/hcc_detail/hip_hcc.h
	src/hip_hcc.cpp
2016-04-17 06:46:52 -05:00
Aditya Atluri dc61929a3d Corrected Memcpydefault 2016-04-16 17:10:13 -05:00
Ben Sander c3bd85595d P2P Update.
- add P2P staging buffer copy.
- If copy device does not have sufficient access permissions, fall back
  to staging buffer.
- improve docs for which copy device is used.
2016-04-16 10:18:56 -05:00
Ben Sander 1cc0ea86a1 Merge branch 'p2p' of https://github.com/AMDComputeLibraries/HIP-privatestaging into p2p
Conflicts:
	RELEASE.md
	include/hcc_detail/hip_hcc.h
	samples/1_Utils/hipInfo/hipInfo.cpp
	src/hip_hcc.cpp
	src/hip_peer.cpp
2016-04-11 09:17:27 -05:00
Ben Sander 5af4c901c6 P2p checkpoint.
- set USE_PEER_TO_PEER=3 (requires HCC "am_memtracker_update_peers")
- when enabling peer, turn it on for previously allocated memory.
- hipDeviceCanAccessPeer is no longer self-aware (self does not qualify
  as a peer)
- device peerlist always includes self, so when we call allow_access
  we never remove self access.
- hipDeviceReset() removes old peer mappings.
2016-04-11 12:52:18 -05:00
Ben Sander efffb0ed86 Clean up disable.
Add USE_HCC_LOCK (disabled)
Disable USE_PEER_TO_PEER.
2016-04-11 09:09:36 -05:00
Ben Sander 9e7efd7c65 P2p checkpoint.
- set USE_PEER_TO_PEER=3 (requires HCC "am_memtracker_update_peers")
- when enabling peer, turn it on for previously allocated memory.
- hipDeviceCanAccessPeer is no longer self-aware (self does not qualify
  as a peer)
- device peerlist always includes self, so when we call allow_access
  we never remove self access.
- hipDeviceReset() removes old peer mappings.
2016-04-11 07:58:59 -05:00
Ben Sander 173cff4c1e fix bugs in P2P implementation
- addPeers polarity reversed, would never add.
- check allow_access return value, pipe error to hipMalloc.
2016-04-11 07:58:58 -05:00
Ben Sander 97772d6363 For P2P, use the peer list when allocating Device memory or pinned host.
Each new allocation is automatically mapped into the address space of
all enabled peers.
2016-04-11 07:58:58 -05:00
Ben Sander e2d19d7f7a P2P checkpoint.
Maintain enabled peer tables for each device.
2016-04-11 07:58:58 -05:00
Ben Sander 4400875dda Checkpoint initial peer2peer implementation. 2016-04-11 07:58:58 -05:00
Ben Sander 7ca06d2fb1 fix bugs in P2P implementation
- addPeers polarity reversed, would never add.
- check allow_access return value, pipe error to hipMalloc.
2016-04-09 04:11:31 -05:00
pensun 45ed17ce2e clean up unused comments 2016-04-07 09:46:00 -05:00
Ben Sander 288682ccb3 For P2P, use the peer list when allocating Device memory or pinned host.
Each new allocation is automatically mapped into the address space of
all enabled peers.
2016-04-06 16:44:31 -05:00
Ben Sander 6a182ce788 P2P checkpoint.
Maintain enabled peer tables for each device.
2016-04-06 15:50:47 -05:00
Ben Sander db91890f53 Checkpoint initial peer2peer implementation. 2016-04-06 15:50:47 -05:00
Ben Sander e22925be22 Add runtime switch to control HIP_ATP_MARKER
Only generate the function strings if requested at
compile-time && runtime.
2016-03-29 17:27:30 -05:00
Ben Sander 1b2ab173c1 Tweak thread-safe implementation.
introduce LockedAccessor option so destructor does not unlock.
Allows locks to exist across function boundaries, required
for hipLaunchKernel macro which has several unusual requirements.
(including C compatibility, must use variadic macro, more).
2016-03-28 21:41:47 -05:00
Ben Sander 6cab7862ae Stream thread-safe checkpoint.
Moving data structures to critical / protected section.
2016-03-28 09:46:40 -05:00
Ben Sander ecd56e1400 Stream thread-safe checkpoint. 2016-03-28 04:22:20 -05:00
Ben Sander c47b5b04ef Protect _stream_id as well.
- move lockedaccessor
- clean up device class.
- add simple ihipDevice constructor.
2016-03-26 11:45:25 -05:00
Ben Sander 4dd77c6612 Make ihipDevice_t thread-safe.
Move critical data into separate class and protect with LockAccessor
wrapper class.

For device, the streams list is the critical data since it is modified when
streams are created or destroyed.   The streams list is accessed in
several places including when synchronizing across all streams on the
device (ie from the default stream).
Other device data is set once by the device constructor and is not critical
so

All functions which acquire the LockAccessor now named with "locked_" prefix.
2016-03-26 10:46:20 -05:00
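
Note on 4dd77c6612: the pattern described in this message (critical data moved into its own class and reached only through a locking wrapper) might look roughly like the sketch below. All names (`DeviceCriticalSketch`, `LockedAccessorSketch`, `locked_addStream`) are hypothetical and chosen for illustration; the real classes in the HIP sources may differ in structure and naming.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical critical section data: the list of streams on a device,
// which changes whenever a stream is created or destroyed.
class DeviceCriticalSketch {
public:
    void lock() { _mutex.lock(); }
    void unlock() { _mutex.unlock(); }
    void addStream(int id) { _streams.push_back(id); }
    std::size_t streamCount() const { return _streams.size(); }

private:
    std::mutex _mutex;
    std::vector<int> _streams;
};

// Hypothetical accessor: locks in the constructor, unlocks in the
// destructor, and exposes the protected data through operator->.
template <typename CriticalT>
class LockedAccessorSketch {
public:
    explicit LockedAccessorSketch(CriticalT& data) : _data(&data) {
        _data->lock();
    }
    ~LockedAccessorSketch() { _data->unlock(); }

    LockedAccessorSketch(const LockedAccessorSketch&) = delete;
    LockedAccessorSketch& operator=(const LockedAccessorSketch&) = delete;

    CriticalT* operator->() { return _data; }

private:
    CriticalT* _data;
};

// Functions that acquire the accessor carry a "locked_" prefix, following
// the naming rule stated in the commit message.
std::size_t locked_addStream(DeviceCriticalSketch& device, int streamId) {
    LockedAccessorSketch<DeviceCriticalSketch> crit(device);
    crit->addStream(streamId);
    return crit->streamCount();
}
```

Because the lock is released when the accessor goes out of scope, a caller cannot forget to unlock on an early return. The "locked_" prefix also warns callers not to invoke such a function while already holding the same non-recursive lock.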