Files
rocm-systems/projects/hip
Siu Chi Chan d0252dfa79 migrate program_state logic from header into shared library (phase I) (#1077)
* Revert "Revert "Use COMgr to read Kernel Args Metadata (#1006)""

This reverts commit 62e96cb4cf.

* Revert "Use COMgr to read Kernel Args Metadata (#1006)"

This reverts commit 882006555b.

* Revert "improve program state commentary"

This reverts commit fb2beb0c88.

* Revert "load program state once per agent"

This reverts commit 21f5e142f5.

* start moving function_names() into the hip shared lib

* start moving code_object_blobs to a new "state" object

* Consolidate various program state related static objects into a
single program_state object

* minor clean up

* move more stuffs from functional_grid_launch into program_state

* debug make_kernarg

* moving lookup for kernargs size_align into program_state

* clean up old code for kernarg size and alignment

* update hip_module to use newer api in program_state

* Create public member functions for program_state

* move most program state functions into shared library

* Pass the data buffer size to load_executable
Otherwise, it can't figure what the data size is
just from the char* (since the data is not really a string)

* turning free functions in program state into members of program_state_impl

* change the free function globals() into a member of program_state_impl

* replace the static mutex used for populating globals

* moving associate_code_object_symbols_with_host_allocation into
program_state_impl

* move load_code_object_and_freeze_executable into program_state_impl

* moving executables and functions_names into program_state_impl

* moving kernels() into program_state_impl

* moving functions() into program_state_impl

* move get_kernargs into program_state_impl

* moving kernel_descriptor into program_state_impl

* moving kernargs_size_align calculation into program_state_impl

* Changing the handle to program_state_impl to a pointer

* moving program_state_impl into a separate inline source file

* fixing/cleaning up some header file includes

* moving member function for kernargs_size_align into program_state.cpp

* moving Kernel_descriptor into program_state.inl

* add a new class to manage agent globals

* moving all agent globals processing functions into agent_globals_impl

* load program state once per agent

re-merging PR991 against other program state changes

* fix per-agent program state member initialization

* cache executables based on elf name, isa, and agent.

This avoids program state reloading executables after a shared library is dlopened.

re-merging PR1057 against other program state changes

* protect executables cache by a global mutex

* return ref to executables cache

* adapt PR#981 Make hipModuleGetGlobal be in HIP runtime


[ROCm/hip commit: f5eb91d53d]
2019-05-12 19:24:03 +05:30
..
2018-08-28 12:36:54 +05:30
2019-03-06 17:31:44 +05:30
2019-03-15 04:08:03 +05:30
2017-01-13 14:42:37 +05:30
2019-03-30 08:29:08 -04:00
2018-09-17 13:25:12 +02:00
2019-05-08 12:07:33 +05:30
2016-01-26 20:14:33 -06:00
2019-01-10 09:45:23 +01:00

What is this repository for?

HIP allows developers to convert CUDA code to portable C++. The same source code can be compiled to run on NVIDIA or AMD GPUs. Key features include:

  • HIP is very thin and has little or no performance impact over coding directly in CUDA or hcc "HC" mode.
  • HIP allows coding in a single-source C++ programming language including features such as templates, C++11 lambdas, classes, namespaces, and more.
  • HIP allows developers to use the "best" development environment and tools on each target platform.
  • The "hipify" tool automatically converts source from CUDA to HIP.
  • Developers can specialize for the platform (CUDA or hcc) to tune for performance or handle tricky cases

New projects can be developed directly in the portable HIP C++ language and can run on either NVIDIA or AMD platforms. Additionally, HIP provides porting tools which make it easy to port existing CUDA codes to the HIP layer, with no loss of performance as compared to the original CUDA application. HIP is not intended to be a drop-in replacement for CUDA, and developers should expect to do some manual coding and performance tuning work to complete the port.

Repository branches:

The HIP repository maintains several branches. The branches that are of importance are:

  • master branch: This is the stable branch. All stable releases are based on this branch.
  • developer-preview branch: This is the branch were the new features still under development are visible. While this maybe of interest to many, it should be noted that this branch and the features under development might not be stable.

Release tagging:

HIP releases are typically of two types. The tag naming convention is different for both types of releases to help differentiate them.

  • release_x.yy.zzzz: These are the stable releases based on the master branch. This type of release is typically made once a month.
  • preview_x.yy.zzzz: These denote pre-release code and are based on the developer-preview branch. This type of release is typically made once a week.

More Info:

How do I get set up?

See the Installation notes.

Simple Example

The HIP API includes functions such as hipMalloc, hipMemcpy, and hipFree.
Programmers familiar with CUDA will also be able to quickly learn and start coding with the HIP API. Compute kernels are launched with the "hipLaunchKernel" macro call. Here is simple example showing a snippet of HIP API code:

hipMalloc(&A_d, Nbytes));
hipMalloc(&C_d, Nbytes));

hipMemcpy(A_d, A_h, Nbytes, hipMemcpyHostToDevice);

const unsigned blocks = 512;
const unsigned threadsPerBlock = 256;
hipLaunchKernel(vector_square,   /* compute kernel*/
                dim3(blocks), dim3(threadsPerBlock), 0/*dynamic shared*/, 0/*stream*/,     /* launch config*/
                C_d, A_d, N);  /* arguments to the compute kernel */

hipMemcpy(C_h, C_d, Nbytes, hipMemcpyDeviceToHost); 

The HIP kernel language defines builtins for determining grid and block coordinates, math functions, short vectors, atomics, and timer functions. It also specifies additional defines and keywords for function types, address spaces, and optimization controls. (See the HIP Kernel Language for a full description). Here's an example of defining a simple 'vector_square' kernel.

template <typename T>
__global__ void
vector_square(T *C_d, const T *A_d, size_t N)
{
    size_t offset = (hipBlockIdx_x * hipBlockDim_x + hipThreadIdx_x);
    size_t stride = hipBlockDim_x * hipGridDim_x ;

    for (size_t i=offset; i<N; i+=stride) {
        C_d[i] = A_d[i] * A_d[i];
    }
}

The HIP Runtime API code and compute kernel definition can exist in the same source file - HIP takes care of generating host and device code appropriately.

HIP Portability and Compiler Technology

HIP C++ code can be compiled with either :

  • On the NVIDIA CUDA platform, HIP provides header file which translate from the HIP runtime APIs to CUDA runtime APIs. The header file contains mostly inlined functions and thus has very low overhead - developers coding in HIP should expect the same performance as coding in native CUDA. The code is then compiled with nvcc, the standard C++ compiler provided with the CUDA SDK. Developers can use any tools supported by the CUDA SDK including the CUDA profiler and debugger.
  • On the AMD ROCm platform, HIP provides a header and runtime library built on top of hcc compiler. The HIP runtime implements HIP streams, events, and memory APIs, and is a object library that is linked with the application. The source code for all headers and the library implementation is available on GitHub.
    HIP developers on ROCm can use AMD's CodeXL for debugging and profiling.

Thus HIP source code can be compiled to run on either platform. Platform-specific features can be isolated to a specific platform using conditional compilation. Thus HIP provides source portability to either platform. HIP provides the hipcc compiler driver which will call the appropriate toolchain depending on the desired platform.

Examples and Getting Started:

  • A sample and blog that uses hipify to convert a simple app from CUDA to HIP:
cd samples/01_Intro/square
# follow README / blog steps to hipify the application.
  • A sample and blog demonstrating platform specialization:
cd samples/01_Intro/bit_extract
make

More Examples

The GitHub repository HIP-Examples contains a hipified version of the popular Rodinia benchmark suite. The README with the procedures and tips the team used during this porting effort is here: Rodinia Porting Guide

Tour of the HIP Directories

  • include:

    • hip_runtime_api.h : Defines HIP runtime APIs and can be compiled with many standard Linux compilers (hcc, GCC, ICC, CLANG, etc), in either C or C++ mode.
    • hip_runtime.h : Includes everything in hip_runtime_api.h PLUS hipLaunchKernel and syntax for writing device kernels and device functions. hip_runtime.h can only be compiled with hcc.
    • hcc_detail/** , nvcc_detail/** : Implementation details for specific platforms. HIP applications should not include these files directly.
    • hcc.h : Includes interop APIs for HIP and HCC
  • bin: Tools and scripts to help with hip porting

    • hipify : Tool to convert CUDA code to portable CPP. Converts CUDA APIs and kernel builtins.
    • hipcc : Compiler driver that can be used to replace nvcc in existing CUDA code. hipcc will call nvcc or hcc depending on platform, and include appropriate platform-specific headers and libraries.
    • hipconfig : Print HIP configuration (HIP_PATH, HIP_PLATFORM, CXX config flags, etc)
    • hipexamine.sh : Script to scan directory, find all code, and report statistics on how much can be ported with HIP (and identify likely features not yet supported)
  • doc: Documentation - markdown and doxygen info

Reporting an issue

Use the GitHub issue tracker. If reporting a bug, include the output of "hipconfig --full" and samples/1_hipInfo/hipInfo (if possible).