From 6eb1d2d52f2e14fc9e020eac11676b1920495cf5 Mon Sep 17 00:00:00 2001
From: Sandeep Kumar
Date: Wed, 13 Sep 2017 12:57:37 +0530
Subject: [PATCH] Add more info for inline asm in hip kernel guide and cookbook readme

---
 hipamd/docs/markdown/hip_kernel_language.md       | 12 +++++++++++-
 hipamd/samples/2_Cookbook/10_inline_asm/Readme.md | 15 ++++++++++++++-
 2 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/hipamd/docs/markdown/hip_kernel_language.md b/hipamd/docs/markdown/hip_kernel_language.md
index a4f4d5d47f..094d7531e8 100644
--- a/hipamd/docs/markdown/hip_kernel_language.md
+++ b/hipamd/docs/markdown/hip_kernel_language.md
@@ -699,8 +699,18 @@ for (int i=0; i<16; i++) ...
 ## In-Line Assembly
-In-line assembly, including in-line PTX, in-line HSAIL and in-line GCN ISA, is not supported. Users who need these features should employ conditional compilation to provide different functionally equivalent implementations on each target platform.
+GCN ISA in-line assembly is supported. For example:
+```
+asm volatile ("v_mac_f32_e32 %0, %2, %3" : "=v" (out[i]) : "0"(out[i]), "v" (a), "v" (in[i]));
+```
+
+We insert the GCN ISA into the kernel using the `asm()` assembler statement.
+The `volatile` keyword tells the optimizers not to change the number of volatile operations or their order of execution relative to other volatile operations.
+`v_mac_f32_e32` is the GCN instruction; for more information, refer to the [AMD GCN3 ISA architecture manual](http://gpuopen.com/compute-product/amd-gcn3-isa-architecture-manual/).
+Each operand is referenced by `%` followed by its position in the ordered list of operands.
+`"v"` is the AMDGPU target-specific constraint code for a 32-bit VGPR register; for more information, refer to the [Supported Constraint Code List for AMDGPU](https://llvm.org/docs/LangRef.html#supported-constraint-code-list).
+Output constraints are specified by an `"="` prefix, as shown above (`"=v"`). This indicates that the assembly will write to this operand, and the operand will then be made available as a return value of the asm expression. Input constraints do not have a prefix, just the constraint code. The constraint string `"0"` says to use the register assigned to the output as an input as well (it being the 0th constraint).
 ## C++ Support
 The following C++ features are not supported:
diff --git a/hipamd/samples/2_Cookbook/10_inline_asm/Readme.md b/hipamd/samples/2_Cookbook/10_inline_asm/Readme.md
index 0e64fe9c6e..7d0301bc74 100644
--- a/hipamd/samples/2_Cookbook/10_inline_asm/Readme.md
+++ b/hipamd/samples/2_Cookbook/10_inline_asm/Readme.md
@@ -27,10 +27,23 @@ We will be using the Simple Matrix Transpose application from the our very first
 ## asm() Assembler statement
-We insert the GCN isa into the kernel using asm() Assembler statement. In the same sourcecode, we used for MatrixTranspose. We'll add the following:
+In the same source code we used for MatrixTranspose, we'll add the following:
 ` asm volatile ("v_mov_b32_e32 %0, %1" : "=v" (out[x*width + y]) : "v" (in[y*width + x])); `
+GCN ISA in-line assembly is supported. For example:
+
+```
+asm volatile ("v_mac_f32_e32 %0, %2, %3" : "=v" (out[i]) : "0"(out[i]), "v" (a), "v" (in[i]));
+```
+
+We insert the GCN ISA into the kernel using the `asm()` assembler statement.
+The `volatile` keyword tells the optimizers not to change the number of volatile operations or their order of execution relative to other volatile operations.
+`v_mac_f32_e32` is the GCN instruction; for more information, refer to the [AMD GCN3 ISA architecture manual](http://gpuopen.com/compute-product/amd-gcn3-isa-architecture-manual/).
+Each operand is referenced by `%` followed by its position in the ordered list of operands.
+`"v"` is the AMDGPU target-specific constraint code for a 32-bit VGPR register; for more information, refer to the [Supported Constraint Code List for AMDGPU](https://llvm.org/docs/LangRef.html#supported-constraint-code-list).
+Output constraints are specified by an `"="` prefix, as shown above (`"=v"`). This indicates that the assembly will write to this operand, and the operand will then be made available as a return value of the asm expression. Input constraints do not have a prefix, just the constraint code. The constraint string `"0"` says to use the register assigned to the output as an input as well (it being the 0th constraint).
+
 ## How to build and run:
 Use the make command and execute it using ./exe
 Use hipcc to build the application, which is using hcc on AMD and nvcc on nvidia.
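
Reviewer note: as a cross-check for the `v_mac_f32_e32` example added by this patch, the operand mapping can be sketched in plain host-side C++. This is only an illustrative reference, not part of the patch or of HIP itself. The names `mac_reference` and `mac_reference_all` are hypothetical, and the sketch assumes the instruction behaves as a multiply-accumulate into its destination, ignoring any rounding or denormal differences between CPU and GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference for the inline-asm example:
//   asm volatile ("v_mac_f32_e32 %0, %2, %3"
//                 : "=v"(out[i]) : "0"(out[i]), "v"(a), "v"(in[i]));
// Operand numbers follow the order of the constraint list:
//   %0 -> out[i]  (output, "=v")
//   %1 -> out[i]  (input tied to %0 by the "0" constraint)
//   %2 -> a       (input, "v")
//   %3 -> in[i]   (input, "v")
// Assumed semantics: destination = source0 * source1 + destination.
inline float mac_reference(float acc, float a, float x) {
    return a * x + acc;
}

// Applies the same update to every index, as one GPU thread per element would.
inline void mac_reference_all(std::vector<float>& out, float a,
                              const std::vector<float>& in) {
    assert(out.size() == in.size());
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = mac_reference(out[i], a, in[i]);
    }
}
```

Comparing a kernel's output against a reference like this is one way to confirm that the `"0"` tie really feeds the old value of `out[i]` into the accumulation rather than an uninitialized register.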