adding API specification
Αυτή η υποβολή περιλαμβάνεται σε:
@@ -0,0 +1,683 @@
|
||||
# ROC Profiler Library Specification
|
||||
```
|
||||
The rocTracer API is agnostic to specific runtime and may trace
|
||||
the runtime API calls and asynchronous GPU activity.
|
||||
```
|
||||
## 1. High level overview
|
||||
```
|
||||
The goal of the implementation is to provide a runtime independent API
|
||||
for tracing of runtime calls and asynchronous activity, like GPU kernel
|
||||
dispatches and memory moves. The tracing includes callback API for
|
||||
runtime API tracing and activity API for asynchronous activity records
|
||||
logging.
|
||||
Depending on particular runtime intercepting mechanism, the rocTracer
|
||||
library can be dynamically linked, dynamically loaded by the runtime as
|
||||
a plugin or some API wrapper can be loaded using LD_PRELOAD.
|
||||
The library has a C API.
|
||||
|
||||
The rocTracer library is an API that intercepts runtime API calls and
|
||||
traces asynchronous activity. The activity tracing results are recorded
|
||||
in a ring buffer.
|
||||
```
|
||||
## 2. General API
|
||||
### 2.1. Description
|
||||
```
|
||||
The library supports method for getting the error number and error string
|
||||
of the last failed library API call. It allows to check the conformance
|
||||
of used library API header and the library binary, the version macros and
|
||||
API methods can be used.
|
||||
Returning the error and error string methods:
|
||||
• roctracer_status_t – error code enumeration
|
||||
• roctracer_error_string – method for returning the error string
|
||||
Library version:
|
||||
• ROCTRACER_VERSION_MAJOR – API major version macro
|
||||
• ROCTRACER_VERSION_MINOR – API minor version macro
|
||||
• roctracer_version_major – library major version
|
||||
• roctracer_version_minor – library minor version
|
||||
```
|
||||
### 2.2. Error codes and error string methods
|
||||
```
|
||||
Error code enumeration
|
||||
typedef enum {
|
||||
ROCTRACER_STATUS_SUCCESS = 0,
|
||||
ROCTRACER_STATUS_ERROR = 1,
|
||||
ROCTRACER_STATUS_UNINIT = 2,
|
||||
ROCTRACER_STATUS_BREAK = 3,
|
||||
ROCTRACER_STATUS_BAD_DOMAIN = 4,
|
||||
ROCTRACER_STATUS_BAD_PARAMETER = 5,
|
||||
ROCTRACER_STATUS_HIP_API_ERR = 6,
|
||||
ROCTRACER_STATUS_HCC_OPS_ERR = 7,
|
||||
ROCTRACER_STATUS_ROCTX_ERR = 8,
|
||||
} roctracer_status_t;
|
||||
|
||||
Return error string:
|
||||
const char* roctracer_error_string();
|
||||
```
|
||||
### 2.3. Library version
|
||||
```
|
||||
The library provides major and minor versions. Major version is for
|
||||
incompatible API changes and minor version for bug fixes.
|
||||
API version macros defined in the library API header ‘roctracer.h’:
|
||||
ROCTRACER_VERSION_MAJOR
|
||||
ROCTRACER_VERSION_MINOR
|
||||
|
||||
Methods to check library major and minor venison:
|
||||
uint32_t roctracer_major_version();
|
||||
uint32_t roctracer_minor_version();
|
||||
```
|
||||
## 3. Frontend API
|
||||
### 3.1. Description
|
||||
```
|
||||
The rocTracer provides support for runtime API callbacks and activity
|
||||
records logging. The APIs of different runtimes at different levels
|
||||
are considered as different API domains with assigned domain IDs. For
|
||||
example, language level and driver level. The API callbacks provide
|
||||
the API calls arguments and are called on two phases on “enter” and
|
||||
on “exit”. The activity records are logged to the ring buffer and can
|
||||
be associated with the respective API calls using the correlation ID.
|
||||
Activity API can be used to enable collecting of the records with
|
||||
timestamping data for API calls and asynchronous activity like the
|
||||
kernel submits, memory copies and barriers
|
||||
Tracing domains:
|
||||
• roctracer_domain_t – runtime API domains, HIP, HSA, etc…
|
||||
• roctracer_op_string – Return Op string by given domain and
|
||||
activity Op code
|
||||
• roctracer_op_code – Return Op code and kind by given string
|
||||
|
||||
Callback API:
|
||||
• roctracer_rtapi_callback_t – runtime API callback type
|
||||
• roctracer_enable_op_callback – enable runtime API callback
|
||||
by domain and Op code
|
||||
• roctracer_enable_domain_callback – enable runtime API callback
|
||||
by domain for all Ops
|
||||
• roctracer_enable_callback – enable runtime API callback for
|
||||
all domains, all Ops
|
||||
• roctracer_disable_op_callback – disable runtime API callback
|
||||
by domain and Op code
|
||||
• roctracer_enable_op_callback – enable runtime API callback
|
||||
by domain for all Ops
|
||||
• roctracer_enable_op_callback – enable runtime API callback for
|
||||
all domains, all Ops
|
||||
|
||||
Activity API:
|
||||
• roctracer_record_t – activity record
|
||||
• roctracer_pool_t – records pool type
|
||||
• roctracer_allocator_t – tracer allocator type
|
||||
• roctracer_buffer_callback_t – pool callback type
|
||||
• roctracer_open_pool[_expl] – create records pool
|
||||
• roctracer_close_pool[_expl] – close records pool
|
||||
• roctracer_default_pool[_expl] – get/set default pool
|
||||
• roctracer_properties_t – tracer properties
|
||||
• roctracer_enable_op_activity[_expl] – enable activity records
|
||||
logging
|
||||
• roctracer_enable_domain_activity[_expl] – enable activity records
|
||||
logging
|
||||
• roctracer_enable_activity[_expl] – enable activity records logging
|
||||
• roctracer_disable_op_activity – disable activity records logging
|
||||
• roctracer_disable_domain_activity – disable activity records
|
||||
logging
|
||||
• roctracer_disable_activity – disable activity records logging
|
||||
• roctracer_flush_activity[_expl] – disable activity records logging
|
||||
• roctracer_next_record – return next record
|
||||
• roctracer_get_timestamp – return correlated GPU/CPU system
|
||||
timestamp
|
||||
```
|
||||
### 3.2. Tracing Domains
|
||||
```
|
||||
Various tracing domains are supported. Each domain is assigned with
|
||||
a domain ID. The domains include HSA, HIP, and HCC runtime levels.
|
||||
Traced API domains:
|
||||
typedef enum {
|
||||
ACTIVITY_DOMAIN_HSA_API = 0, // HSA API domain
|
||||
ACTIVITY_DOMAIN_HSA_OPS = 1, // HSA async activity domain
|
||||
ACTIVITY_DOMAIN_HIP_API = 2, // HIP API domain
|
||||
ACTIVITY_DOMAIN_HIP_OPS = 3, // HIP async activity domain
|
||||
ACTIVITY_DOMAIN_KFD_API = 4, // KFD API domain
|
||||
ACTIVITY_DOMAIN_EXT_API = 5, // External ID domain
|
||||
ACTIVITY_DOMAIN_ROCTX = 6, // ROCTX domain
|
||||
ACTIVITY_DOMAIN_NUMBER = 7
|
||||
} activity_domain_t;
|
||||
|
||||
Return name by given domain and Op code:
|
||||
const char* roctracer_op_string( // NULL returned on error and error number
|
||||
// is set
|
||||
uint32_t domain, // tracing domain
|
||||
uint32_t op, // activity op code
|
||||
uint32_t kind); // activity kind
|
||||
Return Op code and kind by given string:
|
||||
roctracer_status_t roctracer_op_code(
|
||||
uint32_t domain, // tracing domain
|
||||
const char* str, // [in] op string
|
||||
uint32_t* op, // [out] op code
|
||||
uint32_t* kind); // [out] op kind code if not NULL
|
||||
```
|
||||
### 3.3. Callback API
|
||||
```
|
||||
The tracer provides support for runtime API callbacks and activity records
|
||||
logging. The API callbacks provide the API calls arguments and are called
|
||||
on two phases on “enter”, on “exit”.
|
||||
|
||||
API phase passed to the callbacks:
|
||||
typedef enum {
|
||||
ROCTRACER_API_PHASE_ENTER,
|
||||
ROCTRACER_API_PHASE_EXIT,
|
||||
} roctracer_api_phase_t;
|
||||
|
||||
Runtime API callback type:
|
||||
typedef void (*roctracer_rtapi_callback_t)(
|
||||
uint32_t domain, // runtime API domain
|
||||
uint32_t cid, // API call ID
|
||||
const void* data, // [in] callback data with correlation id and the call
|
||||
// arguments
|
||||
void* arg); // [in/out] user passed data
|
||||
|
||||
Enable runtime API callbacks:
|
||||
roctracer_status_t roctracer_enable_op_callback(
|
||||
activity_domain_t domain, // tracing domain
|
||||
uint32_t op, // API call ID
|
||||
activity_rtapi_callback_t callback, // callback function pointer
|
||||
void* arg); // [in/out] callback arg
|
||||
|
||||
roctracer_status_t roctracer_enable_domain_callback(
|
||||
activity_domain_t domain, // tracing domain
|
||||
activity_rtapi_callback_t callback, // callback function pointer
|
||||
void* arg); // [in/out] callback arg
|
||||
|
||||
|
||||
roctracer_status_t roctracer_enable_callback(
|
||||
activity_rtapi_callback_t callback, // callback function pointer
|
||||
void* arg); // [in/out] callback arg
|
||||
|
||||
Disable runtime API callbacks:
|
||||
roctracer_status_t roctracer_disable_op_callback(
|
||||
activity_domain_t domain, // tracing domain
|
||||
uint32_t op); // API call ID
|
||||
|
||||
roctracer_status_t roctracer_disable_domain_callback(
|
||||
activity_domain_t domain); // tracing domain
|
||||
|
||||
roctracer_status_t roctracer_disable_callback();
|
||||
```
|
||||
### 3.4 Activity API
|
||||
```
|
||||
The activity records are asynchronously logged to the pool and can be
|
||||
associated with the respective API callbacks using the correlation ID.
|
||||
Activity API can be used to enable collecting the records with
|
||||
timestamp data for API calls and GPU activity like kernel submits,
|
||||
memory copies, and barriers.
|
||||
|
||||
// Correlation id
|
||||
typedef uint64_t activity_correlation_id_t;
|
||||
|
||||
Activity record type:
|
||||
|
||||
// Activity record type
|
||||
struct activity_record_t {
|
||||
uint32_t domain; // activity domain id
|
||||
activity_kind_t kind; // activity kind
|
||||
activity_op_t op; // activity op
|
||||
activity_correlation_id_t correlation_id; // activity ID
|
||||
uint64_t begin_ns; // host begin timestamp
|
||||
uint64_t end_ns; // host end timestamp
|
||||
union {
|
||||
struct {
|
||||
int device_id; // device id
|
||||
uint64_t queue_id; // queue id
|
||||
};
|
||||
struct {
|
||||
uint32_t process_id; // device id
|
||||
uint32_t thread_id; // thread id
|
||||
};
|
||||
struct {
|
||||
activity_correlation_id_t external_id; // external correlation id
|
||||
};
|
||||
};
|
||||
size_t bytes; // data size bytes
|
||||
};
|
||||
|
||||
Return next record:
|
||||
static inline int roctracer_next_record(
|
||||
const activity_record_t* record, // [in] record ptr
|
||||
const activity_record_t** next); // [out] next record ptr
|
||||
|
||||
Tracer allocator type:
|
||||
typedef void (*roctracer_allocator_t)(
|
||||
char** ptr, // memory pointer
|
||||
size_t size, // memory size
|
||||
void* arg); // allocator arg
|
||||
|
||||
Pool callback type:
|
||||
typedef void (*roctracer_buffer_callback_t)(
|
||||
const char* begin, // [in] available buffered trace records
|
||||
const char* end, // [in] end of buffered trace records
|
||||
void* arg); // [in/out] callback arg
|
||||
|
||||
Tracer properties:
|
||||
typedef struct {
|
||||
uint32_t mode; // roctracer mode
|
||||
size_t buffer_size; // buffer size
|
||||
// power of 2
|
||||
roctracer_allocator_t alloc_fun; // memory allocator
|
||||
// function pointer
|
||||
void* alloc_arg; // memory allocator
|
||||
// function pointer
|
||||
roctracer_buffer_callback_t buffer_callback_fun; // tracer record
|
||||
// callback function
|
||||
void* buffer_callback_arg; // tracer record
|
||||
// callback arg
|
||||
} roctracer_properties_t;
|
||||
|
||||
Tracer memory pool handle type:
|
||||
typedef void roctracer_pool_t;
|
||||
|
||||
Create tracer memory pool:
|
||||
roctracer_status_t roctracer_open_pool(
|
||||
const roctracer_properties_t* properties); // tracer pool properties
|
||||
|
||||
roctracer_status_t roctracer_open_pool_expl(
|
||||
const roctracer_properties_t* properties, // tracer pool properties
|
||||
roctracer_pool_t** pool); // [out] returns tracer pool if
|
||||
// not NULL, otherwise sets the
|
||||
// default one if it is not set
|
||||
// yet; otherwise the error is
|
||||
// generated
|
||||
|
||||
Close tracer memory pool:
|
||||
roctracer_status_t roctracer_close_pool();
|
||||
|
||||
roctracer_status_t roctracer_close_pool_expl(
|
||||
roctracer_pool_t* pool); // memory pool, NULL means default pool
|
||||
|
||||
Return current default pool. Set new default pool if the argument is not NULL:
|
||||
roctracer_pool_t* roctracer_default_pool();
|
||||
|
||||
roctracer_pool_t* roctracer_default_pool_expl(
|
||||
roctracer_pool_t* pool); // new default pool if not NULL
|
||||
```
|
||||
Enable activity records logging:
|
||||
```
|
||||
roctracer_status_t roctracer_enable_op_activity(
|
||||
activity_domain_t domain, // tracing domain
|
||||
uint32_t op); // activity op ID
|
||||
|
||||
roctracer_status_t roctracer_enable_op_activity_expl(
|
||||
activity_domain_t domain, // tracing domain
|
||||
uint32_t op, // activity op ID
|
||||
roctracer_pool_t* pool); // memory pool, NULL means default pool
|
||||
|
||||
roctracer_status_t roctracer_enable_domain_activity(
|
||||
activity_domain_t domain); // tracing domain
|
||||
|
||||
roctracer_status_t roctracer_enable_domain_activity_expl(
|
||||
activity_domain_t domain, // tracing domain
|
||||
roctracer_pool_t* pool); // memory pool, NULL means default pool
|
||||
|
||||
roctracer_status_t roctracer_enable_activity();
|
||||
|
||||
roctracer_status_t roctracer_enable_activity_expl(
|
||||
roctracer_pool_t* pool); // memory pool, NULL means default pool
|
||||
|
||||
Disable activity records logging:
|
||||
roctracer_status_t roctracer_disable_op_activity(
|
||||
activity_domain_t domain, // tracing domain
|
||||
uint32_t op); // activity op ID
|
||||
|
||||
roctracer_status_t roctracer_disable_domain_activity(
|
||||
activity_domain_t domain); // tracing domain
|
||||
|
||||
roctracer_status_t roctracer_disable_activity();
|
||||
|
||||
Flush available activity records:
|
||||
roctracer_status_t roctracer_flush_activity();
|
||||
|
||||
roctracer_status_t roctracer_flush_activity_expl(
|
||||
roctracer_pool_t* pool); // memory pool, NULL means default pool
|
||||
|
||||
Return correlated GPU/CPU system timestamp:
|
||||
roctracer_status_t roctracer_get_timestamp(
|
||||
uint64_t* timestamp); // [out] return timestamp
|
||||
```
|
||||
## 4. rocTracer Usage Code Examples
|
||||
### 4.1. HIP API and HCC ops, GPU Activity Tracing
|
||||
```
|
||||
#include <inc/roctracer_hip.h>
|
||||
#include <inc/roctracer_hcc.h>
|
||||
|
||||
// HIP API callback function
|
||||
void hip_api_callback(
|
||||
uint32_t domain,
|
||||
uint32_t cid,
|
||||
const void* callback_data,
|
||||
void* arg)
|
||||
{
|
||||
(void)arg;
|
||||
const hip_api_data_t* data = reinterpret_cast <const hip_api_data_t*>
|
||||
(callback_data);
|
||||
fprintf(stdout, "<%s id(%u)\tcorrelation_id(%lu) %s> ",
|
||||
roctracer_id_string(ACTIVITY_DOMAIN_HIP_API, cid),
|
||||
cid,
|
||||
data->correlation_id,
|
||||
(data->phase == ACTIVITY_API_PHASE_ENTER) ? "on-enter" : "on-exit");
|
||||
<some code . . .>
|
||||
}
|
||||
|
||||
// Activity tracing callback
|
||||
void activity_callback(const char* begin, const char* end, void* arg) {
|
||||
const roctracer_record_t* record = reinterpret_cast<const
|
||||
roctracer_record_t*>(begin);
|
||||
const roctracer_record_t* end_record = reinterpret_cast<const
|
||||
roctracer_record_t*>(end);
|
||||
fprintf(stdout, "\tActivity records:\n");
|
||||
while (record < end_record) {
|
||||
const char * name = roctracer_op_string(record->domain,
|
||||
record->activity_id, 0);
|
||||
fprintf(stdout, "\t%s\tcorrelation_id(%lu) time_ns(%lu:%lu)
|
||||
device_id(%d) stream_id(%lu)\n",
|
||||
name,
|
||||
record->correlation_id,
|
||||
record->begin_ns,
|
||||
record->end_ns,
|
||||
record->device_id,
|
||||
record->stream_id
|
||||
);
|
||||
<some code . . .>
|
||||
ROCTRACER_CALL(roctracer_next_record(record, &record));
|
||||
}
|
||||
}
|
||||
|
||||
int main() {
|
||||
// Allocating tracing pool
|
||||
roctracer_properties_t properties{};
|
||||
properties.buffer_size = 12;
|
||||
properties.buffer_callback_fun = activity_callback;
|
||||
ROCTRACER_CALL(roctracer_open_pool(&properties));
|
||||
|
||||
// Enable HIP API callbacks. HIP_API_ID_ANY can be used to trace all HIP
|
||||
// API calls.
|
||||
ROCTRACER_CALL(roctracer_enable_op_callback(ACTIVITY_DOMAIN_HIP_API,
|
||||
HIP_API_ID_hipModuleLaunchKernel,
|
||||
hip_api_callback, NULL));
|
||||
ROCTRACER_CALL(roctracer_enable_op_acticity(ACTIVITY_DOMAIN_HIP_API,
|
||||
HIP_API_ID_hipModuleLaunchKernel));
|
||||
// Enable HIP kernel dispatch activity tracing
|
||||
ROCTRACER_CALL(roctracer_enable_op_activity(ACTIVITY_DOMAIN_HCC_OPS,
|
||||
hc::HSA_OP_ID_DISPATCH));
|
||||
|
||||
<test code>
|
||||
|
||||
// Disable tracing and closing the pool
|
||||
ROCTRACER_CALL(roctracer_disable_callback());
|
||||
ROCTRACER_CALL(roctracer_disable_activity());
|
||||
ROCTRACER_CALL(roctracer_close_pool());
|
||||
}
|
||||
```
|
||||
### 4.2. MatrixTranspose HIP sample with all APIs/activity tracing enabled
|
||||
```
|
||||
This shows a MatrixTranspose HIP sample with enabled tracing of
|
||||
all HIP API and all GPU asynchronous activity.
|
||||
/*
|
||||
Copyright (c) 2015-2016 Advanced Micro Devices, Inc. All rights reserved.
|
||||
|
||||
Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
of this software and associated documentation files (the "Software"), to deal
|
||||
in the Software without restriction, including without limitation the rights
|
||||
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
||||
copies of the Software, and to permit persons to whom the Software is
|
||||
furnished to do so, subject to the following conditions:
|
||||
|
||||
The above copyright notice and this permission notice shall be included in
|
||||
all copies or substantial portions of the Software.
|
||||
|
||||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
|
||||
THE SOFTWARE.
|
||||
*/
|
||||
|
||||
#include <iostream>
|
||||
|
||||
// hip header file
|
||||
#include <hip/hip_runtime.h>
|
||||
|
||||
#ifndef ITERATIONS
|
||||
# define ITERATIONS 100
|
||||
#endif
|
||||
#define WIDTH 1024
|
||||
|
||||
|
||||
#define NUM (WIDTH * WIDTH)
|
||||
|
||||
#define THREADS_PER_BLOCK_X 4
|
||||
#define THREADS_PER_BLOCK_Y 4
|
||||
#define THREADS_PER_BLOCK_Z 1
|
||||
|
||||
// Device (Kernel) function, it must be void
|
||||
// hipLaunchParm provides the execution configuration
|
||||
__global__ void matrixTranspose(hipLaunchParm lp, float* out, float* in,
|
||||
const int width) {
|
||||
int x = hipBlockDim_x * hipBlockIdx_x + hipThreadIdx_x;
|
||||
int y = hipBlockDim_y * hipBlockIdx_y + hipThreadIdx_y;
|
||||
|
||||
out[y * width + x] = in[x * width + y];
|
||||
}
|
||||
|
||||
// CPU implementation of matrix transpose
|
||||
void matrixTransposeCPUReference(float* output, float* input, const unsigned
|
||||
int width) {
|
||||
for (unsigned int j = 0; j < width; j++) {
|
||||
for (unsigned int i = 0; i < width; i++) {
|
||||
output[i * width + j] = input[j * width + i];
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
int iterations = ITERATIONS;
|
||||
void start_tracing();
|
||||
void stop_tracing();
|
||||
|
||||
int main() {
|
||||
float* Matrix;
|
||||
float* TransposeMatrix;
|
||||
float* cpuTransposeMatrix;
|
||||
|
||||
float* gpuMatrix;
|
||||
float* gpuTransposeMatrix;
|
||||
|
||||
hipDeviceProp_t devProp;
|
||||
hipGetDeviceProperties(&devProp, 0);
|
||||
|
||||
std::cout << "Device name " << devProp.name << std::endl;
|
||||
|
||||
int i;
|
||||
int errors;
|
||||
|
||||
while (iterations-- > 0) {
|
||||
start_tracing();
|
||||
|
||||
Matrix = (float*)malloc(NUM * sizeof(float));
|
||||
TransposeMatrix = (float*)malloc(NUM * sizeof(float));
|
||||
cpuTransposeMatrix = (float*)malloc(NUM * sizeof(float));
|
||||
|
||||
// initialize the input data
|
||||
for (i = 0; i < NUM; i++) {
|
||||
Matrix[i] = (float)i * 10.0f;
|
||||
}
|
||||
|
||||
// allocate the memory on the device side
|
||||
hipMalloc((void**)&gpuMatrix, NUM * sizeof(float));
|
||||
hipMalloc((void**)&gpuTransposeMatrix, NUM * sizeof(float));
|
||||
|
||||
// Memory transfer from host to device
|
||||
hipMemcpy(gpuMatrix, Matrix, NUM * sizeof(float),
|
||||
hipMemcpyHostToDevice);
|
||||
|
||||
// Lauching kernel from host
|
||||
hipLaunchKernel(matrixTranspose,
|
||||
dim3(WIDTH / THREADS_PER_BLOCK_X, WIDTH /
|
||||
THREADS_PER_BLOCK_Y),
|
||||
dim3(THREADS_PER_BLOCK_X, THREADS_PER_BLOCK_Y), 0, 0,
|
||||
gpuTransposeMatrix, gpuMatrix, WIDTH);
|
||||
|
||||
// Memory transfer from device to host
|
||||
hipMemcpy(TransposeMatrix, gpuTransposeMatrix, NUM * sizeof(float),
|
||||
hipMemcpyDeviceToHost);
|
||||
|
||||
// CPU MatrixTranspose computation
|
||||
matrixTransposeCPUReference(cpuTransposeMatrix, Matrix, WIDTH);
|
||||
|
||||
// verify the results
|
||||
errors = 0;
|
||||
double eps = 1.0E-6;
|
||||
for (i = 0; i < NUM; i++) {
|
||||
if (std::abs(TransposeMatrix[i] - cpuTransposeMatrix[i]) > eps) {
|
||||
errors++;
|
||||
}
|
||||
}
|
||||
if (errors != 0) {
|
||||
printf("FAILED: %d errors\n", errors);
|
||||
} else {
|
||||
printf("PASSED!\n");
|
||||
}
|
||||
|
||||
// free the resources on device side
|
||||
hipFree(gpuMatrix);
|
||||
hipFree(gpuTransposeMatrix);
|
||||
|
||||
// free the resources on host side
|
||||
free(Matrix);
|
||||
free(TransposeMatrix);
|
||||
free(cpuTransposeMatrix);
|
||||
|
||||
stop_tracing();
|
||||
}
|
||||
|
||||
return errors;
|
||||
}
|
||||
|
||||
/////////////////////////////////////////////////////////////////////////////
|
||||
// HIP/HCC Callbacks/Activity tracing
|
||||
/////////////////////////////////////////////////////////////////////////////
|
||||
#include <inc/roctracer_hip.h>
|
||||
#include <inc/roctracer_hcc.h>
|
||||
|
||||
// Macro to check ROC-tracer calls status
|
||||
#define ROCTRACER_CALL(call) \
|
||||
do { \
|
||||
int err = call; \
|
||||
if (err != 0) { \
|
||||
std::cerr << roctracer_error_string() << std::endl << std::flush; \
|
||||
abort(); \
|
||||
} \
|
||||
} while (0)
|
||||
|
||||
// HIP API callback function
|
||||
void hip_api_callback(
|
||||
uint32_t domain,
|
||||
uint32_t cid,
|
||||
const void* callback_data,
|
||||
void* arg)
|
||||
{
|
||||
(void)arg;
|
||||
const hip_api_data_t* data = reinterpret_cast<const hip_api_data_t*>
|
||||
(callback_data);
|
||||
fprintf(stdout, "<%s id(%u)\tcorrelation_id(%lu) %s> ",
|
||||
roctracer_op_string(ACTIVITY_DOMAIN_HIP_API, cid, 0),
|
||||
cid,
|
||||
data->correlation_id,
|
||||
(data->phase == ACTIVITY_API_PHASE_ENTER) ? "on-enter" : "on-exit");
|
||||
if (data->phase == ACTIVITY_API_PHASE_ENTER) {
|
||||
switch (cid) {
|
||||
case HIP_API_ID_hipMemcpy:
|
||||
fprintf(stdout, "dst(%p) src(%p) size(0x%x) kind(%u)",
|
||||
data->args.hipMemcpy.dst,
|
||||
data->args.hipMemcpy.src,
|
||||
(uint32_t)(data->args.hipMemcpy.sizeBytes),
|
||||
(uint32_t)(data->args.hipMemcpy.kind));
|
||||
break;
|
||||
case HIP_API_ID_hipMalloc:
|
||||
fprintf(stdout, "ptr(%p) size(0x%x)",
|
||||
data->args.hipMalloc.ptr,
|
||||
(uint32_t)(data->args.hipMalloc.size));
|
||||
break;
|
||||
case HIP_API_ID_hipFree:
|
||||
fprintf(stdout, "ptr(%p),
|
||||
data->args.hipFree.ptr);
|
||||
break;
|
||||
case HIP_API_ID_hipModuleLaunchKernel:
|
||||
fprintf(stdout, "kernel(\"%s\") stream(%p)",
|
||||
hipKernelNameRef(data->args.hipModuleLaunchKernel.f),
|
||||
data->args.hipModuleLaunchKernel.stream);
|
||||
break;
|
||||
default:
|
||||
break;
|
||||
}
|
||||
} else {
|
||||
switch (cid) {
|
||||
case HIP_API_ID_hipMalloc:
|
||||
fprintf(stdout, "*ptr(0x%p)",
|
||||
*(data->args.hipMalloc.ptr));
|
||||
break;
|
||||
default:
|
||||
break;
|
||||
}
|
||||
}
|
||||
fprintf(stdout, "\n"); fflush(stdout);
|
||||
}
|
||||
|
||||
// Activity tracing callback
|
||||
// hipMalloc id(3) correlation_id(1):
|
||||
// begin_ns(1525888652762640464) end_ns(1525888652762877067)
|
||||
void activity_callback(const char* begin, const char* end, void* arg) {
|
||||
const roctracer_record_t* record = reinterpret_cast
|
||||
<const roctracer_record_t*>(begin);
|
||||
const roctracer_record_t* end_record = reinterpret_cast
|
||||
<const roctracer_record_t*>(end);
|
||||
fprintf(stdout, "\tActivity records:\n"); fflush(stdout);
|
||||
while (record < end_record) {
|
||||
const char * name = roctracer_op_string(record->domain,
|
||||
record->activity_id, 0);
|
||||
fprintf(stdout, "\t%s\tcorrelation_id(%lu) time_ns(%lu:%lu) \
|
||||
device_id(%d) stream_id(%lu)",
|
||||
name,
|
||||
record->correlation_id,
|
||||
record->begin_ns,
|
||||
record->end_ns,
|
||||
record->device_id,
|
||||
record->stream_id
|
||||
);
|
||||
if (record->kind == hc::HSA_OP_ID_COPY)
|
||||
fprintf(stdout, " bytes(0x%zx)", record->bytes);
|
||||
fprintf(stdout, "\n");
|
||||
fflush(stdout);
|
||||
ROCTRACER_CALL(roctracer_next_record(record, &record));
|
||||
}
|
||||
}
|
||||
|
||||
// Start tracing routine
|
||||
void start_tracing() {
|
||||
std::cout << "# START #############################" << std::endl
|
||||
<< std::flush;
|
||||
// Allocating tracing pool
|
||||
roctracer_properties_t properties{};
|
||||
properties.buffer_size = 0x1000;
|
||||
properties.buffer_callback_fun = activity_callback;
|
||||
ROCTRACER_CALL(roctracer_open_pool(&properties));
|
||||
// Enable API callbacks, all domains
|
||||
ROCTRACER_CALL(roctracer_enable_callback(hip_api_callback, NULL));
|
||||
// Enable activity tracing, all domains
|
||||
ROCTRACER_CALL(roctracer_enable_activity());
|
||||
}
|
||||
|
||||
// Stop tracing routine
|
||||
void stop_tracing() {
|
||||
ROCTRACER_CALL(roctracer_disable_api_callback());
|
||||
ROCTRACER_CALL(roctracer_disable_api_activity());
|
||||
ROCTRACER_CALL(roctracer_close_pool());
|
||||
std::cout << "# STOP #############################" << std::endl
|
||||
<< std::flush;
|
||||
}
|
||||
/////////////////////////////////////////////////////////////////////////////
|
||||
```
|
||||
@@ -29,9 +29,10 @@ THE SOFTWARE.
|
||||
typedef enum {
|
||||
ACTIVITY_DOMAIN_HSA_API = 0, // HSA API domain
|
||||
ACTIVITY_DOMAIN_HSA_OPS = 1, // HSA async activity domain
|
||||
ACTIVITY_DOMAIN_HCC_OPS = 2, // HCC async activity domain
|
||||
ACTIVITY_DOMAIN_HIP_OPS = 2, // HIP async activity domain
|
||||
ACTIVITY_DOMAIN_HCC_OPS = ACTIVITY_DOMAIN_HIP_OPS, // HCC async activity domain
|
||||
ACTIVITY_DOMAIN_HIP_VDI = ACTIVITY_DOMAIN_HIP_OPS, // HIP VDI async activity domain
|
||||
ACTIVITY_DOMAIN_HIP_API = 3, // HIP API domain
|
||||
ACTIVITY_DOMAIN_HIP_VDI = ACTIVITY_DOMAIN_HCC_OPS, // HIP VDI domain
|
||||
ACTIVITY_DOMAIN_KFD_API = 4, // KFD API domain
|
||||
ACTIVITY_DOMAIN_EXT_API = 5, // External ID domain
|
||||
ACTIVITY_DOMAIN_ROCTX = 6, // ROCTX domain
|
||||
|
||||
@@ -245,6 +245,10 @@ roctracer_status_t roctracer_flush_activity()
|
||||
return roctracer_flush_activity_expl(NULL);
|
||||
}
|
||||
|
||||
// Get system timestamp
|
||||
roctracer_status_t roctracer_get_timestamp(
|
||||
uint64_t* timestamp); // [out] return timestamp
|
||||
|
||||
// Load/Unload methods
|
||||
// Set properties
|
||||
roctracer_status_t roctracer_set_properties(
|
||||
@@ -260,10 +264,6 @@ bool roctracer_load(
|
||||
|
||||
void roctracer_unload(bool destruct);
|
||||
|
||||
// Get system timestamp.
|
||||
roctracer_status_t roctracer_get_timestamp(
|
||||
uint64_t* timestamp);
|
||||
|
||||
#ifdef __cplusplus
|
||||
} // extern "C" block
|
||||
#endif // __cplusplus
|
||||
|
||||
Αναφορά σε νέο ζήτημα
Block a user