Updated links in documentation. (#328)

Updated to reflect new GitHub organization.
Fixed broken links to GitHub pages.

Signed-off-by: David Galiffi <David.Galiffi@amd.com>

[ROCm/rocprofiler-compute commit: f5712875aa]
David Galiffi
2024-03-21 11:14:37 -04:00
Committed by GitHub
Parent 8943f686f5
Commit 376a86d2d0
19 changed files with 47 additions and 47 deletions
+1 -1
@@ -29,7 +29,7 @@ project(
VERSION ${OMNIPERF_VERSION}
LANGUAGES C
DESCRIPTION "OmniPerf"
HOMEPAGE_URL "https://github.com/AMDResearch/omniperf")
HOMEPAGE_URL "https://github.com/ROCm/omniperf")
include(ExternalProject)
include(GNUInstallDirs)
+6 -6
@@ -1,11 +1,11 @@
## How to fork from us
To keep our development fast and conflict free, we recommend you to [fork](https://github.com/AMDResearch/omniperf/fork) our repository and start your work from our `dev` branch in your private repository.
To keep our development fast and conflict free, we recommend you to [fork](https://github.com/ROCm/omniperf/fork) our repository and start your work from our `dev` branch in your private repository.
Afterwards, git clone your repository to your local machine. But that is not it! To keep track of the original develop repository, add it as another remote.
```
git remote add mainline https://github.com/AMDResearch/omniperf.git
git remote add mainline https://github.com/ROCm/omniperf.git
git checkout dev
```
@@ -21,19 +21,19 @@ and apply your changes there.
### Did you find a bug?
- Ensure the bug was not already reported by searching on GitHub under [Issues](https://github.com/AMDResearch/omniperf/issues).
- Ensure the bug was not already reported by searching on GitHub under [Issues](https://github.com/ROCm/omniperf/issues).
- If you're unable to find an open issue addressing the problem, [open a new one](https://github.com/AMDResearch/omniperf/issues/new).
- If you're unable to find an open issue addressing the problem, [open a new one](https://github.com/ROCm/omniperf/issues/new).
### Did you write a patch that fixes a bug?
- Open a new GitHub [pull request](https://github.com/AMDResearch/omniperf/compare) with the patch.
- Open a new GitHub [pull request](https://github.com/ROCm/omniperf/compare) with the patch.
- Ensure the PR description clearly describes the problem and solution. If there is an existing GitHub issue open describing this bug, please include it in the description so we can close it.
- Ensure the PR is based on the `dev` branch of the Omniperf GitHub repository.
- Omniperf requires new commits to include a "Signed-off-by" token in the commit message (typically enabled via the `git commit -s` option), indicating your agreement to the projects's [Developer's Certificate of Origin](https://developercertificate.org/) and compatability with the project [LICENSE](https://github.com/AMDResearch/omniperf/blob/main/LICENSE):
- Omniperf requires new commits to include a "Signed-off-by" token in the commit message (typically enabled via the `git commit -s` option), indicating your agreement to the projects's [Developer's Certificate of Origin](https://developercertificate.org/) and compatability with the project [LICENSE](https://github.com/ROCm/omniperf/blob/main/LICENSE):
> (a) The contribution was created in whole or in part by me and I
+5 -5
@@ -1,7 +1,7 @@
[![Ubuntu 20.04](https://github.com/AMDResearch/omniperf/actions/workflows/ubuntu-focal.yml/badge.svg)](https://github.com/AMDResearch/omniperf/actions/workflows/ubuntu-focal.yml)
[![RHEL 8](https://github.com/AMDResearch/omniperf/actions/workflows/opensuse.yml/badge.svg)](https://github.com/AMDResearch/omniperf/actions/workflows/opensuse.yml)
[![MI100](https://github.com/AMDResearch/omniperf/actions/workflows/mi100.yml/badge.svg?branch=2.x)](https://github.com/AMDResearch/omniperf/actions/workflows/mi100.yml)
[![Docs](https://github.com/AMDResearch/omniperf/actions/workflows/docs.yml/badge.svg)](https://amdresearch.github.io/omniperf/)
[![Ubuntu 20.04](https://github.com/ROCm/omniperf/actions/workflows/ubuntu-focal.yml/badge.svg)](https://github.com/ROCm/omniperf/actions/workflows/ubuntu-focal.yml)
[![RHEL 8](https://github.com/ROCm/omniperf/actions/workflows/opensuse.yml/badge.svg)](https://github.com/ROCm/omniperf/actions/workflows/opensuse.yml)
[![MI100](https://github.com/ROCm/omniperf/actions/workflows/mi100.yml/badge.svg?branch=2.x)](https://github.com/ROCm/omniperf/actions/workflows/mi100.yml)
[![Docs](https://github.com/ROCm/omniperf/actions/workflows/docs.yml/badge.svg)](https://rocm.github.io/omniperf/)
[![DOI](https://zenodo.org/badge/561919887.svg)](https://zenodo.org/badge/latestdoi/561919887)
@@ -14,7 +14,7 @@ targets usage on MI100 and MI200 accelerators.
* For more information on available features, installation steps, and
workload profiling and analysis, please refer to the online
[documentation](https://amdresearch.github.io/omniperf).
[documentation](https://rocm.github.io/omniperf).
* Omniperf is an AMD open source research project and is not supported
as part of the ROCm software stack. We welcome contributions and
+1 -1
@@ -12,7 +12,7 @@ whatis("Name: omniperf")
whatis("Version: @OMNIPERF_FULL_VERSION@")
whatis("Keywords: Profiling, Performance, GPU")
whatis("Description: tool for GPU performance profiling")
whatis("URL: https://github.com/AMDResearch/omniperf")
whatis("URL: https://github.com/ROCm/omniperf")
-- Export environmental variables
local topDir="@CMAKE_INSTALL_PREFIX@"
+6 -6
@@ -12,7 +12,7 @@ While analyzing with the CLI offers quick and straightforward access to Omniperf
See sections below for more information on each.
## CLI Analysis
> Profiling results from the [aforementioned vcopy workload](https://amdresearch.github.io/omniperf/profiling.html#workload-compilation) will be used in the following sections to demonstrate the use of Omniperf in MI GPU performance analysis. Unless otherwise noted, the performance analysis is done on the MI200 platform.
> Profiling results from the [aforementioned vcopy workload](https://rocm.github.io/omniperf/profiling.html#workload-compilation) will be used in the following sections to demonstrate the use of Omniperf in MI GPU performance analysis. Unless otherwise noted, the performance analysis is done on the MI200 platform.
### Features
@@ -171,7 +171,7 @@ $ omniperf analyze -p workloads/vcopy/mi200/ --list-metrics gfx90a
├─────────┼─────────────────────────────┤
...
```
2. Choose your own customized subset of metrics with `-b` (a.k.a. `--metric`), or build your own config following [config_template](https://github.com/AMDResearch/omniperf/blob/main/src/omniperf_analyze/configs/panel_config_template.yaml). Below shows how to generate a report containing only metric 2 (a.k.a. System Speed-of-Light).
2. Choose your own customized subset of metrics with `-b` (a.k.a. `--metric`), or build your own config following [config_template](https://github.com/ROCm/omniperf/blob/main/src/omniperf_analyze/configs/panel_config_template.yaml). Below shows how to generate a report containing only metric 2 (a.k.a. System Speed-of-Light).
```shell-session
$ omniperf analyze -p workloads/vcopy/mi200/ -b 2
--------
@@ -351,7 +351,7 @@ be generated directly from the command-line. This option is provided
as an alternative for users wanting to explore profiling results
graphically, but without the additional setup requirements or
server-side overhead of Omniperf's detailed [Grafana
interface](https://amdresearch.github.io/omniperf/analysis.html#grafana-based-gui)
interface](https://rocm.github.io/omniperf/analysis.html#grafana-based-gui)
option. The standalone GUI analyzer is provided as simple
[Flask](https://flask.palletsprojects.com/en/2.2.x/) application
allowing users to view results from within a web browser.
@@ -365,7 +365,7 @@ between the desired web browser host (e.g. login node or remote workstation) and
required. Alternatively, users may find it more convenient to download
profiled workloads to perform analysis on their local system.
See [FAQ](https://amdresearch.github.io/omniperf/faq.html) for more details on SSH tunneling.
See [FAQ](https://rocm.github.io/omniperf/faq.html) for more details on SSH tunneling.
```
#### Usage
@@ -420,7 +420,7 @@ Once you have applied a filter, you will also see several additional
sections become available with detailed metrics specific to that area
of AMD hardware. These detailed sections mirror the data displayed in
Omniperf's [Grafana
interface](https://amdresearch.github.io/omniperf/analysis.html#grafana-based-gui).
interface](https://rocm.github.io/omniperf/analysis.html#grafana-based-gui).
### Grafana-based GUI
@@ -470,7 +470,7 @@ For example, if one wants to inspect Dispatch Range from 17 to 48, inclusive, th
##### Incremental Profiling
Omniperf supports incremental profiling to significantly speed up performance analysis.
> Refer to [*IP Block profiling*](https://amdresearch.github.io/omniperf/profiling.html#ip-block-profiling) section for this command.
> Refer to [*IP Block profiling*](https://rocm.github.io/omniperf/profiling.html#ip-block-profiling) section for this command.
By default, the entire application is profiled to collect perfmon counter for all IP blocks, giving a system level view of where the workload stands in terms of performance optimization opportunities and bottlenecks.
+1 -1
@@ -74,7 +74,7 @@ Modes change the fundamental behavior of the Omniperf command line tool. Dependi
- **Database**: Our detailed Grafana GUI is built on a MongoDB database. `--import` profiling results to the DB to interact with the workload in Grafana or `--remove` the workload from the DB.
Connection options will need to be specified. See the [*Grafana
Analysis*](https://amdresearch.github.io/omniperf/analysis.html#grafana-gui-import) import section
Analysis*](https://rocm.github.io/omniperf/analysis.html#grafana-gui-import) import section
for more details on this.
```shell
+1 -1
@@ -6,7 +6,7 @@
:maxdepth: 4
```
The [Omniperf](https://github.com/AMDResearch/omniperf) Tool is architecturally composed of three major components, as shown in the following figure.
The [Omniperf](https://github.com/ROCm/omniperf) Tool is architecturally composed of three major components, as shown in the following figure.
- **Omniperf Profiling**: Acquire raw performance counters via application replay based on the [rocProfiler](https://rocm.docs.amd.com/projects/rocprofiler/en/latest/rocprof.html). The counters are stored in a comma-seperated value, for further analyis. A set of MI200 specific micro benchmarks are also run to acquire the hierarchical roofline data. The roofline model is not available on earlier accelerators.
+1 -1
@@ -1,4 +1,4 @@
# Welcome to the [Omniperf](https://github.com/AMDResearch/omniperf) Documentation!
# Welcome to the [Omniperf](https://github.com/ROCm/omniperf) Documentation!
```eval_rst
.. toctree::
+1 -1
@@ -51,7 +51,7 @@ defined as follows:
A typical install will begin by downloading the latest release tarball
available from the
[Releases](https://github.com/AMDResearch/omniperf/releases) section
[Releases](https://github.com/ROCm/omniperf/releases) section
of the Omniperf development site. From there, untar and descend into
the top-level directory as follows:
+2 -2
@@ -6,11 +6,11 @@
:maxdepth: 4
```
[Browse Omniperf source code on Github](https://github.com/AMDResearch/omniperf)
[Browse Omniperf source code on Github](https://github.com/ROCm/omniperf)
## Scope
MI Performance Profiler ([Omniperf](https://github.com/AMDResearch/omniperf)) is a system performance profiling tool for Machine Learning/HPC workloads running on AMD Instinct (tm) Accelerators. It is currently built on top of the [rocProfiler](https://rocm.docs.amd.com/projects/rocprofiler/en/latest/rocprof.html) to monitor hardware performance counters. The Omniperf tool primarily targets accelerators in the MI100 and MI200 families. Development is in progress to support MI300 and Radeon (tm) RDNA (tm) GPUs.
MI Performance Profiler ([Omniperf](https://github.com/ROCm/omniperf)) is a system performance profiling tool for Machine Learning/HPC workloads running on AMD Instinct (tm) Accelerators. It is currently built on top of the [rocProfiler](https://rocm.docs.amd.com/projects/rocprofiler/en/latest/rocprof.html) to monitor hardware performance counters. The Omniperf tool primarily targets accelerators in the MI100 and MI200 families. Development is in progress to support MI300 and Radeon (tm) RDNA (tm) GPUs.
## Features
+2 -2
@@ -6,7 +6,7 @@
:maxdepth: 5
```
The [Omniperf](https://github.com/AMDResearch/omniperf) repository
The [Omniperf](https://github.com/ROCm/omniperf) repository
includes source code for a sample GPU compute workload,
__vcopy.cpp__. A copy of this file is available in the `share/sample`
subdirectory after a normal Omniperf installation, or via the
@@ -37,7 +37,7 @@ Releasing CPU memory
```
## Omniperf Profiling
The *omniperf* script, availible through the [Omniperf](https://github.com/AMDResearch/omniperf) repository, is used to aquire all necessary perfmon data through analysis of compute workloads.
The *omniperf* script, availible through the [Omniperf](https://github.com/ROCm/omniperf) repository, is used to aquire all necessary perfmon data through analysis of compute workloads.
**omniperf help:**
```shell-session
+6 -6
@@ -12,7 +12,7 @@ While analyzing with the CLI offers quick and straightforward access to Omniperf
See sections below for more information on each.
## CLI Analysis
> Profiling results from the [aforementioned vcopy workload](https://amdresearch.github.io/omniperf/profiling.html#workload-compilation) will be used in the following sections to demonstrate the use of Omniperf in MI GPU performance analysis. Unless otherwise noted, the performance analysis is done on the MI200 platform.
> Profiling results from the [aforementioned vcopy workload](https://rocm.github.io/omniperf/profiling.html#workload-compilation) will be used in the following sections to demonstrate the use of Omniperf in MI GPU performance analysis. Unless otherwise noted, the performance analysis is done on the MI200 platform.
### Features
@@ -266,7 +266,7 @@ Analysis mode = cli
2.1.30 -> L1I Fetch Latency
...
```
3. Choose your own customized subset of metrics with `-b` (a.k.a. `--block`), or build your own config following [config_template](https://github.com/AMDResearch/omniperf/blob/main/src/omniperf_analyze/configs/panel_config_template.yaml). Below shows how to generate a report containing only metric 2 (a.k.a. System Speed-of-Light).
3. Choose your own customized subset of metrics with `-b` (a.k.a. `--block`), or build your own config following [config_template](https://github.com/ROCm/omniperf/blob/main/src/omniperf_analyze/configs/panel_config_template.yaml). Below shows how to generate a report containing only metric 2 (a.k.a. System Speed-of-Light).
```shell-session
$ omniperf analyze -p workloads/vcopy/MI200/ -b 2
--------
@@ -363,7 +363,7 @@ be generated directly from the command-line. This option is provided
as an alternative for users wanting to explore profiling results
graphically, but without the additional setup requirements or
server-side overhead of Omniperf's detailed [Grafana
interface](https://amdresearch.github.io/omniperf/analysis.html#grafana-based-gui)
interface](https://rocm.github.io/omniperf/analysis.html#grafana-based-gui)
option. The standalone GUI analyzer is provided as simple
[Flask](https://flask.palletsprojects.com/en/2.2.x/) application
allowing users to view results from within a web browser.
@@ -377,7 +377,7 @@ between the desired web browser host (e.g. login node or remote workstation) and
required. Alternatively, users may find it more convenient to download
profiled workloads to perform analysis on their local system.
See [FAQ](https://amdresearch.github.io/omniperf/faq.html) for more details on SSH tunneling.
See [FAQ](https://rocm.github.io/omniperf/faq.html) for more details on SSH tunneling.
```
#### Usage
@@ -437,7 +437,7 @@ Once you have applied a filter, you will also see several additional
sections become available with detailed metrics specific to that area
of AMD hardware. These detailed sections mirror the data displayed in
Omniperf's [Grafana
interface](https://amdresearch.github.io/omniperf/analysis.html#grafana-based-gui).
interface](https://rocm.github.io/omniperf/analysis.html#grafana-based-gui).
### Grafana-based GUI
@@ -487,7 +487,7 @@ For example, if one wants to inspect Dispatch Range from 17 to 48, inclusive, th
##### Incremental Profiling
Omniperf supports incremental profiling to significantly speed up performance analysis.
> Refer to [*Hardware Component Filtering*](https://amdresearch.github.io/omniperf/profiling.html#hardware-component-filtering) section for this command.
> Refer to [*Hardware Component Filtering*](https://rocm.github.io/omniperf/profiling.html#hardware-component-filtering) section for this command.
By default, the entire application is profiled to collect performance counters for all hardware blocks, giving a complete view of where the workload stands in terms of performance optimization opportunities and bottlenecks.
+1 -1
@@ -75,7 +75,7 @@ Modes change the fundamental behavior of the Omniperf command line tool. Dependi
- **Database**: Our detailed Grafana GUI is built on a MongoDB database. `--import` profiling results to the DB to interact with the workload in Grafana or `--remove` the workload from the DB.
Connection options will need to be specified. See the [*Grafana
Analysis*](https://amdresearch.github.io/omniperf/analysis.html#grafana-gui-import) import section
Analysis*](https://rocm.github.io/omniperf/analysis.html#grafana-gui-import) import section
for more details on this.
```shell
+1 -1
@@ -6,7 +6,7 @@
:maxdepth: 4
```
The [Omniperf](https://github.com/AMDResearch/omniperf) Tool is architecturally composed of three major components, as shown in the following figure.
The [Omniperf](https://github.com/ROCm/omniperf) Tool is architecturally composed of three major components, as shown in the following figure.
- **Omniperf Profiling**: Acquire raw performance counters via application replay based on [rocProf](https://rocm.docs.amd.com/projects/rocprofiler/en/latest/rocprof.html). The counters are stored in a comma-seperated value, for further analysis. A set of MI200 specific micro benchmarks are also run to acquire the hierarchical roofline data. The roofline model is not available on earlier accelerators.
+1 -1
@@ -1,4 +1,4 @@
# Welcome to the [Omniperf](https://github.com/AMDResearch/omniperf) Documentation!
# Welcome to the [Omniperf](https://github.com/ROCm/omniperf) Documentation!
```eval_rst
.. toctree::
+1 -1
@@ -51,7 +51,7 @@ defined as follows:
A typical install will begin by downloading the latest release tarball
available from the
[Releases](https://github.com/AMDResearch/omniperf/releases) section
[Releases](https://github.com/ROCm/omniperf/releases) section
of the Omniperf development site. From there, untar and descend into
the top-level directory as follows:
+1 -1
@@ -10,7 +10,7 @@ This documentation was created to provide a detailed breakdown of all facets of
This project is proudly open source, and we welcome all feedback! For more details on how to contribute, please see our Contribution Guide.
[Browse Omniperf source code on Github](https://github.com/AMDResearch/omniperf)
[Browse Omniperf source code on Github](https://github.com/ROCm/omniperf)
## What is Omniperf
+8 -8
@@ -2178,7 +2178,7 @@ A good discussion of coarse and fine grained memory allocations and what type of
(VALU_inst_mix_example)=
## VALU Arithmetic Instruction Mix
For this example, we consider the [instruction mix sample](https://github.com/AMDResearch/omniperf/blob/dev/sample/instmix.hip) distributed as a part of Omniperf.
For this example, we consider the [instruction mix sample](https://github.com/ROCm/omniperf/blob/dev/sample/instmix.hip) distributed as a part of Omniperf.
```{note}
This example is expected to work on all CDNA accelerators, however the results in this section were collected on an [MI2XX](2xxnote) accelerator
@@ -2269,7 +2269,7 @@ shows that we have exactly one of each type of VALU arithmetic instruction, by c
(Fabric_transactions_example)=
## Infinity-Fabric(tm) transactions
For this example, we consider the [Infinity Fabric(tm) sample](https://github.com/AMDResearch/omniperf/blob/dev/sample/fabric.hip) distributed as a part of Omniperf.
For this example, we consider the [Infinity Fabric(tm) sample](https://github.com/ROCm/omniperf/blob/dev/sample/fabric.hip) distributed as a part of Omniperf.
This code launches a simple read-only kernel, e.g.:
```c++
@@ -2826,7 +2826,7 @@ On an AMD [MI2XX](2xxnote) accelerator, for FP32 values this will generate a `gl
(flatmembench)=
### Global / Generic (FLAT)
For this example, we consider the [vector-memory sample](https://github.com/AMDResearch/omniperf/blob/dev/sample/vmem.hip) distributed as a part of Omniperf.
For this example, we consider the [vector-memory sample](https://github.com/ROCm/omniperf/blob/dev/sample/vmem.hip) distributed as a part of Omniperf.
This code launches many different versions of a simple read/write/atomic-only kernels targeting various address spaces, e.g. below is our simple `global_write` kernel:
```c++
@@ -2976,7 +2976,7 @@ The assembly in these experiments were generated for an [MI2XX](2xxnote) acceler
Next, we examine a generic write.
As discussed [previously](Flat_design), our `generic_write` kernel uses an address space cast to _force_ the compiler to choose our desired address space, regardless of other optimizations that may be possible.
We also note that the `filter` parameter passed in as a kernel argument (see [example](https://github.com/AMDResearch/omniperf/blob/dev/sample/vmem.hip), or [design note](Flat_design)) is set to zero on the host, such that we always write to the 'local' (LDS) memory allocation `lds`.
We also note that the `filter` parameter passed in as a kernel argument (see [example](https://github.com/ROCm/omniperf/blob/dev/sample/vmem.hip), or [design note](Flat_design)) is set to zero on the host, such that we always write to the 'local' (LDS) memory allocation `lds`.
Examining this kernel in the VMEM Instruction Mix table yields:
@@ -3339,7 +3339,7 @@ Next we examine the use of 'Spill/Scratch' memory.
On current CDNA accelerators such as the [MI2XX](2xxnote), this is implemented using the [private](mspace) memory space, which maps to ['scratch' memory](https://llvm.org/docs/AMDGPUUsage.html#amdgpu-address-spaces) in AMDGPU hardware terminology.
This type of memory can be accessed via different instructions depending on the specific architecture targeted. However, current CDNA accelerators such as the [MI2XX](2xxnote) use so called `buffer` instructions to access private memory in a simple (and typically) coalesced manner. See [Sec. 9.1, 'Vector Memory Buffer Instructions' of the CDNA2 ISA guide](https://www.amd.com/system/files/TechDocs/instinct-mi200-cdna2-instruction-set-architecture.pdf) for further reading on this instruction type.
We develop a [simple kernel](https://github.com/AMDResearch/omniperf/blob/dev/sample/stack.hip) that uses stack memory:
We develop a [simple kernel](https://github.com/ROCm/omniperf/blob/dev/sample/stack.hip) that uses stack memory:
```c++
#include <hip/hip_runtime.h>
__global__ void knl(int* out, int filter) {
@@ -3404,7 +3404,7 @@ Here we see a single write to the stack (10.3.6), which corresponds to an L1-L2
(IPC_example)=
## Instructions-per-cycle and Utilizations example
For this section, we use the instructions-per-cycle (IPC) [example](https://github.com/AMDResearch/omniperf/blob/dev/sample/ipc.hip) included with Omniperf.
For this section, we use the instructions-per-cycle (IPC) [example](https://github.com/ROCm/omniperf/blob/dev/sample/ipc.hip) included with Omniperf.
This example is compiled using `c++17` support:
@@ -3824,7 +3824,7 @@ Finally, we note that our branch utilization (11.2.5) has increased slightly fro
## LDS Examples
For this example, we consider the [LDS sample](https://github.com/AMDResearch/omniperf/blob/dev/sample/lds.hip) distributed as a part of Omniperf.
For this example, we consider the [LDS sample](https://github.com/ROCm/omniperf/blob/dev/sample/lds.hip) distributed as a part of Omniperf.
This code contains two kernels to explore how both [LDS](lds) bandwidth and bank conflicts are calculated in Omniperf.
This example was compiled and run on an MI250 accelerator using ROCm v5.6.0, and Omniperf v2.0.0.
@@ -4037,7 +4037,7 @@ The bank conflict rate linearly increases with the number of work-items within a
## Occupancy Limiters Example
In this [example](https://github.com/AMDResearch/omniperf/blob/dev/sample/occupancy.hip), we will investigate the use of the resource allocation panel in the [Workgroup Manager](SPI)'s metrics section to determine occupancy limiters.
In this [example](https://github.com/ROCm/omniperf/blob/dev/sample/occupancy.hip), we will investigate the use of the resource allocation panel in the [Workgroup Manager](SPI)'s metrics section to determine occupancy limiters.
This code contains several kernels to explore how both various kernel resources impact achieved occupancy, and how this is reported in Omniperf.
This example was compiled and run on a MI250 accelerator using ROCm v5.6.0, and Omniperf v2.0.0:
+1 -1
@@ -6,7 +6,7 @@
:maxdepth: 5
```
The [Omniperf](https://github.com/AMDResearch/omniperf) repository
The [Omniperf](https://github.com/ROCm/omniperf) repository
includes source code for a sample GPU compute workload,
__vcopy.cpp__. A copy of this file is available in the `share/sample`
subdirectory after a normal Omniperf installation, or via the