[rocprofiler-sdk] Update rocprofiler-sdk CONTRIBUTING.md (#1371)

Jonathan R. Madsen
2025-10-20 21:46:24 -05:00
Committed by GitHub
Parent 32f9fa6ca5
Commit 4cca398b56
23 changed files: 348 additions and 106 deletions
+3
@@ -9,6 +9,7 @@ misc-*,\
-misc-non-private-member-variables-in-classes,\
-misc-include-cleaner,\
-misc-const-correctness,\
-misc-use-internal-linkage,\
modernize-*,\
-modernize-deprecated-headers,\
-modernize-raw-string-literal,\
@@ -47,6 +48,8 @@ readability-*,\
-readability-function-cognitive-complexity,\
-readability-identifier-length,\
-readability-use-anyofallof,\
-readability-enum-initial-value,\
-readability-math-missing-parentheses,\
"
CheckOptions:
- key: readability-braces-around-statements.ShortStatementLines
+152 -7
@@ -22,19 +22,164 @@ meets the acceptance criteria or to discuss an idea about the library.
## Acceptance Criteria ##
Github issues are recommended for any significant change to the code base that adds a feature or fixes a non-trivial issue. If the code change is large without the presence of an issue (or prior discussion with AMD), the change may not be reviewed. Small fixes that fix broken behavior or other bugs are always welcome with or without an associated issue.
## Coding Style ##
All changes must be formatted with clang-format-15/cmake-format before review/acceptance. The exact settings for these formatters must be the ones in this repository.
Github issues are recommended for any significant change to the code base that adds a feature or fixes a non-trivial issue.
If the code change is large without the presence of an issue (or prior discussion with AMD), the change may not be reviewed.
Small fixes that fix broken behavior or other bugs are always welcome with or without an associated issue.
## Pull Request Guidelines ##
By creating a pull request, you agree to the statements made in the [code license](#code-license) section. Your pull request should target the default branch. Our current default branch is the **develop** branch, which serves as our integration branch.
By creating a pull request, you agree to the statements made in the [code license](#code-license) section.
Your pull request should target the default branch. Our current default branch is the **develop** branch, which serves as our integration branch.
All changes must meet the following requirements for review/acceptance:
1. All C and C++ code must be formatted with clang-format-11.
2. All Python code must be formatted with black.
3. All CMake code must be formatted with cmake-format.
4. All C++ changes must pass the clang-tidy checks (clang-tidy version 15.x.x through version 19.x.x are acceptable).
5. All text files must end with the new line character.
6. All C and C++ compiler warnings must be fixed.
All the above checks are enforced during CI.
The [requirements.txt](requirements.txt) defines the exact versions of formatters and linters as needed.
In order to streamline requirements 1-4, support has been built into the rocprofiler-sdk build system.
By default, CMake will search for `clang-format`, `black`, and `cmake-format`. If `clang-format` is found,
CMake will add a `format-source` build target, e.g. `make format-source`; if `black` is found, CMake
will add a `format-python` build target; if `cmake-format` is found, CMake will add a `format-cmake` build
target. If any of the `format-source`, `format-python`, or `format-cmake` targets exist, CMake will
also add a generic `format` build target which depends on all the available `format-*` targets. Thus,
running `make format` will apply formatting to C, C++, Python, and CMake. The CMake option
`ROCPROFILER_ENABLE_CLANG_TIDY` can be used to enable clang-tidy checks when compiling the source code.
For requirement #5, it is recommended to configure your IDE to automatically add new lines at the end of files.
For requirement #6, the CMake option `ROCPROFILER_BUILD_DEVELOPER` can be used to enable the `-Werror` compiler flag,
which treats warnings as errors.
For simplicity, rocprofiler-sdk provides a CMake option `ROCPROFILER_BUILD_CI` to enable the following CMake options by default:
`ROCPROFILER_BUILD_TESTS`, `ROCPROFILER_BUILD_SAMPLES`, `ROCPROFILER_BUILD_DEVELOPER`. However, if CMake is initially configured
with `ROCPROFILER_BUILD_CI=OFF` (the default), re-running cmake with `ROCPROFILER_BUILD_CI=ON` does not change the values of
`ROCPROFILER_BUILD_TESTS` and `ROCPROFILER_BUILD_SAMPLES` (which are also, by default, OFF).
Thus, the build setup for developer contributions is the following:
```bash
python3 -m pip install --user -r requirements.txt
cmake -B build-rocprofiler-sdk . -DROCPROFILER_BUILD_CI=ON -DROCPROFILER_ENABLE_CLANG_TIDY=ON
```
## Coding Style Guidelines ##
1. Use the file extension `.h` for C-compatible header files and `.c` for C implementation files.
2. Use the file extension `.hpp` for C++ header files and `.cpp` for C++ implementation files.
3. All public APIs which require linking must be compatible with C. Public C++ APIs may only be distributed as header-only implementations.
4. The source code organization within [source](./source) should roughly align with the installation locations, e.g. an executable `foo` which will be
installed in `bin` should be in either `source/bin/foo.py` (if it is a script which does not require compilation) or in the folder `source/bin/foo/` (if it requires compilation).
5. In a `CMakeLists.txt` file, do not add sources to a target from any directory other than the current directory; instead use a combination of `add_subdirectory` and `target_sources`.
6. In CMake, always use target-based semantics such as `target_include_directories(...)`, `target_compile_definitions(...)`; CMake functions which are not target-based such as `include_directories(...)`, `add_definitions(...)` should be strictly avoided.
7. In CMake, use of `INTERFACE` libraries is encouraged for compiler options, compiler definitions, include directories, etc.
8. In internal implementations, designs requiring internal communication across translation units should prefer procedural or functional interfaces instead of object-oriented interfaces.
* E.g. headers should declare simple structs without any protected or private data and standalone functions returning or operating on the aforementioned structs instead of exposing classes with public/protected/private member variables and member functions.
* Within the implementation file, classes may be used as desired.
9. All public API structs which are used in C should have a `uint64_t size` member variable as the first member variable. Tool developers use this for ABI-compatibility checks at runtime when accessing a struct instance via a pointer.
* In internal implementations, all public API structs should be initialized via the `init_public_api_struct` function defined in [source/lib/common/utility.hpp](./source/lib/common/utility.hpp).
* If a public API struct is intentionally padded, the padding should be of the form `uint8_t reserved_padding[<num-bytes>]` at the end of the struct. The name `reserved_padding` is important to how `init_public_api_struct` sets the `.size` value. Furthermore, static asserts should be added to ensure that `sizeof(T)` is never changed.
10. In internal implementations, one variable should be initialized per line: `int x, y;` is not permitted. The preferred form of variable initialization for non-primitive types is `auto <name> = <type>{}`... in other words, `auto` on the LHS and curly braces `{}` instead of parentheses `()`.
* The use of `auto` is for readability: determining the variable name in `auto val = std::unordered_map<Foo, std::unordered_map<uint64_t, std::vector<Bar>>>{};` is quite a bit easier than in `std::unordered_map<Foo, std::unordered_map<uint64_t, std::vector<Bar>>> val{};`.
* The use of curly braces has several benefits: it prevents implicit narrowing conversions and it cannot be confused with a function call (i.e. `Foo()` in `auto val = Foo()` may be a function call or construction of an object of class `Foo`, whereas `Foo{}` can only be construction of an object of class `Foo`).
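Guidelines 9 and 10 can be illustrated with a minimal sketch. The struct `example_record_t` and the helpers `has_field` and `make_names` are hypothetical names invented for this illustration; they are not part of the rocprofiler-sdk API, and the real `init_public_api_struct` helper is not reproduced here.

```cpp
#include <cassert>  // assert() is used when exercising the helpers
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical public API struct (not from rocprofiler-sdk): `size` comes first
// so a tool can check at runtime how much of the struct the library filled in.
struct example_record_t
{
    uint64_t size;       // size of the struct as known to the library that wrote it
    uint64_t id;         // field assumed to exist since the first version
    uint64_t timestamp;  // field assumed to have been added in a later version
};

// Returns true if the field at [field_offset, field_offset + field_size) lies
// within the size reported by the library, i.e. it is safe to read.
inline bool
has_field(const example_record_t* rec, size_t field_offset, size_t field_size)
{
    return rec != nullptr && rec->size >= field_offset + field_size;
}

// Guideline 10: one variable per line, `auto` on the LHS, curly braces on the RHS.
inline std::vector<std::string>
make_names()
{
    auto names = std::vector<std::string>{};
    names.emplace_back("agent");
    names.emplace_back("buffer");
    return names;
}
```

A tool built against the newer header would call `has_field(rec, offsetof(example_record_t, timestamp), sizeof(uint64_t))` before reading `rec->timestamp` from a library that may be older.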
## Testing Guidelines ##
To build rocprofiler-sdk and run its test suite:
```bash
cmake -B build-rocprofiler-sdk -DROCPROFILER_BUILD_TESTS=ON -DROCPROFILER_BUILD_SAMPLES=ON .
cmake --build build-rocprofiler-sdk --target all --parallel 12
cd build-rocprofiler-sdk
ctest --output-on-failure -O ctest.all.log
```
In the above `ctest` command, `--output-on-failure` shows the test log only when the test fails and `-O <filename>` writes the log to a file in addition to echoing it to the terminal.
CTest supports options such as `-R` and `-E` for filtering which tests are run based on the test names, and options such as `-L` and `-LE` for filtering which tests are run based on the test labels (use `--print-labels` to see the list of test labels).
Other useful options are `--rerun-failed`, `--stop-on-failure`, `--repeat until-fail:<N>`, `--show-only` (`-N`), `--verbose` (`-V`), and `--extra-verbose` (`-VV`).
Running `ctest -N -V` will show all details of the tests (command, environment, etc.) without running them.
One can also use [source/scripts/run-ci.py](./source/scripts/run-ci.py) locally with the argument `--disable-cdash` to avoid submitting the job to the CDash dashboard.
Examples using [run-ci.py](./source/scripts/run-ci.py) can be found in the [GitHub Actions workflows for rocprofiler-sdk](../../.github/workflows/rocprofiler-sdk-continuous_integration.yml).
If attempting to reproduce the sanitizer jobs, e.g. `cmake -DROCPROFILER_MEMCHECK=ThreadSanitizer ...`, locally instead of using [source/scripts/run-ci.py](./source/scripts/run-ci.py),
use [source/scripts/setup-sanitizer-env.sh](./source/scripts/setup-sanitizer-env.sh) to set the same sanitizer environment variables that rocprofiler-sdk uses during CI.
If trying to debug a specific test, use `ctest -N -V -R <test-name>` and use the output to create a bash script to run it, e.g. `ctest -N -V -R test-hip-graph-tracing-execute` produces:
```console
# ... removed for brevity
204: Test command: /home/user/rocm-systems/projects/rocprofiler-sdk/build-rocprofiler-sdk/bin/hip-graph
204: Working Directory: /home/user/rocm-systems/projects/rocprofiler-sdk/build-rocprofiler-sdk/tests/hip-graph-tracing
204: Environment variables:
204: LD_PRELOAD=/home/user/rocm-systems/projects/rocprofiler-sdk/build-rocprofiler-sdk/lib/rocprofiler-sdk/librocprofiler-sdk-json-tool.so.0.0.0
204: ROCPROFILER_TOOL_OUTPUT_FILE=hip-graph-tracing-test.json
204: LD_LIBRARY_PATH=/home/user/rocm-systems/projects/rocprofiler-sdk/build-rocprofiler-sdk/lib:/usr/lib64:/usr/lib:/usr/local/lib
204: ROCPROFILER_TOOL_CONTEXTS=HIP_API_CALLBACK,HIP_API_BUFFERED,KERNEL_DISPATCH_CALLBACK,KERNEL_DISPATCH_BUFFERED,CODE_OBJECT
Labels: integration-tests
Test #204: test-hip-graph-tracing-execute
Total Tests: 1
```
Using all of the lines prefixed with `204:`, a bash script can be easily created:
```bash
# taken from "Environment variables:"
export LD_PRELOAD=/home/user/rocm-systems/projects/rocprofiler-sdk/build-rocprofiler-sdk/lib/rocprofiler-sdk/librocprofiler-sdk-json-tool.so.0.0.0
export ROCPROFILER_TOOL_OUTPUT_FILE=hip-graph-tracing-test.json
export LD_LIBRARY_PATH=/home/user/rocm-systems/projects/rocprofiler-sdk/build-rocprofiler-sdk/lib:/usr/lib64:/usr/lib:/usr/local/lib
export ROCPROFILER_TOOL_CONTEXTS=HIP_API_CALLBACK,HIP_API_BUFFERED,KERNEL_DISPATCH_CALLBACK,KERNEL_DISPATCH_BUFFERED,CODE_OBJECT
# taken from "Working Directory:"
pushd /home/user/rocm-systems/projects/rocprofiler-sdk/build-rocprofiler-sdk/tests/hip-graph-tracing
# taken from "Test command:" (and prefixed with `gdb --args` for debugging)
gdb --args /home/user/rocm-systems/projects/rocprofiler-sdk/build-rocprofiler-sdk/bin/hip-graph
```
If the test command uses [rocprofv3](./source/bin/rocprofv3.py), using debuggers such as `gdb` will require prefixing the command with `gdb --args python3 /path/to/rocprofv3 ...`.
If rocprofv3 requires application replay, execute `set follow-fork-mode child` within the GDB command line prompt.
### Test Locations ###
* Integration tests are located in the top-level [tests](./tests) directory.
* Unit tests are located in a `tests` subdirectory of the units being tested.
* Samples are located in the top-level [samples](./samples) directory.
* Applications used for integration tests are located in the [tests/bin](./tests/bin) directory.
### Test Coding Style Guidelines ###
* Integration Test Applications ([tests/bin](./tests/bin))
* These applications are a common suite of applications which can be used by any integration test.
* These applications should, when possible, support command-line arguments to control the number of threads, streams, problem size, etc.
* It is highly recommended to make use of threads, streams, etc. in the applications... few real-world applications are single-threaded and use only the default HIP stream.
* Integration tests
* Should be composed of at least two tests: (1) an "execute" test which runs the profiler on the application and (2) a "validate" test
* Pay attention to the naming conventions of the folders, files, and test names
* The "validate" test is written in Python with PyTest and validates the data collected during the "execute" phase. The Python script should be named `validate.py` and should be accompanied by a `conftest.py` and `pytest.ini`.
* The `validate.py` main should return the following: `return pytest.main(["-x", __file__] + sys.argv[1:])`
* In general, follow the same recipe as other integration tests
* Samples should be kept as simple as possible: a `main.cpp` with a sample test application and a `client.cpp` which contains the tool built to demonstrate the functionality of the sample.
* Please use existing samples such as [samples/api_buffered_tracing](./samples/api_buffered_tracing/), [samples/api_callback_tracing](./samples/api_callback_tracing/), [samples/external_correlation_id_request](./samples/external_correlation_id_request/), and [samples/intercept_table](./samples/intercept_table/) as a guide.
* Unit tests should follow the standard recipe:
* Written with `GTest`
* The first parameter to `TEST(<group>, <name>)` or `TEST_F(<group>, <name>)` should either be the name of the file, e.g. `TEST(agent, <name>)` in `agent.cpp`, or the name of test executable.
* If the unit test is limited to a certain source file, e.g. `source/lib/common/utility.cpp`, then unit tests in the tests folder should be in a file by the same name, e.g. `source/lib/common/tests/utility.cpp`.
* All of the source files in a unit test folder should be compiled into one executable and CTests should be added via `gtest_add_tests(...)`
* It is permitted to deactivate clang-tidy for unit tests via `rocprofiler_deactivate_clang_tidy()`
* The `add_subdirectory(tests)` in parent directory's `CMakeLists.txt` should be guarded with `if(ROCPROFILER_BUILD_TESTS)`
## Code License ##
All code contributed to this project will be licensed under the license identified in the [LICENSE.md](../LICENSE.md). Your contribution will be accepted under the same license.
All code contributed to this project will be licensed under the license identified in the [LICENSE.md](LICENSE.md). Your contribution will be accepted under the same license.
## Release Cadence ##
+61 -14
@@ -13,47 +13,94 @@
include_guard(DIRECTORY)
include(rocprofiler_utilities)
if(ROCPROFILER_BUILD_DEVELOPER)
set(_FMT_REQUIRED REQUIRED)
else()
set(_FMT_REQUIRED)
endif()
if(NOT ROCPROFILER_CLANG_FORMAT_EXE AND EXISTS $ENV{HOME}/.local/bin/clang-format)
# checks that clang-format is version 11.x.x
function(_rocprofiler_check_clang_format_version _OUT _EXE)
execute_process(
COMMAND $ENV{HOME}/.local/bin/clang-format --version
COMMAND ${_EXE} --version
WORKING_DIRECTORY ${PROJECT_BINARY_DIR}
OUTPUT_VARIABLE _CLANG_FMT_OUT
RESULT_VARIABLE _CLANG_FMT_RET
OUTPUT_STRIP_TRAILING_WHITESPACE ERROR_QUIET)
if(_CLANG_FMT_RET EQUAL 0)
if("${_CLANG_FMT_OUT}" MATCHES "version 11\\.([0-9]+)\\.([0-9]+)")
set(ROCPROFILER_CLANG_FORMAT_EXE
"$ENV{HOME}/.local/bin/clang-format"
CACHE FILEPATH "clang-format exe")
endif()
if(_CLANG_FMT_RET EQUAL 0 AND "${_CLANG_FMT_OUT}" MATCHES
"version 11\\.([0-9]+)\\.([0-9]+)")
set(${_OUT}
ON
PARENT_SCOPE)
else()
set(${_OUT}
OFF
PARENT_SCOPE)
endif()
endfunction()
_rocprofiler_get_python_user_bin(_PYTHON_USER_BIN)
if(NOT ROCPROFILER_CLANG_FORMAT_EXE
AND _PYTHON_USER_BIN
AND EXISTS "${_PYTHON_USER_BIN}/clang-format")
_rocprofiler_check_clang_format_version(_IS_VALID_CLANG_FMT
"${_PYTHON_USER_BIN}/clang-format")
if(_IS_VALID_CLANG_FMT)
set(ROCPROFILER_CLANG_FORMAT_EXE
"${_PYTHON_USER_BIN}/clang-format"
CACHE FILEPATH "clang-format exe")
endif()
endif()
if(NOT ROCPROFILER_CMAKE_FORMAT_EXE
AND _PYTHON_USER_BIN
AND EXISTS "${_PYTHON_USER_BIN}/cmake-format")
set(ROCPROFILER_CMAKE_FORMAT_EXE
"${_PYTHON_USER_BIN}/cmake-format"
CACHE FILEPATH "cmake-format exe")
endif()
if(NOT ROCPROFILER_BLACK_FORMAT_EXE
AND _PYTHON_USER_BIN
AND EXISTS "${_PYTHON_USER_BIN}/black")
set(ROCPROFILER_BLACK_FORMAT_EXE
"${_PYTHON_USER_BIN}/black"
CACHE FILEPATH "black exe")
endif()
find_program(
ROCPROFILER_CLANG_FORMAT_EXE ${_FMT_REQUIRED}
NAMES clang-format-11 clang-format-mp-11 clang-format
PATHS $ENV{HOME}/.local
HINTS $ENV{HOME}/.local
PATHS ${_PYTHON_USER_BIN}
HINTS ${_PYTHON_USER_BIN}
PATH_SUFFIXES bin)
find_program(
ROCPROFILER_CMAKE_FORMAT_EXE ${_FMT_REQUIRED}
NAMES cmake-format
PATHS $ENV{HOME}/.local
HINTS $ENV{HOME}/.local
PATHS ${_PYTHON_USER_BIN}
HINTS ${_PYTHON_USER_BIN}
PATH_SUFFIXES bin)
find_program(
ROCPROFILER_BLACK_FORMAT_EXE ${_FMT_REQUIRED}
NAMES black
PATHS $ENV{HOME}/.local
HINTS $ENV{HOME}/.local
PATHS ${_PYTHON_USER_BIN}
HINTS ${_PYTHON_USER_BIN}
PATH_SUFFIXES bin)
_rocprofiler_check_clang_format_version(_IS_VALID_CLANG_FMT
"${ROCPROFILER_CLANG_FORMAT_EXE}")
if(NOT _IS_VALID_CLANG_FMT)
if(ROCPROFILER_BUILD_DEVELOPER)
message(
AUTHOR_WARNING
"[rocprofiler] clang-format version 11 not found. Please see rocprofiler-sdk CONTRIBUTING.md for instructions on installing clang-format version 11."
)
endif()
unset(ROCPROFILER_CLANG_FORMAT_EXE CACHE)
endif()
add_custom_target(format-rocprofiler)
if(NOT TARGET format)
add_custom_target(format)
+43 -21
@@ -1,49 +1,71 @@
include_guard(GLOBAL)
# ----------------------------------------------------------------------------------------#
#
# Clang Tidy
#
# ----------------------------------------------------------------------------------------#
include_guard(DIRECTORY)
include(rocprofiler_utilities)
if(ROCPROFILER_BUILD_DEVELOPER)
set(_TIDY_REQUIRED REQUIRED)
else()
set(_TIDY_REQUIRED)
endif()
if(NOT ROCPROFILER_CLANG_TIDY_EXE AND EXISTS $ENV{HOME}/.local/bin/clang-tidy)
# checks that clang-tidy is version >= 15.x.x and < 20.x.x
function(_rocprofiler_check_clang_tidy_version _OUT _EXE)
execute_process(
COMMAND $ENV{HOME}/.local/bin/clang-tidy --version
COMMAND ${_EXE} --version
WORKING_DIRECTORY ${PROJECT_BINARY_DIR}
OUTPUT_VARIABLE _CLANG_TIDY_OUT
RESULT_VARIABLE _CLANG_TIDY_RET
OUTPUT_STRIP_TRAILING_WHITESPACE ERROR_QUIET)
if(_CLANG_TIDY_RET EQUAL 0 AND "${_CLANG_TIDY_OUT}" MATCHES
"version 1[5-9]\\.([0-9]+)\\.([0-9]+)")
set(${_OUT}
ON
PARENT_SCOPE)
else()
set(${_OUT}
OFF
PARENT_SCOPE)
endif()
endfunction()
if(_CLANG_TIDY_RET EQUAL 0)
if("${_CLANG_TIDY_OUT}" MATCHES "version 1[5-9]\\.([0-9]+)\\.([0-9]+)")
set(ROCPROFILER_CLANG_TIDY_EXE
"$ENV{HOME}/.local/bin/clang-tidy"
CACHE FILEPATH "clang-tidy exe")
endif()
_rocprofiler_get_python_user_bin(_PYTHON_USER_BIN)
if(NOT ROCPROFILER_CLANG_TIDY_EXE
AND _PYTHON_USER_BIN
AND EXISTS "${_PYTHON_USER_BIN}/clang-tidy")
_rocprofiler_check_clang_tidy_version(_IS_VALID_CLANG_TIDY
"${_PYTHON_USER_BIN}/clang-tidy")
if(_IS_VALID_CLANG_TIDY)
set(ROCPROFILER_CLANG_TIDY_EXE
"${_PYTHON_USER_BIN}/clang-tidy"
CACHE FILEPATH "clang-tidy exe")
endif()
endif()
find_program(
ROCPROFILER_CLANG_TIDY_EXE ${_TIDY_REQUIRED}
NAMES clang-tidy-18
clang-tidy-17
clang-tidy-16
clang-tidy-15
clang-tidy-14
clang-tidy-13
clang-tidy-12
clang-tidy-11
clang-tidy
PATHS $ENV{HOME}/.local
HINTS $ENV{HOME}/.local
NAMES clang-tidy-19 clang-tidy-18 clang-tidy-17 clang-tidy-16 clang-tidy-15 clang-tidy
PATHS ${_PYTHON_USER_BIN}
HINTS ${_PYTHON_USER_BIN}
PATH_SUFFIXES bin)
_rocprofiler_check_clang_tidy_version(_IS_VALID_CLANG_TIDY
"${ROCPROFILER_CLANG_TIDY_EXE}")
if(NOT _IS_VALID_CLANG_TIDY)
if(ROCPROFILER_BUILD_DEVELOPER)
message(
AUTHOR_WARNING
"[rocprofiler] clang-tidy version >= 15, < 20 not found. Please see rocprofiler-sdk CONTRIBUTING.md for instructions on installing clang-tidy"
)
endif()
unset(ROCPROFILER_CLANG_TIDY_EXE CACHE)
endif()
macro(ROCPROFILER_ACTIVATE_CLANG_TIDY)
if(ROCPROFILER_ENABLE_CLANG_TIDY)
if(NOT ROCPROFILER_CLANG_TIDY_EXE)
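The version gate in `_rocprofiler_check_clang_tidy_version` is a substring match on the `--version` output. The sketch below mirrors the same pattern in C++ to show which version strings pass; the function name `is_supported_clang_tidy` and the sample version strings are assumptions made for illustration, not part of the build system.

```cpp
#include <cassert>  // assert() is used when exercising the helper
#include <regex>
#include <string>

// Returns true if the text contains "version 15.x.y" through "version 19.x.y",
// i.e. the same range accepted by the CMake MATCHES expression above.
inline bool
is_supported_clang_tidy(const std::string& version_output)
{
    static const std::regex pattern("version 1[5-9]\\.[0-9]+\\.[0-9]+");
    // regex_search (not regex_match) because CMake's MATCHES also searches
    // for the pattern anywhere in the string.
    return std::regex_search(version_output, pattern);
}
```

Versions 14 and 20 are rejected because neither `14` nor `20` matches `1[5-9]`, which agrees with the `clang-tidy>=15.0.0,<20.0.0` pin in `requirements.txt`.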
+30
@@ -1036,4 +1036,34 @@ function(rocprofiler_install_env_setup_files)
COMPONENT ${RIEF_COMPONENT})
endfunction()
# ----------------------------------------------------------------------------
# gets the user local python bin directory from `python3 -m pip install --user ...`
#
function(_rocprofiler_get_python_user_bin _OUT)
find_package(Python3 QUIET)
# default to empty
set(_VAL)
if(Python3_FOUND)
execute_process(
COMMAND ${Python3_EXECUTABLE} -m site --user-base
WORKING_DIRECTORY ${PROJECT_BINARY_DIR}
OUTPUT_VARIABLE _PYTHON_USER_BASE
RESULT_VARIABLE _PYTHON_USER_BASE_RET
OUTPUT_STRIP_TRAILING_WHITESPACE ERROR_QUIET)
# if successful, check the bin dir
if(_PYTHON_USER_BASE_RET EQUAL 0)
set(_PYTHON_USER_BIN "${_PYTHON_USER_BASE}/bin")
if(EXISTS "${_PYTHON_USER_BIN}")
set(_VAL "${_PYTHON_USER_BIN}")
endif()
endif()
endif()
# return value
set(${_OUT}
"${_VAL}"
PARENT_SCOPE)
endfunction()
cmake_policy(POP)
+1 -1
@@ -1,6 +1,6 @@
black
clang-format>=11.0.0,<12.0.0
clang-tidy>=15.0.0,<19.0.0
clang-tidy>=15.0.0,<20.0.0
cmake>=3.21.0
cmake-format
dataclasses
+4 -3
@@ -257,9 +257,10 @@ query_available_agents(rocprofiler_agent_version_t /* version */,
if(agent->type != ROCPROFILER_AGENT_TYPE_GPU) continue;
auto parameters = std::vector<rocprofiler_thread_trace_parameter_t>{};
parameters.push_back({ROCPROFILER_THREAD_TRACE_PARAMETER_TARGET_CU, TARGET_CU});
parameters.push_back({ROCPROFILER_THREAD_TRACE_PARAMETER_BUFFER_SIZE, BUFFER_SIZE});
parameters.push_back({ROCPROFILER_THREAD_TRACE_PARAMETER_SHADER_ENGINE_MASK, SHADER_MASK});
parameters.push_back({ROCPROFILER_THREAD_TRACE_PARAMETER_TARGET_CU, {TARGET_CU}});
parameters.push_back({ROCPROFILER_THREAD_TRACE_PARAMETER_BUFFER_SIZE, {BUFFER_SIZE}});
parameters.push_back(
{ROCPROFILER_THREAD_TRACE_PARAMETER_SHADER_ENGINE_MASK, {SHADER_MASK}});
ROCPROFILER_CALL(
rocprofiler_configure_device_thread_trace_service(agent_ctx,
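The added inner braces in the hunk above suggest that the value part of the parameter struct is a nested aggregate or union, so `{type, {value}}` initializes it explicitly instead of relying on brace elision. A minimal sketch under that assumption follows; `example_parameter_t` and its members are hypothetical and do not reproduce the real layout of `rocprofiler_thread_trace_parameter_t`.

```cpp
#include <cassert>  // assert() is used when exercising the helper
#include <cstdint>
#include <vector>

enum example_parameter_type_t
{
    EXAMPLE_PARAMETER_TARGET_CU = 0,
    EXAMPLE_PARAMETER_BUFFER_SIZE,
};

// Hypothetical alternative payload stored in the union below.
struct example_perfcounter_t
{
    uint32_t counter_id;
    uint32_t simd_mask;
};

// Hypothetical parameter struct: a type tag followed by an anonymous union.
struct example_parameter_t
{
    example_parameter_type_t type;
    union
    {
        uint64_t              value;
        example_perfcounter_t perfcounter;
    };
};

inline std::vector<example_parameter_t>
make_parameters(uint64_t buffer_size)
{
    auto parameters = std::vector<example_parameter_t>{};
    // The inner braces initialize the first union member (`value`) explicitly.
    parameters.push_back({EXAMPLE_PARAMETER_TARGET_CU, {1}});
    parameters.push_back({EXAMPLE_PARAMETER_BUFFER_SIZE, {buffer_size}});
    return parameters;
}
```

Writing the inner braces keeps the initializer valid if the compiler is run with warnings such as `-Wmissing-braces` treated as errors.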
+3 -1
@@ -122,7 +122,9 @@ set_env(std::string_view env_id, bool value, int override)
template <typename Tp>
int
set_env(std::string_view env_id, Tp value, int override)
set_env(std::string_view env_id,
Tp value, // NOLINT(performance-unnecessary-value-param)
int override)
{
auto str_value = std::stringstream{};
str_value << value;
+1 -1
@@ -906,7 +906,7 @@ write_rocpd(
if(itr.kernel_id == 0 && itr.code_object_id == 0) continue;
auto json_data =
get_json_string([](auto& ar, const auto oitr) { cereal::save(ar, oitr); }, itr);
get_json_string([](auto& ar, const auto& oitr) { cereal::save(ar, oitr); }, itr);
auto stmt = get_insert_statement(
"rocpd_info_kernel_symbol{{uuid}}",
+3 -3
@@ -186,8 +186,8 @@ output_keys(std::string _tag)
{
for(size_t i = 0; i < _cmdline.size(); ++i)
{
const auto _l = std::string{(i == 0) ? "" : "_"};
auto _v = _cmdline.at(i);
const auto _l = std::string{(i == 0) ? "" : "_"};
const auto& _v = _cmdline.at(i);
_argv_string += _l + _v;
if(i > 0)
{
@@ -237,7 +237,7 @@ output_keys(std::string _tag)
{
for(size_t i = 0; i < _cmdline.size(); ++i)
{
auto _v = _cmdline.at(i);
const auto& _v = _cmdline.at(i);
auto itr = output_key{fmt::format("arg{}", i), _v, fmt::format("Argument #{}", i)};
_options.emplace_back(fmt::format("%{}%", itr.key), itr.value, itr.description);
_options.emplace_back(
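The `auto _v` to `const auto& _v` changes in this commit avoid copying a container element that is only read. A small sketch of the difference, assuming a `std::vector<std::string>` command line; the helper names are made up for illustration.

```cpp
#include <cassert>  // assert() is used when exercising the helpers
#include <cstddef>
#include <string>
#include <vector>

// Returns true when `const auto&` refers to the element stored in the vector,
// i.e. no copy of the string was made.
inline bool
reference_aliases_element(const std::vector<std::string>& cmdline, size_t i)
{
    const auto& ref = cmdline.at(i);
    return &ref == &cmdline[i];
}

// Returns true when plain `auto` produced an independent copy with equal content.
inline bool
value_is_a_copy(const std::vector<std::string>& cmdline, size_t i)
{
    auto copy = cmdline.at(i);  // NOLINT(performance-unnecessary-copy-initialization)
    return &copy != &cmdline[i] && copy == cmdline[i];
}
```

The `NOLINT` comment names the clang-tidy check that typically reports the copying form.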
+17 -27
@@ -529,42 +529,32 @@ PYBIND11_MODULE(libpyrocpd, pyrocpd)
// (1) the process with the earliest start time
// (2) find the process with the longest duration
uint64_t min_start_time = std::numeric_limits<uint64_t>::max();
uint64_t max_fini_time = 0;
uint64_t max_fini_time = std::numeric_limits<uint64_t>::min();
for(auto obj : {data.connection})
{
auto* conn = rocpd::interop::get_connection(std::move(obj));
// min start
sqlite3_stmt* _stmt_min_start;
sqlite3_prepare_v2(
conn, "SELECT MIN(start) FROM processes;", -1, &_stmt_min_start, nullptr);
uint64_t _min_start_time = std::numeric_limits<uint64_t>::max();
if(sqlite3_step(_stmt_min_start) == SQLITE_ROW)
sqlite3_stmt* _stmt_min_start_max_fini = nullptr;
uint64_t _min_start_time = std::numeric_limits<uint64_t>::max();
uint64_t _max_fini_time = std::numeric_limits<uint64_t>::min();
sqlite3_prepare_v2(conn,
"SELECT MIN(start), MAX(fini) FROM processes;",
-1,
&_stmt_min_start_max_fini,
nullptr);
if(sqlite3_step(_stmt_min_start_max_fini) == SQLITE_ROW)
{
_min_start_time =
static_cast<uint64_t>(sqlite3_column_int64(_stmt_min_start, 0));
static_cast<uint64_t>(sqlite3_column_int64(_stmt_min_start_max_fini, 0));
_max_fini_time =
static_cast<uint64_t>(sqlite3_column_int64(_stmt_min_start_max_fini, 1));
}
sqlite3_finalize(_stmt_min_start);
if(min_start_time > _min_start_time)
{
min_start_time = _min_start_time;
}
//// max fini
sqlite3_stmt* _stmt_max_fini;
sqlite3_prepare_v2(
conn, "SELECT MAX(fini) FROM processes;", -1, &_stmt_max_fini, nullptr);
uint64_t _max_fini_time = 0;
if(sqlite3_step(_stmt_max_fini) == SQLITE_ROW)
{
_max_fini_time = static_cast<uint64_t>(sqlite3_column_int64(_stmt_max_fini, 0));
}
sqlite3_finalize(_stmt_max_fini);
if(max_fini_time < _max_fini_time)
{
max_fini_time = _max_fini_time;
}
sqlite3_finalize(_stmt_min_start_max_fini);
min_start_time = std::min(min_start_time, _min_start_time);
max_fini_time = std::max(max_fini_time, _max_fini_time);
}
auto otf2_session =
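The rewritten loop reduces to a min/max fold seeded with sentinel values. The sketch below shows only that fold, without SQLite; `time_range` and `merge_ranges` are hypothetical names, and each `time_range` stands in for the `MIN(start), MAX(fini)` row read from one database.

```cpp
#include <algorithm>
#include <cassert>  // assert() is used when exercising the helper
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical per-database result: earliest start and latest finish time.
struct time_range
{
    uint64_t start;
    uint64_t fini;
};

// Folds all ranges into one overall range. The sentinels guarantee that the
// first real value always replaces the initial value.
inline time_range
merge_ranges(const std::vector<time_range>& ranges)
{
    auto min_start = std::numeric_limits<uint64_t>::max();
    auto max_fini  = std::numeric_limits<uint64_t>::min();
    for(const auto& itr : ranges)
    {
        min_start = std::min(min_start, itr.start);
        max_fini  = std::max(max_fini, itr.fini);
    }
    return time_range{min_start, max_fini};
}
```

With no input ranges the sentinels are returned unchanged, so a caller may want to treat `start > fini` as "no data".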
+8 -8
@@ -538,9 +538,9 @@ write_otf2(const OTF2Session& otf2_session,
get_hash_id(_name),
region_info{_name, OTF2_REGION_ROLE_DATA_TRANSFER, OTF2_PARADIGM_HIP});
auto _extended_agent = agent_data.at(itr.dst_agent_abs_index);
auto _agent_handle = _extended_agent.types_agent.id.handle;
auto _evt_info = event_info{location_base{
const auto& _extended_agent = agent_data.at(itr.dst_agent_abs_index);
auto _agent_handle = _extended_agent.types_agent.id.handle;
auto _evt_info = event_info{location_base{
process.pid, itr.tid, _agent_handle, ROCPROFILER_AGENT_MEMORY_COPY_TYPE}};
auto agent_index_info = _extended_agent.agent_index;
@@ -587,8 +587,8 @@ write_otf2(const OTF2Session& otf2_session,
get_hash_id(_alloc_operation),
region_info{_alloc_operation, OTF2_REGION_ROLE_ALLOCATE, OTF2_PARADIGM_HIP});
auto _extended_agent = agent_data.at(itr.agent_abs_index);
auto _handle = _extended_agent.types_agent.id.handle;
const auto& _extended_agent = agent_data.at(itr.agent_abs_index);
auto _handle = _extended_agent.types_agent.id.handle;
auto _evt_info = event_info{location_base{
process.pid, itr.tid, _handle, ROCPROFILER_AGENT_MEMORY_ALLOC_TYPE}};
@@ -672,9 +672,9 @@ write_otf2(const OTF2Session& otf2_session,
_attr_str.emplace(get_hash_id(_perfetto_name), _perfetto_name);
auto* _attrs = create_attribute_list_for_name(_perfetto_name);
auto _extended_agent = agent_data.at(itr.agent_abs_index);
auto _handle = _extended_agent.types_agent.id.handle;
auto agent_index_info = _extended_agent.agent_index;
const auto& _extended_agent = agent_data.at(itr.agent_abs_index);
auto _handle = _extended_agent.types_agent.id.handle;
auto agent_index_info = _extended_agent.agent_index;
auto _evt_info = event_info{location_base{
process.pid, itr.tid, _handle, ROCPROFILER_AGENT_DISPATCH_TYPE, itr.queue_id}};
+2 -2
@@ -858,7 +858,7 @@ code_object_tracing_callback(rocprofiler_callback_tracing_record_t record,
[](auto& data_vec,
std::string file_name,
tool::rocprofiler_code_object_info_t* obj_data_v) {
data_vec.push_back({file_name,
data_vec.push_back({std::move(file_name),
obj_data_v->code_object_id,
obj_data_v->load_base,
obj_data_v->load_size});
@@ -900,7 +900,7 @@ code_object_tracing_callback(rocprofiler_callback_tracing_record_t record,
[](auto& data_vec,
std::string file_name,
tool::rocprofiler_code_object_info_t* obj_data_v) {
data_vec.push_back({file_name,
data_vec.push_back({std::move(file_name),
obj_data_v->code_object_id,
obj_data_v->load_base,
obj_data_v->load_size});
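The `std::move(file_name)` change applies to a parameter that is taken by value and then stored. A sketch of why the move matters, using a copy-counting type; `tracked_name`, `copy_count`, `store_by_copy`, and `store_by_move` are hypothetical names for this illustration only.

```cpp
#include <cassert>  // assert() is used when exercising the helpers
#include <string>
#include <utility>
#include <vector>

// Counts copy constructions of `tracked_name` (illustration only).
inline int&
copy_count()
{
    static int count = 0;
    return count;
}

// Hypothetical stand-in for an expensive-to-copy value such as a file name.
struct tracked_name
{
    explicit tracked_name(std::string v) : value(std::move(v)) {}
    tracked_name(const tracked_name& rhs) : value(rhs.value) { ++copy_count(); }
    tracked_name(tracked_name&& rhs) noexcept : value(std::move(rhs.value)) {}

    std::string value;
};

// By-value parameter that is copied into the container: one extra copy.
inline void
store_by_copy(std::vector<tracked_name>& data_vec, tracked_name name)
{
    data_vec.push_back(name);
}

// By-value parameter that is moved into the container: no copy.
inline void
store_by_move(std::vector<tracked_name>& data_vec, tracked_name name)
{
    data_vec.push_back(std::move(name));
}
```

Since the parameter is already a private copy owned by the function, moving it into its final location costs nothing the caller can observe.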
+1 -1
@@ -153,7 +153,7 @@ parse_cpu_info()
return 0;
};
auto value = match.back();
const auto& value = match.back();
if(itr.find("vendor_id") == 0)
info_v.vendor_id = value;
+1 -1
@@ -40,7 +40,7 @@ class consumer_thread_t
using consume_func_t = std::function<void(DataType&&)>;
public:
consumer_thread_t(consume_func_t func) { this->consume_fn = func; }
consumer_thread_t(consume_func_t func) { this->consume_fn = std::move(func); }
virtual ~consumer_thread_t() { exit(); }
void start()
+3 -1
@@ -612,7 +612,9 @@ update_table(const context_array_t& ctxs, hsa_amd_tool_table_t* _orig)
template <size_t TableIdx, size_t... OpIdx>
void
update_table(context_array_t ctxs, hsa_amd_tool_table_t* _orig, std::index_sequence<OpIdx...>)
update_table(const context_array_t& ctxs,
hsa_amd_tool_table_t* _orig,
std::index_sequence<OpIdx...>)
{
static_assert(
std::is_same<hsa_amd_tool_table_t, typename hsa_table_lookup<TableIdx>::type>::value,
+2 -2
@@ -67,7 +67,7 @@ set_tests_properties(
add_executable(pcs_bench_test)
target_compile_options(pcs_bench_test PRIVATE "-Ofast")
target_compile_options(pcs_bench_test PRIVATE "-O3" "-ffast-math")
target_sources(pcs_bench_test
PRIVATE ${ROCPROFILER_LIB_PC_SAMPLING_PARSER_BENCH_TEST_SOURCES})
target_include_directories(pcs_bench_test PRIVATE ${PCTEST_INCLUDE_DIR})
@@ -79,7 +79,7 @@ target_link_libraries(
GTest::gtest_main)
add_executable(pcs_thread_test)
target_compile_options(pcs_thread_test PRIVATE "-Ofast")
target_compile_options(pcs_thread_test PRIVATE "-O3" "-ffast-math")
target_sources(pcs_thread_test
PRIVATE ${ROCPROFILER_LIB_PC_SAMPLING_PARSER_MULTIGPU_TEST_SOURCES})
+2 -1
@@ -30,7 +30,8 @@
/**
* Benchmarks how fast the parser can process samples on a single threaded case
* Current: 5600X with -Ofast, up to >140 million samples/s or ~9GB/s R/W (18GB/s bidirectional)
* Current: 5600X with -O3 -ffast-math, up to >140 million samples/s or ~9GB/s R/W (18GB/s
* bidirectional)
*/
template <typename PcSamplingRecordT>
static bool
+2 -1
@@ -156,7 +156,8 @@ multithread_queue_hammer(size_t tid, Latch* latch)
/**
* Benchmarks how fast the parser can process samples on a single threaded case
* Current: 5600X with -Ofast, up to >140 million samples/s or ~9GB/s R/W (18GB/s bidirectional)
* Current: 5600X with -O3 -ffast-math, up to >140 million samples/s or ~9GB/s R/W (18GB/s
* bidirectional)
*/
template <typename PcSamplingRecordT>
static std::pair<size_t, size_t>
+2 -1
@@ -292,7 +292,8 @@ query_available_agents(rocprofiler_agent_version_t /* version */,
att_param.type = ROCPROFILER_THREAD_TRACE_PARAMETER_PERFCOUNTER;
att_param.simd_mask = 0xF;
for(auto& metric : metrics)
if(metric.name() == "SQ_WAVES") rocprofiler_counter_id_t{.handle = metric.id()};
if(metric.name() == "SQ_WAVES")
att_param.counter_id = rocprofiler_counter_id_t{.handle = metric.id()};
params.push_back(att_param);
}
+5 -5
@@ -107,10 +107,10 @@ query_available_agents(rocprofiler_agent_version_t /* version */,
static uint64_t buffer_size_mb = (var ? atoi(var) : 96) * 1024ul * 1024ul;
std::vector<rocprofiler_thread_trace_parameter_t> parameters;
parameters.push_back({ROCPROFILER_THREAD_TRACE_PARAMETER_TARGET_CU, 1});
parameters.push_back({ROCPROFILER_THREAD_TRACE_PARAMETER_SIMD_SELECT, 0xF});
parameters.push_back({ROCPROFILER_THREAD_TRACE_PARAMETER_BUFFER_SIZE, buffer_size_mb});
parameters.push_back({ROCPROFILER_THREAD_TRACE_PARAMETER_SHADER_ENGINE_MASK, 0x1});
parameters.push_back({ROCPROFILER_THREAD_TRACE_PARAMETER_TARGET_CU, {1}});
parameters.push_back({ROCPROFILER_THREAD_TRACE_PARAMETER_SIMD_SELECT, {0xF}});
parameters.push_back({ROCPROFILER_THREAD_TRACE_PARAMETER_BUFFER_SIZE, {buffer_size_mb}});
parameters.push_back({ROCPROFILER_THREAD_TRACE_PARAMETER_SHADER_ENGINE_MASK, {0x1}});
static const bool extra_args =
std::getenv("ATT_NODETAIL") ? std::stoi(std::getenv("ATT_NODETAIL")) != 0 : false;
@@ -118,7 +118,7 @@ query_available_agents(rocprofiler_agent_version_t /* version */,
{
// Dont generate instruction profiling, only occupancy and shaderdata
parameters.emplace_back(rocprofiler_thread_trace_parameter_t{
ROCPROFILER_THREAD_TRACE_PARAMETER_NO_DETAIL, 1});
ROCPROFILER_THREAD_TRACE_PARAMETER_NO_DETAIL, {1}});
}
ROCPROFILER_CALL(
+2 -3
@@ -27,8 +27,7 @@
#include "trace_callbacks.hpp"
constexpr double WAVE_RATIO_TOLERANCE = 0.05;
constexpr size_t NUM_KERNELS = 5;
constexpr size_t NUM_KERNELS = 5;
namespace ATTTest
{
@@ -73,7 +72,7 @@ tool_init(rocprofiler_client_finalize_t /* fini_func */, void* /* tool_data */)
"code object tracing service configure");
std::vector<rocprofiler_thread_trace_parameter_t> params{};
params.push_back({ROCPROFILER_THREAD_TRACE_PARAMETER_SERIALIZE_ALL, 1});
params.push_back({ROCPROFILER_THREAD_TRACE_PARAMETER_SERIALIZE_ALL, {1}});
std::vector<rocprofiler_agent_id_t> agents{};
-2
@@ -27,8 +27,6 @@
#include "trace_callbacks.hpp"
constexpr double WAVE_RATIO_TOLERANCE = 0.05;
namespace ATTTest
{
namespace Single