Adjusted the regex to filter out new "PAGE*" domains added by the
SDK. This was causing the passing regex to fail.
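
A minimal sketch of this kind of filter, assuming the domains are plain strings and that the new ones share a "PAGE" prefix. The sample names below are illustrative placeholders, not the SDK's actual list:

```python
import re

# Illustrative domain names; only the "PAGE" prefix comes from the commit text.
domains = ["HIP_RUNTIME_API", "PAGE_EXAMPLE_ONE", "KERNEL_DISPATCH", "PAGE_EXAMPLE_TWO"]

# The negative lookahead rejects any name that begins with "PAGE".
not_page = re.compile(r"^(?!PAGE)\w+$")

filtered = [name for name in domains if not_page.match(name)]
print(filtered)
```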
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
This patch uses the udmabuf driver, instead of the amdgpu driver, to allocate
system memory on APUs. This lets an application's system memory consumption be
accounted for by the cgroup mechanism. The feature is enabled with the
HSA_USE_UDMABUF environment variable.
Signed-off-by: Xiaogang Chen <Xiaogang.Chen@amd.com>
* Add `rocpd` choice for `--format-rocprof-output` option
* Add rocpd_data.py which defines SQL queries to extract data from rocpd database
* Use sqlite3 package to read the database
* Add `--retain-rocpd-output` option in profile mode to retain raw
rocpd database
* Add warning notice to say `--format-rocprof-output rocpd` will be
default in future release
For rocpd output:
* Use only `pmc_perf.csv` instead of reading individual coll_level results csv files
* Post process csv files using pandas in analysis mode instead of profile mode
* Use ACCUM counters instead of SQ_ACCUM_PREV_HIRES
* Add test cases for rocpd output format
* Fix code formatting issues
* Update CHANGELOG
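
A rough sketch of reading counter data from a SQLite database with the sqlite3 package. The table name, column names, and sample values are hypothetical stand-ins and do not reflect the real rocpd schema or the queries in rocpd_data.py:

```python
import sqlite3

# An in-memory database stands in for a rocpd file; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pmc_samples (kernel_name TEXT, counter_name TEXT, value REAL)"
)
conn.executemany(
    "INSERT INTO pmc_samples VALUES (?, ?, ?)",
    [
        ("vecadd", "SQ_WAVES", 64.0),
        ("vecadd", "SQ_WAVES", 32.0),
        ("gemm", "SQ_WAVES", 128.0),
    ],
)

# Aggregate each counter per kernel, similar in spirit to a query module.
QUERY = """
SELECT kernel_name, counter_name, SUM(value) AS total
FROM pmc_samples
GROUP BY kernel_name, counter_name
ORDER BY kernel_name
"""
rows = conn.execute(QUERY).fetchall()
conn.close()

for kernel_name, counter_name, total in rows:
    print(kernel_name, counter_name, total)
```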
* Show description of metrics during analysis
* Use `--include-cols Description` to show the Description column in analyze mode (it is hidden by default)
* Remove tips field from analysis config
* Align metric names in analysis config and documentation
* Add unified config utils/unified_config.yaml
* Add python script utils/split_config.py to auto generate analysis configuration and documentation metrics description
* Add test case to ensure unified config is older than auto-generated config
* Auto generate analysis config and documentation metrics description
* Update CONTRIBUTING.md to add instructions to build documentation assets
* Add docker image and compose file to build documentation
* Update CHANGELOG and Documentation
* Use jinja template instead of hardcoding metric tables in documentation
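
The idea of generating two outputs from one unified source can be sketched as follows. The entry layout and the `split_config` helper here are assumptions made for illustration; the real utils/split_config.py and unified_config.yaml may be organized differently:

```python
# Hypothetical unified entries: each metric carries both its analysis
# expression and its documentation text.
unified = [
    {"name": "Metric A", "expr": "AVG(counter_a)", "description": "Illustrative text for A."},
    {"name": "Metric B", "expr": "SUM(counter_b)", "description": "Illustrative text for B."},
]


def split_config(entries):
    """Split unified entries into an analysis config and a docs mapping."""
    # The analysis config keeps everything except the description text.
    analysis = [{k: v for k, v in e.items() if k != "description"} for e in entries]
    # The documentation side maps each metric name to its description.
    docs = {e["name"]: e["description"] for e in entries}
    return analysis, docs


analysis_config, doc_descriptions = split_config(unified)
print(analysis_config)
print(doc_descriptions)
```

Keeping one source of truth means metric names cannot drift apart between the analysis config and the documentation.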
The ape1_size_ member was leftover after the removal
of KV and is no longer used.
Remove it to silence some compiler warnings.
Signed-off-by: Tony Gutierrez <anthony.gutierrez@amd.com>
* adding summary.py to generate tmp <category_region>_summary views
* migrating CSV summary to SDK method of writing CSVs
- Add domain_view to summary.py
- omit the C++ code for writing CSV because it gets reverted later anyway
* Add summary subparser and write_sql_view_to_csv function
* adding all <>_summary views generation to summary.py
* add summary_per_rank feature
* add --summary-per-rank
* reconstruct generate_summary_view and create_domain_view
- introduce by_rank
* remove sqr and variance in summary views
* use RocpdImportData instead of connection
* two fixes on summary.py
- modify the generate_summary_view function to return a tuple with view name and sql code
- add if_not_exits parameter to generate_summary_view
* Refactor summary.py to allow output path and filename args, and apply time_window
- clean up summary table column headers
- only generate by-rank views if that param is specified
* Add ProcessID to Hostname output and csv, so users can identify the system in the by-rank summaries
* Summary.py, just add hostname to by-rank summaries, instead of creating mapping table
* Summary - migrate csv writer to pandas, for more future flexibility
* Adding a few simple tests for summary.py
* Linting fixes
* add region_categories to summary options
- Automatically retrieve region categories from the database if argument is None
* add backticks for view_names
* fix tests after rebase
* Made code review changes
- fixed whitespace in CMakeLists.txt
- adding query.py module & subparser in __main__.py
- refactor summary function to return query
- used query.py to output csv
- used query.py to also output summary to console
- provided new command line options to select summary output to csv or console
* Made fix to jinja template in query.py, as suggested by copilot
* Consolidated output calls to query in export_view function based on feedback
- refactored: helpers, query functions, create view functions
- extended formats to include what query supports (md, html, pdf, json)
- added json format to query, and changed orient=records
- adding jinja2 and reportlab to requirements.txt
* Add version_info for rocpd and roctx
* Add rocpd commandline tool
* Add executable permissions to source/bin/rocpd.py
* Removed rocpd2query, and cleaned up --help examples
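
A minimal sketch of a summary-view helper that returns the view name together with its SQL, loosely following the bullets above. The `sketch_summary_view` function, its `if_not_exists` flag, and the `regions` table are all hypothetical and are not the actual summary.py code or rocpd schema:

```python
import sqlite3


def sketch_summary_view(category, if_not_exists=True):
    """Return (view_name, sql) for a temporary per-category summary view."""
    # Views cannot take bound parameters, so only accept identifier-like names.
    if not category.isidentifier():
        raise ValueError(f"unexpected category: {category!r}")
    view_name = f"{category}_summary"
    guard = "IF NOT EXISTS " if if_not_exists else ""
    sql = (
        f"CREATE TEMP VIEW {guard}`{view_name}` AS "
        "SELECT name, COUNT(*) AS calls, SUM(duration) AS total_ns, "
        "AVG(duration) AS avg_ns, MIN(duration) AS min_ns, MAX(duration) AS max_ns "
        f"FROM regions WHERE category = '{category}' "
        "GROUP BY name ORDER BY total_ns DESC"
    )
    return view_name, sql


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE regions (category TEXT, name TEXT, duration INTEGER)")
conn.executemany(
    "INSERT INTO regions VALUES (?, ?, ?)",
    [
        ("kernel", "vecadd", 100),
        ("kernel", "vecadd", 300),
        ("kernel", "gemm", 1000),
        ("memcpy", "h2d", 50),
    ],
)

view_name, sql = sketch_summary_view("kernel")
conn.execute(sql)
conn.execute(sql)  # The IF NOT EXISTS guard makes the repeat call harmless.
rows = conn.execute(f"SELECT * FROM `{view_name}`").fetchall()
conn.close()
print(view_name, rows)
```

Returning the SQL instead of executing it inside the helper lets the caller decide whether to create the view, export it, or only print the query.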
---------
Co-authored-by: acanadas <acanadas@amd.com>
Co-authored-by: Jin Tao <jintao12@amd.com>
Co-authored-by: a-canadasruiz <Araceli.CanadasRuiz@amd.com>
Co-authored-by: Jonathan R. Madsen <Jonathan.Madsen@amd.com>
* Do not force unsupported metrics to be specified in older gpu
architectures as None
* Remove metrics which are explicitly set to None
* Update CHANGELOG
* Fix analysis configuration to fix baseline comparisons across all gpu
architectures
* Add missing 1812 section for gfx908
* Add missing 1812 section for gfx90a
* Baseline comparison will only show common metrics
* First workload will be used to set Metric ID index column
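
The last two bullets can be sketched as an order-preserving intersection. This assumes each workload is a mapping from metric ID to value; the IDs and the `common_metrics` helper are illustrative only and are not taken from the analysis code:

```python
def common_metrics(workloads):
    """Return metric IDs present in every workload, in the first workload's order."""
    first, *rest = workloads
    # Iterating a dict yields its keys, so set(w) is the set of metric IDs.
    shared = set(first).intersection(*(set(w) for w in rest))
    return [metric_id for metric_id in first if metric_id in shared]


# Hypothetical metric IDs and values for two workloads.
baseline = {"2.1.0": 1.0, "2.1.1": 2.0, "9.9.9": 3.0}
other = {"2.1.1": 5.0, "2.1.0": 4.0}

index = common_metrics([baseline, other])
print(index)
```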
Modify agent initialization to support different driver types and
enable the KFD_VIRTIO driver for CPU and GPU agents.
1. Add driver_type parameter to CpuAgent and GpuAgent constructors
2. Update topology discovery to handle multiple driver types
3. Fix MakeMemoryResident return value check in VirtioDriver
4. Add helper function IsGPUDriver to check driver types
5. Update agent discovery to iterate through all available drivers
This change makes the runtime more flexible by removing hardcoded KFD
driver assumptions and properly handling different driver backends.
Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>
This commit adds virtio driver support to the ROCm runtime by:
1. Implementing KfdVirtioDriver class that inherits from core::Driver
2. Adding KFD_VIRTIO to DriverType enum
3. Registering virtio driver discovery function in topology
4. Adding virtio driver source files to CMake build
The virtio driver implementation provides basic memory management and
queue operations for virtualized GPU environments. Some advanced features
like PC sampling and SMI are currently not supported.
Key changes:
- Add new files: amd_kfd_virtio_driver.h/cpp
- Update CMakeLists.txt to include virtio driver
- Add VIRTIO to DriverType enum in driver.h
- Register virtio driver in amd_topology.cpp
Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>
This patch adds VirtIO support to the libhsakmt library, enabling communication
with AMD GPUs via VirtIO.
Details
- CMakeLists.txt: Added a new CMakeLists.txt file for the VirtIO component
of libhsakmt.
- hsakmt_virtio.c/h: Implemented the core VirtIO functionality, including
VirtIO GPU device initialization, command execution, and memory management.
- virtio_gpu.c/h: Contains the implementation of the VirtIO GPU device,
including ioctl handling, shared memory management, and command execution.
- hsakmt_virtio_events.c: Implements event handling for VirtIO, such as event
creation, destruction, setting, resetting, and querying event states.
- hsakmt_virtio_memory.c: Manages memory operations for VirtIO, including memory
allocation, freeing, mapping, and unmapping.
- hsakmt_virtio_queues.c: Implements queue management for VirtIO, including
queue creation, destruction, and updating.
- hsakmt_virtio_topology.c: Handles system and node properties for VirtIO.
- hsakmt_virtio_vm.c: Manages VM-related operations for VirtIO, such as
reserving and dereserving VA space.
- include/linux/virtgpu_drm.h: Contains DRM definitions for VirtIO GPU.
Key Features
- VirtIO GPU Initialization: The library can now initialize a VirtIO GPU device
and communicate with it.
- Command Execution: Supports executing commands on the VirtIO GPU device.
- Memory Management: Provides functions for allocating, freeing, mapping, and
unmapping memory for VirtIO operations.
- Event Handling: Implements a comprehensive event system for VirtIO.
- Queue Management: Allows for creating, destroying, and updating queues
on the VirtIO GPU device.
- System and Node Properties: Retrieves and manages system and node
properties for VirtIO.
Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>
- Correlate memory_copy and kernel_dispatch events with their HIP stream_id and add stream_id as an annotation in Perfetto.
- By default, group memory_copy and kernel_dispatch events in Perfetto output by their stream_id.
- Add option, with the configuration setting ROCPROFSYS_ROCM_GROUP_BY_QUEUE, to group by HSA queue instead.
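
The grouping choice can be sketched as selecting the track key per event. The event fields and the `group_events` helper are hypothetical. In the tool the switch is driven by ROCPROFSYS_ROCM_GROUP_BY_QUEUE, whose accepted values are not assumed here, so the sketch takes a plain boolean:

```python
from collections import defaultdict


def group_events(events, group_by_queue=False):
    """Group event names into tracks keyed by stream (default) or by queue."""
    key = "queue_id" if group_by_queue else "stream_id"
    tracks = defaultdict(list)
    for event in events:
        tracks[event[key]].append(event["name"])
    return dict(tracks)


# Hypothetical events: two streams that share one HSA queue.
events = [
    {"name": "kernel_dispatch:a", "stream_id": 1, "queue_id": 7},
    {"name": "memory_copy:b", "stream_id": 2, "queue_id": 7},
    {"name": "kernel_dispatch:c", "stream_id": 1, "queue_id": 7},
]

by_stream = group_events(events)
by_queue = group_events(events, group_by_queue=True)
print(by_stream)
print(by_queue)
```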
---------
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
* Update formatting.yml
Changed runners to ubuntu-latest instead of AMD-ROCm-Internal-dev1
* Updated rocprofiler-sdk formatting workflow
* Added Sparse Checkouts
* Run in folder
* Check the WD
* List all files
* Added Rocprofiler-register
* Removed working dirs and rocprofiler-register
---------
Co-authored-by: Sivasuntharampillai, Haresh <Haresh.Sivasuntharampillai+amdeng@amd.com>
- Fix context tracing domain bitset overflow
- Previous behavior would enable all flags above ROCPROFILER_BUFFER_TRACING_MARKER_CORE_RANGE_API when this domain was enabled.
* Fix to find MPI symbols from undefined symbols
* Moved condition checks before
* Fixing format
---------
Co-authored-by: Anuj Shukla <anujshuk@amd.com>
* Adding test and samples to decoder
* Fix sample
* Formatting
* Fix multi test
* Disable sample
* Fix tests
* Format
* Version fix
* Locking the decoder
* Add atomic
* Review comments
* Format
* Adding readme
* merge conflict and adding PCS+ATT test
* Review comments
* Properly disable PCS test
* Update tests/rocprofv3/advanced-thread-trace/CMakeLists.txt
* Adding back env var test
* Name fix
* Preload sample
* Addressing review comments
* Update docs
---------
Co-authored-by: Giovanni Baraldi <gbaraldi@amd.com>