SWDEV-408607: Removed MPI message. Added changelog and readme
Change-Id: I31efaf53ce4bf1b25c2bd94197a0b41bff84b0ff
[ROCm/rocprofiler commit: 02fbd5887b]
@@ -229,24 +229,17 @@ Rocprofiler for ROCm 5.7 added support for counter collection (PMC) and advanced
 - Removing Replay modes, as we discovered that some of them corrupt the application's behavior. We will re-add them once we implement the fix for them.
 
 ### Optimized
-- ATT json filesizes
+- Improved ATT parser performance and filesizes.
 - The profiler now autocorrects user input errors for pmc and throws an exception for wrong input with this message: "Bad input metric. usage --> pmc: [counter1] [counter2]"
 
 ### Added
 - Every API trace in V2 reported synchronously will have two records, one for the Enter phase and one for the Exit phase.
 - File Plugin now reports the HSA OPS operation kind as part of the output text
 - MI300 counters support for rocprof v1 and v2.
-- Limiting file name sizes for ATT plugin.
 - Support for MI300 XCC modes for rocprof v2.
 - MI300 individual XCC counters are dumped per XCC as separate records, but with the same record-id and kernel dispatch info.
 - Naming for MPI ranks. In filenames containing "%rank", it is replaced by the value of the variable "MPI_RANK", "OMPI_COMM_WORLD_RANK" or "MV2_COMM_WORLD_RANK".
 - MPI Rank will appear in perfetto track names.
-- SE_MASK parameter in ATT, a binary mask specifying for which shader engines to run ATT.
-  On GFX9, SEs are masked out completely. On Navi only part of the data is masked.
-  The use of SE_MASK=0x1 is heavily encouraged to avoid packet lost events.
-- "--mode file" option in ATT, which allows for parsed files to be stored. Run python3 httpserver.py from within ./UI/ to view files locally.
-- "ROCPROFILER_MAX_ATT_PROFILES" environment variable can be set. Previously fixed at 16, now the default is 1.
-- Increased ATT buffer size per collection to 1GB.
 - The File plugin is split into File & CLI plugins. The CLI plugin is responsible for showing results on the terminal screen and is chosen automatically if no -d option is given to rocprof; the File plugin is responsible for writing the output results to files if the -d option is given.
 - The structure of the results differs between the CLI & File plugins: the File plugin makes sure every type of result is in a separate file, starting with the header, while the CLI plugin keeps the records in the old format.
 Example for file plugin output:
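The "%rank" filename substitution described in the changelog entry above can be illustrated with a minimal Python sketch. This is not the profiler's implementation: `expand_rank` is a hypothetical helper, the lookup order of the three environment variables is an assumption, and so is the fallback of leaving the filename unchanged when none of them is set.

```python
import os

# Environment variables named in the changelog entry.
# The order in which they are checked is an assumption.
RANK_VARS = ("MPI_RANK", "OMPI_COMM_WORLD_RANK", "MV2_COMM_WORLD_RANK")


def expand_rank(filename, env=None):
    """Replace "%rank" with the first rank variable found (hypothetical helper)."""
    env = os.environ if env is None else env
    for name in RANK_VARS:
        value = env.get(name)
        if value is not None:
            return filename.replace("%rank", value)
    # Assumed fallback: no rank variable set, keep the name as given.
    return filename
```

For instance, with `OMPI_COMM_WORLD_RANK=3` in the environment, `expand_rank("results_%rank.csv")` would give `results_3.csv` under these assumptions.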
@@ -265,6 +258,14 @@ Example for file plugin output:
 - Removing Record IDs from tracer records in CLI plugin.
 - Added Flush Interval and Trace Period functionality: --flush-interval <time_in_ms> flushes the buffers at the interval given by the user, and --trace-period <delay>:<trace_time>:<interval> controls tracing sessions, where delay is the time to wait before starting the session, trace_time is the time between the start and stop of each session, and interval is the time between two consecutive sessions (omitting interval = infinite). For more details, please refer to the ROCProfV2 tool usage document.
 - Added requirements.txt to be used to install all the necessary python3 packages.
+- ATT plugin:
+  - Added --mode, --mpi and --depth parameters.
+  - Limiting file name sizes for large kernels.
+  - SE_MASK parameter for input.txt, a binary mask specifying which shader engines to collect from.
+    On GFX9, SEs are masked out completely. On Navi, only part of the data is masked.
+    The use of SE_MASK=0x1 is heavily encouraged to avoid packet lost events.
+  - "ROCPROFILER_MAX_ATT_PROFILES" environment variable can be set. Previously fixed at 16, now the default is 1.
+  - Increased ATT buffer size per collection to 1GB.
 
 ### Fixed
 - Samples are fixed to show the new usage of phases.
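To make the --trace-period <delay>:<trace_time>:<interval> description more concrete, here is a small Python sketch that turns such a value into a list of session windows. It is only an illustration under stated assumptions: `parse_trace_period` and `session_windows` are hypothetical helpers, the values are treated as plain integers in whatever time unit the tool uses, and interval is assumed to be the gap between the stop of one session and the start of the next.

```python
def parse_trace_period(spec):
    """Parse "<delay>:<trace_time>[:<interval>]" (hypothetical helper)."""
    parts = spec.split(":")
    if len(parts) not in (2, 3):
        raise ValueError("expected <delay>:<trace_time>[:<interval>]")
    delay, trace_time = int(parts[0]), int(parts[1])
    # Omitting interval means a single session ("interval = infinite").
    interval = int(parts[2]) if len(parts) == 3 else None
    return delay, trace_time, interval


def session_windows(spec, count):
    """Return up to `count` (start, stop) pairs, measured from launch."""
    delay, trace_time, interval = parse_trace_period(spec)
    windows = []
    start = delay
    for _ in range(count):
        windows.append((start, start + trace_time))
        if interval is None:
            break  # no interval given: only one session is traced
        # Assumption: interval is counted from stop to the next start.
        start += trace_time + interval
    return windows
```

Under these assumptions, `session_windows("100:50:200", 3)` yields `[(100, 150), (350, 400), (600, 650)]`, while `session_windows("100:50", 3)` yields the single window `[(100, 150)]`.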
@@ -234,20 +234,25 @@ The user has two options for building:
 see Plugin Support section for installation
 
 # 3. Run the following to view the trace
-rocprofv2 --plugin att <app_relative_path_assembly_file> --mode <network, file, off> -i input.txt <app_relative_path>
+rocprofv2 -i input.txt --plugin att <app_relative_path_assembly_file> --mode [network, file, off] <app_relative_path>
 
 # app_assembly_file_relative_path is the assembly file with .s extension generated in 1st step
 # app_relative_path is the path for the application binary
-# Mode:
-# - Network: opens the server with the browser UI.
-# att needs 2 ports available (e.g. 8000, 18000). There is an option (default: --ports "8000,18000") option to change these.
-# In case the browser is running on a different machine, port forwarding can be done with ssh -L 8000:localhost:8000 <user@IP>.
-# - File: dumps the json files to disk, it can be used to quickly verify if there is anything wrong with the data.
-# Run python3 httpserver.py from within the generated ui/ folder to view the trace. The folder can be copied to another machine, and will run without rocm.
-# - Off runs collection but not analysis/parsing. So it can be later viewed another time and/or system.
+# Parameters:
+# --mode <mode>:
+# - network: opens the server with the browser UI.
+# att needs 2 ports available (e.g. 8000, 18000). There is an option (default: --ports "8000,18000") to change these.
+# In case the browser is running on a different machine, port forwarding can be done with ssh -L 8000:localhost:8000 <user@IP>.
+# - file: dumps the json files to disk; it can be used to quickly verify whether anything is wrong with the data.
+# Run python3 httpserver.py from within the generated ui/ folder to view the trace. The folder can be copied to another machine, and will run without rocm.
+# - off: runs collection but not analysis/parsing, so the trace can be viewed later and/or on another system.
+# --depth <n>: How many waves per slot to parse (maximum).
+# --mpi <nproc>: Parse with this many mpi processes, for performance improvements. Requires mpi4py.
+# --att_kernel "filename": Kernel filename to use (instead of ATT asking which one to use).
+# --trace_file "files": glob (wildcards allowed) of trace files to parse.
 # input.txt gives flexibility to target the compute unit and provide filters.
 # input.txt contents:
-# TARGET_CU=1 // or some other CU [0,15] - WGP for Navi
+# att: TARGET_CU=1 // or some other CU [0,15] - WGP for Navi
 # SE_MASK=0x1 // bitmask of shader engines. The fewer, the easier on the hardware. Default enables all shader engines.
 # SIMD_MASK=0xF // There are four SIMDs. GFX9: bitmask of SIMDs. Navi: SIMD Index [0-3].
 # PERFCOUNTERS_COL_PERIOD=0x3 // Multiplier period for counter collection [0~31]. GFX9 only.
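The SE_MASK and SIMD_MASK values above are bitmasks. As a rough illustration of how such a setting reads, the following Python sketch parses a KEY=VALUE line with a trailing `//` comment and lists which bits are set. Both helpers are hypothetical and not part of rocprofv2; the mapping of bit i to shader engine i and the default of four engines are assumptions for the example.

```python
def parse_setting(line):
    """Split "KEY=VALUE // comment" into (key, int value) (hypothetical helper)."""
    text = line.split("//", 1)[0].strip()  # drop the trailing comment
    key, _, value = text.partition("=")
    # Base 0 lets int() accept both "0x1" and "1".
    return key.strip(), int(value.strip(), 0)


def enabled_units(mask, num_units=4):
    """Indices whose bit is set in mask; bit i -> unit i is an assumption."""
    return [i for i in range(num_units) if (mask >> i) & 1]
```

With these helpers, `parse_setting("SE_MASK=0x1 // bitmask of shader engines")` gives `("SE_MASK", 1)`, and `enabled_units(0x1)` gives `[0]`, meaning only the first shader engine would be traced under the stated bit-ordering assumption.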
@@ -18,8 +18,9 @@ import gc
 
 try:
     from mpi4py import MPI
+    MPI_IMPORTED = True
 except:
-    pass
+    MPI_IMPORTED = False
 
 class PerfEvent(ctypes.Structure):
     _fields_ = [
@@ -330,15 +331,16 @@ def apply_min_event(min_event_time, OCCUPANCY, EVENTS, DBFILES, TIMELINES):
 if __name__ == "__main__":
     comm = None
     mpi_root = True
-    try:
-        comm = MPI.COMM_WORLD
-        if comm.Get_size() < 2:
+    if MPI_IMPORTED:
+        try:
+            comm = MPI.COMM_WORLD
+            if comm.Get_size() < 2:
+                comm = None
+            else:
+                mpi_root = comm.Get_rank() == 0
+        except:
+            print('Could not load MPI')
             comm = None
-        else:
-            mpi_root = comm.Get_rank() == 0
-    except:
-        print('Could not load MPI')
-        comm = None
 
     pathenv = os.getenv('OUTPUT_PATH')
     if pathenv is None:
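Read together, the two Python hunks appear to make the 'Could not load MPI' message print only when mpi4py was imported but MPI setup then failed, rather than every time mpi4py is missing. A standalone sketch of that pattern follows. `init_mpi` is a hypothetical wrapper that does not exist in the source, and the sketch catches `Exception` where the original uses a bare `except`.

```python
try:
    from mpi4py import MPI  # optional dependency
    MPI_IMPORTED = True
except Exception:
    # mpi4py is absent or unusable: run single-process without any message.
    MPI = None
    MPI_IMPORTED = False


def init_mpi():
    """Return (comm, is_root); comm is None when not running under MPI."""
    comm, is_root = None, True
    if MPI_IMPORTED:
        try:
            world = MPI.COMM_WORLD
            if world.Get_size() >= 2:
                # Only use MPI when there is more than one process.
                comm = world
                is_root = world.Get_rank() == 0
        except Exception:
            # Reached only if mpi4py imported but MPI itself failed.
            print('Could not load MPI')
            comm = None
    return comm, is_root
```

Calling `comm, is_root = init_mpi()` in a plain single-process run returns `(None, True)` without printing anything, which matches the behavior the commit subject describes.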