SWDEV-408607: Removed MPI message. Added changelog and readme

Change-Id: I31efaf53ce4bf1b25c2bd94197a0b41bff84b0ff


[ROCm/rocprofiler commit: 02fbd5887b]
Giovanni LB
2023-06-30 21:50:49 -03:00
parent 5f784afbe5
commit 9be8bcc9d4
3 changed files with 34 additions and 26 deletions
+9 -8
@@ -229,24 +229,17 @@ Rocprofiler for ROCm 5.7 added support for counter collection (PMC) and advanced
- Removing Replay modes as we discovered that some of them will corrupt the applications' behavior, we will re-add them once we implement the fix for them.
### Optimized
- ATT json filesizes
- Improved ATT parser performance and filesizes.
- Now profiler autocorrects user input errors for pmc and throws exception for wrong input with this message:"Bad input metric. usage --> pmc: [counter1] [counter2]"
### Added
- Every API trace in V2 reported synchronously will have two records, one for Enter phase and for Exit phase
- File Plugin now reports the HSA OPS operation kind as part of the output text
- MI300 counters support for rocprof v1 and v2.
- Limiting file name sizes for ATT plugin.
- Support for MI300 XCC modes for rocprof v2.
- MI300 individual XCC counters dumped per-xcc as separate records but with same record-id and kernel dispatch info
- Naming for MPI ranks. Filenames containing "%rank" are replaced by variables "MPI_RANK", "OMPI_COMM_WORLD_RANK" or "MV2_COMM_WORLD_RANK".
- MPI Rank will appear in perfetto track names.
- SE_MASK parameter in ATT, a binary mask specifying for which shader engines to run ATT.
On GFX9, SEs are masked out completely. On Navi only part of the data is masked.
The use of SE_MASK=0x1 is heavily encouraged to avoid packet lost events.
- "--mode file" option in ATT, which allows for parsed files to be stored. Run python3 httpserver.py from within ./UI/ to view files locally.
- "ROCPROFILER_MAX_ATT_PROFILES" environment variable can be set. Previously fixed at 16, now the default is 1.
- Increased ATT buffer size per collection to 1GB.
- File plugin is split into File & CLI plugins. The CLI plugin shows results on the terminal screen and is chosen automatically if no -d option is given to rocprof; the File plugin writes the output results to files if the -d option is given.
- Structure of the results is different for both CLI & File plugin; File plugin will make sure every type of result is in a separate file, starting by specifying the header; CLI plugin will have the records in the old way.
Example for file plugin output:
@@ -265,6 +258,14 @@ Example for file plugin output:
- Removing Record IDs from tracer records in CLI plugin.
- Added Flush Interval and Trace Period functionality: --flush-interval <time_in_ms> flushes the buffers at the interval given by the user, and --trace-period <delay>:<trace_time>:<interval> schedules sessions, where delay is the time to wait before starting a session, trace_time is the time between every session start and stop, and interval is the time between two consecutive sessions (omitting interval = infinite). For more details please refer to the ROCProfV2 tool usage document.
- Added requirements.txt to be used to install all the necessary python3 packages.
- ATT plugin:
- Added --mode, --mpi and --depth parameters.
- Limiting file name sizes for large kernels.
- SE_MASK parameter for input.txt, a binary mask specifying for which shader engines to collect from.
On GFX9, SEs are masked out completely. On Navi only part of the data is masked.
The use of SE_MASK=0x1 is heavily encouraged to avoid packet lost events.
- "ROCPROFILER_MAX_ATT_PROFILES" environment variable can be set. Previously fixed at 16, now the default is 1.
- Increased ATT buffer size per collection to 1GB.
### Fixed
- Samples are fixed to show the new usage of phases.
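The `<delay>:<trace_time>:<interval>` form of `--trace-period` described in the changelog above can be illustrated with a small parser. This is only a sketch of the documented semantics, not the rocprofv2 implementation: `TracePeriod` and `parse_trace_period` are hypothetical names, the changelog does not state the units of the three fields, and representing the "infinite" interval as `None` is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TracePeriod:
    """Hypothetical container for the three --trace-period fields."""
    delay: int               # time to wait before starting a session
    trace_time: int          # time between a session's start and stop
    interval: Optional[int]  # time between consecutive sessions; None = infinite


def parse_trace_period(value: str) -> TracePeriod:
    """Parse '<delay>:<trace_time>[:<interval>]' (illustrative helper only)."""
    parts = value.split(":")
    if len(parts) not in (2, 3):
        raise ValueError(
            f"expected <delay>:<trace_time>[:<interval>], got {value!r}"
        )
    delay = int(parts[0])
    trace_time = int(parts[1])
    # Omitting the interval means "infinite", modeled here as None.
    interval = int(parts[2]) if len(parts) == 3 else None
    return TracePeriod(delay, trace_time, interval)


if __name__ == "__main__":
    print(parse_trace_period("10:100:500"))
    print(parse_trace_period("10:100"))
```

Modeling the omitted interval as `None` keeps the "infinite" case explicit for callers instead of relying on a sentinel number.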
+14 -9
@@ -234,20 +234,25 @@ The user has two options for building:
see Plugin Support section for installation
# 3. Run the following to view the trace
rocprofv2 --plugin att <app_relative_path_assembly_file> --mode <network, file, off> -i input.txt <app_relative_path>
rocprofv2 -i input.txt --plugin att <app_relative_path_assembly_file> --mode [network, file, off] <app_relative_path>
# app_assembly_file_relative_path is the assembly file with .s extension generated in 1st step
# app_relative_path is the path for the application binary
# Mode:
# - Network: opens the server with the browser UI.
# att needs 2 ports available (e.g. 8000, 18000). There is an option (default: --ports "8000,18000") option to change these.
# In case the browser is running on a different machine, port forwarding can be done with ssh -L 8000:localhost:8000 <user@IP>.
# - File: dumps the json files to disk, it can be used to quickly verify if there is anything wrong with the data.
# Run python3 httpserver.py from within the generated ui/ folder to view the trace. The folder can be copied to another machine, and will run without rocm.
# - Off runs collection but not analysis/parsing. So it can be later viewed another time and/or system.
# Parameters:
# --mode <mode>:
# - network: opens the server with the browser UI.
# att needs 2 ports available (e.g. 8000, 18000). There is an option (default: --ports "8000,18000") to change these.
# In case the browser is running on a different machine, port forwarding can be done with ssh -L 8000:localhost:8000 <user@IP>.
# - file: dumps the json files to disk, it can be used to quickly verify if there is anything wrong with the data.
# Run python3 httpserver.py from within the generated ui/ folder to view the trace. The folder can be copied to another machine, and will run without rocm.
# - off: runs collection but not analysis/parsing, so the trace can be viewed at a later time and/or on another system.
# --depth <n>: How many waves per slot to parse (maximum).
# --mpi <nproc>: Parse with this many mpi processes, for performance improvements. Requires mpi4py.
# --att_kernel "filename": Kernel filename to use (instead of ATT asking which one to use).
# --trace_file "files": glob (wildcards allowed) of trace files to parse.
# input.txt gives flexibility to target the compute unit and provide filters.
# input.txt contents:
# TARGET_CU=1 // or some other CU [0,15] - WGP for Navi
# att: TARGET_CU=1 // or some other CU [0,15] - WGP for Navi
# SE_MASK=0x1 // bitmask of shader engines. The fewer, the easier on the hardware. Default enables all shader engines.
# SIMD_MASK=0xF // There are four SIMDs. GFX9: bitmask of SIMDs. Navi: SIMD Index [0-3].
# PERFCOUNTERS_COL_PERIOD=0x3 // Multiplier period for counter collection [0~31]. GFX9 only.
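The README describes `SE_MASK` and (on GFX9) `SIMD_MASK` as bitmasks. The sketch below shows how such values could be read and decoded. It is not the ATT plugin's parser: `parse_setting` and `enabled_bits` are hypothetical helpers, the `KEY=VALUE // comment` line shape is taken from the README excerpt above, and the mapping of bit `i` to shader engine or SIMD `i` is an assumption.

```python
def parse_setting(line: str) -> tuple[str, int]:
    """Split one 'KEY=VALUE // comment' line into (key, integer value)."""
    text = line.split("//", 1)[0].strip()  # drop a trailing comment
    if text.startswith("att:"):            # optional prefix seen in the README
        text = text[len("att:"):]
    key, _, value = text.partition("=")
    # Base 0 lets int() accept both decimal ("1") and hex ("0xF") literals.
    return key.strip(), int(value.strip(), 0)


def enabled_bits(mask: int, width: int) -> list[int]:
    """Return the indices of the bits set in the lowest `width` bits of `mask`.

    Assumption: bit i corresponds to unit i (shader engine or SIMD).
    """
    return [i for i in range(width) if (mask >> i) & 1]


if __name__ == "__main__":
    key, mask = parse_setting("SE_MASK=0x1 // bitmask of shader engines")
    # A width of 4 is only an example; the real unit count depends on the GPU.
    print(key, enabled_bits(mask, 4))
```

Under this assumption, `SE_MASK=0x1` selects only the first shader engine, and `SIMD_MASK=0xF` on GFX9 selects all four SIMDs.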
+11 -9
@@ -18,8 +18,9 @@ import gc
try:
from mpi4py import MPI
MPI_IMPORTED = True
except:
pass
MPI_IMPORTED = False
class PerfEvent(ctypes.Structure):
_fields_ = [
@@ -330,15 +331,16 @@ def apply_min_event(min_event_time, OCCUPANCY, EVENTS, DBFILES, TIMELINES):
if __name__ == "__main__":
comm = None
mpi_root = True
try:
comm = MPI.COMM_WORLD
if comm.Get_size() < 2:
if MPI_IMPORTED:
try:
comm = MPI.COMM_WORLD
if comm.Get_size() < 2:
comm = None
else:
mpi_root = comm.Get_rank() == 0
except:
print('Could not load MPI')
comm = None
else:
mpi_root = comm.Get_rank() == 0
except:
print('Could not load MPI')
comm = None
pathenv = os.getenv('OUTPUT_PATH')
if pathenv is None:
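The change to the Python script above records whether `mpi4py` could be imported in an `MPI_IMPORTED` flag and only touches `MPI.COMM_WORLD` when it is set. A standalone sketch of that optional-dependency pattern follows. `get_mpi_context` is a hypothetical helper that is not in the commit, and catching `Exception` where the script uses a bare `except:` is a deliberate substitution.

```python
try:
    from mpi4py import MPI  # optional dependency
    MPI_IMPORTED = True
except Exception:
    # Narrower than a bare except: KeyboardInterrupt and SystemExit
    # still propagate instead of being swallowed.
    MPI_IMPORTED = False


def get_mpi_context():
    """Return (comm, is_root); comm is None when running without MPI."""
    if not MPI_IMPORTED:
        return None, True
    comm = MPI.COMM_WORLD
    if comm.Get_size() < 2:
        # A single process gains nothing from MPI, so treat it as serial.
        return None, True
    return comm, comm.Get_rank() == 0


comm, mpi_root = get_mpi_context()

if __name__ == "__main__":
    if mpi_root:
        print("MPI enabled" if comm is not None else "running without MPI")
```

With the flag in place, a missing `mpi4py` is handled once at import time, so a serial run never reaches the MPI calls and never prints an MPI error message.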