diff --git a/projects/rocprofiler/CHANGELOG.md b/projects/rocprofiler/CHANGELOG.md
index 5cc5ad47fd..ae56ae5645 100644
--- a/projects/rocprofiler/CHANGELOG.md
+++ b/projects/rocprofiler/CHANGELOG.md
@@ -229,24 +229,17 @@ Rocprofiler for ROCm 5.7 added support for counter collection (PMC) and advanced
 - Removing Replay modes as we discovered that some of them will corrupt the applications' behavior, we will re-add them once we implement the fix for them.

 ### Optimized
-- ATT json filesizes
+- Improved ATT parser performance and filesizes.
 - Now profiler autocorrects user input errors for pmc and throws exception for wrong input with this message:"Bad input metric. usage --> pmc: [counter1] [counter2]"

 ### Added
 - Every API trace in V2 reported synchronously will have two records, one for Enter phase and for Exit phase
 - File Plugin now reports the HSA OPS operation kind as part of the output text
 - MI300 counters support for rocprof v1 and v2.
-- Limiting file name sizes for ATT plugin.
 - Support for MI300 XCC modes for rocprof v2.
 - MI300 individual XCC counters dumped per-xcc as separate records but with same record-id and kernel dispatch info
 - Naming for MPI ranks. Filenames containing "%rank" are replaced by variables "MPI_RANK", "OMPI_COMM_WORLD_RANK" or "MV2_COMM_WORLD_RANK".
 - MPI Rank will appear in perfetto track names.
-- SE_MASK parameter in ATT, a binary mask specifying for which shader engines to run ATT.
-  On GFX9, SEs are masked out completely. On Navi only part of the data is masked.
-  The use of SE_MASK=0x1 is heavily encouraged to avoid packet lost events.
-- "--mode file" option in ATT, which allows for parsed files to be stored. Run python3 httpserver.py from within ./UI/ to view files locally.
-- "ROCPROFILER_MAX_ATT_PROFILES" environment variable can be set. Previously fixed at 16, now the default is 1.
-- Increased ATT buffer size per collection to 1GB.
 - File plugin is splitted to File & CLI plugins, CLI plugin is responsible for showing results on the terminal screen and will be automatically the choice if no -d option given in rocprof, File plugin on the other hand is responsible for writing the output results in files if -d option is given.
 - Structure of the results is different for both CLI & File plugin; File plugin will make sure every type of result is in a separate file, starting by specifying the header; CLI plugin will have the records in the old way.
 Example for file plugin output:
@@ -265,6 +258,14 @@ Example for file plugin output:
 - Removing Record IDs from tracer records in CLI plugin.
 - Added Flush Interval and Trace Period functionality, where --flush-interval , for flushing the buffers every given interval by the user, and --trace-period ::, where delay is the time to wait before starting session, trace_time is the time between every start and stop session and interval the time between two consecutive sessions (ommiting interval = infinite). For more details please refer to the ROCProfV2 tool usage document.
 - Added requirements.txt to be used to install all the necessary python3 packages.
+- ATT plugin:
+  - Added --mode, --mpi and --depth parameters.
+  - Limiting file name sizes for large kernels.
+  - SE_MASK parameter for input.txt, a binary mask specifying which shader engines to collect from.
+    On GFX9, SEs are masked out completely. On Navi, only part of the data is masked.
+    Using SE_MASK=0x1 is strongly recommended to avoid packet-loss events.
+  - "ROCPROFILER_MAX_ATT_PROFILES" environment variable can be set. Previously fixed at 16, now the default is 1.
+  - Increased ATT buffer size per collection to 1GB.

 ### Fixed
 - Samples are fixed to show the new usage of phases.
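SE_MASK is described as a binary mask over shader engines, so SE_MASK=0x1 selects a single engine. A minimal sketch for composing and decoding the value, assuming bit i selects shader engine i (the diff does not state the bit order). `se_mask` and `enabled_engines` are hypothetical helpers, not part of rocprofiler:

```python
def se_mask(engines):
    """Build an SE_MASK value with one bit set per shader-engine index.

    Hypothetical helper: assumes bit i corresponds to shader engine i.
    """
    mask = 0
    for se in engines:
        if se < 0:
            raise ValueError(f"shader engine index must be non-negative, got {se}")
        mask |= 1 << se
    return mask


def enabled_engines(mask):
    """Inverse of se_mask: list the shader-engine indices whose bits are set."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


if __name__ == "__main__":
    # Only shader engine 0, the value the changelog recommends.
    print(f"SE_MASK={hex(se_mask([0]))}")
    # Shader engines 0 and 2 (binary 101).
    print(f"SE_MASK={hex(se_mask([0, 2]))}")
```

Restricting collection to fewer engines means less trace data per run, which is the stated reason for recommending SE_MASK=0x1.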
diff --git a/projects/rocprofiler/README.md b/projects/rocprofiler/README.md
index bb8196c24c..867a5ee168 100644
--- a/projects/rocprofiler/README.md
+++ b/projects/rocprofiler/README.md
@@ -234,20 +234,25 @@ The user has two options for building: see Plugin Support section for installation
     # 3. Run the following to view the trace
-    rocprofv2 --plugin att --mode -i input.txt
+    rocprofv2 -i input.txt --plugin att --mode [network, file, off]
     # app_assembly_file_relative_path is the assembly file with .s extension generated in 1st step
     # app_relative_path is the path for the application binary
-    # Mode:
-    #   - Network: opens the server with the browser UI.
-    #     att needs 2 ports available (e.g. 8000, 18000). There is an option (default: --ports "8000,18000") option to change these.
-    #     In case the browser is running on a different machine, port forwarding can be done with ssh -L 8000:localhost:8000 .
-    #   - File: dumps the json files to disk, it can be used to quickly verify if there is anything wrong with the data.
-    #     Run python3 httpserver.py from within the generated ui/ folder to view the trace. The folder can be copied to another machine, and will run without rocm.
-    #   - Off runs collection but not analysis/parsing. So it can be later viewed another time and/or system.
+    # Parameters:
+    #   --mode :
+    #     - network: opens the server with the browser UI.
+    #       att needs 2 ports available (e.g. 8000, 18000). Use the --ports option (default: --ports "8000,18000") to change these.
+    #       In case the browser is running on a different machine, port forwarding can be done with ssh -L 8000:localhost:8000 .
+    #     - file: dumps the JSON files to disk; this can be used to quickly verify whether anything is wrong with the data.
+    #       Run python3 httpserver.py from within the generated ui/ folder to view the trace. The folder can be copied to another machine, and will run without ROCm.
+    #     - off: runs collection but not analysis/parsing, so the trace can be parsed and viewed later and/or on another system.
+    #   --depth : Maximum number of waves per slot to parse.
+    #   --mpi : Parse with this many MPI processes, for performance improvements. Requires mpi4py.
+    #   --att_kernel "filename": Kernel filename to use (instead of ATT asking which one to use).
+    #   --trace_file "files": glob (wildcards allowed) of trace files to parse.
     # input.txt gives flexibility to to target the compute unit and provide filters.
     # input.txt contents:
-    # TARGET_CU=1 // or some other CU [0,15] - WGP for Navi
+    # att: TARGET_CU=1 // or some other CU [0,15] - WGP for Navi
     # SE_MASK=0x1 // bitmask of shader engines. The fewer, the easier on the hardware. Default enables all shader engines.
     # SIMD_MASK=0xF // There are four SIMDs. GFX9: bitmask of SIMDs. Navi: SIMD Index [0-3].
     # PERFCOUNTERS_COL_PERIOD=0x3 // Multiplier period for counter collection [0~31]. GFX9 only.
diff --git a/projects/rocprofiler/plugin/att/att.py b/projects/rocprofiler/plugin/att/att.py
index 087c944252..70f8c4f4c0 100755
--- a/projects/rocprofiler/plugin/att/att.py
+++ b/projects/rocprofiler/plugin/att/att.py
@@ -18,8 +18,9 @@ import gc

 try:
     from mpi4py import MPI
+    MPI_IMPORTED = True
 except:
-    pass
+    MPI_IMPORTED = False

 class PerfEvent(ctypes.Structure):
     _fields_ = [
@@ -330,15 +331,16 @@ def apply_min_event(min_event_time, OCCUPANCY, EVENTS, DBFILES, TIMELINES):
 if __name__ == "__main__":
     comm = None
     mpi_root = True
-    try:
-        comm = MPI.COMM_WORLD
-        if comm.Get_size() < 2:
+    if MPI_IMPORTED:
+        try:
+            comm = MPI.COMM_WORLD
+            if comm.Get_size() < 2:
+                comm = None
+            else:
+                mpi_root = comm.Get_rank() == 0
+        except:
+            print('Could not load MPI')
             comm = None
-        else:
-            mpi_root = comm.Get_rank() == 0
-    except:
-        print('Could not load MPI')
-        comm = None

     pathenv = os.getenv('OUTPUT_PATH')
     if pathenv is None:
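Before this change, a missing mpi4py left `MPI` undefined, so `MPI.COMM_WORLD` raised a NameError that the bare `except:` caught, printing "Could not load MPI" even for ordinary single-process runs; the `MPI_IMPORTED` flag skips that path. A minimal standalone sketch of the same optional-dependency pattern, with a hypothetical `setup_comm()` helper (not a function in att.py) and narrower exception handling than the bare `except:` in the patch:

```python
# Try the optional import once and remember whether it worked.
try:
    from mpi4py import MPI  # optional: only needed for parallel (--mpi) parsing
    MPI_IMPORTED = True
except (ImportError, RuntimeError):
    # ImportError: mpi4py is not installed.
    # RuntimeError: also caught in case the underlying MPI library fails to load.
    MPI_IMPORTED = False


def setup_comm():
    """Return (comm, is_root) for the current process.

    Hypothetical helper. comm is None when mpi4py is unavailable or when only
    one rank is running; in both cases this process acts as the root.
    """
    if not MPI_IMPORTED:
        return None, True
    comm = MPI.COMM_WORLD
    if comm.Get_size() < 2:
        return None, True
    return comm, comm.Get_rank() == 0


if __name__ == "__main__":
    comm, is_root = setup_comm()
    print("mpi4py available:", MPI_IMPORTED,
          "| parallel:", comm is not None,
          "| root:", is_root)
```

Run as a plain `python3` process, this falls back to `(None, True)` whether or not mpi4py is installed; only a launch with two or more ranks returns a communicator.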