[RcclReplayer] JSON <-> BIN log format conversion tool (#2056)
* Add replay log format converter * Add Log Sanitizer * Add no timestamp option (nts) to sanitizer
Этот коммит содержится в:
коммит произвёл
GitHub
родитель
db52690c2a
Коммит
5fd86021a8
@@ -85,3 +85,37 @@ Replayer is a separate tool which aims to re-run the same set of RCCL calls as r
|
||||
Each rank will print out its progress as it goes through every line of calls, including its rank, line number, RCCL API name, status (INFO/WARNING/ERROR).
|
||||
It will also report time and bandwidth (if the line is a communication call) for that call. In the end, it will report the total time taken by all communication calls.
|
||||
Replayer is still under development and experimentations, so the formats of logging or contents of replayer output will be subject to changes.
|
||||
|
||||
## Log Converter
|
||||
`replay_log_converter.py` is a utility to convert between binary and JSON log formats, standardize JSON logs for easier parsing, and sanitize logs for comparison.
|
||||
|
||||
**Usage:**
|
||||
* **Binary to JSON:** `python3 replay_log_converter.py <basename> tojson`
|
||||
* **JSON to Binary:** `python3 replay_log_converter.py <basename> tobin`
|
||||
* **Standardize JSON:** `python3 replay_log_converter.py <basename> --standardize`
|
||||
* **Sanitize JSON:** `python3 replay_log_converter.py <basename> --sanitize`
|
||||
|
||||
An optional output basename can be provided after the mode (tojson/tobin) to customize the output filename:
|
||||
* `python3 replay_log_converter.py <basename> <mode> <output_basename>`
|
||||
|
||||
The converter automatically finds all matching log files with pattern `basename.PID.hostname` and processes them.
|
||||
|
||||
**Output Files:**
|
||||
* Standardized JSON output is saved with `.standard.json` extension and can be parsed with standard JSON libraries.
|
||||
* Sanitized files are modified in-place (original files are overwritten with sanitized versions).
|
||||
|
||||
**Examples:**
|
||||
* `python3 replay_log_converter.py replayer_log tojson` produces `replayer_log.{1270-1278}.quanta-cx77-11.json`
|
||||
* `python3 replay_log_converter.py replayer_log tojson converted_log` produces `converted_log.{1270-1278}.quanta-cx77-11.json`
|
||||
* `python3 replay_log_converter.py replayer_log --sanitize` sanitizes existing JSON files in-place
|
||||
* `python3 replay_log_converter.py replayer_log tojson --sanitize` converts to JSON and sanitizes in one step
|
||||
* `python3 replay_log_converter.py replayer_log --sanitize --no-timestamp` (or `--nts`) sets all timestamps to 0.0
|
||||
|
||||
**Sanitization:**
|
||||
The `--sanitize` option normalizes logs for easier comparison by:
|
||||
* Remapping pointers to readable identifiers (e.g., `comm : 0x7fb680328010` → `comm : comm_001`)
|
||||
* Normalizing timestamps relative to the first call (e.g., `time : 1762969171532.248535` → `time : 0.000000`)
|
||||
* Use `--no-timestamp` (or `--nts`) to set all timestamps to 0.0 instead
|
||||
* Preserving relationships: same pointer values get the same sanitized identifier
|
||||
* Sanitized fields: communicators (`comm`), unique IDs (`uniqueID`), streams (`stream`), buffer addresses (`addr`/`base`/`ptr`/`acc`), handles (`handle`), thread IDs (`thread`), and process IDs (`pid`)
|
||||
|
||||
|
||||
Разница между файлами не показана из-за своего большого размера
Загрузить разницу
Ссылка в новой задаче
Block a user