neuralqx.profile.aggregate module
- aggregate_run(run_dir, output_path=None)
Offline aggregation of per-rank summary files.
- Produces:
  - merged tree with summed inclusive/exclusive/calls/flops/bytes
  - per-node across-rank min/mean/max for inclusive_ns and exclusive_ns
Because the aggregation runs offline, it avoids runtime MPI barriers and is recommended for large HPC runs.
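The merge described above can be sketched in a few lines. This is an illustration only, not the module's implementation: the on-disk summary format is not documented here, so the sketch assumes each rank's summary has already been loaded into a dict that maps a node path to its metrics. The function name `merge_rank_summaries`, the flat path-keyed layout, and the `_min`/`_mean`/`_max` key suffixes are all hypothetical.

```python
from statistics import mean

# Field names taken from the description above; everything else is assumed.
SUMMED_FIELDS = ("inclusive_ns", "exclusive_ns", "calls", "flops", "bytes")
STAT_FIELDS = ("inclusive_ns", "exclusive_ns")


def merge_rank_summaries(rank_summaries):
    """Merge per-rank {node_path: {field: value}} dicts (hypothetical layout).

    Returns one dict per node with summed fields plus across-rank
    min/mean/max for the timing fields.
    """
    merged = {}
    samples = {}  # node_path -> field -> list of per-rank values
    for summary in rank_summaries:
        for path, metrics in summary.items():
            node = merged.setdefault(path, {f: 0 for f in SUMMED_FIELDS})
            for f in SUMMED_FIELDS:
                node[f] += metrics.get(f, 0)
            per_field = samples.setdefault(path, {f: [] for f in STAT_FIELDS})
            for f in STAT_FIELDS:
                per_field[f].append(metrics.get(f, 0))

    # Statistics cover only the ranks on which the node actually appears.
    for path, per_field in samples.items():
        for f, values in per_field.items():
            merged[path][f + "_min"] = min(values)
            merged[path][f + "_mean"] = mean(values)
            merged[path][f + "_max"] = max(values)
    return merged


# Two made-up rank summaries for a single node.
rank0 = {"train/forward": {"inclusive_ns": 100, "exclusive_ns": 60,
                           "calls": 2, "flops": 10, "bytes": 4}}
rank1 = {"train/forward": {"inclusive_ns": 300, "exclusive_ns": 80,
                           "calls": 2, "flops": 10, "bytes": 4}}
result = merge_rank_summaries([rank0, rank1])
```

Since every input is a file written independently by one rank, a merge of this kind needs no communication between ranks and can run on a single node after the job has finished.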