neuralqx.profile.aggregate module

class RankStat(inclusive_ns, exclusive_ns, calls, flops, bytes)

Bases: object

aggregate_run(run_dir, output_path=None)

Offline aggregation of per-rank summary files.

Return type:

Dict[str, Any]

Produces:
  • merged tree with summed inclusive/exclusive/calls/flops/bytes
  • per-node across-rank min/mean/max for inclusive_ns and exclusive_ns

Because aggregation runs offline on the per-rank summary files, it needs no MPI barriers at runtime, which is why it is recommended for large HPC runs.
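
The merge described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the neuralqx implementation: the on-disk format of the summary files is not documented here, so the sketch starts from already-loaded per-rank dicts, flattens the tree to a mapping keyed by node name, and uses a hypothetical helper named merge_rank_summaries. Only the five field names are taken from RankStat.

```python
from statistics import mean
from typing import Any, Dict, List

# Field names taken from RankStat; everything else here is illustrative.
FIELDS = ("inclusive_ns", "exclusive_ns", "calls", "flops", "bytes")
SPREAD_FIELDS = ("inclusive_ns", "exclusive_ns")


def merge_rank_summaries(
    per_rank: List[Dict[str, Dict[str, float]]],
) -> Dict[str, Any]:
    """Hypothetical helper: merge per-rank node stats into one summary.

    Assumes each rank's summary is a flat {node_name: {field: value}} dict.
    """
    merged: Dict[str, Any] = {}
    node_names = sorted({name for rank in per_rank for name in rank})
    for name in node_names:
        # Only ranks that actually recorded this node contribute to it.
        rows = [rank[name] for rank in per_rank if name in rank]
        # Sum every counter across ranks.
        entry: Dict[str, Any] = {
            field: sum(row.get(field, 0) for row in rows) for field in FIELDS
        }
        # Keep the across-rank spread for the two timing fields.
        for field in SPREAD_FIELDS:
            values = [row.get(field, 0) for row in rows]
            entry[field + "_ranks"] = {
                "min": min(values),
                "mean": mean(values),
                "max": max(values),
            }
        merged[name] = entry
    return merged


rank0 = {"forward": {"inclusive_ns": 100, "exclusive_ns": 60,
                     "calls": 2, "flops": 10, "bytes": 4}}
rank1 = {"forward": {"inclusive_ns": 300, "exclusive_ns": 80,
                     "calls": 2, "flops": 10, "bytes": 4}}
summary = merge_rank_summaries([rank0, rank1])
```

Since every input is a file that a rank wrote on its own, a merge like this can run in a single process after the job has finished, with no communication between ranks.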