Version: v2603

intelligence.zenith_tune.evaluators.megatron

Evaluator for Megatron training throughput.

MegatronThroughputEvaluator Objects

@EvaluatorRegistry.register("megatron")
class MegatronThroughputEvaluator(TuningEvaluator)

Extract throughput (TFLOP/s/GPU) from Megatron training output.

Searches stdout for the last occurrence of the pattern "throughput per GPU (TFLOP/s/GPU): <value>" and returns <value> as a float.

Raises ValueError if the pattern is not found in stdout.
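The search described above can be approximated with a regular expression. The following is a minimal sketch, not the evaluator's actual implementation: the function name `extract_last_throughput`, the exact regex, and the sample log lines are assumptions made for illustration. Only the label text and the ValueError behaviour come from this page.

```python
import re

# Assumed pattern: the label quoted in the docs, followed by a decimal number.
_THROUGHPUT_RE = re.compile(
    r"throughput per GPU \(TFLOP/s/GPU\):\s*([0-9]+(?:\.[0-9]+)?)"
)


def extract_last_throughput(stdout: str) -> float:
    """Return the last throughput value found in stdout (hypothetical helper)."""
    matches = _THROUGHPUT_RE.findall(stdout)
    if not matches:
        # Mirrors the documented behaviour: a missing pattern is an error.
        raise ValueError("throughput line not found in stdout")
    # Later iterations are usually more representative than warm-up ones,
    # so the final match is the one returned.
    return float(matches[-1])


# Illustrative log excerpt; real Megatron output contains more fields.
sample_stdout = (
    " iteration 10/100 | elapsed time per iteration (ms): 512.3 |"
    " throughput per GPU (TFLOP/s/GPU): 101.5 |\n"
    " iteration 20/100 | elapsed time per iteration (ms): 498.7 |"
    " throughput per GPU (TFLOP/s/GPU): 104.2 |\n"
)

print(extract_last_throughput(sample_stdout))  # 104.2
```

Taking the last match rather than the first means that early iterations, which often include warm-up overhead, do not determine the reported score.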

Example:

evaluator = MegatronThroughputEvaluator()
value = evaluator.evaluate(stdout, metadata)

evaluate

def evaluate(stdout: str, metadata: dict[str, Any]) -> float

Extract the last reported throughput value from stdout.

Arguments:

  • stdout - The stdout output from the training command.
  • metadata - Trial metadata (unused).

Returns:

Throughput in TFLOP/s/GPU.

Raises:

  • ValueError - If the throughput line is not found in stdout.