ADR-0006: Lazy Metrics Pattern

Status: Accepted

Date: 2026-01-01 (v1.5.0)

Context

Distribution fitting computes multiple goodness-of-fit metrics:

  • SSE: Sum of Squared Errors (always computed, fast)

  • AIC/BIC: Information criteria (always computed, fast)

  • KS: Kolmogorov-Smirnov statistic and p-value

  • AD: Anderson-Darling statistic

KS and AD statistics require evaluating the fitted CDF at every data point. Their cost is therefore at least O(n) per candidate distribution, and it dominates fitting time for large datasets:

  • Fitting 90 distributions with KS/AD: ~15 seconds

  • Fitting 90 distributions without KS/AD: ~3 seconds

Many workflows only need AIC/BIC for model selection, making KS/AD computation wasteful.
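For reference, the standard textbook definitions of the two deferred statistics show where the per-point CDF cost comes from. Here F is the fitted CDF, F_n the empirical CDF, and x_(1) <= ... <= x_(n) the sorted sample. The notation is ours and is not taken from the codebase.

```latex
% Kolmogorov-Smirnov: largest gap between empirical and fitted CDF
D_n = \sup_x \left| F_n(x) - F(x) \right|

% Anderson-Darling: tail-weighted discrepancy over the sorted sample
A^2 = -n - \frac{1}{n} \sum_{i=1}^{n} (2i - 1)
      \left[ \ln F\!\left(x_{(i)}\right)
           + \ln\!\left(1 - F\!\left(x_{(n+1-i)}\right)\right) \right]
```

Both expressions need F evaluated at all n sample points, so fitting 90 candidate distributions implies on the order of 90n CDF evaluations per statistic, in addition to sorting the sample.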

Decision

We implemented lazy metric computation via the lazy_metrics config option:

config = (FitterConfigBuilder()
    .with_lazy_metrics(True)
    .build())

Behavior:

  • When lazy_metrics=False (default): KS/AD are computed during the parallel fit

  • When lazy_metrics=True: KS/AD are deferred until first accessed

Implementation (results.py):

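from typing import Optional

import numpy as np

class LazyFitResult(FitResult):
    # Remaining constructor arguments are elided in this ADR; they are
    # forwarded unchanged to FitResult.
    def __init__(self, *args, data_sample: np.ndarray, **kwargs):
        super().__init__(*args, **kwargs)
        self._data_sample = data_sample
        self._ks_statistic: Optional[float] = None
        self._ks_pvalue: Optional[float] = None

    @property
    def ks_statistic(self) -> float:
        # Compute on first access, then serve the cached value
        if self._ks_statistic is None:
            self._compute_ks()
        return self._ks_statistic

    def _compute_ks(self) -> None:
        # Compute on-demand using stored data sample
        ...  # body elided in this ADR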
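The compute-once behaviour of the property can be illustrated with a minimal, standard-library-only sketch. LazyKSResult, the free function ks_statistic, and the compute_count counter are hypothetical names introduced for this example. They are not part of results.py, and the real implementation may compute the statistic differently.

```python
from statistics import NormalDist
from typing import Callable, Optional, Sequence


def ks_statistic(sample: Sequence[float], cdf: Callable[[float], float]) -> float:
    """One-sample KS statistic: largest gap between empirical and model CDF."""
    xs = sorted(sample)
    n = len(xs)
    # At each sorted point, check the gap just after and just before the ECDF step.
    return max(
        max((i + 1) / n - cdf(x), cdf(x) - i / n)
        for i, x in enumerate(xs)
    )


class LazyKSResult:
    """Hypothetical stand-in for LazyFitResult (assumption, not the library class)."""

    def __init__(self, cdf: Callable[[float], float], data_sample: Sequence[float]):
        self._cdf = cdf
        self._data_sample = data_sample
        self._ks_statistic: Optional[float] = None
        self.compute_count = 0  # demo-only counter to show caching

    @property
    def ks_statistic(self) -> float:
        # Deferred: nothing is computed until the first access.
        if self._ks_statistic is None:
            self._compute_ks()
        return self._ks_statistic

    def _compute_ks(self) -> None:
        self.compute_count += 1
        self._ks_statistic = ks_statistic(self._data_sample, self._cdf)


result = LazyKSResult(NormalDist(0.0, 1.0).cdf, [-1.2, -0.4, 0.1, 0.3, 0.9, 1.7])
assert result.compute_count == 0   # construction did not trigger the KS pass
first = result.ks_statistic        # first access pays the cost
second = result.ks_statistic       # second access reads the cached value
assert first == second and result.compute_count == 1
```

Using None as the "not yet computed" sentinel works here because a KS statistic is always a float. A metric that could legitimately be None would need a separate flag.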

Class hierarchy (v2.1.0 refinement):

FitResult (base)
├── EagerFitResult  # KS/AD computed upfront
└── LazyFitResult   # KS/AD computed on access

Data lifecycle:

  • Lazy results store a reference to the data sample

  • After KS/AD computation, the sample can be released by calling release_data()

  • Serialization includes computed values, not raw data
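The lifecycle above can be sketched as follows. LazyResultSketch and to_dict are hypothetical names. Only release_data() appears in this ADR, and its real signature is not shown here. The KS computation is hard-wired to a Uniform(0, 1) model to keep the example self-contained. Raising an error when KS is requested after the sample has been released is a choice made by this sketch, not documented behaviour.

```python
from typing import Any, Dict, List, Optional


def _uniform_ks(sample: List[float]) -> float:
    """KS statistic against Uniform(0, 1), whose CDF is F(x) = x on [0, 1]."""
    xs = sorted(sample)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


class LazyResultSketch:
    """Hypothetical lazy result illustrating the data lifecycle (assumption)."""

    def __init__(self, aic: float, data_sample: List[float]):
        self.aic = aic
        self._data_sample: Optional[List[float]] = data_sample  # reference, not a copy
        self._ks_statistic: Optional[float] = None

    @property
    def ks_statistic(self) -> float:
        if self._ks_statistic is None:
            if self._data_sample is None:
                # Sketch-specific choice: fail loudly rather than return None.
                raise RuntimeError("data sample released before KS was computed")
            self._ks_statistic = _uniform_ks(self._data_sample)
        return self._ks_statistic

    def release_data(self) -> None:
        # Drop the reference so the sample can be garbage-collected.
        self._data_sample = None

    def to_dict(self) -> Dict[str, Any]:
        # Only metric values are serialized; uncomputed metrics appear as None.
        return {"aic": self.aic, "ks_statistic": self._ks_statistic}


r = LazyResultSketch(aic=12.5, data_sample=[0.1, 0.4, 0.45, 0.8])
before = r.to_dict()   # KS not accessed yet, so it serializes as None
_ = r.ks_statistic     # triggers the deferred computation
r.release_data()       # the sample is no longer needed
after = r.to_dict()    # carries the computed value, never the raw data
```

A caller who needs fully populated serialized output would access the deferred metrics before serializing and only then release the sample.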

Consequences

Positive:

  • Roughly 5x speedup for AIC/BIC-only workflows (~15 seconds down to ~3 seconds when fitting 90 distributions)

  • No API change: properties still accessible, just deferred

  • Users only pay for metrics they actually use

  • Parallel fitting remains embarrassingly parallel

Negative:

  • Memory overhead: lazy results hold a reference to the data sample until it is released via release_data()

  • First access to KS/AD incurs a one-time computation delay

  • Serialization of lazy results may include uncomputed metrics as None

Neutral:

  • Default is lazy_metrics=False for backwards compatibility

  • Lazy results explicitly document their deferred behavior

References