Vintage & Smoothing API Reference

Vintage Curve Fitting and Analysis

cranalytics.vintage

Vintage analysis public facade.

This module re-exports transforms, fitting, smoothing, and validation helpers from focused submodules to preserve backwards compatibility.

BaseSmoother

Base class for smoothing methods.

CurveFitter

Bases: BaseEstimator, RegressorMixin

Parametric curve fitting for vintage loss data.

Warning: Macro-Level Approximation Only

This estimator fits cumulative loss rates over time. Predicting loss iteratively extrapolates cumulative impact from the original starting cohort size. The model is blind to actual principal paydown, scheduled amortization, and constant prepayment rates (CPR). Use it for high-level scenario approximation, not as a substitute for true loan-level cashflow modeling.

forecast(months)

Alias for predict().

aggregate_by_dollar_weights(df, vintage_col='vintage_date', mob_col='months_on_book', segment_col='fico_band', loss_col='charge_off_amount', balance_col='outstanding_balance')

Aggregate segmented data using dollar-weighted averaging.

Aggregation is always performed by vintage and months-on-book. When segment_col is provided and present in the DataFrame, results are further segmented by that column.

create_vintage_triangle(df, vintage_col='vintage_date', mob_col='months_on_book', loss_col='cumulative_loss_rate', balance_col=None)

Create a vintage triangle from loan-level or aggregated data.

When balance_col is provided, loss rates are dollar-weighted (weighted average by balance). Otherwise a simple mean is used.
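The difference between the two averaging modes can be illustrated with a small stand-alone sketch. It does not call cranalytics; the helper names and the numbers are hypothetical, and it assumes "dollar-weighted" means sum(rate * balance) / sum(balance) within a single vintage and month-on-book cell.

```python
def simple_mean(rates):
    """Unweighted average of loss rates."""
    return sum(rates) / len(rates)


def dollar_weighted_mean(rates, balances):
    """Balance-weighted average: sum(rate * balance) / sum(balance)."""
    total_balance = sum(balances)
    if total_balance == 0:
        raise ValueError("total balance must be non-zero")
    return sum(r * b for r, b in zip(rates, balances)) / total_balance


# Hypothetical triangle cell: one vintage at one month-on-book,
# holding two rows with very different balances (illustrative numbers).
rates = [0.02, 0.10]
balances = [9000.0, 1000.0]

unweighted = simple_mean(rates)                   # treats both rows equally
weighted = dollar_weighted_mean(rates, balances)  # dominated by the large balance
```

With these inputs the weighted figure sits much closer to the rate of the larger balance, which is the behavior to expect when balance_col is supplied.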

cross_validate_smoother(smoother, mob, values, weights=None, n_folds=5, strict=True, cv_method='rolling_origin')

Cross-validate a smoother and return (mean_mse, std_mse).

Parameters:

smoother (Any, required): A fitted smoother object with fit and forecast methods.

mob (ndarray, required): Months-on-book array (time index). Must be monotonically non-decreasing for rolling_origin to be meaningful.

values (ndarray, required): Observed curve values.

weights (ndarray | None, default None): Optional per-observation weights.

n_folds (int, default 5): Number of CV folds.

strict (bool, default True): If True, raise on fold failure; if False, skip failed folds.

cv_method (str, default 'rolling_origin'): Fold strategy:

  • "rolling_origin" (default) -- walk-forward / expanding-window folds. Train always ends before test begins; no temporal leakage. Recommended for time-indexed vintage curves.
  • "kfold" -- standard k-fold with non-adjacent test blocks. Allows future data into training; retained for backward compatibility and non-time-ordered inputs.
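As a rough illustration of the walk-forward idea, the sketch below generates expanding-window index splits in plain Python. The function name rolling_origin_folds and the block-sizing rule are assumptions made for this example; the exact fold boundaries used by cross_validate_smoother are not documented here and may differ.

```python
def rolling_origin_folds(n_obs, n_folds):
    """Yield (train_idx, test_idx) pairs for expanding-window CV.

    Observations are assumed to be sorted by time. The series is cut
    into n_folds + 1 contiguous blocks; fold k trains on the first k
    blocks and tests on the block that follows them.
    """
    if n_folds < 1 or n_obs < n_folds + 1:
        raise ValueError("need at least n_folds + 1 observations")
    block = n_obs // (n_folds + 1)
    for k in range(1, n_folds + 1):
        train_end = k * block
        # The last fold absorbs any remainder observations.
        test_end = n_obs if k == n_folds else (k + 1) * block
        yield list(range(train_end)), list(range(train_end, test_end))


# 12 monthly observations split into 3 walk-forward folds.
folds = list(rolling_origin_folds(n_obs=12, n_folds=3))
for train_idx, test_idx in folds:
    # Every training index precedes every test index: no temporal leakage.
    assert max(train_idx) < min(test_idx)
```

A standard k-fold split would instead let some training indices fall after the test block, which is the leakage the "kfold" option warns about.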

detect_incomplete_vintages(triangle, min_maturity_months=24)

Detect vintages that have dropped off (e.g., after charge-off).

normalize_vintage_data(df, vintage_col='vintage_date', mob_col='months_on_book', loss_col='cumulative_loss_rate', segment_col=None)

Normalize raw data into a list of standardized DataFrames for curve fitting. Each DataFrame corresponds to a unique vintage-segment combination and contains the columns 'mob' and 'cumulative_loss_rate'.

project_incomplete_vintage_tails(df, incomplete_vintages, smoother, max_mob, vintage_col='vintage_date', mob_col='months_on_book', loss_col='cumulative_loss_rate')

Return projected tail rows for incomplete vintages.

For each vintage in incomplete_vintages, fits smoother to the observed mob/loss pairs and forecasts forward to max_mob. Projected values are clipped to be non-decreasing and bounded by 1.0 (cumulative loss semantics). If a vintage has fewer than two observations, a constant tail (last observed value held flat) is used instead.
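The clipping and fallback rules can be sketched without the library. The helpers below are hypothetical and only illustrate the stated semantics; in particular, anchoring the running maximum at the last observed value is an assumption about how "non-decreasing" is enforced.

```python
def clip_cumulative_tail(last_observed, forecast):
    """Force a forecast tail to be non-decreasing and capped at 1.0."""
    clipped = []
    running_max = last_observed  # assumption: tail never drops below the last observation
    for value in forecast:
        running_max = max(running_max, value)
        clipped.append(min(running_max, 1.0))
    return clipped


def constant_tail(last_observed, n_steps):
    """Fallback for vintages with fewer than two observations."""
    return [last_observed] * n_steps


# Illustrative raw forecast with a dip and an overshoot above 1.0.
raw_forecast = [0.05, 0.045, 0.07, 1.2]
tail = clip_cumulative_tail(last_observed=0.04, forecast=raw_forecast)
flat = constant_tail(last_observed=0.04, n_steps=3)
```

Here the dip at the second step is lifted back to the running maximum and the final overshoot is capped, so the result still reads as a valid cumulative loss curve.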

Parameters

df: Long-format vintage data (same format passed to run_vintage_analysis_session).

incomplete_vintages: Vintage names as they appear in df[vintage_col].

smoother: A cranalytics.vintage.BaseSmoother instance or a method name string accepted by cranalytics.vintage.create_smoother (e.g. "moving_average", "spline").

max_mob: The final month-on-book to project to.

vintage_col, mob_col, loss_col: Column names matching those used in df.

Returns

pd.DataFrame: Long-format DataFrame with synthetic rows for each (vintage, mob) pair beyond the observed range. Includes an is_projected bool column marking all returned rows.

run_vintage_analysis_session(df, *, vintage_col='vintage_date', mob_col='months_on_book', loss_col='cumulative_loss_rate', balance_col=None, segment_col=None, vintage_name=None, smoothers=None, min_maturity_months=None, include_cv=True, strict=False, extrapolate_tails=False, tail_max_mob=None)

Run one representative vintage analysis workflow from raw data.

Parameters

extrapolate_tails: When True and incomplete vintages are present, project each immature vintage's tail forward using the best-ranked smoother. Results are returned on tail_projections as a long-format DataFrame with an is_projected bool column.

tail_max_mob: Project tails to this month-on-book. Defaults to the maximum MOB observed in the selected (complete) vintage.

smooth_curve(mob, values, method, weights=None, **kwargs)

Functional wrapper to smooth a curve by name or smoother instance.