Predictive Modeling API Reference

Model Development

cranalytics.model_development

Feature engineering, WoE binning, and lift/gain utilities.

Workflow 4: Credit Risk Feature Analytics

engineer_loan_features(df, *, as_of_date=None, reference_date=None)

Add derived credit-risk features to a loan DataFrame.

Source columns that are absent are silently skipped. Original columns are always preserved.

Parameters

df : pd.DataFrame
    Loan-level DataFrame.
as_of_date : pd.Timestamp
    Snapshot or model development date used to compute loan_age_months. Must be provided explicitly; never default to max(start_date).
reference_date : pd.Timestamp, optional
    Deprecated alias for as_of_date. Will be removed in v2.0.

Returns

pd.DataFrame
    Copy of df with derived columns appended.
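The requirement to pass as_of_date explicitly can be illustrated with a small sketch. The helper name and the month-counting rule (whole months, adjusted for day of month) are assumptions made for illustration; the library's exact definition of loan_age_months is not documented here.

```python
from datetime import date


def loan_age_months(start_date, as_of_date):
    """Whole months elapsed from start_date to as_of_date (assumed definition)."""
    if as_of_date is None:
        raise ValueError("as_of_date must be provided explicitly")
    months = (as_of_date.year - start_date.year) * 12 + (as_of_date.month - start_date.month)
    # A month only counts once the start day-of-month has been reached.
    if as_of_date.day < start_date.day:
        months -= 1
    return max(months, 0)


start_dates = [date(2023, 1, 15), date(2023, 4, 1)]

# Ages measured against an explicit snapshot date.
explicit = [loan_age_months(s, date(2023, 6, 30)) for s in start_dates]

# Ages measured against max(start_date): the newest loan always gets age 0,
# and every other age shifts whenever a newer loan enters the data.
implicit = [loan_age_months(s, max(start_dates)) for s in start_dates]
```

With an explicit snapshot the ages depend only on each loan and the chosen date; with max(start_date) they depend on which other loans happen to be in the frame.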

fit_woe_binning(df, feature_cols, target_col, *, special_codes=None, **binning_kwargs)

Fit an optimal WoE binning process on the provided features.

Wraps optbinning.BinningProcess. Requires optbinning.

Parameters

df : pd.DataFrame
feature_cols : list[str]
target_col : str
    Binary target column (0/1).
special_codes : list, optional
    Sentinel values to group into a special bin.
**binning_kwargs
    Passed to BinningProcess.

Returns

optbinning.BinningProcess
    Fitted process. Call .transform(X, metric="woe") to encode.
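For readers unfamiliar with WoE encoding, the following is a minimal sketch of the weight-of-evidence value for bins that are already fixed. It does not reproduce optbinning's optimal binning search. The function name is hypothetical, and the sign convention (non-event share over event share) is an assumption that should be checked against the optbinning documentation.

```python
import math


def woe_by_bin(bins, targets):
    """Return {bin_label: WoE}, with WoE = ln(non-event share / event share)."""
    events, non_events = {}, {}
    for label, y in zip(bins, targets):
        counter = events if y == 1 else non_events
        counter[label] = counter.get(label, 0) + 1

    total_events = sum(events.values())
    total_non_events = sum(non_events.values())

    woe = {}
    for label in sorted(set(bins)):
        e = events.get(label, 0)
        ne = non_events.get(label, 0)
        if e == 0 or ne == 0:
            # WoE is undefined for a pure bin; production code typically
            # smooths the counts or merges the bin with a neighbour.
            woe[label] = float("nan")
        else:
            woe[label] = math.log((ne / total_non_events) / (e / total_events))
    return woe


bins = ["low"] * 4 + ["high"] * 4
targets = [0, 0, 0, 1, 0, 1, 1, 1]
woe = woe_by_bin(bins, targets)
```

Under this convention a positive value marks a bin with proportionally fewer events than the population, and a negative value marks a riskier bin.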

lift_gain_table(y_true, y_prob, *, n_bins=10)

Compute lift and gain table from binary labels and predicted probabilities.

Sorts observations by descending score, splits into n_bins equal-sized buckets, and returns per-bin statistics.

Parameters

y_true : array-like
    Binary labels (0/1).
y_prob : array-like
    Predicted probabilities in [0, 1].
n_bins : int
    Number of equal-size buckets (default 10).

Returns

pd.DataFrame
    Columns: bin, n, n_events, event_rate, cumulative_n, cumulative_events, cumulative_gain, lift, score_min, score_max, baseline_rate.
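The sort-and-bucket procedure described above can be sketched in plain Python, without pandas, so that the arithmetic behind each column is visible. This is an illustration under stated assumptions, not the library's implementation: lift is taken as the per-bin event rate divided by the baseline rate, cumulative_gain as the share of all events captured so far, and the function name is hypothetical.

```python
def lift_gain_rows(y_true, y_prob, n_bins=10):
    """Per-bin lift/gain statistics as a list of dicts (illustrative only)."""
    pairs = sorted(zip(y_prob, y_true), key=lambda p: p[0], reverse=True)
    n_total = len(pairs)
    total_events = sum(y for _, y in pairs)
    if n_total == 0 or total_events == 0:
        raise ValueError("need at least one observation and one event")
    baseline_rate = total_events / n_total

    rows, cum_n, cum_events = [], 0, 0
    for i in range(n_bins):
        # Integer bucket edges keep bucket sizes within one row of each other.
        bucket = pairs[i * n_total // n_bins:(i + 1) * n_total // n_bins]
        if not bucket:
            continue
        n = len(bucket)
        n_events = sum(y for _, y in bucket)
        cum_n += n
        cum_events += n_events
        rows.append({
            "bin": i + 1,
            "n": n,
            "n_events": n_events,
            "event_rate": n_events / n,
            "cumulative_n": cum_n,
            "cumulative_events": cum_events,
            "cumulative_gain": cum_events / total_events,
            "lift": (n_events / n) / baseline_rate,  # assumed: per-bin, not cumulative
            "score_min": bucket[-1][0],
            "score_max": bucket[0][0],
            "baseline_rate": baseline_rate,
        })
    return rows


table = lift_gain_rows(
    y_true=[1, 1, 0, 1, 0, 0, 0, 0],
    y_prob=[0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2],
    n_bins=4,
)
```

In this toy input the top bucket holds two of the three events, so its event rate is well above the 37.5% baseline and the cumulative gain reaches 100% after the second bucket.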

Predictive Targets

cranalytics.predictive_targets

Target construction from monthly performance panels.

Workflow 5a: Target Construction

assemble_modeling_frame(features_df, targets_df, *, on, leakage_guard=True)

Join features and targets with optional leakage detection.

Parameters

features_df : pd.DataFrame
targets_df : pd.DataFrame
on : str or list[str]
    Join key(s), typically "loan_id" or ["loan_id", "as_of_date"].
leakage_guard : bool
    If True, raises if any non-key target column appears in features_df.

Returns

pd.DataFrame
    Inner join of features and targets.
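The leakage check amounts to a set intersection between the non-key target columns and the feature columns. The sketch below shows that check and the inner join on lists of dict rows rather than DataFrames. The function name, the ValueError exception type, and the assumption that join keys are unique in the targets are all illustrative choices, not documented behaviour.

```python
def assemble_rows(features, targets, *, on, leakage_guard=True):
    """Inner-join two lists of dict rows on the given key(s)."""
    keys = [on] if isinstance(on, str) else list(on)

    if leakage_guard and features and targets:
        # Any target column (other than a join key) that is also a feature
        # column would leak the label into the model inputs.
        leaked = sorted((set(targets[0]) - set(keys)) & set(features[0]))
        if leaked:
            raise ValueError(f"target columns present in features: {leaked}")

    # Assumes each key combination appears at most once in targets.
    lookup = {tuple(t[k] for k in keys): t for t in targets}
    joined = []
    for f in features:
        match = lookup.get(tuple(f[k] for k in keys))
        if match is not None:  # inner join: drop feature rows without a target
            joined.append({**f, **match})
    return joined


features = [{"loan_id": 1, "fico": 700}, {"loan_id": 2, "fico": 650}]
targets = [{"loan_id": 1, "default_flag": 0}]
frame = assemble_rows(features, targets, on="loan_id")
```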

build_targets(panel_df, *, mode, targets, target_overrides=None)

Derive or validate binary/regression targets from a monthly performance panel.

Parameters

panel_df : pd.DataFrame
    Monthly performance panel. Required columns (panel mode): loan_id, mob, as_of_date, origination_date, original_principal, dpd, chargeoff_amount.
mode : str
    "panel": derive targets from dpd/chargeoff columns.
    "prelabeled": validate and pass through pre-built target columns.
targets : list[str]
    Target column names to derive or validate.
target_overrides : dict, optional
    Per-target config overrides, e.g. {"fpf30_flag": {"dpd_threshold": 60, "mob_horizon": 6}}.

Returns

pd.DataFrame
    Loan-level DataFrame with loan_id + one column per target. NaN indicates immature (required observation horizon not yet reached).
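The idea of a horizon-based target with an immature state can be sketched as follows. The flag defined here is hypothetical and is not the library's definition of any named target such as fpf30_flag. It assumes a loan is flagged 1 as soon as dpd reaches the threshold within the horizon, 0 once the horizon has been observed without a breach, and left missing (None here, NaN in a DataFrame) otherwise.

```python
def ever_delinquent_flag(panel_rows, *, dpd_threshold=30, mob_horizon=3):
    """Return {loan_id: 1, 0, or None} from monthly panel rows (hypothetical target)."""
    by_loan = {}
    for row in panel_rows:
        by_loan.setdefault(row["loan_id"], []).append(row)

    flags = {}
    for loan_id, rows in by_loan.items():
        in_window = [r for r in rows if r["mob"] <= mob_horizon]
        last_observed_mob = max(r["mob"] for r in rows)
        if any(r["dpd"] >= dpd_threshold for r in in_window):
            flags[loan_id] = 1
        elif last_observed_mob >= mob_horizon:
            flags[loan_id] = 0
        else:
            flags[loan_id] = None  # immature: horizon not yet reached
    return flags


panel = [
    {"loan_id": "A", "mob": 1, "dpd": 0},
    {"loan_id": "A", "mob": 2, "dpd": 0},
    {"loan_id": "A", "mob": 3, "dpd": 0},
    {"loan_id": "B", "mob": 1, "dpd": 0},
    {"loan_id": "B", "mob": 2, "dpd": 30},
    {"loan_id": "C", "mob": 1, "dpd": 0},
    {"loan_id": "C", "mob": 2, "dpd": 0},
]
flags = ever_delinquent_flag(panel, dpd_threshold=30, mob_horizon=3)
```

Keeping immature loans as missing rather than 0 prevents young loans from being counted as good outcomes before they have had the chance to go bad.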

Predictive Modeling

cranalytics.predictive_modeling

Model training, scoring, and evaluation utilities.

Workflow 5b: ML Modeling

score_model(df, estimator, feature_cols, *, output_col, prediction_type)

Score a fitted estimator against a DataFrame.

Parameters

df : pd.DataFrame
    The DataFrame to score. Must be the first argument for .pipe() compatibility.
estimator
    Any fitted sklearn-compatible estimator.
feature_cols : list[str]
output_col : str
    Name for the appended prediction column.
prediction_type : str
    "probability" → predict_proba; positive class probability.
    "class" → predict; class label.
    "value" → predict; continuous value (regression).

Returns

pd.DataFrame
    df copy with output_col appended.
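The prediction_type mapping can be read as a small dispatch over the estimator's methods. The sketch below uses a stub estimator so that it runs without scikit-learn. The helper name and the error raised for an unknown type are assumptions, and taking column 1 of predict_proba as the positive class assumes a binary classifier whose classes are ordered [0, 1].

```python
class StubClassifier:
    """Stand-in for a fitted sklearn-compatible binary classifier."""

    def predict_proba(self, X):
        # One [P(class 0), P(class 1)] pair per input row.
        return [[0.8, 0.2] for _ in X]

    def predict(self, X):
        return [0 for _ in X]


def predict_by_type(estimator, X, prediction_type):
    """Dispatch to predict_proba or predict based on prediction_type."""
    if prediction_type == "probability":
        # Assumed: column 1 holds the positive-class probability.
        return [row[1] for row in estimator.predict_proba(X)]
    if prediction_type in ("class", "value"):
        return list(estimator.predict(X))
    raise ValueError(f"unknown prediction_type: {prediction_type!r}")


X = [[700, 0.35], [640, 0.52]]
probabilities = predict_by_type(StubClassifier(), X, "probability")
labels = predict_by_type(StubClassifier(), X, "class")
```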

train_binary_model(df, feature_cols, target_col, *, model_family, model_params=None, calibrate=False, random_state=42, strict=False)

Train a binary classification model.

Parameters

df : pd.DataFrame
feature_cols : list[str]
target_col : str
    Binary target (0/1). NaN rows are dropped before training.
model_family : str
    One of "logistic", "hgb_classifier", "xgboost_classifier".
model_params : dict, optional
    Merged over defaults.
calibrate : bool
    If True, applies isotonic calibration post-fit.
random_state : int

Returns

tuple[estimator, dict, pd.DataFrame]
    (fitted estimator, metadata dict, diagnostics DataFrame)

train_regression_model(df, feature_cols, target_col, *, model_family, model_params=None, random_state=42, strict=False)

Train a regression model.

Parameters

model_family : str
    One of "tweedie", "hgb_regressor", "quantile_hgb", "xgboost_regressor".

Returns

tuple[estimator, dict, pd.DataFrame]

Predictive Backtest

cranalytics.predictive_backtest

Temporal out-of-time backtesting for predictive models.

Workflow 5c: Temporal Backtesting

run_predictive_backtest(df, feature_cols, target_col, split_col, *, model_family, window_type='expanding', n_splits=5, rolling_window_size=None, model_params=None, random_state=42, strict=False)

Rolling or expanding temporal OOT backtest.

Splits df by unique values of split_col (e.g., origination_month) in ascending order. Data is never shuffled; all splits are strictly temporal.

Parameters

df : pd.DataFrame
feature_cols : list[str]
target_col : str
    Binary target (0/1/NaN). NaN rows excluded from training and scoring.
split_col : str
    Column defining temporal ordering (e.g., origination_month, vintage).
model_family : str
    Passed to _build_binary_estimator.
window_type : str
    "expanding": all prior periods in train.
    "rolling": fixed-width window of rolling_window_size prior periods.
n_splits : int
    Number of OOT folds. Requires len(unique_periods) >= n_splits + 1.
rolling_window_size : int, optional
    Number of prior periods in each rolling train window. Only used when window_type="rolling". Defaults to n_splits when not provided.
model_params : dict, optional
random_state : int

Returns

pd.DataFrame
    One row per fold: split, train_periods, val_periods, n_train, n_val, auc, gini, ks.
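The difference between the two window types is easiest to see by listing the train and validation periods for each fold. The sketch below is one plausible reading of the parameters, not the library's code: it assumes each fold validates on a single period and that the folds are taken from the last n_splits periods. The function name is hypothetical.

```python
def temporal_splits(periods, *, n_splits=5, window_type="expanding", rolling_window_size=None):
    """Return a list of (train_periods, val_periods) tuples in time order."""
    unique_periods = sorted(set(periods))
    if len(unique_periods) < n_splits + 1:
        raise ValueError("need at least n_splits + 1 unique periods")
    if window_type not in ("expanding", "rolling"):
        raise ValueError(f"unknown window_type: {window_type!r}")

    window = rolling_window_size if rolling_window_size is not None else n_splits
    if window < 1:
        raise ValueError("rolling_window_size must be at least 1")

    splits = []
    first_val = len(unique_periods) - n_splits
    for i in range(first_val, len(unique_periods)):
        prior = unique_periods[:i]  # strictly earlier periods only, never shuffled
        train = prior if window_type == "expanding" else prior[-window:]
        splits.append((train, [unique_periods[i]]))
    return splits


months = ["2023-01", "2023-02", "2023-03", "2023-04", "2023-05", "2023-06"]
expanding = temporal_splits(months, n_splits=3)
rolling = temporal_splits(months, n_splits=3, window_type="rolling", rolling_window_size=2)
```

In the expanding case the training set grows by one period per fold, while in the rolling case it keeps a fixed width and drops the oldest period as it moves forward.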

summarize_predictive_backtest(backtest_df, *, by=None)

Aggregate backtest metrics across folds.

Parameters

backtest_df : pd.DataFrame
    Output of run_predictive_backtest.
by : str or list[str], optional
    Column(s) to group by (e.g., "model"). If None, aggregates all rows.

Returns

pd.DataFrame
    One row per group with mean_auc, mean_gini, mean_ks, n_folds.
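The aggregation is a grouped mean over fold rows. A minimal sketch on a list of dicts follows; the function name and the "group" output key are hypothetical, and for brevity only a single string key is supported for by.

```python
from statistics import mean


def summarize_folds(fold_rows, by=None):
    """Average auc/gini/ks across folds, optionally grouped by one column."""
    groups = {}
    for row in fold_rows:
        key = row[by] if by is not None else "all"
        groups.setdefault(key, []).append(row)

    summary = []
    for key, rows in groups.items():
        summary.append({
            "group": key,
            "mean_auc": mean(r["auc"] for r in rows),
            "mean_gini": mean(r["gini"] for r in rows),
            "mean_ks": mean(r["ks"] for r in rows),
            "n_folds": len(rows),
        })
    return summary


folds = [
    {"model": "logistic", "auc": 0.70, "gini": 0.40, "ks": 0.30},
    {"model": "logistic", "auc": 0.80, "gini": 0.60, "ks": 0.40},
    {"model": "hgb_classifier", "auc": 0.75, "gini": 0.50, "ks": 0.35},
]
per_model = summarize_folds(folds, by="model")
overall = summarize_folds(folds)
```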

Forecasting Bridge

cranalytics.forecasting_bridge

Calendar charge-off bridge: loan-level predictions → monthly aggregate forecasts.

Workflow 5d: Calendar Charge-Off Bridge

forecast_calendar_chargeoff_from_predictions(scored_df, *, score_col, as_of_col, current_mob_col, hazard_curves, principal_col='original_principal', segment_col=None, strict=True)

Convert loan-level PD predictions to calendar-month charge-off forecasts.

Algorithm
  1. Per loan: expected_co_dollars = score * original_principal
  2. Look up segment hazard curve (fall back to global profile if missing)
  3. Normalize future chargeoff_hazard_rate from current_mob forward → weights
  4. Allocate expected_co_dollars across future MOBs proportionally
  5. Map future MOB → calendar month via as_of_date + offset
  6. Aggregate by forecast_month

Parameters

scored_df : pd.DataFrame
    Loan-level. Required: score_col, as_of_col, current_mob_col, principal_col. Optional: segment_col.
score_col : str
    PD probability column (values in [0, 1]).
as_of_col : str
    Snapshot date for each loan.
current_mob_col : str
    Current month-on-book for each loan.
hazard_curves : pd.DataFrame
    Output of fit_flow_hazard_curves(). Required columns: segment_id, month_on_book, chargeoff_hazard_rate.
principal_col : str
    Original principal column (default "original_principal").
segment_col : str, optional
    Column for segment lookup in hazard_curves.
strict : bool
    If True, raise on missing segment. If False, fall back to global profile.

Returns

pd.DataFrame
    Columns: forecast_month, expected_chargeoff_amount, loan_count, expected_chargeoff_rate.
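The allocation steps listed under Algorithm can be sketched for a single global hazard curve, skipping the segment lookup in step 2. Several details are assumptions rather than documented behaviour: "from current_mob forward" is read as months strictly after current_mob, forecast months are represented by their first day, and the helper names are hypothetical.

```python
from datetime import date


def add_months(d, months):
    """First day of the month that falls `months` after d's month."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def forecast_chargeoffs(loans, hazard_by_mob):
    """Spread each loan's expected charge-off dollars over future calendar months.

    loans: dicts with score, original_principal, current_mob, as_of_date.
    hazard_by_mob: {month_on_book: chargeoff_hazard_rate} for one global curve.
    """
    forecast = {}
    for loan in loans:
        expected = loan["score"] * loan["original_principal"]  # step 1
        # Step 3: keep only the remaining part of the curve and normalize it.
        future = {m: h for m, h in hazard_by_mob.items() if m > loan["current_mob"]}
        total = sum(future.values())
        if total <= 0:
            continue  # no remaining hazard to allocate against
        for mob, hazard in future.items():
            month = add_months(loan["as_of_date"], mob - loan["current_mob"])  # step 5
            # Steps 4 and 6: proportional allocation, summed by calendar month.
            forecast[month] = forecast.get(month, 0.0) + expected * hazard / total
    return dict(sorted(forecast.items()))


curve = {1: 0.01, 2: 0.02, 3: 0.03, 4: 0.01}
loans = [{"score": 0.1, "original_principal": 1000.0,
          "current_mob": 2, "as_of_date": date(2024, 1, 31)}]
monthly = forecast_chargeoffs(loans, curve)
```

Because the weights are normalized over the remaining curve, the monthly amounts for each loan sum to its full expected charge-off dollars regardless of how far along the curve the loan already is.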