Predictive Modeling API Reference
Model Development
cranalytics.model_development
Feature engineering, WoE binning, and lift/gain utilities.
Workflow 4: Credit Risk Feature Analytics
engineer_loan_features(df, *, as_of_date=None, reference_date=None)
Add derived credit-risk features to a loan DataFrame.
Source columns that are absent are silently skipped. Original columns are always preserved.
Parameters
df : pd.DataFrame
Loan-level DataFrame.
as_of_date : pd.Timestamp
Snapshot or model development date used to compute loan_age_months.
Must be provided explicitly; the function never falls back to max(start_date).
reference_date : pd.Timestamp, optional
Deprecated alias for as_of_date. Will be removed in v2.0.
Returns
pd.DataFrame
Copy of df with derived columns appended.
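As an illustration of why as_of_date is explicit, the following is a minimal sketch of a loan-age calculation. It is not the library's implementation: the helper name add_loan_age_months, the start_date source column, and the whole-calendar-month convention are all assumptions made for this example.

```python
import pandas as pd

def add_loan_age_months(df: pd.DataFrame, as_of_date: pd.Timestamp) -> pd.DataFrame:
    """Hypothetical helper: append loan_age_months to a copy of df."""
    out = df.copy()  # original columns and the caller's frame stay untouched
    if "start_date" in out.columns:  # an absent source column is skipped
        start = pd.to_datetime(out["start_date"])
        # Assumed convention: difference in calendar months, ignoring the day.
        out["loan_age_months"] = (
            (as_of_date.year - start.dt.year) * 12
            + (as_of_date.month - start.dt.month)
        )
    return out

loans = pd.DataFrame({"loan_id": [1, 2], "start_date": ["2023-01-15", "2023-10-01"]})
aged = add_loan_age_months(loans, pd.Timestamp("2024-01-31"))
```

Deriving the snapshot date from max(start_date) would make every loan's age depend on which loans happen to be in the frame, which is why the date is passed in.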
fit_woe_binning(df, feature_cols, target_col, *, special_codes=None, **binning_kwargs)
Fit an optimal WoE binning process on the provided features.
Wraps optbinning.BinningProcess. Requires optbinning.
Parameters
df : pd.DataFrame
feature_cols : list[str]
target_col : str
Binary target column (0/1).
special_codes : list, optional
Sentinel values to group into a special bin.
**binning_kwargs
Passed to BinningProcess.
Returns
optbinning.BinningProcess
Fitted process. Call .transform(X, metric="woe") to encode.
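To make the WoE encoding concrete, here is a hand-rolled sketch of weight of evidence for a feature that is already binned. It does not call optbinning and does not perform optimal binning. The helper name woe_by_bin, the sample columns, and the sign convention (log of non-event share over event share) are assumptions for illustration; check the optbinning documentation for the convention it uses.

```python
import numpy as np
import pandas as pd

def woe_by_bin(bins: pd.Series, target: pd.Series) -> pd.Series:
    """Hypothetical helper: WoE per bin as ln(non-event share / event share)."""
    counts = pd.crosstab(bins, target)  # rows: bins, columns: target values 0 and 1
    non_event_share = counts[0] / counts[0].sum()
    event_share = counts[1] / counts[1].sum()
    return np.log(non_event_share / event_share)

df = pd.DataFrame({
    "fico_band": ["low", "low", "low", "low", "high", "high", "high", "high"],
    "default_flag": [1, 1, 1, 0, 0, 0, 0, 1],
})
woe = woe_by_bin(df["fico_band"], df["default_flag"])
```

Under this convention a bin with relatively few events (here "high") gets a positive WoE and a bin with relatively many events gets a negative one. A bin with zero events or zero non-events would produce an infinite value, which this sketch does not handle.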
lift_gain_table(y_true, y_prob, *, n_bins=10)
Compute lift and gain table from binary labels and predicted probabilities.
Sorts observations by descending score, splits into n_bins equal-sized
buckets, and returns per-bin statistics.
Parameters
y_true : array-like
Binary labels (0/1).
y_prob : array-like
Predicted probabilities in [0, 1].
n_bins : int
Number of equal-size buckets (default 10).
Returns
pd.DataFrame
Columns: bin, n, n_events, event_rate, cumulative_n, cumulative_events,
cumulative_gain, lift, score_min, score_max, baseline_rate.
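The sort-and-bucket procedure described above can be sketched as follows. This is a simplified illustration that produces only a subset of the documented columns. The function name lift_gain_sketch is hypothetical, and treating lift as the per-bin event rate divided by the overall baseline rate is an assumption; the library may define it cumulatively.

```python
import numpy as np
import pandas as pd

def lift_gain_sketch(y_true, y_prob, n_bins=10):
    """Hypothetical helper: per-bin lift and cumulative gain."""
    df = pd.DataFrame({"y": np.asarray(y_true), "p": np.asarray(y_prob)})
    df = df.sort_values("p", ascending=False, kind="mergesort").reset_index(drop=True)
    # Rank-based buckets of near-equal size; bin 1 holds the highest scores.
    df["bin"] = np.arange(len(df)) * n_bins // len(df) + 1
    baseline_rate = df["y"].mean()
    out = df.groupby("bin").agg(
        n=("y", "size"),
        n_events=("y", "sum"),
        score_min=("p", "min"),
        score_max=("p", "max"),
    ).reset_index()
    out["event_rate"] = out["n_events"] / out["n"]
    out["cumulative_gain"] = out["n_events"].cumsum() / out["n_events"].sum()
    out["lift"] = out["event_rate"] / baseline_rate  # assumed per-bin definition
    return out

table = lift_gain_sketch(
    [1, 1, 0, 1, 0, 0, 0, 0],
    [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1],
    n_bins=4,
)
```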
Predictive Targets
cranalytics.predictive_targets
Target construction from monthly performance panels.
Workflow 5a: Target Construction
assemble_modeling_frame(features_df, targets_df, *, on, leakage_guard=True)
Join features and targets with optional leakage detection.
Parameters
features_df : pd.DataFrame
targets_df : pd.DataFrame
on : str or list[str]
Join key(s), typically "loan_id" or ["loan_id", "as_of_date"].
leakage_guard : bool
If True, raises if any non-key target column appears in features_df.
Returns
pd.DataFrame
Inner join of features and targets.
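A minimal sketch of the leakage check and inner join, assuming plain pandas. The function name assemble_sketch is hypothetical, and raising ValueError is an assumption; the reference above does not say which exception type the library uses.

```python
import pandas as pd

def assemble_sketch(features_df, targets_df, *, on, leakage_guard=True):
    """Hypothetical helper: guard against target leakage, then inner-join."""
    keys = [on] if isinstance(on, str) else list(on)
    if leakage_guard:
        target_cols = set(targets_df.columns) - set(keys)
        leaked = sorted(target_cols & set(features_df.columns))
        if leaked:
            # Assumed exception type for illustration.
            raise ValueError(f"Target columns present in features: {leaked}")
    return features_df.merge(targets_df, on=keys, how="inner")

features = pd.DataFrame({"loan_id": [1, 2, 3], "fico": [700, 640, 720]})
targets = pd.DataFrame({"loan_id": [2, 3, 4], "default_flag": [1, 0, 0]})
frame = assemble_sketch(features, targets, on="loan_id")
```

Because the join is inner, loans present in only one of the two frames (1 and 4 here) drop out of the result.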
build_targets(panel_df, *, mode, targets, target_overrides=None)
Derive or validate binary/regression targets from a monthly performance panel.
Parameters
panel_df : pd.DataFrame
Monthly performance panel. Required columns (panel mode):
loan_id, mob, as_of_date, origination_date, original_principal,
dpd, chargeoff_amount.
mode : str
"panel" — derive targets from dpd/chargeoff columns.
"prelabeled" — validate and pass through pre-built target columns.
targets : list[str]
Target column names to derive or validate.
target_overrides : dict, optional
Per-target config overrides, e.g.
{"fpf30_flag": {"dpd_threshold": 60, "mob_horizon": 6}}.
Returns
pd.DataFrame
Loan-level DataFrame with loan_id + one column per target.
NaN indicates immature (required observation horizon not yet reached).
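The NaN-for-immature convention can be illustrated with a simplified early-delinquency target. The definition below is an assumption made for this sketch, not the library's definition of any named target: the flag is 1 if dpd reaches dpd_threshold at any mob up to mob_horizon, 0 if the loan was observed through the horizon without doing so, and NaN if the loan has not yet reached the horizon. The function and column name early_dq_flag are hypothetical.

```python
import pandas as pd

def early_delinquency_flag(panel_df, *, dpd_threshold=30, mob_horizon=3):
    """Hypothetical helper: loan-level flag with NaN for immature loans."""
    max_mob = panel_df.groupby("loan_id")["mob"].max()
    window = panel_df[panel_df["mob"] <= mob_horizon]
    hit = (window.groupby("loan_id")["dpd"].max() >= dpd_threshold).reindex(
        max_mob.index, fill_value=False
    )
    # Keep the flag only where the full horizon has been observed.
    flag = hit.astype(float).where(max_mob >= mob_horizon)
    return flag.rename("early_dq_flag").reset_index()

panel = pd.DataFrame({
    "loan_id": ["A", "A", "A", "B", "B", "B", "C", "C"],
    "mob":     [1, 2, 3, 1, 2, 3, 1, 2],
    "dpd":     [0, 0, 45, 0, 0, 0, 0, 0],
})
result = early_delinquency_flag(panel, dpd_threshold=30, mob_horizon=3)
```

Loan C has only two months on book, so its flag is NaN rather than 0; treating it as a non-event would bias the event rate downward.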
Predictive Modeling
cranalytics.predictive_modeling
Model training, scoring, and evaluation utilities.
Workflow 5b: ML Modeling
score_model(df, estimator, feature_cols, *, output_col, prediction_type)
Score a fitted estimator against a DataFrame.
Parameters
df : pd.DataFrame
The DataFrame to score. Must be the first argument for .pipe() compatibility.
estimator
Any fitted sklearn-compatible estimator.
feature_cols : list[str]
output_col : str
Name for the appended prediction column.
prediction_type : str
"probability" — predict_proba; positive class probability.
"class" — predict; class label.
"value" — predict; continuous value (regression).
Returns
pd.DataFrame
df copy with output_col appended.
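The df-first signature is what makes .pipe() chaining work. The sketch below mirrors the documented dispatch on prediction_type with a stand-in function; score_sketch and ToyRiskModel are hypothetical names, and the toy estimator exists only so the example runs without a trained model.

```python
import numpy as np
import pandas as pd

class ToyRiskModel:
    """Hypothetical stand-in for a fitted sklearn-compatible classifier."""
    def predict_proba(self, X):
        p = np.clip(np.asarray(X)[:, 0] / 100.0, 0.0, 1.0)
        return np.column_stack([1.0 - p, p])  # columns: class 0, class 1

def score_sketch(df, estimator, feature_cols, *, output_col, prediction_type):
    """Hypothetical helper: append predictions to a copy of df."""
    out = df.copy()
    X = out[feature_cols]
    if prediction_type == "probability":
        out[output_col] = estimator.predict_proba(X)[:, 1]  # positive class
    elif prediction_type in ("class", "value"):
        out[output_col] = estimator.predict(X)
    else:
        raise ValueError(f"Unknown prediction_type: {prediction_type!r}")
    return out

loans = pd.DataFrame({"loan_id": [1, 2], "utilization_pct": [20.0, 80.0]})
scored = loans.pipe(
    score_sketch,
    ToyRiskModel(),
    ["utilization_pct"],
    output_col="pd_score",
    prediction_type="probability",
)
```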
train_binary_model(df, feature_cols, target_col, *, model_family, model_params=None, calibrate=False, random_state=42, strict=False)
Train a binary classification model.
Parameters
df : pd.DataFrame
feature_cols : list[str]
target_col : str
Binary target (0/1). NaN rows are dropped before training.
model_family : str
"logistic", "hgb_classifier", "xgboost_classifier".
model_params : dict, optional
Merged over defaults.
calibrate : bool
If True, applies isotonic calibration post-fit.
random_state : int
Returns
tuple[estimator, dict, pd.DataFrame]
(fitted estimator, metadata dict, diagnostics DataFrame)
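For orientation, the sketch below shows one way the "logistic" family with isotonic calibration could look in plain scikit-learn, including the NaN-target drop. It is an assumption-laden stand-in, not the library's training code: it uses CalibratedClassifierCV with cross-validation rather than a separate post-fit step, and it does not build the metadata dict or diagnostics DataFrame. The synthetic data and column names are invented for the example.

```python
import numpy as np
import pandas as pd
from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
df = pd.DataFrame({"x1": rng.normal(size=300), "x2": rng.normal(size=300)})
df["target"] = (df["x1"] + rng.normal(scale=0.5, size=300) > 0).astype(float)
df.loc[:9, "target"] = np.nan  # first 10 rows: immature, target unknown

# Rows with a NaN target are dropped before training.
train = df.dropna(subset=["target"])
X, y = train[["x1", "x2"]], train["target"].astype(int)

# Isotonic calibration wrapped around a logistic base model (assumed setup).
model = CalibratedClassifierCV(LogisticRegression(random_state=42), method="isotonic", cv=3)
model.fit(X, y)
probs = model.predict_proba(X)[:, 1]
```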
train_regression_model(df, feature_cols, target_col, *, model_family, model_params=None, random_state=42, strict=False)
Train a regression model.
Parameters
model_family : str
"tweedie", "hgb_regressor", "quantile_hgb",
"xgboost_regressor".
Returns
tuple[estimator, dict, pd.DataFrame]
Predictive Backtest
cranalytics.predictive_backtest
Temporal out-of-time backtesting for predictive models.
Workflow 5c: Temporal Backtesting
run_predictive_backtest(df, feature_cols, target_col, split_col, *, model_family, window_type='expanding', n_splits=5, rolling_window_size=None, model_params=None, random_state=42, strict=False)
Rolling or expanding temporal OOT backtest.
Splits df by unique values of split_col (e.g., origination_month)
in ascending order. No data shuffling — all splits are strictly temporal.
Parameters
df : pd.DataFrame
feature_cols : list[str]
target_col : str
Binary target (0/1/NaN). NaN rows excluded from training and scoring.
split_col : str
Column defining temporal ordering (e.g., origination_month, vintage).
model_family : str
Passed to _build_binary_estimator.
window_type : str
"expanding" — all prior periods in train.
"rolling" — fixed-width window of rolling_window_size prior periods.
n_splits : int
Number of OOT folds. Requires len(unique_periods) >= n_splits + 1.
rolling_window_size : int, optional
Number of prior periods in each rolling train window. Only used when
window_type="rolling". Defaults to n_splits when not provided.
model_params : dict, optional
random_state : int
Returns
pd.DataFrame
One row per fold: split, train_periods, val_periods, n_train, n_val, auc, gini, ks.
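The difference between the two window types is easiest to see on the period lists themselves. The sketch below generates train and validation periods only; it trains nothing. The helper name temporal_splits is hypothetical, and two details are assumptions: each fold validates on a single period, and the last n_splits periods serve as the OOT folds.

```python
def temporal_splits(periods, *, n_splits=5, window_type="expanding",
                    rolling_window_size=None):
    """Hypothetical helper: return (train_periods, val_period) pairs."""
    ordered = sorted(set(periods))  # ascending, no shuffling
    if len(ordered) < n_splits + 1:
        raise ValueError("Need at least n_splits + 1 unique periods")
    if rolling_window_size is None:
        rolling_window_size = n_splits  # documented default
    folds = []
    for i in range(len(ordered) - n_splits, len(ordered)):
        prior = ordered[:i]  # strictly earlier periods only
        if window_type == "rolling":
            prior = prior[-rolling_window_size:]
        folds.append((prior, ordered[i]))
    return folds

months = ["2023-01", "2023-02", "2023-03", "2023-04", "2023-05"]
expanding = temporal_splits(months, n_splits=3)
rolling = temporal_splits(months, n_splits=3, window_type="rolling",
                          rolling_window_size=2)
```

With these inputs the expanding train window grows from two to four months, while the rolling window always holds the two months immediately before the validation month.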
summarize_predictive_backtest(backtest_df, *, by=None)
Aggregate backtest metrics across folds.
Parameters
backtest_df : pd.DataFrame
Output of run_predictive_backtest.
by : str or list[str], optional
Column(s) to group by (e.g., "model"). If None, aggregates all rows.
Returns
pd.DataFrame
One row per group with mean_auc, mean_gini, mean_ks, n_folds.
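The grouped aggregation is roughly equivalent to the pandas sketch below. The metric values are made up, and the "model" column is assumed to have been added by the caller when stacking backtests for several model families, since it is not among the documented per-fold output columns.

```python
import pandas as pd

# Invented per-fold results for two model families.
backtest_df = pd.DataFrame({
    "model": ["logistic", "logistic", "hgb_classifier", "hgb_classifier"],
    "split": [1, 2, 1, 2],
    "auc":  [0.70, 0.74, 0.76, 0.80],
    "gini": [0.40, 0.48, 0.52, 0.60],
    "ks":   [0.30, 0.34, 0.36, 0.40],
})

summary = (
    backtest_df.groupby("model")
    .agg(
        mean_auc=("auc", "mean"),
        mean_gini=("gini", "mean"),
        mean_ks=("ks", "mean"),
        n_folds=("split", "size"),
    )
    .reset_index()
)
```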
Forecasting Bridge
cranalytics.forecasting_bridge
Calendar charge-off bridge: loan-level predictions → monthly aggregate forecasts.
Workflow 5d: Calendar Charge-Off Bridge
forecast_calendar_chargeoff_from_predictions(scored_df, *, score_col, as_of_col, current_mob_col, hazard_curves, principal_col='original_principal', segment_col=None, strict=True)
Convert loan-level PD predictions to calendar-month charge-off forecasts.
Algorithm
- Per loan: expected_co_dollars = score * original_principal
- Look up segment hazard curve (if the segment is missing: raise when strict=True, fall back to global profile when strict=False)
- Normalize future chargeoff_hazard_rate from current_mob forward → weights
- Allocate expected_co_dollars across future MOBs proportionally
- Map future MOB → calendar month via as_of_date + offset
- Aggregate by forecast_month
Parameters
scored_df : pd.DataFrame
Loan-level. Required: score_col, as_of_col, current_mob_col,
principal_col. Optional: segment_col.
score_col : str
PD probability column (values in [0, 1]).
as_of_col : str
Snapshot date for each loan.
current_mob_col : str
Current month-on-book for each loan.
hazard_curves : pd.DataFrame
Output of fit_flow_hazard_curves().
Required columns: segment_id, month_on_book, chargeoff_hazard_rate.
principal_col : str
Original principal column (default "original_principal").
segment_col : str, optional
Column for segment lookup in hazard_curves.
strict : bool
If True, raise on missing segment. If False, fall back to global profile.
Returns
pd.DataFrame
Columns: forecast_month, expected_chargeoff_amount, loan_count, expected_chargeoff_rate.
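The algorithm steps listed for this function can be sketched for a single global hazard curve as follows. This is a simplified illustration, not the library's implementation: it ignores segments and the strict flag, returns only forecast_month and expected_chargeoff_amount, and assumes that "from current_mob forward" means MOBs strictly greater than current_mob. The function name bridge_sketch and the sample data are hypothetical.

```python
import pandas as pd

def bridge_sketch(scored_df, hazard_curves, *, score_col, as_of_col,
                  current_mob_col, principal_col="original_principal"):
    """Hypothetical helper: allocate expected charge-off dollars by month."""
    rows = []
    for loan in scored_df.to_dict("records"):
        expected_co_dollars = loan[score_col] * loan[principal_col]
        # Assumed: only MOBs after the current one receive an allocation.
        future = hazard_curves[hazard_curves["month_on_book"] > loan[current_mob_col]]
        total_hazard = future["chargeoff_hazard_rate"].sum()
        if total_hazard <= 0:
            continue  # nothing left on the curve to allocate against
        for mob, rate in zip(future["month_on_book"], future["chargeoff_hazard_rate"]):
            offset = int(mob - loan[current_mob_col])
            month = (pd.Timestamp(loan[as_of_col]) + pd.DateOffset(months=offset)).to_period("M")
            rows.append({
                "forecast_month": month,
                "expected_chargeoff_amount": expected_co_dollars * rate / total_hazard,
            })
    out = pd.DataFrame(rows, columns=["forecast_month", "expected_chargeoff_amount"])
    return out.groupby("forecast_month", as_index=False)["expected_chargeoff_amount"].sum()

hazard_curves = pd.DataFrame({
    "segment_id": ["all"] * 4,
    "month_on_book": [1, 2, 3, 4],
    "chargeoff_hazard_rate": [0.01, 0.02, 0.03, 0.04],
})
scored = pd.DataFrame({
    "loan_id": [1, 2],
    "pd_score": [0.10, 0.20],
    "original_principal": [10_000.0, 5_000.0],
    "as_of_date": [pd.Timestamp("2024-01-31")] * 2,
    "current_mob": [2, 3],
})
forecast = bridge_sketch(scored, hazard_curves, score_col="pd_score",
                         as_of_col="as_of_date", current_mob_col="current_mob")
```

Because the weights are normalized per loan, the forecast total equals the sum of score times principal across loans whenever each loan has remaining hazard on its curve; the hazard curve only determines timing.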