Feature Analytics Tutorial
This guide is the best starting point when you want to understand which origination-time variables carry signal before you commit to a predictive model.
Use this workflow when
- you have loan-level application or booking data
- you want a ranked feature table before full model training
- you need a quick first win for early-performance or FPF-style analysis
Do not start here if
- you only have monthly aggregated rollforward data — use the Rollforward workflow instead
- you already have a finished target, train/test design, and want model metrics — go to the ML Modeling tutorial
- you need reserve forecasting from a transition matrix — use Lifetime Loss Forecasting instead
Inputs
A meaningful first pass usually includes:
- loan identifiers such as loan_id
- booked-loan attributes such as principal, annual_rate, term_months, start_date, fico_score
- a binary target or early-performance flag such as fpf30_flag
Optional external backend:
- install optbinning if you want optimal WoE binning with fit_woe_binning()
See the full contract reference here: Input Data Contracts.
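Before running the workflow, it can help to confirm that a frame has the columns listed above. The following is a minimal sketch, not part of cranalytics: check_inputs and REQUIRED_COLUMNS are hypothetical names, and the column list is taken from the examples in this section rather than from the formal contract.

```python
import pandas as pd

# Column names follow the examples listed above; this helper is a
# hypothetical illustration and is not provided by cranalytics.
REQUIRED_COLUMNS = [
    "loan_id", "principal", "annual_rate", "term_months",
    "start_date", "fico_score", "fpf30_flag",
]


def check_inputs(loans: pd.DataFrame, target_col: str = "fpf30_flag") -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    missing = [c for c in REQUIRED_COLUMNS if c not in loans.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
        return problems  # later checks need these columns to exist
    if loans["loan_id"].duplicated().any():
        problems.append("duplicate loan_id values")
    # Missing targets are allowed here (immature loans); only observed
    # values must be 0 or 1.
    observed = set(loans[target_col].dropna().unique())
    if not observed <= {0, 1}:
        problems.append(f"{target_col} has non-binary values")
    return problems
```

An empty result means the frame passes these basic checks; it does not replace the full contract described in Input Data Contracts.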
Code
import pandas as pd

from cranalytics import (
    engineer_loan_features,
    lift_gain_table,
    make_mock_fpf_data,
    rank_features_by_separation,
)

# Build a mock loan book, then keep only loans whose target has matured.
raw = make_mock_fpf_data(n_loans=1200, mature_pct=0.85, seed=42)
mature = raw.dropna(subset=["fpf30_flag"]).reset_index(drop=True)

# Engineer features as of the snapshot date.
feature_frame = engineer_loan_features(
    mature,
    as_of_date=pd.Timestamp("2025-12-31"),
)

feature_cols = [
    "fico_score",
    "fico_normalized",
    "annual_rate",
    "dti",
    "loan_to_income",
]

# Rank the candidate features by how well they separate the target.
ranking = rank_features_by_separation(
    feature_frame,
    feature_cols=feature_cols,
    flag_col="fpf30_flag",
)
print(ranking.head())

# Check whether the existing vendor_pd score ranks risk sensibly.
lift = lift_gain_table(
    y_true=feature_frame["fpf30_flag"],
    y_prob=feature_frame["vendor_pd"],
)
print(lift[["bin", "event_rate", "lift"]].head())
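To make the lift columns concrete, here is a rough sketch of how a lift table is commonly built: sort loans by score from riskiest to safest, cut them into equal-sized bins, and divide each bin's event rate by the overall event rate. The lift_table function below is a hypothetical stand-in written in plain Python; it is not the implementation of lift_gain_table, which may bin and label differently. Only the column names bin, event_rate, and lift are borrowed from the print statement above.

```python
def lift_table(y_true, y_prob, n_bins=10):
    """Return rows of bin / event_rate / lift, riskiest bin first.

    Hypothetical illustration only; not the cranalytics implementation.
    """
    n = len(y_true)
    overall = sum(y_true) / n
    if overall == 0:
        raise ValueError("no events in y_true, lift is undefined")

    # Indices ordered from highest predicted risk to lowest.
    order = sorted(range(n), key=lambda i: y_prob[i], reverse=True)

    rows = []
    for b in range(n_bins):
        members = order[b * n // n_bins:(b + 1) * n // n_bins]
        if not members:
            continue  # more bins than observations
        rate = sum(y_true[i] for i in members) / len(members)
        rows.append({"bin": b + 1, "event_rate": rate, "lift": rate / overall})
    return rows
```

A score that ranks risk sensibly should show lift well above 1 in the first bins and a roughly decreasing pattern afterwards.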
Expected output / first win
Your first win is a ranked feature table that tells you which variables are worth carrying forward into modeling.
You should expect:
- a table ordered by separation strength (iv, gini, ks)
- a quick lift/gain table that shows whether an existing score or proxy ranks risk sensibly
- a short list of “promote”, “watch”, and “drop for now” candidates
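For intuition about the separation columns, the sketch below computes two of them from their textbook definitions: KS as the largest gap between the cumulative distributions of events and non-events, and Gini as 2 * AUC - 1. The function ks_and_gini and the toy data are hypothetical and written in plain Python; rank_features_by_separation may compute these differently, and IV is left out because it depends on a binning choice.

```python
from itertools import groupby


def ks_and_gini(values, flags):
    """KS and Gini for one numeric feature against a 0/1 flag.

    Both are returned as absolute values, so the direction of the
    relationship (higher is riskier or safer) does not matter.
    """
    pairs = sorted(zip(values, flags))
    n_event = sum(f for _, f in pairs)
    n_non = len(pairs) - n_event
    if n_event == 0 or n_non == 0:
        raise ValueError("need both events and non-events")

    cum_event = cum_non = 0
    ks = 0.0
    concordant = 0.0  # (event, non-event) pairs where the event has the higher value
    for _, group in groupby(pairs, key=lambda p: p[0]):
        group = list(group)
        ev = sum(f for _, f in group)
        non = len(group) - ev
        # Events here outrank every earlier non-event; ties count half.
        concordant += ev * cum_non + 0.5 * ev * non
        cum_event += ev
        cum_non += non
        ks = max(ks, abs(cum_event / n_event - cum_non / n_non))

    auc = concordant / (n_event * n_non)
    return ks, abs(2 * auc - 1)


# Toy data, invented for illustration only.
flags = [0, 0, 0, 1, 0, 1, 1, 1]
features = {
    "fico_score": [780, 760, 740, 700, 720, 640, 660, 620],
    "noise": [3, 1, 4, 1, 5, 9, 2, 6],
}
ranking = sorted(
    ((name, *ks_and_gini(vals, flags)) for name, vals in features.items()),
    key=lambda row: row[1],
    reverse=True,
)
for name, ks, gini in ranking:
    print(f"{name}: ks={ks:.2f} gini={gini:.2f}")
```

In this toy data the score-like feature separates the two groups completely and ranks first, while the noise column ranks below it.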
Common mistakes
- jumping straight to model training before checking whether any features have useful separation
- using immature targets without dropping NaN rows first
- treating WoE binning as required on day one; it is optional, not the first step
- mixing post-outcome fields into feature engineering, which creates leakage
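Two of the mistakes above, immature targets and leakage, can be guarded against with a few lines of pandas. This is a sketch under stated assumptions: the frame is invented, and days_past_due_90 and charged_off_amount are hypothetical examples of fields that are only known after the outcome window. Substitute the post-outcome fields that exist in your own data.

```python
import pandas as pd

# Invented data for illustration; the two post-outcome columns are
# hypothetical and do not come from cranalytics.
loans = pd.DataFrame({
    "loan_id": [1, 2, 3, 4],
    "fico_score": [700, 640, 720, 660],
    "days_past_due_90": [0, 45, 0, None],
    "charged_off_amount": [0.0, 1200.0, 0.0, None],
    "fpf30_flag": [0, 1, 0, None],  # None: loan too young to have a target
})

# Fields observed after the target window; using them as features leaks
# the outcome into the ranking.
POST_OUTCOME_COLS = ["days_past_due_90", "charged_off_amount"]

# Drop immature loans first, so every remaining row has a known target.
mature = loans.dropna(subset=["fpf30_flag"]).reset_index(drop=True)

# Candidate features exclude identifiers, the target, and leaky fields.
excluded = set(POST_OUTCOME_COLS) | {"loan_id", "fpf30_flag"}
feature_cols = [c for c in mature.columns if c not in excluded]
```

Keeping the exclusion list explicit makes it easy to review which fields were ruled out and why.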
Next step
Run the packaged demo end-to-end with:
python -m cranalytics.examples.core_feature_analytics
- If you want production-style feature transformation, add fit_woe_binning() with optbinning.
- If you are ready to train and backtest a classifier, continue to the ML Modeling Tutorial.
- For full API details, see the Predictive Modeling API Reference.