Glossary
Domain and package-specific terms used throughout cranalytics.
A
Absorbing state
A state in a transition matrix from which there is no exit — once a loan enters it, it stays there. Charged Off and Paid Off are typically absorbing states. Their diagonal value is 1.0 and all off-diagonal values in the same row are 0.
A/E ratio (Actual-to-Expected)
The ratio of observed event rates to model-predicted rates over a validation period. An A/E of 1.0 means predictions match outcomes in aggregate; values above 1.0 indicate underprediction and values below 1.0 indicate overprediction.
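As a rough illustration of the arithmetic, the sketch below computes an A/E ratio for a single validation window. The function name `ae_ratio` and the example rates are hypothetical and are not part of the cranalytics API.

```python
def ae_ratio(actual_rate, expected_rate):
    """Actual-to-expected ratio for one validation window (hypothetical helper)."""
    if expected_rate <= 0:
        raise ValueError("expected_rate must be positive")
    return actual_rate / expected_rate


# Made-up example: 6% observed charge-off rate vs. 5% predicted.
ratio = ae_ratio(actual_rate=0.06, expected_rate=0.05)
print(round(ratio, 2))  # 1.2, so the model underpredicted
```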
C
CGCO (Cumulative Gross Charge-Off)
The sum of all charge-off dollars from origination through the current observation date, expressed as a percentage of original principal (cgco_pct). The primary output metric of vintage curve analysis.
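A minimal sketch of the calculation, assuming monthly charge-off dollars for one vintage and a percentage scaled to 0–100. The helper name and the scaling are assumptions; the package's own cgco_pct column may use a different scale.

```python
def cumulative_gross_charge_off_pct(charge_offs_by_mob, original_principal):
    """Running charge-off total as a percentage of original principal (hypothetical helper)."""
    if original_principal <= 0:
        raise ValueError("original_principal must be positive")
    running_total = 0.0
    curve = []
    for amount in charge_offs_by_mob:
        running_total += amount
        curve.append(100.0 * running_total / original_principal)
    return curve


# Made-up vintage: $1,000,000 originated, charge-offs in MOB 1..3.
curve = cumulative_gross_charge_off_pct([0.0, 5_000.0, 10_000.0], 1_000_000.0)
print(curve)  # [0.0, 0.5, 1.5]
```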
Champion/challenger
A model governance pattern where a primary ("champion") model is monitored against one or more alternative ("challenger") models. cranalytics uses this pattern in backtest variant selection for both rollforward and vintage models.
Charged Off
The canonical terminal loss state for loan snapshots, loan history, and transition-matrix examples in cranalytics. It represents a loan that has been written off on the lender's books and is treated as the absorbing loss state in package examples.
Cohort
A group of loans originated in the same time period (month or quarter). Vintage analysis groups loans into cohorts and tracks their performance as they age together. See also: vintage.
Concordance index (C-index)
A discrimination metric for survival models. Ranges 0–1; 0.5 is random, 1.0 is perfect. Measures the probability that a model correctly ranks two randomly chosen borrowers by their event time.
Cox PH (Cox Proportional Hazards)
A semi-parametric survival regression model that estimates how covariates (e.g., FICO score, DTI) scale the baseline hazard rate. Available via cranalytics.survival.
D
Default
A legacy loss-state label still accepted by some older transition matrices. Newer cranalytics workflows and examples use Charged Off as the canonical terminal loss state instead.
DPD (Days Past Due)
The number of calendar days since the due date of the oldest required payment that remains unpaid. Used to define delinquency buckets (Late-30, Late-60, Late-90).
DTI (Debt-to-Income ratio)
A borrower characteristic: total monthly debt obligations divided by gross monthly income. Expressed as a decimal (0.35 = 35%) in model features.
E
EAD (Exposure at Default)
The outstanding principal balance at the time a loan reaches the loss terminal state. In cranalytics, this is typically the principal column of a portfolio DataFrame.
Expected loss
The statistical average loss on a portfolio: EAD × PD × LGD, summed across all loans. The output of forecast_lifetime_loss.
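The formula can be sketched in a few lines. The loan records below are made up, and the dictionary keys are illustrative rather than the column names forecast_lifetime_loss expects.

```python
# Hypothetical two-loan portfolio; values are illustrative only.
loans = [
    {"ead": 10_000.0, "pd": 0.05, "lgd": 0.60},
    {"ead": 25_000.0, "pd": 0.02, "lgd": 0.50},
]

# Expected loss = sum over loans of EAD x PD x LGD.
expected_loss = sum(loan["ead"] * loan["pd"] * loan["lgd"] for loan in loans)
print(round(expected_loss, 2))  # 550.0
```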
F
FICO band
A bucketed grouping of FICO scores used for portfolio segmentation:
<600 / 600–649 / 650–699 / 700–749 / 750–799 / 800+
FICO score
A credit score from 300 (highest risk) to 850 (lowest risk) produced by Fair Isaac Corporation. Used throughout cranalytics for segmentation and feature engineering.
FPF (First Payment Failure)
When a borrower misses their very first scheduled loan payment. A binary target for early-warning predictive models. Variants: fpf30_flag, fpf60_flag.
Rollforward data
Aggregated monthly rollforward observations of a loan cohort's activity: how much was paid, how much charged off, and what balance remained. The input shape for fit_flow_hazard_curves and run_rollforward_workflow. See Input Data Contracts.
G
Gini coefficient
A model discrimination metric equal to 2 × AUC − 1. Ranges −1 to 1; higher is better. Computed by calculate_gini.
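To make the relationship to AUC concrete, here is a small pure-Python sketch that computes AUC by pairwise comparison and converts it to Gini. It is not the calculate_gini implementation; the function names and the quadratic pairwise loop are assumptions chosen for readability.

```python
def auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def gini(scores, labels):
    """Gini coefficient as 2 x AUC - 1."""
    return 2.0 * auc(scores, labels) - 1.0


print(gini([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))  # 0.5
```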
Gompertz curve
A parametric sigmoid curve used for vintage loss projection. Suitable for portfolios where loss acceleration slows earlier in the vintage life. One of the four built-in distribution families in cranalytics.distributions.
H
Hazard rate
The conditional probability that an event (payment or charge-off) occurs in a given period, given that it has not yet occurred. Monthly hazard rates in cranalytics represent: "of the balance that has not yet been paid off or charged off, what fraction exits this month?"
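Read as arithmetic, the quoted question divides each month's exits by the balance still open at the start of the month. The sketch below assumes that reading; `monthly_hazard` and its arguments are hypothetical names, not the fit_flow_hazard_curves interface.

```python
def monthly_hazard(opening_balance, paid, charged_off):
    """Fraction of the still-open balance that exits this month, by exit type."""
    if opening_balance <= 0:
        raise ValueError("opening_balance must be positive")
    return {
        "payment": paid / opening_balance,
        "charge_off": charged_off / opening_balance,
    }


# Made-up month: $1,000,000 open, $40,000 paid, $5,000 charged off.
rates = monthly_hazard(1_000_000.0, paid=40_000.0, charged_off=5_000.0)
print(rates)  # {'payment': 0.04, 'charge_off': 0.005}
```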
Holdout period
In backtesting, the set of recent observations withheld from model training and used to measure out-of-sample forecast accuracy. Configured via holdout_months in the CLI and workflow functions.
I
IV (Information Value)
A scalar measure of a feature's predictive power for a binary target. Computed alongside WOE by compute_woe_iv. Rule of thumb: < 0.02 = not predictive; 0.02–0.1 = weak; 0.1–0.3 = medium; > 0.3 = strong.
K
Kaplan-Meier
A non-parametric estimator of the survival function — the probability of surviving (not defaulting, not prepaying) past time t. Does not require a distributional assumption. Available via cranalytics.survival.
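The product-limit calculation behind the estimator can be sketched as follows. This is a teaching example, not the cranalytics.survival implementation; the function name and the input layout (one duration and one event flag per loan) are assumptions.

```python
def kaplan_meier(durations, events):
    """Return [(t, S(t))] at each distinct time with an observed event.

    events[i] is 1 if the event was observed at durations[i], 0 if censored.
    """
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        observed = sum(1 for d, e in zip(durations, events) if d == t and e == 1)
        leaving = sum(1 for d in durations if d == t)
        if observed:
            # Multiply by the share of at-risk loans that survive time t.
            survival *= 1.0 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored loans both leave the risk set
    return curve


# Made-up sample: months to event, with two censored loans (flag 0).
km_curve = kaplan_meier([3, 5, 5, 8, 10], [1, 1, 0, 1, 0])
for month, s in km_curve:
    print(month, round(s, 3))
```

With this sample, survival drops at months 3, 5 and 8; the censored loans reduce the at-risk count without triggering a drop.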
KS statistic (Kolmogorov-Smirnov)
The maximum separation between the cumulative distributions of positive and negative classes in a scorecard. Ranges 0–1; higher is better. Computed by calculate_ks.
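A minimal sketch of the statistic, evaluating both empirical cumulative distributions at every observed score. The function name and data are hypothetical; calculate_ks may differ in binning and tie handling.

```python
def ks_statistic(scores, labels):
    """Maximum gap between the empirical score CDFs of the two classes."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    best = 0.0
    for threshold in sorted(set(scores)):
        cdf_pos = sum(1 for s in positives if s <= threshold) / len(positives)
        cdf_neg = sum(1 for s in negatives if s <= threshold) / len(negatives)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


ks = ks_statistic([0.1, 0.2, 0.3, 0.7, 0.8, 0.9], [0, 0, 1, 0, 1, 1])
print(round(ks, 3))  # 0.667
```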
L
LGD (Loss Given Default)
The fraction of EAD that is lost after recovery efforts: LGD = 1 − (Recovery / EAD). Computed by calculate_lgd. Expressed as a decimal (0.6 = 60% loss).
Lifetime loss
The expected total credit loss on a portfolio from the current date through final resolution of all loans. The primary output of the loss forecasting workflow.
Lognormal curve
A parametric curve used for vintage loss projection where the loss rate follows a lognormal CDF as a function of MOB. One of the four built-in distribution families.
M
MAPE (Mean Absolute Percentage Error)
The average of |actual − predicted| / |actual| across validation observations. Undefined when an actual value is zero. Used to score backtest variants. Lower is better.
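As a quick sketch of the scoring arithmetic, assuming paired actual and predicted values and rejecting zero actuals outright (how the package treats zeros is not stated here, so that choice is an assumption):

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned as a fraction (0.1 = 10%)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


# Made-up holdout observations.
error = mape([100.0, 200.0, 400.0], [110.0, 180.0, 400.0])
print(round(error, 4))  # 0.0667
```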
Migration matrix
See transition matrix.
Mix shift
A change in portfolio composition over time — e.g., a shift toward lower-FICO borrowers — that can cause aggregate performance changes independent of individual loan performance. Analyzed via calculate_fico_mix.
MOB (Months on Book)
The age of a loan cohort in months, measured from origination. MOB 1 is the first full month after origination. The primary x-axis variable in vintage analysis. Column name: mob or month_on_book.
P
PD (Probability of Default)
The probability that a loan reaches the loss terminal state over a specified horizon. In transition-matrix forecasting, this is the cumulative probability of reaching the Charged Off state from the current state; older matrices may still label that absorbing state Default.
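Because Charged Off is absorbing, the probability mass sitting in that state after n monthly steps equals the cumulative probability of having reached it by month n. The sketch below illustrates this with a made-up four-state matrix in plain Python; the state list, probabilities and function names are assumptions, not package defaults.

```python
STATES = ["Current", "Late-30", "Charged Off", "Paid Off"]

# Hypothetical monthly transition probabilities; each row sums to 1.0.
P = [
    [0.94, 0.03, 0.00, 0.03],  # Current
    [0.40, 0.40, 0.20, 0.00],  # Late-30
    [0.00, 0.00, 1.00, 0.00],  # Charged Off (absorbing)
    [0.00, 0.00, 0.00, 1.00],  # Paid Off (absorbing)
]


def step(dist, matrix):
    """Advance a state distribution by one month (row vector times matrix)."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def cumulative_pd(start_state, horizon_months):
    """Probability of being in Charged Off after horizon_months steps."""
    dist = [1.0 if s == start_state else 0.0 for s in STATES]
    for _ in range(horizon_months):
        dist = step(dist, P)
    return dist[STATES.index("Charged Off")]


print(round(cumulative_pd("Late-30", 1), 4))  # 0.2
print(round(cumulative_pd("Current", 2), 4))  # 0.006
```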
PSI (Population Stability Index)
Measures how much a score distribution has shifted between a reference period and a current period. PSI < 0.1 = stable; 0.1–0.25 = moderate shift; > 0.25 = significant shift. Computed by compute_psi.
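A sketch of the usual PSI formula, the sum over bins of (current share − reference share) × ln(current share / reference share). The function name, the pre-binned inputs and the small floor applied to empty bins are assumptions; compute_psi may bin and smooth differently.

```python
import math


def psi(reference_shares, current_shares, floor=1e-6):
    """Population Stability Index over pre-binned score distributions."""
    if len(reference_shares) != len(current_shares):
        raise ValueError("both distributions must use the same bins")
    total = 0.0
    for ref, cur in zip(reference_shares, current_shares):
        ref = max(ref, floor)  # avoid log(0) and division by zero for empty bins
        cur = max(cur, floor)
        total += (cur - ref) * math.log(cur / ref)
    return total


# Made-up four-bin distributions.
value = psi([0.25, 0.25, 0.25, 0.25], [0.20, 0.25, 0.25, 0.30])
print(round(value, 4))  # 0.0203, below 0.1 and therefore stable
```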
R
Recovery rate
The fraction of defaulted principal recovered through collections, liquidation, or sale. Recovery rate = 1 − LGD.
Reserve ratio
Lifetime loss divided by total portfolio balance. The primary output ratio of summarize_lifetime_loss.
Risk grade
A cranalytics-specific integer label (1–6) mapped from FICO band, where 1 = lowest risk (800+) and 6 = highest risk (<600). Added by segment_fico.
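The band-to-grade mapping can be sketched as a lookup over band floors. The source states only the endpoints (grade 1 for 800+, grade 6 for <600); the sketch assumes the intermediate bands map to grades 2–5 in order, and the helper name is hypothetical rather than the segment_fico signature.

```python
# (band floor, band label, risk grade), checked from the highest band down.
# Intermediate grades 2-5 are an assumed monotone mapping.
BANDS = [
    (800, "800+", 1),
    (750, "750–799", 2),
    (700, "700–749", 3),
    (650, "650–699", 4),
    (600, "600–649", 5),
]


def fico_band_and_grade(score):
    """Return (band label, risk grade) for a FICO score between 300 and 850."""
    if not 300 <= score <= 850:
        raise ValueError("FICO scores range from 300 to 850")
    for floor, label, grade in BANDS:
        if score >= floor:
            return label, grade
    return "<600", 6


print(fico_band_and_grade(720))  # ('700–749', 3)
```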
Roll rate
The monthly flow of loans from one delinquency status to another (e.g., from Current to Late-30). The empirical basis for fitting transition matrices and flow hazard curves.
S
Scorecard
A predictive model that assigns a numeric risk score to each loan or borrower. cranalytics supports binary classification scorecards via train_binary_model.
Segment
A subset of a portfolio grouped by a shared characteristic (FICO band, product type, origination channel). Vintage and rollforward analyses can be run independently per segment.
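Survival function
The probability S(t) that a loan survives (does not default or prepay) past time t. The complement of the cumulative distribution function of the event time, S(t) = 1 − F(t), and related to the cumulative hazard H(t) by S(t) = exp(−H(t)). Estimated by Kaplan-Meier or parametric models.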
T
Transition matrix
A square DataFrame where rows and columns are loan states and each cell [i, j] is the monthly probability of moving from state i to state j. Rows must sum to 1.0. See Input Data Contracts — Transition Matrix.
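The constraints above (square shape, non-negative cells, rows summing to 1.0) and the absorbing-state rule can be checked with a short sketch. It uses a plain nested dict rather than a DataFrame to stay dependency-free; the helper names and the example probabilities are hypothetical.

```python
import math

# Hypothetical matrix: outer key is the "from" state, inner key the "to" state.
matrix = {
    "Current":     {"Current": 0.95, "Late-30": 0.03, "Charged Off": 0.00, "Paid Off": 0.02},
    "Late-30":     {"Current": 0.30, "Late-30": 0.50, "Charged Off": 0.20, "Paid Off": 0.00},
    "Charged Off": {"Current": 0.00, "Late-30": 0.00, "Charged Off": 1.00, "Paid Off": 0.00},
    "Paid Off":    {"Current": 0.00, "Late-30": 0.00, "Charged Off": 0.00, "Paid Off": 1.00},
}


def validate_transition_matrix(matrix, tol=1e-9):
    """Raise ValueError if the matrix is not square, non-negative and row-stochastic."""
    states = set(matrix)
    for state, row in matrix.items():
        if set(row) != states:
            raise ValueError(f"row {state!r} does not cover every state")
        if any(p < 0 for p in row.values()):
            raise ValueError(f"row {state!r} contains a negative probability")
        if not math.isclose(sum(row.values()), 1.0, abs_tol=tol):
            raise ValueError(f"row {state!r} does not sum to 1.0")


def absorbing_states(matrix):
    """States whose diagonal entry is 1.0, so no probability ever leaves them."""
    return [state for state, row in matrix.items() if row[state] == 1.0]


validate_transition_matrix(matrix)
print(absorbing_states(matrix))  # ['Charged Off', 'Paid Off']
```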
U
Ultimate loss
The final cumulative loss rate a vintage is expected to reach at full maturity. The asymptote of a fitted vintage loss curve. Stored as CurveFitter.ultimate_ after fitting.
UPB (Unpaid Principal Balance)
Synonym for outstanding principal balance. Common column alias for outstanding_balance in rollforward data.
V
Vintage
A group of loans originated within the same time window (typically a month or quarter). "The 2023-Q1 vintage" refers to all loans originated January–March 2023. Column name: vintage_name.
Vintage triangle
A matrix where rows are vintages, columns are months on book, and each cell is the cumulative loss rate for that vintage at that age. The raw data format for vintage curve fitting before reshaping to long format.
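The reshape from triangle to long format can be sketched without any dependencies. The data are made up, unobserved cells are represented as None, and the output keys reuse the vintage_name, mob and cgco_pct names from this glossary; the exact long-format contract is an assumption.

```python
# Hypothetical triangle: each list holds cumulative loss rates for MOB 1, 2, 3.
# Younger vintages have fewer observed months, hence the None cells.
triangle = {
    "2023-Q1": [0.2, 0.6, 1.1],
    "2023-Q2": [0.3, 0.7, None],
    "2023-Q3": [0.1, None, None],
}


def triangle_to_long(triangle):
    """One row per observed (vintage, months-on-book) cell."""
    rows = []
    for vintage_name, cells in triangle.items():
        for mob, value in enumerate(cells, start=1):
            if value is not None:
                rows.append({"vintage_name": vintage_name, "mob": mob, "cgco_pct": value})
    return rows


long_rows = triangle_to_long(triangle)
print(len(long_rows))  # 6
```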
W
WAL (Weighted Average Life)
The average time until principal is returned, weighted by the amount repaid at each period. Computed by calculate_wal.
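In formula terms, WAL is the sum of (period × principal repaid in that period) divided by total principal repaid. The sketch below assumes periods numbered from 1 and a result in the same unit as the periods; the helper name is hypothetical and is not the calculate_wal signature.

```python
def weighted_average_life(principal_by_period):
    """Principal-weighted average repayment time, in periods (first period = 1)."""
    total = sum(principal_by_period)
    if total <= 0:
        raise ValueError("total principal repaid must be positive")
    weighted = sum(t * p for t, p in enumerate(principal_by_period, start=1))
    return weighted / total


# Made-up schedule: most principal is returned in period 3.
wal = weighted_average_life([100.0, 200.0, 700.0])
print(wal)  # 2.6
```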
Weibull curve
A parametric curve used for vintage loss projection where loss development follows a Weibull CDF. Flexible shape makes it suitable for a wide range of portfolio types. One of the four built-in distribution families.
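One common way to write such a curve is the ultimate loss multiplied by a Weibull CDF in MOB. The sketch below assumes that form with shape and scale parameters; the parameterization used inside cranalytics.distributions may differ, and all names and values here are illustrative.

```python
import math


def weibull_loss_curve(mob, ultimate, shape, scale):
    """Cumulative loss at a given MOB: ultimate x Weibull CDF (assumed form)."""
    if mob < 0 or shape <= 0 or scale <= 0:
        raise ValueError("mob must be >= 0 and shape, scale must be positive")
    return ultimate * (1.0 - math.exp(-((mob / scale) ** shape)))


# Made-up parameters: 8% ultimate loss, shape 1.5, scale 18 months.
for mob in (6, 12, 24, 60):
    print(mob, round(weibull_loss_curve(mob, 0.08, 1.5, 18.0), 4))
```

The curve starts at zero, rises monotonically, and approaches the ultimate loss as MOB grows, which is why the ultimate loss is the asymptote of the fitted curve.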
WOE (Weight of Evidence)
A transformation that replaces feature bin values with log(P(good) / P(bad)) in each bin, linearizing the relationship between a feature and a binary target. Computed by compute_woe_iv and fit_woe_binning.
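The sketch below shows WOE and IV computed together from per-bin counts. It assumes the common scorecard convention in which P(good) and P(bad) are each bin's share of all goods and all bads; whether compute_woe_iv uses exactly this convention, or how it smooths empty bins, is not confirmed here. Function and variable names are hypothetical.

```python
import math


def woe_iv(good_counts, bad_counts):
    """Per-bin WOE values and total IV from good/bad counts per bin."""
    total_good = sum(good_counts)
    total_bad = sum(bad_counts)
    woe_values = []
    iv = 0.0
    for good, bad in zip(good_counts, bad_counts):
        if good == 0 or bad == 0:
            raise ValueError("bin has an empty class; merge bins or smooth counts first")
        good_share = good / total_good
        bad_share = bad / total_bad
        woe = math.log(good_share / bad_share)
        woe_values.append(woe)
        iv += (good_share - bad_share) * woe
    return woe_values, iv


# Made-up three-bin feature.
woe_values, iv = woe_iv([100, 300, 600], [50, 30, 20])
print([round(w, 3) for w in woe_values], round(iv, 3))
```

With these counts the total IV is well above 0.3, which the rule of thumb in the IV entry would classify as strong.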