Predictive Modeling API Reference

Feature Analytics

Use the feature-analytics helpers (early-performance rates, WoE/IV, feature engineering, lift/gain, score calibration and monitoring) to prepare and validate signal before fitting a final estimator. There is no single run() entry point here; these are independent analytics tools rather than one pipeline.

`cranalytics.feature_analytics`

Feature analytics compatibility re-export surface.

This package groups early-performance rate/separation analytics (_early_performance), feature engineering and WoE binning (_model_development), and score calibration/monitoring (_score_monitoring) — independent analytics tools rather than one end-to-end pipeline, so there is no single run() entry point here. Reach for the specific helper you need.

This module carries no logic of its own. It only re-exports the focused submodules so that existing `from cranalytics.feature_analytics import ...` call sites keep working. Add new behavior to the appropriate submodule, not here.

`calculate_early_performance_rates(df: pd.DataFrame, flag_columns: list[str], weight_col: str | None = None, confidence: float = 0.95) -> pd.DataFrame`

Compute portfolio-level event rates and Wilson confidence intervals.
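Wilson intervals stay well-behaved when event rates are small, which is typical for early-performance flags. A minimal sketch of the unweighted interval follows, using only the standard library. How calculate_early_performance_rates handles weight_col is not documented here, so weighting is left out; the name wilson_interval is illustrative.

```python
# Illustrative sketch of an unweighted Wilson interval; not the cranalytics implementation.
import math
from statistics import NormalDist


def wilson_interval(events: float, n: float, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for the binomial proportion events / n."""
    if n <= 0:
        raise ValueError("n must be positive")
    # Two-sided critical value, about 1.96 for confidence=0.95.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = events / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# 12 events among 400 mature loans: a 3% observed rate.
low, high = wilson_interval(events=12, n=400)
```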

`compute_conditional_loss_table(df: pd.DataFrame, segment_cols: list[str], flag_col: str, outcome_col: str, n_bins_score: int = 5, score_col: str | None = None, weight_col: str | None = None, confidence: float = 0.95) -> pd.DataFrame`

Compute conditional lifetime-loss summaries by segment, score bucket, and flag.

`compute_marginal_impact(df: pd.DataFrame, feature_col: str, flag_col: str, control_col: str, n_bins_control: int = 5, n_bins_feature: int | None = None, weight_col: str | None = None) -> pd.DataFrame`

Estimate feature effect within control strata via within-bucket rate deltas.
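One way to read "within-bucket rate deltas": bucket loans by the control variable, compare the event rate of feature=1 against feature=0 inside each bucket, then average the deltas weighted by bucket volume. The sketch assumes a binary feature and control values that are already bucketed; the helper within_bucket_deltas and its volume weighting are illustrative assumptions, not the documented output of compute_marginal_impact.

```python
# Illustrative sketch of within-stratum rate deltas; not the cranalytics implementation.
from collections import defaultdict


def within_bucket_deltas(rows):
    """Per-bucket event-rate deltas for a binary feature, plus their volume-weighted mean.

    rows: iterable of (control_bucket, feature_flag, event) with 0/1 flags.
    """
    # bucket -> feature_flag -> [n_loans, n_events]
    counts = defaultdict(lambda: {0: [0, 0], 1: [0, 0]})
    for bucket, flag, event in rows:
        cell = counts[bucket][flag]
        cell[0] += 1
        cell[1] += event
    deltas, volumes = {}, {}
    for bucket, by_flag in counts.items():
        (n0, e0), (n1, e1) = by_flag[0], by_flag[1]
        if n0 == 0 or n1 == 0:
            continue  # no within-bucket comparison is possible
        deltas[bucket] = e1 / n1 - e0 / n0
        volumes[bucket] = n0 + n1
    total = sum(volumes.values())
    overall = sum(deltas[b] * volumes[b] for b in deltas) / total if total else float("nan")
    return deltas, overall


rows = (
    [("low", 1, 1)] * 3 + [("low", 1, 0)] * 7 + [("low", 0, 1)] + [("low", 0, 0)] * 9
    + [("high", 1, 1)] * 2 + [("high", 1, 0)] * 8 + [("high", 0, 1)] + [("high", 0, 0)] * 9
)
deltas, overall = within_bucket_deltas(rows)
```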

`compute_segment_rates(df: pd.DataFrame, flag_col: str, group_by: str | list[str], weight_col: str | None = None, n_bins: int | None = None, confidence: float = 0.95) -> pd.DataFrame`

Compute mature-event rates by segment(s) with volume and contribution shares.
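A sketch of the three quantities this helper describes, assuming "contribution share" means a segment's share of all mature events and that immature loans (missing flags) are excluded. The function and key names are hypothetical; confidence intervals and numeric binning (n_bins) are omitted.

```python
# Illustrative sketch of segment rates and shares; not the cranalytics implementation.
from collections import Counter


def segment_rates(rows):
    """Mature-event rate, volume share, and event share per segment.

    rows: iterable of (segment, event) with event in {0, 1, None};
    None marks an immature loan and is excluded.
    """
    n, events = Counter(), Counter()
    for segment, event in rows:
        if event is None:
            continue
        n[segment] += 1
        events[segment] += event
    total_n, total_events = sum(n.values()), sum(events.values())
    return {
        seg: {
            "n": n[seg],
            "event_rate": events[seg] / n[seg],
            "volume_share": n[seg] / total_n,
            "event_share": events[seg] / total_events if total_events else 0.0,
        }
        for seg in n
    }


table = segment_rates([("A", 1), ("A", 0), ("A", 0), ("A", 0), ("B", 1), ("B", 1), ("B", None)])
```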

`compute_woe_iv(df: pd.DataFrame, feature_col: str, flag_col: str, n_bins: int = 10, weight_col: str | None = None) -> tuple[pd.DataFrame, float]`

Compute a Weight of Evidence table and total Information Value.

Parameters:

Name	Type	Description	Default
`df`	`DataFrame`	Loan-level feature and outcome data.	required
`feature_col`	`str`	Candidate feature to bin and evaluate.	required
`flag_col`	`str`	Binary event column. Null outcomes are excluded.	required
`n_bins`	`int`	Maximum number of bins for numeric features.	`10`
`weight_col`	`str \| None`	Optional exposure or observation-weight column.	`None`

Returns:

Type	Description
`tuple[DataFrame, float]`	Tuple of WoE detail table and total IV.

Examples:

>>> import pandas as pd
>>> from cranalytics.feature_analytics import compute_woe_iv
>>> frame = pd.DataFrame({"fico": [600, 650, 750, 800], "bad": [1, 1, 0, 0]})
>>> table, total_iv = compute_woe_iv(frame, "fico", "bad", n_bins=2)
>>> {"bin", "woe", "iv_contribution"} <= set(table.columns)
True
>>> total_iv >= 0
True
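The WoE and IV arithmetic behind this table can be sketched on already-binned counts. The sketch assumes the ln(%good / %bad) sign convention (some tools use the reverse; IV is unaffected) and does not reproduce the library's binning or its treatment of empty bins. woe_iv is an illustrative name.

```python
# Illustrative sketch of WoE/IV on pre-binned counts; not the cranalytics implementation.
import math


def woe_iv(bins):
    """WoE per bin and total IV from a mapping of bin label -> (n_good, n_bad).

    Assumed convention: WoE = ln(%good / %bad), so bins with a higher
    share of bads get negative WoE. IV does not depend on the sign.
    """
    total_good = sum(good for good, _ in bins.values())
    total_bad = sum(bad for _, bad in bins.values())
    rows, total_iv = [], 0.0
    for label, (good, bad) in bins.items():
        if good == 0 or bad == 0:
            raise ValueError(f"bin {label!r} has a zero count; merge or smooth it first")
        dist_good, dist_bad = good / total_good, bad / total_bad
        woe = math.log(dist_good / dist_bad)
        iv = (dist_good - dist_bad) * woe  # always >= 0
        rows.append({"bin": label, "woe": woe, "iv_contribution": iv})
        total_iv += iv
    return rows, total_iv


rows, total_iv = woe_iv({"<=650": (20, 30), ">650": (80, 20)})
```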

`estimate_vintage_lifetime_profit(df: pd.DataFrame, expected_loss_col: str, avg_life_col: str | None = None, avg_life: float | None = None, coupon_rate: float = 0.0, servicing_cost_rate: float = 0.0, funding_cost_rate: float = 0.0) -> pd.DataFrame`

Estimate vintage-level net margin from pricing assumptions and expected loss.

Formula:

net_margin = coupon_rate * avg_life
             - expected_loss_rate
             - servicing_cost_rate
             - funding_cost_rate

All rate inputs are decimals (e.g. 0.18 for 18%). avg_life is in years. coupon_rate is an annualised yield multiplied by avg_life to give the lifetime interest income. servicing_cost_rate and funding_cost_rate are lifetime rates (not annualised) and are subtracted directly without scaling by avg_life. expected_loss_rate is likewise a lifetime rate (e.g. from compute_conditional_loss_table).

Parameters

df : DataFrame with at least expected_loss_col (and optionally avg_life_col).
expected_loss_col : Column of lifetime loss rates (decimals, e.g. 0.05 = 5%).
avg_life_col : Per-row average life column (years). Takes precedence over avg_life.
avg_life : Scalar average life (years) applied to all rows. Required if avg_life_col is None.
coupon_rate : Annualised all-in yield (decimal); multiplied by avg_life in the formula.
servicing_cost_rate : Lifetime servicing cost rate (decimal); not scaled by avg_life.
funding_cost_rate : Lifetime cost-of-funds rate (decimal); not scaled by avg_life.

Returns

Copy of df with an added net_margin column.
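A worked instance of the formula for a single row, assuming scalar inputs. The helper net_margin is illustrative and stands in for one value of the output column.

```python
# Illustrative worked example of the net-margin formula; not the cranalytics implementation.
def net_margin(expected_loss_rate, avg_life, coupon_rate=0.0,
               servicing_cost_rate=0.0, funding_cost_rate=0.0):
    """Lifetime net margin (decimal). Only coupon_rate is annualised,
    so only it is scaled by avg_life (years)."""
    return (coupon_rate * avg_life
            - expected_loss_rate
            - servicing_cost_rate
            - funding_cost_rate)


# An 18% coupon over a 1.5-year average life gives 0.27 of principal as lifetime
# interest income, less 5% expected loss, 2% servicing and 6% funding (lifetime rates).
margin = net_margin(0.05, 1.5, coupon_rate=0.18,
                    servicing_cost_rate=0.02, funding_cost_rate=0.06)
```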

`rank_features_by_separation(df: pd.DataFrame, feature_cols: list[str], flag_col: str, n_bins: int = 10, weight_col: str | None = None) -> pd.DataFrame`

Rank candidate features by IV, Gini, and KS against a binary target.
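A sketch of the two rank-based separation metrics, computed exactly on raw values. It assumes higher values mean higher risk and uses the standard identity Gini = 2 * AUC - 1. The library may compute KS or Gini on binned values, and a feature ranking may need to handle features that separate in the opposite direction; auc_gini_ks is a hypothetical helper.

```python
# Illustrative sketch of AUC, Gini and KS; not the cranalytics implementation.
from itertools import groupby


def auc_gini_ks(scores, labels):
    """AUC (trapezoid over the ROC curve), Gini = 2 * AUC - 1, and
    KS = max |TPR - FPR|. Higher score is taken to mean higher risk."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both events and non-events")
    ordered = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    prev_tpr = prev_fpr = auc = ks = 0.0
    # Tied scores form one threshold, so ties contribute a diagonal ROC segment.
    for _, group in groupby(ordered, key=lambda pair: pair[0]):
        group_labels = [label for _, label in group]
        tp += sum(group_labels)
        fp += len(group_labels) - sum(group_labels)
        tpr, fpr = tp / n_pos, fp / n_neg
        auc += (fpr - prev_fpr) * (tpr + prev_tpr) / 2
        ks = max(ks, abs(tpr - fpr))
        prev_tpr, prev_fpr = tpr, fpr
    return {"auc": auc, "gini": 2 * auc - 1, "ks": ks}


metrics = auc_gini_ks([0.9, 0.7, 0.6, 0.2], [1, 0, 1, 0])
```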

`validate_performance_flags(df: pd.DataFrame, flag_columns: list[str]) -> tuple[pd.DataFrame, pd.DataFrame]`

Validate binary flag columns and return maturity coverage statistics.
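The returned statistics are not itemised here, so the following is only a sketch of what "maturity coverage" plausibly means: 0/1 values count as mature, None/NaN as immature, anything else as invalid, and coverage is the mature share of all rows. The function name, output keys, and the fpd30 column are hypothetical.

```python
# Illustrative sketch of flag validation and maturity coverage; not the cranalytics implementation.
import math


def flag_coverage(records, flag_columns):
    """Per-flag counts: 0/1 are mature, None/NaN are immature, anything else is invalid."""
    stats = {}
    for col in flag_columns:
        values = [record.get(col) for record in records]
        missing = sum(v is None or (isinstance(v, float) and math.isnan(v)) for v in values)
        valid = sum(v in (0, 1) for v in values if v is not None)
        stats[col] = {
            "n_total": len(values),
            "n_mature": valid,
            "n_invalid": len(values) - missing - valid,
            "maturity_coverage": valid / len(values) if values else 0.0,
        }
    return stats


records = [{"fpd30": 1}, {"fpd30": 0}, {"fpd30": None}, {"fpd30": float("nan")}, {"fpd30": 2}]
stats = flag_coverage(records, ["fpd30"])
```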

`engineer_loan_features(df: pd.DataFrame, *, as_of_date: pd.Timestamp | None = None, reference_date: pd.Timestamp | None = None) -> pd.DataFrame`

Add derived credit-risk features to a loan DataFrame.

Source columns that are absent are silently skipped. Original columns are always preserved.

Parameters

df : pd.DataFrame
    Loan-level DataFrame.
as_of_date : pd.Timestamp
    Snapshot or model development date used to compute loan_age_months. Must be provided explicitly; never default to max(start_date).
reference_date : pd.Timestamp, optional
    Deprecated alias for as_of_date. Will be removed in v2.0.

Returns

pd.DataFrame Copy of df with derived columns appended.

`fit_woe_binning(df: pd.DataFrame, feature_cols: list[str], target_col: str, *, special_codes: list | None = None, **binning_kwargs)`

Fit an optimal WoE binning process on the provided features.

Wraps optbinning.BinningProcess. Requires optbinning.

Parameters

df : pd.DataFrame
feature_cols : list[str]
target_col : str
    Binary target column (0/1).
special_codes : list, optional
    Sentinel values to group into a special bin.
**binning_kwargs
    Passed to BinningProcess.

Returns

optbinning.BinningProcess Fitted process. Call .transform(X, metric="woe") to encode.

`lift_gain_table(y_true: pd.Series | np.ndarray, y_prob: pd.Series | np.ndarray, *, n_bins: int = 10) -> pd.DataFrame`

Compute lift and gain table from binary labels and predicted probabilities.

Sorts observations by descending score, splits into n_bins equal-sized buckets, and returns per-bin statistics.

Parameters

y_true : array-like
    Binary labels (0/1).
y_prob : array-like
    Predicted probabilities in [0, 1].
n_bins : int
    Number of equal-size buckets (default 10).

Returns

pd.DataFrame Columns: bin, n, n_events, event_rate, cumulative_n, cumulative_events, cumulative_gain, lift, score_min, score_max, baseline_rate.
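A sketch that produces the same column set from plain Python lists. Two assumptions: lift is per-bin (bin event rate over the baseline rate) rather than cumulative, and tied scores at a bucket boundary are split by sort order. lift_gain is an illustrative name.

```python
# Illustrative sketch of a lift/gain table; not the cranalytics implementation.
def lift_gain(y_true, y_prob, n_bins=10):
    """Lift/gain rows over n_bins equal-size buckets sorted by descending score."""
    ordered = sorted(zip(y_prob, y_true), key=lambda pair: pair[0], reverse=True)
    n = len(ordered)
    total_events = sum(label for _, label in ordered)
    if total_events == 0:
        raise ValueError("gain and lift are undefined without events")
    baseline_rate = total_events / n
    rows, cum_n, cum_events = [], 0, 0
    for b in range(n_bins):
        bucket = ordered[b * n // n_bins:(b + 1) * n // n_bins]
        if not bucket:
            continue  # more bins than observations
        n_b = len(bucket)
        n_events = sum(label for _, label in bucket)
        cum_n += n_b
        cum_events += n_events
        event_rate = n_events / n_b
        rows.append({
            "bin": b + 1,
            "n": n_b,
            "n_events": n_events,
            "event_rate": event_rate,
            "cumulative_n": cum_n,
            "cumulative_events": cum_events,
            "cumulative_gain": cum_events / total_events,
            "lift": event_rate / baseline_rate,  # per-bin lift (assumed definition)
            "score_min": bucket[-1][0],
            "score_max": bucket[0][0],
            "baseline_rate": baseline_rate,
        })
    return rows


table = lift_gain([1, 1, 0, 1, 0, 0, 0, 0, 0, 0],
                  [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05], n_bins=5)
```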

`calibrate_score_to_event_rate(df: pd.DataFrame, score_col: str, flag_col: str, n_bins: int = 10, weight_col: str | None = None, method: str = 'binned') -> tuple[pd.DataFrame, dict[str, float | str]]`

Calibrate a score to observed target event rates.
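For method='binned', a reasonable reading is: sort mature loans by score, cut equal-count bins, and map any new score to the observed event rate of its bin. The sketch shows that idea with hypothetical helpers binned_calibration and calibrated_rate; the library's exact bin edges and the contents of its calibration-stats dictionary are not reproduced.

```python
# Illustrative sketch of binned score calibration; not the cranalytics implementation.
def binned_calibration(scores, flags, n_bins=10):
    """Equal-count score bins with mean score and observed event rate.
    Flags of None (immature) are excluded."""
    pairs = sorted((s, f) for s, f in zip(scores, flags) if f is not None)
    n = len(pairs)
    table = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue
        table.append({
            "bin": b + 1,
            "n": len(chunk),
            "max_score": chunk[-1][0],
            "mean_score": sum(s for s, _ in chunk) / len(chunk),
            "event_rate": sum(f for _, f in chunk) / len(chunk),
        })
    return table


def calibrated_rate(table, score):
    """Map a new score to the observed event rate of the first bin that covers it."""
    for row in table:
        if score <= row["max_score"]:
            return row["event_rate"]
    return table[-1]["event_rate"]  # above the fitted range: use the top bin


table = binned_calibration(range(1, 9), [0, 0, 0, 1, 0, 1, 1, 1], n_bins=4)
```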

`compute_actual_vs_expected(df: pd.DataFrame, score_col: str, flag_col: str, group_by: str | list[str], calibration_table: pd.DataFrame | None = None, weight_col: str | None = None, confidence: float = 0.95, n_bins_group: int | None = None) -> pd.DataFrame`

Compare observed event rates to expected rates by segment.
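A sketch of the actual-to-expected comparison, assuming the expected rate per loan is already available (for example the score itself when it is a PD, or a rate looked up from a calibration table) and that immature loans are excluded. actual_vs_expected and its output keys are illustrative.

```python
# Illustrative sketch of actual-vs-expected by group; not the cranalytics implementation.
from collections import defaultdict


def actual_vs_expected(rows):
    """rows: (group, expected_rate, flag); flag None (immature) is excluded."""
    acc = defaultdict(lambda: [0, 0.0, 0.0])  # n, actual events, expected events
    for group, expected_rate, flag in rows:
        if flag is None:
            continue
        cell = acc[group]
        cell[0] += 1
        cell[1] += flag
        cell[2] += expected_rate
    return {
        group: {
            "n": n,
            "actual_rate": actual / n,
            "expected_rate": expected / n,
            "a_to_e": actual / expected if expected else float("nan"),
        }
        for group, (n, actual, expected) in acc.items()
    }


result = actual_vs_expected([
    ("2023Q1", 0.10, 1), ("2023Q1", 0.10, 0), ("2023Q1", 0.10, 0), ("2023Q1", 0.10, 0),
    ("2023Q2", 0.05, 0), ("2023Q2", 0.05, 0), ("2023Q2", 0.05, None),
])
```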

`compute_psi(expected: pd.Series, actual: pd.Series, n_bins: int = 10, bin_edges: list[float] | None = None, expected_weights: pd.Series | None = None, actual_weights: pd.Series | None = None) -> tuple[pd.DataFrame, float]`

Compute population stability index between expected and actual distributions.
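PSI sums (actual share - expected share) * ln(actual share / expected share) over bins. This sketch takes explicit cut points, in the spirit of the bin_edges argument, and floors empty bins at a small eps so the logarithm stays finite; that flooring rule and the psi helper are assumptions, not the library's documented behavior.

```python
# Illustrative sketch of PSI over fixed bins; not the cranalytics implementation.
import math
from bisect import bisect_right


def psi(expected, actual, edges, eps=1e-6):
    """Population stability index over fixed bins.

    edges: sorted inner cut points; bin i covers [edges[i-1], edges[i]).
    Returns per-bin (expected_share, actual_share, contribution) and the total.
    """
    def shares(values):
        counts = [0] * (len(edges) + 1)
        for value in values:
            counts[bisect_right(edges, value)] += 1
        # Floor empty bins at eps so log(actual / expected) stays finite.
        return [max(count / len(values), eps) for count in counts]

    exp_share, act_share = shares(expected), shares(actual)
    rows = [(e, a, (a - e) * math.log(a / e)) for e, a in zip(exp_share, act_share)]
    return rows, sum(contribution for _, _, contribution in rows)


_, stable = psi([1, 2, 3, 4], [1, 2, 3, 4], edges=[2.5])
shifted_rows, shifted = psi([1, 2, 3, 4], [3, 4, 5, 6], edges=[2.5])
```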

`score_performance_monitoring_report(df: pd.DataFrame, score_col: str, flag_col: str, group_by: str | list[str] | None = None, baseline_df: pd.DataFrame | None = None, n_bins: int = 10, weight_col: str | None = None) -> dict`

Bundled score monitoring report: discrimination, calibration, A/E, PSI.

Parameters

df : Scored portfolio DataFrame with mature flags.
score_col : Numeric score column (higher = higher risk).
flag_col : Binary 0/1 event flag (NaN = immature).
group_by : Column(s) for the actual-vs-expected breakdown. Omitted if None.
baseline_df : Reference-period DataFrame for PSI. Omitted if None.
n_bins : Score bins for the calibration table.
weight_col : Optional loan-balance weight.

Returns

dict with keys:

"discrimination": dict of auc, gini, ks, spearman_rho
"calibration_table": DataFrame from calibrate_score_to_event_rate
"calibration_stats": summary stats dict
"actual_vs_expected": DataFrame or None (requires group_by)
"psi": {"table": DataFrame, "total_psi": float} or None (requires baseline_df)

`simulate_policy_cutoff(df: pd.DataFrame, flag_col: str, cutoff_col: str, cutoffs: list[float], direction: str = 'min', weight_col: str | None = None) -> pd.DataFrame`

Simulate the volume/loss trade-off across candidate policy cutoffs.

For each cutoff value, computes approval rate, projected event rate, volume lost, and loss reduction relative to no cutoff (all mature loans).

Parameters

df : DataFrame with mature-flag and cutoff columns.
flag_col : Binary 0/1 event flag (NaN = immature, excluded).
cutoff_col : Numeric column to threshold (e.g. fico_score, dti).
cutoffs : Candidate threshold values to evaluate.
direction : "min" approves rows where cutoff_col >= cutoff (FICO floor); "max" approves rows where cutoff_col <= cutoff (DTI ceiling).
weight_col : Optional loan-balance weight.

Returns

DataFrame with columns: cutoff, n_approved, n_total_mature, approval_rate, projected_event_rate, volume_loss_pct, loss_reduction_pct.
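The per-cutoff arithmetic can be sketched as follows. The sketch assumes loss_reduction_pct is the relative drop in event rate versus approving every mature loan (it could instead count events avoided) and returns shares as fractions rather than percentages; simulate_cutoffs and the FICO values are illustrative, and weighting is omitted.

```python
# Illustrative sketch of a policy-cutoff simulation; not the cranalytics implementation.
def simulate_cutoffs(rows, cutoffs, direction="min"):
    """rows: (cutoff_value, flag) pairs; flag None (immature) is excluded.

    direction="min" approves value >= cutoff; "max" approves value <= cutoff.
    Shares are returned as fractions, not percentages.
    """
    if direction not in ("min", "max"):
        raise ValueError("direction must be 'min' or 'max'")
    mature = [(value, flag) for value, flag in rows if flag is not None]
    n_total = len(mature)
    base_rate = sum(flag for _, flag in mature) / n_total
    results = []
    for cutoff in cutoffs:
        approved = [flag for value, flag in mature
                    if (value >= cutoff if direction == "min" else value <= cutoff)]
        n_approved = len(approved)
        rate = sum(approved) / n_approved if n_approved else float("nan")
        results.append({
            "cutoff": cutoff,
            "n_approved": n_approved,
            "n_total_mature": n_total,
            "approval_rate": n_approved / n_total,
            "projected_event_rate": rate,
            "volume_loss_pct": 1 - n_approved / n_total,
            # Relative drop in event rate versus approving every mature loan.
            "loss_reduction_pct": 1 - rate / base_rate if base_rate else float("nan"),
        })
    return results


fico = [(580, 1), (620, 1), (660, 0), (700, 0), (740, 0), (780, None)]
floor = simulate_cutoffs(fico, [600, 640])
```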

Warning (selection bias): this analysis is backward-looking. You only observe performance for loans that were approved; the performance of declined loans is unknown (the reject inference problem). Use it for directional analysis only, not for precise P&L projection.

Predictive Modeling

Use predictive.run() for the end-to-end path: training, scoring, and temporal out-of-time backtesting behind one call, returning a result with .summary() and .plot(). Build the target columns beforehand with build_targets(). Use train_binary_model() and score_model() directly for narrow training and scoring steps, or the forecasting bridge (forecast_calendar_chargeoff_from_predictions()) to turn model scores into calendar-month portfolio projections.

`cranalytics.predictive`

Predictive modeling compatibility re-export surface.

The deep entry point for the predictive workflow is cranalytics.predictive.run (defined in cranalytics.predictive._session), which trains a binary model, scores it, and runs a temporal out-of-time backtest behind one call. Prefer it for end-to-end modeling; it returns a result with .summary() and .plot(). Reach for the narrower predictive.* helpers directly only when you need a single concern such as training, scoring, or backtesting.

This module carries no logic of its own. It only re-exports the focused submodules (targets, modeling, backtest, forecasting bridge, session) so that existing `from cranalytics.predictive import ...` call sites keep working. Add new behavior to the appropriate submodule, not here.

`PredictiveModelingSessionResult` `dataclass`

Bases: _SessionResultMapping

`summary() -> pd.DataFrame`

Backtest metrics aggregated across folds (mean AUC/Gini/KS).

`plot(**kwargs: Any) -> Any`

Gini stability across backtest folds. Requires matplotlib.

`run_predictive_backtest(df: pd.DataFrame, feature_cols: list[str], target_col: str, split_col: str, *, model_family: str, window_type: str = 'expanding', n_splits: int = 5, rolling_window_size: int | None = None, model_params: dict | None = None, random_state: int = 42, strict: bool = False) -> pd.DataFrame`

Rolling or expanding temporal OOT backtest.

Splits df by unique values of split_col (e.g., origination_month) in ascending order. No data shuffling — all splits are strictly temporal.

Parameters

df : pd.DataFrame
feature_cols : list[str]
target_col : str
    Binary target (0/1/NaN). NaN rows are excluded from training and scoring.
split_col : str
    Column defining temporal ordering (e.g., origination_month, vintage).
model_family : str
    Passed to _build_binary_estimator.
window_type : str
    "expanding": all prior periods in train. "rolling": fixed-width window of rolling_window_size prior periods.
n_splits : int
    Number of OOT folds. Requires len(unique_periods) >= n_splits + 1.
rolling_window_size : int, optional
    Number of prior periods in each rolling train window. Only used when window_type="rolling". Defaults to n_splits when not provided.
model_params : dict, optional
random_state : int

Returns

pd.DataFrame One row per fold: split, train_periods, val_periods, n_train, n_val, auc, gini, ks.
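A sketch of the fold construction, assuming each of the last n_splits periods is validated once against the periods before it; whether the library assigns one or several validation periods per fold is not stated. temporal_folds is an illustrative helper that returns period lists only, without fitting anything.

```python
# Illustrative sketch of expanding/rolling temporal folds; not the cranalytics implementation.
def temporal_folds(periods, n_splits=5, window_type="expanding", rolling_window_size=None):
    """(train_periods, val_periods) per fold, strictly in time order."""
    ordered = sorted(set(periods))
    if len(ordered) < n_splits + 1:
        raise ValueError("need at least n_splits + 1 distinct periods")
    if window_type not in ("expanding", "rolling"):
        raise ValueError("window_type must be 'expanding' or 'rolling'")
    window = n_splits if rolling_window_size is None else rolling_window_size
    folds = []
    # Each of the last n_splits periods is validated once, trained only on earlier ones.
    for idx in range(len(ordered) - n_splits, len(ordered)):
        prior = ordered[:idx]
        train = prior if window_type == "expanding" else prior[-window:]
        folds.append((train, [ordered[idx]]))
    return folds


months = ["2023-01", "2023-02", "2023-03", "2023-04", "2023-05", "2023-06"]
expanding = temporal_folds(months, n_splits=5)
rolling = temporal_folds(months, n_splits=5, window_type="rolling", rolling_window_size=2)
```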

`summarize_predictive_backtest(backtest_df: pd.DataFrame, *, by: str | list[str] | None = None) -> pd.DataFrame`

Aggregate backtest metrics across folds.

Parameters

backtest_df : pd.DataFrame
    Output of run_predictive_backtest.
by : str or list[str], optional
    Column(s) to group by (e.g., "model"). If None, aggregate all rows.

Returns

pd.DataFrame One row per group with mean_auc, mean_gini, mean_ks, n_folds.

`forecast_calendar_chargeoff_from_predictions(scored_df: pd.DataFrame, *, score_col: str, as_of_col: str, current_mob_col: str, hazard_curves: pd.DataFrame, principal_col: str = 'original_principal', segment_col: str | None = None, strict: bool = True) -> pd.DataFrame`

Convert loan-level PD predictions to calendar-month charge-off forecasts.

Algorithm

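1. Per loan: expected_co_dollars = score * principal (the principal_col column, "original_principal" by default).
2. Look up the loan's segment hazard curve. If the segment is missing, raise when strict=True, or fall back to the global profile when strict=False.
3. Normalize the future chargeoff_hazard_rate values from current_mob forward into weights.
4. Allocate expected_co_dollars across future MOBs in proportion to those weights.
5. Map each future MOB to a calendar month via the as_of date plus the MOB offset.
6. Aggregate by forecast_month.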

Parameters

scored_df : pd.DataFrame
    Loan-level. Required: score_col, as_of_col, current_mob_col, principal_col. Optional: segment_col.
score_col : str
    PD probability column (values in [0, 1]).
as_of_col : str
    Snapshot date for each loan.
current_mob_col : str
    Current month-on-book for each loan.
hazard_curves : pd.DataFrame
    Output of fit_flow_hazard_curves(). Required columns: segment_id, month_on_book, chargeoff_hazard_rate.
principal_col : str
    Original principal column (default "original_principal").
segment_col : str, optional
    Column for segment lookup in hazard_curves.
strict : bool
    If True, raise on a missing segment. If False, fall back to the global profile.

Returns

pd.DataFrame Columns: forecast_month, expected_chargeoff_amount, loan_count, expected_chargeoff_rate.
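A minimal sketch of the normalize, allocate, map, and aggregate steps for a single hazard curve. It assumes "from current_mob forward" means strictly later MOBs and that a MOB k months ahead lands k calendar months after the snapshot month. Segment lookup and the strict fallback are not shown, and allocate_chargeoffs and add_months are hypothetical helpers.

```python
# Illustrative sketch of hazard-weighted charge-off allocation; not the cranalytics implementation.
from collections import defaultdict


def add_months(year, month, k):
    """Shift a (year, month) pair forward by k calendar months."""
    index = year * 12 + (month - 1) + k
    return index // 12, index % 12 + 1


def allocate_chargeoffs(loans, hazard):
    """Spread each loan's expected charge-off dollars over future calendar months.

    loans: dicts with pd, principal, as_of (year, month) and current_mob.
    hazard: month_on_book -> chargeoff_hazard_rate for a single curve.
    """
    forecast = defaultdict(float)
    for loan in loans:
        expected_dollars = loan["pd"] * loan["principal"]
        future = {mob: rate for mob, rate in hazard.items() if mob > loan["current_mob"]}
        total = sum(future.values())
        if total == 0:
            continue  # no remaining hazard to allocate against
        for mob, rate in future.items():
            month = add_months(*loan["as_of"], mob - loan["current_mob"])
            forecast[month] += expected_dollars * rate / total
    return dict(sorted(forecast.items()))


curve = {1: 0.01, 2: 0.02, 3: 0.03, 4: 0.04}
loan = {"pd": 0.10, "principal": 10_000, "as_of": (2024, 11), "current_mob": 2}
forecast = allocate_chargeoffs([loan], curve)
```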

`score_model(df: pd.DataFrame, estimator: Any, feature_cols: list[str], *, output_col: str, prediction_type: str) -> pd.DataFrame`

Score a fitted estimator against a DataFrame.

Parameters:

Name	Type	Description	Default
`df`	`DataFrame`	DataFrame to score. It is the first argument for `.pipe()` compatibility.	required
`estimator`	`Any`	Fitted scikit-learn-compatible estimator.	required
`feature_cols`	`list[str]`	Columns passed to the estimator.	required
`output_col`	`str`	Name for the appended prediction column.	required
`prediction_type`	`str`	`probability`, `class`, or `value`.	required

Returns:

Type	Description
`DataFrame`	Copy of `df` with `output_col` appended.

Examples:

>>> import pandas as pd
>>> from sklearn.linear_model import LogisticRegression
>>> from cranalytics.predictive import score_model
>>> train = pd.DataFrame({"fico": [600, 650, 750, 800], "bad": [1, 1, 0, 0]})
>>> estimator = LogisticRegression().fit(train[["fico"]].to_numpy(), train["bad"])
>>> scored = score_model(
...     train, estimator, ["fico"], output_col="pd", prediction_type="probability"
... )
>>> bool(scored["pd"].between(0, 1).all())
True

`train_binary_model(df: pd.DataFrame, feature_cols: list[str], target_col: str, *, model_family: str, model_params: dict[str, Any] | None = None, calibrate: bool = False, random_state: int = 42, strict: bool = False) -> tuple`

Train a binary classification model.

Parameters:

Name	Type	Description	Default
`df`	`DataFrame`	Modeling frame.	required
`feature_cols`	`list[str]`	Columns passed to the estimator.	required
`target_col`	`str`	Binary 0/1 target. Null rows are dropped before training.	required
`model_family`	`str`	`logistic`, `hgb_classifier`, or `xgboost_classifier`.	required
`model_params`	`dict[str, Any] \| None`	Optional parameters merged over estimator defaults.	`None`
`calibrate`	`bool`	Reserved calibration switch. Currently raises when enabled.	`False`
`random_state`	`int`	Seed forwarded to supported estimators.	`42`
`strict`	`bool`	Whether predictive contract warnings should fail validation.	`False`

Returns:

Type	Description
`tuple`	Tuple of fitted estimator, metadata dictionary, and diagnostics
`tuple`	DataFrame.

Raises:

Type	Description
`NotImplementedError`	If `calibrate=True`.
`ValueError`	If the modeling frame or family is invalid.

Examples:

>>> import pandas as pd
>>> from cranalytics.predictive import train_binary_model
>>> frame = pd.DataFrame({"fico": [600, 650, 750, 800], "bad": [1, 1, 0, 0]})
>>> estimator, metadata, diagnostics = train_binary_model(
...     frame, ["fico"], "bad", model_family="logistic"
... )
>>> metadata["model_family"]
'logistic'
>>> diagnostics["split"].tolist()
['train']

`train_regression_model(df: pd.DataFrame, feature_cols: list[str], target_col: str, *, model_family: str, model_params: dict[str, Any] | None = None, random_state: int = 42, strict: bool = False) -> tuple`

Train a regression model.

Parameters

model_family : str
    One of "tweedie", "hgb_regressor", "quantile_hgb", or "xgboost_regressor".

Returns

tuple[estimator, dict, pd.DataFrame]

`run(df: pd.DataFrame, feature_cols: list[str], target_col: str, split_col: str, *, model_family: str, model_params: dict | None = None, window_type: str = 'expanding', n_splits: int = 5, rolling_window_size: int | None = None, score_output_col: str = 'prediction', scoring_df: pd.DataFrame | None = None, random_state: int = 42, strict: bool = False) -> PredictiveModelingSessionResult`

Run one end-to-end predictive modeling session.

The session trains a binary model on df, scores either df or scoring_df (when provided), runs a temporal out-of-time backtest on df, and returns a typed session result with fold-level and summarized outputs.

`assemble_modeling_frame(features_df: pd.DataFrame, targets_df: pd.DataFrame, *, on: str | list[str], leakage_guard: bool = True) -> pd.DataFrame`

Join features and targets with optional leakage detection.

Parameters

features_df : pd.DataFrame
targets_df : pd.DataFrame
on : str or list[str]
    Join key(s), typically "loan_id" or ["loan_id", "as_of_date"].
leakage_guard : bool
    If True, raise if any non-key target column appears in features_df.

Returns

pd.DataFrame Inner join of features and targets.
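The leakage_guard check can be sketched with set arithmetic on column names: drop the join keys from the target columns, then fail if anything left also appears among the features. check_leakage is an illustrative helper, and the join itself is omitted.

```python
# Illustrative sketch of a column-name leakage guard; not the cranalytics implementation.
def check_leakage(feature_cols, target_cols, on):
    """Raise if any non-key target column also appears among the feature columns."""
    keys = {on} if isinstance(on, str) else set(on)
    leaked = (set(target_cols) - keys) & set(feature_cols)
    if leaked:
        raise ValueError(f"target columns found in features: {sorted(leaked)}")


# Passes: the only shared column is the join key.
check_leakage(["loan_id", "fico", "dti"], ["loan_id", "fpf30_flag"], on="loan_id")
```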

`build_targets(panel_df: pd.DataFrame, *, mode: str, targets: list[str], target_overrides: dict | None = None) -> pd.DataFrame`

Derive or validate binary/regression targets from a monthly performance panel.

Parameters

panel_df : pd.DataFrame
    Monthly performance panel. Required columns (panel mode): loan_id, mob, as_of_date, origination_date, original_principal, dpd, chargeoff_amount.
mode : str
    "panel": derive targets from dpd/chargeoff columns. "prelabeled": validate and pass through pre-built target columns.
targets : list[str]
    Target column names to derive or validate.
target_overrides : dict, optional
    Per-target config overrides, e.g. {"fpf30_flag": {"dpd_threshold": 60, "mob_horizon": 6}}.

Returns

pd.DataFrame Loan-level DataFrame with loan_id + one column per target. NaN indicates immature (required observation horizon not yet reached).
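A sketch of panel-mode derivation for a DPD-based flag, under stated assumptions: a loan is flagged 1 if any record at or before mob_horizon has dpd >= dpd_threshold, 0 if the horizon was reached without one, and None (immature) while the horizon has not been reached, even if an early delinquency was already observed. The exact definition of targets such as fpf30_flag is not documented here; derive_dpd_flag is hypothetical, and its parameter names simply mirror the keys shown in target_overrides.

```python
# Illustrative sketch of a DPD-based target with maturity masking; not the cranalytics implementation.
def derive_dpd_flag(panel, dpd_threshold, mob_horizon):
    """Loan-level 0/1/None flag from a monthly panel of (loan_id, mob, dpd) rows.

    1    : dpd >= dpd_threshold at some mob <= mob_horizon, horizon reached
    0    : horizon reached without such a record
    None : horizon not yet reached (immature)
    """
    max_mob, hit = {}, set()
    for loan_id, mob, dpd in panel:
        max_mob[loan_id] = max(mob, max_mob.get(loan_id, mob))
        if mob <= mob_horizon and dpd >= dpd_threshold:
            hit.add(loan_id)
    return {
        loan_id: None if last_mob < mob_horizon else int(loan_id in hit)
        for loan_id, last_mob in max_mob.items()
    }


panel = [
    ("L1", 1, 0), ("L1", 2, 35), ("L1", 3, 0),
    ("L2", 1, 0), ("L2", 2, 0), ("L2", 3, 0),
    ("L3", 1, 0), ("L3", 2, 0),
]
flags = derive_dpd_flag(panel, dpd_threshold=30, mob_horizon=3)
```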
