API reference

Generated from docstrings and type hints in the gamfit source. See the topical guides for narrative explanations.

Top-level functions

fit

fit(
    data: Any,
    formula: str,
    *,
    family: str = ...,
    offset: str | None = ...,
    weights: str | None = ...,
    transformation_normal: bool | None = ...,
    survival_likelihood: str | None = ...,
    baseline_target: str | None = ...,
    baseline_scale: float | None = ...,
    baseline_shape: float | None = ...,
    baseline_rate: float | None = ...,
    baseline_makeham: float | None = ...,
    z_column: str | None = ...,
    link: str | None = ...,
    logslope_formula: str | None = ...,
    frailty_kind: str | None = ...,
    frailty_sd: float | None = ...,
    hazard_loading: str | None = ...,
    scale_dimensions: bool | None = ...,
    adaptive_regularization: bool | None = ...,
    firth: bool | None = ...,
    precision_hyperpriors: Any | None = ...,
    response_geometry: None = ...,
    response_columns: list[str] | tuple[str, ...] | None = ...,
    response_coordinates: str | None = ...,
    response_reference: int | None = ...,
    config: dict[str, Any] | None = ...,
) -> Model
fit(
    data: Any,
    formula: str,
    *,
    family: str = ...,
    offset: str | None = ...,
    weights: str | None = ...,
    transformation_normal: bool | None = ...,
    survival_likelihood: str | None = ...,
    baseline_target: str | None = ...,
    baseline_scale: float | None = ...,
    baseline_shape: float | None = ...,
    baseline_rate: float | None = ...,
    baseline_makeham: float | None = ...,
    z_column: str | None = ...,
    link: str | None = ...,
    logslope_formula: str | None = ...,
    frailty_kind: str | None = ...,
    frailty_sd: float | None = ...,
    hazard_loading: str | None = ...,
    scale_dimensions: bool | None = ...,
    adaptive_regularization: bool | None = ...,
    firth: bool | None = ...,
    precision_hyperpriors: Any | None = ...,
    response_geometry: str,
    response_columns: list[str] | tuple[str, ...] | None = ...,
    response_coordinates: str | None = ...,
    response_reference: int | None = ...,
    config: dict[str, Any] | None = ...,
) -> ResponseGeometryModel
fit(
    data: Any,
    formula: str,
    *,
    family: str = "auto",
    offset: str | None = None,
    weights: str | None = None,
    transformation_normal: bool | None = None,
    survival_likelihood: str | None = None,
    baseline_target: str | None = None,
    baseline_scale: float | None = None,
    baseline_shape: float | None = None,
    baseline_rate: float | None = None,
    baseline_makeham: float | None = None,
    z_column: str | None = None,
    link: str | None = None,
    logslope_formula: str | None = None,
    frailty_kind: str | None = None,
    frailty_sd: float | None = None,
    hazard_loading: str | None = None,
    scale_dimensions: bool | None = None,
    adaptive_regularization: bool | None = None,
    firth: bool | None = None,
    precision_hyperpriors: Any | None = None,
    response_geometry: str | None = None,
    response_columns: list[str] | tuple[str, ...] | None = None,
    response_coordinates: str | None = None,
    response_reference: int | None = None,
    config: dict[str, Any] | None = None,
) -> Model | ResponseGeometryModel

Fit a GAM from a formula and a tabular dataset.

Parameters:

Name Type Description Default
data Any

Input table. Accepts a pandas DataFrame, pyarrow Table, dict of columns, list of records, or any object normalize_table understands.

required
formula str

Wilkinson-style formula string (e.g. "y ~ s(x1) + te(x2, x3)").

required
family str

Likelihood family, or "auto" to infer from the response. Corresponds to the --family CLI flag.

'auto'
offset str | None

Name of the offset column. Corresponds to --offset-column.

None
weights str | None

Name of the observation-weight column. Corresponds to --weights-column.

None
transformation_normal bool | None

Fit a conditional transformation-normal model (h(Y|x) ~ N(0,1)). Corresponds to --transformation-normal.

None
survival_likelihood str | None

Survival likelihood formulation. One of "transformation", "weibull", "location-scale", "marginal-slope", "latent", or "latent-binary". Corresponds to --survival-likelihood.

None
baseline_target str | None

Parametric baseline target for survival models. One of "linear", "weibull", "gompertz", "gompertz-makeham". Corresponds to --baseline-target.

None
baseline_scale float | None

Weibull baseline scale (>0) when baseline_target="weibull". Corresponds to --baseline-scale.

None
baseline_shape float | None

Weibull baseline shape (>0). Corresponds to --baseline-shape.

None
baseline_rate float | None

Gompertz hazard rate (>0) when baseline_target is "gompertz" or "gompertz-makeham". Corresponds to --baseline-rate.

None
baseline_makeham float | None

Makeham additive hazard (>0) when baseline_target="gompertz-makeham". Corresponds to --baseline-makeham.

None
z_column str | None

Name of the latent/observed z-score column used by score-warp families and latent transformation models. Corresponds to --z-column.

None
link str | None

Override the default link function. Corresponds to --link.

None
logslope_formula str | None

Secondary formula for the logslope / score-warp submodel. Corresponds to --logslope-formula.

None
frailty_kind str | None

Frailty family for frailty-aware survival models. One of "gaussian-shift" or "hazard-multiplier". Corresponds to --frailty-kind.

None
response_geometry str | None

Optional manifold-valued response geometry. Use "spherical" for unit-sphere responses, or "simplex" / "clr" / "alr" for strictly positive compositional responses. The base point is the intrinsic Fréchet mean of the training responses, not an extrinsic arithmetic mean.

None
response_columns list[str] | tuple[str, ...] | None

Sequence of response component columns used when response_geometry is set. One scalar Gaussian GAM is fitted for each tangent coordinate.

None
response_coordinates str | None

Coordinate chart for simplex responses: "clr" (default) or "alr". Spherical responses always use ambient tangent coordinates.

None
response_reference int | None

Reference component for "alr" coordinates (default: last column).

None
frailty_sd float | None

Fixed frailty standard deviation. Omit to let latent hazard-multiplier models learn it. Corresponds to --frailty-sd.

None
hazard_loading str | None

Hazard loading for frailty_kind="hazard-multiplier". One of "full" or "loaded-vs-unloaded". Corresponds to --hazard-loading.

None
scale_dimensions bool | None

When True, enables learned per-axis anisotropic length scales on spatial smooths (e.g. multi-dim Duchon / Matern / TPS). Per-axis scales are learned, not specified. Corresponds to --scale-dimensions.

None
adaptive_regularization bool | None

Enable exact local adaptive regularization for compatible spatial smooths. Omit to use the quality-first automatic policy, which leaves it off unless explicitly requested.

None
firth bool | None

Enable Firth bias-reduced estimation. Corresponds to --firth.

None
config dict[str, Any] | None

Escape-hatch dict of extra pipeline keys. Any key already set via a dedicated kwarg wins over the same key in config.

None
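The precedence rule for config can be pictured with a small stand-alone sketch. The helper name merge_pipeline_keys is hypothetical and is not part of gamfit, and treating "set" as "not None" is an assumption made for illustration only.

```python
def merge_pipeline_keys(config, **dedicated):
    """Illustrative only: dedicated kwargs override matching config keys."""
    merged = dict(config or {})
    # Assumption: a dedicated kwarg counts as "set" when it is not None.
    merged.update({k: v for k, v in dedicated.items() if v is not None})
    return merged


# "link" is given both ways, so the dedicated kwarg wins;
# "firth" is only in config, so the config value is kept.
resolved = merge_pipeline_keys(
    {"link": "log", "firth": True},
    link="identity",
    firth=None,
)
print(resolved)
```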

Returns:

Type Description
Model | ResponseGeometryModel

A fitted model object with predict, summary, and save/load helpers. When response_geometry is set, the return type is ResponseGeometryModel.

fit_array

fit_array(
    X: Any,
    Y: Any,
    formula: str,
    *,
    family: str = "auto",
    offset: str | None = None,
    weights: str | None = None,
    transformation_normal: bool | None = None,
    survival_likelihood: str | None = None,
    baseline_target: str | None = None,
    baseline_scale: float | None = None,
    baseline_shape: float | None = None,
    baseline_rate: float | None = None,
    baseline_makeham: float | None = None,
    z_column: str | None = None,
    link: str | None = None,
    logslope_formula: str | None = None,
    frailty_kind: str | None = None,
    frailty_sd: float | None = None,
    hazard_loading: str | None = None,
    scale_dimensions: bool | None = None,
    adaptive_regularization: bool | None = None,
    firth: bool | None = None,
    precision_hyperpriors: Any | None = None,
    config: dict[str, Any] | None = None,
) -> Model

Fit directly from numeric NumPy-compatible arrays.

X is named x0, x1, ... at the formula boundary. A one-column Y is named from the formula response; multi-column Y is named y0, y1, ...
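The naming rule can be sketched without gamfit. The helper boundary_names below is hypothetical and only mirrors the rule as stated above; it shows which column names a formula passed to fit_array would need to reference.

```python
def boundary_names(n_x_cols, n_y_cols, response="y"):
    """Illustrative only: column names seen by the formula."""
    x_names = [f"x{i}" for i in range(n_x_cols)]
    # A one-column Y takes the formula's response name; otherwise y0, y1, ...
    if n_y_cols == 1:
        y_names = [response]
    else:
        y_names = [f"y{i}" for i in range(n_y_cols)]
    return x_names, y_names


# For a formula such as "target ~ s(x0) + s(x1)" with a 3-column X:
single = boundary_names(3, 1, response="target")
multi = boundary_names(2, 2)
print(single, multi)
```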

load

load(path: str | Path) -> Model

Load a fitted :class:Model previously written with :meth:Model.save.

Reads the raw bytes from path and dispatches to :func:loads.

Parameters:

Name Type Description Default
path str or Path

Filesystem path to the serialized model file.

required

Returns:

Type Description
Model

Fitted model ready for prediction.

Raises:

Type Description
GamError

If the file cannot be decoded by the Rust engine.

Examples:

>>> model = gamfit.load("model.gam")
>>> model.predict(test_df)

loads

loads(model_bytes: bytes) -> Model

Load a fitted :class:Model from an in-memory bytes payload.

Parameters:

Name Type Description Default
model_bytes bytes

Raw serialized model produced by :meth:Model.save or :meth:Model.saves.

required

Returns:

Type Description
Model

Fitted model ready for prediction.

Raises:

Type Description
GamError

If the payload is malformed or incompatible with the current engine.

Examples:

>>> with open("model.gam", "rb") as fh:
...     model = gamfit.loads(fh.read())

load_posterior

load_posterior(path: str | Path) -> PosteriorSamples

Load a :class:PosteriorSamples archive from disk.

Thin wrapper around :meth:PosteriorSamples.load provided for symmetry with :func:gamfit.load / :func:gamfit.fit at module level.

Parameters:

Name Type Description Default
path str or Path

Filesystem path to an .npz archive previously written by :meth:PosteriorSamples.save.

required

Returns:

Type Description
PosteriorSamples

Reconstructed posterior draws and metadata.

Examples:

>>> draws = gamfit.load_posterior("posterior.npz")
>>> draws.beta.shape
(1000, 42)

competing_risks_cif

competing_risks_cif(
    predictions: Mapping[str, "SurvivalPrediction"]
    | Sequence["SurvivalPrediction"],
    *,
    times: Any,
    endpoint_names: Sequence[str] | None = None,
) -> CompetingRisksCIF

Assemble competing-risks CIFs from cause-specific survival predictions.
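The assembly method is not described here. As background, the sketch below uses the textbook relation CIF_k(t) = ∫ h_k(u) S(u) du, where S is the all-cause survival, evaluated with the trapezoid rule for one subject. The function cif_from_cause_specific and its inputs are illustrative assumptions, not the gamfit implementation or its API.

```python
import numpy as np


def cif_from_cause_specific(times, hazards):
    """Illustrative only. hazards: name -> 1-D cause-specific hazard on `times`.

    Assumes times is increasing and starts at 0.
    """
    times = np.asarray(times, dtype=float)
    dt = np.diff(times)
    total = sum(np.asarray(h, dtype=float) for h in hazards.values())
    # Trapezoid rule for the all-cause cumulative hazard, then overall survival.
    H = np.concatenate([[0.0], np.cumsum(0.5 * (total[1:] + total[:-1]) * dt)])
    S = np.exp(-H)
    cif = {}
    for name, h in hazards.items():
        f = np.asarray(h, dtype=float) * S  # sub-density h_k(t) * S(t)
        cif[name] = np.concatenate([[0.0], np.cumsum(0.5 * (f[1:] + f[:-1]) * dt)])
    return cif


times = np.linspace(0.0, 10.0, 2001)
hazards = {"a": np.full(times.size, 0.1), "b": np.full(times.size, 0.2)}
cif = cif_from_cause_specific(times, hazards)
```

With constant hazards the result can be checked against the closed form (h_k / h_total) * (1 - exp(-h_total * t)).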

cross_fit_shared_precision_groups

cross_fit_shared_precision_groups(
    models: Sequence[Model] | Mapping[str, Model],
    groups: Sequence[SharedPrecisionGroup | Mapping[str, Any]]
    | Mapping[str, Any],
) -> dict[str, dict[str, Any]]

Compute EB precision updates shared across separately fitted models.

For each declared group p, the update is

lambda_p = (N_fits(p) * d_p + 2 * (a_p - 1)) / (sum_q_p + 2 * b_p),

where sum_q_p pools ||beta_p||² + tr(Sigma_pp) over models where the selected term/column/label appears. If a model does not contain the selected block, it is skipped for that group.
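To make the arithmetic concrete, here is a minimal sketch that evaluates the update for one group with made-up numbers. The function name shared_precision_update and its argument names are illustrative and simply follow the symbols in the formula; they are not part of gamfit.

```python
def shared_precision_update(n_fits, d, a, b, sum_q):
    """lambda_p = (N_fits * d_p + 2 * (a_p - 1)) / (sum_q_p + 2 * b_p)."""
    return (n_fits * d + 2.0 * (a - 1.0)) / (sum_q + 2.0 * b)


# Hypothetical pooled terms: one ||beta_p||^2 + tr(Sigma_pp) value per model
# that contains the block. Models without the block are simply left out.
per_model_q = [1.5, 2.5, 2.0]

lam = shared_precision_update(
    n_fits=len(per_model_q),
    d=4,
    a=1.0,
    b=0.0,
    sum_q=sum(per_model_q),
)
print(lam)  # (3 * 4 + 0) / (6.0 + 0) = 2.0
```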

validate_formula

validate_formula(
    data: Any,
    formula: str,
    *,
    family: str = "auto",
    offset: str | None = None,
    weights: str | None = None,
    transformation_normal: bool | None = None,
    survival_likelihood: str | None = None,
    baseline_target: str | None = None,
    baseline_scale: float | None = None,
    baseline_shape: float | None = None,
    baseline_rate: float | None = None,
    baseline_makeham: float | None = None,
    z_column: str | None = None,
    link: str | None = None,
    logslope_formula: str | None = None,
    frailty_kind: str | None = None,
    frailty_sd: float | None = None,
    hazard_loading: str | None = None,
    scale_dimensions: bool | None = None,
    adaptive_regularization: bool | None = None,
    firth: bool | None = None,
    config: dict[str, Any] | None = None,
) -> FormulaValidation

Validate a formula against a dataset without fitting.

Accepts every pipeline kwarg that :func:fit accepts, with identical semantics. See :func:fit for parameter documentation.

build_info

build_info() -> dict[str, Any]

Return build/runtime metadata for the Rust extension.

Reports whether gamfit._rust was importable and, when available, the build-time information exposed by the extension (version, commit, feature flags). Useful for bug reports and for confirming a development build is being used.

Returns:

Type Description
dict

Always contains available (bool) and module (str). When the extension loaded, additional engine-specific keys are merged in; otherwise reason describes why import failed.

Examples:

>>> info = gamfit.build_info()
>>> info["available"]
True

cuda_diagnostics

cuda_diagnostics() -> dict[str, object]

Return CUDA loader diagnostics without forcing Rust GPU dispatch.

format_cuda_diagnostics

format_cuda_diagnostics() -> str

Return CUDA loader diagnostics as stable, grep-friendly text.

explain_error

explain_error(exc: BaseException) -> str

Return a short, actionable hint describing how to recover from exc.

Inspects the exception type and returns a one-line suggestion tailored to the gamfit error hierarchy (:class:FormulaError, :class:SchemaMismatchError, :class:PredictionError, :class:GamError, :class:RustExtensionUnavailableError). Unrecognized exceptions fall back to a generic message.

Parameters:

Name Type Description Default
exc BaseException

The exception caught from a gamfit call.

required

Returns:

Type Description
str

Human-readable remediation hint.

Examples:

>>> try:
...     gamfit.fit(df, "y ~ s(nope)")
... except gamfit.GamError as exc:
...     print(gamfit.explain_error(exc))
Check the formula syntax and confirm every referenced column exists.

Fitted model

Model

Model(*, _model_bytes: bytes, _training_table_kind: str | None = None)

formula property

formula: str

The fitted Wilkinson-style formula string.

family_name property

family_name: str

Human-readable family + link name (e.g. "Gaussian Identity").

model_class property

model_class: str

Fitted model class string (e.g. "standard", "survival marginal-slope").

is_survival property

is_survival: bool

True if this is a survival-family model.

is_marginal_slope property

is_marginal_slope: bool

True if this model was fit with a marginal-slope likelihood.

is_transformation_normal property

is_transformation_normal: bool

True if this is a conditional transformation-normal model.

response_name property

response_name: str | None

Name of the response column, inferred from the formula.

Returns None for survival formulas (Surv(...)) and other cases where the left-hand side isn't a single identifier.

training_table_kind property

training_table_kind: str | None

The kind of table the model was fit on.

One of "pandas", "polars", "pyarrow", "numpy", "mapping" (dict of columns), "records" (list of dicts), "rows" (2-D sequence), or None if the input kind wasn't retained. Used as a default return_type for :meth:predict and :meth:diagnose.

group_metadata property

group_metadata: dict[str, Any] | None

Per-group metadata persisted with the fitted model, if present.

deployment_extensions property

deployment_extensions: tuple[dict[str, Any], ...]

No-refit group extensions applied after fitting.

predict

predict(
    data: Any,
    *,
    interval: float | None = None,
    return_type: str | None = None,
    id_column: str | None = None,
    with_uncertainty: bool = False,
) -> Any

Predict from data.

Default return (when id_column and return_type are both omitted) depends on the fitted model class:

  • Gaussian / Binomial / Standard models: a table (dict, pandas DataFrame, pyarrow Table, ...) matching the training table kind with an eta and mean column (plus interval columns when interval is given).
  • Transformation-normal models: a per-row transformed z-score as a 1-D numpy array of shape (n_samples,).
  • Bernoulli marginal-slope: a calibrated probability vector in (0, 1) as a 1-D numpy array of shape (n_samples,).
  • Survival models: a :class:SurvivalPrediction whose .hazard_at, .survival_at, .failure_at, and .cumulative_hazard_at helpers evaluate the fitted hazard surface on a user-supplied time grid.

Passing id_column or return_type switches the array-returning model classes (transformation-normal and Bernoulli marginal-slope) to the table form: a 2-column table (id_column, "z" or "mean") rather than a bare 1-D array. Naively flattening that table with np.asarray(...) or .to_numpy() yields shape (n_samples, 2), which is a common cause of silent broadcasting bugs in downstream metric code that expects a 1-D probability vector. When you need the probabilities as an array after asking for an id column, extract the column explicitly, e.g. out["mean"] or np.asarray(out["mean"], dtype=float).
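The shape difference is easy to reproduce without a fitted model. In the sketch below a plain dict of columns stands in for the 2-column table, and np.column_stack mimics what flattening a 2-column table produces; the column name row_id and all values are made up.

```python
import numpy as np

# Stand-in for the table returned when id_column is set (hypothetical values).
out = {"row_id": [101, 102, 103], "mean": [0.12, 0.87, 0.45]}

# Flattening both columns keeps the ids: shape (n_samples, 2).
flat = np.column_stack([out["row_id"], out["mean"]])

# Explicit extraction gives the 1-D probability vector metrics expect.
probs = np.asarray(out["mean"], dtype=float)

y_true = np.array([0.0, 1.0, 0.0])
brier = float(np.mean((probs - y_true) ** 2))
print(flat.shape, probs.shape)
```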

with_uncertainty (survival only): when True, the returned :class:SurvivalPrediction also carries delta-method standard errors on the survival surface (survival_se) and the linear predictor (eta_se). Only honored for the location-scale survival likelihood mode; requesting with_uncertainty=True with any other survival likelihood ("transformation", "weibull", "marginal-slope", "latent", "latent-binary") or with competing-risks survival models raises an error.

predict_array

predict_array(X: Any, *, interval: float | None = None) -> Any

Predict directly from a numeric NumPy-compatible feature matrix.

Columns are named x0, x1, ... at the Rust formula boundary. The return value is a dense NumPy array with columns ordered as eta, mean, then any uncertainty columns.

summary

summary() -> Summary

Return the model summary (coefficients, family, deviance, REML score).

Returns:

Type Description
Summary

A dict-like :class:Summary containing the fitted formula, family / link name, model class, deviance, REML or LAML score, iteration count, and the per-coefficient table (estimates, standard errors, credible-interval bounds). The summary is cached on first call.

Examples:

>>> model = gamfit.fit(train, "y ~ s(x)")
>>> s = model.summary()
>>> print(s["family_name"], s["deviance"])
>>> s.coefficients_frame()      # pandas DataFrame, requires pandas

smoothing_parameters

smoothing_parameters() -> dict[int, float]

Return fitted smoothing/precision parameters by penalty index.

check

check(data: Any) -> SchemaCheck

Validate data against the model's training schema.

Inexpensive: runs the schema validator only, no prediction. Use this before :meth:predict to surface column-name or type issues as structured :class:SchemaIssue records rather than as a raised :class:SchemaMismatchError.

Parameters:

Name Type Description Default
data Any

Any table-like input (pandas DataFrame, dict of columns, list of records, numpy array, etc.).

required

Returns:

Type Description
SchemaCheck

check.ok is True when the data matches the training schema; otherwise check.issues enumerates the problems. check.raise_for_error() raises ValueError on failure.

Examples:

>>> check = model.check(test_df)
>>> if not check:
...     for issue in check.issues:
...         print(issue.kind, issue.column, issue.message)

report

report(path: str | Path | None = None) -> str

Generate a standalone HTML report of the fitted model.

The report contains the summary table, smooth-term visualisations, and convergence diagnostics. It is self-contained (no external assets), so the file can be emailed or attached to a PR.

Parameters:

Name Type Description Default
path str | Path | None

If given, write the HTML to this path and return the path. If None (default), return the HTML as a string.

None

Returns:

Type Description
str

HTML string (when path is None) or the written path.

Examples:

>>> model.report("report.html")
>>> html = model.report()             # for inline Jupyter display

sample

sample(
    data: Any,
    *,
    samples: int | None = None,
    warmup: int | None = None,
    chains: int | None = None,
    target_accept: float | None = None,
    seed: int | None = None,
) -> PosteriorSamples

Draw from the model's posterior with NUTS.

Returns a :class:PosteriorSamples object carrying the raw (n_draws, n_coeffs) numpy matrix, per-coefficient mean / std / credible intervals, and convergence diagnostics (rhat, ess, converged).

Defaults are dimension-aware — leaving every keyword unset gives you a chain count, warmup length, and total sample budget tuned to the fitted coefficient size (see :func:gam::hmc::NutsConfig::for_dimension on the Rust side). That heuristic already covers most usage; the keywords are there for power users who want a longer run, a different acceptance target, or a fixed seed for reproducibility.

Parameters:

Name Type Description Default
data Any

Table-like input matching the model's training schema; the same input formats accepted by :meth:predict are supported here. For survival models, the entry/exit/event columns are consumed in addition to covariates.

required
samples int | None

Posterior draws per chain after warmup. When omitted, chosen automatically from the coefficient count.

None
warmup int | None

Warmup iterations per chain (defaults to samples when both are left unset, otherwise to the adaptive default).

None
chains int | None

Number of independent chains. Defaults adaptively to 2 or 4.

None
target_accept float | None

Target HMC acceptance rate in (0, 1). Higher values give smaller leapfrog steps and slower-but-more-robust mixing.

None
seed int | None

RNG seed for deterministic chain initialisation.

None

Notes:

Sampling currently supports standard GLM family models (Gaussian, Binomial logit/probit/cloglog, Poisson, Gamma — with or without a link-wiggle component) and survival likelihood modes other than the latent and location-scale variants. Unsupported model classes raise :class:gamfit.GamError with a message mirroring the CLI's gam sample behaviour.

sample_paired

sample_paired(
    competing: "Model",
    data: Any,
    competing_data: Any | None = None,
    *,
    samples: int | None = None,
    warmup: int | None = None,
    chains: int | None = None,
    target_accept: float | None = None,
    seed: int | None = None,
) -> PairedPosteriorSamples

Draw this fit and a linked competing fit with paired draw indices.

design_matrix

design_matrix(data: Any) -> Any

Materialised design matrix for data against the saved model.

Returns an (n_rows, n_coeffs) numpy array — exactly the matrix the engine uses internally for linear-predictor evaluation. Useful for custom posterior reasoning (e.g. feeding draws into your own predictive routine) or for debugging term layouts.

Currently restricted to standard non-link-wiggle GAM models; other classes raise a clear error pointing at :meth:Model.predict for the class-specific prediction path.

design_matrix_array

design_matrix_array(X: Any) -> Any

Materialised design matrix for a numeric feature matrix.

predict_with_coverage

predict_with_coverage(
    rows: Any, *, coverage: float = 0.95
) -> tuple[Any, Any, Any, dict[str, Any]]

Predict with covariance-based confidence intervals and group attribution.

Returns (point, lower, upper, per_group_variance_contributions). The first three entries are numpy arrays on the response-mean scale. The fourth entry is a covariance-block variance decomposition: per-group arrays contain x_g' Cov(beta_g, beta_g) x_g and cross-term arrays contain 2 x_g' Cov(beta_g, beta_h) x_h.
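The decomposition can be checked numerically: the within-group terms plus the cross terms add up to the full quadratic form x' Cov(beta) x. The sketch below uses a random covariance and two made-up coefficient groups; the group names and index layout are assumptions for illustration, not the structure gamfit returns.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
cov = A @ A.T                      # symmetric PSD stand-in for Cov(beta)
x = rng.normal(size=5)             # one design row
groups = {"g": [0, 1], "h": [2, 3, 4]}  # hypothetical coefficient blocks


def block(a, b):
    """x_a' Cov(beta_a, beta_b) x_b for two named groups."""
    ia, ib = groups[a], groups[b]
    return float(x[ia] @ cov[np.ix_(ia, ib)] @ x[ib])


within = {name: block(name, name) for name in groups}
cross = {("g", "h"): 2.0 * block("g", "h")}
total = sum(within.values()) + sum(cross.values())
```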

difference_smooth

difference_smooth(
    *,
    view: str,
    group: str | None = None,
    pairs: Sequence[tuple[Any, Any]] | None = None,
    n: int = 100,
    level: float = 0.95,
    simultaneous: bool = False,
    n_sim: int = 10000,
    seed: int | None = 12345,
    marginalise_random: bool = True,
    group_means: bool = True,
    data: Any | None = None,
    return_type: str | None = None,
) -> Any

Covariance-aware pairwise difference smooths.

Builds two model matrices on a grid, subtracts them, and uses the fitted joint coefficient covariance for pointwise bands. With simultaneous=True the band critical value is estimated from posterior coefficient simulation using the max standardized deviation across the whole grid.
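A generic version of the simultaneous critical value can be sketched as follows. This is not the gamfit implementation: D stands in for the difference of the two model matrices, cov for the joint coefficient covariance, and both are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(12345)
n_grid, n_coef, n_sim = 50, 6, 10000

# Placeholders: difference of two model matrices and coefficient covariance.
D = rng.normal(size=(n_grid, n_coef))
A = rng.normal(size=(n_coef, n_coef))
cov = A @ A.T / n_coef

# Pointwise standard error of the difference at each grid point.
se = np.sqrt(np.einsum("ij,jk,ik->i", D, cov, D))

# Simulate coefficient deviations and take the max standardized deviation
# across the whole grid for each draw.
draws = rng.multivariate_normal(np.zeros(n_coef), cov, size=n_sim)
max_dev = np.max(np.abs(draws @ D.T) / se, axis=1)

# The simultaneous critical value replaces the pointwise normal quantile.
crit = float(np.quantile(max_dev, 0.95))
```

Because the maximum is taken over the grid, crit is larger than the pointwise value of about 1.96, so the simultaneous band is wider.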

save

save(path: str | Path) -> None

Serialise the fitted model to path.

Writes a self-contained binary .gam file that :func:gamfit.load round-trips.

Examples:

>>> model.save("model.gam")
>>> loaded = gamfit.load("model.gam")

extend_with_group

extend_with_group(
    new_group_spec: dict[str, Any],
    metadata: Any | None = None,
    prior: Any | None = None,
) -> "Model"

Return a no-refit model extended with deployment-time group levels.

new_group_spec currently targets an existing random-effect term: {"kind": "random-effect-level", "term": "group_term", "level": "new"} or {"term": "group_term", "levels": ["a", "b"]}. The returned model reuses the fitted coefficients and inserts zero-initialized coefficients, or prior["mean"] / prior["mu"] when supplied.

dumps

dumps() -> bytes

Return the serialised model as raw bytes.

Useful for in-memory transport. :func:gamfit.loads is the inverse.

Examples:

>>> blob = model.dumps()
>>> loaded = gamfit.loads(blob)

diagnose

diagnose(
    data: Any, *, y: str | None = None, interval: float | None = 0.95
) -> Diagnostics

Score the fitted model on held-out data.

Calls :meth:predict on the feature columns of data and compares the result against the observed response, packaging the prediction, residuals, observed values, and (when requested) Wald bands into a :class:Diagnostics object. Useful for ad-hoc held-out checks and for feeding the :meth:plot method.

Parameters:

Name Type Description Default
data table-like

Any table-like input accepted by :meth:predict that also carries the response column.

required
y str

Name of the response column. Defaults to :attr:response_name; required when that cannot be inferred (e.g. survival formulas).

None
interval float or None

Pointwise Wald-interval probability passed through to :meth:predict. Set to None to skip interval columns. Defaults to 0.95.

0.95

Returns:

Type Description
Diagnostics

A :class:Diagnostics record containing the formula, response name, observed values, the predicted table, and residuals.

Raises:

Type Description
ValueError

If the response column cannot be inferred or is missing from data.

Examples:

>>> diag = model.diagnose(test_df)
>>> diag.rmse, diag.r_squared
(0.42, 0.81)
>>> diag.predicted["mean"][:3]
[1.04, 1.21, 0.99]

See Also:

Model.predict, Model.plot

plot

plot(
    data: Any,
    *,
    x: str | None = None,
    y: str | None = None,
    interval: float | None = 0.95,
    kind: str = "prediction",
    ax: Any | None = None,
) -> Any

Plot the model's behaviour on data with matplotlib.

Runs :meth:diagnose against data and then renders one of three standard diagnostic plots onto a matplotlib Axes.

Parameters:

Name Type Description Default
data table-like

Held-out data with the response column present (same requirements as :meth:diagnose).

required
x str

Feature column to plot on the x-axis when kind="prediction". Inferred automatically when there is exactly one non-response feature column.

None
y str

Response column name. Defaults to :attr:response_name.

None
interval float or None

Pointwise Wald-interval probability for the shaded band on prediction plots. Ignored for residuals and observed_vs_predicted plots. Defaults to 0.95.

0.95
kind ('prediction', 'residuals', 'observed_vs_predicted')

Which diagnostic plot to draw:

  • "prediction" (default): mean curve over x with a pointwise Wald band and observed scatter overlay.
  • "residuals": residuals vs predicted mean.
  • "observed_vs_predicted": observed vs predicted with a reference y = x line.

"prediction"
ax Axes

Existing axes to draw onto. When omitted, a fresh Axes is created via plt.subplots().

None

Returns:

Type Description
Axes

The axes that were drawn on.

Raises:

Type Description
ValueError

If kind is not one of the supported choices, or if x cannot be inferred for a multi-feature prediction plot, or if the named x column is missing from data.

Examples:

>>> model.plot(test_df)                       # prediction with band
>>> model.plot(test_df, kind="residuals")
>>> ax = model.plot(test_df, kind="observed_vs_predicted")
>>> ax.set_title("Calibration on held-out fold")

See Also:

Model.diagnose, Model.predict

SurvivalPrediction dataclass

SurvivalPrediction(
    model_class: str,
    parameters: Any,
    parameter_names: Sequence[str] = tuple(),
    times: Any | None = None,
    hazard: Any | None = None,
    survival: Any | None = None,
    cumulative_hazard: Any | None = None,
    linear_predictor: Any | None = None,
    id_column: str | None = None,
    row_ids: Sequence[str] | None = None,
    survival_se: Any | None = None,
    eta_se: Any | None = None,
)

Per-row survival functions evaluated on demand.

Returned by :meth:Model.predict for survival-family models. The *_at helpers (:meth:hazard_at, :meth:cumulative_hazard_at, :meth:survival_at, :meth:failure_at) evaluate the fitted hazard surface at any user-supplied time grid.

When the FFI produced a dense (n_samples, n_times) grid of hazard / survival / cumulative-hazard values, the *_at helpers linearly interpolate against that grid. Otherwise they fall back to the legacy plug-in piecewise-constant hazard reconstructed from parameters so bare-dataclass construction keeps working.

For very large queries (n_rows * n_times exceeds roughly one million cells), the *_at helpers internally evaluate the surface in blocks via the matching *_at_chunks generator and then assemble the dense result; callers that want to avoid the dense allocation can iterate the chunk generators directly or stream a CSV with :meth:write_survival_at_csv.
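The block-wise evaluation pattern looks roughly like the sketch below. The generator eval_in_chunks, the toy evaluate function, and the max_cells argument are all illustrative; the real *_at_chunks generators may yield a different structure.

```python
import numpy as np


def eval_in_chunks(evaluate, n_rows, times, max_cells=1_000_000):
    """Illustrative only: yield (start, stop, block) so no block exceeds max_cells."""
    times = np.asarray(times, dtype=float)
    rows_per_block = max(1, max_cells // max(1, times.size))
    for start in range(0, n_rows, rows_per_block):
        stop = min(start + rows_per_block, n_rows)
        yield start, stop, evaluate(start, stop, times)


# Toy surface: exponential survival with one made-up rate per row.
rates = np.linspace(0.1, 1.0, 10)


def evaluate(start, stop, times):
    return np.exp(-rates[start:stop, None] * times[None, :])


times = np.linspace(0.0, 5.0, 4)
blocks = list(eval_in_chunks(evaluate, len(rates), times, max_cells=12))

# Assembling the blocks reproduces the dense result; a streaming caller
# would process each block and discard it instead.
dense = np.vstack([block for _, _, block in blocks])
```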

Attributes:

Name Type Description
model_class str

The fitted model class string (e.g. "survival marginal-slope").

parameters ndarray

Flat per-row parameters returned by the FFI. Shape (n_samples, n_params_per_row). The exact column semantics depend on model_class; callers should treat this as opaque and prefer the *_at helpers.

parameter_names tuple of str

Column names corresponding to parameters, in order.

times ndarray or None

Shared 1-D time grid at which the hazard surfaces were evaluated.

hazard ndarray or None

(n_samples, len(times)) dense hazard surface from the FFI.

survival ndarray or None

(n_samples, len(times)) dense survival surface from the FFI.

cumulative_hazard ndarray or None

(n_samples, len(times)) dense cumulative-hazard surface from the FFI.

linear_predictor ndarray or None

(n_samples,) per-row linear predictor at each row's own exit time.

id_column str or None

Optional name of the id column carried through from :meth:Model.predict for use by :meth:write_survival_at_csv.

row_ids sequence of str or None

Per-row identifiers aligned with parameters rows, populated when id_column was supplied to :meth:Model.predict.

survival_se ndarray or None

(n_samples, len(times)) delta-method standard errors on the survival surface (response scale). None unless the prediction was issued with with_uncertainty=True; then populated for location-scale survival models.

eta_se ndarray or None

(n_samples,) delta-method SE on the linear predictor at each row's own exit time, under the same conditions as survival_se.

Examples:

>>> import numpy as np
>>> pred = model.predict(test_df)        # survival model
>>> times = np.linspace(0.0, 10.0, 50)
>>> S = pred.survival_at(times)          # (n_rows, 50) ndarray
>>> h = pred.hazard_at(times)
>>> H = pred.cumulative_hazard_at(times)

See Also:

Model.predict : Returns a :class:SurvivalPrediction for survival models.

hazard_at

hazard_at(times: Any) -> Any

Evaluate the hazard rate h(t) at each requested time.

When the FFI produced a dense hazard surface this linearly interpolates against the returned grid; otherwise the hazard is reconstructed from the cumulative-hazard differences. Large requests are evaluated in chunks internally before assembling the dense result.

Parameters:

Name Type Description Default
times array_like

1-D sequence of finite, non-negative times at which to evaluate the per-row hazard.

required

Returns:

Type Description
ndarray

(n_samples, len(times)) array of non-negative hazard values, one row per prediction sample.

Examples:

>>> import numpy as np
>>> pred = model.predict(test_df)
>>> h = pred.hazard_at(np.linspace(0.0, 5.0, 11))
>>> h.shape == (len(test_df), 11)
True

See Also

SurvivalPrediction.hazard_at_chunks : streaming chunked variant.
SurvivalPrediction.cumulative_hazard_at
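
The fallback path described for hazard_at, reconstructing the hazard from cumulative-hazard differences, can be sketched in plain Python for a single row. This is only an illustration under stated assumptions: the helper name `hazard_from_cumulative`, the forward-difference convention, and the reuse of the last slope at the final grid point are all hypothetical and may differ from what gamfit does internally.

```python
def hazard_from_cumulative(times, cumulative_hazard):
    """Approximate h(t) by forward differences of H(t) on one row.

    Hypothetical helper for illustration; gamfit may use a different
    differencing scheme or boundary treatment.
    """
    if len(times) != len(cumulative_hazard):
        raise ValueError("times and cumulative_hazard must have equal length")
    if len(times) < 2:
        raise ValueError("need at least two grid points to difference")
    hazard = []
    for i in range(len(times) - 1):
        dt = times[i + 1] - times[i]
        dH = cumulative_hazard[i + 1] - cumulative_hazard[i]
        # Clip at zero so rounding noise cannot produce a negative hazard.
        hazard.append(max(dH / dt, 0.0))
    # Reuse the last slope so the output has one value per grid point.
    hazard.append(hazard[-1])
    return hazard


# A constant hazard of 0.5 implies H(t) = 0.5 * t.
grid = [0.0, 1.0, 2.0, 4.0]
H = [0.5 * t for t in grid]
h = hazard_from_cumulative(grid, H)
```

On this constant-hazard input every entry of `h` recovers the rate 0.5, which is a quick sanity check for any differencing scheme.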

cumulative_hazard_at

cumulative_hazard_at(times: Any) -> Any

Evaluate the cumulative hazard H(t) = -log S(t).

When the FFI provided a dense cumulative-hazard surface this interpolates against it directly; otherwise H(t) is derived from :meth:survival_at via -log S(t) (clipped away from zero for numerical safety).

Parameters:

Name Type Description Default
times array_like

1-D sequence of finite, non-negative times.

required

Returns:

Type Description
ndarray

(n_samples, len(times)) array of non-negative cumulative hazard values.

Examples:

>>> import numpy as np
>>> H = pred.cumulative_hazard_at(np.array([1.0, 2.0, 5.0]))
>>> np.all(np.diff(H, axis=1) >= 0)   # monotone non-decreasing
True

See Also

SurvivalPrediction.survival_at
SurvivalPrediction.hazard_at
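
The identity H(t) = -log S(t) and the clipping mentioned for cumulative_hazard_at can be sketched as follows. The function name and the `floor` threshold of 1e-12 are assumptions chosen for illustration, not documented gamfit values.

```python
import math


def cumulative_hazard_from_survival(survival, floor=1e-12):
    """Map survival probabilities to H(t) = -log S(t) for one row.

    `floor` is an assumed clipping threshold that keeps log() finite
    when S(t) reaches zero.
    """
    result = []
    for s in survival:
        clipped = min(max(s, floor), 1.0)
        # max(0.0, ...) also normalises the -0.0 produced by -log(1.0).
        result.append(max(0.0, -math.log(clipped)))
    return result


S_row = [1.0, 0.5, 0.0]
H_row = cumulative_hazard_from_survival(S_row)
```

Without the clip, the last entry would raise a math domain error; with it, the cumulative hazard stays finite and non-decreasing.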

survival_at

survival_at(times: Any) -> Any

Evaluate the survival probability S(t) at each requested time.

When the FFI produced a dense hazard/survival surface this linearly interpolates against the returned grid. Otherwise it falls back to the plug-in identity S(t) = exp(-H(t)) using a per-row piecewise-constant hazard derived from parameters, which keeps the method usable when the dataclass is constructed directly without FFI surfaces. Large requests are evaluated in chunks internally before assembling the dense result.

Parameters:

Name Type Description Default
times array_like

1-D sequence of finite, non-negative times.

required

Returns:

Type Description
ndarray

(n_samples, len(times)) array of survival probabilities in [0, 1].

Examples:

>>> import numpy as np
>>> times = np.linspace(0.0, 5.0, 6)
>>> S = pred.survival_at(times)
>>> S[:, 0]                  # S(0) is 1 for every row
array([1., 1., ..., 1.])

See Also

SurvivalPrediction.failure_at : returns 1 - S(t).
SurvivalPrediction.survival_se_at : delta-method standard error.
SurvivalPrediction.survival_at_chunks : streaming chunked variant.
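
The plug-in identity S(t) = exp(-H(t)) with a piecewise-constant hazard can be sketched for one row as below. How gamfit maps parameters to interval rates is not specified here, so the `breakpoints` and `rates` arguments are hypothetical inputs used only to show the integration step.

```python
import math


def survival_piecewise_constant(t, breakpoints, rates):
    """S(t) = exp(-H(t)) for a hazard that is constant on each interval.

    `breakpoints` are increasing interval start times (the first is 0.0)
    and `rates[i]` is the hazard on [breakpoints[i], breakpoints[i + 1]);
    the last rate extends to infinity. Both names are hypothetical.
    """
    if t < 0.0:
        raise ValueError("t must be non-negative")
    cumulative = 0.0
    for i, rate in enumerate(rates):
        start = breakpoints[i]
        end = breakpoints[i + 1] if i + 1 < len(breakpoints) else math.inf
        if t <= start:
            break
        # Add the hazard accumulated over the part of this interval before t.
        cumulative += rate * (min(t, end) - start)
    return math.exp(-cumulative)


breakpoints = [0.0, 1.0, 3.0]
rates = [0.2, 0.5, 0.1]
curve = [
    survival_piecewise_constant(t, breakpoints, rates)
    for t in (0.0, 1.0, 2.0, 4.0)
]
```

The resulting curve starts at 1 and decreases monotonically, matching the documented range of survival_at.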

failure_at

failure_at(times: Any) -> Any

Evaluate the failure (event) probability F(t) = 1 - S(t).

Convenience wrapper around :meth:survival_at; the output is clipped to [0, 1] to guard against tiny interpolation excursions.

Parameters:

Name Type Description Default
times array_like

1-D sequence of finite, non-negative times.

required

Returns:

Type Description
ndarray

(n_samples, len(times)) array of failure probabilities in [0, 1].

Examples:

>>> F = pred.failure_at([1.0, 5.0, 10.0])
>>> F.shape[1]
3

See Also

SurvivalPrediction.survival_at

survival_se_at

survival_se_at(times: Any) -> Any

Delta-method standard error on S(t) at each requested time.

Returns None when the prediction was not issued with with_uncertainty=True (or the model class does not yet support response-scale uncertainty). When available, the returned array has shape (n_samples, len(times)) and is clipped to be non-negative.

Parameters:

Name Type Description Default
times array_like

1-D sequence of finite, non-negative times.

required

Returns:

Type Description
ndarray or None

(n_samples, len(times)) array of standard errors on the survival surface, or None if no uncertainty was requested.

Notes

Pair with :meth:survival_at for response-scale Wald-style bands: S +/- z * SE with the standard caveats around the Gaussian approximation near the [0, 1] boundaries.

Examples:

>>> pred = model.predict(test_df, with_uncertainty=True)
>>> S = pred.survival_at([1.0, 2.0])
>>> SE = pred.survival_se_at([1.0, 2.0])
>>> lower = (S - 1.96 * SE).clip(0.0, 1.0)

See Also

SurvivalPrediction.survival_at
Model.predict : pass with_uncertainty=True to populate this.
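
The Wald-style band S +/- z * SE from the Notes can be sketched for an arbitrary coverage level using only the standard library. The helper `wald_band` is hypothetical; it takes plain lists standing in for one row of survival_at and survival_se_at output.

```python
from statistics import NormalDist


def wald_band(survival, standard_error, level=0.95):
    """Return (lower, upper) lists for S +/- z * SE, clipped to [0, 1]."""
    if not 0.0 < level < 1.0:
        raise ValueError("level must be strictly between 0 and 1")
    # Two-sided critical value, about 1.96 for level=0.95.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    lower, upper = [], []
    for s, se in zip(survival, standard_error):
        lower.append(min(max(s - z * se, 0.0), 1.0))
        upper.append(min(max(s + z * se, 0.0), 1.0))
    return lower, upper


lower, upper = wald_band([0.98, 0.60], [0.03, 0.05])
```

The first point shows the boundary caveat in action: the unclipped upper limit exceeds 1 and is truncated, so the band near S = 1 is no longer symmetric.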

survival_at_chunks

survival_at_chunks(
    times: Any,
    *,
    people_chunk: int = DEFAULT_SURVIVAL_PEOPLE_CHUNK,
    time_grid_chunk: int = DEFAULT_SURVIVAL_TIME_GRID_CHUNK,
) -> Any

Yield S(t) evaluations in row/time blocks.

Streaming counterpart to :meth:survival_at for queries large enough that the dense (n_samples, len(times)) allocation is unwelcome. Each yielded block can be consumed (written to disk, reduced, fed into a metric) and discarded before the next one is produced.

Parameters:

Name Type Description Default
times array_like

1-D sequence of finite, non-negative times.

required
people_chunk int

Maximum number of rows per yielded block. Defaults to DEFAULT_SURVIVAL_PEOPLE_CHUNK (50 000).

DEFAULT_SURVIVAL_PEOPLE_CHUNK
time_grid_chunk int

Maximum number of time points per yielded block. Defaults to DEFAULT_SURVIVAL_TIME_GRID_CHUNK (64).

DEFAULT_SURVIVAL_TIME_GRID_CHUNK

Yields:

Type Description
tuple of (slice, slice, ndarray)

(row_slice, time_slice, block) where block has shape (row_slice.stop - row_slice.start, time_slice.stop - time_slice.start) and the slices index into the full (n_samples, len(times)) result.

Examples:

>>> import numpy as np
>>> times = np.linspace(0.0, 10.0, 200)
>>> total = 0.0
>>> for _r, _t, block in pred.survival_at_chunks(times):
...     total += float(block.sum())

See Also

SurvivalPrediction.survival_at
SurvivalPrediction.write_survival_at_csv
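
The (row_slice, time_slice, block) contract means the yielded slices tile the full (n_samples, len(times)) result. A minimal sketch of such a tiling is shown below. The generator `iter_blocks` is hypothetical, and the iteration order (rows outer, times inner) is an assumption that may not match gamfit.

```python
def iter_blocks(n_rows, n_times, people_chunk, time_grid_chunk):
    """Yield (row_slice, time_slice) pairs that tile an n_rows x n_times grid.

    Hypothetical helper; the ordering (rows outer, times inner) is an
    assumption and may differ from gamfit's.
    """
    if people_chunk < 1 or time_grid_chunk < 1:
        raise ValueError("chunk sizes must be positive")
    for r0 in range(0, n_rows, people_chunk):
        row_slice = slice(r0, min(r0 + people_chunk, n_rows))
        for t0 in range(0, n_times, time_grid_chunk):
            yield row_slice, slice(t0, min(t0 + time_grid_chunk, n_times))


# Reassemble a dense 5 x 7 result from blocks of at most 2 rows x 3 times.
dense = [[None] * 7 for _ in range(5)]
n_blocks = 0
for rows, cols in iter_blocks(5, 7, people_chunk=2, time_grid_chunk=3):
    n_blocks += 1
    for i in range(rows.start, rows.stop):
        for j in range(cols.start, cols.stop):
            dense[i][j] = (i, j)  # stand-in for the block's value at (i, j)
```

Because every cell is covered exactly once, a consumer can write each block into a preallocated array, or reduce it and discard it, without tracking offsets separately.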

cumulative_hazard_at_chunks

cumulative_hazard_at_chunks(
    times: Any,
    *,
    people_chunk: int = DEFAULT_SURVIVAL_PEOPLE_CHUNK,
    time_grid_chunk: int = DEFAULT_SURVIVAL_TIME_GRID_CHUNK,
) -> Any

Yield H(t) evaluations in row/time blocks.

Streaming counterpart to :meth:cumulative_hazard_at. When the FFI provided a dense cumulative-hazard surface this iterates that surface directly; otherwise it derives H(t) from each survival block returned by :meth:survival_at_chunks.

Parameters:

Name Type Description Default
times array_like

1-D sequence of finite, non-negative times.

required
people_chunk int

Maximum number of rows per yielded block. Defaults to DEFAULT_SURVIVAL_PEOPLE_CHUNK.

DEFAULT_SURVIVAL_PEOPLE_CHUNK
time_grid_chunk int

Maximum number of time points per yielded block. Defaults to DEFAULT_SURVIVAL_TIME_GRID_CHUNK.

DEFAULT_SURVIVAL_TIME_GRID_CHUNK

Yields:

Type Description
tuple of (slice, slice, ndarray)

(row_slice, time_slice, block) of cumulative-hazard values with shape matching the slice extents.

Examples:

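>>> times = [1.0, 2.0, 5.0]
>>> with open("cumulative_hazard.bin", "wb") as handle:  # illustrative path
...     for _r, _t, H_block in pred.cumulative_hazard_at_chunks(times):
...         _ = handle.write(H_block.tobytes())

See Also

SurvivalPrediction.cumulative_hazard_at
SurvivalPrediction.survival_at_chunks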

hazard_at_chunks

hazard_at_chunks(
    times: Any,
    *,
    people_chunk: int = DEFAULT_SURVIVAL_PEOPLE_CHUNK,
    time_grid_chunk: int = DEFAULT_SURVIVAL_TIME_GRID_CHUNK,
) -> Any

Yield h(t) evaluations in row/time blocks.

Streaming counterpart to :meth:hazard_at. When the FFI provided a dense hazard surface this iterates that surface directly; otherwise the hazard is derived from successive cumulative-hazard blocks, carrying the previous block's tail forward so the finite-difference at each block boundary stays consistent with the non-chunked :meth:hazard_at result.

Parameters:

Name Type Description Default
times array_like

1-D sequence of finite, non-negative times.

required
people_chunk int

Maximum number of rows per yielded block. Defaults to DEFAULT_SURVIVAL_PEOPLE_CHUNK.

DEFAULT_SURVIVAL_PEOPLE_CHUNK
time_grid_chunk int

Maximum number of time points per yielded block. Defaults to DEFAULT_SURVIVAL_TIME_GRID_CHUNK.

DEFAULT_SURVIVAL_TIME_GRID_CHUNK

Yields:

Type Description
tuple of (slice, slice, ndarray)

(row_slice, time_slice, block) of non-negative hazard values with shape matching the slice extents.

Examples:

>>> times = [1.0, 2.0, 5.0]
>>> peak = 0.0
>>> for _r, _t, h_block in pred.hazard_at_chunks(times):
...     peak = max(peak, float(h_block.max()))

See Also

SurvivalPrediction.hazard_at
SurvivalPrediction.cumulative_hazard_at_chunks
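
Carrying the previous block's tail forward is what keeps chunked differencing identical to the dense computation. The sketch below shows the idea on one row of cumulative-hazard values. It assumes, for illustration only, that the first entry is differenced against 0 and that time steps are uniform; neither helper is part of gamfit.

```python
def diff_dense(values):
    """Backward differences with the first entry differenced against 0."""
    out, previous = [], 0.0
    for v in values:
        out.append(v - previous)
        previous = v
    return out


def diff_chunked(values, chunk):
    """Same differences, computed chunk by chunk with a carried tail."""
    out, tail = [], 0.0  # `tail` is the last value of the previous chunk
    for start in range(0, len(values), chunk):
        block = values[start:start + chunk]
        previous = tail
        for v in block:
            out.append(v - previous)
            previous = v
        # Remember the block's last value for the next boundary difference.
        tail = block[-1]
    return out


H = [0.1, 0.3, 0.6, 1.0, 1.5, 2.1, 2.8]
dense_result = diff_dense(H)
chunked_result = diff_chunked(H, 3)
```

If `tail` were reset to 0 at each block, the first difference of every block after the first would be wrong, which is the boundary inconsistency the carried tail avoids.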

write_survival_at_csv

write_survival_at_csv(
    path: str | Path,
    times: Any,
    *,
    people_chunk: int = DEFAULT_SURVIVAL_PEOPLE_CHUNK,
    time_grid_chunk: int = DEFAULT_SURVIVAL_TIME_GRID_CHUNK,
) -> str

Stream survival predictions to a CSV file.

Iterates :meth:survival_at_chunks and writes one row per (prediction_row, time) pair, avoiding materialising the full (n_samples, len(times)) matrix in memory. When the prediction was issued with an id_column (via :meth:Model.predict), that column is included.

Parameters:

Name Type Description Default
path str or Path

Destination CSV file. Overwritten if it already exists.

required
times array_like

1-D sequence of finite, non-negative times at which to evaluate S(t).

required
people_chunk int

Maximum number of rows per internal block. Defaults to DEFAULT_SURVIVAL_PEOPLE_CHUNK.

DEFAULT_SURVIVAL_PEOPLE_CHUNK
time_grid_chunk int

Maximum number of time points per internal block. Defaults to DEFAULT_SURVIVAL_TIME_GRID_CHUNK.

DEFAULT_SURVIVAL_TIME_GRID_CHUNK

Returns:

Type Description
str

The string form of path.

Notes

Columns written are row, time, survival (or row, <id_column>, time, survival when an id column is present). The file is opened in text mode with UTF-8 encoding.

Examples:

>>> import numpy as np
>>> pred = model.predict(test_df, id_column="patient_id")
>>> pred.write_survival_at_csv(
...     "survival.csv", np.linspace(0.0, 10.0, 64)
... )
'survival.csv'

See Also

SurvivalPrediction.survival_at_chunks
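
The long-format layout described in the Notes (one CSV row per prediction row and time) can be illustrated with the standard csv module. The function below is a hypothetical stand-in, not gamfit's writer, and it assumes a zero-based `row` index.

```python
import csv
import io


def write_long_csv(handle, times, survival, row_ids=None, id_column=None):
    """Write one CSV row per (prediction_row, time) pair.

    Hypothetical stand-in that mirrors the documented column layout;
    the zero-based `row` index is an assumption.
    """
    writer = csv.writer(handle)
    header = ["row", "time", "survival"]
    if id_column is not None:
        header.insert(1, id_column)
    writer.writerow(header)
    for i, row in enumerate(survival):
        for t, s in zip(times, row):
            record = [i, t, s]
            if id_column is not None:
                record.insert(1, row_ids[i])
            writer.writerow(record)


buffer = io.StringIO()
write_long_csv(
    buffer,
    [1.0, 2.0],
    [[0.9, 0.8], [0.7, 0.5]],
    row_ids=["a", "b"],
    id_column="patient_id",
)
lines = buffer.getvalue().splitlines()
```

Two prediction rows evaluated at two times produce four data lines plus the header, so file size grows with n_samples * len(times).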

CompetingRisksPrediction dataclass

CompetingRisksPrediction(
    model_class: str,
    likelihood_mode: str,
    endpoint_names: tuple[str, ...],
    times: Any,
    hazard: Any,
    survival: Any,
    cumulative_hazard: Any,
    cif: Any,
    overall_survival: Any,
    linear_predictor: Any,
    columns: dict[str, list[float]],
)

Rust-computed joint cause-specific competing-risks prediction.

CompetingRisksCIF dataclass

CompetingRisksCIF(
    times: Any,
    cif: Any,
    overall_survival: Any,
    cumulative_hazard: Any,
    endpoint_names: tuple[str, ...],
)

Cause-specific cumulative incidence assembled by the Rust core.

Posterior sampling

SamplingConfig dataclass

SamplingConfig(
    n_samples: int,
    n_warmup: int,
    n_chains: int,
    target_accept: float,
    seed: int,
)

Echo of the NUTS configuration the engine ran with.

All fields are populated from the FFI payload so callers can reconstruct exactly which sampler invocation produced the draws — useful for reproducibility logs and for telling whether an explicit samples=... request was honored or auto-derived from the model dimension.

Attributes:

Name Type Description
n_samples int

Post-warmup draws kept per chain.

n_warmup int

Warmup draws discarded per chain before collecting n_samples.

n_chains int

Number of independent NUTS chains run by the engine.

target_accept float

Step-size adaptation target acceptance probability in (0, 1).

seed int

RNG seed actually consumed by the sampler.

Examples:

>>> post = model.sample(samples=500)
>>> post.config.n_samples
500
>>> post.config.target_accept
0.95

to_dict

to_dict() -> dict[str, Any]

Return the config as a plain JSON-serialisable dict.

Returns:

Type Description
dict[str, Any]

Mapping with keys n_samples, n_warmup, n_chains, target_accept, seed.

Examples:

>>> cfg = SamplingConfig(500, 1000, 4, 0.95, 42)
>>> cfg.to_dict()["n_chains"]
4

PosteriorSamples dataclass

PosteriorSamples(
    samples: Any,
    coefficient_names: tuple[str, ...],
    mean: Any,
    std: Any,
    rhat: float,
    ess: float,
    converged: bool,
    method: str,
    model_class: str,
    family_kind: str,
    config: SamplingConfig,
    _model_bytes: bytes = _NO_MODEL,
    _name_index: Mapping[str, int] = dict(),
)

Posterior draws over the model's coefficient vector.

Returned by :meth:gamfit.Model.sample. This is the user-facing surface for posterior reasoning: a numpy-first container with named-column subscripting, credible-interval helpers, posterior predictive utilities, a :meth:save / :meth:load round-trip, trace plotting, a concise :meth:__repr__, and a notebook-friendly rich-HTML representation (_repr_html_) that delegates to :meth:summary.

Attributes:

Name Type Description
samples ndarray

(n_draws, n_coeffs) numpy array of draws. n_draws is n_chains * n_samples (warmup is already discarded by the engine).

coefficient_names tuple[str, ...]

Column labels for samples. Currently the FFI emits ("beta_0", "beta_1", ...); future releases may carry the same names the fitted model exposes via :class:Summary.

mean ndarray

Per-coefficient posterior mean reported by the sampler.

std ndarray

Per-coefficient posterior standard deviation reported by the sampler.

rhat float

Maximum split-Rhat across coefficients (exact NUTS only; 1.0 exactly for Laplace iid draws).

ess float

Minimum effective sample size across coefficients.

converged bool

Boolean convenience for rhat < 1.1.

method str

"nuts" for exact NUTS, "laplace" for the Gaussian Laplace approximation around the fitted joint mode.

model_class str

Saved-model predictive class string the draws came from.

family_kind str

Inverse-link tag ("identity", "logit", "probit", "cloglog", "log", ...). Used by :meth:predict to push draws through the correct inverse link.

config SamplingConfig

:class:SamplingConfig recording the chain count, warmup, target_accept, and seed actually used.

Examples:

>>> post = model.sample(samples=1000, warmup=1000, chains=4)
>>> post.n_draws, post.n_coeffs
(4000, 12)
>>> post["beta_1"].mean()
0.342
>>> bands = post.predict(new_data, level=0.9)
>>> post.save("posterior.npz")
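
The rhat and converged attributes summarise chain agreement. As a rough guide to what a split-Rhat measures, the sketch below computes the classic (non-rank-normalised) statistic for one coefficient. It is an assumption that the engine uses this exact estimator; the function name and input layout are hypothetical.

```python
import math
from statistics import fmean, variance


def split_rhat(chains):
    """Split-Rhat for one coefficient, given equal-length lists of draws."""
    halves = []
    for chain in chains:
        half = len(chain) // 2
        # Split each chain in two so within-chain drift shows up as
        # between-chain disagreement. An odd trailing draw is dropped.
        halves.append(chain[:half])
        halves.append(chain[half:2 * half])
    n = len(halves[0])
    if n < 2:
        raise ValueError("each chain needs at least four draws")
    means = [fmean(h) for h in halves]
    within = fmean(variance(h) for h in halves)  # W
    between = n * variance(means)                # B
    pooled = (n - 1) / n * within + between / n  # pooled variance estimate
    return math.sqrt(pooled / within)


mixed = [
    [0.1, -0.2, 0.3, 0.0, -0.1, 0.2, -0.3, 0.1],
    [0.0, 0.2, -0.1, 0.1, -0.2, 0.3, -0.1, 0.0],
]
shifted = [[c + 5.0 for c in mixed[0]], mixed[1]]
rhat_mixed = split_rhat(mixed)
rhat_shifted = split_rhat(shifted)
converged = rhat_mixed < 1.1  # same threshold as the converged attribute
```

Chains that explore the same region give a value near or below the 1.1 threshold, while shifting one chain away from the other inflates the statistic well past it.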

n_draws property

n_draws: int

Total number of post-warmup draws across all chains.

Returns:

Type Description
int

n_chains * n_samples; the leading axis length of :attr:samples.

Examples:

>>> post.n_draws
4000

n_coeffs property

n_coeffs: int

Number of model coefficients (columns of :attr:samples).

Returns:

Type Description
int

Trailing axis length of :attr:samples.

Examples:

>>> post.n_coeffs
12

shape property

shape: tuple[int, int]

Shape of the underlying draws matrix.

Returns:

Type Description
tuple[int, int]

(n_draws, n_coeffs).

Examples:

>>> post.shape
(4000, 12)

is_exact property

is_exact: bool

Whether the draws are exact NUTS rather than Laplace iid.

Returns:

Type Description
bool

True if :attr:method is "nuts", False for "laplace" (the Gaussian Laplace approximation).

Examples:

>>> post = model.sample(samples=1000)
>>> post.is_exact
True

from_ffi_payload classmethod

from_ffi_payload(
    payload: Mapping[str, Any], *, model_bytes: bytes = _NO_MODEL
) -> "PosteriorSamples"

Internal factory: build a :class:PosteriorSamples from the FFI payload.

Used by :meth:gamfit.Model.sample to wrap the dict produced by the Rust sampler. End users should not call this directly.

Parameters:

Name Type Description Default
payload Mapping[str, Any]

Decoded FFI JSON payload. Must contain n_draws, n_coeffs, samples_flat (row-major), rhat, ess, converged and may contain coefficient_names, posterior_mean, posterior_std, method, model_class, family_kind, and config.

required
model_bytes bytes

Saved-model byte blob to bundle so downstream methods like :meth:predict work without the user re-passing the model.

_NO_MODEL

Returns:

Type Description
PosteriorSamples

Reified posterior with samples reshaped to (n_draws, n_coeffs).

Notes

samples_flat is sent flat (row-major) so we round-trip through numpy.reshape once. Building a nested list of lists from JSON would otherwise dominate decode time for biobank-scale draws.
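
Row-major layout means draw i, coefficient j sits at flat index i * n_coeffs + j, which is the ordering numpy.reshape uses by default. The sketch below checks that indexing rule on a toy payload. The payload values are made up, and `flat_index` is a hypothetical helper.

```python
def flat_index(i, j, n_coeffs):
    """Position of element (i, j) in a row-major flat buffer."""
    return i * n_coeffs + j


# Toy payload with 2 draws of 3 coefficients (values are illustrative).
payload = {
    "n_draws": 2,
    "n_coeffs": 3,
    "samples_flat": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
}

if len(payload["samples_flat"]) != payload["n_draws"] * payload["n_coeffs"]:
    raise ValueError("samples_flat length does not match n_draws * n_coeffs")

# Second draw (i = 1), third coefficient (j = 2).
value = payload["samples_flat"][flat_index(1, 2, payload["n_coeffs"])]
first_draw = payload["samples_flat"][0:payload["n_coeffs"]]
```

Each draw occupies a contiguous run of n_coeffs values, so a single reshape recovers the matrix without building intermediate nested lists.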

from_ffi_json classmethod

from_ffi_json(
    raw: str, *, model_bytes: bytes = _NO_MODEL
) -> "PosteriorSamples"

Internal factory: build a :class:PosteriorSamples from a raw FFI JSON string.

Thin convenience around :meth:from_ffi_payload that decodes the JSON itself. Used by :meth:gamfit.Model.sample; not intended as a public API.

Parameters:

Name Type Description Default
raw str

JSON-encoded FFI payload from the Rust sampler.

required
model_bytes bytes

Saved-model byte blob bundled into the returned object.

_NO_MODEL

Returns:

Type Description
PosteriorSamples

Same as :meth:from_ffi_payload.

to_numpy

to_numpy() -> Any

Return the raw draws as a numpy array.

Returns:

Type Description
ndarray

(n_draws, n_coeffs) view of :attr:samples (not a copy).

Examples:

>>> arr = post.to_numpy()
>>> arr.shape
(4000, 12)

to_pandas

to_pandas() -> Any

Return draws as a pandas DataFrame with named coefficient columns.

Returns:

Type Description
DataFrame

(n_draws, n_coeffs) DataFrame whose columns are :attr:coefficient_names.

Examples:

>>> df = post.to_pandas()
>>> df.columns.tolist()[:2]
['beta_0', 'beta_1']
>>> df["beta_1"].mean()
0.342

interval

interval(level: float = 0.95) -> Any

Equal-tailed credible interval for each coefficient.

Parameters:

Name Type Description Default
level float

Coverage probability in (0, 1). Default 0.95.

0.95

Returns:

Type Description
ndarray

(n_coeffs, 2) array of (lower, upper) bounds at the requested coverage.

Raises:

Type Description
ValueError

If level is not strictly between 0 and 1.

Examples:

>>> ci = post.interval(level=0.9)
>>> ci.shape
(12, 2)
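
An equal-tailed interval at coverage `level` leaves (1 - level) / 2 of the draws in each tail. The sketch below computes it for a single coefficient with a linearly interpolated quantile. The quantile definition gamfit uses is not stated here, so treat this choice as an assumption.

```python
def quantile(sorted_values, q):
    """Linearly interpolated quantile of an ascending list, 0 <= q <= 1."""
    position = q * (len(sorted_values) - 1)
    low = int(position)
    high = min(low + 1, len(sorted_values) - 1)
    fraction = position - low
    return sorted_values[low] * (1.0 - fraction) + sorted_values[high] * fraction


def equal_tailed_interval(draws, level=0.95):
    """Return (lower, upper) leaving (1 - level) / 2 mass in each tail."""
    if not 0.0 < level < 1.0:
        raise ValueError("level must be strictly between 0 and 1")
    ordered = sorted(draws)
    alpha = (1.0 - level) / 2.0
    return quantile(ordered, alpha), quantile(ordered, 1.0 - alpha)


draws = [float(i) for i in range(101)]  # 0.0, 1.0, ..., 100.0
lower, upper = equal_tailed_interval(draws, level=0.9)
```

For these evenly spaced draws the 90% interval lands at roughly the 5th and 95th values, up to floating-point rounding.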

summary

summary(level: float = 0.95) -> Summary

Per-coefficient posterior summary as a :class:Summary.

Parameters:

Name Type Description Default
level float

Coverage probability for the credible interval columns, in (0, 1). Default 0.95.

0.95

Returns:

Type Description
Summary

Coefficient rows (index, name, estimate, std_error, ci_lower, ci_upper) plus top-level convergence diagnostics (rhat, ess, converged), sampler method, and the :class:SamplingConfig echo. Renders nicely in a notebook via :class:Summary HTML.

Notes

The payload mirrors what :meth:gamfit.Model.summary returns for fitted models, so downstream rendering helpers work uniformly on both fitted and sampled views.

Examples:

>>> post.summary(level=0.95)
Summary(method='nuts', n_coeffs=12, rhat=1.0021, converged=True)

predict

predict(
    new_data: Any, *, chunk_size: int | None = 4096, level: float = 0.95
) -> dict[str, Any]

Posterior credible bands for eta and E[y | x] on new data.

Parameters:

Name Type Description Default
new_data Any

Tabular new data (DataFrame, dict of columns, or any object accepted by the engine's table normaliser) at which to evaluate the posterior fitted means.

required
chunk_size int or None

Number of prediction rows processed at once. Default 4096. Pass None to disable chunking and form the full (n_draws, n_rows) matrix (consider :meth:predict_draws instead in that case).

4096
level float

Coverage probability for the credible bands in (0, 1). Default 0.95.

0.95

Returns:

Type Description
dict[str, ndarray]

Six length-n_rows arrays: eta_mean, eta_lower, eta_upper (link scale) and mean, mean_lower, mean_upper (response scale, inverse link applied).

Raises:

Type Description
RuntimeError

If this :class:PosteriorSamples was loaded from disk without a model context.

NotImplementedError

For model classes lacking a closed-form design matrix (e.g. link-wiggle, survival) — use :meth:gamfit.Model.predict instead.

Notes

Walks chunks of rows through draws @ X.T and reduces each chunk to quantiles immediately, so memory stays bounded at roughly n_draws * chunk_size floats regardless of the prediction-set size. For Laplace-method posteriors the returned bands match what model.predict(new_data, interval=level) produces analytically, up to Monte Carlo error.

Examples:

>>> bands = post.predict(new_data, level=0.9)
>>> bands["mean_lower"].shape
(50,)
>>> bands["mean_upper"][0]
0.812
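
The memory argument in the Notes (reduce each chunk of rows to quantiles before moving on) can be sketched in plain Python on the link scale. Everything here is illustrative: `chunked_bands`, its nested-list inputs, and the nearest-rank quantile are assumptions, and the inverse-link step is omitted.

```python
def chunked_bands(draws, design, chunk_size, level=0.95):
    """Per-row (mean, lower, upper) of eta = X @ beta over posterior draws.

    `draws` is an (n_draws, n_coeffs) nested list and `design` an
    (n_rows, n_coeffs) nested list; both names are hypothetical.
    """
    alpha = (1.0 - level) / 2.0
    n_draws = len(draws)
    lo_idx = round(alpha * (n_draws - 1))  # nearest-rank quantiles
    hi_idx = round((1.0 - alpha) * (n_draws - 1))
    bands = []
    for start in range(0, len(design), chunk_size):
        chunk = design[start:start + chunk_size]
        # Only n_draws * len(chunk) linear predictors are alive at once.
        eta = [
            [sum(b * x for b, x in zip(beta, row)) for beta in draws]
            for row in chunk
        ]
        for column in eta:
            ordered = sorted(column)
            bands.append((sum(column) / n_draws, ordered[lo_idx], ordered[hi_idx]))
    return bands


draws = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]    # (intercept, slope) draws
design = [[1.0, 0.0], [1.0, 2.0], [1.0, 4.0]]   # three prediction rows
small_chunks = chunked_bands(draws, design, chunk_size=2)
one_chunk = chunked_bands(draws, design, chunk_size=100)
```

The bands do not depend on `chunk_size`; the chunk size only trades peak memory against the number of passes.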

predict_draws

predict_draws(new_data: Any) -> PosteriorPredictive

Full posterior fitted-mean draws on new data.

Parameters:

Name Type Description Default
new_data Any

Tabular new data (DataFrame, dict of columns, or any object accepted by the engine's table normaliser).

required

Returns:

Type Description
PosteriorPredictive

Container whose :attr:PosteriorPredictive.eta and :attr:PosteriorPredictive.mean are (n_draws, n_rows) matrices on the link and response scales respectively.

Raises:

Type Description
RuntimeError

If this :class:PosteriorSamples was loaded from disk without a model context.

Notes

Materialises the full (n_draws, n_rows) matrix in memory. For very large prediction sets prefer :meth:predict, which streams per-row credible bands chunk-by-chunk.

Examples:

>>> pp = post.predict_draws(new_data)
>>> pp.shape
(4000, 50)
>>> pp.mean.std(axis=0).mean()
0.087

save

save(path: str | Path) -> str

Save the posterior to an .npz archive.

Parameters:

Name Type Description Default
path str or Path

Destination .npz file path.

required

Returns:

Type Description
str

String form of the resolved output path.

Notes

The archive carries the full (n_draws, n_coeffs) samples matrix, the per-coefficient mean and std, convergence diagnostics, method / class / family tags, the :class:SamplingConfig, and the saved model bytes (so :meth:predict continues to work after a round-trip via :meth:load).

Examples:

>>> post.save("posterior.npz")
'posterior.npz'
>>> reloaded = PosteriorSamples.load("posterior.npz")

load classmethod

load(path: str | Path) -> 'PosteriorSamples'

Load a :class:PosteriorSamples from an .npz archive.

Parameters:

Name Type Description Default
path str or Path

Path to an archive previously written by :meth:save.

required

Returns:

Type Description
PosteriorSamples

Reconstructed posterior, including bundled model bytes so :meth:predict keeps working.

Notes

The archive is read with allow_pickle=True to round-trip the JSON metadata stored as a 0-d object array. Unpickling can execute arbitrary code, so only load archives you produced via :meth:save.

Examples:

>>> post.save("posterior.npz")
'posterior.npz'
>>> reloaded = PosteriorSamples.load("posterior.npz")
>>> reloaded.n_draws == post.n_draws
True

plot_trace

plot_trace(
    *, coefficients: Any = None, max_panels: int = 8, ax: Any = None
) -> Any

Matplotlib trace + marginal-density plot.

Parameters:

Name Type Description Default
coefficients None, str, int, or iterable of str/int

Coefficients to plot. If None, auto-selects the first max_panels coefficients. A single name or integer index plots one panel row; an iterable plots one row per element.

None
max_panels int

Cap on the number of panel rows when coefficients is None. Default 8.

8
ax numpy.ndarray of matplotlib Axes

Pre-existing 2-D axes array of shape (n_panels, 2). If None, a fresh (n_panels, 2) figure is created.

None

Returns:

Type Description
Figure

The figure containing the trace and density panels.

Raises:

Type Description
ValueError

If the resolved coefficient selection is empty.

Notes

Each row has two panels: trace (draws vs iteration index) on the left and a marginal density histogram on the right.

Examples:

>>> fig = post.plot_trace()
>>> fig = post.plot_trace(coefficients=["beta_0", "beta_1"])
>>> fig.savefig("trace.png")

PairedPosteriorSamples dataclass

PairedPosteriorSamples(
    target: "PosteriorSamples", competing: "PosteriorSamples"
)

Posterior samples from two linked fits with draw rows paired by index.

cumulative_incidence

cumulative_incidence(
    new_data: Any, times: Any, *, level: float = 0.95
) -> CumulativeIncidenceDraws

Compute target-cause CIF draws using paired target/competing rows.

PosteriorPredictive dataclass

PosteriorPredictive(eta: Any, mean: Any, family_kind: str, model_class: str)

Per-row posterior fitted-mean draws on the link and response scales.

Returned by :meth:PosteriorSamples.predict_draws, this container holds the full (n_draws, n_rows) matrices of fitted-mean draws on both the linear-predictor (eta) and response (mean) scales, along with link/class metadata used to re-apply the inverse link on demand.

Attributes:

Name Type Description
eta ndarray

(n_draws, n_rows) float matrix of draws on the link scale.

mean ndarray

(n_draws, n_rows) float matrix of draws pushed through the model's inverse link (mean response scale).

family_kind str

Inverse-link tag emitted by the engine ("identity", "logit", "probit", "cloglog", "log", ...).

model_class str

Saved-model predictive class string the underlying :class:PosteriorSamples came from.

Notes

Use :meth:summary to collapse the matrices to per-row credible bands without writing the quantile reductions yourself. For very large prediction sets, prefer :meth:PosteriorSamples.predict which streams chunk-by-chunk instead of materialising the full (n_draws, n_rows) matrix here.

Examples:

>>> pp = post.predict_draws(new_data)
>>> pp.shape
(1000, 50)
>>> bands = pp.summary(level=0.9)

shape property

shape: tuple[int, int]

Shape of the link-scale draw matrix.

Returns:

Type Description
tuple[int, int]

(n_draws, n_rows).

Examples:

>>> pp = post.predict_draws(new_data)
>>> pp.shape
(1000, 50)

n_draws property

n_draws: int

Number of posterior fitted-mean draws.

Returns:

Type Description
int

Length of the leading axis of :attr:eta.

Examples:

>>> pp = post.predict_draws(new_data)
>>> pp.n_draws
1000

n_rows property

n_rows: int

Number of prediction rows.

Returns:

Type Description
int

Length of the trailing axis of :attr:eta.

Examples:

>>> pp = post.predict_draws(new_data)
>>> pp.n_rows
50

summary

summary(level: float = 0.95) -> dict[str, Any]

Collapse fitted-mean draws to per-row credible bands.

Parameters:

Name Type Description Default
level float

Coverage probability of the equal-tailed credible interval in (0, 1). Default 0.95.

0.95

Returns:

Type Description
dict[str, ndarray]

Dict with six length-n_rows arrays: eta_mean, eta_lower, eta_upper (link scale) and mean, mean_lower, mean_upper (response scale).

Notes

Because the supported inverse links are monotone, response-scale quantiles are computed as the inverse link applied to the link quantiles rather than as quantiles of :attr:mean directly — the two agree up to numerical noise and the link-quantile form avoids re-walking the response-scale matrix.

Examples:

>>> pp = post.predict_draws(new_data)
>>> bands = pp.summary(level=0.9)
>>> bands["mean_lower"].shape
(50,)
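
The shortcut in the Notes relies on quantiles commuting with monotone transforms. The sketch below checks this for a logit link using a nearest-rank quantile, where the agreement is exact; with interpolating quantile definitions the two routes agree only approximately. The helper names are hypothetical, and the last two lines show that the same shortcut does not hold for averages.

```python
import math


def inv_logit(eta):
    """Inverse logit link, a strictly increasing map from R to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def nearest_rank_quantile(values, q):
    """Quantile that always returns one of the observed values."""
    ordered = sorted(values)
    return ordered[round(q * (len(ordered) - 1))]


eta_draws = [-2.0, -0.5, 0.0, 0.3, 1.2, 2.5, 3.1]
mean_draws = [inv_logit(e) for e in eta_draws]

# Route 1: take the quantile on the link scale, then apply the inverse link.
via_link = inv_logit(nearest_rank_quantile(eta_draws, 0.05))
# Route 2: take the quantile of the response-scale draws directly.
direct = nearest_rank_quantile(mean_draws, 0.05)

# Averages do not commute with a nonlinear link.
mean_of_transformed = sum(mean_draws) / len(mean_draws)
transformed_mean = inv_logit(sum(eta_draws) / len(eta_draws))
```

This is why the quantile bounds can be obtained from the link scale alone, while a response-scale average has to be computed from the response-scale draws themselves.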

CumulativeIncidenceDraws dataclass

CumulativeIncidenceDraws(
    times: Any, draws: Any, mean: Any, lower: Any, upper: Any, level: float
)

Paired posterior draws for a target-cause cumulative incidence curve.

Diagnostics and metadata

Summary dataclass

Summary(payload: dict[str, Any])

Frozen view of a fitted-model summary payload.

A Summary is the structured equivalent of print(model) for a fitted GAM. It wraps a plain dict returned by the Rust engine and exposes convenient accessors plus a notebook-friendly HTML representation. The typical entry point is :meth:Model.summary.

The payload typically contains keys such as formula, family_name, model_class, deviance, reml_score, and coefficients (a list of per-term dictionaries). Use :meth:coefficients_frame to view the coefficient table as a pandas DataFrame.

Examples:

>>> summary = model.summary()
>>> summary["family_name"]
'gaussian'
>>> summary.coefficients_frame().head()

coefficients property

coefficients: list[dict[str, Any]]

List of per-term coefficient records.

Returns:

Type Description
list of dict

One record per fitted term, each with keys such as term, estimate, std_error, and edf depending on the model.

Examples:

>>> summary.coefficients[0]["term"]
'(Intercept)'

from_dict classmethod

from_dict(payload: dict[str, Any]) -> 'Summary'

Build a :class:Summary from a raw payload dictionary.

Parameters:

Name Type Description Default
payload dict

Mapping of summary keys to values, as produced by the Rust engine.

required

Returns:

Type Description
Summary

A new immutable summary view over a shallow copy of payload.

Examples:

>>> Summary.from_dict({"formula": "y ~ s(x)", "family_name": "gaussian"})
Summary(formula='y ~ s(x)', family_name='gaussian')

get

get(key: str, default: Any = None) -> Any

Return payload[key] if present, otherwise default.

Parameters:

Name Type Description Default
key str

Payload key to look up.

required
default Any

Value returned when key is not in the payload.

None

Returns:

Type Description
Any

The looked-up value, or default when key is absent.

Examples:

>>> summary.get("deviance", float("nan"))
12.34

to_dict

to_dict() -> dict[str, Any]

Return a shallow copy of the underlying payload dictionary.

Returns:

Type Description
dict

Plain dict mirror of the summary payload, safe to mutate.

Examples:

>>> raw = summary.to_dict()
>>> sorted(raw)[:3]
['coefficients', 'deviance', 'family_name']

coefficients_frame

coefficients_frame() -> Any

Return :attr:coefficients as a :class:pandas.DataFrame.

Returns:

Type Description
DataFrame

One row per term, columns mirror the keys in :attr:coefficients records.

Examples:

>>> frame = summary.coefficients_frame()
>>> frame.columns.tolist()[:2]
['term', 'estimate']

Diagnostics dataclass

Diagnostics(
    formula: str,
    response_name: str,
    observed: list[float],
    residuals: list[float],
    predicted: dict[str, list[float]],
    metrics: dict[str, float],
    interval_lower: list[float] | None = None,
    interval_upper: list[float] | None = None,
)

Held-out / in-sample diagnostics for a fitted GAM.

Bundles observed responses, model-implied predictions, residuals, and aggregate fit metrics (MAE, RMSE, bias, optional :math:R^2) into a single immutable record. Returned by :meth:Model.diagnose and rendered inline in notebooks via :meth:_repr_html_.

Key fields:

  • formula: the model formula used to produce the predictions.
  • response_name: name of the response column in the input table.
  • observed: actual response values aligned with predicted["mean"].
  • residuals: observed - predicted["mean"] per row.
  • predicted: dictionary of prediction series (mean plus optional mean_lower / mean_upper interval bounds).
  • metrics: scalar fit metrics (n_obs, mae, rmse, bias, and r_squared when the response varies).
  • interval_lower / interval_upper: optional pointwise prediction bands when the underlying call requested an interval.

Examples:

>>> diag = model.diagnose(test)
>>> diag.metrics["rmse"]
0.42

from_predictions classmethod

from_predictions(
    *,
    formula: str,
    response_name: str,
    observed: list[float],
    predicted: dict[str, list[float]],
) -> "Diagnostics"

Construct a :class:Diagnostics from raw observed and predicted series.

Computes residuals and aggregate fit metrics (n_obs, MAE, RMSE, bias, and :math:R^2 when the response variance is positive) from the inputs.

Parameters:

Name Type Description Default
formula str

Model formula associated with the predictions.

required
response_name str

Name of the response column.

required
observed list of float

Observed response values.

required
predicted dict of str to list of float

Prediction series. Must contain key "mean"; may contain "mean_lower" and "mean_upper" for interval bands.

required

Returns:

Type Description
Diagnostics

Populated diagnostics record with computed residuals and metrics.

Examples:

>>> Diagnostics.from_predictions(
...     formula="y ~ s(x)",
...     response_name="y",
...     observed=[1.0, 2.0, 3.0],
...     predicted={"mean": [1.1, 1.9, 3.2]},
... ).metrics["mae"]
0.13333333333333336
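
The metrics listed above follow standard definitions, sketched below in plain Python. The function `fit_metrics` is hypothetical, and the sign convention for bias (mean of observed minus predicted) is an assumption based on how residuals are defined in this section.

```python
import math


def fit_metrics(observed, predicted_mean):
    """Residuals and summary metrics for paired observed/predicted values."""
    if len(observed) != len(predicted_mean) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [o - p for o, p in zip(observed, predicted_mean)]
    n = len(residuals)
    squared_error = sum(r * r for r in residuals)
    metrics = {
        "n_obs": n,
        "mae": sum(abs(r) for r in residuals) / n,
        "rmse": math.sqrt(squared_error / n),
        "bias": sum(residuals) / n,  # assumed sign: observed - predicted
    }
    centre = sum(observed) / n
    total = sum((o - centre) ** 2 for o in observed)
    if total > 0.0:
        # R^2 is undefined for a constant response, so it is skipped there.
        metrics["r_squared"] = 1.0 - squared_error / total
    return residuals, metrics


residuals, metrics = fit_metrics([1.0, 2.0, 3.0], [1.1, 1.9, 3.2])
_, constant_metrics = fit_metrics([1.0, 1.0], [1.0, 1.0])
```

The second call shows the guard for a constant response: the r_squared key is simply absent rather than reported as a division by zero.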

to_dict

to_dict() -> dict[str, Any]

Return a plain dict snapshot of the diagnostics record.

Returns:

Type Description
dict

Mapping with copies of every field, suitable for JSON-style serialization or further inspection.

Examples:

>>> diag.to_dict()["metrics"]["rmse"]
0.42

SchemaCheck dataclass

SchemaCheck(ok: bool, issues: tuple[SchemaIssue, ...])

Result of comparing serving data against a fitted model's training schema.

Returned by :meth:Model.check. Truthy when the check passes (ok=True with no issues); rendered as an HTML table in notebooks.

Key fields:

  • ok: True when the data matches the training schema.
  • issues: tuple of SchemaIssue records describing each detected problem (empty when ok is True).

Examples:

>>> check = model.check(serving_df)
>>> if not check:
...     check.raise_for_error()

from_dict classmethod

from_dict(payload: dict[str, Any]) -> 'SchemaCheck'

Build a SchemaCheck from a raw payload dictionary.

Parameters:

Name Type Description Default
payload dict

Mapping with keys ok (bool) and issues (list of dicts with kind, message, and optional column).

required

Returns:

Type Description
SchemaCheck

Parsed schema-check result.

Examples:

>>> SchemaCheck.from_dict({"ok": True, "issues": []}).ok
True

raise_for_error

raise_for_error() -> None

Raise ValueError if the schema check failed.

Concatenates every issue message into a single ValueError. A no-op when ok is True.

Raises:

Type Description
ValueError

If at least one SchemaIssue is recorded.

Examples:

>>> check = model.check(serving_df)
>>> check.raise_for_error()  # raises ValueError on mismatch
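
The truthiness and raise_for_error behaviour described above can be pictured with a pair of stand-in dataclasses. The names IssueSketch and CheckSketch are hypothetical and this is not the gamfit source; in particular, the "; " separator used to join messages is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class IssueSketch:
    kind: str
    message: str
    column: Optional[str] = None


@dataclass(frozen=True)
class CheckSketch:
    ok: bool
    issues: Tuple[IssueSketch, ...] = ()

    def __bool__(self) -> bool:
        # Truthy only when the check passed and nothing was recorded.
        return self.ok and not self.issues

    def raise_for_error(self) -> None:
        if self:
            return  # no-op on success
        # Assumed format: all issue messages joined into one ValueError.
        raise ValueError("; ".join(issue.message for issue in self.issues))


bad = CheckSketch(
    ok=False,
    issues=(
        IssueSketch("missing", "column 'age' is missing", "age"),
        IssueSketch("type_mismatch", "column 'x' has the wrong type", "x"),
    ),
)
```

With this shape, `if not check: check.raise_for_error()` and an unconditional `check.raise_for_error()` behave the same way.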

SchemaIssue dataclass

SchemaIssue(kind: str, message: str, column: str | None = None)

A single schema-validation problem detected against the training schema.

Key fields:

  • kind: short tag describing the issue category (e.g. "missing", "type_mismatch").
  • message: human-readable explanation.
  • column: name of the offending column, when applicable.

Examples:

>>> SchemaIssue(kind="missing", message="column 'age' is missing", column="age")
SchemaIssue(kind='missing', message="column 'age' is missing", column='age')

FormulaValidation dataclass

FormulaValidation(payload: dict[str, Any])

Outcome of gamfit.validate_formula (no fit performed).

Wraps the JSON payload returned by the Rust validator. Typical keys include formula, model_class, family_name, and supported_by_python. Use this to confirm a formula parses, infer the family that would be picked, and check whether the Python binding can fit the resulting model before committing to a full gamfit.fit call.

Examples:

>>> info = gamfit.validate_formula(df, "y ~ s(x)")
>>> info["family_name"]
'gaussian'
>>> info.supported_by_python
True

supported_by_python property

supported_by_python: bool

Whether the Python binding can fit the validated model.

Returns:

Type Description
bool

True when gamfit.fit can produce a fitted model for this formula/family combination, False if only the CLI / Rust engine can handle it.

Examples:

>>> info.supported_by_python
True

from_dict classmethod

from_dict(payload: dict[str, Any]) -> 'FormulaValidation'

Build a FormulaValidation from a raw payload dictionary.

Parameters:

Name Type Description Default
payload dict

Mapping of validation keys to values, as produced by the Rust validator.

required

Returns:

Type Description
FormulaValidation

Immutable view over a shallow copy of payload.

Examples:

>>> FormulaValidation.from_dict({"formula": "y ~ x", "supported_by_python": True})
FormulaValidation(formula='y ~ x', model_class=None, family_name=None, supported_by_python=True)

to_dict

to_dict() -> dict[str, Any]

Return a shallow copy of the underlying payload dictionary.

Returns:

Type Description
dict

Plain dict mirror of the validation payload.

Examples:

>>> raw = info.to_dict()
>>> raw["formula"]
'y ~ s(x)'

SharedPrecisionGroup dataclass

SharedPrecisionGroup(
    name: str,
    shape: float = 1.0,
    rate: float = 0.0,
    labels: str | Mapping[str | int, str] | None = None,
)

Cross-fit coefficient precision group.

name is the shared precision coordinate. By default it selects the same named coefficient term/column/label in every model. labels can override that with either one label for all models or a mapping keyed by the model name/index supplied to cross_fit_shared_precision_groups.
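
The three forms of labels described above could be resolved per model roughly as follows. The function resolve_label is hypothetical, and falling back to name when a mapping has no entry for a model is an assumption rather than documented behaviour.

```python
from collections.abc import Mapping


def resolve_label(name, labels, model_key):
    """Pick the coefficient label a precision group targets in one model."""
    if labels is None:
        # Default: the group name doubles as the label in every model.
        return name
    if isinstance(labels, str):
        # One label shared by all models.
        return labels
    if isinstance(labels, Mapping):
        # Per-model override keyed by model name or index.
        # Assumption: models missing from the mapping fall back to `name`.
        return labels.get(model_key, name)
    raise TypeError("labels must be None, a str, or a mapping")


per_model = {"model_a": "s(age)", 1: "s(age_years)"}
```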

Basis and ridge primitives

bspline_basis

bspline_basis(
    t: Any, knots: Any = None, *, degree: int = 3, periodic: bool = False
) -> Any

Evaluate the Rust B-spline basis as a NumPy array.

knots may be:

  • None — auto-derive a clamped knot vector with quantile-spaced interior knots inferred from t.
  • an int K — auto-derive with K interior knots.
  • an array-like — used verbatim (must be a valid clamped knot vector).
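
To make "clamped knot vector with quantile-spaced interior knots" concrete, here is a small stdlib-only sketch. The helper clamped_knots is hypothetical and the quantile rule (linear interpolation between order statistics) is an assumption; the Rust engine may place knots differently.

```python
def clamped_knots(t, n_interior, degree=3):
    """Clamped knot vector with interior knots at evenly spaced quantiles of t."""
    xs = sorted(float(v) for v in t)
    if not xs:
        raise ValueError("t must not be empty")

    def quantile(q):
        # Linear interpolation between neighbouring order statistics.
        pos = q * (len(xs) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(xs) - 1)
        return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])

    interior = [quantile((i + 1) / (n_interior + 1)) for i in range(n_interior)]
    # "Clamped": each boundary knot is repeated degree + 1 times.
    return [xs[0]] * (degree + 1) + interior + [xs[-1]] * (degree + 1)


knots = clamped_knots([0.0, 0.1, 0.4, 0.5, 0.9, 1.0], n_interior=3)
# A degree-d B-spline basis on this vector has len(knots) - d - 1 functions.
n_basis = len(knots) - 3 - 1
```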

bspline_basis_derivative

bspline_basis_derivative(
    t: Any,
    knots: Any = None,
    *,
    degree: int = 3,
    order: int = 1,
    periodic: bool = False,
) -> Any

Evaluate derivatives of the Rust B-spline basis as a NumPy array.

knots accepts None / int / array (see bspline_basis).

duchon_basis_1d

duchon_basis_1d(
    t: Any, centers: Any = None, *, m: int = 2, periodic: bool = False
) -> Any

Evaluate the Rust one-dimensional Duchon basis as a NumPy array.

centers may be:

  • None — auto-derive K = 10 centers at empirical quantiles of t.
  • an int K — auto-derive K quantile centers.
  • an array-like — used verbatim.

duchon_basis_1d_derivative

duchon_basis_1d_derivative(
    t: Any,
    centers: Any = None,
    *,
    m: int = 2,
    order: int = 1,
    periodic: bool = False,
) -> Any

Evaluate derivatives of the Rust one-dimensional Duchon basis.

centers accepts None / int / array (see duchon_basis_1d).

smoothness_penalty

smoothness_penalty(
    knots: Any, *, degree: int = 3, order: int = 2
) -> tuple[Any, Any]

Return (S, null_basis) for the Rust B-spline difference penalty.

knots must be an explicit knot vector here: auto-derivation requires sample positions, which this penalty constructor does not take. Build one with bspline_basis's defaults (or pass any 1D array).
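
As an illustration of what a difference penalty and its null space look like, the sketch below builds the textbook P-spline form S = DᵀD on the coefficient vector. It is not the library's code: the function difference_penalty is hypothetical, it takes the coefficient count directly (for a B-spline basis that count is len(knots) - degree - 1), and it stores the null-space basis as a list of row vectors.

```python
def difference_penalty(n_coef, order=2):
    """Return (S, null_basis) for an order-`order` difference penalty."""
    if n_coef <= order:
        raise ValueError("need more coefficients than the difference order")
    # Start from the identity and difference adjacent rows `order` times.
    D = [[1.0 if i == j else 0.0 for j in range(n_coef)] for i in range(n_coef)]
    for _ in range(order):
        D = [[b - a for a, b in zip(D[i], D[i + 1])] for i in range(len(D) - 1)]
    # S = D^T D is symmetric positive semi-definite.
    S = [
        [sum(row[i] * row[j] for row in D) for j in range(n_coef)]
        for i in range(n_coef)
    ]
    # Unpenalized directions: polynomials of degree < order in the index.
    null_basis = [[float(i) ** p for i in range(n_coef)] for p in range(order)]
    return S, null_basis


S, null_basis = difference_penalty(6, order=2)
```

With order=2 the penalty leaves constant and linear coefficient sequences untouched, which is why the null basis has two vectors.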

gaussian_weighted_ridge

gaussian_weighted_ridge(
    X: Any, Y: Any, penalty: Any, weights: Any, *, ridge_lambda: float
) -> tuple[Any, Any]

Closed-form Gaussian row-weighted ridge on NumPy-compatible arrays.

weights are likelihood row weights. They are not a multiplicative gate on the mean/design row.
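
The distinction between a likelihood weight and a gate on the design row is easiest to see in the closed form. The sketch below assumes the objective is sum_i w_i (y_i - x_i·β)² + λ βᵀPβ with a single response column; the exact scaling used by the engine is not stated here, and the helper names solve and weighted_ridge are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-14:
            raise ValueError("singular system")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def weighted_ridge(X, y, penalty, weights, ridge_lambda):
    """beta = (X^T W X + lambda * P)^{-1} X^T W y for one response column."""
    m = len(X[0])
    A = [
        [
            sum(w * row[i] * row[j] for row, w in zip(X, weights))
            + ridge_lambda * penalty[i][j]
            for j in range(m)
        ]
        for i in range(m)
    ]
    b = [sum(w * row[i] * yi for row, w, yi in zip(X, weights, y)) for i in range(m)]
    return solve(A, b)


X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
y = [2.0, 5.0, 8.0, 11.0, 100.0]        # last row is an outlier
weights = [1.0, 1.0, 1.0, 1.0, 0.0]     # weight 0 drops it from the likelihood
identity = [[1.0, 0.0], [0.0, 1.0]]
beta = weighted_ridge(X, y, identity, weights, ridge_lambda=0.0)
```

The zero-weight row does not influence the coefficients, but its fitted value is still x·β (here 2 + 3·4), not zero: the weight scales the row's contribution to the loss, not the row itself.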

gaussian_weighted_ridge_batch

gaussian_weighted_ridge_batch(
    X: Any,
    Y: Any,
    penalty: Any,
    weights: Any,
    *,
    ridge_lambda: float,
    row_counts: Any | None = None,
) -> tuple[Any, Any]

Batched closed-form Gaussian row-weighted ridge.

X has shape (K, Nmax, M), Y has shape (K, Nmax, D), and weights has shape (K, Nmax). row_counts optionally marks the active row prefix for each problem in a padded ragged batch.
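
A padded ragged batch of the kind described above can be assembled as in this sketch. The helpers pad_ragged and active_rows are hypothetical, and zero is an arbitrary fill value chosen for illustration.

```python
def pad_ragged(groups, fill=0.0):
    """Pad K groups of rows to a common row count Nmax.

    Returns (padded, row_counts), where padded[k] has Nmax rows and
    row_counts[k] is the number of real rows at the start of group k.
    """
    row_counts = [len(g) for g in groups]
    n_max = max(row_counts)
    width = len(groups[0][0])
    padded = [
        [list(row) for row in g] + [[fill] * width for _ in range(n_max - len(g))]
        for g in groups
    ]
    return padded, row_counts


def active_rows(padded, row_counts, k):
    """The active row prefix of problem k (padding rows excluded)."""
    return padded[k][: row_counts[k]]


groups = [
    [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]],  # problem 0: 3 rows
    [[1.0, 5.0], [1.0, 6.0]],              # problem 1: 2 rows
]
X_padded, row_counts = pad_ragged(groups)
```

Here X_padded has shape (K, Nmax, M) = (2, 3, 2) and row_counts is [3, 2].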

Gaussian REML primitives

gaussian_reml_fit

gaussian_reml_fit(
    x: Any,
    y: Any,
    penalty: Any,
    *,
    weights: Any | None = None,
    init_lambda: float | None = None,
    by: Any | None = None,
    by_start_col: int = 0,
) -> dict[str, Any]

Fit a closed-form Gaussian REML problem from NumPy-compatible arrays.

gaussian_reml_fit_backward

gaussian_reml_fit_backward(
    x: Any,
    y: Any,
    penalty: Any,
    *,
    grad_lambda: float = 0.0,
    grad_coefficients: Any | None = None,
    grad_fitted: Any | None = None,
    grad_reml_score: float = 0.0,
    grad_edf: float = 0.0,
    forward_state: dict[str, Any] | None = None,
    weights: Any | None = None,
    init_lambda: float | None = None,
    by: Any | None = None,
    by_start_col: int = 0,
) -> dict[str, Any]

Run the analytic VJP for gaussian_reml_fit outputs.

gaussian_reml_fit_batched

gaussian_reml_fit_batched(
    x: Any,
    y: Any,
    row_offsets: Any,
    penalty: Any,
    *,
    weights: Any | None = None,
    init_lambda: float | None = None,
    by: Any | None = None,
    by_start_col: int = 0,
) -> dict[str, Any]

Fit K closed-form Gaussian REML problems packed by row offsets.

gaussian_reml_fit_batched_backward

gaussian_reml_fit_batched_backward(
    x: Any,
    y: Any,
    row_offsets: Any,
    penalty: Any,
    *,
    grad_lambda: Any | None = None,
    grad_coefficients: Any | None = None,
    grad_fitted: Any | None = None,
    grad_reml_score: Any | None = None,
    grad_edf: Any | None = None,
    forward_state: dict[str, Any] | None = None,
    weights: Any | None = None,
    init_lambda: float | None = None,
    by: Any | None = None,
    by_start_col: int = 0,
) -> dict[str, Any]

Run packed ragged analytic VJPs for gaussian_reml_fit_batched.

gaussian_reml_fit_positions

gaussian_reml_fit_positions(
    t: Any,
    y: Any,
    basis_kind: str | None = None,
    knots_or_centers: Any = None,
    penalty: Any | None = None,
    *,
    basis: str | None = None,
    basis_order: int | None = None,
    periodic: bool = False,
    period: float | None = None,
    weights: Any | None = None,
    init_lambda: float | None = None,
    by: Any | None = None,
    by_start_col: int = 0,
) -> dict[str, Any]

Fit closed-form Gaussian REML from 1D positions and an internal basis.

knots_or_centers may be None, an int (basis count), or an array; the basis-location vector is auto-derived from t when not supplied. penalty may be None for a neutral identity ridge of matching size.

gaussian_reml_fit_positions_backward

gaussian_reml_fit_positions_backward(
    t: Any,
    y: Any,
    basis_kind: str | None = None,
    knots_or_centers: Any = None,
    penalty: Any | None = None,
    *,
    basis: str | None = None,
    grad_lambda: float = 0.0,
    grad_coefficients: Any | None = None,
    grad_fitted: Any | None = None,
    grad_reml_score: float = 0.0,
    grad_edf: float = 0.0,
    forward_state: dict[str, Any] | None = None,
    basis_order: int | None = None,
    periodic: bool = False,
    period: float | None = None,
    weights: Any | None = None,
    init_lambda: float | None = None,
    by: Any | None = None,
    by_start_col: int = 0,
) -> dict[str, Any]

Run the analytic VJP for gaussian_reml_fit_positions outputs.

knots_or_centers and penalty accept the same auto-derived defaults as gaussian_reml_fit_positions.

gaussian_reml_fit_positions_batched

gaussian_reml_fit_positions_batched(
    t: Any,
    y: Any,
    row_offsets: Any,
    basis_kind: str | None = None,
    knots_or_centers: Any = None,
    penalty: Any | None = None,
    *,
    basis: str | None = None,
    basis_order: int | None = None,
    periodic: bool = False,
    period: float | None = None,
    weights: Any | None = None,
    init_lambda: float | None = None,
    by: Any | None = None,
    by_start_col: int = 0,
) -> dict[str, Any]

Fit packed ragged closed-form Gaussian REML problems from positions.

knots_or_centers and penalty accept the same auto-derived defaults as gaussian_reml_fit_positions. The basis locations are inferred from the concatenated positions across all groups.

gaussian_reml_fit_positions_batched_backward

gaussian_reml_fit_positions_batched_backward(
    t: Any,
    y: Any,
    row_offsets: Any,
    basis_kind: str | None = None,
    knots_or_centers: Any = None,
    penalty: Any | None = None,
    *,
    basis: str | None = None,
    grad_lambda: Any | None = None,
    grad_coefficients: Any | None = None,
    grad_fitted: Any | None = None,
    grad_reml_score: Any | None = None,
    grad_edf: Any | None = None,
    forward_state: dict[str, Any] | None = None,
    basis_order: int | None = None,
    periodic: bool = False,
    period: float | None = None,
    weights: Any | None = None,
    init_lambda: float | None = None,
    by: Any | None = None,
    by_start_col: int = 0,
) -> dict[str, Any]

Run the analytic VJP for packed position-based Gaussian REML fits.

knots_or_centers and penalty accept the same auto-derived defaults as gaussian_reml_fit_positions_batched.

gaussian_reml_fit_formula

gaussian_reml_fit_formula(
    data: Any, formula: str, y: Any, *, config: dict[str, Any] | None = None
) -> dict[str, Any]

Fit closed-form Gaussian REML after materialising a formula design.

scikit-learn integration

GAMRegressor dataclass

GAMRegressor(
    formula: str,
    family: str = "auto",
    offset: str | None = None,
    weights: str | None = None,
    config: dict[str, Any] | None = None,
)

Bases: _BaseGAMEstimator, RegressorMixin

scikit-learn-compatible regressor wrapping gamfit.fit.

Construct with a formula string and, optionally, pipeline kwargs such as family, offset, weights, or a free-form config dict. Then call fit with either a fully-formed table (X) or a feature table plus a target column / vector (y). After fitting, the estimator exposes the standard predict / score interface plus the pass-through helpers summary, report, and check from the underlying Model.

Parameters:

Name Type Description Default
formula str

Wilkinson-style formula. May or may not include the response on the left-hand side; the response is resolved from y if missing.

required
family str

Likelihood family forwarded to gamfit.fit.

"auto"
offset str or None

Offset column name, forwarded to gamfit.fit.

None
weights str or None

Observation-weight column name.

None
config dict or None

Escape-hatch dict of extra pipeline keys.

None

Examples:

>>> from gamfit.sklearn import GAMRegressor
>>> reg = GAMRegressor(formula="y ~ s(x1) + s(x2)").fit(X_train, y_train)
>>> preds = reg.predict(X_test)
>>> reg.score(X_test, y_test)
0.87

fit

fit(X: Any, y: Any = None) -> 'GAMRegressor'

Fit the underlying GAM and return self.

Parameters:

Name Type Description Default
X Any

Training table (pandas DataFrame, pyarrow Table, dict of columns, list of records, or anything gamfit.fit accepts). May include the response column or not.

required
y str, array-like, or None

Target. str names a column already in X; an array-like is bound to X under the response name implied by formula; None means X already contains the response named by formula.

None

Returns:

Type Description
GAMRegressor

Fitted estimator (self) with model_, formula_, feature_names_in_, and n_features_in_ attributes set.

Examples:

>>> GAMRegressor(formula="y ~ s(x)").fit(df, y="y")

predict

predict(X: Any) -> np.ndarray

Predict the conditional mean for each row in X.

Parameters:

Name Type Description Default
X Any

Serving table with the feature columns seen at fit time.

required

Returns:

Type Description
ndarray

One-dimensional float array of predicted means, one per row.

Examples:

>>> reg.predict(X_test)[:3]
array([1.02, 0.98, 1.41])

score

score(X: Any, y: Any, sample_weight: Any = None) -> float

Return the coefficient of determination $R^2$.

Parameters:

Name Type Description Default
X Any

Test feature table.

required
y array-like

True response values.

required
sample_weight array-like or None

Per-row weights forwarded to sklearn.metrics.r2_score.

None

Returns:

Type Description
float

$R^2$ of the predictions.

Examples:

>>> reg.score(X_test, y_test)
0.87

GAMClassifier dataclass

GAMClassifier(
    formula: str,
    family: str = "auto",
    offset: str | None = None,
    weights: str | None = None,
    config: dict[str, Any] | None = None,
)

Bases: _BaseGAMEstimator, ClassifierMixin

scikit-learn-compatible binary classifier wrapping gamfit.fit.

Same construction and fit semantics as GAMRegressor (see that class for parameter documentation). Predictions interpret the model's mean as the probability of the positive class; classes are fixed to [0, 1] and a threshold of 0.5 is used by predict.

Examples:

>>> from gamfit.sklearn import GAMClassifier
>>> clf = GAMClassifier(formula="y ~ s(x1) + s(x2)", family="binomial")
>>> clf.fit(X_train, y_train)
>>> clf.predict_proba(X_test)[:1]
array([[0.34, 0.66]])

fit

fit(X: Any, y: Any = None) -> 'GAMClassifier'

Fit the binary GAM classifier and return self.

Parameters:

Name Type Description Default
X Any

Training table. See GAMRegressor.fit for accepted forms.

required
y str, array-like, or None

Binary target. See GAMRegressor.fit for accepted forms.

None

Returns:

Type Description
GAMClassifier

Fitted estimator (self) with classes_ set to [0, 1].

Examples:

>>> GAMClassifier(formula="y ~ s(x)", family="binomial").fit(df, y="y")

predict_proba

predict_proba(X: Any) -> np.ndarray

Predict class probabilities for each row in X.

Parameters:

Name Type Description Default
X Any

Serving table with the feature columns seen at fit time.

required

Returns:

Type Description
ndarray

Two-column float array [[P(y=0), P(y=1)], ...], clipped to [0, 1].

Examples:

>>> clf.predict_proba(X_test).shape
(100, 2)

predict

predict(X: Any) -> np.ndarray

Predict the binary class label using a 0.5 threshold on the positive class.

Parameters:

Name Type Description Default
X Any

Serving table with the feature columns seen at fit time.

required

Returns:

Type Description
ndarray

One-dimensional integer array of class labels (0 or 1).

Examples:

>>> clf.predict(X_test)[:5]
array([1, 0, 1, 1, 0])
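
The relationship between the model mean, predict_proba, and predict described above amounts to the following. The helper names are hypothetical, and assigning a probability of exactly 0.5 to class 1 is an assumption, since the tie-breaking rule is not specified here.

```python
def proba_from_mean(means):
    """Two-column probabilities [[P(y=0), P(y=1)], ...] from predicted means."""
    rows = []
    for mu in means:
        # Clip the mean into [0, 1] before treating it as P(y=1).
        p1 = min(1.0, max(0.0, float(mu)))
        rows.append([1.0 - p1, p1])
    return rows


def labels_from_proba(proba, threshold=0.5):
    """Class labels from the positive-class column."""
    # Assumption: a probability equal to the threshold maps to class 1.
    return [1 if row[1] >= threshold else 0 for row in proba]


proba = proba_from_mean([0.66, 1.2, -0.1])
labels = labels_from_proba(proba)
```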

score

score(X: Any, y: Any, sample_weight: Any = None) -> float

Return classification accuracy.

Parameters:

Name Type Description Default
X Any

Test feature table.

required
y array-like

True binary labels.

required
sample_weight array-like or None

Per-row weights forwarded to sklearn.metrics.accuracy_score.

None

Returns:

Type Description
float

Accuracy in [0, 1].

Examples:

>>> clf.score(X_test, y_test)
0.91

Exceptions

GamError

Bases: Exception

Base class for Python-facing GAM errors.

All gamfit-specific exceptions raised by the Python binding inherit from GamError, so catching this class is the broadest way to handle a failure originating from the Rust engine or the binding layer.

Examples:

>>> try:
...     gamfit.fit(df, "y ~ s(x)")
... except gamfit.GamError as exc:
...     print(gamfit.explain_error(exc))

FormulaError

Bases: GamError

The formula is invalid or unsupported.

Raised when the Wilkinson-style formula string cannot be parsed, references columns missing from the input table, or describes a model the engine does not support.

Examples:

>>> try:
...     gamfit.fit(df, "y ~ s(nope)")
... except gamfit.FormulaError as exc:
...     print(exc)

SchemaMismatchError

Bases: GamError

Prediction input does not match the training schema.

Raised when the table passed to Model.predict or related methods lacks columns that were present at fit time, has incompatible dtypes, or introduces unknown categorical levels.

Examples:

>>> try:
...     model.predict(serving_df)
... except gamfit.SchemaMismatchError as exc:
...     print(model.check(serving_df))

PredictionError

Bases: GamError

Prediction failed.

Raised for runtime failures during prediction that are not pure schema problems (numerical issues, unsupported prediction modes for the fitted model, etc.).

Examples:

>>> try:
...     model.predict(test_df)
... except gamfit.PredictionError as exc:
...     print(gamfit.explain_error(exc))

RustExtensionUnavailableError

Bases: ImportError

Raised when the compiled gamfit._rust extension cannot be imported.

The Rust engine ships as a maturin-built extension module. When it is missing (typical in a fresh source checkout that has not been built yet), every Rust-backed API in gamfit raises this error eagerly so users see a single, actionable message instead of an opaque ImportError.

The fix is to build or install the package, e.g. maturin develop from the gamfit source tree, or pip install gamfit from PyPI.

Examples:

>>> try:
...     gamfit.fit(df, "y ~ s(x)")
... except gamfit.RustExtensionUnavailableError as exc:
...     print("build the extension first:", exc)
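
One consequence of the base classes listed in this section is that `except GamError` does not catch RustExtensionUnavailableError, which derives from ImportError. The sketch below uses local stand-in classes that mirror the documented bases (they are not imported from gamfit) to show a handler ordering that goes from most specific to broadest; the classify helper is hypothetical.

```python
# Local stand-ins mirroring the documented inheritance, for illustration only.
class GamError(Exception):
    pass


class FormulaError(GamError):
    pass


class SchemaMismatchError(GamError):
    pass


class PredictionError(GamError):
    pass


class RustExtensionUnavailableError(ImportError):
    pass


def classify(exc):
    """Route an exception the way a layered try/except would."""
    try:
        raise exc
    except FormulaError:
        return "formula"
    except SchemaMismatchError:
        return "schema"
    except PredictionError:
        return "prediction"
    except GamError:
        # Broadest gamfit-specific handler; must come after the subclasses.
        return "other gamfit error"
    except ImportError:
        # The missing-extension error lands here, not in the GamError branch.
        return "extension not built"
```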

Response geometry

ResponseGeometryModel dataclass

ResponseGeometryModel(
    models: Sequence[Any],
    response_geometry: str,
    response_columns: tuple[str, ...],
    base_point: Any,
    coordinates: str,
    reference: int = -1,
    training_table_kind: str | None = None,
    shared_tangent_fit: SharedGaussianRemlTangentFit | None = None,
)

A fitted response-geometry GAM with shared smoothing across tangent coordinates.

clr

clr(values: Any) -> Any

Centered log-ratio coordinates for positive compositions.

alr

alr(values: Any, *, reference: int = -1) -> Any

Additive log-ratio coordinates for positive compositions.

closure

closure(values: Any) -> Any

Normalize rows onto the probability simplex.
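
For reference, the standard compositional-data definitions behind closure, clr, and alr are sketched below for a single composition (one row). The functions here are plain-Python stand-ins with the same names, not the gamfit implementations, and they assume strictly positive inputs; reference=-1 is read as "last component", matching the documented default.

```python
import math


def closure(x):
    """Rescale a positive vector so its parts sum to 1."""
    total = sum(x)
    return [v / total for v in x]


def clr(x):
    """Centered log-ratio: log parts minus their mean log."""
    logs = [math.log(v) for v in x]
    center = sum(logs) / len(logs)
    return [value - center for value in logs]


def alr(x, reference=-1):
    """Additive log-ratio: log of each part over the reference part."""
    ref = reference % len(x)  # -1 selects the last component
    log_ref = math.log(x[ref])
    return [math.log(v) - log_ref for i, v in enumerate(x) if i != ref]


composition = [1.0, 2.0, 4.0]
clr_coords = clr(composition)   # sums to zero, same length as the input
alr_coords = alr(composition)   # one coordinate fewer than the input
```

Both transforms are invariant to rescaling the input, so applying closure first does not change the coordinates.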

simplex_frechet_mean

simplex_frechet_mean(values: Any, weights: Any | None = None) -> Any

Intrinsic Fréchet mean under Aitchison simplex geometry.
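
Because the Aitchison distance is the Euclidean distance between clr coordinates, the Fréchet mean on the simplex has a closed form: the closure of the (weighted) component-wise geometric mean. The sketch below illustrates that identity; the function name is hypothetical and the code is not taken from gamfit.

```python
import math


def simplex_frechet_mean_sketch(rows, weights=None):
    """Closed weighted geometric mean of strictly positive compositions."""
    n = len(rows)
    w = [1.0] * n if weights is None else [float(v) for v in weights]
    total = sum(w)
    d = len(rows[0])
    # Weighted mean of the logs, component by component.
    mean_log = [
        sum(wi * math.log(r[j]) for wi, r in zip(w, rows)) / total
        for j in range(d)
    ]
    g = [math.exp(v) for v in mean_log]
    s = sum(g)
    # Closure maps the geometric mean back onto the simplex.
    return [v / s for v in g]


center = simplex_frechet_mean_sketch([[0.2, 0.8], [0.8, 0.2]])
```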

sphere_frechet_mean

sphere_frechet_mean(
    values: Any,
    weights: Any | None = None,
    *,
    tol: float = 1e-12,
    max_iter: int = 256,
) -> Any

Intrinsic Fréchet/Karcher mean on the unit sphere.

If the minimizer is not unique, as for an exactly antipodal pair, this returns one deterministic minimizer rather than an endpoint surrogate.
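
A common way to compute a Karcher mean on the sphere is a fixed-point iteration using the log and exp maps; the sketch below shows that scheme with the same tol and max_iter defaults. It is an illustration under assumptions, not the library's algorithm: the initial guess (normalized weighted Euclidean mean) is a choice made here, and the sketch does not implement the deterministic handling of non-unique minimizers described above.

```python
import math


def _norm(v):
    return math.sqrt(sum(c * c for c in v))


def _normalize(v):
    n = _norm(v)
    return [c / n for c in v]


def _log_map(p, q):
    """Tangent vector at p pointing toward q, with geodesic length."""
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(p, q))))
    theta = math.acos(dot)
    perp = [b - dot * a for a, b in zip(p, q)]
    n = _norm(perp)
    if n < 1e-15:
        return [0.0] * len(p)  # q coincides with p (antipodes are not handled)
    return [theta * c / n for c in perp]


def _exp_map(p, v):
    """Walk from p along the geodesic with initial velocity v."""
    n = _norm(v)
    if n < 1e-15:
        return list(p)
    return _normalize(
        [math.cos(n) * a + math.sin(n) * c / n for a, c in zip(p, v)]
    )


def karcher_mean(points, weights=None, tol=1e-12, max_iter=256):
    pts = [_normalize(p) for p in points]
    w = [1.0] * len(pts) if weights is None else [float(x) for x in weights]
    total = sum(w)
    dim = len(pts[0])
    # Assumed initial guess: normalized weighted Euclidean mean.
    eu = [sum(wi * p[j] for wi, p in zip(w, pts)) / total for j in range(dim)]
    mean = _normalize(eu) if _norm(eu) > 1e-12 else list(pts[0])
    for _ in range(max_iter):
        # Weighted average of the log maps is the descent direction.
        step = [0.0] * dim
        for wi, p in zip(w, pts):
            lv = _log_map(mean, p)
            step = [s + wi * c / total for s, c in zip(step, lv)]
        if _norm(step) < tol:
            break
        mean = _exp_map(mean, step)
    return mean


mean = karcher_mean([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], weights=[3.0, 1.0])
```

With weights 3 and 1 on two orthogonal unit vectors, the intrinsic mean sits at an angle of π/8 from the first point, which differs from the direction of the weighted Euclidean average.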