Skip to content

scikit-learn integration

gamfit.sklearn exposes two scikit-learn estimators that wrap gamfit.fit:

  • GAMRegressor (inherits RegressorMixin) — continuous responses.
  • GAMClassifier (inherits ClassifierMixin) — binary classification.

Install with pip install gamfit[sklearn].

GAMRegressor

from gamfit.sklearn import GAMRegressor
import pandas as pd
import numpy as np

X = pd.DataFrame({"x": np.linspace(0, 10, 50)})
y = 2 * X["x"] + np.random.normal(0, 0.5, len(X))

est = GAMRegressor(formula="y ~ s(x)")
est.fit(X, y)

preds = est.predict(X)        # ndarray, shape (n,)
r2    = est.score(X, y)       # r2_score

Constructor

GAMRegressor(
    formula: str,
    family: str = "auto",
    offset: str | None = None,
    weights: str | None = None,
    config: dict[str, Any] | None = None,
)

All five arguments are surfaced as get_params() keys, so they work with GridSearchCV and related utilities.

Binding the response

If the formula contains ~, the LHS column is the response. If y is an array-like, it is bound to X under the response name implied by the formula (defaulting to y). If y is a string, it names a column already present in X. If y is None, X must already contain the response.

GAMRegressor(formula="y ~ s(x)").fit(X, y)        # array y
GAMRegressor(formula="y ~ s(x)").fit(df)          # df contains "y"
GAMRegressor(formula="y ~ s(x)").fit(df, y="y")   # name a column

If the formula has no ~, gamfit prepends <target> ~.

Methods

Method Returns Notes
fit(X, y=None) self Sets model_, formula_, feature_names_in_, n_features_in_.
predict(X) ndarray (n,) Predicted mean.
score(X, y, sample_weight=None) float r2_score.
summary() Summary Delegates to model_.summary().
check(X) SchemaCheck Delegates to model_.check(). Scalar models only.
report(path) str Delegates to model_.report(path). Scalar models only.

The fitted Model is available at est.model_ for access to the full gamfit.Model API (sample, predict(..., interval=...), etc.).

GAMClassifier

from gamfit.sklearn import GAMClassifier

est = GAMClassifier(formula="y ~ s(x)", family="binomial")
est.fit(X, y)

probs = est.predict_proba(X)   # (n, 2): [P(y=0), P(y=1)]
hard  = est.predict(X)         # (n,) int, threshold 0.5
acc   = est.score(X, y)        # accuracy_score

classes_ is np.array([0, 1]) after fit(). predict_proba() clips the positive-class probability to [0, 1] and stacks [1 - p, p]. The threshold for predict() is hardcoded to 0.5.

Pipeline

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("gam",    GAMRegressor(formula="y ~ s(x0) + s(x1)")),
])
pipe.fit(X, y)
preds = pipe.predict(X_test)

The GAM step accepts a pandas.DataFrame, a numpy array, a dict of columns, or a list of records.

Cross-validation

from sklearn.model_selection import cross_val_score

scores = cross_val_score(
    GAMRegressor(formula="y ~ s(x)"),
    X, y, cv=5, scoring="r2",
)
from sklearn.model_selection import GridSearchCV

grid = GridSearchCV(
    GAMRegressor(formula="y ~ s(x)"),
    param_grid={
        "formula": ["y ~ s(x)", "y ~ s(x, k=10)", "y ~ s(x, k=20)"],
    },
    cv=5,
)
grid.fit(X, y)

No survival wrapper

There is no sklearn wrapper for survival models. Surv(...) responses do not match sklearn's (X, y) contract, and survival prediction produces a per-time-grid surface rather than a single vector. Call gamfit.fit(...) directly for survival; see survival.md.