scikit-learn integration¶
gamfit.sklearn exposes two scikit-learn estimators that wrap
gamfit.fit:
GAMRegressor(inheritsRegressorMixin) — continuous responses.GAMClassifier(inheritsClassifierMixin) — binary classification.
Install with pip install gamfit[sklearn].
GAMRegressor¶
from gamfit.sklearn import GAMRegressor
import pandas as pd
import numpy as np
X = pd.DataFrame({"x": np.linspace(0, 10, 50)})
y = 2 * X["x"] + np.random.normal(0, 0.5, len(X))
est = GAMRegressor(formula="y ~ s(x)")
est.fit(X, y)
preds = est.predict(X) # ndarray, shape (n,)
r2 = est.score(X, y) # r2_score
Constructor¶
GAMRegressor(
formula: str,
family: str = "auto",
offset: str | None = None,
weights: str | None = None,
config: dict[str, Any] | None = None,
)
All five arguments are surfaced as get_params() keys, so they work with
GridSearchCV and related utilities.
Binding the response¶
If the formula contains ~, the LHS column is the response. If y is an
array-like, it is bound to X under the response name implied by the
formula (defaulting to y). If y is a string, it names a column already
present in X. If y is None, X must already contain the response.
GAMRegressor(formula="y ~ s(x)").fit(X, y) # array y
GAMRegressor(formula="y ~ s(x)").fit(df) # df contains "y"
GAMRegressor(formula="y ~ s(x)").fit(df, y="y") # name a column
If the formula has no ~, gamfit prepends <target> ~.
Methods¶
| Method | Returns | Notes |
|---|---|---|
fit(X, y=None) |
self |
Sets model_, formula_, feature_names_in_, n_features_in_. |
predict(X) |
ndarray (n,) |
Predicted mean. |
score(X, y, sample_weight=None) |
float |
r2_score. |
summary() |
Summary |
Delegates to model_.summary(). |
check(X) |
SchemaCheck |
Delegates to model_.check(). Scalar models only. |
report(path) |
str |
Delegates to model_.report(path). Scalar models only. |
The fitted Model is available at est.model_ for access to the full
gamfit.Model API (sample, predict(..., interval=...), etc.).
GAMClassifier¶
from gamfit.sklearn import GAMClassifier
est = GAMClassifier(formula="y ~ s(x)", family="binomial")
est.fit(X, y)
probs = est.predict_proba(X) # (n, 2): [P(y=0), P(y=1)]
hard = est.predict(X) # (n,) int, threshold 0.5
acc = est.score(X, y) # accuracy_score
classes_ is np.array([0, 1]) after fit(). predict_proba() clips the
positive-class probability to [0, 1] and stacks [1 - p, p]. The
threshold for predict() is hardcoded to 0.5.
Pipeline¶
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
pipe = Pipeline([
("scaler", StandardScaler()),
("gam", GAMRegressor(formula="y ~ s(x0) + s(x1)")),
])
pipe.fit(X, y)
preds = pipe.predict(X_test)
The GAM step accepts a pandas.DataFrame, a numpy array, a dict of
columns, or a list of records.
Cross-validation¶
from sklearn.model_selection import cross_val_score
scores = cross_val_score(
GAMRegressor(formula="y ~ s(x)"),
X, y, cv=5, scoring="r2",
)
Grid search¶
from sklearn.model_selection import GridSearchCV
grid = GridSearchCV(
GAMRegressor(formula="y ~ s(x)"),
param_grid={
"formula": ["y ~ s(x)", "y ~ s(x, k=10)", "y ~ s(x, k=20)"],
},
cv=5,
)
grid.fit(X, y)
No survival wrapper¶
There is no sklearn wrapper for survival models. Surv(...) responses do
not match sklearn's (X, y) contract, and survival prediction produces a
per-time-grid surface rather than a single vector. Call gamfit.fit(...)
directly for survival; see survival.md.