Survival models¶

gamfit fits left-truncated, right-censored survival data with smooth covariate effects. The response is Surv(entry, exit, event) and the likelihood mode controls how the baseline and covariate effects are parameterised.

The `Surv(...)` response¶

gamfit.fit(df, "Surv(entry, exit, event) ~ age + s(bmi)")

The three arguments are column names:

entry: left-truncation time. Use 0 if there is no truncation.
exit: observation time (event time or censoring time).
event: 1 if the event occurred at exit, 0 if censored.

All three columns must be numeric. exit >= entry is required per row. The 2-column R/mgcv form Surv(time, status) is rejected with an error message that suggests adding a zero entry column.

Likelihood modes¶

Pass one of the following via survival_likelihood=:

Mode	Description
`"transformation"`	I-spline monotone log-cumulative-hazard baseline with linear or smooth covariate effects. Default.
`"weibull"`	Weibull parametric baseline with linear covariate effects on the log hazard.
`"location-scale"`	Joint location and log-scale survival model; requires a `noise_formula`. See location-scale.md.
`"marginal-slope"`	Separates a calibrated risk-score effect from the baseline. See marginal-slope.md.
`"latent"`	Parametric baseline with latent-Gaussian frailty integration.
`"latent-binary"`	Binary response under the same latent-Gaussian framework as `"latent"`. Incompatible with `--predict-noise`.

gamfit.fit(df,
    "Surv(t0, t1, event) ~ s(age) + bmi",
    survival_likelihood="transformation",
)

Parametric baselines¶

For modes that support a parametric baseline ("weibull", "location-scale", "latent", and "transformation" when used with timewiggle(...)), select it with baseline_target=:

`baseline_target`	Required extra parameters	Notes
`"linear"`	none	I-spline monotone log-cumulative-hazard baseline.
`"weibull"`	`baseline_scale > 0`, `baseline_shape > 0`	Monotone hazard.
`"gompertz"`	`baseline_rate > 0`	Exponentially-rising hazard.
`"gompertz-makeham"`	`baseline_rate > 0`, `baseline_makeham > 0`	Gompertz hazard plus a constant additive floor.

gamfit.fit(df,
    "Surv(entry, exit, event) ~ s(bmi)",
    survival_likelihood="latent",
    baseline_target="gompertz",
    baseline_rate=0.08,
)

`timewiggle` for flexible baseline departures¶

timewiggle(...) adds a spline offset to a non-linear scalar baseline:

Surv(entry, exit, event) ~ s(bmi) + timewiggle(internal_knots=8)

It accepts the same options as linkwiggle(...): internal_knots, degree, penalty_order, and double_penalty. With survival_likelihood="transformation", set baseline_target to "weibull", "gompertz", or "gompertz-makeham" when using timewiggle(...).

Frailty¶

frailty_kind= enables a latent random effect:

`frailty_kind`	Effect
`"gaussian-shift"`	Additive Gaussian shift on the linear predictor.
`"hazard-multiplier"`	Multiplicative log-normal frailty on the hazard.

gamfit.fit(df,
    "Surv(entry, exit, event) ~ s(age) + bmi",
    survival_likelihood="latent",
    baseline_target="gompertz",
    baseline_rate=0.08,
    frailty_kind="hazard-multiplier",
    hazard_loading="full",
)

frailty_sd: fix the frailty standard deviation. Required for gaussian-shift (learnable sigma is not implemented for the exact marginal-slope outer solver) and for some other modes; omit to let hazard-multiplier latent models learn it where supported.
hazard_loading: only used with frailty_kind="hazard-multiplier". "full" loads frailty into every observation; "loaded-vs-unloaded" splits observations into two regimes.

Survival marginal-slope accepts only frailty_kind="gaussian-shift" with a fixed frailty_sd; "hazard-multiplier" is rejected at fit time.

Prediction¶

Model.predict(...) returns a SurvivalPrediction that evaluates the survival surface on a user-supplied time grid:

pred = model.predict(test_df)

S = pred.survival_at([1, 5, 10, 20])
F = pred.failure_at([10, 20])
h = pred.hazard_at([1, 5, 10, 20])
H = pred.cumulative_hazard_at([10, 20])

For dense surfaces on large cohorts use the chunked iterators or stream to CSV:

for row_slice, time_slice, block in pred.survival_at_chunks([1, 5, 10, 20]):
    process(block)

pred.write_survival_at_csv("surv.csv", times=[1, 5, 10, 20])

For competing risks, predict each cause-specific endpoint and assemble CIFs on the same grid:

cif = gamfit.competing_risks_cif(
    {"disease": disease_pred, "death": death_pred},
    times=[1, 5, 10, 20],
)

disease_cif = cif.cif[0]
overall_survival = cif.overall_survival

Uncertainty on the survival surface¶

For location-scale survival, with_uncertainty=True produces delta-method standard errors:

pred = model.predict(test_df, with_uncertainty=True)

S = pred.survival_at([1, 5, 10])
se_S = pred.survival_se_at([1, 5, 10])

upper = (S + 1.96 * se_S).clip(0.0, 1.0)
lower = (S - 1.96 * se_S).clip(0.0, 1.0)

For other survival modes, use Model.sample(...) to draw posterior coefficients, then push them through PosteriorSamples.predict(...) / predict_draws(...). Those methods are restricted to standard, non-link-wiggle GAMs; see posterior-sampling.md.

Paired competing-risks posterior CIF¶

For two cause-specific fits, sample_paired(...) aligns draw k from the target-cause fit with draw k from the competing-cause fit. CIF integration and equal-tailed bands are computed in the Rust engine from those paired draws:

disease_post = disease_model.sample_paired(
    death_model,
    train_df,
    samples=1000,
    chains=4,
    seed=42,
)

times = [0, 1, 5, 10, 20]
cif = disease_post.cumulative_incidence(test_df, times, level=0.95)

cif.draws   # (n_draws, n_rows, n_times)
cif.mean    # (n_rows, n_times)
cif.lower
cif.upper

Pass competing_data= to sample_paired(...) when the two fits need different training tables.

Example¶

import gamfit
import pandas as pd

df = pd.DataFrame({
    "entry": [0, 0, 0, 5, 5, 0],
    "exit":  [12, 8, 30, 22, 14, 15],
    "event": [1, 0, 1, 1, 0, 1],
    "age":   [55, 60, 45, 70, 50, 65],
    "bmi":   [24, 31, 22, 28, 26, 30],
})

model = gamfit.fit(df,
    "Surv(entry, exit, event) ~ s(age) + s(bmi) + timewiggle(internal_knots=6)",
    survival_likelihood="transformation",
    baseline_target="weibull",
)

grid_df = pd.DataFrame({"age": [50, 60, 70], "bmi": [25, 27, 29]})
pred = model.predict(grid_df)
print(pred.survival_at([1, 5, 10, 20]))

Marginal-slope for risk scores¶

When a standardised risk score (e.g. a polygenic score) has an effect that varies across covariate space, survival_likelihood="marginal-slope" together with z_column= and logslope_formula= models the score's spatially-varying effect separately from the baseline. See marginal-slope.md.