# Introducing gratia

I use generalized additive models (GAMs) in my research work. I use them a lot! Simon Wood’s **mgcv** package is an excellent set of software for specifying, fitting, and visualizing GAMs for very large data sets. Despite recently dabbling with **brms**, **mgcv** is still my go-to GAM package. The only down-side to **mgcv** is that it is not very tidy-aware and the **ggplot**-verse may as well not exist as far as it is concerned. This in itself is no bad thing, though as someone who uses **mgcv** a lot but also prefers to do my plotting with **ggplot2**, this lack of awareness was starting to hurt. So, I started working on something to help bridge the gap between these two separate worlds that I inhabit. The fruit of that labour is **gratia**, and development has progressed to the stage where I am ready to talk a bit more about it.

**gratia** is an R package for working with GAMs fitted with `gam()`

, `bam()`

or `gamm()`

from **mgcv** or `gamm4()`

from the **gamm4** package, although functionality for handling the latter is not yet implement. **gratia** provides functions to replace the base-graphics-based `plot.gam()`

and `gam.check()`

that **mgcv** provides with **ggplot2**-based versions. Recent changes have also resulted in **gratia** being much more **tidyverse** aware and it now (mostly) returns outputs as tibbles.

In this post I wanted to give a flavour of what is currently possible with **gratia** and outline what still needs to be implemented.

**gratia** currently lives on GitHub, so we need to install it from there using `devtools::install_github`

:

To do anything useful with **gratia** we need a GAM and for that we need **mgcv**

and an old favourite example data set

The simulated data in `dat`

are well-studied in GAM-related research and contain a number of covariates — labelled `x0`

through `x3`

— which have, to varying degrees, non-linear relationships with the response. We want to try to recover these relationships by approximating the true relationships between covariate and response using splines. To fit a purely additive model, we use

**mgcv** provides a `summary()`

method that is used to extract information about the fitted GAM

and the `k.check()`

function for checking whether sufficient numbers of basis functions were used in each smooth in the model. (You may not have used `k.check()`

directly — it is called by `gam.check()`

which prints out other diagnostics and also produces four model diagnostic plots, which is one thing that **gratia** provides a replacement for.)

### Plotting smooths

To visualize estimated GAMs, **mgcv** provides the `plot.gam()`

method and the `vis.gam()`

function. **gratia** currently provides a **ggplot2**-based replacement for `plot.gam()`

. Work is on-going to provide `vis.gam()`

-like functionality within **gratia** — see `?gratia::data_slice`

for early work in that direction. In **gratia**, we use the `draw()`

generic to produce **ggplot2**-like plots from objects. To visualize the four estimated smooth functions in the GAM `mod`

, we would use

Internally `draw()`

uses the `plot_grid()`

function from **cowplot** to draw multiple panels on the plot device, and to line up the individual plots.

There’s not an awful lot more you can do with this now, but the at least the plot is reasonably pretty. **gratia** includes tools for working with the underlying smooths represented in `mod`

, and if you wanted to extract most of the data used to build the plot you’d use the `evaluate_smooth()`

function.

### Producing diagnostic plots

The diagnostic plots currently produced by `gam.check()`

can also be produced using **gratia**, with the `appraise()`

function

Each of the four plots is produced via user-accessible function that implements a specific plot. For example, `qq_plot(mod)`

produces the Q-Q plot in the upper left for the figure above, and the `qq_plot.gam()`

method reproduces most of the functionality of `mgcv::qq.gam()`

, including the direct randomization procedure (`method = 'direct'`

, as shown above) and the data simulation procedure (`method = 'simulate'`

) to generate reference quantiles, which typically have better performance for GLM-like models (Augustin et al., 2012).

`draw()`

can also handle many of the more specialized smoothers currently available in **mgcv**. For example, 2D smoothers are represented as `geom_raster()`

surfaces with contours

and factor-smooth-interaction terms, which are the equivalent of random slopes and intercepts for splines, are drawn on a single panel and colour is used to distinguish the different random smooths

### What else can gratia do?

Although still quite early in the planned development cycle, **gratia** can handle most of the smooths that **mgcv** can estimate, including `by`

variable smooths with factor and continuous `by`

variables, random effect smooths (`bs = 're'`

), 2D tensor product smooths, and models with parametric terms.

Smoothers that **gratia** can’t do anything with as yet are Markov random fields (MRFs; `bs = 'mrf'`

), splines on the sphere (SoSs; `bs = 'sos'`

), soap film smoothers (`bs = 'so'`

), and linear functional models with matrix terms.

The package also includes functions for

- calculating across-the-function and simultaneous confidence intervals for smooths via
`confint()`

methods, and - calculating first and second derivatives (of currently only univariate) smooths using finite differences.
`fderiv()`

is the old home for first derivatives of GAM smooths, whilst the new`derivatives()`

function can calculate first and second derivatives using forward (like`fderiv()`

) as well as backward and central finite differences.

There is also are lot of exported functions that make it easier for working with GAMs fitted by **mgcv** and for extracting aspects of the fitted model and the smooths. The exact functionality is still being worked on so be prepared for some of the functions to come and go or change name as I work through ideas and implementations and settle on the interface for the tools that **gratia** will provide for this.

### What can’t gratia do?

I’ve already covered where **gratia** is currently lacking in respect to the types of smoother that **mgcv** can fit. It is also currently lacking in tools for exploring models in more detail, such as the plots of model predictions over slices of covariate space that `vis.gam()`

can produce (though see `gratia::data_slice()`

for functions to create the data needed for such plots.) Nor can **gratia** currently handle smooths of higher than two dimensions. I’d like to add this capability soon as it will make visualizing GAMs fitted to spatio-temporal data much easier then it currently is.

### The future?

Longer term I plan to fill out the types of smoothers that **gratia** can handle to cover all the types that **mgcv** can fit, and add `vis.gam()`

-like functionality and the ability to handle higher dimensional smooths (`plot.gam()`

can now handle 3- or 4-dimensional smooths.)

The ultimate goal of course is to just have `draw()`

work for whatever GAM model you throw at it, and at least have feature parity with `plot.gam()`

and `vis.gam()`

.

As is to be expected for such an early release, there is a lot of stabilization to function names and arguments that needs to happen in **gratia**, and a lot of documentation to be written, including some vignettes. For now, the best way to understand what **gratia** is doing or how it works is to look at the examples on the **gratia** website (built using **pkgdown**) and take a look at the package tests which contain lots of examples of GAM fits and the code to work with them.

I’m very much interested in user feedback, so please do let me know if you have any suggestions for additions or improvements to **gratia**, and if you do use **gratia** and find bugs in the package or GAMs that **gratia** can’t handle I would love to hear from you. You can get in touch via the comments below, or via GitHub Issues.

I would also be remiss if I did not mention Matteo Fasiolo’s excellent **mcgViz** package, which already has extensive capabilities for exploring GAM fits, including some very interesting approaches to handling models of millions of data points or more, which cause data visualization problems.

### References

Augustin, N. H., Sauleau, E.-A., and Wood, S. N. (2012). On quantile quantile plots for generalized linear models. *Computational statistics & data analysis* 56, 2404–2409. doi:10.1016/j.csda.2012.01.026.