Hadley Wickham ((???)) this week mentioned on Twitter his preference for saveRDS() over the more familiar save(). Being a new function to me, I thought I’d take a look…

save() and load() will be familiar to many R users. They allow you to save a named R object to a file or other connection and restore that object again. When loaded the named object is restored to the current environment (in general use this is the global environment — the workspace) with the same name it had when saved. This is annoying for example when you have a saved model object resulting from a previous fit and you want to compare it with the model object returned when the R code is rerun. Unless you change the name of the model fit object in your script you can’t have both the saved object and the newly created one available in the same environment at the same time.

Here’s an example of what I mean.

> require(mgcv)
Loading required package: mgcv
This is mgcv 1.7-13. For overview type 'help("mgcv-package")'.
> mod <- gam(Ozone ~ s(Wind), data = airquality, method = "REML")
> mod

Family: gaussian
Link function: identity

Formula:
Ozone ~ s(Wind)

Estimated degrees of freedom:
3.529  total = 4.529002

REML score: 529.4881
> save(mod, file = "mymodel.rda")
> ls()
[1] "mod"
> load(file = "mymodel.rda")
> ls()
[1] "mod"

saveRDS() provides a far better solution to this problem and to the general one of saving and loading objects created with R. saveRDS() serializes an R object into a format that can be saved. Wikipedia describes this thus

…serialization is the process of converting a data structure or object state into a format that can be stored (for example, in a file or memory buffer, or transmitted across a network connection link) and “resurrected” later in the same or another computer environment.

save() does the same thing, but with one important difference; saveRDS() doesn’t save the both the object and its name it just saves a representation of the object. As a result, the saved object can be loaded into a named object within R that is different from the name it had when originally serialized.

We can illustrate this using the model fitted earlier

> ls()
[1] "mod"
> saveRDS(mod, "mymodel.rds")
> mod2 <- readRDS("mymodel.rds")
> ls()
[1] "mod"  "mod2"
> identical(mod, mod2, ignore.environment = TRUE)
[1] TRUE

(Note that the two model objects have different environments within their representations so we have to ignore this when testing their identity.)

You’ll notice that in the call to saveRDS() I named the file with the extension .rds. This appears to be the convention used for serialized object of this sort; R uses this representation often, for example package meta-data and the databases used by help.search(). In contrast the extension .rda is often used for objects serialized via save().

So there you have it; saveRDS() and readRDS() are the newest additions to my day-to-day workflow.

Update: As Gabor points out in the comments saveRDS() isn’t a drop-in replacement for save(). The main difference is that save() can save many objects to a file in a single call, whilst saveRDS(), being a lower-level function, works with a single object at a time. This is a feature for me given the above use-case, but if you find yourself saving any more than a couple of objects at a time saveRDS() may not be ideal for you. The second significant difference is that saveRDS() forgets the original name of the object; in the use-case above this is also seen as an advantage. If maintaining the original name is important to you then there is no reason to switch from using save() to saveRDS().


Comments

comments powered by Disqus