Rootograms a new way to assess count models
07 June 2016 /posted in: R
Assessing the fit of a count regression model is not necessarily a straightforward enterprise; often we just look at residuals, which invariably contain patterns of some form due to the discrete nature of the observations, or we plot observed versus fitted values as a scatter plot. Recently, while perusing the latest statistics offerings on ArXiv I came across Kleiber and Zeileis (2016) who propose the rootogram as an improved approach to the assessment of fit of a count regression model. The paper is illustrated using R and the authors’ countreg package (currently on R-Forge only). Here, I thought I’d take a quick look at the rootogram with some simulated species abundance data.
Kleiber, Christian, and Achim Zeileis. 2016. “Visualizing Count Data Regressions Using Rootograms,” 4~may.
Harvesting more Canadian climate data
24 May 2016 /posted in: R
A while back I wrote some code to download climate data from Government of Canada’s historical climate/weather data website for one of our students. In May this year (2016) the Government of Canada changed their website a little and the API code that responded to requests had changed URL and some of the GET parameters had also changed. In fixing those functions I also noted that the original code only downloaded hourly data and not all useful weather variables are recorded hourly; precipitation for example is only in the daily and monthly data formats. This post updates the earlier one, explaining what changed and how the code has been updated. As an added benefit, the functions can now handle downloading daily and monthly data files as well as the hourly files that the original could handle.
A new default plot for multivariate dispersions tribulations of base graphics programming
17 April 2016 /posted in: R
This weekend, prompted by a pull request from Michael Friendly, I finally got round to improving the
plot method for
betadisper() in the vegan package.
betadisper() is an implementation of Marti Anderson’s Permdisp method, a multivariate analogue of Levene’s test for homogeneity of variances. In improving the default plot and allowing customisation of plot features, I was reminded of how much I dislike programming plot functions that use base graphics. But don’t worry, this isn’t going to degenerate into a ggplot love-in nor a David Robinson-esque dig at Jeff Leek.
10 April 2016 /posted in: Science
It’s fair to say I have gotten a bee1 in my bonnet about how palaeolimnologists handle time. For a group of people for whom time is everything, we sure do a poor job (in general) of dealing with it in when it comes time to analyse our data. In many instances, “poor job” means making no attempt at all to account for the special nature of the time series. LOESS comes in for particular criticism because it is widely used by palaeolimnologists despite not being particularly suited to the task. Why this is so is perhaps due to it’s promotion in influential books, papers, and software. I am far from innocent in this regard having taught LOESS and it’s use for many years on the now-defunct ECRC Numerical Course. Here I want to look at further problems with our use of LOESS, and will argue that we need to resign it to the trash can for all but exploratory analyses. I will begin the case for the prosecution with one of my own transgressions.
an entire hive is perhaps more apt!↩
Soap-film smoothers & lake bathymetries
27 March 2016 /posted in: R
A number of years ago, whilst I was still working at ENSIS, the consultancy arm of the ECRC at UCL, I worked on a project for the (then) Countryside Council for Wales (CCW; now part of Natural Resources Wales). I don’t recall why they were doing this project, but we were tasked with producing a standardised set of bathymetric maps for Welsh lakes. The brief called for the bathymetries to be provided in standard GIS formats. Either CCW’s project manager or the project lead at ENSIS had proposed to use inverse distance weighting (IWD) to smooth the point bathymetric measurements. This probably stemmed from the person that initiatied our bathymetric programme at ENSIS being a GIS wizard, schooled in the ways of ArcGIS. My involvement was mainly data processing of the IDW results. I was however, at the time, also somewhat familiar with the problem of finite area smoothing1 and had read a paper of Simon Wood’s on his then new soap-film smoother (Wood, Bravington, and Hedley 2008). So, as well as writing scripts to process and present the IDW-based bathymetry data in the report, I snuck a task into the work programme that allowed me to investigate using soap-film smoothers for modelling lake bathymetric data. The timing was never great to write up this method (two children and a move to Canada have occurred since the end of this project), so I’ve not done anything with the idea. Until now…
Wood, Simon N, Mark V Bravington, and Sharon L Hedley. 2008. “Soap Film Smoothing.” Journal of the Royal Statistical Society. Series B, Statistical Methodology 70 (5). Blackwell Publishing Ltd: 931–55. doi:http://doi.org/10.1111/j.1467-9868.2008.00665.x.
smoothing over a domain with known boundaries, like a lake↩
Additive modelling global temperature time series: revisited
25 March 2016 /posted in: R
Quite some time ago, back in 2011, I wrote a post that used an additive model to fit a smooth trend to the then-current Hadley Centre/CRU global temperature time series data set. Since then the media and scientific papers have been full of reports of record warm temperatures in the past couple of years, of controversies (imagined) regarding data-changes to suit the hypothesis of human induce global warming, and the brouhaha over whether global warming had stalled; the great global warming hiatus or pause. So it seemed like a good time to revisit that analysis and update it using the latest HadCRUT data.
Better use of transfer functions?
16 December 2015 /posted in: Science
Transfer functions have had a bit of a hard time of late following Steve Juggins (2013) convincing demonstration that 1) secondary gradients can influence your model, and 2) that variation down-core in a secondary variable can induce a signal in the thing being reconstructed. This was followed up by further comment on diatom-TP reconstructions (Juggins et al. 2013), and not to be left out, chironomid transfer functions have come in from some heat, if the last (that I went to) IPS meeting was any indication. In a session at the 2015 Fall Meeting of the AGU, my interest was piqued by Yarrow Axford’s talk using chironomid temperature reconstructions, but not for the reasons you might be thinking.
Juggins, Steve. 2013. “Quantitative Reconstructions in Palaeolimnology: New Paradigm or Sick Science?” Quaternary Science Reviews 64 (0): 20–32. doi:http://doi.org/10.1016/j.quascirev.2012.12.014.
Juggins, Steve, N John Anderson, Joy M Ramstack Hobbs, and Adam J Heathcote. 2013. “Reconstructing Epilimnetic Total Phosphorus Using Diatoms: Statistical and Ecological Constraints.” Journal of Paleolimnology 49 (3). Springer Netherlands: 373–90. doi:http://doi.org/10.1007/s10933-013-9678-x.
AGU Fall Meeting 2015
14 December 2015 /posted in: Science
My poster, Rapid ecological change in lake ecosystems (GC13G-1236) in the Sedimentary records of threshold change (GC13G Moscone South Poster Hall 1340–1800, monday 14th December) describes some of my recent research into methods to analyse palaeoenvironmental time series from sediment cores. Using data from a varved lake, Baldeggersee, Switzerland, I use location scale generalised additive models to simultaneously model the mean (trend) and the variance of a time series of diatom counts. Wavelets were used to investigate further variation in species dynamics during the well-documented history of eutrophication at the lake.
Are some seasons warming more than others?
23 November 2015 /posted in: R
I ended the last post with some pretty plots of air temperature change within and between years in the Central England Temperature series. The elephant in the room1 at the end of that post was is the change in the within year (seasonal) effect over time statistically significant? This is the question I’ll try to answer, or at least show how to answer, now.
well, one of the elephants; I also wasn’t happy with the AR(7) for the residuals↩
Climate change and spline interactions
21 November 2015 /posted in: R
In a series of irregular posts1 I’ve looked at how additive models can be used to fit non-linear models to time series. Up to now I’ve looked at models that included a single non-linear trend, as well as a model that included a within-year (or seasonal) part and a trend part. In this trend plus season model it is important to note that the two terms are purely additive; no matter which January you are predicting for in a long timeseries, the seasonal effect for that month will always be the same. The trend part might shift this seasonal contribution up or down a bit, but all January’s are the same. In this post I want to introduce a different type of spline interaction model that will allow us to relax this additivity assumption and fit a model that allows the seasonal part of the model to change in time along with the trend.