# From the bottom of the heap: the musings of a geographer

## Simulating species abundance data with coenocliner

#### 31 July 2014 /posted in: R

Coenoclines are, according to the Oxford Dictionary of Ecology (Allaby 1998), *“gradients of communities (e.g. in a transect from the summit to the base of a hill), reflecting the changing importance, frequency, or other appropriate measure of different species populations”*. In much ecological research, and that of related fields, data on these coenoclines are collected and analyzed in a variety of ways. When developing new statistical methods or when trying to understand the behaviour of existing methods, we often resort to simulating data with known pattern or structure and then torture whatever method is of interest with the simulated data to tease out how well methods work or where they break down. There’s a long history of using computers to simulate species abundance data along coenoclines, but until recently no **R** packages were available that performed coenocline simulation. **coenocliner** was designed to fill this gap, and today the package was released to CRAN.
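To make the idea of simulating along a coenocline concrete, here is a minimal sketch in plain Python rather than with **coenocliner** itself. It assumes a Gaussian species response along a single gradient and Poisson-distributed counts; the function names and the three sets of species parameters are hypothetical and chosen only for illustration.

```python
import math
import random

def gaussian_response(x, optimum, tolerance, height):
    """Expected abundance at gradient position x under a Gaussian response."""
    return height * math.exp(-((x - optimum) ** 2) / (2 * tolerance ** 2))

def poisson_draw(mean, rng):
    """Draw one Poisson count with Knuth's multiplication method (fine for small means)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

rng = random.Random(42)                      # fixed seed so the simulation is repeatable
gradient = [i * 0.5 for i in range(21)]      # 21 sites spaced evenly from 0 to 10

# Hypothetical species parameters: (optimum, tolerance, maximum expected abundance)
species = [(2.0, 1.0, 20.0), (5.0, 1.5, 30.0), (8.0, 0.8, 15.0)]

# Known structure: expected abundance of every species at every site
expected = [[gaussian_response(x, *sp) for sp in species] for x in gradient]

# Simulated data: Poisson noise around that known structure
counts = [[poisson_draw(mu, rng) for mu in row] for row in expected]
```

Because `expected` holds the true pattern, any method applied to `counts` can be judged on how well it recovers that pattern.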

Allaby M. (1998) *A Dictionary of Ecology*, second edition. Oxford University Press.

## Simultaneous confidence intervals for derivatives of splines in GAMs

#### 16 June 2014 /posted in: R

Last time out I looked at one of the complications of time series modelling with smoothers: you have a non-linear trend which may be statistically significant, but it may not be increasing or decreasing everywhere. How do we identify where in the series the data are changing? In that post I explained how we can use the first derivatives of the model splines for this purpose, and used the method of finite differences to estimate them. To assess statistical significance of the derivative (the rate of change) I relied upon asymptotic normality and the usual pointwise confidence interval. That interval is fine if looking at just one point on the spline (not of much practical use), but when considering more points at once we have a multiple comparisons issue. Instead, a simultaneous interval is required, and for that we need to revisit a technique I blogged about a few years ago: posterior simulation from the fitted GAM.
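The step from a pointwise to a simultaneous interval can be sketched in a few lines of plain Python. This is an illustration under loud assumptions, not the **mgcv** workflow: the "posterior draws" here are independent normal deviations around the fitted curve, whereas draws from a real fitted GAM would be correlated across evaluation points. The function name `simultaneous_critical_value` is hypothetical.

```python
import random

def simultaneous_critical_value(simulated, se, level=0.95):
    """Multiplier m such that a fraction `level` of simulated curves stay inside
    estimate +/- m * se at every evaluation point at once.

    `simulated` holds deviations of each simulated curve from the fitted curve.
    """
    # For each simulated curve, record its largest absolute standardised deviation
    max_abs = sorted(
        max(abs(dev / s) for dev, s in zip(curve, se))
        for curve in simulated
    )
    # Empirical quantile of those maxima
    index = min(len(max_abs) - 1, int(level * len(max_abs)))
    return max_abs[index]

rng = random.Random(1)
n_points, n_sims = 50, 2000
se = [0.5] * n_points                        # assumed pointwise standard errors

# Assumption: independent normal deviations stand in for posterior simulations
simulated = [[rng.gauss(0.0, s) for s in se] for _ in range(n_sims)]

crit = simultaneous_critical_value(simulated, se)
print(f"pointwise multiplier: 1.96, simultaneous multiplier: {crit:.2f}")
```

The simultaneous multiplier comes out noticeably larger than 1.96, which is the multiple comparisons issue in numerical form: covering the whole curve at once needs a wider band than covering any single point.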

## Identifying periods of change in time series with GAMs

#### 15 May 2014 /posted in: R

In previous posts (here and here) I looked at how generalized additive models (GAMs) can be used to model non-linear trends in time series data. In my previous post I extended the modelling approach to deal with seasonal data where we model both the within-year (seasonal) and between-year (trend) variation with separate smooth functions. One of the complications of time series modelling with smoothers is how to summarize the fitted model: you have a non-linear trend which may be statistically significant, but it may not be increasing or decreasing everywhere. How do we identify where in the series the data are changing? That’s the topic of this post, in which I’ll use the method of finite differences to estimate the rate of change (slope) in the fitted smoother and, through some **mgcv** magic, use the information recorded in the fitted model to identify periods of statistically significant change in the time series.
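As a rough illustration of the finite difference idea, the sketch below estimates the slope of a smooth curve by evaluating it at each time point and again a tiny step later. It is plain Python, not **mgcv**; the `trend` function is a hypothetical stand-in for predictions from a fitted smoother, and the step size `eps` is an assumed value.

```python
import math

def finite_difference(f, xs, eps=1e-7):
    """Forward finite-difference estimate of the first derivative of f at each x."""
    return [(f(x + eps) - f(x)) / eps for x in xs]

def trend(t):
    """Hypothetical smooth trend standing in for a fitted smoother."""
    return math.sin(t) + 0.1 * t

times = [i * 0.1 for i in range(101)]        # evaluation grid from 0 to 10
slopes = finite_difference(trend, times)

# Time points where the estimated rate of change is positive
increasing = [t for t, s in zip(times, slopes) if s > 0]
```

This only gives the sign and size of the estimated slope. Deciding which periods of change are statistically significant additionally needs the uncertainty of the derivative, which is what the confidence interval on the derivative provides.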

## Modelling seasonal data with GAMs

#### 09 May 2014 /posted in: R

In previous posts (here and here) I have looked at how generalized additive models (GAMs) can be used to model non-linear trends in time series data. At the time a number of readers commented that they were interested in modelling data that had more than just a trend component; how do you model data collected throughout the year over many years with a GAM? In this post I will show one way that I have found particularly useful in my research.

## File synchronisation with Unison

#### 25 March 2014 /posted in: Computing

It’s becoming a fairly common experience to work on two or more computing devices; say a desktop/workstation in the office and a laptop when travelling or a home desktop. Which is great, but how do you keep all those machines in sync so that you have the latest versions of your files available no matter where you need to work?

## Summarising multivariate palaeoenvironmental data part 2

#### 09 January 2014 /posted in: R

The *horseshoe effect* is a well known and discussed issue with principal component analysis (PCA) (e.g. Goodall 1954; Swan 1970; Noy-Meir & Austin 1970). Similar geometric artefacts also affect correspondence analysis (CA). In part 1 of this series I looked at the implications of these “artefacts” for the recovery of temporal or single dominant gradients from multivariate palaeoecological data. In part 2, I introduce the topic of principal curves (Hastie & Stuetzle 1989).

Goodall D.W. (1954) Objective methods for the classification of vegetation. III. An essay in the use of factor analysis. *Australian Journal of Botany* **2**, 304–324.

Hastie T. & Stuetzle W. (1989) Principal Curves. *Journal of the American Statistical Association* **84**, 502–516.

Noy-Meir I. & Austin M.P. (1970) Principal Component Ordination and Simulated Vegetational Data. *Ecology* **51**, 551–552.

Swan J.M.A. (1970) An Examination of Some Ordination Problems By Use of Simulated Vegetational Data. *Ecology* **51**, 89–102.

## Decluttering ordination plots part 4: orditkplot()

#### 31 December 2013 /posted in: R

Earlier in this series I looked at the `ordilabel()` and then the `orditorp()` functions, and most recently the `ordipointlabel()` function in the **vegan** package as means to improve labelling in ordination plots. In this, the fourth and final post in the series, I take a look at `orditkplot()`. If you’ve created ordination diagrams before or been following the previous posts in the irregular series, you’ll have an appreciation for the problems of drawing plots that look, well, good! Without hand editing the diagrams, there is little that even `ordipointlabel()` can do for you if you want a plot created automagically. `orditkplot()` sits between the automated methods for decluttering ordination plots I’ve looked at previously and hand-editing in dedicated drawing software like Inkscape or Illustrator, and allows some level of tweaking the locations of labelled points within R.

## Summarising multivariate palaeoenvironmental data part 1

#### 28 December 2013 /posted in: R

Ordination methods that yield orthogonal axes of variation are often used to summarise the multivariate data obtained from sediment cores. Usually the first or, less often, the first few ordination axes are taken as directions of change or the main patterns of variance in the multivariate data. There is an oft-overlooked issue with this approach that has the potential to complicate the interpretation of the extracted axes, especially where there is a single or strong gradient in the data.

## New version of permute on CRAN version 0.8-0

#### 17 December 2013 /posted in: R

After some time brewing on my machines, I’m happy to have released a new version of my **permute** package for R. This release took quite a while to polish and get right as there was a lot of back-and-forth between **vegan** and **permute** as I tried to get the latter working nicely for both useRs and developers, whilst Jari worked on using the new **permute** API within **vegan** itself. All these changes were prompted by Cajo ter Braak taking me to task (nicely of course) over the use in previous versions of **permute** of the term “*blocks*” for what were not true blocking factors. Cajo challenged me to add true blocking factors (these restrict permutations within their levels and are never permuted, unlike *plots*), and the new version of **permute** is the result of my attempting to meet that challenge.
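What "restricting permutations within the levels of a blocking factor" means can be shown with a small plain-Python sketch. This is not the **permute** API; the function name `permute_within_blocks` and the example blocks are hypothetical. Observations are shuffled only among members of the same block, and the blocks themselves are never exchanged.

```python
import random
from collections import defaultdict

def permute_within_blocks(blocks, rng):
    """Return a permutation of 0..n-1 that only exchanges observations sharing a block."""
    # Group observation indices by block level
    members = defaultdict(list)
    for index, block in enumerate(blocks):
        members[block].append(index)

    permutation = list(range(len(blocks)))
    for indices in members.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)                # shuffle only inside this block
        for target, source in zip(indices, shuffled):
            permutation[target] = source
    return permutation

rng = random.Random(7)
blocks = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]   # hypothetical blocking factor
perm = permute_within_blocks(blocks, rng)

# Every observation is replaced by one from the same block
print(all(blocks[i] == blocks[p] for i, p in enumerate(perm)))
```

The final check prints `True` for any seed, because an index can only ever be swapped with another index from its own block.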

## New version of analogue on CRAN version 0.12-0

#### 14 December 2013 /posted in: R

It has been almost a year since the last release of the **analogue** package. A lot has happened in the intervening period and although I’ve been busy with a new job in a new country and coding on several other R packages, activity on **analogue** has also progressed apace. As version 0.12-0 of the package hits a CRAN mirror near you, I thought I’d outline the major changes in the package, which range from *at long last* having dissimilarity matrices computed in fast C code to lots of new functionality that makes fitting principal curves and plotting and interpreting the results much easier, and from a more robust way to determine the posterior probability that two samples are analogues to rounding out the fitting of calibration models using principal components regression with ecologically meaningful transformations.