Summarising multivariate palaeoenvironmental data part 2
09 January 2014 /posted in: R
The horseshoe effect is a well known and discussed issue with principal component analysis (PCA) (e.g. Goodall 1954; Swan 1970; Noy-Meir & Austin 1970). Similar geometric artefacts also affect correspondence analysis (CA). In part 1 of this series I looked at the implications of these “artefacts” for the recovery of temporal or single dominant gradients from multivariate palaeoecological data. In part 2, I introduce the topic of principal curves (Hastie & Stuetzle 1989).
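The horseshoe can be reproduced with a small simulation. The sketch below is only an illustration using base R: the number of sites, the species optima and the tolerance are arbitrary assumed values, not data from any of the posts. Species are given Gaussian responses along a single gradient, and PCA is then applied to the simulated abundances.

```r
# Simulate a single dominant gradient (e.g. time or depth down a core)
n_sites  <- 60
gradient <- seq(0, 100, length.out = n_sites)

# Hypothetical species with Gaussian (unimodal) responses along the gradient;
# the optima and tolerance are arbitrary illustrative values
optima    <- seq(-30, 130, by = 5)
tolerance <- 15
abund <- sapply(optima,
                function(opt) exp(-(gradient - opt)^2 / (2 * tolerance^2)))

# PCA on the centred (unscaled) species data
pca    <- prcomp(abund, center = TRUE, scale. = FALSE)
scores <- pca$x[, 1:2]

# If the horseshoe is present, PC2 is largely a quadratic function of the
# gradient rather than an independent pattern of variation
arch <- lm(scores[, 2] ~ poly(gradient, 2))
r2   <- summary(arch)$r.squared

# Samples joined in gradient order trace out the arch in the PC1-PC2 plane
plot(scores, type = "b", asp = 1, xlab = "PC1", ylab = "PC2")
```

Although the simulated data contain only one gradient, the second axis should pick up a curved function of it, which is the artefact that complicates reading PC2 as a separate signal.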
Goodall D.W. (1954) Objective methods for the classification of vegetation. III. An essay in the use of factor analysis. Australian Journal of Botany 2, 304–324.
Hastie T. & Stuetzle W. (1989) Principal Curves. Journal of the American Statistical Association 84, 502–516.
Noy-Meir I. & Austin M.P. (1970) Principal Component Ordination and Simulated Vegetational Data. Ecology 51, 551–552.
Swan J.M.A. (1970) An Examination of Some Ordination Problems By Use of Simulated Vegetational Data. Ecology 51, 89–102.
Decluttering ordination plots part 4: orditkplot()
31 December 2013 /posted in: R
Earlier in this series I looked at the ordilabel() and then the orditorp() functions, and most recently the ordipointlabel() function in the vegan package as means to improve labelling in ordination plots. In this, the fourth and final post in the series, I take a look at orditkplot(). If you’ve created ordination diagrams before or been following the previous posts in this irregular series, you’ll have an appreciation for the problems of drawing plots that look, well, good! Without hand editing the diagrams, there is little that even ordipointlabel() can do for you if you want a plot created automagically. orditkplot() sits between the automated methods for decluttering ordination plots I’ve looked at previously and hand-editing in dedicated drawing software like Inkscape or Illustrator, and allows some level of tweaking of the locations of labelled points within R.
Summarising multivariate palaeoenvironmental data part 1
28 December 2013 /posted in: R
Ordination methods that yield orthogonal axes of variation are often used to summarise the multivariate data obtained from sediment cores. Usually the first or, less often, the first few ordination axes are taken as directions of change or the main patterns of variance in the multivariate data. There is an oft-overlooked issue with this approach that has the potential to complicate the interpretation of the extracted axes, especially where there is a single or strong gradient in the data.
New version of permute on CRAN: version 0.8-0
17 December 2013 /posted in: R
After some time brewing on my machines, I’m happy to have released a new version of my permute package for R. This release took quite a while to polish and get right as there was a lot of back-and-forth between vegan and permute as I tried to get the latter working nicely for both useRs and developers, whilst Jari worked on using the new permute API within vegan itself. All these changes were prompted by Cajo ter Braak taking me to task (nicely of course) over the use in previous versions of permute of the term “blocks” for what were not true blocking factors. Cajo challenged me to add true blocking factors (these restrict permutations within their levels and are never permuted, unlike plots), and the new version of permute is the result of my attempting to meet that challenge.
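To make the idea of a true blocking factor concrete, here is a minimal base R sketch of a permutation restricted within blocks. It is not the permute implementation or its API; the helper name permute_within_blocks() and the example design are hypothetical and used only for illustration.

```r
# Hypothetical helper: shuffle observation indices freely, but only among
# observations that share the same block. Blocks themselves never move.
permute_within_blocks <- function(blocks) {
  blocks <- as.factor(blocks)
  idx    <- seq_along(blocks)
  perm   <- idx
  for (lev in levels(blocks)) {
    members <- idx[blocks == lev]
    # sample() on a single number n samples from 1:n, so guard that case
    perm[members] <- if (length(members) > 1L) sample(members) else members
  }
  perm
}

set.seed(42)
blocks <- gl(3, 4)   # assumed design: 3 blocks of 4 observations each
perm   <- permute_within_blocks(blocks)

# Each permuted index comes from the same block as the position it fills
all(blocks[perm] == blocks)
```

Because observations are only ever exchanged with others from the same block, the block structure of the design is preserved in every permutation.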
New version of analogue on CRAN: version 0.12-0
14 December 2013 /posted in: R
It has been almost a year since the last release of the analogue package. A lot has happened in the intervening period and although I’ve been busy with a new job in a new country and coding on several other R packages, activity on analogue has also progressed apace. As version 0.12-0 of the package hits a CRAN mirror near you, I thought I’d outline the major changes in the package. These range from at long last having dissimilarity matrices computed in fast C code to lots of new functionality that makes fitting principal curves and plotting and interpreting the results much easier, and from a more robust way to determine the posterior probability that two samples are analogues to rounding out the fitting of calibration models using principal components regression with ecologically-meaningful transformations.
Draft Tri-Agency Open Access policy: my response
10 December 2013 /posted in: Science
In October this year, the Natural Sciences and Engineering Research Council of Canada (NSERC), the Social Sciences and Humanities Research Council of Canada (SSHRC), and the Canadian Institutes of Health Research (CIHR) published a draft policy on Open Access for consultation. The consultation period ends in a few days, on December 13th. NSERC also has a preamble to the proposed policy. In general, the proposed policy is as progressive as those from the US NIH and NSF, the ERC, and RCUK. However, as it currently stands, the policy lacks detail on acceptable Open Access terms, unduly favours STEM publishers with a 12-month embargo period allowed on the self-archiving option, and fails to address other important aspects of the scientific record, namely research data, software and data analysis scripts. My response to the consultation is appended below. I’ve also produced a PDF should you prefer to read in that medium.
Time series plots in R with lattice & ggplot
23 October 2013 /posted in: R
I recently coauthored a couple of papers on trends in environmental data (Curtis & Simpson in press; Monteith et al. in press), which we estimated using GAMs. Both papers included plots like the one shown below wherein we show the estimated trend and associated point-wise 95% confidence interval, plus some other markings. The coloured sections show where the estimated trend is changing in a statistically significant manner, i.e. where a 95% confidence interval on the first derivative (rate of change) of the trend does not include 0. That particular figure and the others in the papers were drawn using the lattice package (Sarkar 2008), but I could just as easily have used ggplot2 (Wickham 2009) instead. I was recently asked via email how I produced the figures in the paper. Rather than just reply to that email, I thought I’d knock up a quick post for my blog to show how it was done.
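The derivative-based criterion described above can be sketched as follows. This is not the code used for the papers; it is a minimal illustration on simulated data, assuming the mgcv package is available, a Gaussian GAM, and a simple finite-difference approximation to the first derivative. The variable names and the simulated series are assumptions made for the example.

```r
library(mgcv)

set.seed(1)
# Simulated series (illustrative only): a rise mid-record, flat at both ends
dat   <- data.frame(time = 1:200)
dat$y <- plogis((dat$time - 100) / 10) + rnorm(nrow(dat), sd = 0.1)

m <- gam(y ~ s(time), data = dat, method = "REML")

# Finite-difference approximation to the first derivative of the fitted
# trend, built from the linear predictor matrix at time and time + eps
eps  <- 1e-4
newd <- data.frame(time = seq(1, 200, length.out = 200))
X0   <- predict(m, newd, type = "lpmatrix")
X1   <- predict(m, transform(newd, time = time + eps), type = "lpmatrix")
Xp   <- (X1 - X0) / eps

deriv <- drop(Xp %*% coef(m))
se    <- sqrt(rowSums((Xp %*% vcov(m)) * Xp))

# Point-wise 95% confidence interval on the derivative
crit  <- qnorm(0.975)
lower <- deriv - crit * se
upper <- deriv + crit * se

# Periods where the interval excludes zero, i.e. significant change
increasing <- lower > 0
decreasing <- upper < 0
```

The logical vectors increasing and decreasing can then be used to pick out the sections of the fitted trend to draw in a different colour, whether the plot is made with lattice or ggplot2.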
Curtis C.J. & Simpson G.L. (in press) Trends in bulk deposition of acidity in the UK, 1988–2007, assessed using additive models. Ecological Indicators.
Monteith D.T., Evans C.D., Henrys P.A., Simpson G.L. & Malcolm I.A. (in press) Trends in the hydrochemistry of acid-sensitive surface waters in the UK 1988–2008. Ecological Indicators.
Sarkar D. (2008) Lattice: Multivariate Data Visualization with R. Springer, New York.
Wickham H. (2009) ggplot2: elegant graphics for data analysis. Springer, New York.
Using Arial in R figures destined for PLOS ONE
09 September 2013 /posted in: R
Despite the refreshing change that the journal PLOS ONE represents in terms of open access, and as an alternative to the stupidity that is quality/novelty selection by the two or three people that review a paper, its submission requirements are far less progressive. Yes, they make you jump through a lot of hoops getting your figures and tables just so, and I can appreciate why they want some control over this in terms of the look and feel of the journal. A couple of things grate though:
Open data and Ecology
27 August 2013 /posted in: Science
Open science was present in good order at the recent ESA meeting in Minneapolis. Much of what was being discussed under that broadest of headings, open science, was the reproducibility of the science we do, and one critical aspect of this is free, open access to data. Openly sharing data that underlie research publications is a rapidly-developing area of the scientific landscape faced today by scientists, not just ecologists. Many journals now require that data supporting research papers be deposited under a permissive licence in approved repositories, such as Dryad or figshare, and a number of journals have been founded specifically to cater for the publication of data papers, including Ubiquity Press’ the Journal of Open Archeological Data, Nature Publishing Group’s forthcoming Scientific Data, and Wiley’s Geoscience Data Journal. Unfortunately, ecologists are more likely to be known for the iron-like grip with which they cling to their hard-won data. Into this landscape, Stephanie Hampton and colleagues (Hampton et al. 2013) published (it’s been online for a few months) a paper in Frontiers in Ecology and the Environment: Big data and the future of ecology.
Hampton S.E., Strasser C.A., Tewksbury J.J., Gram W.K., Budden A.E. & Batcheller A.L. et al. (2013) Big data and the future of ecology. Frontiers in Ecology and the Environment 11, 156–162.
16 July 2013 /posted in: Science
Last year, Rong Wang and colleagues (Wang et al. 2012) published a very nice paper in Nature, which claimed to have observed flickering, an early warning indicator of an approaching critical transition, in a diatom sediment sequence from Erhai Lake, Yunnan, China. What was particularly pleasing about this paper was that the authors had tried to use the sediment record to investigate whether we see signs of early warning indicators prior to a transition between stable states. It was refreshing to not see a transfer function!
Wang R., Dearing J.A., Langdon P.G., Zhang E., Yang X. & Dakos V. et al. (2012) Flickering gives early warning signals of a critical transition to a eutrophic lake state. Nature 492, 419–422.