The ALA and Big Data for Biodiversity

On Wednesday 9 December, Chris Roach and I attended a webinar hosted by the Atlas of Living Australia (ALA) to celebrate its 10th anniversary. The webinar showcased research into the role of Big Data and data science modelling techniques in managing Australian biodiversity. It was also a chance for me to reflect on my own journey, which has run in parallel with the ALA's. In the ALA's early days I was at the Western Australian Museum, where I helped align the Arachnology database fields with the TDWG Darwin Core standard so the web team could mobilise our data. I later worked in environmental consulting, and I am now here at Gaia Resources, where we share many of the ALA's ideals: enabling open biodiversity data sharing and aligning with internationally recognised standards.

The following is a summary of some of the key research presented by the three speakers in this seminar series.

With platforms such as the ALA, the amount of available biodiversity data has increased dramatically over the last 10 years, allowing conservation actions to be undertaken with much greater confidence. However, many of the ecological challenges we have faced in the past still remain. These challenges fall into three main areas:

  • Sampling bias,
  • Incomplete coverage, and
  • Data quality.

Professor Melodie McGeoch (La Trobe University) discussed the need not only to document populations of threatened, vulnerable and endangered species, but also to recognise the importance of occurrence data for “common” species. Whether a species is regarded as common depends on its temporal trends, local abundance and spatial range, and significant declines in any of these may go unnoticed when a species is assumed to be common enough not to need frequent monitoring. To identify refuges that could help prevent declines in diversity and biomass, Prof. McGeoch advocated modelling ALA and other data for both rare and common species at a more local scale, in order to understand geographic variation and changes in abundance over time.

PhD candidate Tianxiao (August) Hao (University of Melbourne) used his research on fungal diversity in Australia to show how rapidly data availability has increased. Some of this data, however, is unreliable, so it is important to consider carefully before analysis whether the data is of a high enough standard to be useful. He acknowledged the new technology and rigorous screening now applied to data submitted to the ALA, as well as the large clean-up operation underway to improve the quality of legacy data.

Both August and Professor Jane Elith (University of Melbourne) demonstrated that the available data is still heavily biased by sampling effort, which is shaped by environmental and logistical constraints. Unsurprisingly, the most accessible places, such as areas near population centres, along coastlines and beside roads, are the most heavily sampled.

Professor Elith also highlighted an often-forgotten source of bias: the lack of absence data. Most observation records are presence-only, yet knowing which areas have been sampled (and how) without a species being found is arguably as important as documenting where species are present. Predictive models of species distributions are much more powerful when they can account for this bias, so ideally presence-absence data capture should be built into research and citizen science initiatives.

Professor Elith showcased the eBird initiative as a good example of how citizen science can provide comprehensive coverage of occurrence data over time.

Gaia Resources is no stranger to working with presence-absence data and has developed several citizen science solutions over the years. We have also worked with conservation groups such as the Great Victoria Desert Biodiversity Trust to plan habitat survey strategies (see our earlier blog post on this work).

With the help of open-access biodiversity data such as that provided by the ALA, we can all play a part in overcoming the challenges faced in conservation. Here’s to the next 10 years!

If you’d like to know more about this topic or would like to discuss your own Big Data and biodiversity projects, please drop me a line, or connect with us on Twitter, LinkedIn or Facebook.