Fergus Chadwick

Rhona Rodger
Monday 8 January 2024
Date: 24 January 2024
Time: 2:00 pm

University of St Andrews

Title: Do identification guides hold the key to species misclassification by citizen scientists?

Abstract:
1. Citizen science data often contain high levels of species misclassification that can bias inference and conservation decisions. Current approaches to address mislabelling rely on expert taxonomists validating every record. This approach makes intensive use of a scarce resource and reduces the role of the citizen scientist.
2. Species, however, are not confused at random. Two species that look similar are more likely to be confused than two highly distinctive species. Identification guides are intended to use these patterns to aid correct classification, but misclassifications still occur due to user error and imperfect guidebook design. Statistical models should be able to exploit this non-randomness to learn confusion patterns from small validation datasets provided by expert taxonomists, yielding a much-needed reduction in expert workload. Here, we use a variety of Bayesian hierarchical models to probabilistically classify species based on the species-label provided by the citizen scientist. We also explore the utility of guidebooks provided by the citizen science schemes as a prior for species similarity, and hence draw conclusions for their future improvement.
3. We find that the species-label assigned to a record by a citizen scientist, even when incorrect, contains useful information about the true species-identity. The citizen scientists correctly identify the species in around 58% of records. Using models trained on only 10% of these records (validated by experts), we can correctly predict species-identity for 69% (90% CI: 64-73%) of records when the guidebook is used, vs 64% (58-69%) for models that do not use the guidebook. The fact that misclassifications can be predicted systematically indicates that improvements could be made to the guidebook to reduce misclassification.
4. By using Bayesian hierarchical models we can greatly reduce the workload for experts by providing a probabilistic correction to citizen science records, rather than requiring manual review. This is increasingly important as the number of citizen science schemes grows and the relative number of taxonomists shrinks. By learning confusion patterns statistically, we open up future avenues of research to identify what causes these confusions and how to better address them.