Bayesian item response models for citizen science ecological data

16 Mar 2020  ·  Edgar Santos-Fernandez, Kerrie Mengersen

So-called 'citizen science' data elicited from crowds have become increasingly popular in many fields, including ecology. However, the quality of this information is frequently debated within the scientific community. Modern citizen science implementations therefore require measures of users' proficiency that account for the difficulty of the tasks. We introduce a new methodological framework of item response and linear logistic test models, with application to citizen science data used in ecology research. This approach accommodates spatial autocorrelation within the item difficulties and produces relevant ecological measures of species- and site-related difficulty, discriminatory power, and guessing behavior. These measures, along with estimates of the subjects' abilities, allow better management of these programs and provide deeper insights. This paper also highlights the fitting of item response models to big data via divide-and-conquer. We found that the suggested methods outperform traditional item response models in terms of RMSE, accuracy, and WAIC based on leave-one-out cross-validation on simulated and empirical data. We present a comprehensive implementation using a case study of species identification in the Serengeti, Tanzania. The R and Stan code is provided for full reproducibility. Multiple statistical illustrations and visualizations are given, which allow practitioners to extend the approach to a wide range of citizen science ecological problems.
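
To make the quantities named in the abstract concrete (ability, difficulty, discriminatory power, guessing), here is a minimal sketch of the textbook three-parameter logistic (3PL) item response function, together with a linear logistic test model (LLTM) style difficulty built from item features. It is an illustration only: the function names, argument names, and example values are assumptions, the paper's own parameterization (including its spatial component) may differ, and the authors' actual implementation is in R and Stan.

```python
import math


def p_correct(theta, difficulty, discrimination=1.0, guessing=0.0):
    """Standard 3PL probability that a user answers an item correctly.

    theta          -- user ability (hypothetical name)
    difficulty     -- item difficulty b
    discrimination -- item slope a
    guessing       -- lower asymptote c, the chance of a correct guess
    """
    logistic = 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))
    return guessing + (1.0 - guessing) * logistic


def lltm_difficulty(features, weights):
    """LLTM-style difficulty: a weighted sum of item features.

    features -- numeric item covariates (e.g. species or site indicators)
    weights  -- one coefficient per feature
    """
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(f * w for f, w in zip(features, weights))


# Hypothetical item: two binary features with made-up coefficients.
b = lltm_difficulty([1, 0], [0.8, -0.3])
prob = p_correct(theta=1.2, difficulty=b, discrimination=1.5, guessing=0.1)
```

With `guessing=0` and `discrimination=1` the function reduces to the one-parameter (Rasch) form, so the same helper covers the simpler models the richer ones are compared against.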
