Handling missing values in trait data

Johnson, Thomas F.; Isaac, Nick J. B.; Paviolo, Agustin; Gonzalez-Suarez, Manuela

Text (Open Access)
- Published Version
· Available under License Creative Commons Attribution.

Johnson, T. F., Isaac, N. J. B., Paviolo, A. and Gonzalez-Suarez, M. ORCID: https://orcid.org/0000-0001-5069-8900 (2020) Handling missing values in trait data. Global Ecology and Biogeography, 30 (1). pp. 51-62. ISSN 1466-8238 doi: 10.1111/geb.13185

Abstract/Summary

Aim: Trait data are widely used in ecological and evolutionary phylogenetic comparative studies, but often values are not available for all species of interest. Researchers traditionally have excluded species without data from analyses, but estimation of missing values using imputation has been proposed as a better approach. However, imputation methods have largely been designed for randomly missing data, yet trait data are often not missing at random (e.g. more data for bigger species). Here we evaluate the performance of approaches for handling missing values considering biased datasets.

Location: Any

Time period: Any

Major taxa studied: Any

Methods: We simulated continuous traits and separate response variables to test performance of nine imputation methods and complete-case analysis (excluding missing values from the dataset) under biased missing data scenarios. We characterized performance by estimating error in imputed trait values (deviation from the true value) and inferred trait-response relationships (deviation from the true relationship between a trait and response).

Results: Generally, Rphylopars imputation produced the most accurate estimate of missing values and best preserved the response-trait slope. However, estimates of missing data were still inaccurate, even with only 5% of values missing. Under severe biases, errors were high with every approach. Imputation was not always the best option, with complete-case analysis frequently outperforming Mice imputation, and to a lesser degree BHPMF imputation. Mice, a popular approach, performed poorly when the response variable was excluded from the imputation model.

Main conclusions: Imputation can effectively handle missing data under some conditions, but is not always the best solution. None of the methods we tested could effectively deal with severe biases, which may be common in trait datasets. We recommend rigorous data checking for biases before and after imputation and propose variables that can assist researchers working with incomplete datasets to detect data biases and minimise errors.
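
The simulation design summarised under Methods can be illustrated with a minimal sketch. It is not the authors' code, and it does not use any of the nine imputation methods they tested. Plain mean imputation stands in as a deliberately simple placeholder. All function names, the sample size, the slope of 0.5, the 30% missing fraction and the `bias` parameter are assumptions made for illustration only. The sketch simulates one continuous trait and a response, deletes trait values so that smaller values are more likely to go missing, and then computes the two kinds of error the abstract names: deviation of imputed values from the true values, and deviation of the estimated trait-response slope from the true slope (here for complete-case analysis).

```python
import math
import random
import statistics


def simulate_traits(n=500, slope=0.5, seed=1):
    """Simulate one continuous trait and a response that depends on it linearly."""
    rng = random.Random(seed)
    trait = [rng.gauss(0.0, 1.0) for _ in range(n)]
    response = [slope * t + rng.gauss(0.0, 0.5) for t in trait]
    return trait, response


def biased_missing(trait, frac_missing, bias, seed=2):
    """Return the set of indices to delete.

    With bias > 0, smaller trait values are more likely to go missing.
    The k smallest values of E_i / w_i, with E_i ~ Exp(1) and weight
    w_i = exp(-bias * trait_i), form a weighted sample without
    replacement. bias = 0 deletes values completely at random.
    """
    rng = random.Random(seed)
    k = round(frac_missing * len(trait))
    keys = [(rng.expovariate(1.0) * math.exp(bias * t), i)
            for i, t in enumerate(trait)]
    return {i for _, i in sorted(keys)[:k]}


def mean_imputation_rmse(trait, missing):
    """Error in imputed values: RMSE of the observed mean against the true values."""
    observed = [t for i, t in enumerate(trait) if i not in missing]
    fill = statistics.fmean(observed)
    squared_errors = [(trait[i] - fill) ** 2 for i in missing]
    return math.sqrt(statistics.fmean(squared_errors))


def complete_case_slope(trait, response, missing):
    """Ordinary least-squares slope of response on trait, using observed rows only."""
    x = [t for i, t in enumerate(trait) if i not in missing]
    y = [r for i, r in enumerate(response) if i not in missing]
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


TRUE_SLOPE = 0.5  # assumed value for this illustration
trait, response = simulate_traits(slope=TRUE_SLOPE)

results = {}
for bias in (0.0, 2.0):  # 0.0 = missing at random, 2.0 = biased deletion
    missing = biased_missing(trait, frac_missing=0.3, bias=bias)
    results[bias] = {
        "rmse": mean_imputation_rmse(trait, missing),
        "slope_error": complete_case_slope(trait, response, missing) - TRUE_SLOPE,
    }
    print(bias, results[bias])
```

Setting `bias` to 0 reproduces the randomly missing case that most imputation methods were designed for, while larger values concentrate the gaps among small-trait species, the kind of biased scenario the study evaluates. A real analysis would replace the mean-imputation placeholder with a dedicated method and would repeat the simulation many times rather than rely on a single seed.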

Item Type	Article
URI	https://centaur.reading.ac.uk/id/eprint/92382
Identification Number/DOI	10.1111/geb.13185
Refereed	Yes
Divisions	Life Sciences > School of Biological Sciences > Ecology and Evolutionary Biology
Publisher	Wiley
Deposit Details

Date Deposited:	26 Aug 2020 10:40
Last Modified:	07 Jun 2026 16:50