Factors influencing taxonomic unevenness in scientific research: A mixed-methods case study of non-human primate genomic sequence data generation

Scholars have often noted major disparities in the extent of scientific research conducted among taxonomic groups. Such trends may cascade if future scientists gravitate towards study species with more data and resources already available. As new technologies emerge, do research studies employing these technologies continue these disparities? Here, using non-human primates as a case study, we first identified disparities in recently generated massively parallel genomic sequencing data, and we then conducted interviews with the scientists who produced these data to learn their motivations when selecting species for study. Specifically, we tested whether variables including publication history and conservation status were significantly correlated with the amount of publicly available sequence data in the NCBI Sequence Read Archive. Of the 179.6 terabases (Tb) of sequence data in this database for 519 non-human primate species, 135 Tb (~75%) were from only five species: rhesus macaques, olive baboons, green monkeys, chimpanzees, and crab-eating macaques. The strongest individual predictors of the amount of genomic data were the total number of non-medical scholarly publications (linear regression; r² = 0.37; P = 6.15 × 10⁻¹²) and the number of medical publications (r² = 0.27; P = 9.27 × 10⁻⁹). In a generalized linear model, the number of non-medical publications (P = 0.00062) and closer phylogenetic distance to humans (P = 0.023) were the most predictive of the amount of genomic sequence data. We interviewed 33 authors of genomic data-producing publications and analyzed their responses using a grounded theory approach. Consistent with our quantitative results, authors mentioned that their choices of species were motivated by sample accessibility, prior published work, and perceived relevance (especially health-related) to humans.
Our mixed-methods approach helped us to identify and contextualize some of the driving factors behind species-uneven patterns of scientific research, which can now be considered by funding agencies, scientific societies, and research teams aiming to align their broader goals with future data generation efforts.
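
The two kinds of summary statistics reported in the abstract (the share of data held by the top few species, and the r² of a single-predictor linear regression) can be illustrated with a minimal sketch. This is not the authors' analysis code: the function names `r_squared` and `top_share` are hypothetical, the toy numbers are invented for illustration and are not taken from PGP_Data_FINAL.csv, and the abstract does not state whether any variables were transformed before fitting.

```python
from typing import Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Coefficient of determination for a simple least-squares fit of y on x.

    With a single predictor this equals the squared Pearson correlation.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    return (sxy * sxy) / (sxx * syy)


def top_share(values: Sequence[float], k: int) -> float:
    """Fraction of the total contributed by the k largest values."""
    total = sum(values)
    if total <= 0:
        raise ValueError("total must be positive")
    return sum(sorted(values, reverse=True)[:k]) / total


# Share reported in the abstract: 135 Tb out of 179.6 Tb (about 75%).
reported_share = 135 / 179.6

# Invented toy values (hypothetical, NOT from the deposited dataset):
# publication counts and terabases of sequence data for six species.
toy_publications = [5, 20, 40, 80, 160, 320]
toy_terabases = [0.1, 0.5, 0.4, 2.0, 3.5, 9.0]

toy_r2 = r_squared(toy_publications, toy_terabases)
toy_top2 = top_share(toy_terabases, 2)
```

Applied to the deposited file, the same functions would take the per-species publication counts and sequence totals as inputs; the column names in that file are not given here, so none are assumed.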

Metadata

Work Title: Factors influencing taxonomic unevenness in scientific research: A mixed-methods case study of non-human primate genomic sequence data generation
Access: Open Access
Creators: Margarita Hernandez
Keywords: Taxonomic bias; model organisms; massively-parallel sequencing; ethnography of scientists; species conservation status
License: Attribution 4.0 International (CC BY 4.0)
Work Type: Dataset
DOI: doi:10.26207/j7nd-ka67
Deposited: April 16, 2020

Work History

Version 1
published

  • Created
  • Added PGP_Data_FINAL.csv
  • Added Creator Margarita Hernandez
  • Published