Seqminer2: An efficient tool to query and retrieve genotypes for statistical genetics analyses from biobank scale sequence dataset

Here, we present seqminer2, a highly efficient R package for querying and retrieving sequence variants from biobank-scale datasets comprising millions of individuals and hundreds of millions of genetic variants. Seqminer2 implements a novel variant-based index for querying VCF/BCF files. It improves the speed of query and retrieval by several orders of magnitude compared to state-of-the-art tools based upon tabix. It also reimplements support for the BGEN and PLINK formats, with faster performance than alternative implementations. The improved efficiency and comprehensive support for popular file formats will facilitate method development, software prototyping and data analysis of biobank-scale sequence datasets in R.

Availability and implementation: The seqminer2 R package is available from https://github.com/zhanxw/seqminer. Scripts used for the benchmarks are available at https://github.com/yang-lina/seqminer/blob/master/seqminer2%20benchmark%20script.txt.
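The abstract does not describe the layout of the variant-based index, so the following is only a minimal Python sketch of the general idea: record a byte offset for each variant line once, then answer range queries by binary search and direct seeks. It is not seqminer2's implementation. The function names `build_index` and `query` are hypothetical, and the sketch reads uncompressed VCF-like text held in memory, whereas real VCF/BCF files are usually block-compressed.

```python
import bisect
import io


def build_index(handle):
    """Scan once and return sorted (chrom, pos, byte_offset) tuples, one per variant line."""
    index = []
    handle.seek(0)
    while True:
        offset = handle.tell()
        line = handle.readline()
        if not line:
            break
        if line.startswith(b"#"):  # skip meta and header lines
            continue
        chrom, pos = line.split(b"\t", 2)[:2]
        index.append((chrom.decode(), int(pos), offset))
    index.sort()
    return index


def query(handle, index, chrom, start, end):
    """Return variant lines on `chrom` with start <= pos <= end, seeking directly to each."""
    lo = bisect.bisect_left(index, (chrom, start, -1))
    hi = bisect.bisect_right(index, (chrom, end, float("inf")))
    records = []
    for _, _, offset in index[lo:hi]:
        handle.seek(offset)
        records.append(handle.readline().decode().rstrip("\n"))
    return records


# Toy data standing in for a VCF file (assumption: uncompressed, tab-delimited).
vcf = io.BytesIO(
    b"##fileformat=VCFv4.2\n"
    b"#CHROM\tPOS\tID\tREF\tALT\n"
    b"1\t100\trs1\tA\tG\n"
    b"1\t250\trs2\tC\tT\n"
    b"2\t50\trs3\tG\tA\n"
)
idx = build_index(vcf)
hits = query(vcf, idx, "1", 200, 300)
```

In this sketch the index is built once, and each query then touches only the lines inside the requested range instead of scanning the whole file.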

Metadata

Work Title: Seqminer2: An efficient tool to query and retrieve genotypes for statistical genetics analyses from biobank scale sequence dataset
Access: Open Access
Creators:
  1. Lina Yang
  2. Shuang Jiang
  3. Bibo Jiang
  4. Dajiang J. Liu
  5. Xiaowei Zhan
License: In Copyright (Rights Reserved)
Work Type: Article
Publisher: Bioinformatics
Publication Date: October 1, 2020
Publisher Identifier (DOI): https://doi.org/10.1093/bioinformatics/btaa628
Deposited: July 19, 2021

Work History

Version 1
published

  • Created
  • Added btaa628.pdf
  • Added btaa628_supplementary_data.docx
  • Added Creator Lina Yang
  • Added Creator Shuang Jiang
  • Added Creator Bibo Jiang
  • Added Creator Dajiang J. Liu
  • Added Creator Xiaowei Zhan
  • Published
  • Updated