Multisource single-cell data integration by MAW barycenter for Gaussian mixture models

One key challenge in single-cell data clustering is combining the clustering results of data sets acquired from multiple sources. We propose to represent the clustering result of each data set by a Gaussian mixture model (GMM) and to produce an integrated result based on the notion of the Wasserstein barycenter. However, the exact Wasserstein barycenter of GMMs, itself a distribution on the same sample space, is computationally infeasible to obtain. Moreover, the barycenter of GMMs may not be a GMM with a reasonable number of components. We therefore use the minimized aggregated Wasserstein (MAW) distance to approximate the Wasserstein metric and develop a new algorithm for computing the barycenter of GMMs under MAW. Recent theoretical advances further justify using the MAW distance as an approximation of the Wasserstein metric between GMMs. We also prove that the MAW barycenter of GMMs has the same expectation as the Wasserstein barycenter. The proposed algorithm for clustering integration scales well with the data dimension and the number of mixture components, and its complexity is independent of the data size. We demonstrate that the new method achieves better clustering results on several single-cell RNA-seq data sets than some other popular methods.
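
To make the MAW distance named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the usual definition from the aggregated Wasserstein literature: the squared MAW distance between two GMMs is the optimal transport cost between their component weights, where the cost of matching two components is the closed-form squared 2-Wasserstein distance between the two Gaussians. The function names `gaussian_w2_squared` and `maw_squared` are hypothetical, and the sketch computes only the pairwise distance, not the barycenter.

```python
import numpy as np
from scipy.linalg import sqrtm
from scipy.optimize import linprog


def gaussian_w2_squared(mu1, cov1, mu2, cov2):
    """Squared 2-Wasserstein distance between N(mu1, cov1) and N(mu2, cov2)."""
    # sqrtm may return tiny imaginary parts from round-off; keep the real part.
    root1 = np.real(sqrtm(cov1))
    cross = np.real(sqrtm(root1 @ cov2 @ root1))
    covariance_term = np.trace(cov1 + cov2 - 2.0 * cross)
    return float(np.sum((mu1 - mu2) ** 2) + covariance_term)


def maw_squared(weights1, means1, covs1, weights2, means2, covs2):
    """Squared MAW distance (assumed definition) and the optimal coupling.

    Solves a small linear program over couplings of the component weights;
    its size depends only on the numbers of components, not on the data size.
    """
    k1, k2 = len(weights1), len(weights2)
    # Pairwise component costs.
    cost = np.array([
        [gaussian_w2_squared(means1[i], covs1[i], means2[j], covs2[j])
         for j in range(k2)]
        for i in range(k1)
    ])
    # Marginal constraints on the flattened (row-major) coupling matrix:
    # row i must sum to weights1[i], column j must sum to weights2[j].
    a_eq = np.zeros((k1 + k2, k1 * k2))
    for i in range(k1):
        a_eq[i, i * k2:(i + 1) * k2] = 1.0
    for j in range(k2):
        a_eq[k1 + j, j::k2] = 1.0
    b_eq = np.concatenate([weights1, weights2])
    result = linprog(cost.ravel(), A_eq=a_eq, b_eq=b_eq, bounds=(0, None))
    return result.fun, result.x.reshape(k1, k2)
```

Calling `maw_squared` with two sets of weights, means, and covariance matrices returns the squared distance together with the coupling matrix that matches components across the two mixtures. Because the linear program involves only the mixture components, this sketch is consistent with the abstract's remark that the complexity is independent of the data size.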

This is the peer-reviewed version of the following article: [Multi‐source single‐cell data integration by MAW barycenter for Gaussian mixture models. Biometrics 79(2), pp. 866-877 (2022)], which has been published in final form at https://doi.org/10.1111/biom.13630. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Use of Self-Archived Versions: https://authorservices.wiley.com/author-resources/Journal-Authors/licensing/self-archiving.html#3.

Files

  • main.pdf

    size: 3.94 MB | mime_type: application/pdf | date: 2024-03-11 | sha256: 2216b58

Metadata

Work Title Multisource single-cell data integration by MAW barycenter for Gaussian mixture models
Access
Open Access
Creators
  1. Lin Lin
  2. Wei Shi
  3. Jianbo Ye
  4. Jia Li
Keyword
  1. Gaussian mixture model
  2. Integrative analysis
  3. Minimized aggregated Wasserstein distance
  4. Multisource Single-cell data
  5. Wasserstein barycenter
License In Copyright (Rights Reserved)
Work Type Article
Publisher
  1. Biometrics
Publication Date February 27, 2022
Publisher Identifier (DOI)
  1. https://doi.org/10.1111/biom.13630
Deposited March 11, 2024

Work History

Version 1
published

  • Created
  • Added main.pdf
  • Added Creator Lin Lin
  • Added Creator Wei Shi
  • Added Creator Jianbo Ye
  • Added Creator Jia Li
  • Published
  • Updated Keyword, Publication Date
    Keyword
    • Gaussian mixture model, Integrative analysis, Minimized aggregated Wasserstein distance, Multisource Single-cell data, Wasserstein barycenter
    Publication Date
    • 2023-06-01
    • 2022-02-27
  • Updated