Size-Based Expectation Maximization for Characterizing Nucleosome Positions and Subtypes

Genome-wide nucleosome profiles are predominantly characterized using MNase-seq, which involves extensive MNase digestion and size selection to enrich for mononucleosome-sized fragments. Most available MNase-seq analysis packages assume that nucleosomes uniformly protect 147 bp DNA fragments. However, some nucleosomes with atypical histone or chemical compositions protect shorter lengths of DNA. The rigid assumptions imposed by current nucleosome analysis packages potentially prevent investigators from understanding the regulatory roles played by atypical nucleosomes. To enable the characterization of different nucleosome types from MNase-seq data, we introduce the size-based expectation maximization (SEM) nucleosome-calling package. SEM employs a hierarchical Gaussian mixture model to estimate nucleosome positions and subtypes. Nucleosome subtypes are automatically identified based on the distribution of protected DNA fragments. Benchmark analysis indicates that SEM is on par with existing packages in terms of standard nucleosome-calling accuracy metrics, while uniquely providing the ability to characterize nucleosome subtype identities. Applying SEM to a low-dose MNase-H2B-ChIP-seq data set from mouse embryonic stem cells, we identified three nucleosome types: short-fragment nucleosomes, canonical nucleosomes, and di-nucleosomes. Short-fragment nucleosomes can be divided further into two subtypes based on their chromatin accessibility. Short-fragment nucleosomes in accessible regions exhibit high MNase sensitivity and are enriched at transcription start sites (TSSs) and CTCF peaks, similar to previously reported “fragile nucleosomes.” These SEM-defined accessible short-fragment nucleosomes are found not just in promoters but also in distal regulatory regions. Additional analyses reveal their colocalization with the chromatin remodelers CHD6, CHD8, and EP400. In summary, SEM provides an effective platform for exploration of nonstandard nucleosome subtypes.
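The idea of identifying nucleosome subtypes from the distribution of protected fragment lengths can be illustrated with a plain one-dimensional Gaussian mixture fitted by expectation maximization. The sketch below is not the SEM implementation: SEM's model is hierarchical and also estimates nucleosome positions, which this example leaves out. The function names, the starting standard deviation of 20 bp, and the synthetic fragment sizes for the short-fragment and di-nucleosome classes are assumptions made for illustration only; just the 147 bp canonical length comes from the abstract.

```python
import math
import random


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_fragment_size_mixture(lengths, init_means, n_iter=100, min_sd=1.0):
    """Fit a 1D Gaussian mixture to fragment lengths with plain EM.

    Returns a list of (weight, mean, sd) tuples, one per component,
    in the same order as init_means.
    """
    k = len(init_means)
    n = len(lengths)
    weights = [1.0 / k] * k
    means = list(init_means)
    sds = [20.0] * k  # assumed starting spread in bp (hypothetical choice)

    for _ in range(n_iter):
        # E-step: responsibility of each component for each fragment.
        resp = []
        for x in lengths:
            dens = [w * gaussian_pdf(x, m, s)
                    for w, m, s in zip(weights, means, sds)]
            total = sum(dens) or 1e-300  # guard against numerical underflow
            resp.append([d / total for d in dens])

        # M-step: re-estimate weight, mean, and sd of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-9:
                continue  # leave an effectively empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, lengths)) / nj
            var = sum(r[j] * (x - means[j]) ** 2
                      for r, x in zip(resp, lengths)) / nj
            sds[j] = max(math.sqrt(var), min_sd)

    return list(zip(weights, means, sds))


def assign_subtype(length, components):
    """Index of the component with the highest posterior for one fragment."""
    scores = [w * gaussian_pdf(length, m, s) for w, m, s in components]
    return scores.index(max(scores))


# Synthetic fragment lengths in bp. Only the 147 bp canonical size is taken
# from the abstract; the other means and all spreads are made-up values.
rng = random.Random(0)
lengths = (
    [rng.gauss(110, 10) for _ in range(300)]    # short-fragment (assumed)
    + [rng.gauss(147, 8) for _ in range(500)]   # canonical
    + [rng.gauss(320, 20) for _ in range(200)]  # di-nucleosome (assumed)
)
components = fit_fragment_size_mixture(lengths, init_means=[100, 150, 300])
for weight, mean, sd in components:
    print(f"weight={weight:.2f} mean={mean:.1f} bp sd={sd:.1f} bp")
```

Each fitted component stands in for one candidate subtype, and `assign_subtype` labels a fragment by the component with the highest posterior. In this toy version the number of components is fixed by the caller through `init_means`, so it does not discover the number of subtypes automatically as the abstract describes for SEM.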

Metadata

Work Title: Size-Based Expectation Maximization for Characterizing Nucleosome Positions and Subtypes
Access: Open Access
Creators:
  1. Jianyu Yang
  2. Kuangyu Yen
  3. Shaun Mahony
License: In Copyright (Rights Reserved)
Work Type: Article
Publisher: Genome Research
Publication Date: June 17, 2024
Publisher Identifier (DOI): https://doi.org/10.1101/gr.279138.124
Deposited: July 02, 2024

Work History

Version 1
published

  • Created
  • Added SEM_manuscript_final.pdf
  • Added SEM_supplemental_figures_final.pdf
  • Added Creator Jianyu Yang
  • Added Creator Kuangyu Yen
  • Added Creator Shaun Mahony
  • Published
  • Updated
  • Updated Publisher Identifier (DOI), Description, Publication Date
    Publisher Identifier (DOI)
    • https://doi.org/10.1007/978-1-0716-3989-4_21
    • https://doi.org/10.1101/gr.279138.124
    Description
    • Nucleosome landscapes across the genome are typically characterized using micrococcal nuclease sequencing (MNase-seq). MNase is an endo-exonuclease that preferentially digests accessible DNA between nucleosomes.
    • (Current description; identical to the abstract at the top of this record.)
    Publication Date
    • 2024-01-01
    • 2024-06-17