Elite Master Program Human Biology


Preprocessing and downstream analysis of scATAC-seq data


While the collection and analysis of transcriptome data from individual cells has been established for many years, the assessment of chromatin accessibility in single cells remains comparatively recent.1,2 The ongoing development of experimental techniques to investigate chromatin accessibility in single cells has therefore created a need for suitable computational tools to analyze this data type. Recently, such methods have emerged, focusing mainly on establishing adequate quality-control measures and on how best to cluster cells based on their shared chromatin openness.3,4,5 Important steps towards meaningful clustering are the selection of (variable) features and dimensionality reduction, e.g. through latent semantic indexing (LSI) or principal component analysis (PCA). However, after performing dimensionality reduction on the raw matrix, the first component often corresponds to library size, similar to what is seen in scRNA-seq data. To our knowledge, there are currently only two main options to overcome this: (1) to exclude the first component, or (2) to perform library-size normalization. For scRNA-seq data, it has been observed that simple library-size normalization (e.g. by calculating counts per million) is not sufficient to distinguish technical from biological variation, potentially resulting in falsely called differentially expressed genes and poorer clustering results.6,7
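The library-size effect described above can be illustrated with a small simulation. The following is a minimal sketch, not the workflow of any of the cited tools: it assumes a synthetic binary cell-by-peak matrix with two cell types, and it uses a plain truncated SVD of the raw matrix as a stand-in for the decomposition step of LSI or PCA (no TF-IDF weighting or centering). All names (`svd_embedding`, `profile`, `depth`) and all parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (all values are assumptions): 200 cells, 500 peaks, two cell types.
n_cells, n_peaks = 200, 500
cell_type = rng.integers(0, 2, size=n_cells)
depth = rng.uniform(0.1, 0.5, size=n_cells)  # per-cell capture efficiency

# Each cell type has its own half of the peaks that is highly accessible.
profile = np.full((2, n_peaks), 0.3)
profile[0, : n_peaks // 2] = 1.0
profile[1, n_peaks // 2 :] = 1.0

# Binary accessibility: a peak is observed with probability depth * profile.
prob = depth[:, None] * profile[cell_type]
counts = (rng.random((n_cells, n_peaks)) < prob).astype(float)
library_size = counts.sum(axis=1)


def svd_embedding(matrix, n_components=10):
    """Truncated SVD of a cell-by-peak matrix; returns per-cell component scores."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Decomposition of the raw matrix: check the first component against library size.
embedding = svd_embedding(counts)
corr_first = np.corrcoef(embedding[:, 0], library_size)[0, 1]
print(f"|corr(component 1, library size)| = {abs(corr_first):.2f}")

# Option (1): exclude the first component from downstream clustering.
embedding_without_first = embedding[:, 1:]

# Option (2): library-size normalization (counts per million) before the SVD.
cpm = counts / library_size[:, None] * 1e6
embedding_cpm = svd_embedding(cpm)
```

In this toy setting the first component tracks library size almost perfectly, while the cell-type signal only appears from the second component onward. Real data will be noisier, and neither option is guaranteed to remove the technical effect cleanly.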
To address this shortcoming of simple library-size normalization, a broad palette of normalization techniques has been developed for the analysis of scRNA-seq data, improving the ability to distinguish between biological and technical variance.8,9,10,11 While some of these methods rely on additional information such as ERCC spike-ins, one of the most commonly used approaches pools cells with similar library sizes and sums their expression values to obtain pool-based size factors, which are then deconvolved into cell-specific size factors. The rationale is that cells of the same cell type will have similar library sizes, because overall mRNA content differs between cell types. In line with that, it has been observed that cells differ not only in their overall mRNA content but also in their chromatin openness, e.g. stem cells show overall higher chromatin openness than differentiated cells.12
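The pooling idea can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the published implementation: cells are ordered by library size and arranged in a ring, overlapping pools are summed, each pool's size factor is taken as the median ratio to an average pseudo-cell, and the resulting linear system is solved by least squares. The function name `pooled_size_factors`, the pool sizes, and the Poisson-simulated data are all hypothetical; the pool sizes are chosen to be coprime with the number of cells so that this toy system has a unique solution.

```python
import numpy as np


def pooled_size_factors(counts, pool_sizes=(21, 31, 41)):
    """Simplified pooling and deconvolution of size factors (illustrative only).

    counts: cell-by-feature matrix; every cell is assumed to have a non-zero
    library size, and each pool size must be smaller than the number of cells.
    """
    n_cells = counts.shape[0]
    library_size = counts.sum(axis=1)
    scaled = counts / library_size[:, None]   # library-size-adjusted profiles
    reference = scaled.mean(axis=0)           # average pseudo-cell
    keep = reference > 0                      # features usable for ratios

    # Ring of cells so that neighbours have similar library sizes.
    order = np.argsort(library_size)
    ring = np.concatenate([order[::2], order[1::2][::-1]])

    # One row per pool: 1 for cells in the pool, 0 otherwise.
    rows = []
    for size in pool_sizes:
        for start in range(n_cells):
            members = ring[(start + np.arange(size)) % n_cells]
            row = np.zeros(n_cells)
            row[members] = 1.0
            rows.append(row)
    design = np.vstack(rows)

    # Pool-based size factor: median ratio of pooled profile to the reference.
    pooled = design @ scaled
    pool_factors = np.median(pooled[:, keep] / reference[keep], axis=1)

    # Deconvolve pool factors into per-cell factors via least squares,
    # then undo the initial library-size adjustment.
    per_cell, *_ = np.linalg.lstsq(design, pool_factors, rcond=None)
    factors = per_cell * library_size
    return factors / factors.mean()


# Toy check on simulated counts with known per-cell scaling (assumed values).
rng = np.random.default_rng(1)
n_cells, n_features = 200, 1000
true_factors = rng.uniform(0.5, 2.0, size=n_cells)
feature_means = rng.gamma(shape=2.0, scale=1.0, size=n_features)
counts = rng.poisson(true_factors[:, None] * feature_means[None, :]).astype(float)

estimated = pooled_size_factors(counts)
true_scaled = true_factors / true_factors.mean()
print("correlation with true factors:", np.corrcoef(estimated, true_scaled)[0, 1])
```

Summing across a pool reduces the number of zeros that enter each ratio, which is the motivation for pooling in sparse single-cell data. Whether the same construction transfers to near-binary scATAC-seq counts is exactly the kind of question this project would need to examine.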
Therefore, this project aims to develop a better normalization technique for scATAC-seq data, similar to what is available for scRNA-seq data. We hypothesize that suitable normalization of scATAC-seq data will improve the ability to differentiate between cell states and to confidently call differential openness between them. We hope that this can eventually lead to a better understanding of which genes become differentially open between cell states before differential gene expression occurs.

If you are interested in working on this project, please contact maria.richter@helmholtz-muenchen.de


  1. Jason D. Buenrostro et al., “Single-Cell Chromatin Accessibility Reveals Principles of Regulatory Variation,” Nature 523, no. 7561 (July 2015): 486–90, https://doi.org/10.1038/nature14590.
  2. Darren A. Cusanovich et al., “Multiplex Single-Cell Profiling of Chromatin Accessibility by Combinatorial Cellular Indexing,” Science 348, no. 6237 (May 22, 2015): 910–14, https://doi.org/10.1126/science.aab1601.
  3. Anna Danese et al., “EpiScanpy: Integrated Single-Cell Epigenomic Analysis,” BioRxiv, May 24, 2019, 648097, https://doi.org/10.1101/648097.
  4. Tim Stuart et al., “Multimodal Single-Cell Chromatin Analysis with Signac,” BioRxiv, November 10, 2020, 2020.11.09.373613, https://doi.org/10.1101/2020.11.09.373613.
  5. Jeffrey M. Granja et al., “ArchR: An Integrative and Scalable Software Package for Single-Cell Chromatin Accessibility Analysis,” BioRxiv, April 29, 2020, 2020.04.28.066498, https://doi.org/10.1101/2020.04.28.066498.
  6. Beate Vieth et al., “A Systematic Evaluation of Single Cell RNA-Seq Analysis Pipelines,” Nature Communications 10, no. 1 (October 11, 2019): 4667, https://doi.org/10.1038/s41467-019-12266-7.
  7. Luyi Tian et al., “Benchmarking Single Cell RNA-Sequencing Analysis Pipelines Using Mixture Control Experiments,” Nature Methods 16, no. 6 (June 2019): 479–87, https://doi.org/10.1038/s41592-019-0425-8.
  8. Rhonda Bacher et al., “SCnorm: Robust Normalization of Single-Cell RNA-Seq Data,” Nature Methods 14, no. 6 (June 2017): 584–86, https://doi.org/10.1038/nmeth.4263.
  9. Catalina A. Vallejos, John C. Marioni, and Sylvia Richardson, “BASiCS: Bayesian Analysis of Single-Cell Sequencing Data,” PLOS Computational Biology 11, no. 6 (June 24, 2015): e1004333, https://doi.org/10.1371/journal.pcbi.1004333.
  10. Aaron T. L. Lun, Karsten Bach, and John C. Marioni, “Pooling across Cells to Normalize Single-Cell RNA Sequencing Data with Many Zero Counts,” Genome Biology 17, no. 1 (April 27, 2016): 75, https://doi.org/10.1186/s13059-016-0947-7.
  11. Christoph Hafemeister and Rahul Satija, “Normalization and Variance Stabilization of Single-Cell RNA-Seq Data Using Regularized Negative Binomial Regression,” Genome Biology 20, no. 1 (December 23, 2019): 296, https://doi.org/10.1186/s13059-019-1874-1.
  12. Alexandre Gaspar-Maia et al., “Open Chromatin in Pluripotency and Reprogramming,” Nature Reviews Molecular Cell Biology 12, no. 1 (January 2011): 36–47, https://doi.org/10.1038/nrm3036.