Introduction

MIXALIME (MIXture models for ALlelic IMbalance Estimation) is a command-line Python tool for identifying allele-specific events in high-throughput sequencing experiments (e.g. ChIP-Seq, CAGE-Seq).
Key features:
  • Advanced statistical models accounting for
    • the so-called 'reference mapping bias' in allelic read counts;
    • overdispersion caused by technical noise;
    • overdispersion from copy-number variants (CNVs) and aneuploidy;
  • Polished command line interface for step-by-step analysis;
  • Built-in quality control plots;
  • Direct support of BED and VCF files;
  • Built-in parallelization.
For a quick introduction to allelic imbalance estimation with MIXALIME, please see the Quickstart section of the tutorial.

MIXALIME Features

MIXALIME accounts for the read mapping 'reference bias'
Mapping bias [1] skews allelic read counts toward the reference allele. MIXALIME mitigates its effects by fitting the model distributions separately for the Reference and the Alternative alleles.
Various models
Multiple model distributions are available to accommodate data with varying degrees of technical noise and overdispersion.
Supports BADs/CNVs
MIXALIME can explicitly account for the background allelic dosage (BAD) arising from CNVs or aneuploidy. To this end, we employ a mixture model when relative allelic copy numbers are known or estimated from the data (see BABACHI).
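One way to picture the mixture idea is sketched below, under assumptions that are not taken from MIXALIME itself: read counts at a site follow a binomial distribution, BAD is the ratio of major to minor allele copy number, and `weight` is the prior probability that the reference allele sits on the major copy. The function names are hypothetical, and the distributions MIXALIME actually fits may differ.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def bad_mixture_pmf(ref_count, total, bad, weight=0.5):
    """Illustrative two-component binomial mixture for a given BAD.

    Assumption: with BAD = major/minor copy ratio, the expected reference
    fraction is bad/(bad+1) if the reference allele is on the major copy
    and 1/(bad+1) otherwise; `weight` is the prior of the first case.
    """
    p_major = bad / (bad + 1.0)
    p_minor = 1.0 / (bad + 1.0)
    return (weight * binom_pmf(ref_count, total, p_major)
            + (1.0 - weight) * binom_pmf(ref_count, total, p_minor))
```

With `bad=1` the two components coincide and the mixture reduces to a plain binomial with p = 0.5.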
Smart filtration of low-coverage SNVs
Low-coverage SNVs are usually filtered out of the dataset because they are especially prone to noise and erroneous SNP calls. MIXALIME explicitly accounts for this filtering by introducing left-truncated distributions into the model.
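The effect of left truncation can be sketched as follows. The example assumes a negative binomial count model with an integer `r` purely for illustration, and `min_count` stands for a hypothetical coverage threshold below which SNVs are discarded; neither name comes from MIXALIME. Truncation renormalizes the distribution over the counts that survive the filter.

```python
from math import comb


def nb_pmf(k, r, p):
    """Negative binomial PMF: k failures before the r-th success (integer r)."""
    return comb(k + r - 1, k) * p**r * (1.0 - p) ** k


def left_truncated_nb_pmf(k, r, p, min_count):
    """PMF of the same distribution conditioned on k >= min_count."""
    if k < min_count:
        return 0.0  # filtered-out counts carry no probability mass
    # Mass removed by the filter: P(K < min_count)
    removed = sum(nb_pmf(i, r, p) for i in range(min_count))
    return nb_pmf(k, r, p) / (1.0 - removed)
```

Fitting the untruncated distribution to filtered data would treat the missing low counts as evidence, which shifts the parameter estimates. Conditioning on `k >= min_count` avoids that.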
Differential tests
MIXALIME can also identify differential allele-specificity between two groups of samples.
Versatile
A plethora of options lets you configure the tool for your specific needs.
Classic binomial scoring
For the old-school or the suspicious, MIXALIME can also perform scoring with the conventional binomial and beta-binomial models.
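For reference, conventional binomial scoring of a single SNV can be sketched in a few lines. This is a generic one-sided binomial test under the null hypothesis of balanced alleles (p = 0.5), not MIXALIME's own implementation; the function names are hypothetical.

```python
from math import comb


def binom_sf_inclusive(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


def binomial_scores(ref_count, alt_count, p=0.5):
    """One-sided p-values for imbalance toward the ref and the alt allele."""
    n = ref_count + alt_count
    p_ref = binom_sf_inclusive(ref_count, n, p)        # excess of reference reads
    p_alt = binom_sf_inclusive(alt_count, n, 1.0 - p)  # excess of alternative reads
    return p_ref, p_alt


p_ref, p_alt = binomial_scores(ref_count=18, alt_count=4)
print(f"p_ref={p_ref:.4g}, p_alt={p_alt:.4g}")
```

A small `p_ref` suggests imbalance toward the reference allele. Note that a fixed p = 0.5 ignores both reference mapping bias and overdispersion.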

Does it work?

Before releasing MIXALIME, we tested its applicability and performance on datasets from several types of sequencing experiments: ChIP-Seq [2, 3], CAGE-Seq [4], ATAC-Seq and DNase-Seq [5]. The UDACHA database was built with the help of MIXALIME.

Acknowledgements

This work has been supported by Russian Science Foundation grant 20-74-10075.

References

[1]
Brandt, D. Y. C., Aguiar, V. R. C., Bitarello, B. D., Nunes, K., Goudet, J., & Meyer, D. (2015). Mapping Bias Overestimates Reference Allele Frequencies at the HLA Genes in the 1000 Genomes Project Phase I Data. In G3 Genes|Genomes|Genetics (Vol. 5, Issue 5, pp. 931–941). Oxford University Press (OUP).
DOI: 10.1534/g3.114.015784
[2]
Abramov, S., Boytsov, A., Bykova, D., Penzar, D. D., Yevshin, I., Kolmykov, S. K., Fridman, M. V., Favorov, A. V., Vorontsov, I. E., Baulin, E., Kolpakov, F., Makeev, V. J., & Kulakovskiy, I. V. (2021). Landscape of allele-specific transcription factor binding in the human genome. Nature Communications, 12(1), 2751.
DOI: 10.1038/s41467-021-23007-0
[3]
Boytsov, A., Abramov, S., Aiusheeva, A. Z., Kasianova, A. M., Baulin, E., Kuznetsov, I. A., Aulchenko, Y. S., Kolmykov, S., Yevshin, I., Kolpakov, F., Vorontsov, I. E., Makeev, V. J., & Kulakovskiy, I. V. (2022). ANANASTRA: annotation and enrichment analysis of allele-specific transcription factor binding at SNPs. In Nucleic Acids Research (Vol. 50, Issue W1, pp. W51–W56). Oxford University Press (OUP).
DOI: 10.1093/nar/gkac262
[4]
Deviatiiarov, R. M., Gams, A., Kulakovskiy, I. V., Buyan, A., Meshcheryakov, G., Syunyaev, R., Singh, R., Shah, P., Tatarinova, T. V., Gusev, O., & Efimov, I. R. (2023). An atlas of transcribed human cardiac promoters and enhancers reveals an important role of regulatory elements in heart failure. In Nature Cardiovascular Research (Vol. 2, Issue 1, pp. 58–75). Springer Science and Business Media LLC.
DOI: 10.1038/s44161-022-00182-x
[5]
Buyan, A., Meshcheryakov, G., Safronov, V., Abramov, S., Boytsov, A., Nozdrin, V., Baulin, E. F., Kolmykov, S., Vierstra, J., Kolpakov, F., Makeev, V. J., & Kulakovskiy, I. V. (2023). Statistical framework for calling allelic imbalance in high-throughput sequencing data. In bioRxiv.
DOI: 10.1101/2023.11.07.565968