Viral presence in the 1000 Genomes Project data

Milana Djonovic1*, Alexej Abyzov2

1 Mayo Clinic College of Medicine and Science, Rochester, MN, USA

2 Department of Quantitative Health Sciences, Center for Individualized Medicine, Mayo Clinic, Rochester, MN, USA

djonovic.milana [at] mayo.edu

Abstract

This study focuses on detecting naturally occurring viruses in whole-genome sequencing (WGS) data generated by the 1000 Genomes Project consortium. To achieve reliable and sensitive detection, we aligned reads to viral reference genomes, filtered them by quality and by the uniqueness of their mapping to the viral reference, and assessed coverage across each viral genome. Analysis of the 2504 samples revealed natural Epstein-Barr virus (EBV) in at least 12 of them; this natural EBV is distinct from the laboratory counterpart commonly used to transform cell lines. These findings are consistent with earlier results obtained on a smaller set of 750 samples from the same project. In addition to natural EBV, we identified human betaherpesviruses 6A, 6B, and 7, human herpesvirus 4, and human T-lymphotropic virus 1 in 19 samples. This study demonstrates that viruses can be detected in WGS data and that our methodology could also be applied to healthy human tissues. By shedding light on the presence of viruses in healthy human tissues, this research could have important implications for personalized medicine and public health initiatives.

Keywords: bioinformatics, virus detection, 1kGP, WGS, EBV
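The detection logic summarized in the abstract (keep high-quality, uniquely mapped reads, then check how much of the viral genome they cover) can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Alignment` record, the function names, and the thresholds (`min_mapq=30`, `min_reads=10`, `min_breadth=0.1`) are hypothetical choices made only for illustration, and real input would come from an aligner's SAM/BAM output rather than hand-built records.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical minimal record for one read aligned to a viral reference."""
    start: int    # 0-based start position on the viral genome (inclusive)
    end: int      # 0-based end position on the viral genome (exclusive)
    mapq: int     # mapping quality reported by the aligner
    unique: bool  # True if the read maps only to this viral reference


def filter_reads(alignments, min_mapq=30):
    """Keep reads with high mapping quality that map uniquely to the virus.

    The MAPQ cutoff of 30 is an illustrative assumption.
    """
    return [a for a in alignments if a.mapq >= min_mapq and a.unique]


def coverage_breadth(alignments, genome_length):
    """Return the fraction of genome positions covered by at least one read."""
    covered = bytearray(genome_length)  # one flag per genome position
    for a in alignments:
        lo = max(a.start, 0)
        hi = min(a.end, genome_length)
        if hi > lo:
            covered[lo:hi] = b"\x01" * (hi - lo)
    return sum(covered) / genome_length


def call_virus(alignments, genome_length, min_mapq=30, min_reads=10, min_breadth=0.1):
    """Hypothetical presence call: enough filtered reads AND broad coverage.

    Returns (detected, number_of_kept_reads, coverage_breadth).
    """
    kept = filter_reads(alignments, min_mapq)
    breadth = coverage_breadth(kept, genome_length)
    detected = len(kept) >= min_reads and breadth >= min_breadth
    return detected, len(kept), breadth


if __name__ == "__main__":
    # Toy 1,000 bp "viral genome": 12 overlapping good reads spanning 0-650,
    # plus one low-MAPQ read and one non-unique read that should be discarded.
    reads = [Alignment(i * 50, i * 50 + 100, 60, True) for i in range(12)]
    reads.append(Alignment(900, 1000, 5, True))
    reads.append(Alignment(800, 900, 60, False))
    print(call_virus(reads, 1000))

    # 20 reads stacked on the same short segment: many reads, narrow coverage.
    piled = [Alignment(0, 50, 60, True) for _ in range(20)]
    print(call_virus(piled, 1000))
```

In the first toy case the two discarded reads are removed and the remaining 12 reads cover 65% of the genome, so the call is positive; in the second, 20 reads pile onto 5% of the genome and the breadth requirement rejects the call. Requiring breadth in addition to a read count is one way to avoid calling a virus from reads that cluster on a short region, which is the motivation the abstract gives for assessing coverage across the viral genome.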