Microbial contamination and composition of oral samples subjected to clinical whole genome sequencing

February 07, 2023

Abhishek Kumar 1, Volha Skrahina 1, Joshua Atta 1, Veronika Boettcher 2, Nicola Hanig 2, Arndt Rolfs 1,3, Gabriela Oprea 1, Najim Ameziane 1

Abstract

Biological material from the oral cavity is an excellent source of samples for genetic diagnostics because collection is quick, easy, and non-invasive. We have set up clinical whole genome sequencing for patients with suspected hereditary disease. Besides the excellent quality of human DNA that can be isolated from such samples, we observed non-human DNA sequences at varying percentages. We investigated the proportion of non-human mapped reads (NHMR) sequenced from buccal swabs and saliva, the types of microbial genomes from which they were derived, and their impact on molecular classification. Reads that did not map to the human reference genome were aligned to complete microbial reference sequences from the National Center for Biotechnology Information’s (NCBI) RefSeq database using Kraken2. Of 765 analyzed samples, over 80% contained more than 5% NHMRs. The majority of NHMRs were from bacterial genomes (on average 69% in buccal swabs and 54% in saliva), while the proportion of viruses was low, averaging 0.32% (buccal swabs) and 0.07% (saliva). We identified more than 30 different bacterial families, and Streptococcus mitis and Rothia mucilaginosa were the most common species. Importantly, the level of contamination did not impact the diagnostic yield.

Keywords: WGS; genetic diagnostics; genetic testing; microbial contamination; molecular diagnostics.
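
The quantities reported in the abstract (the percentage of NHMRs per sample, the share of samples above a 5% threshold, and the bacterial and viral proportions) could be computed along the following lines. This is a minimal sketch, not the authors' pipeline: the function names are hypothetical, the input is assumed to be the standard six-column, tab-separated Kraken2 report (percentage, clade reads, direct reads, rank code, taxonomy ID, name), top-level groups such as Bacteria and Viruses are assumed to carry rank code `D`, and the example numbers are invented for illustration only.

```python
from typing import Dict, Iterable


def nhmr_percent(total_reads: int, human_mapped_reads: int) -> float:
    """Percentage of reads that did not map to the human reference."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * (total_reads - human_mapped_reads) / total_reads


def share_above(values: Iterable[float], threshold: float) -> float:
    """Percentage of samples whose NHMR percentage exceeds the threshold."""
    values = list(values)
    if not values:
        return 0.0
    return 100.0 * sum(v > threshold for v in values) / len(values)


def domain_shares(kraken_report: str) -> Dict[str, float]:
    """Percentage of all reads in the report assigned to each top-level domain.

    Assumes the standard Kraken2 report columns:
    percent, clade reads, direct reads, rank code, taxid, name.
    """
    clade_reads: Dict[str, int] = {}
    total = 0
    for line in kraken_report.splitlines():
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        reads = int(fields[1])
        rank = fields[3].strip()
        name = fields[5].strip()  # names are indented by taxonomic depth
        if rank in ("U", "R"):
            total += reads  # unclassified + root covers every read
        elif rank == "D":
            clade_reads[name] = reads
    if total == 0:
        return {}
    return {name: 100.0 * n / total for name, n in clade_reads.items()}


# Invented example report (not study data): 1000 non-human reads.
example_report = "\n".join([
    "30.00\t300\t300\tU\t0\tunclassified",
    "70.00\t700\t5\tR\t1\troot",
    "69.00\t690\t0\tD\t2\t    Bacteria",
    "0.50\t5\t0\tD\t10239\t  Viruses",
])

if __name__ == "__main__":
    print(nhmr_percent(total_reads=1000, human_mapped_reads=900))
    print(share_above([2.0, 6.5, 12.0, 8.1], threshold=5.0))
    print(domain_shares(example_report))
```

Dividing by the sum of the unclassified and root rows makes each domain share a fraction of all reads submitted to Kraken2, which in this setting means all reads that did not map to the human reference.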