SyMetrics: an integrated machine learning model for evaluating the pathogenicity of synonymous variants in the human genome

January 15, 2026

Linnaeus Bundalian 1 2, Martina Schmidt Strnadová 3, Felix Garten 1, Susanne Horn 3, Udo Stenzel 3, Denny Popp 1, Johannes R Lemke 1, Saskia Biskup 4, Björn Schulte 4, Patrick May 5, Frank Bösebeck 6, Antje Garten 7, Doreen Thor 3, Angela Schulz 3, Julia Hentschel 1, Janet Kelso 8, Torsten Schöneberg 3 9, Diana Le Duc 1 2 8 10

Abstract

Synonymous single nucleotide variants (sSNVs), traditionally seen as neutral, are now recognized for their biological impact. To assess their relevance, we developed SyMetrics, a framework that integrates predictors of splicing, RNA stability, evolutionary conservation, codon usage, synonymous variation effects, sequence properties, and allele frequency. We analyzed all possible sSNVs across the human genome, and our machine-learning model achieved 97% accuracy in distinguishing deleterious from benign variants, with a ROC-AUC of 0.89, outperforming individual predictors. Our estimates indicate that about 1.98 ± 0.17% of sSNVs absent from population databases are damaging (roughly 900 000 sSNVs), with an odds ratio of 3.87 for deleteriousness compared to common sSNVs (P < 0.05). To validate predictions, we performed functional assays on selected sSNVs in the AVPR2 gene and additionally used available large-scale mutagenesis screens of RAD51C and BAP1 variants. In a clinical cohort, we identified 15 predicted deleterious sSNVs in genes linked to patient phenotypes; 9 were classified as (likely) pathogenic, while 6 were variants of uncertain significance (VUS) according to American College of Medical Genetics and Genomics (ACMG) guidelines. For three VUS, segregation data supported their suspected inheritance patterns (de novo, X-linked). Our findings underscore the functional importance of sSNVs. To support further research and clinical applications, we provide a Python package and web application (https://symetrics.org/) for evaluating these variants comprehensively.
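For readers less familiar with the odds ratio reported above, the following minimal sketch shows how such a ratio and an approximate 95% confidence interval can be computed from a 2x2 table of predicted-deleterious versus predicted-benign sSNVs in two frequency classes. The function name `odds_ratio_with_ci`, the Woolf (log-normal) interval, and all counts are illustrative assumptions; they are not taken from the study and do not reproduce its statistical procedure.

```python
import math


def odds_ratio_with_ci(del_absent, ben_absent, del_common, ben_common, z=1.96):
    """Odds ratio and Woolf (log-normal) confidence interval for a 2x2 table.

    del_absent / ben_absent: deleterious / benign counts among sSNVs absent
    from population databases; del_common / ben_common: the same counts
    among common sSNVs. All four counts must be positive.
    """
    odds_ratio = (del_absent * ben_common) / (ben_absent * del_common)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(
        1 / del_absent + 1 / ben_absent + 1 / del_common + 1 / ben_common
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts chosen only to illustrate the calculation.
or_value, ci_low, ci_high = odds_ratio_with_ci(
    del_absent=20, ben_absent=980, del_common=5, ben_common=995
)
print(f"OR = {or_value:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

An interval whose lower bound lies above 1 would correspond to enrichment of predicted deleterious variants among absent sSNVs at roughly the 5% significance level, which is the kind of statement summarized by "P < 0.05".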