Title: Evolutionary trajectory and origin of SARS-CoV-2
Abstract:
Alignment-based phylogenetics fails to uncover the evolutionary trajectory and origin of SARS-CoV-2. This study develops a novel alignment-free system, based on the Fréchet distance and an artificial recurrent neural network, to reveal the evolutionary trajectory and origin of SARS-CoV-2 from more than two million genome sequences. SARS-CoV-2 gradually deletes parts of its genome as it evolves, but the mutation rate varies with genome features. Among single nucleotides, C mutates fast, whereas T changes slowly. C-prefixed dinucleotides (e.g., CG and CT) are lost dramatically fast during COVID-19 outbreaks. Similarly, the virus genome also deletes several trinucleotides prefixed by C (e.g., CCT). Interestingly, the trinucleotide CCT and the dinucleotide CT centrally control the entire SARS-CoV-2 genome, and their evolutionary trajectories fit the spikes in COVID-19 cases. Therefore, C-prefixed feature deletions are the primary contributors to the COVID-19 pandemic. SARS-CoV-2 variants undergo both vertical and horizontal deletions along their evolutionary trajectories to infect humans. Gradual vertical mutations help variants increase their infection capacity, but over-mutation reduces their infection potential. Horizontal mutations let variants gain diverse mutation markers, which dramatically increase virus infection capacity, leading to the COVID-19 pandemic. Mink coronavirus variants mutate 56 genome features in a manner similar to the SARS-CoV-2 wild type, and they are the most likely origin of SARS-CoV-2. This origin trajectory follows the order: mink, cat, tiger, mouse, hamster, dog, lion, gorilla, leopard, bat, and pangolin. Therefore, the C-feature dynamic pattern serves as the key marker of the evolutionary trajectory of the SARS-CoV-2 genome, and mink-origin SARS-CoV-2 gradually deletes parts of its genome to carry diverse mutations, causing the COVID-19 pandemic.
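The abstract does not state how genome features are counted or how the Fréchet distance is applied, so the following is only a minimal sketch of the two generic building blocks it names. It assumes that a "genome feature" can be read as the relative frequency of a mono-, di-, or trinucleotide (a k-mer), and that trajectories are compared as one-dimensional series of such frequencies. The function names `kmer_frequencies` and `discrete_frechet` are hypothetical and are not taken from the study, and the recurrent neural network is not sketched.

```python
from collections import Counter


def kmer_frequencies(seq, k):
    """Relative frequency of each overlapping k-mer built only from A, C, G, T.

    Assumption: k-mers containing ambiguous bases (e.g. N) are skipped.
    """
    seq = seq.upper()
    valid = set("ACGT")
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= valid
    )
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def discrete_frechet(p, q):
    """Discrete Fréchet distance between two 1-D trajectories (lists of floats).

    Standard dynamic-programming formulation: ca[i][j] is the smallest
    achievable maximum gap when walking both curves up to points i and j.
    """
    if not p or not q:
        raise ValueError("both trajectories must be non-empty")
    n, m = len(p), len(q)
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = abs(p[i] - q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                ca[i][j] = max(
                    min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1]), d
                )
    return ca[-1][-1]
```

Under these assumptions, one could compute, for example, the CCT frequency of each genome in a time-ordered collection with `kmer_frequencies(genome, 3).get("CCT", 0.0)` and then compare the resulting series for two variants with `discrete_frechet`.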