
Project Area A
Tools for nucleotide sequences and regulation
Project Area A develops computational methods to understand viruses directly from their nucleotide sequences, using large-scale genome data to reconstruct viral populations, track their evolution, and reveal how they interact with hosts.
Why this matters
Viral behaviour, host range, and pathogenicity are encoded at the nucleotide-sequence level
Viruses exist as quasispecies—populations of many related genome variants rather than single genomes
Understanding haplotypes and co-occurring mutations is essential for:
Virus classification
Evolutionary analysis
Predicting host specificity and virulence
This knowledge supports both pandemic preparedness and phage-based therapies, integrating human, animal, and environmental health within the One Health framework
Why now?
Petabase-scale virus discovery from global high-throughput sequencing
Rapid expansion of virus genome resources through NFDI4Microbiota
Full viral haplotype sequencing is now technically feasible
These advances make it possible to analyse viral evolution and host interaction at the level of complete viral populations
Methods used
High-throughput sequencing
Phylogenetics
Machine learning
Infection models
Transcriptome and expression analysis
RNA secondary-structure analysis
NFDI4Microbiota data infrastructure
Tools and methods to be developed
qs-SVG (quasispecies sequence-variation graph)
Quasispecies and haplotype reconstruction
Accurate quasispecies-aware phylogenetic reconstruction
Full virus genome alignment construction
Advanced virus genome annotation
Virulence and host-range prediction
Hybrid dry-lab / wet-lab neural networks
Do viruses exploit their quasispecies for host range evolution?
In A01 (Dutilh/Küsel), we are developing a computational tool to quantify the size, diversity, and structure of virus quasispecies—the complex populations of closely related viral genomes. Using both short- and long-read sequencing data, we will construct a quasispecies sequence variation graph (qs-SVG) that represents all viral haplotypes, tracks mutations, and links them to functional annotations.
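To make the idea of a sequence variation graph concrete, the following is a minimal sketch and not the A01 tool itself. It assumes the haplotypes are already reconstructed and aligned to equal length, which the real method would first have to infer from short- and long-read data. The function names (`build_variation_graph`, `variable_sites`, `shannon_diversity`), the toy sequences, and the abundance values are all hypothetical.

```python
import math
from collections import defaultdict


def build_variation_graph(haplotypes):
    """Build a toy variation graph from aligned haplotypes.

    haplotypes: dict mapping haplotype name -> aligned sequence.
    Returns (nodes, edges); each maps a node or edge to the set of
    haplotype names whose path passes through it.
    """
    if len({len(seq) for seq in haplotypes.values()}) != 1:
        raise ValueError("haplotypes must be aligned to the same length")
    nodes = defaultdict(set)  # (position, base) -> haplotype names
    edges = defaultdict(set)  # (node, next_node) -> haplotype names
    for name, seq in haplotypes.items():
        path = list(enumerate(seq))
        for node in path:
            nodes[node].add(name)
        for left, right in zip(path, path[1:]):
            edges[(left, right)].add(name)
    return nodes, edges


def variable_sites(nodes):
    """Return positions at which more than one base is observed."""
    bases_at = defaultdict(set)
    for position, base in nodes:
        bases_at[position].add(base)
    return sorted(p for p, bases in bases_at.items() if len(bases) > 1)


def shannon_diversity(abundances):
    """Shannon index (natural log) over haplotype abundances."""
    total = sum(abundances.values())
    return -sum(
        (n / total) * math.log(n / total)
        for n in abundances.values()
        if n > 0
    )


# Hypothetical toy quasispecies with three haplotypes.
haplotypes = {"h1": "ACGTAC", "h2": "ACTTAC", "h3": "ACTTGC"}
nodes, edges = build_variation_graph(haplotypes)
variable = variable_sites(nodes)

# Haplotypes carrying both variants (co-occurring mutations).
co_occurring = nodes[(2, "T")] & nodes[(4, "G")]

# Hypothetical read support per haplotype.
diversity = shannon_diversity({"h1": 50, "h2": 25, "h3": 25})
```

Because every node and edge stores the set of haplotypes passing through it, co-occurring mutations can be read off by intersecting node sets. Functional annotations could be attached to nodes in the same way.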
This approach will allow virus identification at the single-haplotype level, a capability not yet established for viruses. Initially focused on phages, the tool will later expand to prophages, enabling insights into viral diversity across different environments:
Eutrophic habitats (high bacterial density) are expected to contain phages with a narrow host range
Oligotrophic habitats (low bacterial density) may host phages with a broader host range, spanning 3–4 orders of magnitude
A key goal is to understand how different haplotypes influence host selection and viral pathogenicity. Beyond phages, the tool will be applied to:
Endogenous retroviruses (ERVs) through A03
SARS-CoV-2 with B04, evaluating how quasispecies shape host responses in collaboration with A03 and C04
Influenza viruses through C02
Finally, using phages as a model, we aim to distinguish ancient viral DNA from recent infections, addressing fundamental questions in viral evolution.
This project builds on extensive existing datasets from the former CRC AquaDiva at FSU Jena, enabling a head start in mapping viral diversity and function.
Project Leaders
Prof. Dr. Bas E. Dutilh
Institute of Biodiversity, Ecology, and Evolution,
Friedrich Schiller University Jena
Prof. Dr. Kirsten Küsel
Institute of Biodiversity, Ecology, and Evolution,
Friedrich Schiller University Jena
Phylogeny of functional sequence elements in virus genomes
Virus phylogenies are commonly built from selected open reading frames (ORFs) or genes and ignore recent discoveries on virus genome complexity from functional genomics (omics) studies on virus-infected cells. These omics studies use RNA-seq, Ribo-seq, SHAPE-seq, or other sequencing-based assays and have vastly extended our knowledge of virus genomes by detecting numerous novel functional sequence elements (FSEs). However, these studies commonly ignore one fundamental question: are these novel FSEs conserved during virus evolution and thus likely to play an important role in the virus life cycle?
FSEs identified in omics studies include short ORFs (sORFs) with <100 nucleotides, e.g., upstream ORFs (uORFs) within 5’ untranslated regions (UTRs) of other ORFs, or alternative proteins generated from the same locus through programmed ribosomal frameshifting or alternative splicing (Fig. A02.1). In addition, novel virus non-coding RNAs such as circular RNAs (circRNAs) and microRNAs (miRNAs) have been discovered. Furthermore, binding sites of host RNA- and DNA-binding proteins in virus DNA or RNA can now be determined at a large scale. These FSEs cannot be predicted from sequence alone, and some FSEs have to form specific RNA structures to be functional.
To date, no standardised, comprehensive tool is available to detect different types of virus FSEs from omics data and analyse their conservation; existing phylogenetic approaches focus only on protein-coding genes. In this project, we will close this gap by developing tools to identify FSEs that are conserved in sequence and/or structure for (1) reconstructing their evolutionary histories; (2) incorporating them into robust virus phylogenies; and (3) predicting potential functional roles. As recombination is an important evolutionary process that affects many viruses, we will implement a method for recombination-aware reconstruction of phylogenies. We will thereby contribute to central goals G1, G2, and G3 of the CRC VirusREvolution. Our tools will initially be developed for SARS-CoV-2, vibriophage N4, hepatitis B virus (HBV), and herpes simplex virus 1 (HSV-1) and will be generalised to other viruses in subsequent funding phases. Here, inclusion of ancient HBV and HSV-1 genomes and recombination events will also enable us to describe the evolutionary histories of viruses spanning several thousand years. Genome annotations extended with conserved FSEs will be incorporated into VirJenDB within NFDI4Microbiota.
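As an illustration of what a sequence-conservation check for an FSE might look like, here is a minimal sketch, assuming a multiple genome alignment is already available and the FSE is given as a column interval. It covers sequence conservation only; structure conservation and recombination are out of scope. The function names, the 0.9 threshold, and the toy alignment are hypothetical and are not taken from the project.

```python
from collections import Counter


def column_conservation(alignment, start, end):
    """Per-column conservation for alignment columns [start, end).

    alignment: dict mapping genome name -> aligned sequence.
    Each score is the fraction of genomes that carry the most common
    character in that column (gap characters count as characters).
    """
    seqs = list(alignment.values())
    if not seqs:
        raise ValueError("alignment is empty")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    scores = []
    for col in range(start, end):
        counts = Counter(s[col] for s in seqs)
        scores.append(counts.most_common(1)[0][1] / len(seqs))
    return scores


def mean_conservation(alignment, start, end):
    """Average of the per-column scores over the interval."""
    scores = column_conservation(alignment, start, end)
    return sum(scores) / len(scores)


def is_conserved(alignment, start, end, threshold=0.9):
    """Call an element conserved if its mean score reaches the threshold."""
    return mean_conservation(alignment, start, end) >= threshold


# Hypothetical toy alignment of four virus strains.
alignment = {
    "strain_1": "ATGGCCTAAGGT",
    "strain_2": "ATGGCCTAAGAT",
    "strain_3": "ATGGCTTAACGT",
    "strain_4": "ATGGCCTAATTT",
}

# Hypothetical short ORF in columns 0-8, variable flank in columns 9-11.
sorf_score = mean_conservation(alignment, 0, 9)
flank_score = mean_conservation(alignment, 9, 12)
sorf_conserved = is_conserved(alignment, 0, 9)
flank_conserved = is_conserved(alignment, 9, 12)
```

In this toy case the candidate sORF is conserved in all but one column, while the flanking region is not. A realistic analysis would additionally weight genomes by their phylogenetic relatedness, since closely related strains inflate simple majority scores.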
Project Leaders
Prof. Dr. Caroline Friedel
Institute for Informatics,
Ludwig-Maximilians-University Munich
Dr. Denise Kühnert
Centre for Artificial Intelligence in Public Health,
Robert Koch Institute
Detecting time-resolved and virulence-associated host responses to virus infection
Understanding the cell’s transcriptional response to virus infections is crucial for comprehending the host’s molecular defence, the pathogen’s strategies to circumvent these mechanisms, and hence the virus’s capability to cause severe disease (i.e. its virulence). However, currently available methods are not sufficient to accurately determine the cellular transcriptional response, especially for viruses that cause a global host cell shut-off. In these cases, available computational read normalisation strategies preclude an accurate analysis of differential gene expression. Moreover, current off-the-shelf methods cannot assess the expression of all relevant transcript classes, in particular the highly repetitive transposable elements (TEs) that have been reported to trigger innate immunity in virus-infected cells. Currently used analysis pipelines do not yet enable the assessment of individual TE copy expression.
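The normalisation problem under a global host shut-off can be shown with a small worked example. The sketch below compares library-size normalisation (counts per million over host reads) with scaling to an external spike-in control. The spike-in strategy is an assumption chosen for illustration, not the method proposed in this project, and all gene names and counts are hypothetical.

```python
import math


def cpm(counts, exclude=()):
    """Counts per million, scaled by the total of the non-excluded genes."""
    kept = {g: c for g, c in counts.items() if g not in exclude}
    total = sum(kept.values())
    return {g: c * 1e6 / total for g, c in kept.items()}


def spike_in_normalise(counts, spike_ins):
    """Scale each gene by the summed spike-in counts of the same sample."""
    spike_total = sum(counts[g] for g in spike_ins)
    return {g: c / spike_total for g, c in counts.items() if g not in spike_ins}


def log2_fold_changes(control, treated):
    """log2(treated / control) per gene; assumes non-zero values."""
    return {g: math.log2(treated[g] / control[g]) for g in control}


# Hypothetical counts: the infection shuts off gene_a fourfold,
# while gene_c is truly unchanged. spike_1 was added in equal amounts.
mock = {"gene_a": 2000, "gene_c": 1000, "spike_1": 100}
infected = {"gene_a": 500, "gene_c": 1000, "spike_1": 100}
spikes = ("spike_1",)

# Library-size normalisation forces both samples to the same total.
cpm_lfc = log2_fold_changes(cpm(mock, spikes), cpm(infected, spikes))

# Spike-in scaling keeps the global decrease visible.
spike_lfc = log2_fold_changes(
    spike_in_normalise(mock, spikes),
    spike_in_normalise(infected, spikes),
)
```

With these toy numbers, library-size normalisation reports gene_a as only about twofold down (log2 fold change of about -1) and the unchanged gene_c as about twofold up (about +1). Spike-in scaling recovers the fourfold decrease of gene_a (about -2) and no change for gene_c (about 0).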
Project Leaders
Prof. Dr. Steve Hoffmann
Faculty of Biological Sciences,
Friedrich Schiller University Jena,
Leibniz Institute on Aging, Fritz Lipmann Institute
Prof. Dr. Friedemann Weber
Institute of Virology,
Veterinary Medicine,
Justus Liebig University Giessen
Harnessing synthetic small RNAs to probe, decode, and optimise phage-host interactions
Project Leaders
Prof. Dr. Manja Marz
Faculty of Mathematics and Computer Science,
Friedrich Schiller University Jena,
Director, European Virus Bioinformatics Center
Prof. Dr. Kai Papenfort
Institute for Microbiology
