Project Area A

Tools for nucleotide sequences and regulation

Project Area A develops computational methods to understand viruses directly from their nucleotide sequences, using large-scale genome data to reconstruct viral populations, track their evolution, and reveal how they interact with hosts.

 

Why this matters
  • Viral behaviour, host range, and pathogenicity are encoded at the nucleotide-sequence level

  • Viruses exist as quasispecies—populations of many related genome variants rather than single genomes

  • Understanding haplotypes and co-occurring mutations is essential for:

    • Virus classification

    • Evolutionary analysis

    • Predicting host specificity and virulence

  • This knowledge supports both pandemic preparedness and phage-based therapies, integrating human, animal, and environmental health within the One Health framework


Why now?
  • Petabase-scale virus discovery from global high-throughput sequencing

  • Rapid expansion of virus genome resources through NFDI4Microbiota

  • Full viral haplotype sequencing is now technically feasible

  • These advances make it possible to analyse viral evolution and host interaction at the level of complete viral populations

Methods used
  • High-throughput sequencing

  • Phylogenetics

  • Machine learning

  • Infection models

  • Transcriptome and expression analysis

  • RNA secondary-structure analysis

  • NFDI4Microbiota data infrastructure


Tools and methods to be developed
  • qs-SVG (quasispecies sequence-variation graph)

  • Quasispecies and haplotype reconstruction

  • Accurate quasispecies-aware phylogenetic reconstruction

  • Full virus genome alignment construction

  • Advanced virus genome annotation

  • Virulence and host-range prediction

  • Hybrid dry-lab / wet-lab neural networks

Do viruses exploit their quasispecies for host range evolution?

In A01 (Dutilh/Küsel), we are developing a computational tool to quantify the size, diversity, and structure of virus quasispecies—the complex populations of closely related viral genomes. Using both short- and long-read sequencing data, we will construct a quasispecies sequence variation graph (qs-SVG) that represents all viral haplotypes, tracks mutations, and links them to functional annotations.

This approach will allow virus identification at the single-haplotype level, a capability not yet established for viruses. Initially focused on phages, the tool will later expand to prophages, enabling insights into viral diversity across different environments:

  • Eutrophic habitats (high bacterial density) are expected to contain phages with a narrow host range

  • Oligotrophic habitats (low bacterial density) may host phages with a broader host range, spanning 3–4 orders of magnitude

A key goal is to understand how different haplotypes influence host selection and viral pathogenicity. Beyond phages, the tool will be applied to:

  • Endogenous retroviruses (ERVs) through A03

  • SARS-CoV-2 with B04, evaluating how quasispecies shape host responses in collaboration with A03 and C04

  • Influenza viruses through C02

Finally, using phages as a model, we aim to distinguish ancient viral DNA from recent infections, addressing fundamental questions in viral evolution.

This project builds on extensive existing datasets from the former CRC AquaDiva at FSU Jena, enabling a head start in mapping viral diversity and function.
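To make the qs-SVG idea described above more concrete, the following is a minimal sketch, not the A01 tool itself. It assumes haplotypes that are already reconstructed and aligned to equal length, and the function names (`build_variation_graph`, `variable_sites`, `co_occurring`) as well as the toy sequences are hypothetical. Nodes are (position, base) pairs, edges record which haplotypes traverse them, and co-occurring mutations can be read off as shared haplotype labels.

```python
from collections import defaultdict


def build_variation_graph(haplotypes):
    """Build a toy variation graph from equal-length, pre-aligned haplotypes.

    Nodes are (position, base) pairs; edges join consecutive nodes.
    Both store the names of the haplotypes that pass through them.
    """
    lengths = {len(seq) for seq in haplotypes.values()}
    if len(lengths) != 1:
        raise ValueError("haplotypes must be aligned to the same length")

    nodes = defaultdict(set)  # (pos, base) -> haplotype names
    edges = defaultdict(set)  # ((pos, base), (pos + 1, base)) -> haplotype names
    for name, seq in haplotypes.items():
        for pos, base in enumerate(seq):
            nodes[(pos, base)].add(name)
            if pos > 0:
                edges[((pos - 1, seq[pos - 1]), (pos, base))].add(name)
    return nodes, edges


def variable_sites(nodes):
    """Return the sorted positions at which more than one base is observed."""
    bases_at = defaultdict(set)
    for pos, base in nodes:
        bases_at[pos].add(base)
    return sorted(pos for pos, bases in bases_at.items() if len(bases) > 1)


def co_occurring(nodes, site_a, site_b):
    """Map each allele pair at two sites to the haplotypes carrying both."""
    result = {}
    for (pos_a, base_a), haps_a in nodes.items():
        if pos_a != site_a:
            continue
        for (pos_b, base_b), haps_b in nodes.items():
            if pos_b != site_b:
                continue
            shared = haps_a & haps_b
            if shared:
                result[(base_a, base_b)] = shared
    return result


# Hypothetical toy quasispecies: four haplotypes, two variable sites.
haplotypes = {
    "hap1": "ACGTAC",
    "hap2": "ACGTAC",
    "hap3": "ATGTGC",
    "hap4": "ACGTGC",
}
nodes, edges = build_variation_graph(haplotypes)
sites = variable_sites(nodes)
linked = co_occurring(nodes, sites[0], sites[1])
print(sites)
print(linked)
```

A real graph built from short- and long-read data would additionally need to handle insertions, deletions, sequencing errors, and haplotype abundances, none of which this sketch attempts.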


Project Leaders

Prof. Dr. Bas E. Dutilh

Institute of Biodiversity, Ecology, and Evolution,
Friedrich Schiller University Jena

Prof. Dr. Kirsten Küsel

Institute of Biodiversity, Ecology, and Evolution,
Friedrich Schiller University Jena

Phylogeny of functional sequence elements in virus genomes

Virus phylogenies are commonly built from selected open reading frames (ORFs) or genes and ignore recent discoveries on virus genome complexity from functional genomics (omics) studies of virus-infected cells. These omics studies use RNA-seq, Ribo-seq, SHAPE-seq, or other sequencing-based assays and have vastly extended our knowledge of virus genomes by detecting numerous novel functional sequence elements (FSEs). However, these studies commonly ignore one fundamental question: are these novel FSEs conserved during virus evolution and thus likely to play an important role in the virus life cycle?

FSEs identified in omics studies include short ORFs (sORFs) of fewer than 100 nucleotides, e.g., upstream ORFs (uORFs) within 5’ untranslated regions (UTRs) of other ORFs, and alternative proteins generated from the same locus through programmed ribosomal frameshifting or alternative splicing (Fig. A02.1). In addition, novel virus non-coding RNAs such as circular RNAs (circRNAs) and microRNAs (miRNAs) have been discovered. Furthermore, binding sites of host RNA- and DNA-binding proteins in virus DNA or RNA can now be determined at large scale. These FSEs cannot be predicted from sequence alone, and some FSEs have to form specific RNA structures to be functional.

To date, no standardised, comprehensive tool is available to detect different types of virus FSEs from omics data and analyse their conservation; existing phylogenetics approaches focus only on protein-coding genes. In this project, we will close this gap by developing tools to identify FSEs that are conserved in sequence and/or structure for (1) reconstructing their evolutionary histories; (2) incorporating them into robust virus phylogenies; and (3) predicting potential functional roles. As recombination is an important evolutionary process that affects many viruses, we will implement a method for recombination-aware reconstruction of phylogenies. We will therefore contribute to central goals G1, G2, and G3 of the CRC VirusREvolution. Our tools will initially be developed for SARS-CoV-2, vibriophage N4, hepatitis B virus (HBV), and herpes simplex virus 1 (HSV-1) and will be generalised to other viruses in subsequent funding phases. Here, inclusion of ancient HBV and HSV-1 genomes and recombination events will also enable us to describe the evolutionary histories of viruses spanning several thousand years. Genome annotations extended with conserved FSEs will be incorporated into VirJenDB within NFDI4Microbiota.
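As an illustration of what "conserved in sequence" could mean in practice, the sketch below scores a candidate FSE by the mean per-column Shannon entropy of a multiple sequence alignment. This is an assumed, simplified measure chosen for illustration, not the method the project will implement; the function names and the toy alignment are hypothetical, and structural conservation is not covered.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy in bits of one alignment column, ignoring gaps."""
    residues = [c for c in column if c != "-"]
    if not residues:
        # Simplification: a gap-only column carries no signal and gets entropy 0.
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def region_conservation(alignment, start, end):
    """Mean conservation over alignment columns [start, end).

    Each column scores 1 - entropy / 2, because 2 bits is the maximum
    entropy for four equally frequent nucleotides.
    """
    if end <= start:
        raise ValueError("region must contain at least one column")
    scores = []
    for i in range(start, end):
        column = [seq[i] for seq in alignment]
        scores.append(1.0 - column_entropy(column) / 2.0)
    return sum(scores) / len(scores)


# Hypothetical aligned genomes around a candidate uORF start codon.
alignment = [
    "AUGGCUAA",
    "AUGGCAAA",
    "AUGGCCAA",
    "AUGGCGAA",
]
start_codon_score = region_conservation(alignment, 0, 3)  # invariant AUG
wobble_score = region_conservation(alignment, 5, 6)       # fully variable column
print(start_codon_score, wobble_score)
```

A score near 1 marks an invariant region and a score near 0 a fully variable one. For real data, the score would have to be compared against the background variability of the genome and corrected for the phylogenetic relatedness of the sampled sequences.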

Project Leaders

Prof. Dr. Caroline Friedel

Institute for Informatics,
Ludwig-Maximilians-University Munich

Dr. Denise Kühnert

Centre for Artificial Intelligence in Public Health,
Robert Koch Institute

Detecting time-resolved and virulence-associated host responses to virus infection

Understanding the cell’s transcriptional response to virus infections is crucial for comprehending the host’s molecular defence, the pathogen’s strategies to circumvent these mechanisms, and hence the virus’s capability to cause severe disease (i.e. its virulence). However, currently available methods are not sufficient to accurately determine the cellular transcriptional response, especially for viruses that cause a global host cell shut-off. In these cases, available computational read normalisation strategies preclude an accurate analysis of differential gene expression. Moreover, current off-the-shelf methods cannot assess the expression of all relevant transcript classes, in particular the highly repetitive transposable elements (TEs) that have been reported to trigger innate immunity in virus-infected cells. Currently used analysis pipelines do not yet enable the assessment of individual TE copy expression.

Thus, we aim to develop tools for (i) normalisation of reads to accurately measure the host transcription shut-off imposed by viruses and (ii) mapping expression of individual TE copies. Moreover, we aim to (iii) combine these methods into a tool to systematically analyse and cluster expression time series to characterise expression trajectories and infer regulatory interactions during virus infections. We expect that the development and implementation of the proposed software will subsequently enable us to improve the transcriptome-based prediction of a pathogen’s virulence.
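The normalisation problem behind aim (i) can be shown with a toy example. The sketch below is an illustration under stated assumptions, not the method to be developed: it assumes that an equal amount of an external spike-in RNA was added to each sample, and the gene names, counts, and function names are hypothetical. When all host genes drop ten-fold, scaling by the host library size hides the shut-off, whereas scaling by the spike-in recovers it.

```python
def cpm(counts):
    """Counts per million, using the sum of the given counts as library size."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library is empty")
    return {gene: 1e6 * n / total for gene, n in counts.items()}


def spike_in_normalise(counts, spike_ins):
    """Scale host counts by the summed spike-in counts of the same sample."""
    spike_total = sum(counts[s] for s in spike_ins)
    if spike_total == 0:
        raise ValueError("no spike-in reads found")
    return {g: n / spike_total for g, n in counts.items() if g not in spike_ins}


def host_only(counts, spike_ins):
    """Drop spike-in entries so that only host genes remain."""
    return {g: n for g, n in counts.items() if g not in spike_ins}


# Hypothetical read counts: infection reduces every host gene ten-fold,
# while the (assumed) constant spike-in is recovered at the same level.
spike_ins = {"spike1"}
mock = {"geneA": 1000, "geneB": 3000, "spike1": 500}
infected = {"geneA": 100, "geneB": 300, "spike1": 500}

cpm_fold = (cpm(host_only(infected, spike_ins))["geneA"]
            / cpm(host_only(mock, spike_ins))["geneA"])
spike_fold = (spike_in_normalise(infected, spike_ins)["geneA"]
              / spike_in_normalise(mock, spike_ins)["geneA"])

print(cpm_fold)    # library-size scaling: the global shut-off is invisible
print(spike_fold)  # spike-in scaling: the ten-fold reduction is recovered
```

Spike-ins are only one possible external reference. The point of the example is that any normalisation assuming an unchanged total host transcript pool will misreport a global shut-off.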

Project Leaders

Prof. Dr. Steve Hoffmann

Faculty of Biological Sciences,
Friedrich Schiller University Jena,
Leibniz Institute on Aging, Fritz Lipmann Institute

Prof. Dr. Friedemann Weber

Institute of Virology,
Veterinary Medicine,
Justus Liebig University Giessen

Harnessing synthetic small RNAs to probe, decode, and optimise phage-host interactions

Regulatory RNAs have emerged as powerful tools in synthetic biology due to their programmability and ability to modulate gene expression with high specificity. Among these, small RNAs (sRNAs) that act through base-pairing interactions offer a versatile platform for controlling molecular processes in both prokaryotic and eukaryotic systems. Indeed, synthetic regulatory RNAs have already shown potential in metabolic engineering, gene regulation, and diagnostics. However, despite their broad regulatory utility, synthetic regulatory RNAs have not yet been broadly applied to antiviral strategies, especially those targeting RNA-RNA interactions relevant during virus infection. In viruses, RNA structures and RNA-mediated gene regulation are closely linked to replication and host manipulation, making them attractive targets for RNA-based interference. However, rationally designing effective synthetic RNAs remains a major challenge due to the complexity of RNA folding, target recognition, and the dynamic nature of virus-host interactions. Recent advances in the design of synthetic regulatory RNAs and in machine learning, particularly neural networks (NNs), now offer a path towards predictive modelling of these interactions. Furthermore, integrating experimental feedback into model training holds promise for accelerating the design-test-learn cycle of the synthetic biology toolbox.

In this project, we aim to close this gap by systematically and adaptively optimising antiviral RNAs that target virus and/or host RNAs required for virus infection and replication. Specifically, we will develop a neural network-based tool that integrates predictive modelling of RNA-RNA interactions with experimental feedback to optimise synthetic antiviral RNAs. The tool will focus on targeting both coding and non-coding features of virus and host RNAs to disrupt virus entry, replication, and exit. Our initial work will concentrate on bacteriophages, with future expansion to eukaryotic viruses.

The tool will learn from wet-lab data to improve predictions of functional RNA interactions and guide the identification of more potent RNA molecules in iterative cycles. By modelling both interacting and non-interacting RNA pairs, the system will distinguish functional mechanisms from background noise, increasing design accuracy. Through collaborations within the CRC VirusREvolution consortium, the neural network will be enhanced with diverse datasets, improving its generalisability across different phage-host systems in line with the overall goals G2 and G3.

The resulting platform will provide the foundation for programmable RNA therapeutics with high specificity, adaptability, and reduced likelihood of resistance. It will also establish general principles for RNA-mediated antiviral defence that can be leveraged across different organisms and virus families. From a broader perspective, this project bridges computational and experimental biology to tackle one of the central challenges in virology and synthetic biology: how to rationally design molecules that can interfere with evolving virus systems. This strategy goes beyond classical design principles and opens avenues for responsive, data-driven synthetic biology. Taken together, the proposed work will (a) advance our understanding of virus adsorption, entry, replication, and escape; (b) support the long-term goal of intelligent, programmable, and adaptive biological interventions; and (c) provide novel intervention strategies targeting viruses at the RNA level. This aligns closely with the overarching research goals of the CRC: understanding virus evolution (G2), virus-host interactions (G3), and the mechanisms of virus infection (G4).
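The idea of learning from both interacting and non-interacting RNA pairs, described above, can be sketched with a deliberately small stand-in for the planned neural network: a single logistic unit trained on one hand-crafted feature. Everything here is an assumption made for illustration, including the feature (the longest contiguous antiparallel duplex, allowing Watson-Crick and G-U wobble pairs), the function names, and the toy sequences and labels, which are not experimental data.

```python
import math

# Canonical Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def longest_duplex(srna, target):
    """Length of the longest contiguous antiparallel duplex between two RNAs."""
    rev = target[::-1]  # read the target 3'->5' so index-wise pairing is antiparallel
    best = 0
    prev = [0] * (len(rev) + 1)
    # Longest-common-substring dynamic programme with "pairs" in place of "equals".
    for a in srna:
        cur = [0] * (len(rev) + 1)
        for j, b in enumerate(rev, start=1):
            if (a, b) in PAIRS:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def feature(srna, target):
    """Duplex length as a fraction of the sRNA length (between 0 and 1)."""
    return longest_duplex(srna, target) / len(srna)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=200, lr=0.5):
    """Fit p(interaction) = sigmoid(w * feature + b) by stochastic gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for (srna, target), label in examples:
            x = feature(srna, target)
            error = sigmoid(w * x + b) - label
            w -= lr * error * x
            b -= lr * error
    return w, b


def predict(model, srna, target):
    w, b = model
    return sigmoid(w * feature(srna, target) + b)


# Hypothetical "wet-lab" labels: 1 = interaction observed, 0 = none observed.
examples = [
    (("GGAUCCGA", "UCGGAUCC"), 1),
    (("GGAUCCGA", "AAAAAAAA"), 0),
    (("ACGUACGU", "ACGUACGU"), 1),
    (("ACGUACGU", "CCCCCCCC"), 0),
]
model = train(examples)
strong = predict(model, "CUGAAGCU", "AGCUUCAG")  # fully complementary pair
weak = predict(model, "CUGAAGCU", "AAAAAAAA")    # almost no complementarity
print(strong, weak)
```

In a design-test-learn cycle of the kind outlined above, newly measured pairs would be appended to `examples` and the model refitted before the next design round. A realistic model would replace the single feature with learned representations of sequence and structure.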

Project Leaders

Prof. Dr. Manja Marz

Faculty of Mathematics and Computer Science,
Friedrich Schiller University Jena,
Director European Virus Bioinformatics Center

Prof. Dr. Kai Papenfort

Institute for Microbiology