Project Area B

Tools for Virus Interaction and Structure

Project Area B develops computational methods to understand viruses from molecular structures, functions, and interactions—linking RNA, proteins, and metabolites to reveal how viruses infect cells and how hosts respond.

Why this matters

Virus infection is driven not only by sequences, but by molecular structures and interactions
Small structural changes in RNA, proteins, and metabolites can lead to dramatic changes in viral behaviour and immune recognition
Metabolites, especially secondary metabolites, are essential for understanding how viruses reshape host biology
RNA structures determine how viruses trigger innate immunity
Protein surface structures control vaccine effectiveness and immune escape
Combining structural and molecular data enables a systems-level understanding of virus–host interactions

Why now?

Petabase-scale virus discovery from global high-throughput sequencing
Rapid expansion of virus genome resources through NFDI4Microbiota
Full viral haplotype sequencing is now technically feasible
These advances make it possible to analyse viral evolution and host interaction at the level of complete viral populations

Methods used

High-throughput sequencing
Phylogenetics
Machine learning
Infection models
Transcriptome and expression analysis
RNA secondary-structure analysis
NFDI4Microbiota data infrastructure

Tools and methods to be developed

qs-SVG (quasispecies sequence-variation graph)
Quasispecies and haplotype reconstruction
Accurate quasispecies-aware phylogenetic reconstruction
Full virus genome alignment construction
Advanced virus genome annotation
Virulence and host-range prediction
Hybrid dry-lab / wet-lab neural networks

Chemical mediators of virus infection

Metabolites are small molecules that participate in, and arise from, cellular metabolism. They span all chemical classes including sugars, amino acids, nucleotides, lipids, oxylipins and oxidised lipids, and many more. Metabolites show extensive structural diversity that is not dictated by polymeric templates. Beyond endogenously synthesised compounds, the metabolome also includes exogenous molecules and their biotransformation products. These metabolites can mediate interactions between cells and organisms of all phyla. The diversity, dynamic production, and turnover of metabolites make their analysis highly challenging. Because metabolites are key to understanding biological processes, including resistance to infection, their study is highly rewarding.

Virus infections are associated with substantial rewiring of the host metabolome. For example, viruses incorporate host metabolites into their own structures, thus repurposing compounds for virogenesis. In addition, viruses co-opt host lipids for replication, and lipid droplets may support persistent propagation of virus infection. The host may defend itself against virus infection using a wide range of metabolites, including small cyclic nucleotides or modified nucleotides. Moreover, (modified) lipids and other metabolites are often produced by an organism as part of the immune response. During these processes, infected cells and organisms can substantially alter their physiology and ecological function within a community.

In this project, we will establish an experimental and computational platform for untargeted metabolomics that allows us to monitor the changes in metabolism induced by a virus infection, Fig. B01.1. We will develop experimental and computational methods that cover a broad range of small molecules. We will place a special focus on (modified) lipids and cyclic nucleotides in response to virus infections. These compound classes are central to virus infection processes, but notoriously difficult to investigate using current computational approaches. We will optimise analytical and computational methods side by side. Our computational methods will be made available via the well-established and frequently used SIRIUS platform from the Böcker lab, and also integrated into the joint computational platform of the CRC VirusREvolution. We will use our platform to unravel intrinsic and induced metabolomic properties of virus infections and to monitor the associated, even subtle, changes in metabolism over time.

Our metabolomics approaches will enable the ecological and pathological monitoring of virus infections. The emerging metabolic patterns will be linked to the transcriptomics and imaging platforms of this CRC VirusREvolution (G1). Bioassay-guided functional verification of dysregulated metabolites and pathways will be carried out together with Z03 (Fröhlich/Höppener/Reiche) [8]. The combination of the ecometabolomics approach with advanced computational methods, both existing and newly developed as part of this project, will enable the annotation of an unprecedented diversity of metabolites relevant in these interactions, Fig. B01.1. The resulting annotations will open new perspectives on virus mechanisms beyond primary metabolism [1]. All data and results from the project will be collected and made available through the VirJenDB by NFDI4Microbiota, see Z02 (Barth/Cassman/Gerlach/König-Ries).

It must be understood that studying the metabolomic response to a virus infection is not as well established as studying virus genomes. Consequently, part of this project will be to establish protocols, on both the experimental and the computational side, for carrying out metabolomic analyses. We will publish the established experimental and computational protocols to benefit the community. Our experimental and computational framework will be open to the investigation of all infection systems within this CRC, including phage infections. We will need the emerging large body of data to optimise experimental protocols and to get started with computational methods development.

Project Leaders

Prof. Dr. Sebastian Böcker

Prof. Dr. Georg Pohnert

RNA determinants of the antiviral innate immune response

Virus RNAs trigger the innate immune response, which not only distinguishes foreign RNA from the cell’s own diverse complement of RNA molecules but also varies among virus pathogens. Macrophages act as the first line of defence against invading viruses and are pivotal for the innate immune response, crucially regulating the entire inflammatory process during virus infections. Single- or double-stranded RNA viruses are recognised by several classes of RNA-binding proteins, in particular the RNA-dependent toll-like receptors (TLRs) TLR-7 and TLR-8, the RIG-I-like receptors (RLRs) RIG-I, MDA5, LGP2, and their homologues, and the double-strand-sensing protein kinase R (PKR). While individual RNA ligands of these sensory proteins have been well studied, a comprehensive and comparative understanding of their evolutionary and structural diversity across virus groups remains lacking.

In our project, we therefore aim to systematically determine the key features of both RNA sequence and RNA structure required for binding to these sensory proteins and for the subsequent activation of the innate immune response. We aim to make this connection actionable by developing a comprehensive predictive toolkit, RNAinnate, designed to predict the tempo and mode of activation of the innate immune system from the virus RNA sequence. In detail, we will first investigate the specificity of RNA-protein binding for each purified RNA sensor in vitro, using randomised synthesised RNA libraries and employing cross-linking and immunoprecipitation sequencing methods (CLIP-seq). We will then transfect cells with distinct RNA sequences to validate our findings on the purified RNA sensors using electroporation- and DharmaFECT-based transfection. We will establish a transfection protocol using human lung epithelial cells (A549 cells and Calu-3 cells) and subsequently apply this technique to human primary macrophages. We will thoroughly examine the innate immune response, in particular the subsequent phosphorylation of downstream kinases of the specific RNA-binding proteins in the lung epithelial cells and in the macrophages. Afterwards, we will infect lung epithelial cells and human macrophages with SARS-CoV-2 to investigate virus-specific RNA-protein binding. Ultimately, we will infect cells in parallel with intact SARS-CoV-2, Influenza A virus (IAV), and respiratory syncytial virus (RSV) to map the previously identified triggering elements within their natural genomic contexts.

To determine characteristics of binding RNAs, we will evaluate enrichment and depletion of features such as secondary structure elements and local sequence motifs, as well as assess the distribution of folding energies. Moreover, we will employ unsupervised clustering techniques on these features, and combine the results of the different methods to extract descriptors of binding patterns in the form of covariance models and Bayesian descriptors similar to Dimont. Besides the determination of the RNA sequences, we will analyse the transcriptomic, metabolomic, and lipidomic changes of human primary macrophages related to the specific RNA sensors that occur following distinct RNA recognition. Finally, we will uncover evidence of selection pressure that removes or attenuates the individual RNA trigger elements for the innate immune system, providing insights into the mechanisms of RNA genome evolution. To achieve the latter goal, scalable, high-quality alignments of virus genomes are required that can combine inter-species comparison with information on strain-level variation. As no such tool exists, we will fill this gap with VirAligner, using novel combinations of existing approaches in comparative sequence analysis.
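As a rough illustration of the motif enrichment and depletion analysis mentioned above, the following minimal Python sketch compares k-mer frequencies between sensor-bound RNAs and a background library. It is not the project's pipeline: the function names, the pseudocount, and the toy sequences are assumptions made purely for illustration, and real CLIP-seq data would need read processing and statistical testing that are omitted here.

```python
import math
from collections import Counter


def kmer_counts(sequences, k):
    """Count all overlapping k-mers across a collection of RNA sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper().replace("T", "U")  # accept DNA-style input as RNA
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def kmer_log2_enrichment(bound, background, k=4, pseudocount=1.0):
    """Return {kmer: log2(frequency in bound / frequency in background)}.

    Positive values indicate enrichment in the bound set, negative values
    depletion. The pseudocount keeps the ratio finite for k-mers that are
    absent from one of the two sets.
    """
    fg = kmer_counts(bound, k)
    bg = kmer_counts(background, k)
    kmers = set(fg) | set(bg)
    fg_total = sum(fg.values()) + pseudocount * len(kmers)
    bg_total = sum(bg.values()) + pseudocount * len(kmers)
    return {
        kmer: math.log2(
            ((fg[kmer] + pseudocount) / fg_total)
            / ((bg[kmer] + pseudocount) / bg_total)
        )
        for kmer in kmers
    }


# Toy, made-up sequences for illustration only (not real CLIP-seq data).
bound_reads = ["GGUUUUAGG", "ACUUUUGCA", "CCUUUUGGA"]
background_reads = ["GGACGCAGG", "ACGGCAGCA", "CCAGCGGGA"]

scores = kmer_log2_enrichment(bound_reads, background_reads, k=4)
best_kmer = max(scores, key=scores.get)
print(best_kmer, round(scores[best_kmer], 2))
```

In this toy input the uridine-rich 4-mer shared by all bound sequences receives the highest score. The same scoring scheme could in principle be applied to other discrete features, such as annotated secondary-structure elements, by replacing the k-mer extraction step.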

Project Leaders

Dr. Paul Jordan

Prof. Dr. Peter Stadler

Uncovering virus glycoprotein conformational dynamics for rational vaccine design

Structure-based prediction and design have made tremendous progress over the last thirty years with Rosetta and, especially, since 2021, when AlphaFold2 was released. This progress towards the holy grail of computational structural biology was rewarded with the Nobel Prize for these two algorithms in 2024. The accurate prediction of the protein structure is the first step towards an in silico-first strategy to design vaccines and antibodies. Structure-based modelling will also provide estimates for host-receptor interactions, such as antibody-antigen complexes. With this information, the missing link between sequence and function can be established. Here, we are developing methods to predict structures of virus glycoproteins that will allow us to assess emerging virus variants. With structure-based methods, we will investigate the impact of the observed mutations in virus glycoproteins on the structure and the respective conformational states. This is critical for understanding the conformational space that mediates the fusion process. One major hurdle to overcome here is the limited ability of AI structure-prediction tools to provide pre- and post-fusion receptor states. We will subsequently predict the effect these mutations have on the structure. Together with the consortium partners, we will investigate whether our methods predict the effect and the respective function of the virus glycoprotein (variants). The major virus-host interaction we will study is the interaction with the immune system, with the spike protein being the major target of the humoral response to SARS-CoV-2. Antibodies, as major determinants of this immune response, are useful research tools and therapeutics but are challenging for structure prediction and design. Moreover, antigen-antibody interactions are inherently hard to predict and evaluate, even with emerging AI tools, due to the lack of data and the complexity of the molecular interaction.

Here, we will overcome these limitations and develop a new tool termed ANNtibody that takes atomic and electron density calculations into account. Thus far, these methods could not be employed for systems with more than a couple dozen atoms, but the training of AI on electron density data or on Density Functional Theory (DFT) calculations circumvents these resource-intensive steps. Data from NFDI4Chem and its associated repositories will be used to benchmark these approaches.

With this increase in resolution, we hypothesise that our method will capture the complex interaction network in the antibody-antigen interface more accurately. These calculations will be used to predict antibody-antigen interactions, which we will challenge with the experimental design of antibodies for emerging SARS-CoV-2 variants using antibody interactions with the highly variable receptor-binding domain. With our method, we will update these antibody sequences and test experimentally whether we can overcome virus escape. Subsequently, we will design epitope-focused immunogens based on SARS-CoV-2 epitopes that will elicit broadly neutralising antibody populations. All experimentally obtained data will be used to refine the developed methods. Data provided by C02 (Deckert/Deinhardt-Emmer) on virus-host interactions, by A02 (Friedel/Kühnert) and NFDI4Microbiota on virus sequences, and by B04 (Dittrich/McHardy) on surveillance will be essential for training and optimising our tool. These datasets will be integrated with our structural features, enabling our tool to generate predictions that will directly inform and support our partner projects. Altogether, we will generate computational structure-based tools that will help address goals G1 and G3, providing insight into the host-receptor interaction. These tools will allow us to rapidly fight emerging virus infections and can be tuned for vaccine design, probing our ability to generate a broadly neutralising vaccine.

Project Leaders

Prof. Dr. Jens Meiler

Jun.-Prof. Dr. Clara Schoeder

Linking macroscopic evolution with molecular processes for rapidly evolving virus pathogens via data-driven inference and simulations

Virus pathogens such as SARS-CoV-2 and human Influenza A viruses are single-stranded RNA viruses with substantial capacity to mutate and to adapt to the human host for more efficient replication and spread. A multitude of factors affect the evolutionary patterns left in their genomes, such as adaptation to changing host immunity or for more efficient replication, phylogenetic spread, as well as uncharacterised processes on the cellular level. Continuous changes in the surface antigens of these viruses allow them to evade host immunity developed through either prior infection with previous strains or vaccination. This capacity of a virus, known as immune escape, facilitates the reinfection of individuals. Consequently, vaccines protecting against such viruses need to be frequently updated to maintain their effectiveness against circulating variants. We hypothesise that our understanding of the complex interplay of these various processes from large-scale virus genome data can be improved by careful analysis and deconvolution with tailor-made computational techniques. This improved understanding of virus evolution will make it even more predictable on the population level and facilitate the early identification of future emerging, antigenically altered variants of concern for public health.

We have recently developed techniques that allowed us to predict the emergence of relevant variants of SARS-CoV-2, as reported by the World Health Organization (WHO), substantially prior to this classification and to their reaching their maximal abundances. We are also able to identify lineages with substantial antigenic alterations, which can inform considerations regarding vaccine strain updates. In this project, we will combine data-driven analytics of population-level virus diversity with molecular modelling across scales to link macroscopic virus evolution on a population level to molecular processes within the cell. By combining data-driven surveillance and simulation, we will be able to study evolutionary and epidemiological phenomena in both data and models, see Fig. B04.1. These include: (1) Developing approaches for early detection and further characterisation of antigenically or otherwise phenotypically altered lineages identified by the WHO as Variants of Concern (VOCs) via virus genomic surveillance (G3). Early detection methods for identifying antigenically altered lineages classified by the WHO as concerning, of interest, or under monitoring have recently been developed in the McHardy lab.

We will extend this approach to predict combinations of amino acid changes driving future predominant lineages, enabling earlier detection of potential VOCs than current methods. (2) Developing a multi-scale simulation platform consisting of (i) a micro-level – simulating virus replication within a cell; and (ii) a macro-level – simulating virus evolution. At the micro level, we will develop a new type of rule-based description language, including RNA and dynamical compartments. The rule-based replication cycle model will allow us to trace a mutation through the replication cycle. This trace helps to clarify the putative effects of the mutation on the dynamics of the replication cycle, and also to explain how the mutation could affect the fitness of the virus. (3) Combining the results from all work packages to disentangle the contributions of genetic drift, antigenic drift, and currently uncharacterised processes on the genetic diversity of circulating lineages and to study the evolutionary role of the “not-yet-explained” mutations that influence the replication cycle of the virus within the host cell.
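To make the idea of tracing a mutation through a rule-based replication cycle more concrete, the sketch below runs a generic Gillespie-style stochastic simulation over a handful of rules and records how often each rule fires. This is not the description language the project plans to develop: the species, the rules, the rate constants, and the way a "mutation" is modelled (as a changed replication rate) are all hypothetical placeholders chosen for illustration.

```python
import random


def make_rules(replication_rate):
    """Return rules as (name, reactants, products, rate constant).

    Species names and rate values are hypothetical placeholders. Every
    reactant is consumed once and every product is produced once.
    """
    return [
        ("replicate", ("genome",), ("genome", "genome"), replication_rate),
        ("translate", ("genome",), ("genome", "protein"), 1.0),
        ("package", ("genome", "protein"), ("virion",), 0.01),
        ("degrade", ("genome",), (), 0.2),
    ]


def propensity(state, reactants, rate):
    """Mass-action propensity: rate constant times the reactant counts."""
    value = rate
    for species in reactants:
        value *= state[species]
    return value


def simulate(rules, t_max, seed=0):
    """Run a Gillespie simulation from one genome copy until t_max.

    Returns the final species counts and a trace of how often each rule
    fired, which shows where in the cycle a changed rate has an effect.
    """
    rng = random.Random(seed)
    state = {"genome": 1, "protein": 0, "virion": 0}
    fired = {name: 0 for name, *_ in rules}
    t = 0.0
    while True:
        props = [propensity(state, r[1], r[3]) for r in rules]
        total = sum(props)
        if total == 0.0:  # no rule can fire any more
            break
        t += rng.expovariate(total)  # waiting time to the next event
        if t > t_max:
            break
        name, reactants, products, _ = rng.choices(rules, weights=props)[0]
        for species in reactants:
            state[species] -= 1
        for species in products:
            state[species] += 1
        fired[name] += 1
    return state, fired


# Compare a reference parameter set with a hypothetical mutant whose only
# difference is a higher replication rate constant.
wild_type_state, wild_type_trace = simulate(make_rules(0.5), t_max=5.0, seed=1)
mutant_state, mutant_trace = simulate(make_rules(1.0), t_max=5.0, seed=1)
print("wild type:", wild_type_state, wild_type_trace)
print("mutant:   ", mutant_state, mutant_trace)
```

Because single stochastic runs vary widely, any comparison between parameter sets would in practice be based on many replicate runs with different seeds rather than on one trajectory each.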

Project Leaders

Prof. Dr. Alice Carolyn McHardy

Prof. Dr. Peter Dittrich