There is a critical need for bioinformatic and photonic tools to comprehensively characterize viruses, predict their pathogenic potential, uncover virus tropism, and study their rapid adaptation and diversity. The COVID-19 pandemic, a prime example of evolution in real time, has exposed our lack of fundamental data and tools for viral genomes, host transcriptomes, virus phylogenetics, morphology, and surveillance, underscoring the immediate need for better preparedness against future viral outbreaks.
Tools for virus genome identification and assembly of quasispecies.
Inferring viral genetic diversity in mixed samples from deep-coverage sequencing data remains a major challenge. Current viral haplotype reconstruction tools work reliably only when haplotypes are sufficiently distinct, reads are long enough, and coverage is sufficient. Moreover, viral diversity studies involve many processing steps, so there is an urgent need for a unified workflow that streamlines them and facilitates the daily work of clinicians and virologists. Finally, targeted hypermutation processes generate substantial genomic variation, yet computational tools for analyzing these mutation patterns within quasispecies are still lacking.
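Why coverage matters can be made concrete with a simple binomial model. The sketch below estimates the probability that a minor variant present at a given frequency is supported by at least a chosen number of reads. It is a hypothetical helper, not part of any existing tool. It assumes that reads are independent draws from the viral population and that there are no sequencing errors; both are simplifications. It also says nothing about phasing variants into haplotypes, which additionally depends on read length.

```python
from math import comb


def detection_probability(coverage: int, variant_freq: float, min_reads: int) -> float:
    """Probability that at least `min_reads` of `coverage` reads carry a variant
    present at frequency `variant_freq` in the sample.

    Simplifying assumptions: reads are independent draws from the viral
    population (binomial sampling) and there are no sequencing errors.
    """
    # P(X < min_reads) for X ~ Binomial(coverage, variant_freq)
    p_below = sum(
        comb(coverage, k) * variant_freq**k * (1 - variant_freq) ** (coverage - k)
        for k in range(min_reads)
    )
    return 1.0 - p_below


# Hypothetical scenario: a 1 % minor variant, called only if seen in >= 5 reads.
for depth in (100, 500, 2000):
    print(depth, round(detection_probability(depth, 0.01, 5), 3))
```

Raising the read-support threshold suppresses false calls from sequencing errors, but under this model it also sharply raises the coverage needed to detect low-frequency variants.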
Tools for virus genome annotation.
Annotating viral genes may appear straightforward because viruses contain relatively few genes. However, the task is considerably complicated by features common to many viruses or virus families. These include polycistronic mRNAs, overlapping genes, splicing, internal ribosome entry sites (IRES), and multiple AUG codons upstream of the open reading frame. Unfortunately, there is currently no universally accepted standard pipeline for annotating viral genomes, and existing annotation pipelines are generally tailored to specific virus families.
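To illustrate why naive annotation is ambiguous, the sketch below scans the three forward reading frames for ATG-initiated open reading frames (ORFs). It is a hypothetical helper, not an existing annotation pipeline. It deliberately reports every in-frame ATG, so alternative upstream start codons yield nested candidates ending at the same stop codon, and ORFs in different frames may overlap. It ignores the reverse strand, splicing, IRES-driven initiation, and non-AUG starts, all of which a real viral annotator would have to handle. The `min_codons` default is a toy value; practical ORF finders use much longer thresholds.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for every ATG...stop ORF on the forward strand.

    `end` is exclusive and includes the stop codon; ORF length in codons counts
    both the start and the stop codon. RNA input (U) is accepted.
    """
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        open_starts = []  # in-frame ATG positions not yet closed by a stop codon
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i : i + 3]
            if codon == "ATG":
                open_starts.append(i)
            elif codon in STOP_CODONS:
                # every pending ATG in this frame shares this stop codon
                for start in open_starts:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((frame, start, i + 3))
                open_starts = []
    return sorted(orfs, key=lambda orf: orf[1])


print(find_orfs("ATGCATGCCTAACTAG"))  # [(0, 0, 12), (1, 4, 16)]: two overlapping ORFs
print(find_orfs("ATGATGAAATAG"))      # [(0, 0, 12), (0, 3, 12)]: two candidate starts
```

Even on these tiny inputs, the scanner cannot decide which start codon or which frame is genuinely translated. That decision requires the biological context listed above.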
Tools for virus genome alignments.
The rapid development of high-throughput technologies has made it possible to address entirely new questions and research topics in virology with new approaches. However, existing multiple sequence alignment (MSA) tools were developed primarily for bacteria and higher eukaryotes. They struggle with viral data because of the particular characteristics of viruses, such as their small genome size and the considerable diversity within individual populations. In addition, the structure of viral RNA is often considered more important than its exact sequence. It is therefore insufficient to rely on sequence information alone in an MSA; potential RNA-RNA interactions must also be taken into account. Such structure-guided alignments are invaluable, but they are computationally expensive and generally limited to short sequences.
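The cost of accounting for RNA structure can be illustrated with the classic Nussinov base-pair maximization algorithm. It is a deliberately simplified stand-in for the energy-based folding models used in practice and is not itself an alignment method. Even this toy model of a single sequence needs O(n^3) time. Sankoff-style simultaneous folding and alignment of two sequences scales as O(n^6) in time, which is why structure-guided alignment is usually restricted to short sequences. The allowed pairs and the minimum hairpin loop of three nucleotides are assumptions of this sketch.

```python
def nussinov(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs (Nussinov algorithm), O(n^3) time.

    Allowed pairs: Watson-Crick (AU, GC) and GU wobble. A hairpin loop must
    contain at least `min_loop` unpaired nucleotides.
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    # dp[i][j]: maximum number of pairs within seq[i..j]; entries with i >= j stay 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i + 1][j]  # case 1: position i stays unpaired
            for k in range(i + min_loop + 1, j + 1):  # case 2: i pairs with k
                if (seq[i], seq[k]) in pairs:
                    left = dp[i + 1][k - 1]
                    right = dp[k + 1][j] if k < j else 0
                    best = max(best, left + 1 + right)
            dp[i][j] = best
    return dp[0][n - 1] if n else 0


print(nussinov("GGGAAACCC"))  # 3: a stem of three G-C pairs closing an AAA loop
```

For a genome of roughly 30 kb, n^3 is already about 2.7 × 10^13 elementary steps, whereas n^6 is about 7.3 × 10^26.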
Tools for virus phylogeny.
Virus phylogeny studies rely almost exclusively on general-purpose tools that have not been tailored to the unique features of viral genomes. These tools scale poorly to viruses with millions of sequenced isolates. At best, they only roughly approximate the complexity of viral populations, and they struggle with the exceptionally large genetic distances that occur in viral phylogenies. In addition, some mechanisms are rare exceptions in cellular organisms and can safely be ignored in their phylogenetic analyses, but they are far more common and non-negligible in viruses. These include frameshifts, codon skipping, ribosomal shunting, leaky scanning, superimposed (overlapping) reading frames, and effects of RNA secondary structure.
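One facet of the large-distance problem can be shown with the simplest nucleotide substitution model, Jukes-Cantor (JC69). It converts the observed proportion p of differing sites into an estimated number of substitutions per site, d = -(3/4) ln(1 - 4p/3). As p approaches 0.75, the estimate diverges, so small errors in p produce very large uncertainty in branch lengths. The sketch is illustrative only; practical analyses use richer models, but saturation remains a problem for them as well.

```python
import math


def jukes_cantor_distance(p: float) -> float:
    """JC69 corrected distance from the observed proportion p of differing sites.

    d = -(3/4) * ln(1 - (4/3) * p). Returns math.inf at or beyond p = 0.75,
    the saturation point where observed differences no longer determine d.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion in [0, 1]")
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


for p in (0.05, 0.30, 0.60, 0.70, 0.74):
    print(f"p = {p:.2f} -> d = {jukes_cantor_distance(p):.3f}")
```

Going from p = 0.70 to p = 0.74 raises the estimated distance by more than one substitution per site. Divergent viral sequences with such observed differences therefore yield poorly constrained trees.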
Tools for host response: Transcripts.
Our understanding of transcriptomic responses in the primary host cells of viruses remains limited. To identify shared and unique molecular mechanisms during the early stages of infection, it is essential to examine a broader spectrum of viruses in their primary host cells. However, reliably distinguishing transcriptomic variation attributable to the virus from variation stemming from the host is non-trivial and calls for further advances in statistical and computational methods.
Tools for host response: Proteins.
The cellular phenotype during infection depends on the factors and conditions under which the virus affects its host. Single-cell mass spectrometry (scMS) can quantify approximately 1,000 proteins per cell and can reveal cell-to-cell heterogeneity and cell-specific proteins, which can serve as a starting point for further investigation. A significant advance in viral infection research would be an approach that not only comprehensively characterizes the proteome but also captures the entire cellular response, providing a holistic view of how the host responds to viral infection.
Tools for host response: Metabolites.
Metabolites, the intermediate and end products of metabolism, are important indicators of the host cell's response to viral infection. To the best of our knowledge, all previous studies have restricted their attention to primary metabolites and a few lipid classes of limited structural diversity.
Tools for optimizing antiviral strategies.
Although there are numerous examples of successful antiviral development, current approaches are limited, primarily by the structural complexity of viral proteins. To balance this complexity against computational efficiency, modeling techniques rely on various approximations. A common one is to design sequences on a single, fixed protein backbone. These approximations make it possible to compute molecular structures in manageable time, but the success rate of the resulting designs is modest. Consequently, extensive in vitro screening and validation are needed to obtain antiviral agents with the desired properties.
Tools for surveillance.
In response to the SARS-CoV-2 pandemic, numerous countries established systematic genomic surveillance programs. This led to an unprecedented growth in genomic data, with tens of thousands of sequences deposited every day. Automated procedures are needed to identify emerging viral lineages with altered phenotypes and to detect de novo lineages and amino acid changes under selection, together with statistical methods to interpret the results. Methods for detecting positively selected lineages and antigenic variants have been developed. However, they need to be improved for rapid, real-time analysis and extended to handle the extremely large datasets now being generated. Furthermore, these methods urgently need to be generalized to other viruses, in particular to potential future emerging pathogens.
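As an illustration of the kind of statistical building block such procedures rely on, the sketch below estimates whether a lineage is growing faster than the rest of the sampled viral population. It fits a straight line to the log-odds of the lineage's frequency over time. The function name, the ordinary least-squares fit, and the 0.5 continuity correction are assumptions of this sketch. More rigorous approaches typically fit (multinomial) logistic regression models to the counts by maximum likelihood and report uncertainty, and they must also account for sampling biases that this toy ignores.

```python
import math


def logistic_growth_advantage(days: list[float], lineage_counts: list[int],
                              total_counts: list[int]) -> float:
    """Estimate the logistic growth rate of one lineage relative to all others.

    Fits logit(frequency) = a + r * day by ordinary least squares and returns
    the slope r (per day). A continuity correction of 0.5 keeps the logit
    finite in time bins where the lineage is absent or fixed.
    """
    if len(days) < 2 or not len(days) == len(lineage_counts) == len(total_counts):
        raise ValueError("need at least two time points of equal-length data")
    logits = [
        math.log((k + 0.5) / (n - k + 0.5))
        for k, n in zip(lineage_counts, total_counts)
    ]
    mean_t = sum(days) / len(days)
    mean_y = sum(logits) / len(logits)
    var_t = sum((t - mean_t) ** 2 for t in days)
    if var_t == 0:
        raise ValueError("need at least two distinct sampling days")
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(days, logits))
    return cov / var_t


# Hypothetical weekly counts for one lineage of interest (invented numbers).
days = [0, 7, 14, 21, 28]
lineage = [12, 30, 80, 190, 400]
total = [2000, 2100, 1900, 2050, 2000]
r = logistic_growth_advantage(days, lineage, total)
print(f"relative growth rate: {r:.3f} per day (odds x{math.exp(r):.2f} per day)")
```

Under the logistic model, a positive slope r means that the lineage's odds of being sampled grow by a factor of about e^r per day. Applying such estimates to millions of sequences and thousands of lineages in real time is where the scalability requirements described above come in.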
Tools for analyzing virus morphology.
Ideally, virus morphology is visualized directly by microscopy. However, microscopy comes with limitations and challenges, such as elaborate fixation and labeling protocols, the risk of phototoxic effects, and increased background signals.