Large-scale genome sequencing reveals the driving forces of viruses in microalgal evolution
Authors: David R. Nelson, Khaled M. Hazzouri, Kyle J. Lauersen, Ashish Jaiswal, Amphun Chaiboonchoe, Alexandra Mystikou, Weiqi Fu, Sarah Daakour, Bushra Dohai, Amnah Alzahmi, David Nobles, Mark Hurd, Julie Sexton, Michael J. Preston, Joan Blanchette, Michael W. Lomas, Khaled M.A. Amiri, Kourosh Salehi-Ashtiani
Source: Cell Host & Microbe (2021)
DOI: 10.1016/j.chom.2020.12.005
Topics: microalgal genomics; viral integration and endogenous viral elements; comparative genomics; halotolerance and environmental adaptation; horizontal gene transfer; protein family evolution; freshwater versus saltwater microalgae; algal phylogenomics; giant viruses; gene ontology enrichment
Abstract
Being integral primary producers in diverse ecosystems, microalgal genomes could be mined for ecological insights, but representative genome sequences are lacking for many phyla. We cultured and sequenced 107 microalgae species from 11 different phyla indigenous to varied geographies and climates. This collection was used to resolve genomic differences between saltwater and freshwater microalgae. Freshwater species showed domain-centric ontology enrichment for nuclear and nuclear membrane functions, while saltwater species were enriched in organellar and cellular membrane functions. Further, marine species contained significantly more viral families in their genomes (p = 8e–4). Sequences from Chlorovirus, Coccolithovirus, Pandoravirus, Marseillevirus, Tupanvirus, and other viruses were found integrated into the genomes of algae from marine environments. These viral-origin sequences were found to be expressed and code for a wide variety of functions. Together, this study comprehensively defines the expanse of protein-coding and viral elements in microalgal genomes and posits a unified adaptive strategy for algal halotolerance.
Summary
This study presents the sequencing and analysis of 107 new microalgal genomes spanning 11 phyla, sourced from culture collections (UTEX, NCMA) and a novel isolate collection from New York University Abu Dhabi. Combined with 67 previously available algal genomes, the resulting dataset of 174 assemblies was used to conduct large-scale comparative genomics across microalgae from diverse environments, including saltwater, freshwater, and euryhaline habitats. Protein family (PFAM) domain counts were analyzed using Pearson's correlation, hierarchical clustering, and dimensionality reduction methods, revealing that microalgae cluster by environmental habitat rather than strict phylogenetic affiliation, indicating convergent functional evolution across distantly related lineages.
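The clustering idea described above can be sketched on toy data. The example below is a minimal illustration only: the species names and domain counts are hypothetical, average linkage and the distance 1 - r are assumptions made for the sketch, and the summary does not state which linkage or software the authors used.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy data: counts of four made-up PFAM domains per species.
# These values are illustrative and are not taken from the study.
counts = {
    "marine_A": [12, 3, 40, 1],
    "marine_B": [10, 4, 38, 2],
    "fresh_A": [2, 25, 5, 30],
    "fresh_B": [3, 22, 6, 28],
}

def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def cluster(profiles, k):
    """Agglomerative average-linkage clustering on distance 1 - r, stopped at k clusters."""
    names = list(profiles)
    dist = {
        frozenset((a, b)): 1 - pearson(profiles[a], profiles[b])
        for a, b in combinations(names, 2)
    }

    def linkage(pair):
        # Average pairwise distance between the members of two clusters.
        c1, c2 = pair
        return sum(dist[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [[name] for name in names]
    while len(clusters) > k:
        c1, c2 = min(combinations(clusters, 2), key=linkage)
        clusters = [c for c in clusters if c is not c1 and c is not c2] + [c1 + c2]
    return [sorted(c) for c in clusters]

groups = cluster(counts, 2)
print(sorted(groups))
```

On this toy input the two habitat groups separate because their domain-count profiles are anti-correlated, which mirrors in miniature the habitat-driven clustering reported in the summary.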
A central finding of the study is the systematic identification of over 91,757 viral family domain-containing coding sequences (VFAM-CDSs) across the algal genome collection, representing endogenous viral-origin proteins (EVOPs) derived from Chlorovirus, Coccolithovirus, Pandoravirus, Marseillevirus, Tupanvirus, and other virus families. Marine microalgae harbored significantly more viral family domains than freshwater species (p = 8e–4). Transcriptomic data from the Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP) confirmed that the majority of these VFAM-containing sequences are actively expressed. Saltwater-specific EVOPs were enriched for ion transporter and membrane-related functions, while freshwater EVOPs were more diverse and enriched for sugar and amino acid metabolism, consistent with relaxed selection in freshwater environments.
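The summary reports a p-value for the marine versus freshwater comparison but does not name the statistical test. As a hedged illustration of how such a two-group comparison can be made, the sketch below runs an exact permutation test on the difference in mean VFAM domain counts. The counts are hypothetical, and the permutation test is a stand-in chosen for the example, not the authors' stated method.

```python
from itertools import combinations

# Hypothetical VFAM domain counts per genome; illustrative values, not data from the study.
marine = [30, 42, 35, 51, 38, 44]
freshwater = [12, 18, 9, 22, 15, 11]

def mean(values):
    return sum(values) / len(values)

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference in group means."""
    pooled = group_a + group_b
    observed = abs(mean(group_a) - mean(group_b))
    extreme = 0
    total = 0
    # Enumerate every way of relabelling len(group_a) pooled genomes as "group A".
    for idx in combinations(range(len(pooled)), len(group_a)):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total

p_value = permutation_p_value(marine, freshwater)
print(f"two-sided permutation p = {p_value:.4g}")
```

Exhaustive enumeration is only practical for small groups; with collections the size of the one described here, a random sample of relabellings or a rank-based test would be the usual substitute.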
The study further demonstrates that each microalgal phylum carries a distinct repertoire of viral-origin sequences, and that species from shared ecological niches—such as open ocean picoeukaryotes or coral-associated dinoflagellates—cluster together based on their VFAM profiles irrespective of deep phylogenetic relationships. These patterns suggest that viral sequence acquisition has repeatedly shaped niche-specific biological processes in microalgae, including the reinforcement of cellular membranes in saltwater lineages. The data and assemblies are made publicly available, providing a resource for future ecological, evolutionary, and biotechnological research on microalgae.
Key Findings
- 107 new microalgal genomes from 11 phyla were sequenced, substantially expanding the available genomic resources for microalgae.
- Marine microalgae contained significantly more viral family (VFAM) domains in their genomes than freshwater species (p = 8e–4), with sequences from Chlorovirus, Coccolithovirus, Pandoravirus, Marseillevirus, and Tupanvirus identified in algal genomes.
- Over 91,757 viral family domain-containing coding sequences (VFAM-CDSs) were identified across the 174 algal genomes (107 new and 67 previously available), and transcriptomic data confirmed that a majority of these are expressed under natural conditions.
- Saltwater species showed convergent enrichment in membrane-related protein families and ion transporter functions, while freshwater species were enriched in nuclear and nuclear membrane-related protein families, suggesting environment-driven functional divergence.
- Each microalgal phylum harbored a distinct collection of viral-origin sequences, and species sharing environmental niches clustered together by VFAM domain counts regardless of phylogenetic affiliation, indicating niche-driven viral sequence acquisition.
Methods
- Whole-genome sequencing using short reads, long reads, and linked reads
- De novo genome assembly (Platanus, ABySS)
- BUSCO genome quality assessment
- Hidden Markov Model (HMM)-based protein family (PFAM) and viral family (VFAM) annotation
- Pearson's correlation and hierarchical bi-clustering of domain count arrays
- t-distributed stochastic neighbor embedding (tSNE) and UMAP dimensionality reduction
- Gene ontology (GO) enrichment analysis
- BLASTP-based contamination screening
- Transcriptomic validation using MMETSP data
- Phylogenetic analysis of viral-origin genes
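As an illustration of the enrichment step listed above, the sketch below computes a one-sided hypergeometric p-value, a common statistic for term enrichment. The method list does not say which statistic or tool was used, so the test choice, the function name `hypergeom_sf`, and all numbers are assumptions made for the example.

```python
from math import comb

def hypergeom_sf(k, population, annotated, sample):
    """P(X >= k): probability of drawing at least k annotated items in the sample.

    population: total number of domains (or genes) in the background set
    annotated:  how many of those carry the GO term of interest
    sample:     size of the foreground set (e.g. habitat-specific domains)
    k:          how many foreground items carry the term
    """
    total = comb(population, sample)
    upper = min(annotated, sample)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(k, upper + 1)
    )
    return favourable / total

# Hypothetical numbers: 1000 background domains, 50 annotated with a membrane term,
# 100 saltwater-specific domains, 15 of which carry the term (about 5 expected by chance).
p_enrichment = hypergeom_sf(15, 1000, 50, 100)
print(f"enrichment p = {p_enrichment:.3g}")
```

In practice one such test is run per GO term, and the resulting p-values need a multiple-testing correction before terms are called enriched.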
Organisms
Microalgae (diverse phyla), Chlorophyta, Ochrophyta, Rhodophyta, Haptophyta, Cercozoa, Dinophyta (Myzozoa), Euglenophyta, Heterokonta, Streptophyta, Chromerida, Cryptophyta, Pelagophyta, Emiliania huxleyi, Micromonas pusilla, Fragilariopsis cylindrus, Chlamydomonas, Chlorella autotrophica, Chlamydomonas nivalis, Porphyridium purpureum, Porphyra umbilicalis