INTRODUCTION
The human gut microbiome, which consists of multi-kingdom species such as bacteria, fungi, and viruses, plays a non-negligible role in its coevolution with humans. Evidence mounts that the gut fungal community is a small but crucial component of the gut ecosystem, contributing to human health
1,2. The gut fungi or mycobiota are interdependent with bacterial composition and function, which may allow a rapid and coordinated response to external stimuli, such as food intake and medication use
3,4. Beyond dietary and environmental factors, host characteristics such as genetics, age, ethnicity, and physiological states contribute to variation in the gut microbial community
1. Despite previous evidence of host genetic effects on the gut bacteria, little is known about how the host genetic architecture influences the composition of gut mycobiota
5-10.
The intricacies of the gut mycobiota extend beyond mere diversity. Gut fungi also engage in symbiotic and antagonistic interactions, influencing the overall microbial balance and orchestrating the host immune homeostasis
11,12. Some resident gut fungi are known to facilitate the digestion of complex carbohydrates, thereby contributing positively to host nutrition and gut health
13,14. However, on the flip side, certain fungal species have been associated with adverse health conditions, such as inflammatory diseases
15, metabolic
16 and neurological disorders
17, and cancers
18. Therefore, unveiling the genetic determinants influencing gut mycobiota could provide insights into disease susceptibility, treatment responses, and even potential therapeutic interventions.
Here, we created a conjoint genetic atlas of the human gut fungi (n = 7,350 individuals) and bacteria (n = 7,935 individuals) in the Han Chinese population from five independent cohorts. By integrating host genetic data from peripheral blood leukocytes with gut microbial profiles, we discerned the host genetic architecture affecting both microbial kingdoms. Our colocalization analysis with eQTL data from 28 immune cell types indicated that host genetic effects on gut microbial traits were related to immune-cell regulatory variation across 27 of these cell types. Subsequent tissue-specificity analysis and magnetic resonance imaging (MRI) assessments suggested a potential gut–brain connection involving Eremothecium. Building on these observations, we applied advanced protein language models, single-cell transcriptomics and functional assays to explore the fungal metabolic potential and immune relevance. Furthermore, through Mendelian randomization (MR) analysis, we have elucidated the causal links of multi-kingdom fungal and bacterial microbiota with complex host clinical phenotypes.
RESULTS
Host genetic associations of gut fungi and bacteria
To perform genome-wide association studies (GWAS) of gut fungi and bacteria, we profiled the gut fungal and bacterial communities using the internal transcribed spacer 2 (ITS2) and 16S ribosomal RNA (rRNA) sequencing, respectively, in five Han Chinese cohorts (Fig. 1a; Supplementary Fig. S1). The choice of ITS2 was strategic, as it offers superior fungal taxa coverage compared to metagenomic approaches, ensuring a more comprehensive and accurate characterization of the fungal components of the gut microbiome. These five cohorts were THSBC (
n = 4,464/4,455; fungi vs bacteria), ZMSC (
n = 1,126/1,151), GNHS (
n = 1,159/1,683), WeBirth (
n = 435/453), and WePN-DA (
n = 166/193) (Fig. 1a; Supplementary Table S1). The gut fungal composition across all cohorts was consistently dominated by
Ascomycota and
Basidomycota, aligning with observations from Western and other Asian cohorts
2,15,19 (Supplementary Fig. S2a, b). For gut fungi, we included 209 features (α-diversity indices and microbial taxa) in the abundance-based analysis, using GCTA-MLMA (mixed linear model-based association analysis)
20,21. For the presence-based analysis, 195 taxonomic features were included using the logistic regression model (Materials and Methods; Supplementary Table S2). As for the bacterial GWAS, 301 and 246 features were accordingly included in the abundance-based and presence-based analysis. We combined the GWAS results from the five cohorts via fixed-effect meta-analysis to explore the genetic effects on the abundance or presence/absence of the microbial taxa from both gut fungi and bacteria.
We identified a total of 114 host-variant-fungal associations involving 89 lead variants and 65 fungal taxa at genome-wide significance (P < 5 × 10–8; Materials and Methods), among which 91 associations were identified from the abundance-based analysis and 23 from the presence-based analysis (Fig. 1b; Supplementary Table S3). For gut bacteria, 70% (57 out of 81) of host-variant-bacterial associations were from the abundance-based analysis, whereas others were from the presence-based analysis (Supplementary Table S4). We did not observe substantial genomic inflation (λGC) in our GWAS as the median values of λGC were below the common value of 1.05, indicating a benign performance of GWAS (Supplementary Fig. S2c). Through the leave-one-out analysis, we found that the exclusion of any single cohort did not substantially alter the overall effect sizes, further confirming the robustness of our meta-analysis findings (Supplementary Fig. S3a–c). Most of the lead variants (fungi: 81%; bacteria: 76%) were associated with a single taxon, while the others had pleiotropic associations with up to five fungal features or three bacterial features (Supplementary Fig. S3d–f). Particularly, the fungal genus Aureobasidiaceae sp and bacterial genus Alloprevotella had the largest number of host genetic associations (Fig. 1c, d), spanning multiple chromosomes and gene contexts, suggesting polygenic influences (Supplementary Tables S3, S4). Together, these results indicate a discernible human genetic control of both gut fungi and bacteria.
Among all the host-variant-fungal associations, four signals (
Auriculariales,
Rhynchogastremataceae,
Bulleribasidiaceae, and
Exophiala) surpassed the study-wide significance threshold (Materials and Methods; Fig. 1b; Supplementary Fig. S3g;
P < 2.2 × 10
–10). Since
Malassezia is a potential gut colonizer, we noted that it was associated with the lead variant rs10933647 (near
PPP1R2;
P = 2.9 × 10
–8), at locus 3q29. We also observed that the genus
Pichia was positively associated with the G-allele of the intergenic variant rs138858634 (near
PRKD1;
P = 4.6 × 10
–8; Supplementary Table S3). Compared with fungi, gut bacteria had none of the signals that reached study-wide significance (
P < 1.7 × 10
–10). Nonetheless, as part of our exploratory findings, we found a genome-wide-significant association of
Streptococcus with the intronic variant rs4802863 located in the
FPR2 gene locus (
P = 2.2 × 10
–8; Supplementary Table S4).
FPR2 encodes an intriguing transmembrane protein N-formyl peptide receptor 2 that is part of the G-protein coupled receptors (GPCR) family
22. This receptor is known to orchestrate pro- or anti-inflammatory responses based on its various ligands
23. This finding implies a potential role for
FPR2 in establishing host-microbiome homeostasis, possibly through mechanisms such as neutrophil activation, which are central to the modulation of inflammation. To further contextualize host genetic effects, we examined direct associations between major dietary variables and microbial taxa using linear mixed-effects models across five cohorts. We observed that multiple bacterial and fungal taxa were significantly associated with dietary components including meat, vegetables, fruits, cereals, and dairy intakes (
q < 0.05; Supplementary Table S5). Given that the findings for gut fungi or bacteria may be confounded by diet, we performed sensitivity analyses by rerunning the above GWAS, after further adjusting for dietary intakes in the model (Materials and Methods). The majority of the associations remained consistent following the adjustment (Supplementary Table S6).
To assess the reproducibility of our findings from fungal GWAS, we conducted a replication analysis in an independent Chinese cohort (
n = 766; Supplementary Table S1), and this validated 10 out of 79 significant host-fungal associations (Supplementary Table S7). For instance, the presence of the fungal genus
Sterigmatomyces was positively associated with the A-allele of the variant rs558418653, and the abundance of
Penicillium was positively associated with the A-allele of rs6580094 (both
P < 0.05; Supplementary Table S7). We further validated our findings using the MiBioGen consortium
5, focusing on host-bacterial associations identified in our study. Of the 30 associations available for testing, two were successfully replicated with consistent effect directions (Supplementary Table S8). Specifically, the abundance of
Bifidobacterium was positively associated with the T-allele of rs17435790, while
[Eubacterium] coprostanoligenes group was negatively associated with the G-allele of the variant rs3779970 (both
P < 0.05; Supplementary Table S8). Additionally, only one association from the MiBioGen study
5 could be confirmed in the current study:
Enterorhabdus abundance was inversely associated with the T-allele of rs11098863 (nearest gene:
RBM48P1;
P < 0.05). The limited replicability might be attributed to genetic population differences, environmental variations, or disparities in sample size. Despite these challenges, the results highlight the importance of studying underrepresented fungal taxa in unique populations and underscore the need for larger, more diverse cohorts to validate findings across populations.
Distinct genetic architecture of gut fungi and bacteria
For both gut fungi and bacteria, we observed an inverse relationship between the effect size and the minor allele frequency (MAF) of the associated genetic variants (MAF > 0.01; Fig. 2a). Over half of the identified host variants for fungi were located in the intergenic regions, whereas most bacteria-associated host variants were located in intronic regions (Fig. 2b). In addition, we identified four host loci at which both fungal and bacterial GWAS signals were detected (Fig. 2c). To further explore potential genomic overlaps, we performed a Bayesian colocalization analysis that identified two colocalization signals (shared causal variants (probability of hypothesis 4, PP4) > 0.8) and 39 likely colocalization signals (PP4 > 0.5; Supplementary Fig. S4a). For instance, we observed that GWAS signals for
Tilletia and
Butyricicoccus were likely colocalized at the 4:174022037 locus (nearest gene:
GALNTL6). GALNTL6 belongs to a highly conserved family of proteins responsible for the synthesis of mucin-type O-glycans, which are essential components of mucins expressed across various mucosal sites, particularly in the bacteria-rich intestinal tract
24. These glycoproteins play a crucial role in stabilizing the mucus barrier and modulating microbiota composition
25. Our results suggest that host genetics broadly influence the composition of human gut microbial community, and highlight the distinct genetic determinants of each microbial kingdom.
To explore the putative functional similarity of the prioritized genes from the fungal and bacterial GWAS (Materials and Methods), we inferred the enriched gene ontology (GO) cluster networks using Metascape
26. The results highlighted a diversity of enriched GO terms unique to each kingdom. Specifically, genes identified from the fungal GWAS were enriched in pathways such as the regulation of T cell activation and activation of protein kinase activity (Fig. 2d). In contrast, bacteria GWAS-derived genes were enriched in GO terms related to central nervous system (CNS) and adenylate cyclase-modulating GPCR signaling pathway (Fig. 2e). Although both kingdoms also shared higher-level biological processes, such as response to stimulus and metabolic process (both
P < 0.001), the most significant enrichment for gut fungi-associated genes was observed in immune system process (
P < 1.0 × 10
–6; Supplementary Fig. S4b, c).
Heritability of the gut fungi and bacteria
We then performed a heritability (
h2) estimation of gut fungal and bacterial features using GCTA in the THSBC cohort because of its large sample size
20,27. Heritability here refers to the proportion of phenotypic variance in microbial traits that can be explained by host genetic variation, specifically narrow-sense heritability reflecting additive effects of common SNPs. Overall, we observed that 31 gut fungal features (25 taxa and Shannon index) and 115 bacterial features (90 taxa and two α-diversity indices) showed evidence for a significant estimate of heritability (
Pnominal < 0.05), with
h2 values ranging from 0.11 to 0.25 (Supplementary Table S9). The highest heritability values were observed for the fungal order
Microascales (
h2 = 0.24;
P = 1.5 × 10
–4) and the bacterial family
Anaerofustaceae (
h2 = 0.25;
P = 2.8 × 10
–4; Fig. 2f). Through heritability partitioning analysis, we discovered that gut fungal traits displayed significant enrichment of heritability in ncRNA intronic regions and DNase I hypersensitive sites (DHSs). In contrast, the majority of gut bacterial features had a significant heritability enrichment in transcription factor binding sites (TFBSs), intronic region, and DHSs (Supplementary Table S10). We also noted that despite the varying number of heritable features between fungi and bacteria, the overall heritability estimates for these microbiota components were broadly comparable. This pattern indicates the potential for a shared genetic foundation governing the presence and diversity of fungi and bacteria within the gut ecosystem.
Cell-type specific immune system involvement in host–microbe interaction
Given that immune cells are widely recognized as key mediators of host–microbiome interactions, we sought to investigate whether host-microbiome GWAS signals shared causal regulatory variants with immune-cell eQTLs using Bayesian colocalization analysis. We identified 148 colocalization signals across 27 out of 28 immune cell types based on the data derived from human peripheral blood
28 (Supplementary Table S11). Among these, the GWAS signals of six fungal taxa were highly likely colocalized with eQTL signals for seven genes in various cell types (PP4 > 0.8; Fig. 3a). Specifically, GWAS signals for
Onygenales were colocalized with
IL6-AS1 eQTL signals in switched memory B cells (SM_B). Interestingly, these GWAS signals also colocalized with
HBS1L eQTL signals in various dendritic cell (DC) types, such as myeloid DC (mDC) and plasmacytoid DC (pDC). When considering 91 likely colocalization signals of fungi-eQTL at PP4 > 0.5, we also observed notable examples of genetic crosstalk. For instance, the GWAS signals for
Malassezia were colocalized with eQTLs for
TNK2-AS1 in T helper 1 cells (Th1) and the signals for
Aureobasidiaceae sp were colocalized with
HEXA eQTL in pDC (Supplementary Table S11).
For gut bacteria, we discovered that the GWAS association signals of twelve bacterial taxa showed strong colocalization evidence with multiple gene expression traits (Fig. 3a). In addition to the previously noted association of
Streptococcus abundance with the
FPR2 gene locus, our analysis also revealed the colocalization of GWAS signals for
Streptococcus and
Lactobacillales with
FPR2 eQTL signals across various immune cell types, all demonstrating high posterior probabilities (all PP4 > 0.99). This observation underscores the cell type-specific role of
FPR2 in the immune system. Previous studies have reported that serum amyloid A stimulates FPR2 in phagocytes, epithelial cells and T lymphocytes, leading to the production of inflammatory mediators
23. In contrast, lipoxin A4 binding to FPR2 could inactivate NF-κB signaling in neutrophils (Neu) and inhibit the production of proinflammatory cytokines and their proliferation
22. These results hint at a finely tuned regulation mechanism between gut microbiota and host immune responses at a cellular level.
Furthermore, in our comparison of colocalization signal counts across different cell types between gut fungi and bacteria, we observed distinct patterns (Fig. 3b). Notably, for gut bacteria, the highest number of colocalization signals was found in low-density granulocytes (LDGs) and Neu. In contrast, fungal signals were less frequent in these two cell types. The most abundant colocalization signals for fungi were predominantly observed in Th1 cells, highlighting a unique interaction profile between gut fungi and this specific immune cell subtype. Taken together, our findings point to intricate genetic interactions between these fungal and bacterial taxa and key human immune cell types, shedding light on the potential mechanisms of immune response modulation by gut microbiota.
Identification of the brain specificity for gut microbiota
To better understand the potential functional relevance of host genetic variants and the gut microbiota, we next used an unbiased approach to identify human tissues with preferential expression of genes mapped to these identified loci. Through tissue-specificity analysis using FUMA
29, we found significant enrichment for host genes expressed in the brain, such as the hypothalamus and frontal cortex Brodmann area 9 (BA9), for both microbial kingdoms (
q < 0.05; Fig. 4a; Supplementary Fig. S5a). In line with this, a previously reported large-scale gut bacterial GWAS also found that host genetic loci associated with gut microbial traits were enriched for genes expressed in the brain
5. In addition, growing evidence supports the bi-directional interactions between the CNS, the enteric nervous system (ENS), and the gastrointestinal tract
30. For example, gut microbe-derived serotonin could induce maturation of the adult ENS
31, while norepinephrine can stimulate proliferation of several strains of enteric pathogens
32.
To further explore the potential gut-brain axis, we examined the associations between gut microbiota and brain health in a subset of the GNHS cohort involving 501 middle-aged and elderly participants. Based on the brain MRI data, we extracted 116 regions of interest (ROIs) defined by the Automated Anatomical Labeling (AAL) atlas
33. The abundance of fungal genus
Eremothecium was positively associated with the grey matter density in the voxel-based morphometry (VBM) analyses, especially in the right superior frontal gyrus (medial;
q < 0.05, voxel > 10; Fig. 4b) and left thalamus (Fig. 4c). Conversely, we observed that the abundance of bacteria genus
Caproiciproducens was negatively associated with the grey matter density in thalamus (Supplementary Table S12). Using linear regression models, our data further showed that both the presence and abundance of
Eremothecium were positively associated with the extracted volumes of these brain ROIs after adjustment for potential confounders (Supplementary Fig. S5b). Given that the grey matter loss in the thalamus and frontal cortex is commonly associated with the progression of neurological diseases and cognitive decline in humans
34,35, these findings suggest that
Eremothecium may play a beneficial role in maintaining brain health.
Modulation of microglial inflammatory response by Eremothecium
To explore the potential links between
Eremothecium and brain health, we hypothesize that the effects of
Eremothecium may be linked to the production of riboflavin.
E. ashbyi is globally recognized for its role in riboflavin (vitamin B2) synthesis, with accumulating evidence suggesting that riboflavin has potential roles in neurological health, particularly in enhancing cognitive functions
36-39. However, it is crucial to acknowledge that while ITS2 sequencing offers extensive fungal taxonomic coverage, its ability to distinguish specific species or strains like
E. ashbyi is limited. Utilizing a comprehensive bioinformatic approach, we explored the functional potential of various
Eremothecium species in riboflavin biosynthesis. We first identified four gene homologs within the
Eremothecium species (
E. gossypii,
E. cymbalariae,
E. sinecaudum, and
E. coryli) that show the highest gene sequence similarity to those of the RIB4 and RIB5 enzymes of
Saccharomyces cerevisiae. These enzymes play a pivotal role in the riboflavin biosynthesis pathway (Fig. 5a, b). Based on the protein language model ESMfold, we also revealed a high degree of predicted structural similarity between the studied proteins and the RIB5 enzyme for all
Eremothecium species (all TM-score > 0.88; Fig. 5c; Supplementary Fig. S6a, b).
While all
SLC52A1,
SLC52A2, and
SLC52A3 are genes encoding riboflavin transporter proteins, our data indicate that
SLC52A2 is the primary isoform expressed in the human brain (Fig. 5d). Based on the Genotype-Tissue Expression (GTEx) database, we further found various differentially expressed genes associated with the
SLC52A2 expression level in human tissue (Fig. 5e; Supplementary Tables S13, S14). For example, in the thalamus,
TNFRSF9 was enriched in the
SLC52A2 lower group, while
PPP1R1B was more abundant in the
SLC52A2 higher group (
q < 0.001). This might offer insights into the immune regulation and neural signaling mechanisms influenced by riboflavin transport. Based on the single-cell transcriptomics data in lipopolysaccharide (LPS)-induced mice, we found that microglia had the highest expression of
Slc52a2 in the mouse brain (Fig. 5f, g). To delve deeper into the cellular mechanisms, we analyzed the
Slc52a2-related genes in 3,910 microglia cells in the mouse model. Our study showed that
Slc52a2 expression was positively associated with
Vezf1,
Milr1,
Dexi, and
Aldh3b1 in microglia (
q < 0.05; Fig. 5h). Specifically,
Vezf1 is known for its role in vascular and lymphatic system development, as discovered by a mouse study
40.
Milr1, encoding an Ig-like receptor Allergin-1, is implicated in modulating immune responses. Notably,
in vitro studies have demonstrated that Allergin-1 can inhibit Toll-like receptor 2-mediated activation of mast cells and reduce the production of inflammatory cytokines
41. Together, these observations support further investigation into the potential role of riboflavin and its transporters in neuroimmune processes.
To further validate the riboflavin-producing capabilities of
Eremothecium species, we quantified the riboflavin concentrations in our cultured fungal strains using liquid chromatography-mass spectrometry (LC-MS). Notably,
E. ashbyi demonstrated the highest riboflavin production when compared with other species, such as
S. cerevisiae and
Pichia kudriavzevii (Fig. 6a). Interestingly, despite the broadly similar global metabolic profiles observed across all strains (Fig. 6b; Supplementary Fig. S6c, d), significant variations in riboflavin concentrations were evident within the
Eremothecium species. A previous study reported that supplementation with flavin mononucleotide (FMN), an intermediate product of riboflavin, may have attenuated the pro-inflammatory TNFR1/NF-κB signaling pathway
37. In our cell model experiments, we discovered that the treatment with
E. ashbyi supernatant significantly reduced TNF-α and IL-6 levels while increasing IL-10 in the LPS-induced BV2 microglial cells (murine cell line;
P < 0.05; Fig. 6c, d). Through the transcriptomic analysis, we found that the whole transcriptome profile of
E. ashbyi with various doses of riboflavin resembled that of the control group, which is distinct from the LPS-treated group (Supplementary Fig. S7a). The treatment of
E. ashbyi downregulated
Nos2 while upregulated
Hmox1 gene expression (Supplementary Fig. S7b, c), which suggested a potential antioxidative effect. We also observed that the knockout (KO) of
Slc52a2 diminished the inhibitory effect of riboflavin on IL-6 expression (Supplementary Fig. S7d–f). However, the residual activity suggested that alternative pathways or other transporters may compensate for its role. While riboflavin downregulated certain gene expressions within the canonical NF-κB signal transduction pathway, such as
Il1a,
Il1b, and
Tlr3 (Supplementary Table S15), the precise temporal dynamics of upstream signaling components, such as p65 phosphorylation, warrant further high-resolution kinetic studies. Importantly, these results support a contributory role for riboflavin in modulating microglial inflammatory responses, while not excluding the involvement of additional metabolites or pathways. Based on these findings, we propose that
E. ashbyi may influence microglial inflammatory activity, potentially via the capacity of its metabolites (Fig. 6e). Further studies are required to define the underlying molecular pathways and to clarify the relevance of these interactions to neuroinflammatory conditions.
Causal evaluation of multi-kingdom microbiota with complex diseases and clinical phenotypes
Building on our exploration of the intricate roles of gut microbiota in brain health, we further expanded our investigation to include additional dimensions of microbial influence. Specifically, we applied MR analysis to assess the potential causal effects of multi-kingdom fungal and bacterial microbiota on 220 host clinical phenotypes. We observed that only the positive association between the presence of
Onygenales and mean corpuscular hemoglobin surpassed the threshold of
q < 0.05 (Fig. 7a). Notwithstanding, our results also suggest that the presence of
NK4A214 group may increase low-density lipoprotein cholesterol (
P < 2.0 × 10
–4; Supplementary Table S16). The bacterial genus
Bifidobacteriaceae sp had a protective effect against pancreatic cancer (PC; OR = 0.91,
P = 9.3 × 10
–3; Supplementary Table S16). In line with this, multiple species were previously observed to be depleted in the fecal microbiome of PC patients
42, such as
B. pseudocatenulatum and
B. bifidum.
Given the ecological connectedness between gut fungi and bacteria, we further investigated their genetic correlations using GCTA. Our results demonstrated considerable cross-kingdom associations at rg > 0.7 (Fig. 7b; Supplementary Table S17). Specifically, the abundance of Eremothecium was positively associated with Lachnospiraceae UCG-010 (rg = 0.72). In addition to the direct regulatory pathways of gut microbiota on host phenotypes, we assessed the causal possibility that gut fungi regulate human health through bacterial mediation. An example highlighted was the influence of Eremothecium on human height, potentially mediated by an increase in the bacterial abundance of Bifidobacteriaceae sp (Pmediation < 0.05; Fig. 7c; Supplementary Table S18). However, given the exploratory nature of these findings, more validation is essential to ascertain the reliability of the mediation effects and elucidate their underlying biological mechanisms comprehensively.
Webserver visualization of the genome-fungal and genome-bacterial associations
For a customized and in-depth exploration of high-priority fungi or bacteria, we created an interactive online resource (
omics.lab.westlake.edu.cn/data.html). This webserver provides intuitive representations of genetic findings and enables inspection of summary statistics for individual single-nucleotide polymorphisms and genes, and researchers could download the summary statistics across all microbiota targets.
DISCUSSION
In this study, we present a comprehensive genetic atlas detailing the inter-individual variation in gut fungi and bacteria, thereby offering a more holistic understanding of the gut microbiome's complexity. Our study uncovers distinct host genetic factors that shape the gut fungal and bacterial community. The microbial-related host variants colocalized with distinct eQTL signals across 27 immune cell types profiled. Furthermore, we found that the identified host genetic loci associated with both gut microbial kingdoms exhibit brain-enriched expression patterns. The Eremothecium shows a modulatory effect on microglial inflammatory responses, potentially through its metabolic activity. Moreover, the MR analysis highlights several possible causal links of multi-kingdom microbiota with host phenotypes, showing as direct or indirect pathways.
Our large, multi-kingdom study has allowed for a comparison of genetic effects on gut fungi and bacteria. A key finding of our study is the identification of unique genetic loci associated with the presence and abundance of certain gut fungi and bacteria. For instance, the identification of distinct signals, such as the associations of
Malassezia with rs10933647 and
Streptococcus with rs4802863, provides a deeper understanding of the genetic factors that may influence gut colonization by these microorganisms. Notably, recent work has shown that the presence of oral bacteria such as
Streptococcus in the gut may indicate microbial dysbiosis and decreased total bacterial load
43. This finding further supports the relevance of our observed association between
Streptococcus and
FPR2, a gene involved in immune regulation and epithelial barrier homeostasis. Furthermore, the enriched GO cluster networks derived from our data present an intriguing landscape of biological functions and pathways. Compared with gut bacteria, the fungi-associated genes showed stronger enrichment in pathways related to the regulation of T cell activation, a pattern that is particularly noteworthy. Our study also showed that the most abundant colocalization signals for fungi were predominantly found in Th1 cells. Helper T cells are crucial in the adaptive immune system, responsible for detecting infections, activating other T cells, and regulating the overall immune response
44. In contrast, Th2 cells are more closely associated with humoral immunity and allergic responses, which may be less directly linked to gut fungal regulation. At the same time, the differences in transcriptional variability and statistical power across cell-type-specific eQTL datasets may also contribute to the observed Th1 enrichment. Therefore, this apparent specificity should be interpreted with caution. While our study utilized a comprehensive set of 28 immune cell eQTLs to highlight the role of systemic immunity in host-microbiome crosstalk, we acknowledge that integrating eQTL data from brain-specific or gastrointestinal-specific databases could further refine our understanding of local regulatory mechanisms in future studies. Taken together, these findings suggest that certain host genetic factors may predispose individuals to harbor specific microbiota, potentially influencing their immune homeostasis, disease susceptibility, and overall health.
Emerging studies have highlighted the multifaceted ways in which host genetics shape the gut microbiome. For example, variants near
ABO and
FUT2 were found to modulate GalNAc metabolism, influencing gut microbial composition through the enrichment of glycan-utilizing bacteria such as
Faecalibacterium prausnitzii in A-antigen secretors
8,45. In our study, we identified an association between
Alloprevotella and two variants located in
GALNT18, a gene involved in O-linked mucin-type glycosylation (Supplementary Table S4). This supports the idea that host glycosylation patterns can selectively shape microbial colonization, possibly by altering mucosal nutrient landscapes. Moreover, a recent study integrating host transcriptomics and microbiome profiles demonstrated that the same microbial genus may affect different host pathways in distinct disease contexts
46. These findings underscore the multifaceted complexity of host–microbiome interactions, where host genetic and transcriptional backgrounds may determine not only which microbes colonize the gut, but also how these microbes functionally influence the host.
Like gut bacteria, the fungal microbiota or mycobiota engages in interactions with the host immune system
1,47. However, this interaction may exhibit cellular specificity that varies between the two kingdoms. In the present study, we found that host genetic signals associated with
Onygenales taxa colocalized with immune regulatory signals in human B cells and dendritic cells, indicating intriguing functional interactions. The association with
IL6-AS1 expression in switched memory B cells hints at a potential role for
Onygenales in modulating B-cell-mediated immunity, possibly influencing antibody production or memory response. Memory B cells are crucial for long-term immune protection. One study showed that the isolated human switched memory B-cells-derived monoclonal anti-
Candida antibodies enhanced the phagocytosis and protected against disseminated candidiasis
48. The interaction of this cell type with
Onygenales might have implications for both adaptive immunity and microbial homeostasis
49. Furthermore, the colocalization of GWAS signals for
Onygenales with
HBS1L eQTLs in dendritic cell subsets, including myeloid and plasmacytoid dendritic cells, underscores a potentially significant role in antigen presentation and immune regulation. Given that the
Onygenales order includes several known human pathogens, these colocalizations may also signify host susceptibility to
Onygenales infections. Taken together, to gain a deeper understanding of the complex interactions between the host and microbiota, future research needs to be conducted at the cellular level, focusing on the microbial immunomodulating species. Such studies will help to unravel the subtle mechanisms that influence an individual's susceptibility to fungal infections and may guide the development of therapeutic strategies targeted at specific cell types.
The brain, as a part of the CNS, is involved in the regulation of immune homeostasis in the gut
50,51. In the present study, we found that the identified host variants associated with both kingdoms are mapped to genes showing enriched expression in brain regions, such as frontal cortex BA9. Intriguingly, an animal study on extinction learning showed that an altered gut microbiota changed the gene expression in the cell subsets of medial prefrontal cortex. In addition, these genes were linked to synapse-related pathways and calcium signaling pathways
52. Another key finding is the notable evidence for the protective role of
Eremothecium, particularly
E. ashbyi, which may modulate inflammatory responses in microglia through riboflavin synthesis. The identification of gene homologs in
Eremothecium species, with predicted protein structures similar to riboflavin biosynthesis enzymes in
S. cerevisiae, suggests a conserved role in riboflavin production. The distinct riboflavin production capabilities of
E. ashbyi, demonstrated in our cultured fungal strains, and its impact on inflammatory markers in LPS-induced BV2 microglial cells, offer a perspective on the metabolic interface between gut fungi and host health. While these findings are exploratory, the consistent alignment between brain MRI structural associations and the functional anti-inflammatory effects observed in microglial models provides a compelling biological basis. However, we did not directly quantify riboflavin levels in cerebrospinal fluid or brain tissue, and therefore cannot directly demonstrate transport of gut-derived riboflavin to the central nervous system in humans. Moreover, while riboflavin produced by
E. ashbyi is a primary focus of our study due to its pronounced effects, we also recognize the potential contributions of other metabolites that may interact synergistically or additively to influence neurological outcomes. Our findings highlight the importance of immunometabolism, offering a valuable foundation for future research into various compounds in this field. Establishing causality and evaluating clinical translational potential will require future intervention studies and longitudinal human cohort investigations. Overall, given the close connections between the brain and the microbiome throughout evolutionary history, further studies are warranted to explore the role of gut fungi or mycobiota in the gut-brain axis.
This study has several limitations. First, limited by the nature of amplicon sequencing in our study, we cannot leverage the precise information at species or even strain level. Although shotgun metagenomic sequencing provides higher resolution, its ability for gut fungi sequencing is limited by the low relative abundance of fungal DNA and the lack of targeted enrichment. In this context, ITS2 sequencing is currently one of the best methods to profile the mycobiota. Second, a previous study shows an effect of ethnicity on the gut bacterial GWAS
9. We should notice that all the subjects of this study were Han Chinese. Thus, future microbiome GWAS with multi-ancestry is warranted. Third, our study includes cohorts with distinct physiological characteristics. Although our leave-one-out analyses demonstrate the robustness of our findings across different cohorts, it is important to acknowledge that the generalizability of the results may still be limited by the sample size and homogeneity within certain subgroups, such as pregnant women or elderly individuals. In a similar vein, our MR analysis is constrained by the relatively weak explanatory power of current microbial genetic instruments, which limits the sensitivity to detect small causal effects and warrants a cautious interpretation of null findings. In our sensitivity analysis, we observed an increased number of genetic associations among less prevalent microbial taxa. While this could indicate significant genetic interactions with these rarer taxa, it also raises concerns about the potential for false positives. Therefore, validating the results in more independent cohorts and corroborating these genetic links with additional multimodal data are warranted. Finally, although MR analysis could be used for the exploratory inference of causal effects, more studies are needed to provide insights into biological mechanisms underpinning the relationship between gut fungi, bacteria, and host health.
Building on the comprehensive insights provided by our conjoint genetic atlas of human gut fungi and bacteria, this study lays a foundation for exploring human–microbiome interactions and co-evolutionary adaptation. Our findings unravel the complex interplay between these microbial communities and the host's immune and neurological systems, which underscores the potential of the microbiome in personalized medicine.
MATERIALS AND METHODS
Study population and stool sample collection
A total of five cohorts, comprising 7,900 individuals from different cities in China, were included in the present fungal and bacterial GWAS analyses (Fig. 1a). Of these, four cohorts (THSBC, GNHS, ZMSC, and WeBirth) were from the Westlake Gut Consortium of gut microbiome-based human cohort studies
53. An independent cohort of 766 participants from RuLAS was included as external validation for the gut fungal GWAS
54. The supplementary Table S1 showed a detailed description of the characteristics of participants for each cohort.
THSBC
The Tongji-Huaxi-Shuangliu Birth Cohort (THSBC) is an ongoing prospective cohort study. Between 2017 and 2019, THSBC had involved 6,143 eligible pregnant women aged 18–40 years in Shuangliu Maternal and Child Health Hospital, Chengdu, China. Detailed information about the study design has been reported elsewhere
55. Stool samples were collected at the hospital and finally stored at –80 °C in the laboratory at HUST. Among the participants with genome-wide genotyping data, there were 4,464 individuals with fecal ITS2 sequencing data and 4,455 individuals with fecal 16S rRNA data (with 4,446 individuals having both), and these participants were included in the present GWAS analysis.
GNHS
The Guangzhou Nutrition and Health Study (GNHS) is an ongoing community-based cohort study in China. In this study, a total of 4,048 participants aged 45–70 years were recruited between 2008 and 2013. All individuals had lived in Guangzhou city (China) for at least five years and were followed up approximately every three years. Detailed information on the study design has been reported previously
56 (registered at clinicaltrials.gov as NCT03179657). Stool samples of GNHS were collected on-site at Sun Yat-sen University and stored at –80 °C before further processing. In the present analysis, we included 1,159 and 1,683 participants with ITS2 and 16S rRNA sequencing data as well as genotyping data, in the fungal and bacterial GWAS analysis, respectively.
ZMSC
The Zhejiang Metabolic Syndrome Cohort is a community-based prospective cohort that recruited 22,649 participants at baseline between 2010 to 2014. The study design had been reported previously
57. In 2020, stool samples from 1,403 individuals aged 33–87 years were collected at the Second People’s Hospital, Putuo District. In the present analysis, we included 1,126 and 1,151 participants from Zhoushan island in the fungal and bacterial GWAS, respectively.
WeBirth
The WeBirth cohort is an ongoing prospective cohort study among women with hyperglycemia during pregnancy living in Hangzhou (registered at clinicaltrials.gov as NCT04060056). The stool samples were collected on-site at Hangzhou Women’s Hospital and transported to the Westlake University laboratory by dry ice and kept in –80 °C freezers until further processing. Among the individuals with genotyping data, 435 had fecal ITS2 gene sequencing and 453 had fecal 16S rRNA sequencing data.
WePN-DA
The Westlake Personalized Nutrition and Health Cohort for Drug Addicts (WePN-DA) included abstinent drug addicts (aged 18–65 years) from three drug rehabilitation centers in Zhejiang Province, China. Participants with cancer, human immunodeficiency virus (HIV) infection, or other severe diseases suffering were not enrolled in this study. Two hundred participants were recruited from October 28, 2019 to December 15, 2019 (registered at clinicaltrials.gov as NCT04105621). Their stool samples were collected at the on-site hospital in each center and stored in a –40 °C freezer for temporary storage. After that, the samples were sent to the laboratory at Westlake University by dry ice and kept at –80 °C. In the present analysis, 166 (measured ITS2 sequences) and 193 participants (measured 16S rRNA sequences) with genotyping data, were included in the fungal and bacterial GWAS analysis, respectively.
RuLAS
The Rugao Longevity and Ageing Study (RuLAS) is a unique population-based longitudinal study of older residents in rural areas of Rugao City, Jiangsu Province, China. The study design had been reported previously
54,58. We included 766 participants with gut ITS2 data and host genotyping data in the present study.
Ethics statement
THSBC was approved by the Ethics Committee of Tongji Medical College, Huazhong University of Science and Technology (HUST), Wuhan, China (2017) (S225)-1; GNHS was approved by the Ethics Committee of the School of Public Health at Sun Yat-sen University (2018048) and the Ethics Committee of Westlake University (20190114ZJS0003); WeBirth was approved by the Ethics Committee of Westlake University (20190701ZJS0007); ZMSC was approved by the Ethics Committees of Zhejiang University School of Medicine; WePN-DA was approved by the Ethics Committee of Westlake University (20190812ZJS0009); RuLAS was approved by the Ethics Committee of the Fudan University School of Life Sciences (BE1815). All participants gave informed consent to participate in the study before taking part.
Culture experiments for fungal strains
Eremothecium strains and Pichia kudriavzevii were obtained from the China Center of Industrial Culture Collection (CICC) and Centraalbureau voor Schimmelcultures (CBS), respectively. Candida albicans and Saccharomyces cerevisiae were isolated from the human fecal sample and identified via Sanger sequencing targeting on fungal ITS region. All fungal cultures are started from stocks in 25% glycerol in Yeast and mold (YM) broth. Sterilized penicillin-streptomycin solution was added to the medium at the 1:100 ratio to avoid bacterial contamination.
Cell line
The BV2 microglial cell line was purchased from Procell Life Science & Technology. All cells were seeded at 2.5 × 105 cells/mL in DMEM medium mix (DMEM supplemented with 10% fetal bovine serum (FBS), 100 U/mL penicillin, 100 U/mL streptomycin antibiotics, 2 mmol/L GlutaMAXTM, and 1 mmol/L sodium pyruvate), and incubated at 37 °C with 5% CO2 for 24 h.
Host DNA extraction, genotyping and quality control
For THSBC, the host DNA was extracted from peripheral blood leukocytes using the genomic DNA extraction kit RC1003 (Concert, Xiamen, China). The DNA of ZMSC was isolated from whole blood using a TACO automatic nucleic acid extraction apparatus (GeneReach Biotechnology Corp., Taichung, Taiwan, China). As for the other three cohorts, the host DNA was extracted from leukocytes using the TIANamp Blood DNA Kit (DP348, TianGen Biotech Co, Ltd., China) according to the manufacturer’s instructions. DNA concentrations were determined using the Qubit quantification system (Thermo Fisher Scientific, Wilmington, DE, USA). The extracted DNA was stored at −80 °C. The genotyping for the former two cohorts was performed using the Infinium Asian Screening Array (ASA)-24 v1.0 BeadChip (Illumina, Inc., San Diego, California, United States) and standard Illumina protocols. The latter three cohorts were genotyped with the Illumina ASA-750K arrays.
Quality control and relatedness filters were performed using the same standard for all cohorts by PLINK1.9
59. We filtered the variants with per-SNP call rate < 95% and deviation from the HWE with
P < 10
−5. We also excluded the participants with SNP missingness > 0.05 and sex discrepancy for each cohort. Individuals with a high or low proportion of heterozygous genotypes (outliers defined as three standard deviations) were excluded
60. Individuals who had different ancestries (the first two principal components ± 5 standard deviations from the mean) or related individuals (IBD > 0.185) were excluded
60. Variants were mapped to the 1000 Genomes Phase 3 v5 by SHAPEIT
61,62, and then genome-wide genotype imputation was conducted with the 1000 Genomes Phase 3 v5 reference panel by Minimac3
63,64. Genetic variants with imputation accuracy RSQR > 0.3 and MAF > 0.01 were included in the analysis.
Human gut fungal and bacterial community profiling
ITS2 data to detect fungal identity, diversity and abundance
For
ITS2 gene sequencing, raw data were obtained using the same method as described previously
2. The
ITS2 hypervariable regions of the fungal ITS rRNA gene were amplified with primers ITS3F: GCATCGATGAAGAACGCAGC and ITS4R: TCCTCCGCTTATTGATATGC by thermocycler PCR system (GeneAmp 9700, ABI, USA). After the PCR reactions and purification, the purified amplicons were pooled in equimolar and paired-end sequenced on an Illumina MiSeq platform (Illumina, San Diego, USA) according to the standard protocols by Majorbio Bio-Pharm Technology Co. Ltd. (Shanghai, China).
Raw data were demultiplexed by the MiSeq Controller Software (Illumina Inc.). QIIME2 (version 2021.8) was used for the downstream analysis
65. The demultiplexed ITS2 sequences were denoised and grouped into amplicon sequence variants (ASVs) using DADA2
66. The ASV features that were presented in only one sample were excluded for all cohorts. The individual ASVs were taxonomically classified based on the UNITE (version 8.2, 99%) database using the VSEARCH tool wrapped in QIIME2. α-diversity analysis was conducted through the q2-diversity plugin at a sampling depth of 10,000. α-diversity was estimated by Shannon’s diversity index, Observed Features, Pielou’s Evenness, and Faith’s PD.
16S rRNA data to detect bacterial identity, diversity and abundance
For the 16S analysis, raw data from GNHS were obtained using the same method as previously described
67. For the other four cohorts, we extracted the microbial DNA from fecal samples using the E.Z.N.A.® soil DNA Kit (Omega Bio-tek, Norcross, GA, U.S.) according to the manufacturer’s protocols. The final DNA concentration and purification were determined by NanoDrop 2000 UV-vis spectrophotometer (Thermo Fisher Scientific, Wilmington, USA), and DNA quality was checked by 1% agarose gel electrophoresis. The bacterial 16S rRNA gene was amplified with primers 338F: ACTCCTACGGGAGGCAGCAG and 806R: GGACTACHVGGGTWTCTAAT by thermocycler PCR system (GeneAmp 9700, ABI, USA). The PCR reactions were conducted using the following program: 3 min of initial denaturation at 95 °C, 35 cycles of 30 s at 95 °C, 30 s for annealing at 55 °C, and 45 s for elongation at 72 °C, and a final extension at 72 °C for 10 min. PCR reactions were performed in triplicate in a 20 μL mixture containing 4 μL of 5× FastPfu Buffer, 2 μL of 2.5 mM dNTPs, 0.8 μL of each primer (5 μM), 0.4 μL of FastPfu Polymerase and 10 ng of template DNA. The resultant PCR products were extracted from a 2% agarose gel and further purified using the AxyPrep DNA Gel Extraction Kit (Axygen Biosciences, Union City, CA, USA) and quantified using QuantiFluorTM-ST (Promega, USA) according to the manufacturer’s protocol. Purified amplicons were pooled in equimolar and paired-end sequenced (2 × 300) on an Illumina MiSeq platform (Illumina, San Diego, USA) according to the standard protocols by Majorbio Bio-Pharm Technology Co. Ltd. (Shanghai, China).
Fastq files were demultiplexed by the MiSeq Controller Software (Illumina Inc.) using a mapping file as input. Sequences were merge-paired, quality-filtered and analyzed using QIIME2 (version 2021.8)
65. As described above, we used DADA2 denoised-paired plugin in QIIME2 to process the fastq files
66. We filtered the features that were present in only a single sample. The taxonomies of ASVs were subsequently determined using the VSEARCH tool based on the Sliva_138 99% reference database. α-diversity analysis was conducted at a sampling depth of 10,000. α-diversity of the gut bacteria was estimated by the indices the same as ITS2 data.
Gene mapping and functional annotation
We applied ANNOVAR (version 2022-06-07) to annotate these lead variants and obtained the nearest genes for each locus
68. We then used snpXplorer to prioritize the likely affected genes of the genetic variants (
snpxplorer.net/)
69. The identified genes were subjected to tissue-specificity analysis using the GENE2FUNC module of the FUMA webservice (
fuma.ctglab.nl/gene2func)
29. All of the parameters were default (Ensembl v92, GTEx v8). We further used the Metascape (
metascape.org) to conduct the GO enrichment analysis
26.
Brain MRI acquisition and analysis
Brain MRI data were obtained from participants enrolled in the Westlake Nutrition and Brain Study (WEBRAIN, NCT04135066), a prospectively designed imaging sub-study of the GNHS. MRI examinations were conducted at the enrollment visit of WEBRAIN, corresponding to a follow-up assessment of GNHS participants (mean ± SD: 11.91 ± 1.64 years after the GNHS baseline visit).
Brain T1-weighted images (T1WI) were acquired on a 3.0T Magnetom Aera, by using a 3-dimensional magnetization-prepared rapid gradient echo sequence (MPRAGE, MAGNETOM Skyra, Siemens Healthineers, Erlangen, Germany). The following settings were applied for the acquisition of T1WI: sagittal orientation, 176 slices without gaps, 1.0 × 1.0 × 1.0 mm3 voxel size, a field of view of 256 × 256 × 176 mm3, flip angle of 8°, repetition time of 2,300 ms, and echo time of 2.19 ms. All scans were inspected for structural abnormalities, and movement artifacts. Finally, 501 GNHS samples were applied for the following preprocessing and analysis, with no scans excluded due to poor image quality.
VBM
VBM is one of classic methods to assess structural changes, involving a voxel-wise comparison of the local intensity of gray matter
70. The processing and analysis of these 3D T1 images were carried out using MATLAB R2020b (The MathWorks Inc, Natick, MA) and Statistical Parametric Mapping software (SPM12; The Welcome Department of Imaging Neuroscience, London). The standard protocol was implemented for preprocessing, including segmentation, alignment by Diffeomorphic Anatomical Registration Through Exponentiated Lie Algebra (DARTEL), modulation and normalization to the Montreal Neurological Institute (MNI) space. For the final step, the normalized images were smoothed by an 8-mm full-width at half maximum (FWHM) isotropic Gaussian kernel
71,72.
Brain MRI data analysis
For the present study, we focused on the regions of interest (ROIs) including thalamus, superior frontal gyrus, and medial superior frontal gyrus, which were identified by the Automated Anatomical Labeling atlas. More detailed information has been reported in our previous study
72. Further, we explored the association between gut microbiota and each voxel of targeted ROIs using linear mixed-effect models in MATLAB (False discovery rate [FDR] corrected, FDR < 0.05, adjusted for age, sex, and total intracranial volume [TIV]). Regions in the resultant T-map with FDR < 0.05 and cluster > 10 voxels were considered significant. Peak voxels were clustered and labeled according to the automated anatomical labeling (AAL) atlas
33. We also extracted the volumetric information of each individual ROI to explore associations between the gut microbiota and the volumes using multiple linear regression models adjusted for age, sex, and TIV.
Genome analysis for Eremothecium species
We extracted the ITS2 sequences of four
Eremothecium species (
E. gossypii,
E. cymbalariae,
E. sinecaudum, and
E. coryli) from the UNITE database, and constructed a phylogenetic tree utilizing EvolView v2
73,74. To investigate their capability to produce riboflavin, we retrieved the whole genome sequences of these species from NCBI database. In addition, we predicted the genes for all four genomes of the
Eremothecium species using AUGUSTUS (version 3.5.0)
75, which is a gene prediction program for eukaryotes. A homology assessment was then conducted against the last two key enzymes in the riboflavin biosynthesis pathway of
Saccharomyces cerevisiae, RIB4 and RIB5. We employed BLAST+ (version 2.4.1) for the sequence similarity identification
76.
Subsequently, for the protein sequences exhibiting the highest similarity within each species, we predicted the protein structures using advanced protein language models ESMfold
77 (
esmatlas.com/resources?action=fold; fair-esm, version 2.0.1) with default parameters. Full-length amino acid sequences were used as input without trimming or homology-based preprocessing. We then performed Pairwise Structure Alignment using TM-align with default settings to compare their predicted protein structures with RIB4 and RIB5 (
https://www.rcsb.org/alignment). The structural similarity was assessed by the TM-score based on global structural alignment, which provided insights into the potential functional conservation in riboflavin biosynthesis among the evaluated
Eremothecium species
78.
In vitro fungal metabolites production and collection
Fungal glycerol stock was streaked on YM agar to obtain single colonies. We collected 3 to 5 colonies and washed three times with PBS to remove the metabolites. During the final washing, the pellet was resuspended in 3 mL 25% glycerol in PBS, which was aliquoted to 500 μL and stored at −80 °C. 100 μL of this aliquot was used to inoculate 10 mL YM broth supplemented with dual antibiotics in a 50 mL sterile mini bioreactor capped with 0.22 μm filters. Immediately after inoculation, 1 mL of culture was sampled for background metabolites measurement. The cultures then were incubated at 25 °C and 37 °C in an aerobic condition with continuous shaking (200 RPM) for 3 days. To collect media for cell assay and metabolomics analysis, fungal cultures were sterilized by centrifugation (4,000× g, 5 min) and filtration (0.1 μm, PES). The obtained solutions were immediately aliquoted and stored at −80 °C until use.
Quantitative riboflavin measurements and untargeted metabolomics analysis
For metabolites extraction, 400 μL of methanol (pre-chilled to −80 °C) was added to 100 μL of processed fungal cultures thawed in an ice bath. The methanol solutions were gently mixed and incubated at −80 °C for 2 h. The solutions were then centrifuged at 14,000× g for 10 min at 4 °C to obtain supernatants for metabolomics analysis. For riboflavin measurement, 2 μL of prepared samples were injected onto a Waters Acquity UPLC HSS T3 column, which was coupled to an Agilent 1290-6549 LC-MS machine with ESI+ mode. For chromatographic separation, the mobile phase A consists of 0.1% formic acid in water, while phase B is composed of 0.1% formic acid in methanol. The flow rate is set at 0.3 mL/min, and the temperature is maintained at 40 °C. Riboflavin was identified and quantified based on accurate mass, retention time, and comparison with an authentic standard.
Untargeted metabolomics analysis was performed using an LC-MS system comprising an Agilent 1290 Infinity II UHPLC system tandem with Agilent6545 Q-TOF/MS (Agilent). MS data were acquired using electrospray ionization in both positive and negative ion mode over 50–1250 m/z. Raw data were processed using Profinder 10.0 (Agilent) for peak detection, alignment, and integration. Specifically, 5 μL of prepared samples (drying with nitrogen and dissolved in 80% methanol) were injected onto a Waters Acquity UPLC BEH Amide column (100 mm×2.1 mm, 1.7 μm) to perform chromatographic separation. The mobile phase A is composed of an aqueous solution containing 15 mM ammonium acetate and 0.3% NH3·H2O. On the other hand, phase B is a solution consisting of 15 mM ammonium acetate and 0.3% NH3·H2O in a 9:1 ratio of acetonitrile to water. The flow rate is set at 0.3 mL/min.
BV2 microglia culture and stimulation with LPS
The BV2 microglial cell culture medium was replaced every 2 days. For the formal experiments, the BV2 cells were pre-incubated with additional 5% (v/v) fungal supernatant, with E. ashbyi specifically yielding a riboflavin concentration of approximately 3 µg/mL. This pre-incubation lasted for 4 h prior to exposure to 0.5 µg/mL LPS from Salmonella enterica serotype typhimurium over a 12-h period. As controls, cells were incubated with pure medium for fungal culture and treated with phosphate buffer saline (PBS) before the end of treatment.
Enzyme-linked immunosorbent assay (ELISA)
The culture medium was harvested and subjected to a centrifugation process at 1,000× g for 10 min. The resultant supernatant was segregated and reserved for further inflammatory cytokine measurements via mouse ELISA Kits according to the manufacturer’s protocol.
Transcriptome measurements and analysis
Total RNA was extracted from LPS-treated cells subjected to different conditions: E. ashbyi, 1 µg/mL riboflavin, 10 µg/mL riboflavin, 50 µg/mL riboflavin, and 100 µg/mL riboflavin. The extraction was performed using TRIzol® Reagent. RNA quality was assessed with an Agilent 5300 Bioanalyzer and a NanoDrop ND-2000 spectrophotometer. For library construction from standard RNA quantities, the Illumina® Stranded mRNA Prep, Ligation protocol was used. This process included mRNA isolation via polyA selection, cDNA synthesis, end-repair, adapter ligation, size selection for 300–400 bp fragments, and PCR amplification. The resulting libraries were sequenced using the NovaSeq X Plus platform.
Quality control of raw reads was performed using fastp
79, and reads were aligned to a reference genome with HISAT2
80. The mapped reads of each sample were assembled by StringTie in a reference-based approach
81. We then conducted the differential gene expression analysis using DESeq2
82.
Slc52a2 KO experiments
For generation of Slc52a2–/– BV2 cell line, single-guide RNAs (sgRNAs, Forward sequence: ATGGGAGACACCTCGATCGG; Reverse sequence: CCGATCGAGGTGTCTCCCAT) were cloned into LentiCRISPRv2-mCherry vector (Addgene, #99154), and subsequently co-transferred into HEK293T cells with packaging plasmids psPAX2 and pMD2.G, using PolyJet (Signagen Laboratories). Lentiviral supernatants harvested at 48 h and 72 h were concentrated by ultracentrifuge and used to infect BV2 cells. For further expansion and subsequent analysis, mCherry+ BV2 cells were flow sorted, starting 48 h post-infection. Control BV2 cells were infected by lentivirus with a blue fluorescent protein (BFP) signal without sgRNAs.
For validation in the Slc52a2−/− BV2 cell line, a total of 1 × 105 to 3 × 105 BV2 were sorted into 200 µL of lysis buffer. Total RNA was purified with Quick-RNA Microprep Kit (Zymo) according to the manufacturer’s protocol. Equal amounts of RNA were reverse-transcribed into cDNA by using iScript™ Reverse Transcription Supermix (Biorad). Quantitative real-time PCR (qRT-PCR) was performed using iTaq Universal SYBR Green Supermix (Bio-Rad) on a Qtower384G System (Jena) (Primer: Forward-CCTCCGATCGAGGTGTCTC; Reverse-GTCGGGGATGGCAGTAGTAA). Quantification of the PCR signals of each sample was performed by comparing the cycle threshold values (Ct) of Slc52a2 gene against to the corresponding value of GAPDH, as housekeeping gene. Finally, we repeated the ELISA experiments in Slc52a2−/− BV2 cell models.
Statistical analysis
Meta-analysis of GWAS
In this study, we applied both abundance-based and presence-based methods to conduct the fungal and bacterial GWAS based on the rationale reported previously
7. We included the α-diversity indices and all levels of microbial taxa (phylum to genus) except for the species in both fungal and bacterial kingdoms as GWAS outcomes. In addition, we excluded the fungal taxa presented in less than 5% of the 7,350 samples and bacterial taxa presented in less than 10% of the 7,935 samples. Finally, a total of 205 features and 297 features for the abundance-based analysis were included in the fungal and bacterial GWAS, respectively (Supplementary Table S2). For the presence-based analysis, we dropped the features present in more than 90% of the total samples, resulting in 195 features in fungal GWAS and 246 features in bacterial GWAS.
Analysis of α-diversity
We analyzed the genetic effects on four α-diversity indices after the
z-score transformation. In each cohort, a mixed linear model (MLM)-based association analysis was performed with GCTA-MLMA (version 1.93.2)
20,21, adjusting for the covariates age, sex, sequencing depth, batch and the first 10 genetic principal components. All statistical tests were performed two-sided.
Abundance-based analysis
We firstly conducted the centered log ratio (CLR) transformation and z-score normalization for the count abundances of each feature. The zero count in the abundance matrix was replaced with the pseudocount 1. Then we fitted them using the GCTA-MLMA model after the adjustment for age, sex, batch and the first 10 genetic PCs. Genotypes were coded additively as 0, 1, or 2 according to the number of effect alleles.
Presence-based analysis
To investigate the genetic effects on the gut fungal and bacterial presence-absence pattern, we recoded the abundance values of each taxon as 0 (absent) or 1 (present). After that, we performed a standard logistic regression as implemented in PLINK2 (version 2.0) and adjusted for the covariates the same as abundance-based analysis. Genotypes were similarly modeled in an additive manner.
Meta-analysis
We estimated the genomic inflation (λ
GC) for all cohorts and features. For each cohort, we further excluded the features that present in less than 5% and 10% of the samples in the fungal and bacterial GWAS, respectively. These thresholds were strategically chosen to balance the need for sufficient statistical power with the desire to include a comprehensive range of taxa. The results of GWAS analysis in separate cohorts were meta-analyzed by the METAL software
83. We combined the
P values for all cohorts by the weighted effect size estimates using the inverse of the corresponding standard errors. The criteria for reporting a significant association were: 1) a genome-wide significant meta-analysis
P < 5 × 10
−8; 2) consistent direction of effect across the sub-cohorts; 3) the combined sample size > 5,000. As the microbial features may correlate with each other, we calculated the effective number of the independent variables using the matSpDlite algorithm
84. While this method helps balance control of false positives and statistical power, we acknowledge that it is one of several possible approaches for multiple-testing correction in microbiome GWAS. For the fungal GWAS, it yielded 108 and 120 features for the abundance-based and presence-based analysis, respectively. Then we defined a study-wide significance threshold of
P < 5 × 10
−8/228 = 2.2 × 10
−10. For the bacterial GWAS, we obtained 152 and 145 features, respectively, and we defined a study-wide significance threshold of
P < 5 × 10
−8/297 = 1.7 × 10
−10. We further conducted a sensitivity analysis to explore the impact of varying prevalence cutoffs for fungal and bacterial taxa on the associations detected. The lead variants were identified by LD clumping at r
2 < 0.1 with a genome-wide significance using plink software (version 1.90)
59. We considered a ± 500 kb genomic region around the lead variants as a gene locus. Then, we merged the loci into one locus if the genomic regions overlapped.
Sensitivity analysis for diet
To assess the robustness of our GWAS results with respect to dietary influences, we first applied linear mixed-effects models to examine diet-microbiome relationships across the five cohorts, adjusting for age, sex, and sequencing batch, with cohort included as a random effect. We then conducted a sensitivity analysis by further adjusting the food groups in the primary model. Given the data availability, we mainly included five food groups for all cohorts: meat, vegetables, fruits, cereals and dairy intakes. Each food group was categorized into cohort-specific quartiles. The dietary intakes in WePN-DA were obtained from the diet diary, while for the other cohorts it was estimated from the food frequency questionnaires and were categorized into four quartiles.
Leave-one-out analysis
In our study, we employed a leave-one-out analysis to assess the stability and robustness of our findings. This analysis involves sequentially removing one cohort at a time from the meta-analysis and recalculating the effect sizes for each remaining dataset. This method could identify if any single cohort disproportionately influences the overall results. Additionally, it helps confirm the generalizability of our findings across diverse populations.
Independent replication analysis
To further assess the reproducibility of our findings, we conducted a replication analysis using an independent cohort and external datasets. First, we attempted to replicate significant host-fungal associations identified in our main analysis within an additional cohort of 766 Chinese individuals from RuLAS. This cohort was selected based on its independence from the discovery cohorts and the availability of both microbiome and host genetic data. Given the data availability, we analyzed the variants with an MAF > 0.05 for this validation. We examined the consistency of the effect direction and statistical significance between the independent cohort and the original findings. Furthermore, to replicate the finding from our gut bacterial GWAS, we leveraged data from the MiBioGen consortium, which includes 18,340 individuals of European descent
5. We selected overlapping bacterial taxa and genetic variants from our study and the MiBioGen dataset for replication analysis. A total of 30 out of 81 identified associations were tested based on the availability. We evaluated the associations for consistency in effect direction and statistical significance in the MiBioGen cohort. For consistency across validation efforts, we adopted a nominal
P value threshold of 0.05 as a permissive criterion for replication.
Heritability analysis
The heritability of gut microbial features was estimated using the GCTA-GREML approach
27,85. As THSBC preserved the highest number of features after prevalence filtering, we performed the heritability analysis in 4,464 and 4,455 participants for both fungal and bacterial features, respectively. In this analysis, the genetic relationship matrix (GRM) was calculated after excluding the SNPs located in the MHC region. The GREML model was adjusted for age, sex, sequencing batch and the first 10 genetic PCs. To evaluate the heritability of various gut microbial features among different genomic regions, we further performed a heritability partitioning analysis using the same model described above. The 11 genomic regions included exonic, intronic, intergenic, upstream, downstream, 3’ untranslated region (UTR), 5’ UTR, non-coding RNA (ncRNA) exonic, ncRNA intronic, TFBSs, and DHSs. The SNPs of these regions were annotated by ANNOVAR (version 2022-06-07)
68.
Colocalization analyses
Based on the summary statistics identified in the present study, we conducted a cross-kingdom colocalization analysis to explore potential shared genetic influences between gut fungi and bacteria. For each identified host variant associated with gut fungi, we examined colocalization within a 1,000 kb window centered on the variant with all bacterial traits demonstrating significant heritability. Conversely, for each host variant identified in association with gut bacteria, the same 1,000 kb window was used to assess colocalization with all significantly heritable fungal traits. This approach ensures focused analysis on microbial features with confirmed genetic contributions, thereby minimizing the risk of overestimating the colocalization signals.
We included the summary statistics of eQTL of 28 distinct human immune cell types from ImmuNexUT (
n = 416) (humandbs.biosciencedbc.jp/en/) to comprehensively test whether there was a putative shared genetic structure between microbiome features and host omics layers
28. A 1,000 kb window centered on GWAS lead variants was employed to test cell-type specific colocalization. The analysis was carried out using the
coloc R package
86. The prior probabilities that the SNP will only be linked with trait 1 were set at 1 × 10
−4, that it will only be associated with trait 2 were set at 1 × 10
−4, and that it will be associated with both traits were set at 1 × 10
−5. If the PP4 was more than 0.8, two signals were considered to be ‘highly likely to colocalize’, while the associations with PP4 > 0.5 were deemed ‘likely to colocalize’.
Bulk and Single-cell transcriptomics analyses of the riboflavin transporter gene
Bulk gene expression data for the human hypothalamus and frontal cortex were retrieved from the Genotype-Tissue Expression (GTEx) project (version 8)
87. Individuals were stratified into two groups based on the median expression level of the
SLC52A2 gene. Differential gene expression analysis was then performed using the
DESeq2 (version1.40.2)
82, with adjustments for age and sex to account for potential confounders.
We further examined the expression levels of
Slc52a2 across various brain cell types in a mouse model with LPS stimulation
88. To extend our analysis to the cellular level, we retrieved the single-cell RNA sequencing data of 16 aging mice brains from the Broad Single Cell Portal (singlecell.broadinstitute.org/single_cell)
89. We conducted a Spearman correlation analysis to investigate the association of various genes with
Slc52a2 expression, excluding genes with a prevalence of less than 5% prior to the analysis.
Causal evaluation of gut fungi and bacteria with host phenotypes
Mendelian randomization analysis
To investigate the causal effects of the gut microbiome on host phenotypes, with the gut fungi and bacteria-related lead variants, we conducted a two-sample MR analysis on 220 host phenotypes using the
TwoSampleMR package
90. We filtered the significant GWAS associations to ensure the independence of our instrumental variables, using a linkage disequilibrium (LD) clumping threshold of R
2 < 0.001 and a distance of > 10,000 kb. MR tests were performed using the Wald’s ratio, and Wald’s ratios were meta-analyzed using the inverse variance-weighted average method (IVW)
91. The summary-level GWAS data for clinical traits and human diseases were obtained from the BBJ and the AGEN-T2D study
92-95. The
P values were adjusted for multiple testing using FDR correction (
q values) to control for type I error inflation.
Genetic correlation between gut fungi and bacteria
For the gut microbial features with
P for heritability < 0.05, we performed a bivariate GREML analysis to estimate the genetic cross-kingdom correlation between gut fungi and bacteria
96. The genetic correlations with estimate (
rg) > 0.7 or < –0.7 were considered as strong correlations.
Mediation analysis
We used a two-step MR approach to investigate the effect of the gut fungi on diseases via bacteria
97. We evaluated the causal effect of fungi on bacteria with the inverse-variance weighted (IVW) method. For those fungi-bacteria comparisons with
P < 0.05, we then used the multivariable MR analysis to assess the effect of fungi on diseases or traits (α) and the effect of bacteria on diseases or traits (β). The indirect effects were calculated with ‘product of coefficients’, and the standard errors were derived as the formula
98.
DATA AVAILABILITY
The raw data of the ITS2 sequencing and 16S rRNA sequencing for each cohort have been deposited in the Genome Sequence Archive (GSA) (
ngdc.cncb.ac.cn/gsa/) at accession number THSBC: CRA014764, CRA014766; GNHS: CRA009842, CRA006769; ZMSC: CRA009894, CRA009895; WeBirth: CRA010696, CRA010697; WePN-DA: CRA010674, CRA010673. The summary statistics for gut fungi and bacteria in the present study are available at
omics.lab.westlake.edu.cn/data.html. Publicly available summary statistics for eQTLs of immune cells were obtained from humandbs.dbcls.jp/en/. The human brain SLC52A2 expression data can be accessed through the human protein atlas (
proteinatlas.org/ENSG00000185803-SLC52A2/tissue). Bulk gene expression data for the human hypothalamus and frontal cortex were retrieved from the GTEx project (version 8), accessible via gtexportal.org/home/downloads/adult-gtex/bulk_tissue_expression. The single cell transcriptomic data of the LPS treated mice were obtained from brainimmuneatlas.org/download.php and the aging mice were obtained from portals.broadinstitute.org/single_cell/study/aging-mouse-brain. This paper does not report original code. Any additional information required to reanalyze the data reported in this work paper is available from the Lead Contact upon request.
The Author(s) 2026. Published by Higher Education Press. This is an Open Access article distributed under the terms of the CC BY license (https://creativecommons.org/licenses/by/4.0/).