Synchronized seasonal excretion of multiple coronaviruses coincides with high rates of coinfection in immature bats

Bat capture and sampling was performed following best practices⁶⁵. Field protocols were approved by the Montana State University Institutional Animal Care and Use Committee (201750) and Griffith University Animal Ethics Committee (ENV/10/16/AEC and ENV/07/20/AEC). The research was conducted in accordance with Scientific Purposes Permits from the Queensland Department of Environment and Heritage Protection (WISP17455716, WA0012532 and WA0058827), a permit to Take, Use, Keep or Interfere with Cultural or Natural Resources (Scientific Purpose) from the Department of National Parks, Sport and Racing (WITK18590417), and a Scientific License from the New South Wales Parks and Wildlife Service (SL101800).

Study sites

We selected five study sites in south-east Queensland and northern New South Wales, Australia, where black flying foxes (BFF; Pteropus alecto gouldi) and grey-headed flying foxes (GHFF; Pteropus poliocephalus) were regularly present (Supplementary Fig. 1; Toowoomba (−27.60 S, 151.94 E), Redcliffe (−27.23 S, 153.10 E), Sunnybank (−27.58 S, 153.05 E) and Burleigh Knoll (−28.08 S, 153.44 E), in Queensland, Australia and Clunes (−28.73 S, 153.42 E), New South Wales (NSW)). Site selection criteria included attributes associated with viral spillover risk and feasibility³¹, including: continuous occupation by BFF, recently overwintering, limited native winter food, and sampling feasibility, access, and permissions³¹. Little red flying-foxes (LRFF; Pteropus scapulatus) were seasonal visitors to the Redcliffe, Sunnybank, and Toowoomba sites (October–May). Another species that regularly roosted within the Redcliffe site was the Australian white ibis (Threskiornis molucca). We collected fecal samples for viral screening directly from individual bats (n = 1188) as well as from population-level sampling by collecting excreta on plastic sheets under the roosts (n = 1349).

Individual bat sampling

Between May 2018 and December 2020, we caught and sampled flying foxes over 13 sessions at the Redcliffe roost and 15 sessions at the Toowoomba roost. Additionally, we caught and sampled at the Clunes roost in August 2017, February 2018, and August 2018 (Supplementary Table 1 and Supplementary Figs. 1 and 3). All 31 individual sampling sessions extended 3–4 consecutive days, with the objective of collecting samples from approximately 60 bats per session. Sampling sessions typically occurred five times per year and covered various stages of the reproductive cycle: in March (early-autumn, around weaning/mating), May (late-autumn, early-pregnancy), July (mid-winter, mid-pregnancy), September (early spring, late-pregnancy), and in December (early-summer, lactation), following a mid-October peak in births.

Bats were caught in mist nets (Ecotone 716/7-12P) set at or above roosting height with modified antenna poles (Spiderbeam 26 m Fiberglass Telescoping Pole) and opened approximately 2 h before dawn to catch individuals returning from their nocturnal foraging activities. Bats were removed from the nets immediately to minimize stress or injury. After capture, bats were temporarily placed in cloth bags until they were anesthetized with isoflurane (starting at 5% then reducing to 1.5%) for sampling (within 6 h). While our sampling primarily focused on BFF, a smaller number of GHFF were sampled opportunistically. All bats were checked for a Passive Integrated Transponder (PIT) tag to identify recaptured individuals. If no PIT tag was present, we inserted a PIT tag (RFID, ZD Tech Group China) under the skin between the scapulae while the bat was anesthetized.

The data recorded for each bat included species, sex, age category (juvenile, subadult, or adult), body mass (grams), forearm length (millimetres), and reproductive status (males: immature, reproductive; females: immature, pregnant, lactating, post-lactation). Age categories were assigned using a combination of sexually dimorphic ranges (weight and forearm, where applicable) in conjunction with the timing of seasonal birth pulses (juveniles: smaller than adult body size (generally <150 mm forearm and <450 g mass), not sexually mature, assumed to be <12 months of age (including dependent pups); subadults near or at adult body size, not sexually mature, assumed to be 12–24 months; adults: full body size (generally >160 mm forearm and >550 g mass), with females showing evidence of prior lactation (elongated nipples) and males with enlarged penis and descended, enlarged testes, assumed to be >24 months). Reproductive status was determined by evidence of sexual maturity (see above), and current or past reproduction. Specifically, pregnancy was determined by the presence of a palpable uterine bulge; lactation by the ability to express milk from the nipples, the size and shape of nipples, and the absence of fur around nipples (lactation: elongated bare nipples; post-lactation: elongated furred nipples; immature: small furred nipples); and reproduction in males by the size and descension of genitalia. For additional details on age class and reproductive status definitions and methods, see Pietromonaco et al.⁶⁶. We collected fecal samples using sterile cotton swabs either as the bat defecated under anesthetic, directly from the rectum, or from feces deposited within the cloth bags during holding periods. These swabs were immediately placed in individual resealable plastic bags, placed on ice, and subsequently preserved at temperature of −20 °C or −80 °C until the laboratory processing and analysis were performed. Dependent (suckling) juveniles generally remain in roosts overnight during December and January, and so were not captured and sampled during these months (with one exception).

All bats were administered fluids (one part 2.5% glucose and 0.45% NaCl solution and one part Hartmann’s solution) subcutaneously and monitored for at least one hour before release back into their roost. All animal handling was conducted under approval from the Griffith University Animal Ethics Committee (Certificate: ENV/10/16/AEC and ENV/07/20/AEC). Personal Protective Equipment and disinfection protocols followed best practice guidelines (e.g. IUCN Bat Specialist Group, 2021; Wildlife Health Australia, 2020). The field team collected individual bat data onto optical mark recognition (OMR) forms, which were processed using Scantron software, which exported data into CSV files for analyses.

Under-roost sampling

In conjunction with individual bat sampling, we collected fecal samples monthly from underneath roosting bats from all five sites (Toowoomba, Redcliffe, Sunnybank, Burleigh Knoll and Clunes; Supplementary Table 2 and Supplementary Figs. 1 and 4). Under-roost sampling methods are described in full in Lunn et. al³¹. We deployed plastic sheets measuring 0.9 × 1.3 meters beneath the trees where bats were roosting^31,67. We positioned the plastic sheets under the trees before the bats returned to the site at dawn. To avoid sampling the same individuals twice, we maintained a minimum distance of at least one meter between sheets preferentially targeting areas of the roost occupied by BFF. In instances where the roost size was too small for sheet locations to meet the minimum distance requirement, we deployed fewer sheets to ensure sample independence. Once bats settled on their roosting trees, we collected individual fecal samples from each sheet (median 1 sample per sheet, range 0–13), generally within three hours after dawn. We collected approximately four grams of feces per sample and stored them in individual resealable bags and then preserved at a temperature of −20 °C or at −80 °C until laboratory analysis. As flying fox feces exhibit high variability in color and consistency, independent samples can typically be easily identified. We considered each sample collected from a sheet as a sheet-level replicate. During the sample collection process, we recorded the presence of each bat species and the number of bats present on the tree branches above the plastic sheets by direct observation. The field team collected under-roost data using REDCap (Research Electronic Data Capture) software, which exported data into CSV files for analyses.

Coronavirus screening

Sample selection and pooling

We optimized testing effort and costs by applying a two-phase screening approach using pooled samples³³ (Supplementary Fig. 2). We screened all individual bats for which fecal samples were available, including both BFF and GHFF. For samples from individual bats, RNA was extracted from each sample separately (extraction described below), then we randomly assigned the RNA extract products from each sample into pools of three from within the same sampling session. Depending on the total number of samples obtained in a session, one pool per session might have only consisted of one or two samples. In the second step, if a pool yielded a positive coronavirus detection, the extracts of samples included in the pool were re-analyzed individually. During this second step, we identified the specific individual bat or bats that contributed to the positive result. When pools rendered a negative result, all individual bats that contributed a sample to that pool were categorized as negative (Supplementary Fig. 2).

For fecal samples obtained from the population-level under-roost sampling, we first filtered the available samples to include only those ones that were noted to have BFF roosting over the sheet but no GHFF. Next, for each site and sampling session, we randomly selected 30 samples using the following rules: (1) Using stratified random sampling, one sample was selected from each sheet. Additional samples were then selected from other sheets with the goal of avoiding overrepresentation of a single sheet with a disproportionately large number of samples or creating pools with multiple samples from the same sheet. And (2) after identifying all samples available for a session, a systematic sampling approach was used to allocate them into the pools. This involved arranging the samples by sheet and then the replicate number. The first sample, along with the 11th and 21st, was placed in pool 1, and so forth. The samples within each group were then shuffled to prevent pools composed of samples from identical sheets. Under-roost pools were assigned as positive or negative without rescreening of component samples (Supplementary Fig. 2).

RNA extraction and coronavirus RT-PCR screening

Fecal samples were combined with Zymo 1X DNA/RNA shield (0.06 g feces:0.6 ml shield, ~1:10) to inactivate the sample, vortexed for 3 min and centrifuged 15,000 × g/1 min to clarify, then stored at −20 °C prior to extraction. RNA was extracted using the MagMAX^TM CORE Pathogen kit (Thermo Fisher) and MagMAX automated magnetic particle processor (MagMAX_Core_200 script). Briefly, 200 µl of clarified supernatant was added to 450 µl lysis buffer in a 1.5 ml centrifuge tube, vortexed for 3 min, centrifuged 15,000 × g/2 min, then 500 µl of lysate added to 30 µl magnetic bead/protein kinase mix in a deep well plate, sealed and mixed on a plate shaker at high speed for 2 min. To this, 350 µl binding solution was added prior to proceeding with wash and elution steps according to the kit manufacturer’s instructions. RNA was then converted to cDNA with the Invitrogen SuperScript IV VILO Master Mix with ezDNase (Thermo Fisher) as per the manufacturer’s instructions. We used a semi-nested RT-PCR assay to target the betacoronavirus ORF1b region with the AllTaq PCR Core kit (QIAGEN). The primers for the first round were: VM3007 5’-GGTTGGGAYTAYCCHAARTGYGA-3’; VM3008 5’-CCRTCATCAGAHARWATCATC-3’; VM3009 5’-CCRTCATCACTHARWATCATC-3’. The primers for the second round were: VM3008; VM3009; VM1818 5’-GAYTAYCCHAARTGTGAYAGAGC-3’; VM3010 5’-GAYTAYCCHAARTGTGAYMGHGC-3′⁶⁸. For each PCR, 8 pmol of primer was added per 20 µl reaction with 25 and 40 cycles used for the first and second round, respectively. Known positive samples (Human alpha-CoV NL63 & beta-CoV HKU1) and no template controls were used to confirm assay performance.

DNA sequencing and assembly

The expected amplicon, measuring 434 base pairs, was verified by gel electrophoresis before purification using AMPure XP (Beckman Coulter). In addition to individually positive reactions and to potentially capture any lower-yield samples without visual band, we also combined PCR products in equal volumes across the rows of plates for bead purification. Next, the purified DNA was sequenced on the Illumina iSeq 100 platform using the Nextera XT DNA library preparation kit with unique dual indexes generating at least 100,000 paired reads per sample (2$\times$150nt). For data analysis, sequence reads were trimmed of adapter and low-quality bases using BBDuk (available from https://sourceforge.net/projects/bbmap/) before de novo assembly with MEGAHIT⁶⁹. To identify coronavirus sequences and exclude any non-specific hits, contigs were aligned against the NCBI nucleotide and protein databases using BLAST⁷⁰.

Phylogenetic analysis and identification of viral clades

All coronavirus sequences were first aligned against the Rousettus bat coronavirus HKU9 (GenBank accession NC_009021) using MAFFT⁷¹ to confirm the presence of the expected region and coverage of each amplicon. Any sequences with less than 250nt of coverage across the target region or less than 10% of relative abundance within each library were removed. The final curated dataset was codon aligned against known reference strains available from NCBI GenBank using MAFFT focusing on betacoronaviruses that were exclusively identified in this study. To confirm clusters and identify viral clades, a maximum likelihood phylogeny was prepared using PhyML⁷² with the GTR + G substitution model. Branch support values for the ML tree were estimated using the Shimodaira–Hasegawa approximate likelihood-based test with virus clades (clades) defined as monophyletic groups with SH-like support values >0.8 or genetic distance >10% from other sequences across the target region.

Whole genome sequencing

To obtain genome sequences, the previously extracted RNA of fecal samples of select representative strains were processed using a previously published metatranscriptomic approach⁷³. Samples were selected with priority from: higher viral load samples (i.e. stronger bands on gels); different sampling sessions; individual animals; and no co-detections. The extracts were first treated with ezDNase Enzyme (Thermo Fisher) to remove residual genomic DNA, and then treated with QIAseq FastSelect –5S/ 16S/ 23S and -rRNA HMR (Qiagen) to deplete ribosomal RNA. Next, SuperScript IV VILO Master Mix (Thermo Fisher) was used for first strand cDNA synthesis, followed by addition of Sequenase Version 2.0 DNA Polymerase (Thermo Fisher) for second strand cDNA synthesis. The double stranded cDNA was purified with Mag-Bind TotalPure NGS (Omega Bio-tek) before Nextera XT DNA Library Preparation Kit with Unique Dual Indexes (Illumina) and paired end (2$\times$150nt) sequencing on Illumina NovaSeq (targeting >20 M reads per library). Similar to the amplicon assay, sequence reads were first trimmed using BBDuk before de novo assembly with MEGAHIT and coronavirus sequence identification by nucleotide and protein blast searches against the NCBI nt/nr database. Due to the relatively high rates of co-infection/co-detection in this cohort, careful inspection of the final assembled reads was made to ensure no chimeric assemblies were obtained, with this determined based on the uniformity of coverage and lack of heterogeneity amongst the reads in the assembly set.

The final sequences were annotated using the nobecovirus strain AMB130 using Geneious Prime 2023.1 before alignment against all available reference genomes on NCBI Genbank with MAFFT. Recombination in the sequence alignments was assessed using the Recombination Detections Program v4 with default settings⁷⁴ and phylogenetic analysis with PhyML with the GTR + G substitution model and SH-like tests for node support. Sequences were classified according to the latest ICTV criteria (https://ictv.global/report/chapter/coronaviridae/coronaviridae). Briefly, an initial BLAST alignment against the Conserved Domain Database⁷⁵ of the ORF1ab region was used to identify and extract the coding regions for the 3CLpro, NiRAN, RdRP, ZBD and HEL1 domains. The translated sequences were then concatenated and aligned using MAFFT before phylogenetic analysis using PhyML with the LG + G amino acid substitution model. Distance matrices of the Percentage of Unchanged Differences (PUD) and Pairwise Patristic Distance (PPD) between viruses were calculated in Geneious Prime. Representative genomes of each coronavirus clade were submitted to NCBI GenBank (Supplementary Table 3).

Cytochrome b gene sequencing and host species confirmation

To confirm the species contributing to under-roost pools, we screened RNA/DNA extracts using PCR targeting the cytochrome b mitochrondrial gene⁷⁶. Here, we chose to screen the entire column of extraction plates containing any CoV-positive samples that included all CoV positive wells (n = 72) and neighbouring non-CoV positive wells (n = 216). PCR was performed from the fecal extracts without cDNA synthesis using the Invitrogen Platinum SuperFi II PCR Master Mix (Thermo Fisher) with the primers L14724 (5’-CGAAGCTTGATATGAAAAACCATCGTTG-3’) and H15149 (5’-AAACTGCAGCCCCTCAGAATGATATTTGTCCTCA-3’) as per manufacturer’s instructions. All amplicons of approximately the expected size (~425 bp) were sequenced using the Rapid Barcoding Kit 96 and R9.4.1 flowcells on a MinION Mk1C with rapid basecalling (Oxford Nanopore Technologies) targeting ~10,000 reads per samples. Following this, de-multiplexed sequence reads were filtered by length (between 250 and 450 bp) before mapping against a database of human and flying fox reference sequencing in Geneious Primer using Minimap2⁷⁷ with a minimum mapping rate of 25 reads. Consensus sequences were generated based on quality scores in Geneious Prime, which first excludes bases with a Phred quality score below 60% of the maximum possible score, then determines a majority consensus from the most frequent base among the remaining high-quality reads. Unmapped reads were de novo assembled in Geneious Prime and compared to NCBI GenBank using BLAST to determine any other species present. We then compared host data obtained through cytochrome b sequencing to the species recorded as roosting over the sheets in field data. Finally, host-species identification from the RNA sequencing data used for CoV WGS was also performed by mapping the filtered reads against a comprehensive database of cytochrome c oxidase I (COXI) gene sequences (available from https://github.com/bachob5/MetaCOXI).

Statistical analyses

To estimate prevalence dynamics jointly from pooled under-roost samples and individual catching data we developed a Bayesian data integration approach. This statistical method enables the calculation of time-varying population-level prevalence estimates derived from both sample types, which offers a snapshot of the underlying biologically continuous process of infection dynamics within the populations. Moreover, the individual samples and their associated metadata are utilized to discern individual bat-level characteristics related to variations in prevalence and, in turn, environmental and ecological drivers of viral shedding.

Estimating population-level prevalence, or the proportion of individuals actively shedding virus from pooled samples, can be challenging. A positive detection in a pool of samples indicates that at least one individual in the pool tested positive, but on its own is not directly indicative of population prevalence. In addition to individual prevalence, the proportion of pools testing positive also depends on the pool size. Consequently, to obtain reliable estimates of the true population-level prevalence, the pooled prevalence estimates need to be adjusted to account for the pooling process. This adjustment requires transforming the probability of a positive pool to the individual scale.

Prior to collecting data for this study, we examined the statistical properties of integrating pooled and individual samples³³. For a fixed budget for laboratory testing, we found that allocating laboratory tests toward pools yielded more accurate and more precise estimates of prevalence (with tighter credible intervals) than if those resources were used to screen individual samples. However, the integration of both data types allowed us to leverage complementary strengths: individual-level data provides crucial demographic information, while combining individual samples with the pooled data enables precise estimates of true prevalence with broader temporal and spatial coverage with reduced cost and effort.

Moreover, the observed prevalence at any given sampling session is influenced by the epidemic momentum within the bat population, which is temporally correlated but unknown at the time of sampling. Thus, our model framework includes a transformation of pooled probabilities, to account for pooling, and a temporally explicit Gaussian Process (GP) to estimate population prevalence over time. By using a GP, our modeling framework can be formulated in a manner analogous to a logistic regression model—but additionally as one that accounts for both temporal correlation and the pooled structure in the data. Specifically, this model framework can be expressed as

$${y}_{t.i} \sim {{\rm{Bernoulli}}}({\pi }_{t,i})$$

(1)

$${\pi }_{t.i}=1-{(1-{p}_{t,i})}^{{m}_{t,i}}$$

(2)

$${{\rm{logit}}}({p}_{t,i})={X}_{t,i}{{\rm{\beta }}}+{\omega }_{t\,}$$

(3)

where ${y}_{t,i}$ is a binary variable denoting whether the ${i}^{{th}}$ pool at time $t$ tests positive, ${\pi }_{t,i}$ is the probability of pool $i$ testing positive at time t, which is a function of the individual prevalence ${p}_{t,i}$ and the pool size ${m}_{t,i}$. The individual level prevalence is a function of covariates $({X}_{t,i})$, in the same fashion as with logistic regression with β, and a time-varying term $({\omega }_{t})$ that comes from the Gaussian process. For all Bayesian analyses we used weakly informative priors and ran four Markov chain Monte Carlo chains for a minimum of 2000 iterations.

Model selection approach

Using the model outlined in Eqs. 1–3, we assessed how the dynamics of viral shedding differed across viral clades, time of the year, and geographic locations. We evaluated the support for competing hypotheses representing biological processes by formulating four distinct model frameworks: (1) a unified model that combines all viral clades and locations—representing the hypothesis that all coronaviruses exhibit the same temporal dynamic across all sites; (2) a location-specific model that pools all viral clades together – representing the hypothesis that all coronaviruses exhibit identical temporal dynamics, but with site-specific differences; (3) a clade-specific model that combines all locations together – representing the hypothesis that every viral clade exhibits a unique temporal dynamic, but consistent across all sites; and (4) a clade-by-location model that applies a unique curve for each combination—representing the hypothesis that every viral clade exhibits a distinct temporal dynamic that varies by site. This fourth framework resulted in a set of models including 30 combinations of viral clades and locations (see Supplementary information for details). For every model formulation, we implemented a cross-validation approach to compare their predictive capacities and assess their performance and thus support for the underlying hypotheses. We conducted a comparative analysis of the models by using “leave-one-out” cross-validation and evaluating the expected log pointwise predictive densities (ELPD)⁷⁸, or equivalently LOOIC (leave-one-out information criteria).

Individual level dynamics of infection

While the pooled data collected from feces under the roosts cannot be associated with individual bats, data collected from individual bats during catching sessions can provide additional information on how age and species impact prevalence dynamics. From the individual samples the age and species of the bats were recorded. Using individual samples only we fit a dynamic binary regression model using the model specified in Eqs. 1–3; however, with pools of size one this reduces to the standard logistic regression framework coupled with a time-varying term from the GP. With this model framework, differences in prevalence dynamics can be explored across age, sex, and species for each coronavirus clade and evaluated using LOOIC.

Association across viral clades

To assess potential co-infection, or statistical associations, between viral clades and bat age classes, we performed a chi-squared test of association. For this test we used only the results from individual bats. We aimed to understand whether an individual flying fox is more likely to be infected with two coronavirus clades if it was already infected with one clade. This analysis can indicate potential co-infection associations. Following this initial step, we estimated the conditional probabilities of individual bats testing positive for a specific viral clade given that they were positive for another one. The combination of these two approaches allowed us to discuss the potential for co-infections and interactions between pairs of viral clades.