We built FUNGuild to be a broadly usable tool that helps people new and old to mycology more easily see ecological patterns in their datasets. Please read Nguyen et al. 2016 (Fungal Ecology) for a fuller description of the database and the utility of assigning guilds. One thing I encourage people to do is not to blindly accept the FUNGuild output for unassigned sequences (unfortunately, these are very likely the largest chunk of the output, due to our limited knowledge of the ecology of many fungal groups). Below is an excerpt from our most recent paper using NGS methods, where we tried to dig deeper. Doing so allowed us to go from about 1.1 million sequences to almost 2 million sequences in the final dataset.
“EM fungal OTUs were separated from those belonging to other guilds using the online tool FUNGuild (Nguyen et al. 2016). For the final EM OTU × sample matrix, we included all OTUs that FUNGuild assigned as having a ‘highly probable’ and ‘probable’ likelihood of being an EM fungal taxon. For all OTUs that had a ‘possible’ EM designation, we checked the species-level matches for inclusion in the final dataset and removed two Entoloma OTUs, one Ceratobasidium OTU, and one Lyophyllum OTU that matched more closely to non-EM than EM fungal sequences. Among the OTUs that were unassigned in FUNGuild, we determined that some were likely EM despite not being assigned to that guild (due to missing family- and/or genus-level taxonomy). We therefore checked the individual UNITE database species hypotheses (SH in Kõljalg et al. 2013) for each unassigned OTU. Using the criteria: 1) >90% sequence match to top BLAST hit, 2) belonged to a lineage designated EcM in UNITE, and 3) matched another sequence identified as ectomycorrhizal from a tree host within 3% sequence similarity, we reassigned 72 of the 335 unassigned OTUs as EM fungi. The final non-transformed EM fungal OTU × sample matrix, including taxonomic identification for each OTU, is provided in Table S1.”
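The decision rules in that excerpt can be sketched in a few lines of Python. This is only an illustration, not the script we used: the `Otu` record, its field names, and the `triage` function are hypothetical, and the three boolean/numeric evidence fields are assumed to have been filled in beforehand from BLAST and UNITE lookups. The sketch also assumes an unassigned OTU shows up with an empty guild or "-". Note that the "possible" OTUs are only flagged here; the species-level check described above was a manual step.

```python
from dataclasses import dataclass


@dataclass
class Otu:
    # Hypothetical record; field names are illustrative, not FUNGuild's own.
    otu_id: str
    guild: str = ""                 # e.g. "Ectomycorrhizal"; "" or "-" if unassigned
    confidence: str = ""            # "Highly Probable", "Probable", or "Possible"
    blast_identity: float = 0.0     # % match to the top BLAST hit
    unite_lineage_ecm: bool = False  # lineage designated EcM in UNITE
    near_ecm_root_seq: bool = False  # within 3% of an EM sequence from a tree host


def triage(otu: Otu) -> str:
    """Return what to do with one OTU, following the excerpt's rules."""
    unassigned = otu.guild.strip() in ("", "-")
    is_em = "ectomycorrhizal" in otu.guild.lower()

    if is_em and otu.confidence in ("Highly Probable", "Probable"):
        return "keep"
    if is_em and otu.confidence == "Possible":
        return "check_manually"  # inspect species-level matches by hand
    if unassigned:
        # All three criteria must hold to reassign an unassigned OTU as EM.
        if (otu.blast_identity > 90
                and otu.unite_lineage_ecm
                and otu.near_ecm_root_seq):
            return "reassign_em"
        return "unassigned"
    return "other_guild"
```

Running `triage` over every OTU gives a quick tally of how many fall into each bin, which makes it easy to see how much of the dataset is sitting in the unassigned pile before any manual checking.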
If it is helpful, here is the upstream part of that methods section:
“To identify the EM fungi present on A. balsamea and B. papyrifera sapling roots, the ITS1 rDNA subunit was PCR amplified using a barcoded fungal-specific ITS1F-ITS2 primer set and cycling conditions detailed in Smith and Peay (2014). Amplified products were magnetically cleaned using the Agencourt AMPure XP kit (Beckman Coulter, Brea, CA, USA) and quantified using a Qubit dsDNA HS Fluorometer (Life Technologies, Carlsbad, CA, USA). Each of the A. balsamea and B. papyrifera root samples were pooled into a single library and sequenced at the University of Minnesota Genomics Center using the 250 bp paired-end MiSeq Illumina platform. Raw sequences and associated metadata were deposited in the NCBI Short Read Archive (Accession #: SRP080680).
Using both the QIIME and MOTHUR packages (QIIME v 1.8 (Caporaso et al. 2010) and MOTHUR v 1.33.3 (Schloss et al. 2009)), we demultiplexed and quality filtered the raw sequences (i.e. culled sequences with Phred scores < 20, less than 75 bp long, with any ambiguous bases, or a homopolymer run of >8 bp). The reverse reads in this MiSeq run were found to be of relatively poor quality, so we used only the forward reads for all analyses. After quality filtering, we employed a multi-step operational taxonomic unit (OTU) picking strategy, first clustering all sequences with USEARCH at a 95% sequence similarity followed by reclustering with UCLUST at 95%. We found that this strategy best recovered the mock community that we included as a positive control (see Nguyen et al. 2015 for details) and we therefore applied it to all the experimental samples. The UNITE database (v6, Kõljalg et al. 2013) was used for chimera checking, OTU clustering, and assigning taxonomy. Since we have previously found that OTUs with length/query length ≤ 0.845 often have ambiguous taxonomies (i.e. they may not be fungal) (Nguyen et al. 2015), we excluded any OTUs below that threshold. For any fungal OTUs present in the mock community sample but not part of its original composition (likely due to low-level tag switching (Carlsen et al. 2012)), we subtracted the number of sequence reads of that OTU in the mock sample from the number of sequence reads of that OTU in each of the experimental samples (as described in Nguyen et al. 2015). As an additional quality control step, we removed all sequence reads <10 per sample for all remaining OTUs, based on the combined recommendations of Lindahl et al. (2013) and Oliver et al. (2015).
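The last three clean-up steps in that paragraph (the length/query length cutoff, the mock-community subtraction, and the per-sample read minimum) can be sketched as one pass over an OTU × sample table. Again, this is a minimal illustration rather than the code used in the paper: the function name, the nested-dictionary layout, and the parameter names are all hypothetical. Two details are assumptions on my part: counts are floored at zero after subtraction, and cells with fewer than 10 reads are set to zero rather than dropped.

```python
def clean_otu_table(counts, coverage, mock_sample, mock_members,
                    min_coverage=0.845, min_reads=10):
    """Apply the three clean-up steps to a table of read counts.

    counts:       {otu_id: {sample_id: reads}}, including the mock sample
    coverage:     {otu_id: length / query length of the taxonomy match}
    mock_sample:  sample_id of the mock community
    mock_members: set of otu_ids that belong to the mock's known composition
    """
    cleaned = {}
    for otu, per_sample in counts.items():
        # Step 1: exclude OTUs at or below the length/query length threshold.
        if coverage[otu] <= min_coverage:
            continue

        # Step 2: reads of a non-mock OTU that appear in the mock sample are
        # treated as tag switching and subtracted from every other sample.
        offset = 0 if otu in mock_members else per_sample.get(mock_sample, 0)

        row = {}
        for sample, reads in per_sample.items():
            if sample == mock_sample:
                continue  # keep only experimental samples in the output
            adjusted = max(reads - offset, 0)  # assumption: floor at zero
            # Step 3: remove counts below the per-sample minimum.
            row[sample] = adjusted if adjusted >= min_reads else 0
        cleaned[otu] = row
    return cleaned
```

The order matters here: subtracting first and thresholding second follows the order of the text, and it means a count that starts just above 10 can still be zeroed out once the mock offset is removed.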