dc.contributor.advisor: Krumholz, Lee
dc.contributor.author: Shi, Zhou
dc.date.accessioned: 2017-05-12T16:08:05Z
dc.date.available: 2017-05-12T16:08:05Z
dc.date.issued: 2017-05-12
dc.identifier.uri: https://hdl.handle.net/11244/50834
dc.description.abstract: Microorganisms are ubiquitous on Earth, and they interact with each other to form communities, which play unique and integral roles in biochemical processes and functions that are critically important to global biogeochemical cycling, human health, energy, climate change, environmental remediation, engineering, industry, and agriculture. However, the identification, characterization, and quantification of microbial communities remain limited by the extreme diversity and as-yet-uncultivable nature of the vast majority of microorganisms, and our understanding of microbial communities is further hindered by the complex organization and dynamics of interactions among microorganisms. In this work, we developed high-throughput functional gene arrays (FGAs), bioinformatics tools, and computational methods for analyzing microbial metagenomes and interactomes to address some of these limitations, and demonstrated their power in application studies. We first developed a high-throughput FGA for characterizing a specific group of microorganisms: plant growth-promoting microorganisms (PGPMs). PGPMs can promote plant growth and suppress disease directly and/or indirectly by enhancing soil fertility and plant resistance to biotic and abiotic stresses, and thus may contribute to the success of invasive plants over native species. However, PGPMs are highly diverse in both species richness and plant growth-promoting mechanisms. It is therefore difficult to study how PGPMs change as the environment shifts, and how those changes affect plant performance and ecosystem functioning. The FGA we developed, termed the Plant Associated Beneficial Microorganism Chip (PABMC), focuses on functional genes from PGPMs that are beneficial to plants.
A total of 3,870 probes covering 34 functional gene families were designed for the PABMC, in six categories: plant growth-promoting hormones, plant pathogen resistance, antibiotics, antioxidants, drought tolerance, and secondary benefits (e.g., elicitors of plant immune defense responses). Computational analysis showed that ~98% of the probes were highly specific at the species or strain level. The PABMC was also applied to investigate PGPMs’ responses to Ageratina adenophora (A. adenophora) invasion in a natural grassland, and showed that A. adenophora invasion increased the alpha diversity and shifted the composition of PGPM communities compared with those at the native site. The PABMC uncovered changes in the abundance of key genes related to drought tolerance, pathogen resistance, antibiotic biosynthesis, and antioxidant biosynthesis due to A. adenophora invasion. These changes may promote the survival and growth of A. adenophora over native species at the site we studied. Next, we developed GeoChip 5.0, advancing FGA-based metagenomics technology to a new level of comprehensiveness for analyzing complex microbial communities. GeoChip 5.0 is based on the Agilent platform and comes in two formats. The smaller format (GeoChip 5.0S) contains 60K probes, mainly covering carbon (C), nitrogen (N), sulphur (S), and phosphorus (P) cycling and energy metabolism. The larger format (GeoChip 5.0M) contains all probes in GeoChip 5.0S and additionally covers antibiotic resistance, metal resistance/reduction, organic contaminant remediation, stress responses, pathogenesis, soil beneficial microbes, soil pathogens, and virulence. GeoChip 5.0M contains 161,961 probes covering approximately 370,000 representative coding sequences from 1,447 functional gene families.
These genes were derived from functionally divergent, broad taxonomic groups, including bacteria (2,721 genera), archaea (101 genera), fungi (297 genera), protists (219 genera), and viruses (167 genera, mainly phages). Both computational and experimental evaluation indicated that all designed probes were highly specific to their corresponding targets. Sensitivity tests revealed that as little as 0.05 ng of pure culture DNA was detectable against a background of 1 µg of complex soil community DNA, suggesting that the Agilent platform-based GeoChip is extremely sensitive. Additionally, very strong quantitative linear relationships were obtained between signal intensity and the amount of pure genomic DNA or soil DNA. Application of the designed FGAs to contaminated groundwater with very low biomass indicated that environmental contaminants (mainly heavy metals) had significant impacts on the biodiversity of microbial communities. Next-generation sequencing (NGS) technology has revolutionized metagenomics and microbial ecology studies through immense improvements in sequencing speed, throughput, and cost. However, NGS technology also produces a formidable number of raw reads, which poses computational challenges, especially for analyzing deep shotgun metagenomic sequencing data. To tackle some of these challenges, we present an Ecological Function oriented Metagenomic Analysis Pipeline (EcoFun-MAP) to facilitate the analysis of shotgun metagenomic sequencing data in microbial ecology studies. The EcoFun-MAP consists of reference databases of different data structures, with selective coverage of functional genes that are important to ecological functions. Multiple predefined data analysis workflows were built on these databases using up-to-date bioinformatics tools. Furthermore, the EcoFun-MAP was implemented and deployed on High-Performance Computing (HPC) infrastructure with highly accessible and easy-to-use interfaces.
In our evaluation, the EcoFun-MAP was found to be fast (multiple millions of reads per minute), highly scalable, and capable of addressing disparate needs for accuracy and precision. In addition, we showcase the effectiveness of the EcoFun-MAP by applying it to reveal differences among metagenomes from groundwater samples, and provide insights linking the metagenomic differences to distinct levels of contaminants. To extend an emerging dimension of microbial community analysis, namely the analysis of complex microbial interactions, we provide a generalized Brody distribution (GBD)-based Random Matrix Theory approach (GBD-RMT approach) for inferring microbial data association networks. The GBD-RMT approach addresses several limitations of a previous Random Matrix Theory (RMT)-based approach in its detection capability and in the interpretability of detected thresholds. The GBD-RMT approach can quantitatively characterize the dynamics of the nearest-neighbor spacing distribution (NNSD) of eigenvalues against candidate thresholds, and detect both the critical transitions and thresholds in NNSD dynamics using trend analysis. In our evaluation, the GBD-RMT approach successfully detected the critical thresholds in all of the numerically simulated and real datasets, including those for which the previous method failed. It also had higher detection resolution and yielded higher confidence and interpretability in the detected critical thresholds. The GBD-RMT approach also integrates improvements for detecting more types of data association and reducing compositional data bias. In addition, the GBD-RMT approach uncovered a remarkable overlap between the critical transitions and the plateaus of scale-freeness in the inferred networks, and our analysis shows this overlap to be statistically significant and universal in complex biological systems.
All the technologies and computational methods developed in this work provide powerful and up-to-date means for analyzing complex metagenomes, and are ready to help improve our understanding of microbial communities in studies of microbial ecology and global change biology. (en_US)
dc.language: en_US (en_US)
dc.subject: Computational Biology (en_US)
dc.subject: Bioinformatics (en_US)
dc.subject: Biotechnology (en_US)
dc.subject: Metagenomics and Microbial Ecology (en_US)
dc.title: DEVELOPMENT OF HIGH-THROUGHPUT EXPERIMENTAL AND COMPUTATIONAL TECHNOLOGIES FOR ANALYZING MICROBIAL FUNCTIONS AND INTERACTIONS IN ENVIRONMENTAL METAGENOMES (en_US)
dc.contributor.committeeMember: Zhu, Meijun
dc.contributor.committeeMember: Luo, Yiqi
dc.contributor.committeeMember: Radhakrishnan, Sridhar
dc.contributor.committeeMember: Zhou, Jizhong
dc.date.manuscript: 2017-05-11
dc.thesis.degree: Ph.D. (en_US)
ou.group: College of Arts and Sciences::Department of Microbiology and Plant Biology (en_US)
shareok.nativefileaccess: restricted (en_US)

