phyloseq: Explore microbiome profiles using R

The analysis of microbial communities brings many challenges: integrating many different types of data with methods from ecology, genetics, phylogenetics, network analysis, visualization, and testing. The data itself may originate from widely different sources, such as the microbiomes of humans, soils, surface and ocean waters, wastewater treatment plants, and industrial facilities. As a result, these varied sample types may come with associated data of very different forms and scales, depending heavily on the experiment and its question(s).

The phyloseq package is a tool to import, store, analyze, and graphically display complex phylogenetic sequencing data that has already been clustered into Operational Taxonomic Units (OTUs), especially when there is associated sample data, a phylogenetic tree, and/or a taxonomic assignment of the OTUs. It leverages many of the tools available in R for ecology and phylogenetic analysis (vegan, ade4, ape, picante), and uses the flexible ggplot2 graphics system to produce publication-quality graphics of complex phylogenetic data. phyloseq uses a specialized system of S4 classes to store all related phylogenetic sequencing data as a single experiment-level object, making it easier to share data and reproduce analyses. In general, phyloseq seeks to facilitate the use of R for efficient, interactive, and reproducible analysis of OTU-clustered high-throughput phylogenetic sequencing data.
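To make the idea of a single experiment-level object concrete, here is a minimal sketch that assembles one by hand from small toy tables, assuming phyloseq is installed. The counts, OTU names, taxonomy, and sample variables (`Site`, `Replicate`) are hypothetical and exist only for illustration; `phyloseq()`, `otu_table()`, `tax_table()`, and `sample_data()` are the package's own constructors.

```r
library(phyloseq)

# Hypothetical toy data: 5 OTUs (rows) x 4 samples (columns) of random counts
set.seed(1)
otu_mat <- matrix(sample(0:50, 20, replace = TRUE), nrow = 5,
                  dimnames = list(paste0("OTU", 1:5), paste0("Sample", 1:4)))

# Hypothetical taxonomy: one row per OTU, one column per rank (filled column-wise)
tax_mat <- matrix(c("Bacteria", "Bacteria", "Bacteria", "Archaea", "Bacteria",
                    "Firmicutes", "Proteobacteria", "Firmicutes",
                    "Euryarchaeota", "Bacteroidetes"),
                  nrow = 5,
                  dimnames = list(rownames(otu_mat), c("Kingdom", "Phylum")))

# Hypothetical sample metadata: row names must match the sample names above
samp_df <- data.frame(Site = c("soil", "soil", "water", "water"),
                      Replicate = c(1, 2, 1, 2),
                      row.names = colnames(otu_mat))

# Combine the components into one experiment-level phyloseq object
physeq <- phyloseq(otu_table(otu_mat, taxa_are_rows = TRUE),
                   tax_table(tax_mat),
                   sample_data(samp_df))
physeq
```

Because the components are checked against each other on construction (matching OTU and sample names), downstream functions can assume the abundance table, taxonomy, and metadata describe the same OTUs and samples.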

More concretely, phyloseq provides:

Import functions for abundance and related data from popular denoising and OTU-clustering pipelines (DADA2, UPARSE, QIIME, mothur, BIOM, PyroTagger, RDP, etc.)
Convenience wrappers for common analysis tasks
44 supported distance methods (UniFrac, Jensen-Shannon, etc.)
Ordination, with many supported methods, including constrained methods
Microbiome plot functions using ggplot2 for powerful, flexible exploratory analysis
Modular, customizable preprocessing functions supporting fully reproducible work
Functions for merging data based on OTU/sample variables, and for supporting manually imported data
Native R/C, parallelized implementation of UniFrac distance calculations
Multiple testing methods specific to high-throughput amplicon sequencing data
Examples of analysis and graphics using real published data
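A short interactive session touching several of these features might look like the following sketch. It uses the `GlobalPatterns` example dataset bundled with phyloseq; the filtering threshold (more than 3 counts in at least 20% of samples) is an arbitrary illustrative choice, and weighted UniFrac plus a Bray-Curtis PCoA are just one of many supported combinations of distance and ordination.

```r
library(phyloseq)
library(ggplot2)

# Example dataset shipped with phyloseq: OTU table, sample data,
# taxonomy table, and phylogenetic tree in one object
data(GlobalPatterns)

# Preprocessing: keep OTUs seen more than 3 times in at least 20% of samples
# (illustrative threshold, not a recommendation)
gp <- filter_taxa(GlobalPatterns,
                  function(x) sum(x > 3) > 0.2 * length(x),
                  prune = TRUE)

# Convert counts to within-sample relative abundances
gp_rel <- transform_sample_counts(gp, function(x) x / sum(x))

# Distance: weighted UniFrac, which uses the phylogenetic tree
d_wuf <- distance(gp_rel, method = "wunifrac")

# Ordination: PCoA on Bray-Curtis distances, plotted with ggplot2
ord <- ordinate(gp_rel, method = "PCoA", distance = "bray")
p <- plot_ordination(gp_rel, ord, color = "SampleType") +
  ggtitle("PCoA on Bray-Curtis distances")
print(p)
```

Each step takes and returns a phyloseq object that still carries its tree, taxonomy, and sample data, which is what lets preprocessing, distance, ordination, and plotting functions chain together without re-aligning tables by hand.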

The phyloseq package is actively and openly developed on GitHub:

https://github.com/joey711/phyloseq

I make every effort to cite and attribute author contributions within the official package documentation, citations, and anywhere else it is appropriate. Please feel free to fork and contribute!

Thanks to Anjile An, MPH, for the logo (Twitter: @anjile_an)!