Import biom and sample data

A More Complicated Import Example

One of the example datasets included in the phyloseq package is derived from the study that first described human microbiome “Enterotypes”; that dataset is called simply enterotype. It will be loaded in later examples using the data command.

A more recent study investigating human microbiome “Enterotypes” is titled Linking Long-Term Dietary Patterns with Gut Microbial Enterotypes by Wu et al., Science, 334 (6052), 105–108. One of the three corresponding authors has the last name “Bushman”, which also happens to be the title of the QIIME-processed version of this dataset at the microbio.me/qiime database.

We will import this data to illustrate a more complicated situation in which we need to import three different data components from two different file types: one is biom-format, the other is sample data contained in a tab-delimited “Mapping File” format produced by QIIME.
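To make the “Mapping File” format more concrete, here is a minimal base-R sketch of reading a tab-delimited table whose header line begins with #SampleID. The two-sample table, its column names, and its values are invented for illustration. This approximates what such a file looks like; it is not a description of how import_qiime_sample_data works internally.

```r
# A made-up two-sample mapping file; real QIIME mapping files have many more columns
map_text <- paste(
  "#SampleID\tBarcodeSequence\tDiet\tDescription",
  "S1\tACGT\tHighFat\tfirst toy sample",
  "S2\tTGCA\tLowFat\tsecond toy sample",
  sep = "\n"
)

# comment.char = "" stops the leading "#" from hiding the header line;
# row.names = 1 turns the sample IDs into row names
map_df <- read.table(text = map_text, sep = "\t", header = TRUE,
                     comment.char = "", row.names = 1,
                     check.names = FALSE, stringsAsFactors = FALSE)

rownames(map_df)
colnames(map_df)
```

Sample IDs end up as row names, and these are the names that get matched against the sample names of the OTU table when the components are combined.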

For convenience and stability, these “Bushman” data files were saved locally and imported from their local system location. An even more complicated “direct import” example is provided in the last section (“Extra Example”) below; it produces the same result but is not run by the embedded code.

library(phyloseq)

Import Bushman data, already downloaded.

uzdir <- "/Volumes/media/Research/study_1011_split_library_seqs_and_mapping/"
biom_file <- paste(uzdir, "study_1011_closed_reference_otu_table.biom", sep = "")
map_file <- paste(uzdir, "study_1011_mapping_file.txt", sep = "")
# Now import the .biom-formatted otu_table-tax_table file.
biom_otu_tax <- import_biom(biom_file, "greengenes")
# Import the sample data from the QIIME mapping file (it is merged with
# the rest of the dataset in the next section)
bmsd <- import_qiime_sample_data(map_file)
class(bmsd)
## [1] "sample_data"
## attr(,"package")
## [1] "phyloseq"
dim(bmsd)
## [1] 102 225
biom_otu_tax
## phyloseq-class experiment-level object
## OTU Table:          [1873 taxa and 100 samples]
##                      taxa are rows
## Taxonomy Table:     [1873 taxa by 7 taxonomic ranks]:

Merging datasets or components

We need to merge these two separate Bushman dataset objects into one “phyloseq” object. Between them, the two objects hold three components: biom_otu_tax contains the otu_table and tax_table, and bmsd contains the sample_data. If we had three objects that were each a single component (think single tables, or a tree), then we would use the constructor function, phyloseq. However, because the .biom file contained two tables (including an otu_table), the import_biom function returned a valid "phyloseq-class" instance containing both components. Whenever you need to add or merge data components from one (or more) phyloseq-class objects, the merging function, merge_phyloseq, is recommended rather than the constructor (phyloseq).

Bushman <- merge_phyloseq(biom_otu_tax, bmsd)
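The difference between the constructor and merge_phyloseq can be illustrated on toy data. The sketch below assumes phyloseq is installed. The matrices, OTU and sample names, and the Diet variable are invented for illustration and are not taken from the Bushman dataset.

```r
library(phyloseq)

# Toy data: 3 taxa x 2 samples (all names and values here are made up)
otu_mat <- matrix(c(10, 0, 5, 3, 8, 1), nrow = 3,
                  dimnames = list(paste0("OTU", 1:3), c("S1", "S2")))
tax_mat <- matrix(c("Bacteria", "Bacteria", "Bacteria",
                    "Firmicutes", "Bacteroidetes", "Firmicutes"),
                  nrow = 3,
                  dimnames = list(paste0("OTU", 1:3), c("Kingdom", "Phylum")))
# The sample table deliberately carries one extra sample (S3)
sam_df <- data.frame(Diet = c("A", "B", "A"),
                     row.names = c("S1", "S2", "S3"))

OTU <- otu_table(otu_mat, taxa_are_rows = TRUE)
TAX <- tax_table(tax_mat)
SAM <- sample_data(sam_df)

# Case 1: every input is a single component, so the constructor applies
ps_built <- phyloseq(OTU, TAX, SAM)

# Case 2: one input is already a phyloseq object (as returned by import_biom),
# so the sample data is added with merge_phyloseq instead
ps_partial <- phyloseq(OTU, TAX)
ps_merged <- merge_phyloseq(ps_partial, SAM)

nsamples(ps_merged)
sample_names(ps_merged)
```

Both routes should keep only the samples shared by all components, so the extra S3 row in the sample table is dropped from the combined object.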

Extra Example: Direct ftp Download, Unzip, and Import

The .biom and sample data files are also provided online (ftp), and the following example code shows a useful way to download them and import them into phyloseq directly from the ftp address. In this example we download a zip file with both biom- and qiime-formatted data, unzip it in a temporary directory from within R, import the relevant files using phyloseq importers, and then delete the temporary files. This code should be platform independent, but finicky Windows issues occasionally arise.

(Note: this code is not actually run in this demo. Running it would be redundant, and, based on experience, Windows issues occasionally cause it to crash.)

zipftp <- "ftp://thebeast.colorado.edu/pub/QIIME_DB_Public_Studies/study_1011_split_library_seqs_and_mapping.zip"
# First create a dedicated temporary directory in which to store the
# unpacked file(s) from the .zip
tmpdir <- tempfile(pattern = "bushman_")
dir.create(tmpdir)
# Second create a temp file where you will put the .zip-file itself
temp <- tempfile(fileext = ".zip")
# Now download the file (binary mode matters on Windows) and unzip it
# to the tmpdir directory
download.file(zipftp, temp, mode = "wb")
unzip(temp, exdir = tmpdir)
# Define the biom file-path ("\\.biom$" matches the literal extension)
biom_file <- list.files(tmpdir, pattern = "\\.biom$", full.names = TRUE)
# Define the mapping file-path
map_file <- list.files(tmpdir, pattern = "mapping", full.names = TRUE)
# Now import the .biom-formatted otu_table/tax_table file.
biom_otu_tax <- import_biom(biom_file, "greengenes")
# Import the sample data from the QIIME mapping file
bmsd <- import_qiime_sample_data(map_file)
# Remove the temporary file and the directory where you unpacked the zip
# files (recursive = TRUE is required to delete a directory)
unlink(temp)
unlink(tmpdir, recursive = TRUE)
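The temporary-file handling in this example can also be sketched on its own, using only base R and no network access. The helper name with_scratch_dir and the file names are hypothetical and are not part of phyloseq. The idea is to work in a dedicated scratch directory and have on.exit remove it even if a step fails.

```r
# Hypothetical helper (not part of phyloseq): run `fun` on a scratch directory
# and always remove that directory afterwards.
with_scratch_dir <- function(fun) {
  scratch <- tempfile(pattern = "bushman_")  # unique, not-yet-existing path
  dir.create(scratch)
  # on.exit runs even if `fun` stops with an error;
  # recursive = TRUE is needed because unlink() skips directories otherwise
  on.exit(unlink(scratch, recursive = TRUE), add = TRUE)
  fun(scratch)
}

result <- with_scratch_dir(function(d) {
  # Stand-ins for the unpacked files; the names are made up
  file.create(file.path(d, c("otu_table.biom", "mapping_file.txt",
                             "abiom_notes.txt")))
  list(
    dir     = d,
    # "\\.biom$" matches only a literal ".biom" extension
    strict  = list.files(d, pattern = "\\.biom$"),
    # ".biom" is a regular expression in which "." matches any character,
    # so it also picks up "abiom_notes.txt"
    relaxed = list.files(d, pattern = ".biom")
  )
})

result$strict
result$relaxed
dir.exists(result$dir)
```

Using a dedicated subdirectory rather than tempdir() itself means the cleanup step cannot remove other files that the R session keeps in its temporary directory.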