|marine sponges||[Roche 454, 15 samples, original study: doi:10.1007/s12526-017-0697-0, QIIME1 open reference pipeline: doi:10.1016/j.resmic.2018.03.004]|
|goose gut microbiome||[Illumina MiSeq, 9 samples, original study: doi:10.1016/j.micres.2015.10.003, QIIME1 open reference pipeline: doi:10.1016/j.resmic.2018.03.004]|
|osteomyelitis in jaw bones||[Roche 454, 9 samples, original study: doi:10.1111/1469-0691.12400, QIIME1 open reference pipeline: doi:10.1016/j.resmic.2018.03.004]|
|sponges in Lake Baikal||[Roche 454, 23 samples, original study: doi:10.1007/s00248-017-1097-5, QIIME1 closed reference pipeline: doi:10.31951/2658-3518-2018-A-2-122]|
|counts of marine molluscs||[original data: OBIS site, custom pipeline: doi:10.1016/j.resmic.2018.03.004]|
The online services available on this site are developed for the extensive analysis of tables with estimates of bacterial abundance levels in environmental samples. Most of the functionality specific to microbial ecology is implemented by the scikit-bio Python package, together with other Python packages intended for big data analysis. The interactive visualisation tools are implemented with the D3.js library; hence the software project is named D3b. The source code is available on GitHub as an installable package.
Installation may require system administration skills if there are incompatibilities between the python3 environment and the software dependencies.
Suggested installation via conda:
- Create a virtual environment:
conda create -n d3b python=3.6 anaconda
source activate d3b
- Download the package and install it with pip (which runs setup.py):
git clone https://github.com/sferanchuk/d3b_charts.git
cd d3b_charts
pip install .
- Download the phantomjs package, extract the executable file from the archive, and move it to the 'phantomjs' subdirectory. Suggested URLs to obtain phantomjs:
This step is optional: phantomjs is used to render graphics on the server side and is required to export PNG and SVG images. Other packages external to the Python environment, such as d3.js, are already included in the distribution.
- Run the Django server (runserver.sh) in a console and point the browser to 'http://localhost:8000'. If the pages cannot be accessed remotely, modify the ALLOWED_HOSTS variable in d3b/settings.py.
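A minimal sketch of the relevant setting, assuming the standard Django meaning of ALLOWED_HOSTS: Django answers only requests whose Host header matches an entry in this list and rejects the others with HTTP 400. The host name `d3b.example.org` is a placeholder. Note also that Django's development server listens on 127.0.0.1 by default; the contents of runserver.sh are not described here, so remote access may additionally require starting the server on an address such as 0.0.0.0:8000.

```python
# Excerpt for d3b/settings.py (a sketch): list every host name or IP address
# that browsers will use to reach the server.
ALLOWED_HOSTS = [
    "localhost",
    "127.0.0.1",
    "d3b.example.org",  # placeholder: replace with the server's real host name or IP
]
```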
Use the links to the sample datasets on the start page to explore the capabilities of the d3b system. For each dataset, the menu provides tools to analyse it in various representations.
For each tool, the result appears in the browser after pressing the 'Submit' button; the input parameters should be specified before submitting the task. For the sample datasets, the default settings can often be used in the first steps, to become familiar with the design and conventions of the d3b system.
By design, a dataset consists of the abundances of annotated OTUs in microbial communities. Four of the sample datasets are abundance estimates from amplicon sequencing processed with the QIIME1 pipeline, and one dataset represents a selection from a custom ecological community.
In any case, all datasets are represented as matrices of abundance counts. To allow traits and taxonomic groups to be specified, the descriptors for the rows and columns of the matrix can be modified and saved. The tools for specifying both kinds of descriptors (Subsets of samples / Taxonomic groups) are also available in the sidebar menu for every dataset.
To apply the d3b tools to custom data, a dataset in the form of a matrix must be uploaded to the server side of the system. After that, a permanent link to the processed dataset becomes available. This link should be kept: it gives access to the dataset at any time while the system is up and running.
Two input matrix formats are accepted: tab-delimited (TSV) and BIOM. In both formats, the abundance counts of each OTU are combined with the taxonomic annotation of the OTU and the sample identifiers.
A tab-delimited file exported from a spreadsheet application such as MS Excel or LibreOffice Calc is suitable for the system, provided that the conventions for the content of rows and columns are followed. Conversely, a processed dataset can be exported back to the tab-delimited format with the tool 'matrix of abundances'.
The output of that tool applied to one of the sample datasets provides an example of a correctly composed tab-delimited file. In that file, each line begins with the complete taxonomic annotation of an OTU, spread over a fixed number of cells, followed by the absolute abundance counts of this OTU in all samples. In the first line, the cells containing the numbers 1, 2, 3, etc. mark the fixed levels of the taxonomic hierarchy, and they are followed by cells with the sample identifiers.
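The layout described above can be produced or checked with a short script. This is a minimal sketch based only on the description in this section; the function names `write_abundance_tsv` and `read_abundance_tsv` are hypothetical and are not part of d3b, and the reader assumes that no sample identifier continues the numeric sequence 1, 2, 3, ... of the taxonomy header cells.

```python
import csv
import io


def write_abundance_tsv(stream, levels, samples, rows):
    """Write an abundance matrix in the tab-delimited layout described above.

    levels  -- number of taxonomy cells per line (e.g. 6 for kingdom..genus)
    samples -- sample identifiers, in column order
    rows    -- list of (taxonomy_tuple, counts_list) pairs, one per OTU
    """
    writer = csv.writer(stream, delimiter="\t", lineterminator="\n")
    # First line: numbered cells 1..levels, then the sample identifiers.
    writer.writerow([str(i) for i in range(1, levels + 1)] + list(samples))
    for taxonomy, counts in rows:
        if len(taxonomy) != levels or len(counts) != len(samples):
            raise ValueError("row does not match the header layout")
        writer.writerow(list(taxonomy) + [str(c) for c in counts])


def read_abundance_tsv(stream):
    """Parse the same layout back into (levels, samples, rows)."""
    reader = csv.reader(stream, delimiter="\t")
    header = next(reader)
    # The taxonomy columns are the leading header cells reading 1, 2, 3, ...
    levels = 0
    while levels < len(header) and header[levels].strip() == str(levels + 1):
        levels += 1
    samples = header[levels:]
    rows = []
    for cells in reader:
        if not cells:
            continue  # skip blank lines, e.g. trailing lines from a spreadsheet
        taxonomy = tuple(cells[:levels])
        # Spreadsheets sometimes export counts as "12.0"; store them as integers.
        counts = [int(float(x)) for x in cells[levels:]]
        rows.append((taxonomy, counts))
    return levels, samples, rows


# Round trip with two hypothetical OTUs annotated at two levels.
buf = io.StringIO()
write_abundance_tsv(buf, 2, ["S1", "S2"], [
    (("Bacteria", "Firmicutes"), [10, 0]),
    (("Bacteria", "Proteobacteria"), [3, 7]),
])
text = buf.getvalue()
levels, samples, rows = read_abundance_tsv(io.StringIO(text))
```

Comparing a file written this way with the 'matrix of abundances' export of a sample dataset is a quick way to confirm that the assumed layout matches what the server expects.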
Data in BIOM format ("biological observation matrix") are also suitable for import into d3b. This format is often used to store intermediate results in software for microbiologists. However, because the BIOM format specification is flexible, importing a particular BIOM-formatted file may fail or give wrong results. In that case, the Python script 'biom2emap.py' from the d3b source code can be used as a template for a user-defined script that converts any BIOM-formatted dataset into a correct tab-delimited file.
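As an illustration of such a user-defined converter, here is a minimal sketch that handles only the JSON flavour of BIOM (version 1.0); BIOM 2.x files are HDF5 and would need the `biom` package instead. It is not a copy of 'biom2emap.py'. The function name is hypothetical, taxonomy is assumed to be stored QIIME-style in the row metadata under the key "taxonomy", and padding missing lower levels with empty cells is an assumption to check against d3b's own export.

```python
import csv
import io
import json


def biom1_json_to_tsv(biom_text, out, levels=None):
    """Convert a BIOM 1.0 (JSON) table into the tab-delimited layout.

    Assumptions: the file is JSON (not HDF5), and taxonomy is stored in the
    row metadata as {"taxonomy": [...]} or as a ';'-separated string.
    """
    table = json.loads(biom_text)
    n_rows, n_cols = table["shape"]
    if table["matrix_type"] == "sparse":
        # Sparse data: a list of [row_index, column_index, value] triples.
        matrix = [[0] * n_cols for _ in range(n_rows)]
        for r, c, v in table["data"]:
            matrix[r][c] = v
    else:
        # Dense data: one list of values per row.
        matrix = [list(row) for row in table["data"]]

    taxonomies = []
    for row in table["rows"]:
        meta = row.get("metadata") or {}
        tax = meta.get("taxonomy") or [row["id"]]  # fall back to the OTU id
        if isinstance(tax, str):
            tax = [t.strip() for t in tax.split(";")]
        taxonomies.append(list(tax))
    if levels is None:
        levels = max(len(t) for t in taxonomies)

    samples = [col["id"] for col in table["columns"]]
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow([str(i) for i in range(1, levels + 1)] + samples)
    for tax, counts in zip(taxonomies, matrix):
        # Assumption: missing lower levels become empty cells.
        cells = (tax + [""] * levels)[:levels]
        writer.writerow(cells + [str(int(round(v))) for v in counts])


# A tiny hypothetical BIOM 1.0 table (only the keys used above).
example = {
    "format": "Biological Observation Matrix 1.0.0",
    "rows": [
        {"id": "OTU1", "metadata": {"taxonomy": ["k__Bacteria", "p__Firmicutes"]}},
        {"id": "OTU2", "metadata": {"taxonomy": "k__Bacteria"}},
    ],
    "columns": [{"id": "S1", "metadata": None}, {"id": "S2", "metadata": None}],
    "matrix_type": "sparse",
    "shape": [2, 2],
    "data": [[0, 0, 5], [1, 1, 2]],
}
out = io.StringIO()
biom1_json_to_tsv(json.dumps(example), out)
lines = out.getvalue().splitlines()
```

Handling both list- and string-valued taxonomy covers two encodings commonly found in QIIME1-era files; other metadata layouts would need their own branch.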
The tabular presentations of the data include the table of abundances as it was loaded into the system, with various options for sorting the rows, merging the columns, reducing the level of the taxonomic hierarchy, and others. Specifically, most input forms let the user choose the level of the taxonomic hierarchy, restrict the analysis or data presentation to certain taxonomic groups, and merge samples into pre-defined groups.
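The three aggregation options above can be sketched in a few lines. This is a minimal illustration, not d3b's code: the function name `collapse`, the choice to sum (rather than average) the counts of merged samples, and the prefix-based taxonomic filter are all assumptions.

```python
from collections import defaultdict


def collapse(rows, samples, level, groups=None, within=None):
    """Aggregate an abundance matrix (a sketch of the input-form options).

    rows    -- list of (taxonomy_tuple, counts) pairs, one per OTU
    samples -- sample identifiers, in column order
    level   -- number of taxonomic levels to keep (1 = top level)
    groups  -- optional {sample_id: group_name}; counts within a group are summed
    within  -- optional taxonomy prefix, e.g. ("Bacteria", "Firmicutes");
               OTUs outside this group are dropped
    """
    if groups is None:
        groups = {s: s for s in samples}  # no merging: each sample is its own group
    # Output columns: group names in order of first appearance.
    columns = list(dict.fromkeys(groups[s] for s in samples))
    col_index = {g: i for i, g in enumerate(columns)}
    table = defaultdict(lambda: [0] * len(columns))
    for taxonomy, counts in rows:
        if within and tuple(taxonomy[:len(within)]) != tuple(within):
            continue
        key = tuple(taxonomy[:level])  # OTUs sharing this prefix are merged
        for sample, count in zip(samples, counts):
            table[key][col_index[groups[sample]]] += count
    return columns, dict(table)


rows = [
    (("Bacteria", "Firmicutes", "Bacilli"), [10, 2, 0]),
    (("Bacteria", "Firmicutes", "Clostridia"), [1, 3, 4]),
    (("Bacteria", "Proteobacteria", "Gammaproteobacteria"), [5, 0, 6]),
]
samples = ["A1", "A2", "B1"]
columns, table = collapse(rows, samples, level=2,
                          groups={"A1": "A", "A2": "A", "B1": "B"})
```

Here the two Firmicutes classes collapse into one phylum-level row, and samples A1 and A2 are merged into group A.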
The tabular presentations also include:
1) the values of alpha-diversity calculated using several of the most informative estimators,
2) the significance of differences for alpha-diversity values between several groups, calculated following the methods described in ,
3) the significance of differences between groups of samples calculated using the distances between samples, with several alternative measures of distance.
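To reproduce such values outside the browser, the following stdlib-only sketch computes a few widely used alpha-diversity estimators and one common distance-based group test. d3b itself relies on scikit-bio, whose selection of estimators and defaults (for example the logarithm base of the Shannon index) may differ. The Bray-Curtis measure and the one-way PERMANOVA permutation test are stand-ins chosen for illustration, not necessarily the procedures d3b uses, and the counts are hypothetical.

```python
import math
import random


def shannon(counts):
    """Shannon index H = -sum(p_i * ln p_i); natural log here (tools differ in the base)."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def simpson(counts):
    """Gini-Simpson index 1 - sum(p_i^2)."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)


def chao1(counts):
    """Bias-corrected Chao1 richness: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum|u_i - v_i| / sum(u_i + v_i)."""
    return sum(abs(a - b) for a, b in zip(u, v)) / sum(a + b for a, b in zip(u, v))


def permanova(dist, labels, permutations=999, seed=0):
    """One-way PERMANOVA: pseudo-F statistic and permutation p-value."""
    n = len(labels)
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n

    def pseudo_f(labs):
        ss_within = 0.0
        for g in set(labs):
            idx = [i for i in range(n) if labs[i] == g]
            pairs = sum(dist[i][j] ** 2 for k, i in enumerate(idx) for j in idx[k + 1:])
            ss_within += pairs / len(idx)
        a = len(set(labs))
        if ss_within == 0:
            return math.inf
        return ((ss_total - ss_within) / (a - 1)) / (ss_within / (n - a))

    observed = pseudo_f(labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # break any link between labels and samples
        if pseudo_f(shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical counts: three samples per group, two OTUs.
samples = [[10, 0], [9, 1], [11, 0], [0, 10], [1, 9], [0, 11]]
labels = ["A", "A", "A", "B", "B", "B"]
dist = [[bray_curtis(u, v) for v in samples] for u in samples]
f_stat, p_value = permanova(dist, labels, permutations=199)
```

With only six samples there are few distinct ways to split them into two groups of three, so the smallest attainable p-value is limited regardless of how well the groups separate.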
The graphical presentations, implemented with the d3.js library, currently include the following charts:
1) A bubble chart and a heatmap, to represent absolute or relative abundances.
2) 2D scatter charts, to represent the results of several data ordination methods, such as principal component analysis (PCA) or multidimensional scaling (MDS). Several measures of the distance between samples are available here.
3) A dendrogram (tree) to represent the degree of proximity between samples.
4) A Venn diagram to represent the unique and shared taxonomic units of the samples, implemented with the jVenn plugin and the venn.js library.
5) Two kinds of diagrams to present distributions that describe a sample or a group of samples: a rank-abundance chart (Whittaker plot) representing the distribution of relative species abundances, and a rarefaction curve to estimate the effect of insufficient coverage and sample size.
6) A ternary chart, to represent the relative abundances of bacterial phylotypes for the three samples or groups of samples.
7) A volcano chart and a mean-distance plot, to represent the distribution of abundances and the differentiation between traits.
8) Two combined 2D charts, to represent the results of PCA decomposition applied directly to a non-square matrix of abundances. One chart is for the samples in the survey and the second adjacent chart is for bacterial species in the rows of the submitted matrix.
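The data behind the two distribution charts of item 5 can be computed in a few lines. This sketch assumes that the rarefaction curve is the expected richness under subsampling without replacement, computed with the classical analytic formula; d3b may instead draw random subsamples, and the function names are hypothetical. Binomial coefficients are handled through `lgamma` so that the code also runs on Python 3.6.

```python
from math import exp, lgamma


def rank_abundance(counts):
    """Relative abundances of the observed OTUs in decreasing order (Whittaker plot)."""
    total = sum(counts)
    return sorted((c / total for c in counts if c > 0), reverse=True)


def _log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k), for 0 <= k <= n."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def rarefaction_curve(counts, depths):
    """Expected number of OTUs in a random subsample of n reads, without replacement.

    E[S_n] = sum_i [1 - C(N - N_i, n) / C(N, n)], where N is the total number of
    reads and N_i the number of reads of OTU i.
    """
    total = sum(counts)
    curve = []
    for n in depths:
        if n > total:
            break  # a sample cannot be subsampled beyond its own size
        expected = 0.0
        for c in counts:
            if c <= 0:
                continue
            if total - c < n:
                expected += 1.0  # every subsample of size n must contain this OTU
            else:
                expected += 1.0 - exp(_log_comb(total - c, n) - _log_comb(total, n))
        curve.append(expected)
    return curve


counts = [2, 1, 1]  # hypothetical sample: three OTUs, four reads
ranks = rank_abundance(counts)
curve = rarefaction_curve(counts, depths=[1, 2, 3, 4])
```

For this sample the expected richness rises from 1 OTU at one read to all 3 OTUs at the full depth of four reads; a curve that is still rising steeply at the full depth suggests insufficient coverage.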