Package | Description |
---|---|
biolockj |
The root biolockj package contains core classes used by all BioLockJ pipelines.
|
biolockj.module | |
biolockj.module.classifier |
This package contains Classifier-specific
BioModules that build bash scripts to generate
taxonomy reports from the sequence files. |
biolockj.module.classifier.r16s |
This package contains RDP and QIIME classifier modules that assign taxonomy to 16S sequences.
Output file formats are classifier-specific, so each classifier requires a unique post-requisite ParserModule. |
biolockj.module.classifier.wgs |
This package contains the Kraken, Kraken2, HUMAnN2, and MetaPhlAn2 classifier modules that assign taxonomy to WGS data.
As with all BioLockJ classifiers, these modules generate raw classifier output files from the sequence data. Include the corresponding ParserModule in the Config file to
generate standardized OTU abundance tables. |
biolockj.module.implicit |
The modules in this package are implicitly added to pipelines as needed.
These modules cannot be added directly to a pipeline unless this restriction is overridden via pipeline.disableImplicitModules=Y. |
biolockj.module.implicit.parser |
This package contains Parser BioModules in the r16s and WGS sub-packages that are paired with a
ClassifierModule via
BioModule.getPostRequisiteModules() to run immediately after the
classifier. |
biolockj.module.implicit.parser.r16s |
This package contains Parser BioModules that convert the 16S taxonomy reports generated by
16S classifiers (such as RDP and QIIME) into standardized OTU abundance tables. |
biolockj.module.implicit.parser.wgs |
This package contains Parser BioModules that convert the WGS taxonomy reports generated by
WGS classifiers (such as Kraken, Kraken2, MetaPhlAn2, or HUMAnN2) into standardized OTU abundance tables. |
biolockj.module.implicit.qiime |
This package contains BioModules that are implicitly added to QIIME pipelines as needed. |
biolockj.module.report |
This package contains BioModules that normalize the OTU abundance tables output by Parser
modules, merge them with the metadata, and generate various reports and notifications. |
biolockj.module.report.humann2 | |
biolockj.module.report.otu | |
biolockj.module.report.r |
This package contains BioModules that build pipeline reports from the standard OTU abundance
tables by generating R scripts that produce summary statistics and data visualizations, output as PDF files. |
biolockj.module.report.taxa | |
biolockj.module.seq |
BioModules used to prepare sequence files or update the metadata prior to classification. |
biolockj.util |
Static utilities centralize and organize reusable core methods.
|
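The parser packages above are paired with a ClassifierModule through BioModule.getPostRequisiteModules() so that the parser runs immediately after its classifier. The following is a minimal sketch of that ordering rule only; the Map-based registry and the PostRequisiteExpander class are hypothetical stand-ins, not BioLockJ's actual API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the Map-based registry stands in for
// BioModule.getPostRequisiteModules(); it is not BioLockJ's actual API.
class PostRequisiteExpander {

    // Returns the run order with each module's post-requisites placed directly after it.
    static List<String> expand(List<String> configured, Map<String, List<String>> postRequisites) {
        List<String> runOrder = new ArrayList<>();
        for (String module : configured) {
            if (!runOrder.contains(module)) {
                runOrder.add(module);
            }
            // e.g. a classifier's parser is queued immediately after the classifier.
            for (String post : postRequisites.getOrDefault(module, List.of())) {
                if (!runOrder.contains(post)) {
                    runOrder.add(post);
                }
            }
        }
        return runOrder;
    }

    public static void main(String[] args) {
        List<String> configured = List.of("RdpClassifier", "NormalizeTaxaTables");
        Map<String, List<String>> posts = Map.of("RdpClassifier", List.of("RdpParser"));
        System.out.println(expand(configured, posts));
    }
}
```

With this input, RdpParser is slotted between RdpClassifier and NormalizeTaxaTables even though the user never listed it, which mirrors the "implicitly added" behavior described for the parser packages.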
Modifier and Type | Method and Description |
---|---|
protected static boolean |
Pipeline.poll(ScriptModule module)
The getScriptDir() directory contains one main script and one or more worker scripts.
An empty indicator file, named by appending a status suffix to the script name, is created when a script begins execution; another is created if the script succeeds, and another if it fails.
Script status is polled each minute by counting these indicator files. The log reports the number of started, failed, and successful scripts whenever a count changes, and repeats the previous message every 10 minutes if no status change is detected. |
static void |
Processor.submit(ScriptModule module)
This method is called by script-generating
ScriptModules to update the script
file permissions so the scripts are executable by the program. |
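Pipeline.poll() determines script status by counting indicator files in the script directory. A minimal sketch of that technique follows. The suffix constants are hypothetical placeholders (the actual suffixes are not given here), and the 10-minute repeat message is omitted for brevity.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Minimal polling sketch. The three suffix values are hypothetical placeholders for
// the indicator-file suffixes BioLockJ actually appends to script names.
class ScriptStatusPoller {
    static final String STARTED = "_Started";  // hypothetical
    static final String SUCCESS = "_Success";  // hypothetical
    static final String FAILURE = "_Failures"; // hypothetical

    // Counts files in scriptDir whose names end with the given suffix.
    static long count(Path scriptDir, String suffix) throws IOException {
        try (Stream<Path> files = Files.list(scriptDir)) {
            return files.filter(p -> p.getFileName().toString().endsWith(suffix)).count();
        }
    }

    // Scripts are finished once each has either a success or a failure indicator.
    static boolean isComplete(Path scriptDir, int expectedScripts) throws IOException {
        return count(scriptDir, SUCCESS) + count(scriptDir, FAILURE) >= expectedScripts;
    }

    // Polls until all scripts finish, logging only when the counts change.
    // Returns true if no script failed.
    static boolean poll(Path scriptDir, int expectedScripts, long intervalMillis)
            throws IOException, InterruptedException {
        String lastStatus = "";
        while (!isComplete(scriptDir, expectedScripts)) {
            String status = String.format("started=%d, success=%d, failed=%d",
                    count(scriptDir, STARTED), count(scriptDir, SUCCESS), count(scriptDir, FAILURE));
            if (!status.equals(lastStatus)) {
                System.out.println(status);
                lastStatus = status;
            }
            Thread.sleep(intervalMillis);
        }
        return count(scriptDir, FAILURE) == 0;
    }
}
```

To match the once-per-minute polling described above, a caller would pass an interval of 60_000 milliseconds. Because the indicator files are empty, their names alone carry the status, and no file content needs to be read.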
Modifier and Type | Interface and Description |
---|---|
interface |
JavaModule
Classes that implement this interface are pure Java modules.
|
interface |
SeqModule
Classes that implement this interface require sequence files as input.
|
Modifier and Type | Class and Description |
---|---|
class |
JavaModuleImpl
Superclass for Java BioModules that will be called in separate instances of the application.
|
class |
ScriptModuleImpl
Superclass for Java BioModules that will be called in separate instances of the application.
|
class |
SeqModuleImpl
Superclass for SeqModules that take sequence files as input for pre-processing prior to classification.
|
Modifier and Type | Interface and Description |
---|---|
interface |
ClassifierModule
Classifier BioModules build one or more bash scripts to call the application on sequence
files. |
Modifier and Type | Class and Description |
---|---|
class |
ClassifierModuleImpl
This is the superclass for all WGS and 16S biolockj.module.classifier BioModules.
|
Modifier and Type | Class and Description |
---|---|
class |
QiimeClosedRefClassifier
This BioModule executes the QIIME script pick_closed_reference_otus.py on FastA sequence files.
|
class |
QiimeDeNovoClassifier
This module runs the QIIME pick_de_novo_otus.py script on FastA sequence files in a single script so it is important
to allocate sufficient job resources if running in a clustered environment.
|
class |
QiimeOpenRefClassifier
This module runs the QIIME pick_open_reference_otus.py script on FastA sequence files in a single script so it is
important to allocate sufficient job resources if running in a clustered environment.
|
class |
RdpClassifier
This BioModule uses RDP to assign taxonomy to 16S sequences.
|
Modifier and Type | Class and Description |
---|---|
class |
Humann2Classifier
This BioModule runs the bioBakery humann2 program (HUMAnN2, the HMP Unified Metabolic Analysis Network).
HUMAnN is a pipeline for efficiently and accurately profiling the presence/absence and abundance of microbial pathways in a community from metagenomic or metatranscriptomic sequencing data (typically millions of short DNA/RNA reads). |
class |
Kraken2Classifier
This BioModule assigns taxonomy to WGS sequences and translates the results into mpa-format.
|
class |
KrakenClassifier
This BioModule assigns taxonomy to WGS sequences and translates the results into mpa-format.
|
class |
Metaphlan2Classifier
This BioModule builds the bash scripts used to execute metaphlan2.py to classify WGS sequences with MetaPhlAn2.
|
Modifier and Type | Class and Description |
---|---|
class |
Demultiplexer
This BioModule splits multiplexed data into a separate file or pair of files (for paired reads) for each sample.
|
class |
RegisterNumReads
This BioModule parses sequence files to count the number of reads per sample.
|
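RegisterNumReads counts reads per sample. A minimal sketch of the usual counting technique follows; it is not the module's actual implementation and assumes uncompressed, well-formed input where each FASTA record has one header line starting with '>' and each FASTQ record spans exactly four lines.

```java
import java.io.BufferedReader;
import java.io.IOException;

// Sketch of per-sample read counting under the assumptions stated above.
class ReadCounter {

    // FASTA: each read begins with a single '>' header line (sequence may wrap).
    static long countFasta(BufferedReader reader) throws IOException {
        long reads = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.startsWith(">")) {
                reads++;
            }
        }
        return reads;
    }

    // FASTQ: header, sequence, '+' separator, quality = 4 lines per read.
    static long countFastq(BufferedReader reader) throws IOException {
        long lines = 0;
        while (reader.readLine() != null) {
            lines++;
        }
        return lines / 4;
    }
}
```

Counting '>' headers rather than lines keeps the FASTA count correct when sequences wrap across several lines. The FASTQ shortcut avoids inspecting headers, which matters because a quality line may legitimately begin with '@'.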
Modifier and Type | Interface and Description |
---|---|
interface |
ParserModule
This interface defines the required methods to parse ClassifierModule output.
|
Modifier and Type | Class and Description |
---|---|
class |
ParserModuleImpl
Parser BioModules read ClassifierModule output to build
standardized OTU count tables. |
Modifier and Type | Class and Description |
---|---|
class |
QiimeParser
This BioModule parses QiimeClassifier output reports to build standard OTU abundance tables.
|
class |
RdpParser
This BioModule parses RDP output files to build standard OTU abundance tables.
|
Modifier and Type | Class and Description |
---|---|
class |
Humann2Parser
This BioModule parses Humann2Classifier output reports to build standard OTU abundance tables.
Sample IDs are taken from the column headers, starting with the 2nd column. The count type depends on the HUMAnN2 config properties. |
class |
Kraken2Parser
This BioModule parses Kraken2Classifier output reports to build standard OTU abundance tables.
|
class |
KrakenParser
This BioModule parses KrakenClassifier output reports to build standard OTU abundance tables.
|
class |
Metaphlan2Parser
This BioModule parses Metaphlan2Classifier output reports to build standard OTU abundance tables.
|
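Humann2Parser reads sample IDs from the column headers, starting with the 2nd column. A minimal sketch of that header parsing follows, assuming a tab-delimited header whose first column is the pathway or gene-family label. The optional suffix stripping and the "_Abundance" example suffix are hypothetical illustrations, not confirmed HUMAnN2 or BioLockJ behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: extract sample IDs from a tab-delimited HUMAnN2-style table header.
class Humann2Header {

    // Column 1 is the row label; every later column header names a sample.
    // If columnSuffix is non-empty (hypothetical, e.g. "_Abundance"), it is stripped.
    static List<String> sampleIds(String headerLine, String columnSuffix) {
        String[] cols = headerLine.split("\t", -1);
        List<String> ids = new ArrayList<>();
        for (int i = 1; i < cols.length; i++) {
            String id = cols[i].trim();
            if (!columnSuffix.isEmpty() && id.endsWith(columnSuffix)) {
                id = id.substring(0, id.length() - columnSuffix.length());
            }
            ids.add(id);
        }
        return ids;
    }
}
```

Passing -1 to split() preserves trailing empty columns, so a malformed header surfaces as an empty sample ID rather than being silently shortened.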
Modifier and Type | Class and Description |
---|---|
class |
BuildQiimeMapping
This BioModule converts the metadata file (if provided) into a tab-delimited QIIME mapping file.
The QIIME mapping file is validated by calling the QIIME script validate_mapping_file.py. |
class |
MergeQiimeOtuTables
This BioModule will run immediately after QiimeClosedRefClassifier if multiple otu_table.biom files were created.
|
class |
QiimeClassifier
This BioModule generates the bash script used to create QIIME summary scripts, taxonomy-level reports, and add alpha
diversity metrics (if configured) to the metadata file.
For a complete list of available metrics, see: http://scikit-bio.org/docs/latest/generated/skbio.diversity.alpha.html |
Modifier and Type | Class and Description |
---|---|
class |
JsonReport
This BioModule is used to build a JSON file (summary.json) compiled from all OTUs in the dataset.
|
Modifier and Type | Class and Description |
---|---|
class |
AddMetadataToPathwayTables
This BioModule is used to add metadata columns to the HUMAnN2 pathway abundance, pathway coverage, and gene family
tables.
|
class |
Humann2CountModule
This abstract superclass is extended by all other modules in this package.
Shared method implementations are defined to ensure uniform adoption of dependencies and prerequisites. |
class |
RemoveLowPathwayCounts
This BioModule sets pathway counts below a configured threshold to zero.
These low sample counts are assumed to be miscategorized or genomic contamination. |
class |
RemoveScarcePathwayCounts
This BioModule removes scarce pathways not found in enough samples.
Each pathway must be found in a configurable percentage of samples to be retained. |
Modifier and Type | Class and Description |
---|---|
class |
CompileOtuCounts
This BioModule compiles the counts from all OTU count files into a single summary OTU count file containing OTU
counts for the entire dataset.
|
class |
OtuCountModule
OtuCount modules read OTU count assignment tables (one file per sample) with two columns.
Column 1 holds the full OTU pathway, from the top to the bottom taxonomy level; column 2 holds the count (number of reads) for the sample. |
class |
RarefyOtuCounts
This BioModule applies a mean iterative post-OTU-classification rarefaction algorithm so that each output sample
will have approximately the same number of OTUs.
|
class |
RemoveLowOtuCounts
This BioModule sets OTU counts below a configured threshold to zero.
These low sample counts are assumed to be miscategorized or contaminants. |
class |
RemoveScarceOtuCounts
This BioModule removes scarce OTUs not found in enough samples.
The OTU must be found in a configurable percentage of samples. |
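The OTU count modules above share a simple two-column file format. The following sketch shows how those steps could fit together: parsing a sample file, compiling dataset-wide counts, zeroing low counts, and dropping scarce OTUs. It assumes tab-delimited, well-formed lines. The thresholds are plain parameters rather than BioLockJ property names, and the helper class is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper illustrating the OTU count steps; not BioLockJ's implementation.
class OtuCounts {

    // Parses one sample file: column 1 = OTU path, column 2 = read count.
    static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) continue;
            String[] cols = line.split("\t");
            counts.merge(cols[0], Long.parseLong(cols[1].trim()), Long::sum);
        }
        return counts;
    }

    // CompileOtuCounts-style summary: total each OTU's count across all samples.
    static Map<String, Long> compile(List<Map<String, Long>> samples) {
        Map<String, Long> total = new LinkedHashMap<>();
        for (Map<String, Long> sample : samples) {
            sample.forEach((otu, n) -> total.merge(otu, n, Long::sum));
        }
        return total;
    }

    // RemoveLowOtuCounts-style step: dropping an entry is equivalent to a zero count.
    static Map<String, Long> removeLow(Map<String, Long> sample, long minCount) {
        Map<String, Long> kept = new LinkedHashMap<>();
        sample.forEach((otu, n) -> { if (n >= minCount) kept.put(otu, n); });
        return kept;
    }

    // RemoveScarceOtuCounts-style step: keep OTUs present in >= minFraction of samples.
    static List<Map<String, Long>> removeScarce(List<Map<String, Long>> samples, double minFraction) {
        Map<String, Integer> presence = new LinkedHashMap<>();
        for (Map<String, Long> sample : samples) {
            sample.forEach((otu, n) -> { if (n > 0) presence.merge(otu, 1, Integer::sum); });
        }
        List<Map<String, Long>> result = new ArrayList<>();
        for (Map<String, Long> sample : samples) {
            Map<String, Long> kept = new LinkedHashMap<>();
            sample.forEach((otu, n) -> {
                if (presence.getOrDefault(otu, 0) >= minFraction * samples.size()) kept.put(otu, n);
            });
            result.add(kept);
        }
        return result;
    }
}
```

Low-count removal works per sample, while scarce-OTU removal needs a first pass over every sample to count presence before any sample can be filtered.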
Modifier and Type | Class and Description |
---|---|
class |
R_CalculateStats
This BioModule is used to build the R script used to generate taxonomy statistics and plots.
|
class |
R_Module
This BioModule is the superclass for R script generating modules.
|
class |
R_PlotEffectSize
This BioModule is used to run the R script used to generate OTU-metadata fold-change bar plots for each binary report
field.
|
class |
R_PlotMds
This BioModule is used to build the R script used to generate MDS plots for each report field and each taxonomy level
configured.
|
class |
R_PlotOtus
This BioModule is used to build the R script used to generate OTU-metadata box-plots and scatter-plots for each
report field and taxonomy level.
|
class |
R_PlotPvalHistograms
This BioModule is used to build the R script used to generate p-value histograms for each report field and each
taxonomy level configured.
|
Modifier and Type | Class and Description |
---|---|
class |
AddMetadataToTaxaTables
This BioModule is used to add metadata columns to the OTU abundance tables.
|
class |
BuildTaxaTables
Many R BioModules expect separate tables containing log-normalized taxa counts for each taxonomy level.
|
class |
LogTransformTaxaTables
This utility is used to log-transform the raw OTU counts on Log10 or Log-e scales.
|
class |
NormalizeTaxaTables
This utility normalizes and/or log-transforms the raw OTU counts using the formulas:
Normalized OTU count formula = (RC/n)*((SUM(x))/N)+1
Relative abundance formula = Log(log_base) [ (RC/n)*((SUM(x))/N)+1 ]
The code implementation supports (log_base = e) and (log_base = 10), configured via the
Constants.REPORT_LOG_BASE property. |
class |
TaxaCountModule
TBD
|
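The NormalizeTaxaTables formulas can be sketched as follows. The symbol meanings are assumptions, because the table does not define them: RC is the raw OTU count in a sample, n is the total count in that sample, SUM(x) is the total of all counts in the table, and N is the number of samples. Under that reading, each relative abundance is scaled to the mean sample depth, and the +1 keeps the log transform defined for zero counts.

```java
// Sketch of the normalization formulas above, under the assumed symbol meanings:
// RC = raw OTU count in a sample, n = that sample's total count,
// SUM(x) = total of all counts in the table, N = number of samples.
class TaxaNormalizer {

    // Normalized count = (RC / n) * (SUM(x) / N) + 1, for a samples x OTUs matrix.
    static double[][] normalize(long[][] counts) {
        int samples = counts.length;
        long tableTotal = 0;
        long[] sampleTotals = new long[samples];
        for (int s = 0; s < samples; s++) {
            for (long c : counts[s]) {
                sampleTotals[s] += c;
            }
            tableTotal += sampleTotals[s];
        }
        double meanDepth = (double) tableTotal / samples; // SUM(x) / N
        double[][] out = new double[samples][];
        for (int s = 0; s < samples; s++) {
            out[s] = new double[counts[s].length];
            for (int o = 0; o < counts[s].length; o++) {
                // An empty sample has no defined relative abundance; treat it as 0.
                double rel = sampleTotals[s] == 0 ? 0.0 : (double) counts[s][o] / sampleTotals[s];
                out[s][o] = rel * meanDepth + 1.0;
            }
        }
        return out;
    }

    // Log transform with an arbitrary base (the documented options are e and 10).
    static double log(double value, double base) {
        return Math.log(value) / Math.log(base);
    }
}
```

For counts {{2, 2}, {6, 2}}, the sample totals are 4 and 8 and the mean depth is 12 / 2 = 6, so the first OTU normalizes to 0.5 * 6 + 1 = 4.0 in sample 1 and 0.75 * 6 + 1 = 5.5 in sample 2.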
Modifier and Type | Class and Description |
---|---|
class |
AwkFastaConverter
This BioModule uses awk and gzip to convert input sequence files into a decompressed fasta file format.
|
class |
Gunzipper
This BioModule uses gzip to decompress input sequence files.
|
class |
KneadData
This BioModule runs the bioBakery kneaddata program to remove contaminating DNA.
Multiple contaminant DNA databases can be used to filter reads simultaneously. Common contaminants include human, viral, and plasmid DNA. |
class |
Multiplexer
This BioModule merges sequence files into a single combined sequence file, storing either the sample ID or an
identifying barcode (if defined in the metadata) in each sequence header.
BioLockJ is designed to run on demultiplexed data, so this must be the last module to run in its branch. |
class |
PearMergeReads
This BioModule merges forward and reverse fastq files using PEAR.
For more information, see the online PEAR manual: https://sco.h-its.org/exelixis/web/software/pear/doc.html |
class |
RarefySeqs
This BioModule imposes a minimum and/or maximum number of reads per sample.
|
class |
SeqFileValidator
This BioModule validates fasta/fastq file formats and enforces min/max read lengths.
|
class |
TrimPrimers
This BioModule removes sequence primers from demultiplexed files.
The primers are defined using regular expressions in a separate file. |
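TrimPrimers defines primers as regular expressions in a separate file. The following is a minimal sketch of regex-based trimming, assuming one regex per line and a primer at the start (5' end) of each read. Dropping reads with no matching primer is a simplifying choice for this sketch, not documented module behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of regex-based primer trimming under the assumptions stated above.
class PrimerTrimmer {
    private final List<Pattern> primers = new ArrayList<>();

    // Each non-blank line of the primer file is compiled as one regex.
    PrimerTrimmer(List<String> primerFileLines) {
        for (String line : primerFileLines) {
            String regex = line.trim();
            if (!regex.isEmpty()) {
                primers.add(Pattern.compile(regex));
            }
        }
    }

    // Returns the read with its leading primer removed, or empty if no primer matched.
    Optional<String> trim(String sequence) {
        for (Pattern p : primers) {
            Matcher m = p.matcher(sequence);
            if (m.lookingAt()) { // match must begin at the first base
                return Optional.of(sequence.substring(m.end()));
            }
        }
        return Optional.empty();
    }
}
```

Using Matcher.lookingAt() anchors each primer to the start of the read without rewriting the user's regex, so alternations such as "A|B" in the primer file keep their meaning.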
Modifier and Type | Method and Description |
---|---|
static void |
BashScriptBuilder.buildMainScript(ScriptModule module)
Build the MAIN script.
|
static void |
BashScriptBuilder.buildScripts(ScriptModule module,
List<List<String>> data)
This method builds the bash scripts required for the given module.
Standard local/cluster pipelines include 1 MAIN script and 1+ worker scripts.
Docker R_Modules include 1 MAIN script and 0 worker scripts (MAIN.R is run by MAIN.sh).
Docker non-R_Modules include 1 MAIN script and 1+ worker scripts (MAIN.sh runs the workers).
AWS Docker R_Modules include 0 MAIN scripts and 0 worker scripts (MAIN.R is run by Nextflow).
AWS Docker non-R_Modules include 0 MAIN scripts and 1+ worker scripts (MAIN.sh runs the workers). |
protected static String |
BashScriptBuilder.getMainScriptExecuteWorkerLine(ScriptModule module,
String workerScriptPath,
String workerId)
Call "execute" on the worker script
|
static String |
SummaryUtil.getScriptDirSummary(ScriptModule module)
Return a summary of the
ScriptModule script directory with metrics:
Main script name
Number of worker scripts run
Number of worker scripts successful/failed/incomplete
Average worker script run time
Longest running worker script names/duration
|
protected static String |
BashScriptBuilder.getWorkerScriptPath(ScriptModule module,
String workerId)
Build the file path for the numbered worker script.
|
protected static List<String> |
BashScriptBuilder.initMainScript(ScriptModule module)
Create the ScriptModule main script that calls all worker scripts.
|
protected static List<String> |
BashScriptBuilder.initWorkerScript(ScriptModule module,
String scriptPath)
Create the numbered worker scripts.
|
static void |
LogUtil.syncModuleLogs(ScriptModule module)
Not used currently.
|
protected static void |
BashScriptBuilder.verifyConfig(ScriptModule module)
If property "pipeline.env" = cluster, require property "cluster.batchCommand",
otherwise exit this method.
|
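BashScriptBuilder.buildScripts() turns per-sample data into one MAIN script plus numbered worker scripts for standard local/cluster pipelines. The following sketch shows only that grouping idea. It assumes each inner list of data holds the bash lines for one sample. The samplesPerWorker parameter, the "<module>_<n>.sh" file naming, and the sequential "bash" calls in the MAIN script are hypothetical; they are not BioLockJ's actual conventions, and cluster submission is omitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of MAIN/worker script generation for a local pipeline.
class ScriptPlanSketch {

    // Groups per-sample command lists into worker scripts (one list of lines per worker).
    static List<List<String>> workerScripts(List<List<String>> data, int samplesPerWorker) {
        List<List<String>> workers = new ArrayList<>();
        for (int i = 0; i < data.size(); i += samplesPerWorker) {
            List<String> lines = new ArrayList<>();
            lines.add("#!/bin/bash");
            int end = Math.min(i + samplesPerWorker, data.size());
            for (List<String> sampleLines : data.subList(i, end)) {
                lines.addAll(sampleLines);
            }
            workers.add(lines);
        }
        return workers;
    }

    // MAIN script that runs each numbered worker script in turn (hypothetical naming).
    static List<String> mainScript(String moduleName, int numWorkers) {
        List<String> lines = new ArrayList<>();
        lines.add("#!/bin/bash");
        for (int w = 0; w < numWorkers; w++) {
            lines.add("bash " + moduleName + "_" + w + ".sh");
        }
        return lines;
    }
}
```

Five samples with two samples per worker yield three worker scripts, the last holding a single sample. In a cluster environment, the MAIN script would instead submit each worker through the configured batch command.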