Package | Description |
---|---|
biolockj |
The root biolockj package contains core classes used by all BioLockJ pipelines.
|
biolockj.module | |
biolockj.module.classifier |
This package contains Classifier-specific
BioModules that build bash scripts to generate
taxonomy reports from the sequence files. |
biolockj.module.classifier.r16s |
This package contains RDP and QIIME classifier modules that assign taxonomy to 16S sequences.
Output file formats are classifier specific, so each classifier requires a unique post-requisite ParserModule. |
biolockj.module.classifier.wgs |
This package contains Kraken, Kraken2, Humann2, and MetaPhlAn classifier modules that assign taxonomy to WGS data.
As with all BioLockJ classifiers, these modules generate raw classifier output files from the sequence data. Include the corresponding ParserModule in the Config file to
generate standardized OTU abundance tables. |
biolockj.module.implicit |
The modules in this package are implicitly added to pipelines as needed.
These modules cannot be directly added to any pipeline unless overridden via pipeline.disableImplicitModules=Y. |
biolockj.module.implicit.parser |
This package contains Parser BioModules in the r16s and WGS sub-packages that are paired with a
ClassifierModule via
getPostRequisiteModules() to run immediately after the
classifier. |
biolockj.module.implicit.parser.r16s |
This package contains Parser
BioModules that convert the 16S taxonomy reports generated by
16S classifiers (such as RDP and QIIME) into standardized OTU abundance tables. |
biolockj.module.implicit.parser.wgs |
This package contains Parser
BioModules that convert the WGS taxonomy reports generated by
WGS classifiers (such as Kraken, Kraken2, Metaphlan2, or Humann2) into standardized OTU abundance tables. |
biolockj.module.implicit.qiime |
This package contains
BioModules that are implicitly added to QIIME pipelines as needed. |
biolockj.module.report |
This package contains
BioModules that normalize OTU abundance tables output by Parser
modules, merge them with the metadata, and generate various reports and notifications. |
biolockj.module.report.humann2 | |
biolockj.module.report.otu | |
biolockj.module.report.r |
This package contains
BioModules that build pipeline reports from the standard OTU abundance
tables by generating R scripts to produce the summary statistics and data visualizations output to PDF files. |
biolockj.module.report.taxa | |
biolockj.module.seq |
BioModules used to prepare sequence files or update the metadata prior to classification. |
biolockj.util |
Static utilities centralize and organize reusable core methods.
|
Modifier and Type | Method and Description |
---|---|
static List<BioModule> |
BioModuleFactory.buildPipeline()
Build all modules for the pipeline.
|
static List<BioModule> |
Pipeline.getModules()
Return a list of
BioModules constructed by the BioModuleFactory. |
Modifier and Type | Method and Description |
---|---|
protected static void |
Pipeline.deleteIncompleteModule(BioModule module)
Delete and recreate incomplete module directory.
|
static void |
Pipeline.executeModule(BioModule module)
Execute a single pipeline module.
|
static boolean |
Config.getBoolean(BioModule module,
String property)
Parse property value (Y or N) to return a boolean; if not found, return false.
|
static Double |
Config.getDoubleVal(BioModule module,
String property)
Parse property for numeric (double) value
|
static String |
Config.getExe(BioModule module,
String property)
Get exe.* property name.
|
static String |
Config.getExeParams(BioModule module,
String property)
Call this function to get the parameters configured for this property.
Make sure the last character for non-null results is an empty character for use in bash scripts calling the corresponding executable. |
static File |
Config.getExistingDir(BioModule module,
String property)
Get a valid File directory or return null
|
static File |
Config.getExistingFile(BioModule module,
String property)
Get a valid File or return null.
|
static List<String> |
Config.getList(BioModule module,
String property)
Parse comma delimited property value to return list
|
protected static File |
Pipeline.getMetadata(BioModule bioModule)
If the bioModule is complete and contains a metadata file in its output directory, return the metadata file,
since it must be a new version.
|
static String |
Config.getModuleProp(BioModule module,
String prop)
Return module specific property if configured, otherwise use the given prop.
|
static Integer |
Config.getNonNegativeInteger(BioModule module,
String property)
Parse property as non-negative integer value
|
static Double |
Config.getPositiveDoubleVal(BioModule module,
String property)
Parse property as positive double value
|
static Integer |
Config.getPositiveInteger(BioModule module,
String property)
Parse property as positive integer value
|
protected List<String> |
BioModuleFactory.getPostRequisites(BioModule module)
This method returns all module post-requisites (including post-requisites of the post-requisites).
|
protected List<String> |
BioModuleFactory.getPreRequisites(BioModule module)
This method returns all module prerequisites (including prerequisites of the prerequisites).
|
static Set<String> |
Config.getSet(BioModule module,
String property)
Parse comma-separated property value to build an unordered Set
|
static String |
Config.getString(BioModule module,
String property)
Get property value as String.
|
static Set<String> |
Config.getTreeSet(BioModule module,
String property)
Parse comma-separated property value to build an ordered Set
|
protected static void |
Pipeline.refreshOutputMetadata(BioModule module)
Call
MetaUtil to refresh the metadata cache if a new metadata file was output by the
bioModule . |
protected static void |
Pipeline.refreshRCacheIfNeeded(BioModule module)
Refresh R cache if about to run the 1st R module.
|
static boolean |
Config.requireBoolean(BioModule module,
String property)
Required to return a valid boolean or
|
static Double |
Config.requireDoubleVal(BioModule module,
String property)
Requires valid double value
|
static File |
Config.requireExistingDir(BioModule module,
String property)
Requires valid existing directory.
|
static List<File> |
Config.requireExistingDirs(BioModule module,
String property)
Requires valid list of file directories
|
static File |
Config.requireExistingFile(BioModule module,
String property)
Require valid existing file
|
static Integer |
Config.requireInteger(BioModule module,
String property)
Requires valid integer value
|
static List<String> |
Config.requireList(BioModule module,
String property)
Require valid list property
|
static Double |
Config.requirePositiveDouble(BioModule module,
String property)
Require valid positive double value
|
static Integer |
Config.requirePositiveInteger(BioModule module,
String property)
Require valid positive integer value
|
static Set<String> |
Config.requireSet(BioModule module,
String property)
Require valid Set value
|
static String |
Config.requireString(BioModule module,
String property)
Require valid String value
|
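
The Config accessors listed above fall into two families: `get*` methods that parse a value and fall back to a default (or null) when the property is absent, and `require*` methods that insist on a valid value. The following is a minimal sketch of that split, not the BioLockJ implementation. The class `ConfigSketch`, its `Map`-backed storage, the omission of the `BioModule` parameter, the case-insensitive `Y` check, and the choice of `IllegalArgumentException` are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical stand-in for the Config accessors; not the BioLockJ implementation. */
final class ConfigSketch {
    private final Map<String, String> props;

    ConfigSketch(Map<String, String> props) {
        this.props = props;
    }

    /** "Y" yields true; any other value or a missing property yields false. */
    boolean getBoolean(String property) {
        String val = props.get(property);
        // Case-insensitive matching is an assumption of this sketch.
        return val != null && val.trim().equalsIgnoreCase("Y");
    }

    /** Split a comma-delimited value into trimmed, non-empty tokens (empty list if missing). */
    List<String> getList(String property) {
        List<String> tokens = new ArrayList<>();
        String val = props.get(property);
        if (val == null) {
            return tokens;
        }
        for (String token : val.split(",")) {
            String trimmed = token.trim();
            if (!trimmed.isEmpty()) {
                tokens.add(trimmed);
            }
        }
        return tokens;
    }

    /** Unordered set of the comma-separated tokens. */
    Set<String> getSet(String property) {
        return new HashSet<>(getList(property));
    }

    /** Sorted set of the comma-separated tokens. */
    Set<String> getTreeSet(String property) {
        return new TreeSet<>(getList(property));
    }

    /** Fail loudly when the property is missing, non-numeric, or not positive. */
    int requirePositiveInteger(String property) {
        String val = props.get(property);
        if (val == null) {
            throw new IllegalArgumentException("Missing required property: " + property);
        }
        int parsed = Integer.parseInt(val.trim()); // NumberFormatException extends IllegalArgumentException
        if (parsed <= 0) {
            throw new IllegalArgumentException(property + " must be a positive integer: " + val);
        }
        return parsed;
    }

    public static void main(String[] args) {
        // Property names here are made up for the demonstration.
        ConfigSketch cfg = new ConfigSketch(Map.of("demo.flag", "Y", "demo.list", "b, a ,b"));
        System.out.println(cfg.getBoolean("demo.flag"));
        System.out.println(cfg.getList("demo.list"));
        System.out.println(cfg.getTreeSet("demo.list"));
    }
}
```

Keeping the lenient and strict variants side by side lets a module decide per property whether a missing value is a configuration error or simply "not configured".
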
Modifier and Type | Interface and Description |
---|---|
interface |
DatabaseModule
Interface for BioModules that use a reference database. DockerUtil uses it to find the correct database
directory to map to the container /db volume.
|
interface |
JavaModule
Classes that implement this interface are pure Java modules.
|
interface |
ScriptModule
Classes that implement this interface are
|
interface |
SeqModule
Classes that implement this interface require sequence files for input.
|
Modifier and Type | Class and Description |
---|---|
class |
BioModuleImpl
Superclass for standard BioModules (classifiers, parsers, etc).
|
class |
JavaModuleImpl
Superclass for Java BioModules that will be called in separate instances of the application.
|
class |
ScriptModuleImpl
Superclass for Java BioModules that will be called in separate instances of the application.
|
class |
SeqModuleImpl
Superclass for SeqModules that take sequence files as input for pre-processing prior to classification.
|
Modifier and Type | Method and Description |
---|---|
int |
BioModuleImpl.compareTo(BioModule module) |
boolean |
BioModuleImpl.isValidInputModule(BioModule module)
In the early stages of the pipeline, starting with the very 1st module,
ImportMetadata, most modules expect sequence files as input. |
boolean |
SeqModuleImpl.isValidInputModule(BioModule module) |
boolean |
JavaModuleImpl.isValidInputModule(BioModule module)
If module is a
SeqModule, input must contain sequence data. |
boolean |
BioModule.isValidInputModule(BioModule previousModule)
The BioModule
getInputFiles() method typically, but not always, returns the previousModule output files. |
Modifier and Type | Interface and Description |
---|---|
interface |
ClassifierModule
Classifier
BioModules build one or more bash scripts to call the application on sequence
files. |
Modifier and Type | Class and Description |
---|---|
class |
ClassifierModuleImpl
This is the superclass for all WGS and 16S biolockj.module.classifier BioModules.
|
Modifier and Type | Class and Description |
---|---|
class |
QiimeClosedRefClassifier
This BioModule executes the QIIME script pick_closed_reference_otus.py on FastA sequence files.
|
class |
QiimeDeNovoClassifier
This module runs the QIIME pick_de_novo_otus.py script on FastA sequence files in a single script so it is important
to allocate sufficient job resources if running in a clustered environment.
|
class |
QiimeOpenRefClassifier
This module runs the QIIME pick_open_reference_otus.py script on FastA sequence files in a single script so it is
important to allocate sufficient job resources if running in a clustered environment.
|
class |
RdpClassifier
This BioModule uses RDP to assign taxonomy to 16S sequences.
|
Modifier and Type | Class and Description |
---|---|
class |
Humann2Classifier
This BioModule runs the biobakery humann2 program to generate the HMP Unified Metabolic Analysis Network.
HUMAnN is a pipeline for efficiently and accurately profiling the presence/absence and abundance of microbial pathways in a community from metagenomic or metatranscriptomic sequencing data (typically millions of short DNA/RNA reads). |
class |
Kraken2Classifier
This BioModule assigns taxonomy to WGS sequences and translates the results into mpa-format.
|
class |
KrakenClassifier
This BioModule assigns taxonomy to WGS sequences and translates the results into mpa-format.
|
class |
Metaphlan2Classifier
This BioModule builds the bash scripts used to execute metaphlan2.py to classify WGS sequences with MetaPhlAn2.
|
Modifier and Type | Class and Description |
---|---|
class |
Demultiplexer
This BioModule splits multiplexed data into a separate file or pair of files (for paired reads) for each sample.
|
class |
ImportMetadata
This BioModule validates the contents/format of the project metadata file and the related Config properties.
|
class |
RegisterNumReads
This BioModule parses sequence files to count the number of reads per sample.
|
Modifier and Type | Interface and Description |
---|---|
interface |
ParserModule
This interface defines the required methods to parse ClassifierModule output.
|
Modifier and Type | Class and Description |
---|---|
class |
ParserModuleImpl
Parser
BioModules read ClassifierModule output to build
standardized OTU count tables. |
Modifier and Type | Class and Description |
---|---|
class |
QiimeParser
This BioModule parses QiimeClassifier output reports to build standard OTU abundance tables.
|
class |
RdpParser
This BioModule parses RDP output files to build standard OTU abundance tables.
|
Modifier and Type | Class and Description |
---|---|
class |
Humann2Parser
This BioModule parses Humann2Classifier output reports to build standard OTU abundance tables.
Sample IDs are found in the column headers starting with the 2nd column. The count type depends on the HumanN2 config properties. |
class |
Kraken2Parser
This BioModule parses KrakenClassifier output reports to build standard OTU abundance tables.
|
class |
KrakenParser
This BioModule parses KrakenClassifier output reports to build standard OTU abundance tables.
|
class |
Metaphlan2Parser
This BioModule parses Metaphlan2Classifier output reports to build standard OTU abundance tables.
|
Modifier and Type | Method and Description |
---|---|
boolean |
Humann2Parser.isValidInputModule(BioModule module) |
Modifier and Type | Class and Description |
---|---|
class |
BuildQiimeMapping
This BioModule converts the metadata file into a tab delimited QIIME mapping file (if provided).
The QIIME mapping file is validated by calling QIIME script validate_mapping_file.py |
class |
MergeQiimeOtuTables
This BioModule will run immediately after QiimeClosedRefClassifier if multiple otu_table.biom files were created.
|
class |
QiimeClassifier
This BioModule generates the bash script used to create QIIME summary scripts, taxonomy-level reports, and add alpha
diversity metrics (if configured) to the metadata file.
For a complete list of available metrics, see: http://scikit-bio.org/docs/latest/generated/skbio.diversity.alpha.html |
Modifier and Type | Method and Description |
---|---|
boolean |
QiimeClassifier.isValidInputModule(BioModule module)
If superclass is fed by another QiimeClassifier, it must be a subclass with biom output.
|
Modifier and Type | Class and Description |
---|---|
class |
Email
This BioModule is used to email the user the pipeline execution status and summary.
|
class |
JsonReport
This BioModule is used to build a JSON file (summary.json) compiled from all OTUs in the dataset.
|
Modifier and Type | Method and Description |
---|---|
boolean |
JsonReport.isValidInputModule(BioModule module) |
Modifier and Type | Class and Description |
---|---|
class |
AddMetadataToPathwayTables
This BioModule is used to add metadata columns to the HumanN2 pathway abundance, pathway coverage, and gene family
tables.
|
class |
Humann2CountModule
This abstract superclass is extended by all other modules in this package.
Shared method implementations are defined to ensure uniform adoption of dependencies and prerequisites. |
class |
RemoveLowPathwayCounts
This BioModule sets low pathway counts below a configured threshold to zero.
These low sample counts are assumed to be miscategorized or genomic contamination. |
class |
RemoveScarcePathwayCounts
This BioModule removes scarce pathways not found in enough samples.
Each pathway must be found in a configurable percentage of samples to be retained. |
Modifier and Type | Method and Description |
---|---|
protected boolean |
Humann2CountModule.isHumann2CountModule(BioModule module)
Check the module to determine if it generated OTU count files.
|
boolean |
Humann2CountModule.isValidInputModule(BioModule module) |
Modifier and Type | Class and Description |
---|---|
class |
CompileOtuCounts
This BioModule compiles the counts from all OTU count files into a single summary OTU count file containing OTU
counts for the entire dataset.
|
class |
OtuCountModule
OtuCount modules read OTU count assignment tables (1 file/sample) with 2 columns.
Col1: Full OTU pathway spanning top to bottom level; Col2: Count (# of reads) for the sample. |
class |
RarefyOtuCounts
This BioModule applies a mean iterative post-OTU classification rarefaction algorithm so that each output sample
will have approximately the same number of OTUs.
|
class |
RemoveLowOtuCounts
This BioModule sets low OTU counts below a configured threshold to zero.
These low sample counts are assumed to be miscategorized or contaminants. |
class |
RemoveScarceOtuCounts
This BioModule removes scarce OTUs not found in enough samples.
The OTU must be found in a configurable percentage of samples. |
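
The two filtering steps described above (RemoveLowOtuCounts and RemoveScarceOtuCounts) can be sketched as plain map transformations. This is an illustration only, not the BioLockJ code: the class `OtuFilterSketch`, the nested-map table layout (sample ID to OTU to count), the parameter names, and the use of a 0 to 1 fraction instead of a percentage are assumptions.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of the low-count and scarce-OTU filters; not the BioLockJ implementation. */
final class OtuFilterSketch {

    /** Return a copy of the table in which every count below minCount is set to zero. */
    static Map<String, Map<String, Integer>> zeroLowCounts(
            Map<String, Map<String, Integer>> table, int minCount) {
        Map<String, Map<String, Integer>> out = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, Integer>> sample : table.entrySet()) {
            Map<String, Integer> row = new LinkedHashMap<>();
            for (Map.Entry<String, Integer> otu : sample.getValue().entrySet()) {
                row.put(otu.getKey(), otu.getValue() < minCount ? 0 : otu.getValue());
            }
            out.put(sample.getKey(), row);
        }
        return out;
    }

    /** Return a copy keeping only OTUs with a nonzero count in at least minFraction of samples. */
    static Map<String, Map<String, Integer>> removeScarce(
            Map<String, Map<String, Integer>> table, double minFraction) {
        // Count the number of samples in which each OTU is present.
        Map<String, Integer> samplesWithOtu = new HashMap<>();
        for (Map<String, Integer> row : table.values()) {
            for (Map.Entry<String, Integer> otu : row.entrySet()) {
                if (otu.getValue() > 0) {
                    samplesWithOtu.merge(otu.getKey(), 1, Integer::sum);
                }
            }
        }
        double required = minFraction * table.size();
        Map<String, Map<String, Integer>> out = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, Integer>> sample : table.entrySet()) {
            Map<String, Integer> row = new LinkedHashMap<>();
            for (Map.Entry<String, Integer> otu : sample.getValue().entrySet()) {
                if (samplesWithOtu.getOrDefault(otu.getKey(), 0) >= required) {
                    row.put(otu.getKey(), otu.getValue());
                }
            }
            out.put(sample.getKey(), row);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> table = new LinkedHashMap<>();
        table.put("s1", Map.of("A", 5, "B", 1));
        table.put("s2", Map.of("A", 7, "B", 0));
        System.out.println(zeroLowCounts(table, 2));
        System.out.println(removeScarce(table, 0.75));
    }
}
```

Applying the low-count filter first changes which OTUs count as "present" for the scarcity filter, so the order of the two modules in a pipeline matters.
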
Modifier and Type | Method and Description |
---|---|
protected boolean |
OtuCountModule.isOtuModule(BioModule module)
Check the module to determine if it generated OTU count files.
|
boolean |
OtuCountModule.isValidInputModule(BioModule module) |
Modifier and Type | Class and Description |
---|---|
class |
R_CalculateStats
This BioModule is used to build the R script used to generate taxonomy statistics and plots.
|
class |
R_Module
This BioModule is the superclass for R script generating modules.
|
class |
R_PlotEffectSize
This BioModule is used to run the R script used to generate OTU-metadata fold-change-barplots for each binary report
field.
|
class |
R_PlotMds
This BioModule is used to build the R script used to generate MDS plots for each report field and each taxonomy level
configured.
|
class |
R_PlotOtus
This BioModule is used to build the R script used to generate OTU-metadata box-plots and scatter-plots for each
report field and taxonomy level.
|
class |
R_PlotPvalHistograms
This BioModule is used to build the R script used to generate p-value histograms for each report field and each
taxonomy level configured.
|
Modifier and Type | Method and Description |
---|---|
static File |
R_CalculateStats.getStatsFile(BioModule module,
String level,
Boolean isParametric,
Boolean isAdjusted)
Get the stats file for the given fileType and taxonomy level.
|
Modifier and Type | Class and Description |
---|---|
class |
AddMetadataToTaxaTables
This BioModule is used to add metadata columns to the OTU abundance tables.
|
class |
BuildTaxaTables
Many R BioModules expect separate tables containing log-normalized taxa counts for each taxonomy level.
|
class |
LogTransformTaxaTables
This utility is used to log-transform the raw OTU counts on Log10 or Log-e scales.
|
class |
NormalizeTaxaTables
This utility is used to normalize and/or log-transform the raw OTU counts using the formulas:
Normalized OTU count formula = (RC/n)*((SUM(x))/N)+1
Relative abundance formula = Log(log_base) [ (RC/n)*((SUM(x))/N)+1 ]
The code implementation supports (log_base = e) and (log_base = 10) which is configured via
Constants.REPORT_LOG_BASE property. |
class |
TaxaCountModule
TBD
|
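
The NormalizeTaxaTables formulas above can be worked through in a few lines. The sketch below assumes one reading of the symbols, which the table does not define: RC is the raw count in a sample/OTU cell, n is the total count for that sample, SUM(x) is the total count over the whole table, and N is the number of samples. The class `NormalizeSketch`, the array layout, the `base10` flag (standing in for the Constants.REPORT_LOG_BASE setting), and the handling of empty samples are illustrative assumptions, not BioLockJ code.

```java
/** Hypothetical sketch of the normalization formulas; not the BioLockJ implementation. */
final class NormalizeSketch {

    /** counts[sample][otu]; each cell becomes (RC/n) * (SUM(x)/N) + 1. */
    static double[][] normalize(double[][] counts) {
        int numSamples = counts.length;            // N
        double tableTotal = 0;                     // SUM(x)
        double[] sampleTotals = new double[numSamples]; // n, per sample
        for (int s = 0; s < numSamples; s++) {
            for (double rc : counts[s]) {
                sampleTotals[s] += rc;
                tableTotal += rc;
            }
        }
        double meanDepth = tableTotal / numSamples; // SUM(x)/N
        double[][] out = new double[numSamples][];
        for (int s = 0; s < numSamples; s++) {
            out[s] = new double[counts[s].length];
            for (int o = 0; o < counts[s].length; o++) {
                // Assumption: an empty sample contributes 0 before the +1 pseudocount.
                double scaled = sampleTotals[s] == 0 ? 0 : counts[s][o] / sampleTotals[s] * meanDepth;
                out[s][o] = scaled + 1;
            }
        }
        return out;
    }

    /** Log-transform normalized values using base 10 or base e. */
    static double[][] logTransform(double[][] normalized, boolean base10) {
        double[][] out = new double[normalized.length][];
        for (int s = 0; s < normalized.length; s++) {
            out[s] = new double[normalized[s].length];
            for (int o = 0; o < normalized[s].length; o++) {
                out[s][o] = base10 ? Math.log10(normalized[s][o]) : Math.log(normalized[s][o]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] counts = { { 10, 30 }, { 20, 40 } };
        double[][] norm = normalize(counts);
        // Sample totals are 40 and 60, table total is 100, N = 2, so SUM(x)/N = 50.
        System.out.println(norm[0][0]); // (10/40) * 50 + 1 = 13.5
        System.out.println(logTransform(norm, true)[0][0]);
    }
}
```

Under this reading, the +1 keeps every normalized value at or above 1, so the log transform never produces a negative or undefined result for zero counts.
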
Modifier and Type | Method and Description |
---|---|
boolean |
TaxaCountModule.isTaxaModule(BioModule module)
Check the module output directory for taxonomy table files generated by BioLockJ.
|
boolean |
TaxaCountModule.isValidInputModule(BioModule module) |
Modifier and Type | Class and Description |
---|---|
class |
AwkFastaConverter
This BioModule uses awk and gzip to convert input sequence files into a decompressed fasta file format.
|
class |
Gunzipper
This BioModule uses gzip to decompress input sequence files.
|
class |
KneadData
This BioModule runs the biobakery kneaddata program to remove contaminated DNA.
Multiple contaminant DNA databases can be used to filter reads simultaneously. Common contaminants include Human, Viral, and Plasmid DNA. |
class |
Multiplexer
This BioModule will merge sequence files into a single combined sequence file, with either the sample ID or an
identifying bar-code (if defined in the metadata) stored in the sequence header.
BioLockJ is designed to run on demultiplexed data so this must be the last module to run in its branch. |
class |
PearMergeReads
This BioModule will merge forward and reverse fastq files using PEAR.
For more information, see the online PEAR manual: https://sco.h-its.org/exelixis/web/software/pear/doc.html |
class |
RarefySeqs
This BioModule imposes a minimum and/or maximum number of reads per sample.
|
class |
SeqFileValidator
This BioModule validates fasta/fastq file formats and enforces min/max read lengths.
|
class |
TrimPrimers
This BioModule removes sequence primers from demultiplexed files.
The primers are defined using regular expressions in a separate file. |
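
TrimPrimers is described above as removing primers that are defined by regular expressions. A rough sketch of that idea on a single read follows. It is not the module's code: the class `TrimPrimersSketch`, the method name, and the rule that only a primer matching at the start of the read is removed are assumptions, and the sketch ignores FASTQ quality lines, which would need the same number of characters trimmed.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of regex-based primer trimming; not the BioLockJ implementation. */
final class TrimPrimersSketch {

    /** Remove the first primer pattern that matches at the start of the read, if any. */
    static String trim(String read, List<Pattern> primers) {
        for (Pattern primer : primers) {
            Matcher m = primer.matcher(read);
            // lookingAt() anchors the match at the beginning of the read.
            if (m.lookingAt()) {
                return read.substring(m.end());
            }
        }
        return read; // no primer found: leave the read unchanged
    }

    public static void main(String[] args) {
        // The primer pattern below is made up for the demonstration.
        List<Pattern> primers = List.of(Pattern.compile("ACG[TC]"));
        System.out.println(trim("ACGTTTGA", primers));
        System.out.println(trim("GGGACGT", primers));
    }
}
```
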
Modifier and Type | Method and Description |
---|---|
static BioModule |
ModuleUtil.getModule(BioModule module,
String className,
boolean checkAhead)
Get a module with given className unless a classifier module is found 1st.
Use checkAhead parameter to determine if we look forward or backwards starting from the given module. |
static BioModule |
ModuleUtil.getModule(String className)
Construct a BioModule based on its className.
|
static BioModule |
ModuleUtil.getNextModule(BioModule module)
BioModules are run in the order configured.
Return the module configured to run after the given module. |
static BioModule |
ModuleUtil.getPreviousModule(BioModule module)
BioModules are run in the order configured.
Return the module configured to run before the given module. |
Modifier and Type | Method and Description |
---|---|
protected static List<BioModule> |
DownloadUtil.getDownloadModules()
Get the modules to download.
|
static List<BioModule> |
ModuleUtil.getModules(BioModule module,
Boolean checkAhead)
Return pipeline modules after the given module if checkAhead = TRUE.
Otherwise, return pipeline modules before the given module. If returning the prior modules, return the pipeline modules in reverse order, so the 1st item in the list is the module immediately preceding the given module. |
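
The ordering rule for ModuleUtil.getModules is easy to get backwards, so here is a small sketch of it on a generic list. The class `ModuleOrderSketch`, the method `neighbors`, and the use of strings in place of BioModule instances are assumptions made for illustration; the real method takes the pipeline from the application state rather than as a parameter.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of the checkAhead ordering rule; not the BioLockJ implementation. */
final class ModuleOrderSketch {

    /**
     * checkAhead = true: modules after the given one, in pipeline order.
     * checkAhead = false: modules before it, nearest first (reverse pipeline order).
     */
    static <T> List<T> neighbors(List<T> pipeline, T module, boolean checkAhead) {
        int index = pipeline.indexOf(module);
        if (index < 0) {
            throw new IllegalArgumentException("Module is not part of the pipeline: " + module);
        }
        if (checkAhead) {
            return new ArrayList<>(pipeline.subList(index + 1, pipeline.size()));
        }
        List<T> before = new ArrayList<>(pipeline.subList(0, index));
        Collections.reverse(before); // immediately preceding module comes first
        return before;
    }

    public static void main(String[] args) {
        List<String> pipeline = List.of("ImportMetadata", "Gunzipper", "RdpClassifier", "RdpParser");
        System.out.println(neighbors(pipeline, "RdpClassifier", true));
        System.out.println(neighbors(pipeline, "RdpClassifier", false));
    }
}
```

Returning the prior modules nearest-first means a caller searching for "the closest earlier module of type X" can stop at the first match in either direction.
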
Modifier and Type | Method and Description |
---|---|
static List<String> |
DockerUtil.buildSpawnDockerContainerFunction(BioModule module)
Build the "spawnDockerContainer" method, which takes container name, in/out port, and optionally script
path parameters.
|
static void |
RMetaUtil.classifyReportableMetadata(BioModule module)
Classify and verify the R filter and reportable metadata fields listed in the
Config file. All metadata fields are reported unless specific fields are listed in: Config."r.reportFields". |
static String |
ModuleUtil.displayID(BioModule module)
Return the module ID as a 2 digit display number (add leading zero if needed).
|
static Set<String> |
RMetaUtil.getBinaryFields(BioModule module)
Get the
Config."R_internal.binaryFields" fields containing only 2 non-numeric values. |
static ClassifierModule |
ModuleUtil.getClassifier(BioModule module,
boolean checkAhead)
Get a classifier module.
Use checkAhead parameter to determine if we look forward or backwards starting from the given module. |
static String |
RuntimeParamUtil.getDirectModuleParam(BioModule module)
Direct module parameters contain 2 parts separated by a colon: (pipeline directory name):(module name)
|
static String |
DockerUtil.getDockerImage(BioModule module)
Return the name of the Docker image needed for the given module.
|
static String |
DockerUtil.getDockerUser(BioModule module)
Return the Docker Hub user ID.
|
static String |
DockerUtil.getImageName(BioModule module)
Return the Docker Image name for the given class name.
Return "blj_bash" for simple bash script modules that don't rely on special software. Class names contain no spaces; words are separated via CamelCaseConvention. Docker image names cannot contain upper case letters, so this method substitutes "_" before the lower-case version of each capital letter. Example: JavaModule becomes java_module |
static String |
DockerUtil.getImageVersion(BioModule module)
Get the Docker image version if defined in the
Config file. If not found, return the default version "latest". |
static String |
SummaryUtil.getInputSummary(BioModule module)
Build a summary of the input files for the given module
|
static BioModule |
ModuleUtil.getModule(BioModule module,
String className,
boolean checkAhead)
Get a module with given className unless a classifier module is found 1st.
Use checkAhead parameter to determine if we look forward or backwards starting from the given module. |
static String |
SummaryUtil.getModuleRunTime(BioModule module)
Return the duration the module ran, based on the modified date of the started file, formatted for display (as hours, minutes,
seconds).
|
static List<BioModule> |
ModuleUtil.getModules(BioModule module,
Boolean checkAhead)
Return pipeline modules after the given module if checkAhead = TRUE.
Otherwise, return pipeline modules before the given module. If returning the prior modules, return the pipeline modules in reverse order, so the 1st item in the list is the module immediately preceding the given module. |
static BioModule |
ModuleUtil.getNextModule(BioModule module)
BioModules are run in the order configured.
Return the module configured to run after the given module. |
static String |
SummaryUtil.getOutputDirSummary(BioModule module)
Return summary of
BioModule output directory, with metrics:
Number of output files
Mean output file size
Path of new metadata file if any created
|
static BioModule |
ModuleUtil.getPreviousModule(BioModule module)
BioModules are run in the order configured.
Return the module configured to run before the given module. |
static String |
MetaUtil.getSystemMetaCol(BioModule module,
String col)
Return a system generated metadata column name based on the module status.
|
static boolean |
DockerUtil.hasDB(BioModule module)
Function used to determine if an alternate database has been defined (other than /db).
|
static boolean |
ModuleUtil.hasExecuted(BioModule module)
Return TRUE if module has executed.
|
static boolean |
ModuleUtil.isComplete(BioModule module)
Return TRUE if module completed successfully.
|
static boolean |
ModuleUtil.isFirstRModule(BioModule module)
Test if module is the first
R_Module configured in the pipeline. |
static boolean |
ModuleUtil.isIncomplete(BioModule module)
Return TRUE if module started execution but is not complete.
|
static boolean |
ModuleUtil.isMetadataModule(BioModule module)
Method determines if the given module is a metadata-module (which does not use/modify sequence data).
|
static boolean |
PathwayUtil.isPathwayModule(BioModule module)
Check the module to determine if it generated OTU count files.
|
static boolean |
SeqUtil.isSeqModule(BioModule module)
Check the module to determine if it generated sequence file output.
|
static void |
ModuleUtil.markComplete(BioModule module)
Method creates a file named "biolockjComplete" in module root directory to document module
has completed successfully.
|
static void |
ModuleUtil.markStarted(BioModule module)
Method creates a file named "biolockjStarted" in module root directory to document module
has started.
|
static boolean |
RMetaUtil.reportAllFields(BioModule module)
The override property:
Config."r.reportFields" can be used to list the
reportable metadata fields for use in the R modules. |
static void |
SummaryUtil.reportSuccess(BioModule module)
After each module completes, this method is called to track the execution summary.
If module is null, the pipeline is complete/successful. |
static File |
ModuleUtil.requireSubDir(BioModule module,
String subDirName)
Get BioModule subdirectory File object with given name.
|
protected static void |
SummaryUtil.resetModuleSummary(BioModule module)
Modules can be forced to reset to incomplete status.
|
static boolean |
ModuleUtil.subDirExists(BioModule module,
String subDirName)
Return TRUE if BioModule sub-directory exists
|
static boolean |
RMetaUtil.updateRConfig(BioModule module)
Get updated R config props
|
static Boolean |
PathwayUtil.useHumann2RawCount(BioModule module)
Determine if humann2 provided most recent raw count data, used to determine getPreReq modules.
|
static void |
PathwayUtil.verifyConfig(BioModule module)
Verify that the HumanN2 Config enables at least one of the following reports:
"humann2.disableGeneFamilies" "humann2.disablePathCoverage" "humann2.disableGeneFamilies" |
static void |
RMetaUtil.verifyMetadataFieldsExist(BioModule module,
String prop,
Collection<String> fields)
This method verifies the fields given exist in the metadata file.
|
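
Two of the string helpers in the table above are specified precisely enough to sketch: DockerUtil.getImageName (underscore before the lower-cased version of each capital letter, so JavaModule becomes java_module) and ModuleUtil.displayID (2-digit ID with a leading zero). The sketch below works on plain strings and ints rather than BioModule instances. The class and method names are hypothetical, and skipping the underscore before the first letter is inferred from the java_module example. Class names that already contain underscores (such as R_PlotMds) and the "blj_bash" fallback are not handled.

```java
/** Hypothetical sketch of two naming helpers; not the BioLockJ implementation. */
final class NamingSketch {

    /** Convert a CamelCase class name to a lower-case, underscore-separated image name. */
    static String toImageName(String className) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < className.length(); i++) {
            char c = className.charAt(i);
            if (Character.isUpperCase(c)) {
                if (i > 0) {
                    sb.append('_'); // no leading underscore, per the java_module example
                }
                sb.append(Character.toLowerCase(c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Format a module ID as at least 2 digits, adding a leading zero if needed. */
    static String displayId(int id) {
        return (id < 10 ? "0" : "") + id;
    }

    public static void main(String[] args) {
        System.out.println(toImageName("JavaModule")); // java_module
        System.out.println(displayId(3));              // 03
    }
}
```
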
Modifier and Type | Method and Description |
---|---|
protected static File |
DownloadUtil.makeRunAllScript(List<BioModule> modules)
This script allows a user to run all R scripts together from a single script.
|
static void |
NextflowUtil.startNextflow(List<BioModule> modules)
Call this method to build the Nextflow main.nf for the current pipeline.
|
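
The ModuleUtil status methods listed earlier (markStarted, markComplete, isComplete, isIncomplete) track module state through flag files named "biolockjStarted" and "biolockjComplete" in the module root directory. A minimal sketch of that pattern follows, operating on a directory path instead of a BioModule. Only the two file names come from the descriptions above. The class `ModuleStatusSketch`, the use of empty files, and the rule that a module is incomplete when the started flag exists without the complete flag are assumptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch of flag-file status tracking; not the BioLockJ implementation. */
final class ModuleStatusSketch {
    static final String STARTED = "biolockjStarted";
    static final String COMPLETE = "biolockjComplete";

    static void markStarted(Path moduleDir) throws IOException {
        touch(moduleDir.resolve(STARTED));
    }

    static void markComplete(Path moduleDir) throws IOException {
        touch(moduleDir.resolve(COMPLETE));
    }

    static boolean isComplete(Path moduleDir) {
        return Files.exists(moduleDir.resolve(COMPLETE));
    }

    /** Assumption: started flag present and complete flag absent means incomplete. */
    static boolean isIncomplete(Path moduleDir) {
        return Files.exists(moduleDir.resolve(STARTED)) && !isComplete(moduleDir);
    }

    /** Create an empty flag file unless it already exists. */
    private static void touch(Path flag) throws IOException {
        if (!Files.exists(flag)) {
            Files.createFile(flag);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("module_status_demo");
        markStarted(dir);
        System.out.println(isIncomplete(dir));
        markComplete(dir);
        System.out.println(isComplete(dir));
    }
}
```

Because the state lives on disk, a restarted pipeline can tell which modules finished and which were interrupted without any in-memory bookkeeping.
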