public class Demultiplexer extends JavaModuleImpl implements SeqModule
Modifier and Type | Field and Description |
---|---|
protected static int | NUM_LINES_TEMP_FILE: Module splits the multiplexed file into smaller files with this number of lines: 2000000 |
protected static String | SAMPLE_ID_SUFFIX_TRIM_DEFAULT: Multiplexed files created by BioLockJ may add the sample ID to the sequence header if no barcode is provided. If the sample ID is added, it is immediately followed by the character "_". This value can then be used to set Config."input.trimSuffix". |
BLJ_OPTIONS
GZIP_EXT, LOG_EXT, PDF_EXT, RETURN, SH_EXT, TAB_DELIM, TSV_EXT, TXT_EXT
SCRIPT_BATCH_SIZE, SCRIPT_DEFAULT_HEADER, SCRIPT_NUM_THREADS, SCRIPT_PERMISSIONS, SCRIPT_TIMEOUT
MAIN_SCRIPT_PREFIX, OUTPUT_DIR, TEMP_DIR
Constructor and Description |
---|
Demultiplexer() |
Modifier and Type | Method and Description |
---|---|
protected void | breakUpFiles(): Some multiplexed files can be very large. |
void | checkDependencies(): Validate module dependencies. If Config."demultiplexer.strategy" indicates use of barcodes to demultiplex, validate that the metadata column named by Config."metadata.barcodeColumn" exists. Call setMultiplexedConfig() to set multiplexed Config if needed. If Config."demultiplexer.barcodeCutoff" is defined, validate that it is between 0.0 and 1.0. |
void | cleanUp(): Update SeqUtil to indicate data has been demultiplexed. |
protected void | demultiplex(Map<String,Set<String>> validHeaders): Demultiplex the file into separate small temp files, with 2000000 lines each, for processing. |
List<File> | getSeqFiles(Collection<File> files): Return only sequence files for sample IDs found in the metadata file. If Config."metadata.required" = "Y", an error is thrown to list the files that cannot be matched to a metadata row. |
String | getSummary(): Produces initial count and demultiplexed output count summaries for forward/reverse reads. |
protected Map<String,Set<String>> | getValidFwHeaders(): Get valid forward read headers that belong to reads with a valid barcode or sample identifier. |
protected Map<String,Set<String>> | getValidHeaders(): Obtains all valid headers for the forward reads and returns only headers that also have a matching reverse read. |
void | runModule(): Module execution summary: execute breakUpFiles() to split the multiplexed file into smaller files for processing; execute getValidHeaders() to obtain the list of valid headers matched to a sample ID in the metadata file, which also verifies matching forward and reverse read headers when demultiplexing paired reads. |
protected void | setMultiplexedConfig(): Set the Config properties needed to read the sample IDs from a multiplexed file if no barcode is provided. Set Config."input.trimPrefix" = 1st sequence header character. Set Config."input.trimSuffix" = "_". |
buildScript, executeTask, getSource, getWorkerScriptFunctions, isValidInputModule, markStatus, moduleComplete, moduleFailed
buildScriptForPairedReads, getJobParams, getMainScript, getRuntimeParams, getScriptDir, getScriptErrors, getTimeout, hasScripts
cacheInputFiles, compareTo, equals, findModuleInputFiles, getFileCache, getID, getInputFiles, getModuleDir, getOutputDir, getPostRequisiteModules, getPreRequisiteModules, getTempDir, init, toString, validateFileNameUnique
clone, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
buildScript, buildScriptForPairedReads, getJobParams, getMainScript, getScriptDir, getScriptErrors, getTimeout, getWorkerScriptFunctions
executeTask, getID, getInputFiles, getModuleDir, getOutputDir, getPostRequisiteModules, getPreRequisiteModules, getTempDir, init, isValidInputModule
protected static final int NUM_LINES_TEMP_FILE
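NUM_LINES_TEMP_FILE caps the number of lines written to each temp file when the module splits a multiplexed file (2000000 per the field summary). The sketch below only illustrates that kind of fixed-size split. It is not BioLockJ's implementation: the class `LineChunker`, the method `split`, and the chunk file naming are hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of BioLockJ: splits a text file into
// consecutive chunk files holding at most linesPerFile lines each.
final class LineChunker {

    static List<Path> split(Path input, Path outDir, int linesPerFile) throws IOException {
        if (linesPerFile < 1) {
            throw new IllegalArgumentException("linesPerFile must be >= 1");
        }
        Files.createDirectories(outDir);
        List<Path> chunks = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            BufferedWriter writer = null;
            try {
                int linesInChunk = 0;
                String line;
                while ((line = reader.readLine()) != null) {
                    // Start a new chunk on the first line and whenever the current one is full.
                    if (writer == null || linesInChunk == linesPerFile) {
                        if (writer != null) {
                            writer.close();
                        }
                        // Assumed naming scheme: <input file name>_<chunk index>
                        Path chunk = outDir.resolve(input.getFileName() + "_" + chunks.size());
                        chunks.add(chunk);
                        writer = Files.newBufferedWriter(chunk, StandardCharsets.UTF_8);
                        linesInChunk = 0;
                    }
                    writer.write(line);
                    writer.newLine();
                    linesInChunk++;
                }
            } finally {
                if (writer != null) {
                    writer.close();
                }
            }
        }
        return chunks;
    }
}
```

Because 2000000 is divisible by 4, a split of this size would keep strict four-line FASTQ records whole within a chunk; that observation assumes four-line records and is not stated in the source.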
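protected static final String SAMPLE_ID_SUFFIX_TRIM_DEFAULT
Multiplexed files created by BioLockJ may add the sample ID to the sequence header if no barcode is provided. If the sample ID is added, it is immediately followed by the character "_". This value can then be used to set Config."input.trimSuffix".
public void checkDependencies() throws Exception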
Validate module dependencies:
If Config."demultiplexer.strategy" indicates use of barcodes to demultiplex, validate that the metadata column named by Config."metadata.barcodeColumn" exists
Call setMultiplexedConfig() to set multiplexed Config if needed
If Config."demultiplexer.barcodeCutoff" is defined, validate that it is between 0.0 and 1.0
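The range check on Config."demultiplexer.barcodeCutoff" could be sketched as follows. This is an illustration only: `CutoffValidator` and `parseCutoff` are hypothetical names, a plain String stands in for the value read from Config, and treating both bounds as inclusive is an assumption, since the source only says "between 0.0 - 1.0".

```java
// Hypothetical helper, not BioLockJ code: validates an optional cutoff
// property that must lie in the range 0.0 - 1.0 when it is defined.
final class CutoffValidator {

    // Returns null when the property is undefined (null or blank).
    // Assumption: both bounds are inclusive; the source does not say.
    static Double parseCutoff(String rawValue) {
        if (rawValue == null || rawValue.trim().isEmpty()) {
            return null;
        }
        final double cutoff;
        try {
            cutoff = Double.parseDouble(rawValue.trim());
        } catch (NumberFormatException ex) {
            throw new IllegalArgumentException(
                "demultiplexer.barcodeCutoff is not a number: " + rawValue, ex);
        }
        // Double.parseDouble accepts "NaN", so reject it explicitly.
        if (Double.isNaN(cutoff) || cutoff < 0.0 || cutoff > 1.0) {
            throw new IllegalArgumentException(
                "demultiplexer.barcodeCutoff must be between 0.0 and 1.0: " + rawValue);
        }
        return cutoff;
    }
}
```

Returning null for an undefined property mirrors the "if defined" wording: the check applies only when a value is present.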
checkDependencies in interface BioModule
checkDependencies in class ScriptModuleImpl
Exception - thrown if missing or invalid dependencies are found
public void cleanUp() throws Exception
cleanUp in interface BioModule
cleanUp in class BioModuleImpl
Exception - if unable to modify property
public List<File> getSeqFiles(Collection<File> files) throws Exception
Description copied from interface: SeqModule
Return only sequence files for sample IDs found in the metadata file. If Config."metadata.required" = "Y", an error is thrown to list the files that cannot be matched to a metadata row.
getSeqFiles in interface SeqModule
files - Module input files
Exception - if no input files are found
public String getSummary() throws Exception
getSummary in interface BioModule
getSummary in class ScriptModuleImpl
Exception - if any error occurs
public void runModule() throws Exception
Module execution summary:
Execute breakUpFiles() to split the multiplexed file into smaller files for processing
Execute getValidHeaders() to obtain the list of valid headers matched to a sample ID in the metadata file; this also verifies matching forward and reverse read headers when demultiplexing paired reads
Execute demultiplex(Map) to demultiplex the data into a separate file (or pair of files) for each sample
If paired reads are combined in a single file, the read direction must be identified in the sequence header using the key strings " 1:N:" and " 2:N:".
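The role of the key strings " 1:N:" and " 2:N:" can be illustrated with a small sketch that classifies headers by direction and keeps only reads seen in both directions. It is not the module's code: `ReadDirection` and its methods are hypothetical, and treating the header text before the key string as the read identifier is an assumption.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not BioLockJ code: uses the key strings from the
// runModule() description to tell forward from reverse read headers.
final class ReadDirection {
    static final String FW_KEY = " 1:N:";
    static final String RV_KEY = " 2:N:";

    static boolean isForward(String header) {
        return header.contains(FW_KEY);
    }

    static boolean isReverse(String header) {
        return header.contains(RV_KEY);
    }

    // Assumption: the text before the direction key identifies the read.
    static String readId(String header) {
        int keyStart = header.indexOf(isForward(header) ? FW_KEY : RV_KEY);
        return keyStart < 0 ? header : header.substring(0, keyStart);
    }

    // Returns the read IDs that occur with both a forward and a reverse header.
    static Set<String> pairedReadIds(Collection<String> headers) {
        Set<String> forward = new LinkedHashSet<>();
        Set<String> reverse = new LinkedHashSet<>();
        for (String header : headers) {
            if (isForward(header)) {
                forward.add(readId(header));
            } else if (isReverse(header)) {
                reverse.add(readId(header));
            }
        }
        forward.retainAll(reverse); // keep only IDs present in both directions
        return forward;
    }
}
```

For example, given the headers `@r1 1:N:0:1`, `@r1 2:N:0:1`, and `@r2 1:N:0:1`, only `@r1` has both directions, so `pairedReadIds` returns a set containing just `@r1`.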
runModule in interface JavaModule
runModule in class JavaModuleImpl
Exception - thrown if any runtime error occurs
protected void breakUpFiles() throws Exception
Exception - if unexpected errors occur at runtime
protected void demultiplex(Map<String,Set<String>> validHeaders) throws Exception
validHeaders - Set of valid headers
Exception - if an error occurs reading the multiplexed file
protected Map<String,Set<String>> getValidFwHeaders() throws Exception
Exception - if an error occurs
protected Map<String,Set<String>> getValidHeaders() throws Exception
Exception - if unable to obtain headers
protected void setMultiplexedConfig() throws Exception
Set the Config properties needed to read the sample IDs from a multiplexed file if no barcode is provided.
Set Config."input.trimPrefix" = 1st sequence header character.
Set Config."input.trimSuffix" = "_".
Exception - if unable to update the property values
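To show how the two properties set by setMultiplexedConfig() might be used together, the sketch below recovers a sample ID from a sequence header. The exact trimming rules are not given here, so the semantics are assumptions: the prefix is removed from the start of the header, and everything from the first occurrence of the suffix onward is dropped. `SampleIdSketch` and its methods are hypothetical names.

```java
// Hypothetical helper, not BioLockJ code: derives a sample ID from a
// sequence header using a prefix and suffix to trim.
final class SampleIdSketch {

    // The 1st sequence header character, e.g. "@" for FASTQ or ">" for FASTA.
    static String firstHeaderChar(String header) {
        return header.substring(0, 1);
    }

    // Assumed semantics: strip trimPrefix from the start, then cut the
    // header at the first occurrence of trimSuffix.
    static String sampleId(String header, String trimPrefix, String trimSuffix) {
        String id = header;
        if (trimPrefix != null && !trimPrefix.isEmpty() && id.startsWith(trimPrefix)) {
            id = id.substring(trimPrefix.length());
        }
        if (trimSuffix != null && !trimSuffix.isEmpty()) {
            int suffixStart = id.indexOf(trimSuffix);
            if (suffixStart > 0) {
                id = id.substring(0, suffixStart);
            }
        }
        return id;
    }
}
```

With a header such as `@sampleA_17 1:N:0:1`, a prefix of `@`, and a suffix of `_`, this sketch yields `sampleA`. Under these assumed rules a sample ID that itself contains `_` would be cut short.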