public class SeqFileValidator extends JavaModuleImpl implements SeqModule
Modifier and Type | Field and Description |
---|---|
protected static String | INPUT_SEQ_MAX: Config Integer property "seqFileValidator.seqMaxLen" defines the maximum number of bases per read. |
protected static String | INPUT_SEQ_MIN: Config Integer property "seqFileValidator.seqMinLen" defines the minimum number of bases per read. |
static String | NUM_VALID_READS: Column name that holds the number of valid reads per sample: "Num_Valid_Reads". |
protected static String | REQUIRE_EUQL_NUM_PAIRS: Config Boolean property "seqFileValidator.requireEqualNumPairs" determines whether the module requires an equal number of forward and reverse reads (simple check). |
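The three Config properties above arrive as plain strings. The following is a minimal sketch of how they might be parsed and sanity-checked. It assumes java.util.Properties as a stand-in for BioLockJ's own Config class and treats "Y" as the only true value for the Boolean property. The class name SeqValidatorSettings, the positive-integer rule, and the min-not-above-max check are hypothetical illustrations, not the module's actual code.

```java
import java.util.Properties;

/**
 * Hypothetical holder for the seqFileValidator properties.
 * java.util.Properties stands in for BioLockJ's Config class (assumption).
 */
final class SeqValidatorSettings {
    // Property keys as listed in the field summary.
    static final String KEY_SEQ_MAX = "seqFileValidator.seqMaxLen";
    static final String KEY_SEQ_MIN = "seqFileValidator.seqMinLen";
    static final String KEY_EQUAL_PAIRS = "seqFileValidator.requireEqualNumPairs";

    final Integer seqMaxLen;             // null when the property is not set
    final Integer seqMinLen;             // null when the property is not set
    final boolean requireEqualNumPairs;

    SeqValidatorSettings(Properties props) {
        seqMaxLen = parsePositiveInt(props, KEY_SEQ_MAX);
        seqMinLen = parsePositiveInt(props, KEY_SEQ_MIN);
        // Assumption: "Y" (any case) means true; a missing or other value means false.
        requireEqualNumPairs = "Y".equalsIgnoreCase(props.getProperty(KEY_EQUAL_PAIRS, "N").trim());
        // Assumption: a minimum above the maximum is a configuration error.
        if (seqMinLen != null && seqMaxLen != null && seqMinLen > seqMaxLen) {
            throw new IllegalArgumentException(KEY_SEQ_MIN + " must not exceed " + KEY_SEQ_MAX);
        }
    }

    /** Returns null for a missing or blank value; rejects zero and negative numbers. */
    private static Integer parsePositiveInt(Properties props, String key) {
        String raw = props.getProperty(key);
        if (raw == null || raw.trim().isEmpty()) {
            return null;
        }
        int value = Integer.parseInt(raw.trim());
        if (value < 1) {
            throw new IllegalArgumentException(key + " must be a positive integer");
        }
        return value;
    }
}
```

Leaving unset lengths as null, rather than substituting a default, keeps "no limit configured" distinguishable from an explicit value.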
Inherited fields:
BLJ_OPTIONS
GZIP_EXT, LOG_EXT, PDF_EXT, RETURN, SH_EXT, TAB_DELIM, TSV_EXT, TXT_EXT
SCRIPT_BATCH_SIZE, SCRIPT_DEFAULT_HEADER, SCRIPT_NUM_THREADS, SCRIPT_PERMISSIONS, SCRIPT_TIMEOUT
MAIN_SCRIPT_PREFIX, OUTPUT_DIR, TEMP_DIR
Constructor and Description |
---|
SeqFileValidator() |
Modifier and Type | Method and Description |
---|---|
void | cleanUp(): Set "Num_Valid_Reads" as the number of reads field. |
List<File> | getSeqFiles(Collection<File> files): Return only sequence files for sample IDs found in the metadata file. If Config property "metadata.required" = "Y", an error is thrown listing the files that cannot be matched to a metadata row. |
String | getSummary(): Produce a summary message with counts of the total number of reads and the number of valid reads containing a barcode defined in the metadata file. |
protected void | removeBadFiles(): Remove sequence files that are left empty because all of their reads failed validation checks. |
void | runModule(): Cache sample IDs for comparison with the validated sample IDs after processing. |
protected void | validateFile(File file, Integer fileCount): Validate sequence files: check that the first character of each sequence header is the expected character; check that fastq files have the same number of bases and quality scores per read; remove reads shorter than the minimum threshold "seqFileValidator.seqMinLen"; trim reads longer than the maximum threshold "seqFileValidator.seqMaxLen". Invalid reads are saved to a file in the module temp directory for review. |
protected void | verifyPairedSeqs(): Verify that there are equal numbers of forward and reverse read files. If "seqFileValidator.requireEqualNumPairs" = "Y", also verify that the forward and reverse read files have equal numbers of reads. |
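The per-read checks listed for validateFile can be sketched for a single four-line FASTQ record. This sketch assumes '@' is the expected header character and signals an invalid read with an empty Optional, so a caller could write it to a reject file. FastqReadValidator and validate are hypothetical names. The module's handling of FASTA input and of the temp-directory reject file is not shown.

```java
import java.util.Optional;

/** Hypothetical per-read check modeled on the validateFile steps. */
final class FastqReadValidator {
    private final int minLen;
    private final int maxLen;

    FastqReadValidator(int minLen, int maxLen) {
        if (minLen > maxLen) {
            throw new IllegalArgumentException("minLen must not exceed maxLen");
        }
        this.minLen = minLen;
        this.maxLen = maxLen;
    }

    /**
     * Validates one FASTQ record (header, bases, '+' line, qualities).
     * Returns the possibly trimmed record, or Optional.empty() if the read is invalid.
     */
    Optional<String[]> validate(String header, String bases, String plus, String quals) {
        // 1. Assumption: '@' is the expected first header character for FASTQ.
        if (header.isEmpty() || header.charAt(0) != '@') {
            return Optional.empty();
        }
        // 2. FASTQ reads need exactly one quality score per base.
        if (bases.length() != quals.length()) {
            return Optional.empty();
        }
        // 3. Drop reads shorter than the minimum length.
        if (bases.length() < minLen) {
            return Optional.empty();
        }
        // 4. Trim reads longer than the maximum; cut bases and qualities together.
        if (bases.length() > maxLen) {
            bases = bases.substring(0, maxLen);
            quals = quals.substring(0, maxLen);
        }
        return Optional.of(new String[] {header, bases, plus, quals});
    }
}
```

Trimming the quality string to the same length as the bases keeps the record well-formed, so a later check of step 2 on the output would still pass.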
Inherited methods:
buildScript, executeTask, getSource, getWorkerScriptFunctions, isValidInputModule, markStatus, moduleComplete, moduleFailed
buildScriptForPairedReads, checkDependencies, getJobParams, getMainScript, getRuntimeParams, getScriptDir, getScriptErrors, getTimeout, hasScripts
cacheInputFiles, compareTo, equals, findModuleInputFiles, getFileCache, getID, getInputFiles, getModuleDir, getOutputDir, getPostRequisiteModules, getPreRequisiteModules, getTempDir, init, toString, validateFileNameUnique
clone, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait (from java.lang.Object)
buildScript, buildScriptForPairedReads, getJobParams, getMainScript, getScriptDir, getScriptErrors, getTimeout, getWorkerScriptFunctions
checkDependencies, executeTask, getID, getInputFiles, getModuleDir, getOutputDir, getPostRequisiteModules, getPreRequisiteModules, getTempDir, init, isValidInputModule
public static final String NUM_VALID_READS
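NUM_VALID_READS names the per-sample metadata column. How the module actually writes this column is not spelled out here (runModule references MetaUtil.addColumn(String, Map, File, boolean)). The following is therefore only a hypothetical illustration of appending such a column to tab-delimited metadata rows. It assumes the first row is the header, the sample ID is the first column, and samples with no recorded valid reads get 0. The class ValidReadColumn is illustrative and is not MetaUtil.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: append a "Num_Valid_Reads" column to tab-delimited metadata rows. */
final class ValidReadColumn {
    static final String NUM_VALID_READS = "Num_Valid_Reads";

    /**
     * rows.get(0) must be the header line; the first field of each row is the sample ID.
     * Samples absent from validReads get 0 (assumption for this sketch).
     */
    static List<String> addColumn(List<String> rows, Map<String, Long> validReads) {
        List<String> out = new ArrayList<>(rows.size());
        out.add(rows.get(0) + "\t" + NUM_VALID_READS);
        for (String row : rows.subList(1, rows.size())) {
            String sampleId = row.split("\t", 2)[0];
            out.add(row + "\t" + validReads.getOrDefault(sampleId, 0L));
        }
        return out;
    }
}
```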
protected static final String INPUT_SEQ_MAX
Config Integer property "seqFileValidator.seqMaxLen" defines the maximum number of bases per read.
protected static final String INPUT_SEQ_MIN
Config Integer property "seqFileValidator.seqMinLen" defines the minimum number of bases per read.
protected static final String REQUIRE_EUQL_NUM_PAIRS
Config Boolean property "seqFileValidator.requireEqualNumPairs" determines whether the module requires an equal number of forward and reverse reads (simple check).
public void cleanUp() throws Exception
Specified by: cleanUp in interface BioModule
Overrides: cleanUp in class BioModuleImpl
Throws: Exception - thrown if any runtime error occurs
public List<File> getSeqFiles(Collection<File> files) throws Exception
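Description copied from interface: SeqModule
Return only sequence files for sample IDs found in the metadata file. If Config property "metadata.required" = "Y", an error is thrown listing the files that cannot be matched to a metadata row.
Specified by: getSeqFiles in interface SeqModule
Parameters: files - Module input files
Throws: Exception - if no input files are found
public String getSummary() throws Exception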
Specified by: getSummary in interface BioModule
Overrides: getSummary in class ScriptModuleImpl
Throws: Exception - if any error occurs
public void runModule() throws Exception
Cache sample IDs for comparison with the validated sample IDs after processing.
1. Call validateFile(File, Integer) for each input file.
2. Call removeBadFiles() to remove empty files (cases where all reads fail validation).
3. Call verifyPairedSeqs() if the module input files are paired read files.
4. Call MetaUtil.addColumn(String, Map, File, boolean).
Specified by: runModule in interface JavaModule; runModule in class JavaModuleImpl
Throws: Exception - thrown if any runtime error occurs
protected void removeBadFiles() throws Exception
Throws: Exception - if errors occur
protected void validateFile(File file, Integer fileCount) throws Exception
Parameters: file - Sequence file; fileCount - Integer count
Throws: Exception - if I/O errors occur while processing sequence files
protected void verifyPairedSeqs() throws Exception
Throws: Exception - if validations fail or errors occur
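The following hypothetical sketch covers the two checks described for verifyPairedSeqs. The first is a forward/reverse partner for every file, and the second, only when requireEqualNumPairs is set, is equal read counts within each pair. It assumes forward and reverse files are distinguished by "_R1" and "_R2" in the file name and that per-file valid-read counts are already known. The suffixes and the class name PairedReadCheck are illustrative, not the module's configuration or API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical paired-read check modeled on verifyPairedSeqs(). */
final class PairedReadCheck {
    // Assumed file-name markers for forward and reverse reads.
    static final String FW = "_R1";
    static final String RV = "_R2";

    /**
     * readCounts maps a file name to its number of validated reads.
     * Returns the problems found; an empty list means the check passed.
     */
    static List<String> verify(Map<String, Long> readCounts, boolean requireEqualNumPairs) {
        Map<String, Long> fw = new TreeMap<>();
        Map<String, Long> rv = new TreeMap<>();
        for (Map.Entry<String, Long> e : readCounts.entrySet()) {
            String name = e.getKey();
            // Strip the marker so both files of a pair share the same key.
            if (name.contains(FW)) {
                fw.put(name.replace(FW, ""), e.getValue());
            } else if (name.contains(RV)) {
                rv.put(name.replace(RV, ""), e.getValue());
            }
        }
        List<String> problems = new ArrayList<>();
        Set<String> keys = new TreeSet<>(fw.keySet());
        keys.addAll(rv.keySet());
        for (String key : keys) {
            Long f = fw.get(key);
            Long r = rv.get(key);
            if (f == null || r == null) {
                // Simple check: every file needs a partner.
                problems.add("Unpaired file for " + key);
            } else if (requireEqualNumPairs && !f.equals(r)) {
                // Optional check: both files of a pair hold the same number of reads.
                problems.add(key + ": " + f + " forward vs " + r + " reverse reads");
            }
        }
        return problems;
    }
}
```

Collecting every problem before failing, rather than throwing on the first mismatch, lets a caller report all unpaired or unbalanced samples in one error message.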