public class SeqFileValidator extends JavaModuleImpl implements SeqModule
| Modifier and Type | Field and Description |
|---|---|
| protected static String | INPUT_SEQ_MAX: Config Integer property "seqFileValidator.seqMaxLen" defines the maximum number of bases per read. |
| protected static String | INPUT_SEQ_MIN: Config Integer property "seqFileValidator.seqMinLen" defines the minimum number of bases per read. |
| static String | NUM_VALID_READS: Column name that holds number of valid reads per sample: "Num_Valid_Reads" |
| protected static String | REQUIRE_EUQL_NUM_PAIRS: Config Boolean property "seqFileValidator.requireEqualNumPairs" determines if module requires equal number of forward and reverse reads (simple check). |
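
The three properties above can be read from any standard Java properties source. The following is a minimal sketch, not BioLockJ code: the class `SeqValidatorSettings` and its methods are hypothetical, and it assumes that an unset length property means "no limit" and that "Y" is the only value that enables the Boolean property.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper (not part of BioLockJ): holds the three
// SeqFileValidator settings named in the field table.
class SeqValidatorSettings {
    final Integer seqMinLen;            // null = no minimum configured (assumption)
    final Integer seqMaxLen;            // null = no maximum configured (assumption)
    final boolean requireEqualNumPairs; // true only when the property is "Y" (assumption)

    SeqValidatorSettings(Properties props) {
        seqMinLen = parseOptionalInt(props.getProperty("seqFileValidator.seqMinLen"));
        seqMaxLen = parseOptionalInt(props.getProperty("seqFileValidator.seqMaxLen"));
        String pairs = props.getProperty("seqFileValidator.requireEqualNumPairs", "");
        requireEqualNumPairs = "Y".equalsIgnoreCase(pairs.trim());
        // A minimum above the maximum would reject or trim every read.
        if (seqMinLen != null && seqMaxLen != null && seqMinLen > seqMaxLen) {
            throw new IllegalArgumentException("seqMinLen exceeds seqMaxLen");
        }
    }

    // Parse settings from properties-formatted text (key=value per line).
    static SeqValidatorSettings parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new SeqValidatorSettings(props);
    }

    // Returns null for a missing or blank value, otherwise the parsed integer.
    private static Integer parseOptionalInt(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return null;
        }
        return Integer.valueOf(raw.trim());
    }
}
```

For example, `SeqValidatorSettings.parse("seqFileValidator.seqMinLen=50\nseqFileValidator.seqMaxLen=250")` yields a settings object with both limits set and the pair check disabled.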
Inherited fields:
- BLJ_OPTIONS
- GZIP_EXT, LOG_EXT, PDF_EXT, RETURN, SH_EXT, TAB_DELIM, TSV_EXT, TXT_EXT
- SCRIPT_BATCH_SIZE, SCRIPT_DEFAULT_HEADER, SCRIPT_NUM_THREADS, SCRIPT_PERMISSIONS, SCRIPT_TIMEOUT
- MAIN_SCRIPT_PREFIX, OUTPUT_DIR, TEMP_DIR

| Constructor and Description |
|---|
| SeqFileValidator() |
| Modifier and Type | Method and Description |
|---|---|
| void | cleanUp(): Set "Num_Valid_Reads" as the number of reads field. |
| List<File> | getSeqFiles(Collection<File> files): Return only sequence files for sample IDs found in the metadata file. If Config."metadata.required" = "Y", an error is thrown listing the files that cannot be matched to a metadata row. |
| String | getSummary(): Produce a summary message with the total number of reads and the number of valid reads containing a barcode defined in the metadata file. |
| protected void | removeBadFiles(): Remove sequence files in which all reads failed validation checks, leaving only an empty file. |
| void | runModule(): Cache sampleIds to compare to validated sampleIds post-processing. |
| protected void | validateFile(File file, Integer fileCount): Validate sequence files: (1) verify that the 1st sequence header character is the expected character; (2) verify that fastq files have the same number of bases and quality scores per read; (3) remove reads below the minimum threshold "seqFileValidator.seqMinLen"; (4) trim reads above the maximum threshold "seqFileValidator.seqMaxLen". Invalid reads are saved to a file in the module temp directory for analysis/review. |
| protected void | verifyPairedSeqs(): Verify equal number of forward and reverse read files. If "seqFileValidator.requireEqualNumPairs" = "Y", also verify that forward and reverse read files have an equal number of reads. |
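
The per-read checks listed for validateFile can be sketched as follows. This is an illustration under stated assumptions, not the module's implementation: the class `FastqReadValidator` and its nested `Read` type are hypothetical, '@' is taken as the expected FASTQ header character, a null limit means the check is skipped, and over-length reads are trimmed by keeping the first `maxLen` bases (the source does not say which end is trimmed).

```java
import java.util.Optional;

// Hypothetical sketch (not BioLockJ code) of the four per-read checks.
class FastqReadValidator {

    // One FASTQ record: header line, base calls, and quality string.
    static final class Read {
        final String header;
        final String bases;
        final String quality;

        Read(String header, String bases, String quality) {
            this.header = header;
            this.bases = bases;
            this.quality = quality;
        }
    }

    // Returns the read (trimmed if needed), or empty if it fails validation.
    static Optional<Read> validate(Read read, Integer minLen, Integer maxLen) {
        // 1. The header must start with the expected character ('@' for FASTQ).
        if (read.header.isEmpty() || read.header.charAt(0) != '@') {
            return Optional.empty();
        }
        // 2. Bases and quality scores must have the same length.
        if (read.bases.length() != read.quality.length()) {
            return Optional.empty();
        }
        // 3. Reads below the minimum length are removed.
        if (minLen != null && read.bases.length() < minLen) {
            return Optional.empty();
        }
        // 4. Reads above the maximum length are trimmed (assumed: keep the prefix).
        if (maxLen != null && read.bases.length() > maxLen) {
            return Optional.of(new Read(
                    read.header,
                    read.bases.substring(0, maxLen),
                    read.quality.substring(0, maxLen)));
        }
        return Optional.of(read);
    }
}
```

A caller would write reads for which `validate` returns a value to the output file and route the rest to a separate file of invalid reads, mirroring the temp-directory behavior described above.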
Inherited methods:
- buildScript, executeTask, getSource, getWorkerScriptFunctions, isValidInputModule, markStatus, moduleComplete, moduleFailed
- buildScriptForPairedReads, checkDependencies, getJobParams, getMainScript, getRuntimeParams, getScriptDir, getScriptErrors, getTimeout, hasScripts
- cacheInputFiles, compareTo, equals, findModuleInputFiles, getFileCache, getID, getInputFiles, getModuleDir, getOutputDir, getPostRequisiteModules, getPreRequisiteModules, getTempDir, init, toString, validateFileNameUnique
- clone, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
- buildScript, buildScriptForPairedReads, getJobParams, getMainScript, getScriptDir, getScriptErrors, getTimeout, getWorkerScriptFunctions
- checkDependencies, executeTask, getID, getInputFiles, getModuleDir, getOutputDir, getPostRequisiteModules, getPreRequisiteModules, getTempDir, init, isValidInputModule

public static final String NUM_VALID_READS
Column name that holds number of valid reads per sample: "Num_Valid_Reads"

protected static final String INPUT_SEQ_MAX
Config Integer property "seqFileValidator.seqMaxLen" defines the maximum number of bases per read.

protected static final String INPUT_SEQ_MIN
Config Integer property "seqFileValidator.seqMinLen" defines the minimum number of bases per read.

protected static final String REQUIRE_EUQL_NUM_PAIRS
Config Boolean property "seqFileValidator.requireEqualNumPairs" determines if module requires equal number of forward and reverse reads (simple check).

public void cleanUp()
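        throws Exception
Set "Num_Valid_Reads" as the number of reads field.
Specified by: cleanUp in interface BioModule
Overrides: cleanUp in class BioModuleImpl
Throws: Exception - thrown if any runtime error occurs

public List<File> getSeqFiles(Collection<File> files) throws Exception
Description copied from interface: SeqModule
Return only sequence files for sample IDs found in the metadata file. If Config."metadata.required" = "Y", an error is thrown listing the files that cannot be matched to a metadata row.
Specified by: getSeqFiles in interface SeqModule
Parameters: files - Module input files
Throws: Exception - if no input files are found

public String getSummary() throws Exception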
Specified by: getSummary in interface BioModule
Overrides: getSummary in class ScriptModuleImpl
Throws: Exception - if any error occurs

public void runModule()
        throws Exception
Cache sampleIds to compare to validated sampleIds post-processing. Calls:
- validateFile(File, Integer) for each input file.
- removeBadFiles() to remove empty files (cases where all reads fail validation).
- verifyPairedSeqs() if module input files are paired read files.
- MetaUtil.addColumn(String, Map, File, boolean)
Specified by: runModule in interface JavaModule
Overrides: runModule in class JavaModuleImpl
Throws: Exception - thrown if any runtime error occurs

protected void removeBadFiles()
        throws Exception
Throws: Exception - if errors occur

protected void validateFile(File file, Integer fileCount) throws Exception
Parameters: file - Sequence file; fileCount - Integer count
Throws: Exception - if I/O errors occur while processing sequence files

protected void verifyPairedSeqs()
        throws Exception
Throws: Exception - if validations fail or errors occur
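
The read-count comparison that verifyPairedSeqs performs when "seqFileValidator.requireEqualNumPairs" = "Y" can be illustrated with a short sketch. This is not the module's code: the class `PairedReadCountCheck` and its methods are hypothetical, and it assumes uncompressed FASTQ text with exactly four lines per read.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical sketch (not BioLockJ code): compare read counts of a
// forward and a reverse FASTQ file, given here as in-memory text.
class PairedReadCountCheck {

    // Counts reads, assuming each FASTQ record spans exactly four lines.
    static long countReads(String fastqText) {
        long lines = 0;
        try (BufferedReader reader = new BufferedReader(new StringReader(fastqText))) {
            while (reader.readLine() != null) {
                lines++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (lines % 4 != 0) {
            throw new IllegalStateException("Incomplete FASTQ record: " + lines + " lines");
        }
        return lines / 4;
    }

    // True when the forward and reverse inputs hold the same number of reads.
    static boolean haveEqualReadCounts(String forwardFastq, String reverseFastq) {
        return countReads(forwardFastq) == countReads(reverseFastq);
    }
}
```

In a real pipeline the same counting loop would wrap a file reader instead of a `StringReader`, and a mismatch would be reported as a validation failure.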