public class RarefySeqs extends JavaModuleImpl implements SeqModule
Modifier and Type | Field and Description |
---|---|
protected static String | INPUT_RAREFYING_MAX: Config property "rarefySeqs.max" defines the maximum number of reads per file |
protected static String | INPUT_RAREFYING_MIN: Config property "rarefySeqs.min" defines the minimum number of reads per file |
static String | NUM_RAREFIED_READS: Metadata column name for the column that holds the number of rarefied reads per sample: "Num_Rarefied_Reads" |
Fields inherited from supertypes:
BLJ_OPTIONS
GZIP_EXT, LOG_EXT, PDF_EXT, RETURN, SH_EXT, TAB_DELIM, TSV_EXT, TXT_EXT
SCRIPT_BATCH_SIZE, SCRIPT_DEFAULT_HEADER, SCRIPT_NUM_THREADS, SCRIPT_PERMISSIONS, SCRIPT_TIMEOUT
MAIN_SCRIPT_PREFIX, OUTPUT_DIR, TEMP_DIR
Constructor and Description |
---|
RarefySeqs() |
Modifier and Type | Method and Description |
---|---|
protected void | buildRarefiedFile(File input, List<Long> indexes): Build the rarefied file for the input file, keeping only the given indexes |
void | checkDependencies(): Validate module dependencies. Config.INPUT_RAREFYING_MIN must be a non-negative integer; Config.INPUT_RAREFYING_MAX must be a positive integer that is greater than or equal to Config.INPUT_RAREFYING_MIN (if defined) |
void | cleanUp(): Set "Num_Rarefied_Reads" as the number of reads field |
List<String> | getPreRequisiteModules(): This method always requires a prerequisite module with a "number of reads" count, such as RegisterNumReads |
List<File> | getSeqFiles(Collection<File> files): Return only sequence files for sample IDs found in the metadata file. If Config."metadata.required" = "Y", an error is thrown to list the files that cannot be matched to a metadata row |
String | getSummary(): Produce a summary message with the min, max, mean, and median number of reads |
protected void | rarefy(File seqFile): Build the rarefied file if too many seqs are found, or add a file with too few seqs to the list of bad samples |
void | runModule(): For each file whose number of reads falls outside the Config.INPUT_RAREFYING_MIN and Config.INPUT_RAREFYING_MAX values, generate a new sequence file from a shuffled list of its sequences |
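The selection step described for rarefy and runModule can be pictured with the minimal sketch below. It is not the module's actual code: the class RarefyIndexSketch and the method chooseIndexes are hypothetical names, and keeping every read of a file that is already within bounds is an assumption drawn from the summaries above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.Random;

// Hypothetical helper, not part of BioLockJ: picks which read indexes to keep.
final class RarefyIndexSketch {

    /**
     * Returns an empty Optional when the file has fewer than min reads
     * (the caller would record it as a bad sample). Otherwise returns the
     * sorted indexes of the reads to keep, at most max of them.
     */
    static Optional<List<Long>> chooseIndexes(long numReads, long min, long max, Random rng) {
        if (numReads < min) {
            return Optional.empty();
        }
        List<Long> indexes = new ArrayList<>();
        for (long i = 0; i < numReads; i++) {
            indexes.add(i);
        }
        if (numReads <= max) {
            return Optional.of(indexes); // assumption: in-range files keep all reads
        }
        // Shuffle, keep the first max entries, then restore file order.
        Collections.shuffle(indexes, rng);
        List<Long> keep = new ArrayList<>(indexes.subList(0, Math.toIntExact(max)));
        Collections.sort(keep);
        return Optional.of(keep);
    }
}
```

Sorting the kept indexes lets a later pass copy the selected reads in a single forward scan of the sequence file.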
Methods inherited from supertypes:
buildScript, executeTask, getSource, getWorkerScriptFunctions, isValidInputModule, markStatus, moduleComplete, moduleFailed
buildScriptForPairedReads, getJobParams, getMainScript, getRuntimeParams, getScriptDir, getScriptErrors, getTimeout, hasScripts
cacheInputFiles, compareTo, equals, findModuleInputFiles, getFileCache, getID, getInputFiles, getModuleDir, getOutputDir, getPostRequisiteModules, getTempDir, init, toString, validateFileNameUnique
clone, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
buildScript, buildScriptForPairedReads, getJobParams, getMainScript, getScriptDir, getScriptErrors, getTimeout, getWorkerScriptFunctions
executeTask, getID, getInputFiles, getModuleDir, getOutputDir, getPostRequisiteModules, getTempDir, init, isValidInputModule
public static final String NUM_RAREFIED_READS
Metadata column name for the column that holds the number of rarefied reads per sample: "Num_Rarefied_Reads"

protected static final String INPUT_RAREFYING_MAX
Config property "rarefySeqs.max" defines the maximum number of reads per file

protected static final String INPUT_RAREFYING_MIN
Config property "rarefySeqs.min" defines the minimum number of reads per file

public void checkDependencies() throws Exception
Validate module dependencies:
- Config.INPUT_RAREFYING_MIN is a non-negative integer
- Config.INPUT_RAREFYING_MAX is a positive integer that is greater than or equal to Config.INPUT_RAREFYING_MIN (if defined)
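The two rules above could be checked along the following lines. This is a sketch only: RarefyBoundsSketch and its methods are hypothetical names, plain java.util.Properties stands in for the BioLockJ Config class, and treating an undefined property as "no limit" is an assumption.

```java
import java.util.Properties;

// Hypothetical helper, not part of BioLockJ: mirrors the documented validation rules.
final class RarefyBoundsSketch {
    static final String MIN_PROP = "rarefySeqs.min";
    static final String MAX_PROP = "rarefySeqs.max";

    /** Returns the parsed integer, or null if the property is undefined or blank. */
    static Integer parse(Properties props, String name) {
        String raw = props.getProperty(name);
        if (raw == null || raw.trim().isEmpty()) {
            return null; // assumption: an undefined property means no limit
        }
        try {
            return Integer.valueOf(raw.trim());
        } catch (NumberFormatException ex) {
            throw new IllegalArgumentException(name + " must be an integer: " + raw, ex);
        }
    }

    /** Throws IllegalArgumentException if either documented rule is violated. */
    static void validate(Properties props) {
        Integer min = parse(props, MIN_PROP);
        Integer max = parse(props, MAX_PROP);
        if (min != null && min < 0) {
            throw new IllegalArgumentException(MIN_PROP + " must be a non-negative integer");
        }
        if (max != null && max < 1) {
            throw new IllegalArgumentException(MAX_PROP + " must be a positive integer");
        }
        if (min != null && max != null && max < min) {
            throw new IllegalArgumentException(MAX_PROP + " must be >= " + MIN_PROP);
        }
    }
}
```

With this sketch, a configuration of rarefySeqs.min=10 and rarefySeqs.max=100 passes, while rarefySeqs.min=50 with rarefySeqs.max=10 is rejected.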
Specified by: checkDependencies in interface BioModule
Overrides: checkDependencies in class ScriptModuleImpl
Throws: Exception - thrown if missing or invalid dependencies are found

public void cleanUp() throws Exception
Set "Num_Rarefied_Reads" as the number of reads field.
Specified by: cleanUp in interface BioModule
Overrides: cleanUp in class BioModuleImpl
Throws: Exception - thrown if any runtime error occurs

public List<String> getPreRequisiteModules() throws Exception
This method always requires a prerequisite module with a "number of reads" count, such as RegisterNumReads. If paired reads are found, it also returns a 2nd module: PearMergeReads.
Specified by: getPreRequisiteModules in interface BioModule
Overrides: getPreRequisiteModules in class BioModuleImpl
Throws: Exception - if invalid Class names are returned as prerequisites

public List<File> getSeqFiles(Collection<File> files) throws Exception
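Description copied from interface: SeqModule
Return only sequence files for sample IDs found in the metadata file. If Config."metadata.required" = "Y", an error is thrown to list the files that cannot be matched to a metadata row.
Specified by: getSeqFiles in interface SeqModule
Parameters: files - Module input files
Throws: Exception - if no input files are found

public String getSummary() throws Exception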
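Produce a summary message with the min, max, mean, and median number of reads.
Specified by: getSummary in interface BioModule
Overrides: getSummary in class ScriptModuleImpl
Throws: Exception - if any error occurs

public void runModule() throws Exception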
For each file whose number of reads falls outside the Config.INPUT_RAREFYING_MIN and Config.INPUT_RAREFYING_MAX values, generate a new sequence file from a shuffled list of its sequences.
Specified by: runModule in interface JavaModule
Overrides: runModule in class JavaModuleImpl
Throws: Exception - thrown if any runtime error occurs

protected void buildRarefiedFile(File input, List<Long> indexes) throws Exception
Build the rarefied file for the input file, keeping only the given indexes.
Parameters:
input - Sequence file
indexes - List of indexes to keep
Throws: Exception - if unable to build rarefied file
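To make "keeping only the given indexes" concrete, here is one possible sketch. It assumes 4-line FASTQ records and 0-based record indexes, neither of which this page confirms, and it reads from a Reader and writes to a Writer rather than Files so the example stays self-contained. RarefiedFileSketch and copyRecords are hypothetical names, not BioLockJ API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of BioLockJ: copies only the selected records.
final class RarefiedFileSketch {
    // Assumption: input is FASTQ with exactly 4 lines per record.
    static final int LINES_PER_FASTQ_RECORD = 4;

    /** Copies records whose 0-based index is in indexes; returns the number of records written. */
    static long copyRecords(Reader in, Writer out, List<Long> indexes) throws IOException {
        Set<Long> keep = new HashSet<>(indexes);
        BufferedReader reader = new BufferedReader(in);
        long lineNum = 0;
        long written = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            long record = lineNum / LINES_PER_FASTQ_RECORD;
            if (keep.contains(record)) {
                out.write(line);
                out.write('\n');
                // Count the record once its last line has been written.
                if (lineNum % LINES_PER_FASTQ_RECORD == LINES_PER_FASTQ_RECORD - 1) {
                    written++;
                }
            }
            lineNum++;
        }
        out.flush();
        return written;
    }
}
```

A caller working with files could wrap the input and output paths with Files.newBufferedReader and Files.newBufferedWriter and pass them to copyRecords.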