public class ImportMetadata extends BioModuleImpl
Inherited fields:
GZIP_EXT, LOG_EXT, PDF_EXT, RETURN, SH_EXT, TAB_DELIM, TSV_EXT, TXT_EXT
MAIN_SCRIPT_PREFIX, OUTPUT_DIR, TEMP_DIR
Constructor and Description |
---|
ImportMetadata() |
Modifier and Type | Method and Description |
---|---|
protected File | buildNewMetadataFile(): Create a simple metadata file in the module output directory, with only the 1st column populated with Sample IDs. |
void | checkDependencies(): If restarting or running a direct pipeline, execute the cleanup for completed modules. |
void | cleanUp(): Verify the metadata fields configured for R reports. |
void | executeTask(): If Config."metadata.filePath" is undefined, build a new metadata file with only 1 column of sample IDs. |
protected String | formatMetaId(String sampleIdColumnName): Format the metadata ID to remove problematic invisible characters (particularly in converted Excel files). |
protected String | getQuotedValue(String val): The member variable quotedText caches the input held within a quoted block. |
protected TreeSet<String> | getSampleIds(): Extract the sample IDs from the file names with SeqUtil.getSampleId(String). |
String | getSummary(): The metadata file can be updated several times during pipeline execution. |
protected boolean | inQuotes(String val): Method called each time a line from metadata contains the Config."metadata.columnDelim". |
protected String | parseRow(String line, boolean isHeader): Method called to parse a row from the metadata file, where Config."metadata.columnDelim" separates columns. |
protected void | verifyAllRowsMapToSeqFile(List<File> files): Verify every row (every Sample ID) maps to a sequence file. |
protected static void | verifyHeader(String cell, List<String> colNames, int colNum): Verify column headers are not null and unique. |
Methods inherited from class BioModuleImpl: cacheInputFiles, compareTo, equals, findModuleInputFiles, getFileCache, getID, getInputFiles, getModuleDir, getOutputDir, getPostRequisiteModules, getPreRequisiteModules, getTempDir, init, isValidInputModule, toString, validateFileNameUnique
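The summary describes formatMetaId only as removing "problematic invisible characters" (particularly from converted Excel files). The Java sketch below shows one way to do that. The exact character set it strips (byte order mark, zero-width space and joiners, non-breaking space) is an assumption, not the list the class actually uses, and MetaIdFormatter is a hypothetical name.

```java
import java.util.regex.Pattern;

final class MetaIdFormatter {

    // ASSUMPTION: the "invisible" characters to strip. This set (byte order
    // mark U+FEFF, zero-width space/non-joiner/joiner U+200B through U+200D,
    // and non-breaking space U+00A0) is illustrative, not taken from ImportMetadata.
    private static final Pattern INVISIBLE =
            Pattern.compile("[\\uFEFF\\u200B-\\u200D\\u00A0]");

    private MetaIdFormatter() {
    }

    /** Removes invisible characters, then trims ordinary surrounding whitespace. */
    static String formatMetaId(String sampleIdColumnName) {
        if (sampleIdColumnName == null) {
            return null; // nothing to clean
        }
        return INVISIBLE.matcher(sampleIdColumnName).replaceAll("").trim();
    }

    public static void main(String[] args) {
        // A header cell as it might look after an Excel export: BOM prefix, NBSP suffix.
        String raw = "\uFEFFSampleID\u00A0";
        System.out.println("[" + formatMetaId(raw) + "]"); // prints [SampleID]
    }
}
```

The regex runs before String.trim() because trim() removes only characters up to U+0020. It would leave a trailing non-breaking space in place.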
public void checkDependencies() throws Exception
If restarting or running a direct pipeline, execute the cleanup for completed modules.
Specified by: checkDependencies in interface BioModule
Overrides: checkDependencies in class BioModuleImpl
Throws: Exception - thrown if missing or invalid dependencies are found

public void cleanUp() throws Exception
Verify the metadata fields configured for R reports.
Specified by: cleanUp in interface BioModule
Overrides: cleanUp in class BioModuleImpl
Throws: Exception - thrown if any runtime error occurs

public void executeTask() throws Exception
If Config."metadata.filePath" is undefined, build a new metadata file with only 1 column of sample IDs. Otherwise, import the "metadata.filePath" file and call MetaUtil.refreshCache() to validate, format, and cache the metadata as a tab-delimited text file.
Specified by: executeTask in interface BioModule
Overrides: executeTask in class BioModuleImpl
Throws: Exception - thrown if the module is unable to complete its task

public String getSummary() throws Exception
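The metadata file can be updated several times during pipeline execution. … MetaUtil).
Specified by: getSummary in interface BioModule
Overrides: getSummary in class BioModuleImpl
Throws: Exception - if any error occurs

protected File buildNewMetadataFile() throws Exception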
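Create a simple metadata file in the module output directory, with only the 1st column populated with Sample IDs.
Throws: Exception - if unable to build the new file due to invalid parameters or I/O errors

protected String formatMetaId(String sampleIdColumnName)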
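Format the metadata ID to remove problematic invisible characters (particularly in converted Excel files).
Parameters: sampleIdColumnName - current name of the metadata Sample ID column

protected String getQuotedValue(String val)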
The member variable quotedText caches the input held within a quoted block, which may include the Config."metadata.columnDelim"; inside the block, the delimiter is read as a character. If val closes an open quoted block, the entire quoted block is returned (ending with Config."metadata.columnDelim", as all cells do) and the quotedText cache is cleared.
Parameters: val - parameter to evaluate

protected TreeSet<String> getSampleIds() throws Exception
Extract the sample IDs from the file names with SeqUtil.getSampleId(String).
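getSampleIds delegates the naming rule to SeqUtil.getSampleId(String), and those rules are not documented here. The sketch below assumes a hypothetical rule: the sample ID is the file name up to its first '.'. Under that rule it shows why a TreeSet fits the return type, because IDs come back sorted and de-duplicated. It also shows the set difference behind verifyAllRowsMapToSeqFile, which reports metadata Sample IDs that have no sequence file. SampleIdSketch and its methods are hypothetical.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class SampleIdSketch {

    // HYPOTHETICAL stand-in for SeqUtil.getSampleId(String): the sample ID is
    // the file name up to its first '.', e.g. "s1.fastq.gz" -> "s1".
    // The real BioLockJ rules are NOT reproduced here.
    static String sampleIdFromFileName(String fileName) {
        int dot = fileName.indexOf('.');
        return dot > 0 ? fileName.substring(0, dot) : fileName;
    }

    // Idea behind getSampleIds(): a TreeSet keeps IDs sorted and drops
    // duplicates (e.g. two files that belong to the same sample).
    static TreeSet<String> getSampleIds(List<String> fileNames) {
        TreeSet<String> ids = new TreeSet<>();
        for (String name : fileNames) {
            ids.add(sampleIdFromFileName(name));
        }
        return ids;
    }

    // Idea behind verifyAllRowsMapToSeqFile(): every metadata Sample ID
    // without a matching sequence file is collected as unmapped.
    static Set<String> unmappedIds(Set<String> metadataIds, Set<String> seqFileIds) {
        Set<String> missing = new TreeSet<>(metadataIds);
        missing.removeAll(seqFileIds);
        return missing;
    }

    public static void main(String[] args) {
        TreeSet<String> ids = getSampleIds(List.of("s2.fastq", "s1.fastq.gz", "s1.fasta"));
        System.out.println(ids); // prints [s1, s2]
        System.out.println(unmappedIds(Set.of("s1", "s3"), ids)); // prints [s3]
    }
}
```

In the class itself, a non-empty unmapped set corresponds to the documented ConfigViolationException.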
protected boolean inQuotes(String val)
Method called each time a line from metadata contains the Config."metadata.columnDelim". If the Config."metadata.columnDelim" is encountered within a quoted block, it should be interpreted as a character (not as a column delimiter).
Parameters: val - parameter to evaluate

protected String parseRow(String line, boolean isHeader) throws Exception
Method called to parse a row from the metadata file, where Config."metadata.columnDelim" separates columns. The quotedText member variable serves as a cache to build cell values contained in quotes, which may include the Config."metadata.columnDelim" as a standard character. Each row increments the rowNum member variable. When the header row is processed, colNames caches the field names.
Parameters:
line - a line read from the metadata file
isHeader - true only for the first row
Throws: Exception - if required Config values are missing or invalid

protected void verifyAllRowsMapToSeqFile(List<File> files) throws Exception
Verify every row (every Sample ID) maps to a sequence file.
Parameters: files - list of sequence files
Throws:
ConfigViolationException - if unmapped Sample IDs are found
Exception - if other errors occur

protected static void verifyHeader(String cell, List<String> colNames, int colNum) throws Exception
Verify column headers are not null and unique.
Parameters:
cell - value of the header column name
colNames - a list of column names read so far
colNum - included for reference in the error message if needed
Throws: Exception - if a column header is null or not unique
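Together, parseRow, inQuotes, and getQuotedValue let a quoted cell contain Config."metadata.columnDelim" as an ordinary character. The sketch below does not use the class's quotedText cache. It swaps in a simpler single-pass state machine that toggles on each double quote. This is an assumed approach: it takes '"' as the quote character, drops the quotes from the cell, and does not handle escaped quotes. The sketch also applies the header checks verifyHeader documents (non-null and unique), plus a non-blank check that is an extra assumption. Here a bad header raises IllegalArgumentException rather than a checked Exception. MetaRowParser and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class MetaRowParser {

    // Splits one metadata line on the column delimiter. A delimiter inside a
    // double-quoted block is kept as an ordinary character.
    // ASSUMPTION: '"' opens/closes quotes, quotes are dropped from the cell,
    // and escaped quotes ("") are not supported.
    static List<String> parseRow(String line, char delim) {
        List<String> cells = new ArrayList<>();
        StringBuilder cell = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;       // open or close a quoted block
            } else if (c == delim && !inQuotes) {
                cells.add(cell.toString()); // delimiter outside quotes ends the cell
                cell.setLength(0);
            } else {
                cell.append(c);             // includes delimiters inside quotes
            }
        }
        cells.add(cell.toString());         // last cell has no trailing delimiter
        return cells;
    }

    // Header checks as documented for verifyHeader: not null and unique.
    // The blank check is an extra assumption. colNum is 1-based here.
    static void verifyHeader(String cell, List<String> colNames, int colNum) {
        if (cell == null || cell.isBlank()) {
            throw new IllegalArgumentException("Column " + colNum + " has an empty header");
        }
        if (colNames.contains(cell)) {
            throw new IllegalArgumentException("Column " + colNum + " duplicates header: " + cell);
        }
    }

    // Parses the header row, verifying each name before caching it in colNames.
    static List<String> parseHeader(String line, char delim) {
        List<String> colNames = new ArrayList<>();
        for (String cell : parseRow(line, delim)) {
            verifyHeader(cell, colNames, colNames.size() + 1);
            colNames.add(cell);
        }
        return colNames;
    }

    public static void main(String[] args) {
        System.out.println(parseHeader("SampleID,Site,Notes", ',')); // prints [SampleID, Site, Notes]
        List<String> row = parseRow("s1,\"gut, distal\",ok", ',');
        System.out.println(row.size() + " cells; 2nd = " + row.get(1)); // prints 3 cells; 2nd = gut, distal
    }
}
```

A single boolean replaces the cached partial cell because the whole line is available at once. The class's quotedText cache works on delimiter-split pieces instead, so it has to carry state across them.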