public class ImportMetadata extends BioModuleImpl
GZIP_EXT, LOG_EXT, PDF_EXT, RETURN, SH_EXT, TAB_DELIM, TSV_EXT, TXT_EXT
MAIN_SCRIPT_PREFIX, OUTPUT_DIR, TEMP_DIR

| Constructor and Description |
|---|
| ImportMetadata() |

| Modifier and Type | Method and Description |
|---|---|
| protected File | buildNewMetadataFile() Create a simple metadata file in the module output directory, with only the 1st column populated with Sample IDs. |
| void | checkDependencies() If restarting or running a direct pipeline execute the cleanup for completed modules. |
| void | cleanUp() Verify the metadata fields configured for R reports. |
| void | executeTask() If Config."metadata.filePath" is undefined, build a new metadata file with only 1 column of sample IDs. |
| protected String | formatMetaId(String sampleIdColumnName) Format the metadata ID to remove problematic invisible characters (particularly converted Excel files). |
| protected String | getQuotedValue(String val) The member variable quotedText caches the input held within a quoted block. |
| protected TreeSet<String> | getSampleIds() Extract the sample IDs from the file names with SeqUtil.getSampleId(String) |
| String | getSummary() The metadata file can be updated several times during pipeline execution. |
| protected boolean | inQuotes(String val) Method called each time a line from metadata contains the Config."metadata.columnDelim". |
| protected String | parseRow(String line, boolean isHeader) Method called to parse a row from the metadata file, where Config."metadata.columnDelim" separates columns. |
| protected void | verifyAllRowsMapToSeqFile(List<File> files) Verify every row (every Sample ID) maps to a sequence file |
| protected static void | verifyHeader(String cell, List<String> colNames, int colNum) Verify column headers are not null and unique |
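
The quote handling summarized above for inQuotes, getQuotedValue, and parseRow can be illustrated with a small standalone sketch. This is not the ImportMetadata implementation: the class name QuotedRowSketch and the method splitRow are hypothetical, the delimiter is passed in directly rather than read from Config."metadata.columnDelim", and the sketch assumes double quotes mark a quoted block and are kept as part of the cell value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of BioLockJ.
final class QuotedRowSketch {

    /**
     * Split one metadata row into cells. A delimiter found inside a
     * double-quoted block is kept as an ordinary character.
     */
    static List<String> splitRow(String line, char delim) {
        List<String> cells = new ArrayList<>();
        // Plays a role similar to the quotedText cache: it accumulates the
        // current cell, including any delimiters seen inside quotes.
        StringBuilder cell = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes; // open or close a quoted block
                cell.append(c);       // assumption: quotes stay in the value
            } else if (c == delim && !inQuotes) {
                cells.add(cell.toString()); // delimiter ends the cell
                cell.setLength(0);          // clear the cache
            } else {
                cell.append(c); // regular character, or delimiter inside quotes
            }
        }
        cells.add(cell.toString()); // last cell has no trailing delimiter
        return cells;
    }

    public static void main(String[] args) {
        // The tab inside the quoted block does not start a new column.
        List<String> cells = splitRow("S1\t\"gut\tstool\"\t42", '\t');
        System.out.println(cells.size() + " cells");
    }
}
```

A single character scan keeps the sketch short; the documented methods instead evaluate delimiter-separated tokens one at a time, which is why they need a member-level cache between calls.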
cacheInputFiles, compareTo, equals, findModuleInputFiles, getFileCache, getID, getInputFiles, getModuleDir, getOutputDir, getPostRequisiteModules, getPreRequisiteModules, getTempDir, init, isValidInputModule, toString, validateFileNameUnique

public void checkDependencies()
throws Exception
Description copied from class: BioModuleImpl
If restarting or running a direct pipeline execute the cleanup for completed modules.
Specified by:
checkDependencies in interface BioModule
Overrides:
checkDependencies in class BioModuleImpl
Throws:
Exception - thrown if missing or invalid dependencies are found

public void cleanUp()
throws Exception
Verify the metadata fields configured for R reports.
Specified by:
cleanUp in interface BioModule
Overrides:
cleanUp in class BioModuleImpl
Throws:
Exception - thrown if any runtime error occurs

public void executeTask()
throws Exception
If Config."metadata.filePath" is undefined, build a new metadata file with only 1 column of sample IDs. Otherwise, import the "metadata.filePath" file and call MetaUtil.refreshCache() to validate, format, and cache metadata as a tab delimited text file.
Specified by:
executeTask in interface BioModule
Overrides:
executeTask in class BioModuleImpl
Throws:
Exception - thrown if the module is unable to complete its task

public String getSummary() throws Exception
The metadata file can be updated several times during pipeline execution. [...] MetaUtil).
Specified by:
getSummary in interface BioModule
Overrides:
getSummary in class BioModuleImpl
Throws:
Exception - if any error occurs

protected File buildNewMetadataFile() throws Exception
Create a simple metadata file in the module output directory, with only the 1st column populated with Sample IDs.
Throws:
Exception - if unable to build the new file due to invalid params or I/O errors

protected String formatMetaId(String sampleIdColumnName)
Format the metadata ID to remove problematic invisible characters (particularly converted Excel files).
Parameters:
sampleIdColumnName - Current name of metadata Sample ID column

protected String getQuotedValue(String val)
The member variable quotedText caches the input held within a quoted block. [...] Config."metadata.columnDelim" which will be read as a character. If val closes an open quoted block, the entire quoted block is returned (ending with Config."metadata.columnDelim" as all cells do) and the quotedText cache is cleared.
Parameters:
val - Parameter to evaluate

protected TreeSet<String> getSampleIds() throws Exception
Extract the sample IDs from the file names with SeqUtil.getSampleId(String)

protected boolean inQuotes(String val)
Method called each time a line from metadata contains the Config."metadata.columnDelim". If the Config."metadata.columnDelim" is encountered within a quoted block, it should be interpreted as a character (not interpreted as a column delimiter).
Parameters:
val - Parameter to evaluate

protected String parseRow(String line, boolean isHeader) throws Exception
Method called to parse a row from the metadata file, where Config."metadata.columnDelim" separates columns. The quotedText member variable serves as a cache to build cell values contained in quotes, which may include the Config."metadata.columnDelim" as a standard character. Each row increments the rowNum member variable. When the header row is processed, colNames caches the field names.
Parameters:
line - read from metadata file
isHeader - is true for only the first row
Throws:
Exception - if required Config values are missing or invalid

protected void verifyAllRowsMapToSeqFile(List<File> files) throws Exception
Verify every row (every Sample ID) maps to a sequence file
Parameters:
files - List of sequence files
Throws:
ConfigViolationException - if unmapped Sample IDs are found
Exception - if other errors occur

protected static void verifyHeader(String cell, List<String> colNames, int colNum) throws Exception
Verify column headers are not null and unique
Parameters:
cell - value of the header column name
colNames - a list of column names read so far
colNum - included for reference in error message if needed
Throws:
Exception - if a column header is null or not unique
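
The contract documented for verifyHeader (headers must be non-null and unique, with colNum used only for the error message) could look roughly like the sketch below. The class HeaderCheckSketch and the helper readHeader are hypothetical names. Two choices are assumptions, not documented behavior: blank headers are treated like null, and IllegalArgumentException stands in for whatever exception the real method raises.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of BioLockJ.
final class HeaderCheckSketch {

    /** Reject a header cell that is null, blank (assumption), or already in colNames. */
    static void verifyHeader(String cell, List<String> colNames, int colNum) {
        if (cell == null || cell.trim().isEmpty()) {
            throw new IllegalArgumentException(
                "Column #" + colNum + " has a null or empty header");
        }
        if (colNames.contains(cell)) {
            throw new IllegalArgumentException(
                "Column #" + colNum + " repeats the header name: " + cell);
        }
    }

    /** Check each header cell in order, caching accepted names as it goes. */
    static List<String> readHeader(String[] cells) {
        List<String> colNames = new ArrayList<>();
        for (int i = 0; i < cells.length; i++) {
            verifyHeader(cells[i], colNames, i + 1); // 1-based column number
            colNames.add(cells[i]);
        }
        return colNames;
    }

    public static void main(String[] args) {
        System.out.println(readHeader(new String[] {"SampleID", "Age", "Site"}));
        try {
            readHeader(new String[] {"SampleID", "Age", "Age"});
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Passing the list of names read so far keeps the check stateless, which fits a static method: the caller owns the accumulating colNames list.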