WormBase-Caltech Weekly Calls
October 1, 2020
Gene association file formats on FTP
- For example, current production release ONTOLOGY directory: ftp://ftp.wormbase.org/pub/wormbase/releases/current-production-release/ONTOLOGY/
- Our association files use the extension "*.wb"; is this useful or necessary?
- Other than referring to GAF in the header, it isn't clear to users what the columns refer to or what the column headers should be
- We could add a README file and/or convert to the new GAF 2.2 format, which would have a more expressive file header and possibly column headers(?)
- File headers could possibly link to the format specification page
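A minimal sketch of what a more expressive file header could look like, assuming we keep the GAF convention that lines starting with "!" are comments. The "!Format specification" and "!Columns" lines are our own illustration, not something the GAF spec defines, and the function name, generator name, and URL below are placeholders:

```python
# Column names as listed in the GAF 2.x specification (17 columns).
GAF_COLUMNS = [
    "DB", "DB Object ID", "DB Object Symbol", "Qualifier", "GO ID",
    "DB:Reference", "Evidence Code", "With (or) From", "Aspect",
    "DB Object Name", "DB Object Synonym", "DB Object Type", "Taxon",
    "Date", "Assigned By", "Annotation Extension", "Gene Product Form ID",
]


def build_header(generated_by, date_generated, spec_url):
    """Return header lines for an association file.

    Every line starts with "!" so that GAF-style parsers skip it.
    The last two lines are hypothetical additions for human readers.
    """
    return [
        "!gaf-version: 2.2",
        f"!generated-by: {generated_by}",
        f"!date-generated: {date_generated}",
        f"!Format specification: {spec_url}",           # assumption: free-text comment
        "!Columns: " + "\t".join(GAF_COLUMNS),          # assumption: free-text comment
    ]


if __name__ == "__main__":
    # Placeholder values; the real spec URL would go here.
    for line in build_header("WormBase", "2020-10-01", "https://example.org/gaf-spec"):
        print(line)
```

Whether downstream tools tolerate the extra comment lines would need to be checked before adopting this.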
Phenotype association file idiosyncrasy
- As we've discussed previously, the phenotype association file we provide is inconsistent in how it lists (or omits) references
- According to the GAF spec, column 6 is for reference and is required, whereas column 8 is "With (or) From" and is optional
- When we have a reference, the WBPaper ID is provided in column 6 and the WBVar ID or RNAi ID is provided in column 8
- However, when we have no reference (personal communication, e.g. from NBP allele submissions), the WBVar ID is instead put in column 6 (because we need something there), and column 8 is blank.
- This results in (1) column 6 having a mix of paper/reference IDs (good) and WBVar IDs (not good) and (2) WBVar IDs being split between columns 6 and 8, which makes the file tedious to parse
- Proposed solution: Can we come up with some type of reference object ID to associate with the personal communications (or any annotations currently lacking a formal reference)?
- With the proposed solution, we can always have a reference ID in column 6 (the intended purpose of the column) and WBVar IDs for alleles can always remain consistently in column 8
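To illustrate the parsing burden described above, here is a rough sketch of what a consumer of the current file has to do to recover the reference and the allele/RNAi ID from one row. The exact ID prefixes in the file (e.g. "WB_REF:" or "WB:") are assumptions, so the code only searches for the WBPaper/WBVar/WBRNAi patterns, and the function name is made up for this example:

```python
import re

# Patterns are assumptions about how IDs appear in the file.
PAPER_RE = re.compile(r"WBPaper\d+")
ALLELE_RE = re.compile(r"WBVar\d+|WBRNAi\d+")


def split_reference_and_allele(columns):
    """Return (paper_id, allele_or_rnai_id) for one tab-split row.

    Column 6 (index 5) holds the reference when there is one;
    otherwise it holds the WBVar ID and column 8 (index 7) is blank.
    """
    col6 = columns[5]
    col8 = columns[7] if len(columns) > 7 else ""
    paper = PAPER_RE.search(col6)
    # Look in column 8 first, then fall back to column 6.
    allele = ALLELE_RE.search(col8) or ALLELE_RE.search(col6)
    return (
        paper.group(0) if paper else None,
        allele.group(0) if allele else None,
    )


if __name__ == "__main__":
    # Made-up rows, for illustration only.
    with_paper = "WB\tWBGene00000001\taap-1\t\tWBPhenotype:0000001\tWB_REF:WBPaper00001234\tIMP\tWB:WBVar00000001\tP".split("\t")
    no_paper = "WB\tWBGene00000001\taap-1\t\tWBPhenotype:0000001\tWB:WBVar00000002\tIMP\t\tP".split("\t")
    print(split_reference_and_allele(with_paper))
    print(split_reference_and_allele(no_paper))
```

With the proposed reference object IDs, the fallback to column 6 would no longer be needed: column 6 would always yield a reference and column 8 would always yield the allele.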