Difference between revisions of "WormBase-Caltech Weekly Calls"
From WormBaseWiki
Revision as of 13:50, 1 October 2020
Previous Years
2020 Meetings
October 1, 2020
Gene association file formats on FTP
- Our association files are named "*.wb"; is this extension useful or necessary?
- Other than a mention of GAF in the file header, nothing tells users what the columns contain or what the column headers should be
- We could add a README file and/or convert to the new GAF 2.2 format which would have a more expressive file header and possibly column headers(?)
  - File headers could possibly link to the format specification page
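To illustrate the point about unclear columns, here is a minimal Python sketch of what a user has to do today: separate the "!" header lines from the data and hard-code the column labels. The labels are adapted from the 17 columns of the GAF specification and are an assumption about how the "*.wb" files are laid out; the function name `read_gaf_like` and the sample line (including all IDs and the evidence code in it) are hypothetical placeholders.

```python
import io

# Column labels adapted from the GAF specification.
# Assumption: the *.wb association files follow this 17-column layout.
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "term_id",
    "reference", "evidence_code", "with_or_from", "aspect",
    "db_object_name", "db_object_synonym", "db_object_type",
    "taxon", "date", "assigned_by", "annotation_extension",
    "gene_product_form_id",
]


def read_gaf_like(handle):
    """Split a GAF-like file into header lines and labelled data rows."""
    header_lines = []
    rows = []
    for line in handle:
        line = line.rstrip("\r\n")
        if not line:
            continue
        if line.startswith("!"):
            # GAF header/comment lines start with "!"
            header_lines.append(line)
            continue
        fields = line.split("\t")
        # Pad short rows so that every label has a value
        fields += [""] * (len(GAF_COLUMNS) - len(fields))
        rows.append(dict(zip(GAF_COLUMNS, fields)))
    return header_lines, rows


# Usage with a made-up line (all values are placeholders)
sample = (
    "!gaf-version: 2.0\n"
    "WB\tWBGene00000001\tabc-1\t\tWBPhenotype:0000001\t"
    "WBPaper00000001\tIMP\tWBVar00000001\n"
)
headers, rows = read_gaf_like(io.StringIO(sample))
print(headers)
print(rows[0]["reference"], rows[0]["with_or_from"])
```

A README or a more expressive file header would let users avoid maintaining a list like `GAF_COLUMNS` themselves.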
Phenotype association file idiosyncrasy
- As we've discussed previously, there is an oddity in how the phenotype association file we provide lists (or fails to list) references
- According to the GAF spec, column 6 is for reference and is required, whereas column 8 is "With (or) From" and is optional
- When we have a reference, the WBPaper ID is provided in column 6 and the WBVar ID or RNAi ID is provided in column 8
- However, when we have no reference (personal communication, e.g. from NBP allele submissions), the WBVar ID is instead put in column 6 (because we need something there), and column 8 is blank.
  - This results in (1) column 6 holding a mix of paper/reference IDs (good) and WBVar IDs (not good) and (2) WBVar IDs being split between columns 6 and 8, which makes the file tedious to parse
- Proposed solution: Can we come up with some type of reference object ID to associate with personal communications (or with any annotations currently lacking a formal reference)?
- With the proposed solution, column 6 would always hold a reference ID (the intended purpose of the column) and WBVar IDs for alleles would always stay in column 8
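As a rough illustration of the parsing workaround users need today, and of what the proposed layout would look like, the following Python sketch moves a WBVar ID found in column 6 into column 8 and puts a placeholder reference in column 6. The placeholder `WB_REF:personal_communication` is purely hypothetical (no such reference object exists yet), as are the function name and the sample IDs; the regular expression assumes WBVar IDs look like `WBVar` followed by digits, optionally with a `WB:`-style prefix.

```python
import re

# Assumption: a variation ID is "WBVar" plus digits, optionally prefixed (e.g. "WB:")
WBVAR_PATTERN = re.compile(r"(?:^|:)WBVar\d+$")

# Hypothetical placeholder; the real reference object ID is still to be decided
PLACEHOLDER_REF = "WB_REF:personal_communication"


def normalize_reference_columns(fields, placeholder_ref=PLACEHOLDER_REF):
    """Return a copy of a row with references in column 6 and WBVar IDs in column 8.

    `fields` is a list of column values; column 6 is fields[5], column 8 is fields[7].
    """
    fields = list(fields)  # do not modify the caller's row
    while len(fields) < 8:
        fields.append("")
    col6, col8 = fields[5], fields[7]
    # Current idiosyncrasy: no paper, so the WBVar ID sits in column 6 and column 8 is blank
    if WBVAR_PATTERN.search(col6) and not col8:
        fields[7] = col6
        fields[5] = placeholder_ref
    return fields


# Usage with made-up rows (all values are placeholders)
no_paper = ["WB", "WBGene00000001", "abc-1", "", "WBPhenotype:0000001",
            "WBVar00000001", "IMP", ""]
with_paper = ["WB", "WBGene00000001", "abc-1", "", "WBPhenotype:0000001",
              "WBPaper00000001", "IMP", "WBVar00000002"]
print(normalize_reference_columns(no_paper))
print(normalize_reference_columns(with_paper))
```

If the file itself adopted a reference object ID for personal communications, this kind of per-row special-casing would no longer be needed.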