Difference between revisions of "WormBase-Caltech Weekly Calls"
From WormBaseWiki
Revision as of 14:14, 1 October 2020
2020 Meetings
October 1, 2020
Gene association file formats on FTP
- Our association files have format "*.wb"; is this useful or necessary?
- Other than referring to GAF in the header, it isn't clear to users what the columns refer to or what the column headers should be
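- We could add a README file and/or convert to the new GAF 2.2 format (https://github.com/geneontology/geneontology.github.io/blob/issue-go-annotation-2917-gaf-2_2-doc/_docs/go-annotation-file-gaf-format-22.md), which would have a more expressive file header and possibly column headers(?)
  - File headers could possibly link to the format specification page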
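To illustrate what users currently have to work out for themselves, here is a minimal Python sketch that attaches column names to a GAF-style row. The column names follow the 17-column GAF 2.x layout; the function name `parse_gaf_line` is hypothetical, and whether our "*.wb" files match this layout exactly is an assumption.

```python
# Column names for the 17 tab-separated columns of a GAF 2.x file.
GAF_COLUMNS = [
    "DB", "DB Object ID", "DB Object Symbol", "Qualifier", "GO ID",
    "DB:Reference", "Evidence Code", "With (or) From", "Aspect",
    "DB Object Name", "DB Object Synonym", "DB Object Type", "Taxon",
    "Date", "Assigned By", "Annotation Extension", "Gene Product Form ID",
]


def parse_gaf_line(line):
    """Return a dict keyed by column name, or None for header/blank lines.

    Hypothetical helper: header lines in GAF files start with "!".
    """
    if line.startswith("!") or not line.strip():
        return None
    fields = line.rstrip("\n").split("\t")
    # Pad short rows so trailing optional columns appear as empty strings.
    fields += [""] * (len(GAF_COLUMNS) - len(fields))
    return dict(zip(GAF_COLUMNS, fields))
```

A README or header line listing these names would spare users from reconstructing the mapping by hand.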
Phenotype association file idiosyncrasy
- As we've discussed previously, the phenotype association file we provide is inconsistent in how it lists references (or omits them)
- According to the GAF spec, column 6 is for reference and is required, whereas column 8 is "With (or) From" and is optional
- When we have a reference, the WBPaper ID is provided in column 6 and the WBVar ID or RNAi ID is provided in column 8
- However, when we have no reference (personal communication, e.g. from NBP allele submissions), the WBVar ID is instead put in column 6 (because we need something there), and column 8 is blank.
- This results in (1) column 6 containing a mix of paper/reference IDs (good) and WBVar IDs (not good), and (2) WBVar IDs being split between columns 6 and 8, which makes the file tedious to parse
- Proposed solution: Can we come up with some type of reference object ID to associate with the personal communications (or any annotations currently lacking a formal reference)?
- With the proposed solution, we can always have a reference ID in column 6 (the intended purpose of the column) and WBVar IDs for alleles can always remain consistently in column 8
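The parsing workaround that the current layout forces on users can be sketched in Python as follows. This is an illustration only: the function name `split_reference_and_subject` is hypothetical, and matching on the substrings "WBVar" and "WBRNAi" is an assumption about how the IDs appear in the file.

```python
def split_reference_and_subject(col6, col8):
    """Return (reference, subject) from columns 6 and 8 of one row.

    Hypothetical helper. It undoes the current quirk so that a variation
    or RNAi ID always ends up in `subject`, whichever column held it.
    """
    def is_subject_id(value):
        # Assumption: variation and RNAi IDs contain these substrings.
        return "WBVar" in value or "WBRNAi" in value

    if is_subject_id(col6) and not col8:
        # No-reference case: the variation ID was placed in column 6.
        return None, col6
    return (col6 or None), (col8 or None)
```

If every annotation carried a reference ID in column 6, as proposed, this check would be unnecessary and the two columns could be read directly.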