Difference between revisions of "WormBase gene association file"

From WormBaseWiki

Revision as of 17:56, 2 February 2021

This page presents the current information about the WormBase gene association file. An archive of outdated/obsolete information about the WormBase gene association file is kept separately.

Gene Association File (GAF) format

The original Gene Association File (GAF) format was defined by the Gene Ontology Consortium to specify how gene associations to Gene Ontology (GO) terms are reported in a tab-delimited download file.

To view the (now deprecated) GAF 2.0 format specification, visit:

http://geneontology.org/docs/go-annotation-file-gaf-format-2.0/

To view the (now stale, but not quite deprecated) GAF 2.1 format specification, visit:

http://geneontology.org/docs/go-annotation-file-gaf-format-2.1/

To view the latest GAF 2.2 format specification, visit:

https://github.com/geneontology/geneontology.github.io/blob/issue-go-annotation-2917-gaf-2_2-doc/_docs/go-annotation-file-gaf-format-22.md

WormBase ontology gene association files

WormBase provides a gene association file for each of the ontologies that WormBase uses. Each row of the tab-delimited file records a single annotation associating a gene with an ontology term.

Current WormBase gene association files can always be found on the WormBase FTP site here:

ftp://ftp.wormbase.org/pub/wormbase/releases/current-production-release/ONTOLOGY/

Anatomy association file

The anatomy association file (associating genes to anatomical entities where the gene product has been reported to be expressed) has a general name like:

anatomy_association.WSXXX.wb

where WSXXX refers to the relevant release number. The file for the WS277 release of WormBase is called:

anatomy_association.WS277.wb

The current header for the anatomy association file is:

!gaf-version: 2.0
!Project_name: WormBase
!Contact Email: help@wormbase.org

The proposal (Feb 2021) is to update this header to:

!gaf-version: 2.0
!generated-by: WormBase
!date-generated: 2020-05-29
!project-URL: https://wormbase.org
!specification-URL: https://wiki.wormbase.org/index.php/WormBase_gene_association_file
!project-release: WS277
!Contact Email: help@wormbase.org
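
As a minimal sketch of how a consumer of the file might read these header lines, the Python function below collects every leading "!key: value" line into a dictionary. The name parse_gaf_header is hypothetical and not part of any WormBase tool. The sketch assumes that header lines precede all annotation rows and that each key ends at the first colon.

```python
def parse_gaf_header(lines):
    """Collect leading '!key: value' lines into a dict (hypothetical helper)."""
    header = {}
    for line in lines:
        if not line.startswith("!"):
            break  # assumption: header lines precede all annotation rows
        # Split on the first colon only, so URLs in the value stay intact.
        key, sep, value = line[1:].rstrip("\r\n").partition(":")
        if sep:  # ignore '!' comment lines that carry no 'key: value' pair
            header[key.strip()] = value.strip()
    return header


proposed = [
    "!gaf-version: 2.0",
    "!generated-by: WormBase",
    "!date-generated: 2020-05-29",
    "!project-URL: https://wormbase.org",
    "!specification-URL: https://wiki.wormbase.org/index.php/WormBase_gene_association_file",
    "!project-release: WS277",
    "!Contact Email: help@wormbase.org",
]
header = parse_gaf_header(proposed)
print(header["project-release"])  # prints WS277
```

Splitting on the first colon only is a deliberate choice here, because the project-URL and specification-URL values themselves contain colons.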

The WormBase anatomy association file generally follows the GAF 2.0 format with the following modifications:

  • The column 4 "Qualifier" will be one of four values specific to gene expression annotation: Certain, Uncertain, Partial, Enriched
  • Column 5 will report an anatomy term ID from the WormBase anatomy ontology, e.g. "WBbt:0003679"
  • Column 6 will display the reference for the annotation as, for example, "WB_REF:WBPaper00026980"; in some cases the reference is blank "WB_REF:"
  • Column 7 will default to the evidence code "IDA" for inferred from direct assay
  • Column 8 will display the identifier of the expression pattern annotation (e.g. "WB:Expr2275") or the expression cluster annotation (e.g. "WB:WBPaper00051039:germline_enriched")
  • Column 9 will default to "A" for anatomy (as opposed to one of three branches of GO, "P", "F", or "C")
  • Column 10 will always be blank
  • Column 11 will provide the gene's WormBase sequence name (e.g. "Y41E3.4") if not already used in column 3
  • Column 12 will always be "gene"
  • Column 13 will always be "taxon:6239"
  • Column 15 will always be "WB"
  • No column 16 or 17
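
The column rules above can be sketched as a simple row check in Python. This is an illustration only: check_anatomy_row is a hypothetical name, the gene ID "WBGene00000000", symbol "abc-1", and date in the sample row are made-up placeholders, and columns 1-3 and 14 are assumed to follow plain GAF 2.0 (database, object ID, symbol, date). Only the values described as fixed are enforced; columns 7 and 9 are described as defaults, so they are not checked.

```python
ANATOMY_QUALIFIERS = {"Certain", "Uncertain", "Partial", "Enriched"}


def check_anatomy_row(line):
    """Return a list of problems found in one anatomy association row."""
    cols = line.rstrip("\r\n").split("\t")
    if len(cols) != 15:  # columns 16 and 17 are not used
        return [f"expected 15 columns, found {len(cols)}"]
    problems = []
    if cols[3] not in ANATOMY_QUALIFIERS:
        problems.append(f"column 4: unexpected qualifier {cols[3]!r}")
    if not cols[4].startswith("WBbt:"):
        problems.append(f"column 5: not an anatomy term ID: {cols[4]!r}")
    if not cols[5].startswith("WB_REF:"):
        problems.append(f"column 6: reference lacks WB_REF: prefix: {cols[5]!r}")
    if cols[9] != "":
        problems.append("column 10: expected blank")
    if cols[11] != "gene":
        problems.append(f"column 12: expected 'gene', found {cols[11]!r}")
    if cols[12] != "taxon:6239":
        problems.append(f"column 13: expected 'taxon:6239', found {cols[12]!r}")
    if cols[14] != "WB":
        problems.append(f"column 15: expected 'WB', found {cols[14]!r}")
    return problems


# Illustrative row; gene ID, symbol, and date are placeholders.
fields = ["WB", "WBGene00000000", "abc-1", "Certain", "WBbt:0003679",
          "WB_REF:WBPaper00026980", "IDA", "WB:Expr2275", "A", "",
          "Y41E3.4", "gene", "taxon:6239", "20200529", "WB"]
print(check_anatomy_row("\t".join(fields)))  # prints []
```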

Development (life stage) association file

The development (life stage) association file (associating genes to life stages when the gene product has been reported to be expressed) has a general name like:

development_association.WSXXX.wb

where WSXXX refers to the relevant release number. The file for the WS277 release of WormBase is called:

development_association.WS277.wb

The current development association file has no header.

The proposal (Feb 2021) is to add the following header:

!gaf-version: 2.0
!generated-by: WormBase
!date-generated: 2020-05-29
!project-URL: https://wormbase.org
!specification-URL: https://wiki.wormbase.org/index.php/WormBase_gene_association_file
!project-release: WS277
!Contact Email: help@wormbase.org
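
For illustration, the proposed header could be produced by a small Python helper like the one below. The function build_header and its parameters are hypothetical; this is not the script WormBase uses to generate the file. It assumes that only the release name and the generation date vary between releases.

```python
from datetime import date


def build_header(release, generated):
    """Return the proposed header lines for a release (hypothetical helper)."""
    return [
        "!gaf-version: 2.0",
        "!generated-by: WormBase",
        f"!date-generated: {generated.isoformat()}",  # YYYY-MM-DD
        "!project-URL: https://wormbase.org",
        "!specification-URL: https://wiki.wormbase.org/index.php/WormBase_gene_association_file",
        f"!project-release: {release}",
        "!Contact Email: help@wormbase.org",
    ]


header_lines = build_header("WS277", date(2020, 5, 29))
print("\n".join(header_lines))
```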

The WormBase development (life stage) association file generally follows the GAF 2.0 format with the following modifications:

  • The column 4 "Qualifier" will either be blank or have the term "Anatomy_term" indicating that there is anatomical context for the life stage expression annotation
  • Column 5 will report a life stage term ID from the WormBase development/life-stage ontology, e.g. "WBls:0000038"
  • Column 6 will display the reference for the annotation as, for example, "WB_REF:WBPaper00026980"; in some cases the reference is blank "WB_REF:"
  • Column 7 will default to the evidence code "IDA" for inferred from direct assay
  • Column 8 will display the identifier of the expression pattern annotation (e.g. "WB:Expr2275")
  • Column 9 will default to "L" for life stage (as opposed to one of three branches of GO, "P", "F", or "C")
  • Column 10 will always be blank
  • Column 11 will provide the gene's WormBase sequence name (e.g. "Y41E3.4") if not already used in column 3
  • Column 12 will always be "gene"
  • Column 13 will always be "taxon:6239"
  • Column 15 will always be "WB"
  • No column 16 or 17
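
As a hedged example of using the columns above, the Python sketch below groups life stage term IDs (column 5) by gene (column 2), optionally keeping only rows whose column 4 qualifier is "Anatomy_term". The function name life_stages_by_gene is hypothetical. The sample rows use a placeholder gene ID and symbol, and "WBls:0000001" is used purely as a second example ID. Column 2 is assumed to hold the gene identifier, as in GAF 2.0.

```python
from collections import defaultdict


def life_stages_by_gene(lines, anatomy_context_only=False):
    """Map each gene ID (column 2) to the set of life stage IDs (column 5)."""
    stages = defaultdict(set)
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue  # skip header lines and blank lines
        cols = line.rstrip("\r\n").split("\t")
        if len(cols) < 5:
            continue  # too short to carry a term ID
        if anatomy_context_only and cols[3] != "Anatomy_term":
            continue
        stages[cols[1]].add(cols[4])
    return dict(stages)


def make_row(qualifier, term):
    # Placeholder gene ID, symbol, and date; other values follow the list above.
    return "\t".join(["WB", "WBGene00000000", "abc-1", qualifier, term,
                      "WB_REF:WBPaper00026980", "IDA", "WB:Expr2275", "L", "",
                      "Y41E3.4", "gene", "taxon:6239", "20200529", "WB"])


rows = [make_row("", "WBls:0000038"), make_row("Anatomy_term", "WBls:0000001")]
print(life_stages_by_gene(rows, anatomy_context_only=True))
# prints {'WBGene00000000': {'WBls:0000001'}}
```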

Disease association file

The disease association file (associating genes to human diseases the genes have been implicated in) has a general name like:

disease_association.WSXXX.wb

where WSXXX refers to the relevant release number. The file for the WS277 release of WormBase is called:

disease_association.WS277.wb


Gene Ontology (GO) gene association file

The Gene Ontology gene association file (associating genes to Gene Ontology terms) has a general name like:

gene_association.WSXXX.wb

where WSXXX refers to the relevant release number. The file for the WS277 release of WormBase is called:

gene_association.WS277.wb


Phenotype association file

The phenotype association file (associating genes to phenotype terms) has a general name like:

phenotype_association.WSXXX.wb

where WSXXX refers to the relevant release number. The file for the WS277 release of WormBase is called:

phenotype_association.WS277.wb


RNAi phenotype association file

The RNAi phenotype association file (associating genes to RNAi phenotypes) has a general name like:

rnai_phenotypes.WSXXX.wb

where WSXXX refers to the relevant release number. The file for the WS277 release of WormBase is called:

rnai_phenotypes.WS277.wb
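
The naming pattern shared by the files above can be summarized in a short Python sketch. The function association_file_name and the short keys of FILE_STEMS are hypothetical labels chosen for this example; only the file name stems come from this page. The sketch assumes a release name is "WS" followed by digits.

```python
import re

# File name stems as listed on this page; the dictionary keys are made-up labels.
FILE_STEMS = {
    "anatomy": "anatomy_association",
    "development": "development_association",
    "disease": "disease_association",
    "go": "gene_association",
    "phenotype": "phenotype_association",
    "rnai": "rnai_phenotypes",
}


def association_file_name(kind, release):
    """Build e.g. 'anatomy_association.WS277.wb' from a kind and a release."""
    if not re.fullmatch(r"WS\d+", release):
        raise ValueError(f"release should look like 'WS277', got {release!r}")
    return f"{FILE_STEMS[kind]}.{release}.wb"


print(association_file_name("rnai", "WS277"))  # prints rnai_phenotypes.WS277.wb
```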