WormBase gene association file

This page describes the current WormBase gene association files. An archive of outdated or obsolete information about the WormBase gene association file is kept on a separate page.

Gene Association File (GAF) format

The Gene Association File (GAF) format was originally defined by the Gene Ontology Consortium to specify how associations between genes and Gene Ontology (GO) terms are reported in a tab-delimited download file.

To view the (now deprecated) GAF 2.0 format specification, visit:

http://geneontology.org/docs/go-annotation-file-gaf-format-2.0/

To view the (now stale, but not quite deprecated) GAF 2.1 format specification, visit:

http://geneontology.org/docs/go-annotation-file-gaf-format-2.1/

To view the latest GAF 2.2 format specification, visit:

https://github.com/geneontology/geneontology.github.io/blob/issue-go-annotation-2917-gaf-2_2-doc/_docs/go-annotation-file-gaf-format-22.md

WormBase ontology gene association files

WormBase provides a gene association file for each of the ontologies that WormBase uses. Each row of the tab-delimited file holds a single annotation associating a gene with an ontology term.

Current WormBase gene association files can always be found on the WormBase FTP site here:

ftp://ftp.wormbase.org/pub/wormbase/releases/current-production-release/ONTOLOGY/
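
As a minimal sketch, the file naming patterns described in the sections below can be captured in a small lookup. The names `FILE_PATTERNS` and `association_file_name` are illustrative and not part of any WormBase tool. The check that a release name looks like "WS" followed by digits is an assumption based only on the examples on this page, and the RNAi pattern is inferred from its WS277 example.

```python
import re

# File name patterns taken from this page; "{release}" stands in for WSXXX.
FILE_PATTERNS = {
    "anatomy": "anatomy_association.{release}.wb",
    "development": "development_association.{release}.wb",
    "disease": "disease_association.{release}.daf.txt",
    "go": "gene_association.{release}.wb",
    "phenotype": "phenotype_association.{release}.wb",
    "rnai": "rnai_phenotypes.{release}.wb",
}


def association_file_name(kind: str, release: str) -> str:
    """Return the expected file name for one association file of a release."""
    # Assumption: release names have the form "WS" + digits (e.g. "WS277").
    if not re.fullmatch(r"WS\d+", release):
        raise ValueError(f"unexpected release name: {release!r}")
    return FILE_PATTERNS[kind].format(release=release)


print(association_file_name("anatomy", "WS277"))  # anatomy_association.WS277.wb
print(association_file_name("disease", "WS279"))  # disease_association.WS279.daf.txt
```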

Anatomy association file

The anatomy association file (associating genes to anatomical entities where the gene product has been reported to be expressed) has a general name like:

anatomy_association.WSXXX.wb

where WSXXX refers to the relevant release number. The file for the WS277 release of WormBase is called:

anatomy_association.WS277.wb

The current header for the anatomy association file is:

!gaf-version: 2.0
!Project_name: WormBase
!Contact Email: help@wormbase.org

The proposal (Feb 2021) is to update this header to:

!gaf-version: 2.0
!generated-by: WormBase
!date-generated: 2020-05-29
!project-URL: https://wormbase.org
!specification-URL: https://wiki.wormbase.org/index.php/WormBase_gene_association_file
!project-release: WS277
!Contact Email: help@wormbase.org
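
The header lines shown on this page all start with "!" and mostly use a "key: value" layout. The following is a rough sketch of how such a header could be read into a dictionary. The function name `parse_header` is hypothetical, and the fallback to splitting on the first space (for lines such as "!daf-version 1.0" that have no colon) is an assumption about how those lines are best interpreted.

```python
def parse_header(lines):
    """Collect leading '!' lines of an association file into a dict."""
    header = {}
    for line in lines:
        if not line.startswith("!"):
            break  # the header ends at the first non-'!' line
        body = line[1:].strip()
        # Split on the first colon only, so URLs and timestamps stay intact.
        key, sep, value = body.partition(":")
        if not sep:
            # Assumption: colon-less lines separate key and value by a space.
            key, _, value = body.partition(" ")
        header[key.strip()] = value.strip()
    return header


example = [
    "!gaf-version: 2.0",
    "!generated-by: WormBase",
    "!project-URL: https://wormbase.org",
    "!Contact Email: help@wormbase.org",
]
print(parse_header(example))
```

Keys are kept exactly as written in the file (for example "Contact Email"), since the header style differs between the current and proposed headers.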

The WormBase anatomy association file generally follows the GAF 2.0 format with the following modifications:

  • The column 4 "Qualifier" is one of four values specific to gene expression annotation: Certain, Uncertain, Partial, Enriched
  • Column 5 reports an anatomy term ID from the WormBase anatomy ontology, e.g. "WBbt:0003679"
  • Column 6 displays the reference for the annotation as, for example, "WB_REF:WBPaper00026980"; in some cases the reference is blank "WB_REF:"
  • Column 7 defaults to the evidence code "IDA" for inferred from direct assay
  • Column 8 displays the identifier of the expression pattern annotation (e.g. "WB:Expr2275") or the expression cluster annotation (e.g. "WB:WBPaper00051039:germline_enriched")
  • Column 9 defaults to "A" for anatomy (as opposed to one of three branches of GO, "P", "F", or "C")
  • Column 10 is blank
  • Column 11 provides the gene's WormBase sequence name (e.g. "Y41E3.4") if not already used in column 3
  • Column 12 is "gene"
  • Column 13 is "taxon:6239"
  • Column 15 is "WB"
  • No column 16 or 17
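
The column layout above can be sketched as a small row parser. The field names in `GAF_COLUMNS` are informal labels for the first 15 GAF 2.0 columns, and `parse_anatomy_row` is a hypothetical helper. It assumes every data row has exactly 15 tab-separated fields. The example row is synthetic: its values are stitched together from unrelated examples on this page, with a placeholder date, and it is not a real annotation.

```python
# Informal labels for GAF 2.0 columns 1-15 (the anatomy file has no 16 or 17).
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "term_id",
    "reference", "evidence_code", "with_from", "aspect", "db_object_name",
    "db_object_synonym", "db_object_type", "taxon", "date", "assigned_by",
]

# The four expression qualifiers listed for column 4.
ANATOMY_QUALIFIERS = {"Certain", "Uncertain", "Partial", "Enriched"}


def parse_anatomy_row(line):
    """Split one data row into a dict keyed by the labels above."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != len(GAF_COLUMNS):
        raise ValueError(f"expected 15 columns, got {len(fields)}")
    row = dict(zip(GAF_COLUMNS, fields))
    if row["qualifier"] not in ANATOMY_QUALIFIERS:
        raise ValueError(f"unexpected qualifier: {row['qualifier']!r}")
    return row


# Synthetic row for illustration only (not a real WormBase annotation).
example_line = "\t".join([
    "WB", "WBGene00011936", "pgrn-1", "Certain", "WBbt:0003679",
    "WB_REF:WBPaper00026980", "IDA", "WB:Expr2275", "A", "",
    "Y41E3.4", "gene", "taxon:6239", "20200529", "WB",
])
row = parse_anatomy_row(example_line)
print(row["db_object_id"], row["qualifier"], row["term_id"])
```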

Development (life stage) association file

The development (life stage) association file (associating genes to life stages when the gene product has been reported to be expressed) has a general name like:

development_association.WSXXX.wb

where WSXXX refers to the relevant release number. The file for the WS277 release of WormBase is called:

development_association.WS277.wb

The current development association file has no header.

The proposal (Feb 2021) is to add the following header:

!gaf-version: 2.0
!generated-by: WormBase
!date-generated: 2020-05-29
!project-URL: https://wormbase.org
!specification-URL: https://wiki.wormbase.org/index.php/WormBase_gene_association_file
!project-release: WS277
!Contact Email: help@wormbase.org

The WormBase development association file generally follows the GAF 2.0 format with the following modifications:

  • The column 4 "Qualifier" will either be blank or have the term "Anatomy_term" indicating that there is anatomical context for the life stage expression annotation
  • Column 5 will report a life stage term ID from the WormBase development/life-stage ontology, e.g. "WBls:0000038"
  • Column 6 will display the reference for the annotation as, for example, "WB_REF:WBPaper00026980"; in some cases the reference is blank "WB_REF:"
  • Column 7 will default to the evidence code "IDA" for inferred from direct assay
  • Column 8 will display the identifier of the expression pattern annotation (e.g. "WB:Expr2275")
  • Column 9 will default to "L" for life stage (as opposed to one of three branches of GO, "P", "F", or "C")
  • Column 10 will always be blank
  • Column 11 will provide the gene's WormBase sequence name (e.g. "Y41E3.4") if not already used in column 3
  • Column 12 will always be "gene"
  • Column 13 will always be "taxon:6239"
  • Column 15 will always be "WB"
  • No column 16 or 17
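
As an illustration of working with these columns, the sketch below groups life stage term IDs (column 5) by gene ID (column 2), keeping only rows whose column 9 is "L". The function name `life_stages_by_gene` is hypothetical, lines starting with "!" are skipped on the assumption that they are header lines, and the example rows are synthetic rather than real annotations.

```python
from collections import defaultdict


def life_stages_by_gene(lines):
    """Map each gene ID (column 2) to the set of life stage IDs (column 5)."""
    stages = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\r\n").split("\t")
        # Keep only life stage annotations (column 9 == "L").
        if len(fields) < 9 or fields[8] != "L":
            continue
        stages[fields[1]].add(fields[4])
    return dict(stages)


def make_row(gene_id, stage_id):
    """Build a synthetic 15-column row for illustration only."""
    return "\t".join([
        "WB", gene_id, "", "", stage_id, "WB_REF:WBPaper00026980", "IDA",
        "WB:Expr2275", "L", "", "", "gene", "taxon:6239", "", "WB",
    ])


rows = [
    "!gaf-version: 2.0",
    make_row("WBGene00011936", "WBls:0000038"),
    make_row("WBGene00011936", "WBls:0000038"),  # duplicate collapses in the set
    make_row("WBGene00019433", "WBls:0000038"),
]
print(life_stages_by_gene(rows))
```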

Disease association file

The disease association file (associating genes to human diseases the genes have been implicated in) has a general name like:

disease_association.WSXXX.daf.txt

where WSXXX refers to the relevant release number. The file for the WS279 release of WormBase is called:

disease_association.WS279.daf.txt

The current header for the disease association file is:

!daf-version 1.0
!Date: 2020-10-25T14:50:50+00:00
!Project_name: WormBase (WB) Version WS279
!URL: http://www.wormbase.org/
!Contact Email: wormbase-help@wormbase.org
!Funding: NHGRI at US NIH, grant number U41 HG002223

The proposal (Feb 2021) is to update this header to:

!daf-version 1.0
!generated-by: WormBase
!date-generated: 2020-05-29
!project-URL: https://wormbase.org
!specification-URL: https://wiki.wormbase.org/index.php/WormBase_gene_association_file
!project-release: WS277
!Contact Email: help@wormbase.org
!Funding: NHGRI at US NIH, grant number U41 HG002223

The file also contains column headers.

The WormBase disease association file has a GAF-like but distinct format:

  • Column 1 (analogous to GAF column 13) will be the NCBI taxon ID; for C. elegans this is "6239"
  • Column 2 (analogous to GAF column 12) will display the type of object the annotation is being made to: allele, gene, genotype, strain, or transgene
  • Column 3 (analogous to GAF column 1) will display "WB" or "WBTransgene" as the source database
  • Column 4 (analogous to GAF column 2) will display the ID of the object the annotation is being made to, e.g. "WBGene00011936"
  • Column 5 (analogous to GAF column 3) will display the symbol of the object the annotation is being made to, e.g. "pgrn-1"
  • Column 6 will display the inferred gene association (i.e. the gene associated with the DB object in column 4)
  • Column 7 (analogous to GAF column 17) will display the gene product form ID
  • Column 8 will display experimental conditions, for example molecules used in the disease model
  • Column 9 will display the association type, e.g. "is_implicated_in" or "is_model_of"
  • Column 10 (analogous to GAF column 4) "Qualifier" will display "NOT" for disease models that were not observed
  • Column 11 (analogous to GAF column 5) will report a disease (DO) term ID from the Human Disease Ontology, e.g. "DOID:206"
  • Column 12 (analogous to GAF column 8) will display the "With" term
  • Column 13 will display the modifier association type, e.g. "condition_ameliorated_by"
  • Column 14 will display the modifier qualifier "not" when the disease model is not modified by the indicated modifier
  • Column 15 will display genetic modifiers, e.g. "WB:WBVar00600790"
  • Column 16 will display experimental condition modifiers, e.g. molecules used to modify a disease model
  • Column 17 (analogous to GAF column 7) will display the evidence code from the ECO ontology
  • Column 18 will display the genetic sex of the disease model
  • Column 19 (analogous to GAF column 6) will display the reference for the annotation as, for example, "PMID:17237233"
  • Column 20 (analogous to GAF column 14) will display the date of the annotation
  • Column 21 (analogous to GAF column 15) will display the database that assigned the annotation
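
The correspondence between disease file columns and GAF columns listed above can be written down as a lookup table. This is only a sketch: `DAF_TO_GAF` and `daf_row_to_gaf_columns` are hypothetical names, the function assumes exactly 21 tab-separated fields per data row, and it copies values verbatim (so the taxon stays "6239" rather than the GAF-style "taxon:6239"). The example row is synthetic, and its evidence code and date are placeholders.

```python
# Disease file column -> analogous GAF column, as listed on this page.
DAF_TO_GAF = {
    1: 13, 2: 12, 3: 1, 4: 2, 5: 3, 7: 17, 10: 4,
    11: 5, 12: 8, 17: 7, 19: 6, 20: 14, 21: 15,
}


def daf_row_to_gaf_columns(line):
    """Return {GAF column number: value} for the columns with a GAF analogue."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 21:
        raise ValueError(f"expected 21 columns, got {len(fields)}")
    gaf = {}
    for daf_col, gaf_col in DAF_TO_GAF.items():
        gaf[gaf_col] = fields[daf_col - 1]  # columns are 1-based
    return dict(sorted(gaf.items()))


# Synthetic row for illustration only; unlisted columns are left blank.
example = {
    1: "6239", 2: "gene", 3: "WB", 4: "WBGene00011936", 5: "pgrn-1",
    9: "is_implicated_in", 11: "DOID:206",
    17: "ECO:XXXXXXX",   # placeholder, not a real ECO ID
    19: "PMID:17237233",
    20: "2020-05-29",    # placeholder; the date format is not specified here
    21: "WB",
}
example_line = "\t".join(example.get(col, "") for col in range(1, 22))
print(daf_row_to_gaf_columns(example_line))
```

Columns without a GAF analogue (6, 8, 9, 13 to 16, and 18) are dropped by this mapping, which is why the disease file cannot be treated as a plain GAF file.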

Gene Ontology (GO) gene association file

The Gene Ontology gene association file (associating genes to Gene Ontology terms) has a general name like:

gene_association.WSXXX.wb

where WSXXX refers to the relevant release number. The file for the WS277 release of WormBase is called:

gene_association.WS277.wb

The current header for the gene association file is:

!gaf-version: 2.2
!generated-by: WormBase
!date-generated: 2020-10-30

The proposal (Feb 2021) is to update this header to:

!gaf-version: 2.2
!generated-by: WormBase
!date-generated: 2020-05-29
!project-URL: https://wormbase.org
!specification-URL: https://github.com/geneontology/geneontology.github.io/blob/issue-go-annotation-2917-gaf-2_2-doc/_docs/go-annotation-file-gaf-format-22.md
!project-release: WS277
!Contact Email: help@wormbase.org

The WormBase gene association file conforms to the GAF 2.2 format:

https://github.com/geneontology/geneontology.github.io/blob/issue-go-annotation-2917-gaf-2_2-doc/_docs/go-annotation-file-gaf-format-22.md

Phenotype association file

The phenotype association file (associating genes to phenotype terms) has a general name like:

phenotype_association.WSXXX.wb

where WSXXX refers to the relevant release number. The file for the WS277 release of WormBase is called:

phenotype_association.WS277.wb

The current header for the phenotype association file is:

!gaf-version: 2.0
!Project_name: WormBase
!Contact Email: help@wormbase.org

The proposal (Feb 2021) is to update this header to:

!gaf-version: 2.0
!generated-by: WormBase
!date-generated: 2020-05-29
!project-URL: https://wormbase.org
!specification-URL: https://wiki.wormbase.org/index.php/WormBase_gene_association_file
!project-release: WS277
!Contact Email: help@wormbase.org

The WormBase phenotype association file generally follows the GAF 2.0 format with the following modifications:

  • The column 4 "Qualifier" will be blank or will be "NOT" to indicate an unobserved phenotype
  • Column 5 will report a phenotype term ID from the WormBase phenotype ontology, e.g. "WBPhenotype:0000061"
  • Column 6 will display the reference for the annotation as, for example, "WB_REF:WBPaper00005729". When there is no publication reference, the entry that belongs in column 8 (the "With" column), e.g. "WB:WBVar00249743", is deposited in column 6 instead of column 8; this inconsistency will be addressed soon
  • Column 7 will default to the evidence code "IMP" for inferred from mutant phenotype
  • Column 8 will display the "With" column entry, e.g. the allele used "WB:WBVar00091444"
  • Column 9 will default to "P" for phenotype (as opposed to one of three branches of GO, "P", "F", or "C")
  • Column 11 will provide the gene's WormBase sequence name (e.g. "Y41E3.4") if not already used in column 3
  • Column 12 will always be "gene"
  • Column 13 will always be "taxon:6239"
  • Column 15 will always be "WB"
  • No column 16 or 17
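
Because column 6 can hold either a paper reference or a displaced "With" entry, a consumer of the file may want to normalize the two columns. The sketch below shows one way to do that. The function name `split_reference_and_with` is hypothetical, and the rule it uses (a value is a paper reference only if it starts with "WB_REF:") is an assumption based on the examples on this page.

```python
def split_reference_and_with(col6, col8):
    """Return (reference, with_entry) from raw column 6 and column 8 values."""
    # Assumption: genuine paper references always carry the "WB_REF:" prefix.
    if col6.startswith("WB_REF:"):
        return col6, col8
    # Otherwise column 6 holds the displaced "With" entry (or is empty).
    return "", (col6 or col8)


# Row with a publication reference: both columns are used as documented.
print(split_reference_and_with("WB_REF:WBPaper00005729", "WB:WBVar00091444"))

# Row without a publication reference: the variation ID sits in column 6.
print(split_reference_and_with("WB:WBVar00249743", ""))
```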

RNAi phenotype association file

The RNAi phenotype association file (associating genes to RNAi phenotypes) has a general name like:

rnai_phenotypes.WSXXX.wb

where WSXXX refers to the relevant release number. The file for the WS277 release of WormBase is called:

rnai_phenotypes.WS277.wb

The RNAi phenotype association file currently has no header.

The proposal (Feb 2021) is to add the following header:

!generated-by: WormBase
!date-generated: 2020-05-29
!project-URL: https://wormbase.org
!specification-URL: https://wiki.wormbase.org/index.php/WormBase_gene_association_file
!project-release: WS277
!Contact Email: help@wormbase.org

The WormBase RNAi phenotype association file has a distinct format:

  • Column 1 displays the WormBase gene ID of the gene with the RNAi phenotype, e.g. "WBGene00019433"
  • Column 2 displays the WormBase gene sequence name of the gene in column 1
  • Column 3 displays the label of the phenotype term, e.g. "peptide uptake by intestinal cell decreased"
  • Column 4 displays the ID of the phenotype term from the WB phenotype ontology, e.g. "WBPhenotype:0002082"
  • Column 5 displays a space-separated list of WBRNAi/WBPaper IDs, each of which is bar separated, e.g. "WBRNAi00107985|WBPaper00042184 WBRNAi00107986|WBPaper00042184"
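
The nested structure of column 5 (space-separated items, each a bar-separated pair) can be unpacked with two splits. This is a minimal sketch; the function name `parse_rnai_evidence` is hypothetical, and it assumes every item contains exactly one "|".

```python
def parse_rnai_evidence(col5):
    """Turn column 5 into a list of (WBRNAi ID, WBPaper ID) tuples."""
    pairs = []
    for item in col5.split():  # items are separated by spaces
        rnai_id, _, paper_id = item.partition("|")  # pair is bar separated
        pairs.append((rnai_id, paper_id))
    return pairs


col5 = "WBRNAi00107985|WBPaper00042184 WBRNAi00107986|WBPaper00042184"
for rnai_id, paper_id in parse_rnai_evidence(col5):
    print(rnai_id, paper_id)
# WBRNAi00107985 WBPaper00042184
# WBRNAi00107986 WBPaper00042184
```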