Difference between revisions of "WormBase gene association file"

From WormBaseWiki

Latest revision as of 17:45, 5 February 2021

This page represents the current information about the WormBase gene association file. Click here for an archive of outdated/obsolete information about the WormBase gene association file.

Gene Association File (GAF) format

The original Gene Association File (GAF) format was defined by the Gene Ontology Consortium to specify how gene associations to Gene Ontology (GO) terms are reported in a tab-delimited download file.

To view the (now deprecated) GAF 2.0 format specification, visit:

http://geneontology.org/docs/go-annotation-file-gaf-format-2.0/

To view the (now stale, but not quite deprecated) GAF 2.1 format specification, visit:

http://geneontology.org/docs/go-annotation-file-gaf-format-2.1/

To view the latest GAF 2.2 format specification, visit:

https://github.com/geneontology/geneontology.github.io/blob/issue-go-annotation-2917-gaf-2_2-doc/_docs/go-annotation-file-gaf-format-22.md


WormBase ontology gene association files

WormBase provides a gene association file for each ontology that WormBase uses. Each row of the tab-delimited output file reports a single annotation associating a gene with an ontology term.

Current WormBase gene association files can always be found on the WormBase FTP site here:

ftp://ftp.wormbase.org/pub/wormbase/releases/current-production-release/ONTOLOGY/


Anatomy association file

The anatomy association file (associating genes to anatomical entities where the gene product has been reported to be expressed) has a general name like:

anatomy_association.WSXXX.wb

where WSXXX refers to the relevant release number. The file for the WS277 release of WormBase is called:

anatomy_association.WS277.wb
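
The naming patterns used for the association files on this page can be collected in one small lookup. This is a minimal sketch: the dictionary keys and the function name `association_filename` are hypothetical, and only the patterns themselves come from this page.

```python
# Filename patterns as given on this page; the short keys are illustrative only.
FILENAME_PATTERNS = {
    "anatomy": "anatomy_association.{release}.wb",
    "development": "development_association.{release}.wb",
    "disease": "disease_association.{release}.daf.txt",
    "go": "gene_association.{release}.wb",
    "phenotype": "phenotype_association.{release}.wb",
    "rnai": "rnai_phenotypes.{release}.wb",
}


def association_filename(kind, release):
    """Return the file name for one association file and one release, e.g. WS277."""
    return FILENAME_PATTERNS[kind].format(release=release)


print(association_filename("anatomy", "WS277"))  # anatomy_association.WS277.wb
```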

The current header for the anatomy association file is:

!gaf-version: 2.0
!Project_name: WormBase
!Contact Email: help@wormbase.org

The proposal (Feb 2021) is to update this header to:

!gaf-version: 2.0
!generated-by: WormBase
!date-generated: 2020-05-29
!project-URL: https://wormbase.org
!specification-URL: https://wiki.wormbase.org/index.php/WormBase_gene_association_file
!project-release: WS277
!contact-email: help@wormbase.org
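
As an illustration of the proposed layout, the header lines could be generated from the release name and the generation date. This is a sketch under assumptions: the function name `build_proposed_header` and its parameters are hypothetical, and the fixed values are copied from the proposal above rather than from any WormBase build code.

```python
def build_proposed_header(release, date_generated, gaf_version="2.0"):
    """Return the proposed '!key: value' header lines for a GAF-style file."""
    fields = [
        ("gaf-version", gaf_version),
        ("generated-by", "WormBase"),
        ("date-generated", date_generated),
        ("project-URL", "https://wormbase.org"),
        ("specification-URL",
         "https://wiki.wormbase.org/index.php/WormBase_gene_association_file"),
        ("project-release", release),
        ("contact-email", "help@wormbase.org"),
    ]
    return [f"!{key}: {value}" for key, value in fields]


for line in build_proposed_header("WS277", "2020-05-29"):
    print(line)
```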

The WormBase anatomy association file generally follows the GAF 2.0 format with the following modifications:

  • The column 4 "Qualifier" is one of four values specific to gene expression annotation: Certain, Uncertain, Partial, Enriched
  • Column 5 reports an anatomy term ID from the WormBase anatomy ontology, e.g. "WBbt:0003679"
  • Column 6 displays the reference for the annotation as, for example, "WB_REF:WBPaper00026980"; in some cases the reference is blank "WB_REF:"
  • Column 7 defaults to the evidence code "IDA" for inferred from direct assay
  • Column 8 displays the identifier of the expression pattern annotation (e.g. "WB:Expr2275") or the expression cluster annotation (e.g. "WB:WBPaper00051039:germline_enriched")
  • Column 9 defaults to "A" for anatomy (as opposed to one of three branches of GO, "P", "F", or "C")
  • Column 10 is blank
  • Column 11 provides the gene's WormBase sequence name (e.g. "Y41E3.4") if not already used in column 3
  • Column 12 is "gene"
  • Column 13 is "taxon:6239"
  • Column 15 is "WB"
  • No column 16 or 17
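
The column conventions above can be sketched as a small row parser. It assumes that rows are tab-separated with 15 columns and that header lines starting with "!" have already been skipped. The function name and the dictionary keys are hypothetical, and the example row is synthetic (columns 1-3 and 14 hold placeholders), not a real annotation.

```python
ANATOMY_QUALIFIERS = {"Certain", "Uncertain", "Partial", "Enriched"}


def parse_anatomy_row(line):
    """Split one anatomy association row into the fields described above."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 15:
        raise ValueError(f"expected 15 tab-separated columns, got {len(cols)}")
    if cols[3] not in ANATOMY_QUALIFIERS:
        raise ValueError(f"unexpected qualifier: {cols[3]!r}")
    reference = cols[5]
    if reference.startswith("WB_REF:"):
        reference = reference[len("WB_REF:"):]
    return {
        "qualifier": cols[3],
        "anatomy_term": cols[4],
        "paper": reference or None,  # None when column 6 is the blank "WB_REF:"
        "evidence": cols[6],
        "expression_id": cols[7],
        "aspect": cols[8],
        "sequence_name": cols[10] or None,
    }


# Synthetic row built from the example values in the list above.
example = "\t".join([
    "DB", "GENE_ID", "SYMBOL", "Certain", "WBbt:0003679", "WB_REF:",
    "IDA", "WB:Expr2275", "A", "", "Y41E3.4", "gene", "taxon:6239",
    "DATE", "WB",
])
record = parse_anatomy_row(example)
print(record["anatomy_term"], record["paper"])
```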


Development (life stage) association file

The development (life stage) association file (associating genes to life stages when the gene product has been reported to be expressed) has a general name like:

development_association.WSXXX.wb

where WSXXX refers to the relevant release number. The file for the WS277 release of WormBase is called:

development_association.WS277.wb

The current development association file has no header.

The proposal (Feb 2021) is to add the following header:

!gaf-version: 2.0
!generated-by: WormBase
!date-generated: 2020-05-29
!project-URL: https://wormbase.org
!specification-URL: https://wiki.wormbase.org/index.php/WormBase_gene_association_file
!project-release: WS277
!contact-email: help@wormbase.org

The WormBase development association file generally follows the GAF 2.0 format with the following modifications:

  • The column 4 "Qualifier" is either blank or has the term "Anatomy_term" indicating that there is anatomical context for the life stage expression annotation
  • Column 5 reports a life stage term ID from the WormBase development/life-stage ontology, e.g. "WBls:0000038"
  • Column 6 displays the reference for the annotation as, for example, "WB_REF:WBPaper00026980"; in some cases the reference is blank "WB_REF:"
  • Column 7 defaults to the evidence code "IDA" for inferred from direct assay
  • Column 8 displays the identifier of the expression pattern annotation (e.g. "WB:Expr2275")
  • Column 9 defaults to "L" for life stage (as opposed to one of three branches of GO, "P", "F", or "C")
  • Column 10 is blank
  • Column 11 provides the gene's WormBase sequence name (e.g. "Y41E3.4") if not already used in column 3
  • Column 12 is "gene"
  • Column 13 is "taxon:6239"
  • Column 15 is "WB"
  • No column 16 or 17
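
A consumer that only needs to know whether a life stage annotation carries anatomical context can look at column 4 alone. The sketch below assumes tab-separated rows; the function name is hypothetical and the example rows are synthetic placeholders rather than real annotations.

```python
def has_anatomical_context(line):
    """Return True when column 4 of a development association row is 'Anatomy_term'."""
    cols = line.rstrip("\n").split("\t")
    qualifier = cols[3]
    if qualifier not in ("", "Anatomy_term"):
        raise ValueError(f"unexpected qualifier: {qualifier!r}")
    return qualifier == "Anatomy_term"


# Synthetic rows truncated after column 9; columns 1-3 hold placeholders.
with_context = "\t".join(
    ["DB", "GENE_ID", "SYMBOL", "Anatomy_term", "WBls:0000038",
     "WB_REF:WBPaper00026980", "IDA", "WB:Expr2275", "L"])
without_context = "\t".join(
    ["DB", "GENE_ID", "SYMBOL", "", "WBls:0000038",
     "WB_REF:WBPaper00026980", "IDA", "WB:Expr2275", "L"])

print(has_anatomical_context(with_context), has_anatomical_context(without_context))
```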


Disease association file

The disease association file (associating genes to human diseases the genes have been implicated in) has a general name like:

disease_association.WSXXX.daf.txt

where WSXXX refers to the relevant release number. The file for the WS279 release of WormBase is called:

disease_association.WS279.daf.txt

The current header for the disease association file is:

!daf-version 1.0
!Date: 2020-10-25T14:50:50+00:00
!Project_name: WormBase (WB) Version WS279
!URL: http://www.wormbase.org/
!Contact Email: wormbase-help@wormbase.org
!Funding: NHGRI at US NIH, grant number U41 HG002223

The proposal (Feb 2021) is to update this header to:

!daf-version 1.0
!generated-by: WormBase
!date-generated: 2020-05-29
!project-URL: https://wormbase.org
!specification-URL: https://wiki.wormbase.org/index.php/WormBase_gene_association_file
!project-release: WS277
!contact-email: help@wormbase.org
!funding: NHGRI at US NIH, grant number U41 HG002223
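
Note that the "!daf-version 1.0" line has no colon, unlike the other header lines, so a reader that splits on ":" alone would not recover the version. A tolerant reader might handle both forms, as in this sketch; the function name `parse_header_line` is hypothetical.

```python
def parse_header_line(line):
    """Split one '!' header line into (key, value).

    Handles both '!key: value' and the colon-less '!daf-version 1.0' form.
    """
    body = line.lstrip("!").strip()
    if ":" in body:
        key, _, value = body.partition(":")  # split at the first colon only
    else:
        key, _, value = body.partition(" ")
    return key.strip(), value.strip()


print(parse_header_line("!daf-version 1.0"))
print(parse_header_line("!project-URL: https://wormbase.org"))
```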

The file also contains column headers.

The WormBase disease association file has a GAF-like but distinct format:

  • Column 1 (analogous to GAF column 13) is the NCBI taxon ID; for C. elegans this is "6239"
  • Column 2 (analogous to GAF column 12) displays the type of object the annotation is being made to: allele, gene, genotype, strain, or transgene
  • Column 3 (analogous to GAF column 1) displays "WB" or "WBTransgene" as the source database
  • Column 4 (analogous to GAF column 2) displays the ID of the object the annotation is being made to, e.g. "WBGene00011936"
  • Column 5 (analogous to GAF column 3) displays the symbol of the object the annotation is being made to, e.g. "pgrn-1"
  • Column 6 displays the inferred gene association (i.e. the gene associated with the DB object in column 4)
  • Column 7 (analogous to GAF column 17) displays the gene product form ID
  • Column 8 displays experimental conditions, for example molecules used in the disease model
  • Column 9 displays the association type, e.g. "is_implicated_in" or "is_model_of"
  • Column 10 (analogous to GAF column 4) "Qualifier" displays "NOT" when the disease model was not observed
  • Column 11 (analogous to GAF column 5) reports a disease term ID from the Human Disease Ontology (DO), e.g. "DOID:206"
  • Column 12 (analogous to GAF column 8) displays the "With" term
  • Column 13 displays the modifier association type, e.g. "condition_ameliorated_by"
  • Column 14 displays the modifier qualifier "not" when the disease model is not modified by the indicated modifier
  • Column 15 displays genetic modifiers, e.g. "WB:WBVar00600790"
  • Column 16 displays experimental condition modifiers, e.g. molecules used to modify a disease model
  • Column 17 (analogous to GAF column 7) displays the evidence code from the ECO ontology
  • Column 18 displays the genetic sex of the disease model
  • Column 19 (analogous to GAF column 6) displays the reference for the annotation as, for example, "PMID:17237233"
  • Column 20 (analogous to GAF column 14) displays the date of the annotation
  • Column 21 (analogous to GAF column 15) displays the database that assigned the annotation
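
The 21-column layout above can be read into named fields as follows. This is a minimal sketch: the field names and function names are hypothetical labels for the columns described above (they are not the file's own column headers), rows are assumed to be tab-separated, and the caller is assumed to have skipped the "!" header lines and the column header line.

```python
# One illustrative name per column, in the order listed above.
DAF_FIELDS = [
    "taxon", "object_type", "db", "object_id", "object_symbol",
    "inferred_gene", "gene_product_form_id", "experimental_conditions",
    "association_type", "qualifier", "do_id", "with",
    "modifier_association_type", "modifier_qualifier", "genetic_modifiers",
    "experimental_condition_modifiers", "evidence_code", "genetic_sex",
    "reference", "date", "assigned_by",
]


def parse_disease_row(line):
    """Map one disease association row onto the 21 named fields."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != len(DAF_FIELDS):
        raise ValueError(f"expected {len(DAF_FIELDS)} columns, got {len(cols)}")
    return dict(zip(DAF_FIELDS, cols))


def is_negated(record):
    """True when column 10 marks the disease model as not observed."""
    return record["qualifier"] == "NOT"
```

A row whose column 10 is blank would give `is_negated(record) == False`, and `record["do_id"]` would hold the DO term ID from column 11.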


Gene Ontology (GO) gene association file

The Gene Ontology gene association file (associating genes to Gene Ontology terms) has a general name like:

gene_association.WSXXX.wb

where WSXXX refers to the relevant release number. The file for the WS277 release of WormBase is called:

gene_association.WS277.wb

The current header for the gene association file is:

!gaf-version: 2.2
!generated-by: WormBase
!date-generated: 2020-10-30

The proposal (Feb 2021) is to update this header to:

!gaf-version: 2.2
!generated-by: WormBase
!date-generated: 2020-05-29
!project-URL: https://wormbase.org
!specification-URL: https://github.com/geneontology/geneontology.github.io/blob/issue-go-annotation-2917-gaf-2_2-doc/_docs/go-annotation-file-gaf-format-22.md
!project-release: WS277
!contact-email: help@wormbase.org

The WormBase gene association file conforms to the GAF 2.2 format:

https://github.com/geneontology/geneontology.github.io/blob/issue-go-annotation-2917-gaf-2_2-doc/_docs/go-annotation-file-gaf-format-22.md
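
For readers moving from the GAF 2.0-style files above, the most visible change in GAF 2.2 is column 4: as the linked specification describes it, the qualifier carries a relation between the gene product and the GO term (for example "involved_in"), optionally preceded by "NOT" and a pipe. A minimal sketch of splitting that value, with a hypothetical function name:

```python
def parse_gaf22_qualifier(qualifier):
    """Split a GAF 2.2 column 4 value into (negated, relation)."""
    parts = qualifier.split("|")
    negated = "NOT" in parts
    relations = [part for part in parts if part != "NOT"]
    if len(relations) != 1:
        raise ValueError(f"expected exactly one relation in {qualifier!r}")
    return negated, relations[0]


print(parse_gaf22_qualifier("involved_in"))      # (False, 'involved_in')
print(parse_gaf22_qualifier("NOT|involved_in"))  # (True, 'involved_in')
```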


Phenotype association file

The phenotype association file (associating genes to phenotype terms) has a general name like:

phenotype_association.WSXXX.wb

where WSXXX refers to the relevant release number. The file for the WS277 release of WormBase is called:

phenotype_association.WS277.wb

The current header for the phenotype association file is:

!gaf-version: 2.0
!Project_name: WormBase
!Contact Email: help@wormbase.org

The proposal (Feb 2021) is to update this header to:

!gaf-version: 2.0
!generated-by: WormBase
!date-generated: 2020-05-29
!project-URL: https://wormbase.org
!specification-URL: https://wiki.wormbase.org/index.php/WormBase_gene_association_file
!project-release: WS277
!contact-email: help@wormbase.org

The WormBase phenotype association file generally follows the GAF 2.0 format with the following modifications:

  • The column 4 "Qualifier" is blank or has "NOT" to indicate an unobserved phenotype
  • Column 5 reports a phenotype term ID from the WormBase phenotype ontology, e.g. "WBPhenotype:0000061"
  • Column 6 displays the reference for the annotation as, for example, "WB_REF:WBPaper00005729"; when there is no publication reference, the entry that belongs in column 8 (the "With" column), e.g. "WB:WBVar00249743", is placed here instead of in column 8; this inconsistency will be addressed soon
  • Column 7 defaults to the evidence code "IMP" for inferred from mutant phenotype
  • Column 8 displays the "With" column entry, e.g. the allele used "WB:WBVar00091444"
  • Column 9 defaults to "P" for phenotype (as opposed to one of three branches of GO, "P", "F", or "C")
  • Column 11 provides the gene's WormBase sequence name (e.g. "Y41E3.4") if not already used in column 3
  • Column 12 is "gene"
  • Column 13 is "taxon:6239"
  • Column 15 is "WB"
  • No column 16 or 17
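
Until the column 6 inconsistency is resolved, a reader of the phenotype file may want to normalise the reference and "With" values. The sketch below assumes that a genuine reference always starts with "WB_REF:" and that column 8 is blank whenever its entry has been written to column 6; the function name is hypothetical.

```python
def split_reference_and_with(col6, col8):
    """Return (reference, with_entry) for one phenotype association row."""
    if col6.startswith("WB_REF:"):
        return col6, col8
    # No publication reference: column 6 holds the "With" entry instead.
    return "", col6 or col8


print(split_reference_and_with("WB_REF:WBPaper00005729", "WB:WBVar00091444"))
print(split_reference_and_with("WB:WBVar00249743", ""))
```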


RNAi phenotype association file

The RNAi phenotype association file (associating genes to RNAi phenotypes) has a general name like:

rnai_phenotypes.WSXXX.wb

where WSXXX refers to the relevant release number. The file for the WS277 release of WormBase is called:

rnai_phenotypes.WS277.wb

The RNAi phenotype association file currently has no header.

The proposal (Feb 2021) is to add the following header:

!generated-by: WormBase
!date-generated: 2020-05-29
!project-URL: https://wormbase.org
!specification-URL: https://wiki.wormbase.org/index.php/WormBase_gene_association_file
!project-release: WS277
!contact-email: help@wormbase.org

The WormBase RNAi phenotype association file has a distinct format:

  • Column 1 displays the WormBase gene ID of the gene with the RNAi phenotype, e.g. "WBGene00019433"
  • Column 2 displays the WormBase gene sequence name of the gene in column 1
  • Column 3 displays the label of the phenotype term, e.g. "peptide uptake by intestinal cell decreased"
  • Column 4 displays the ID of the phenotype term from the WB phenotype ontology, e.g. "WBPhenotype:0002082"
  • Column 5 displays a space-separated list of WBRNAi/WBPaper IDs, each of which is bar separated, e.g. "WBRNAi00107985|WBPaper00042184 WBRNAi00107986|WBPaper00042184"
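
Column 5 packs two levels of separators into one field, which can be unpacked as shown in this sketch. The function name `parse_rnai_evidence` is hypothetical; the input is the example value given above.

```python
def parse_rnai_evidence(col5):
    """Split column 5 into a list of (WBRNAi ID, WBPaper ID) pairs."""
    pairs = []
    for item in col5.split():  # entries are separated by spaces
        rnai_id, _, paper_id = item.partition("|")  # IDs within an entry by a bar
        pairs.append((rnai_id, paper_id))
    return pairs


print(parse_rnai_evidence(
    "WBRNAi00107985|WBPaper00042184 WBRNAi00107986|WBPaper00042184"))
# [('WBRNAi00107985', 'WBPaper00042184'), ('WBRNAi00107986', 'WBPaper00042184')]
```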