Specifications for a DAF for gene-disease data

From WormBaseWiki

Latest revision as of 18:53, 24 January 2019

New DAF specifications from the AGR Disease Working Group

  • These specifications have been discussed and defined by the Disease Working Group of the AGR. The full developer specification is on the tab 'DAF spec-v2 (for developers)' of this spreadsheet:

https://docs.google.com/spreadsheets/d/1PrUI8CwV7AejBloWG2e6SNXEz_FSpCeMEwHhcKE9KmU/edit?usp=sharing

  • Annotations from the computational pipeline that WB currently has for assigning 'Potential models' will not be included in the MOD-specific DAFs. The plan is to have a separate AGR orthology pipeline to link human genes and their orthologs from model organisms.


  • Header for file:
!daf-version: 0.1
!Date: 02/08/2017
!Project_name: WormBase (WB) Version WS258
!URL: http://www.wormbase.org/
!Contact Email: wormbase-help@wormbase.org
!Funding: NHGRI at US NIH, grant number U41 HG002223
  • Use the column headers from the 'Content' column on the tab 'DAF spec-v2 (for developers)' here:

https://docs.google.com/spreadsheets/d/1PrUI8CwV7AejBloWG2e6SNXEz_FSpCeMEwHhcKE9KmU/edit?usp=sharing

  • Name of file: disease_association.WB.WS258.txt
  • Translate DO IDs to disease names using the DO file:

raw.githubusercontent.com/DiseaseOntology/HumanDiseaseOntology/master/src/ontology/doid-merged.obo
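As an illustration of the file conventions above, the Python sketch below builds the header block and the file name for a given WormBase release, and reads DO ID to name pairs from the [Term] stanzas of an OBO file. The function names are hypothetical. The header date is passed through as a preformatted string because the example value (02/08/2017) does not say whether it is month/day or day/month. Downloading doid-merged.obo is left out.

```python
def daf_header(release: str, date_str: str) -> str:
    """Return the '!'-prefixed header block for a WormBase DAF file.

    date_str is written verbatim (e.g. "02/08/2017"); its format is not
    fixed by the spec, so no date parsing is attempted here.
    """
    lines = [
        "!daf-version: 0.1",
        f"!Date: {date_str}",
        f"!Project_name: WormBase (WB) Version {release}",
        "!URL: http://www.wormbase.org/",
        "!Contact Email: wormbase-help@wormbase.org",
        "!Funding: NHGRI at US NIH, grant number U41 HG002223",
    ]
    return "\n".join(lines) + "\n"


def daf_file_name(release: str) -> str:
    """File name pattern from the spec, e.g. disease_association.WB.WS258.txt."""
    return f"disease_association.WB.{release}.txt"


def parse_do_names(obo_text: str) -> dict[str, str]:
    """Map each DO ID to its name using the [Term] stanzas of an OBO file.

    Only the 'id:' and 'name:' lines inside [Term] stanzas are read;
    [Typedef] stanzas and all other tags are ignored.
    """
    names: dict[str, str] = {}
    in_term = False
    current_id = None
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; remember whether it is a term stanza.
            in_term = line == "[Term]"
            current_id = None
        elif in_term and line.startswith("id:"):
            current_id = line[len("id:"):].strip()
        elif in_term and current_id and line.startswith("name:"):
            names[current_id] = line[len("name:"):].strip()
    return names


print(daf_file_name("WS258"))  # disease_association.WB.WS258.txt
```

For a real run, the text of doid-merged.obo would be fetched from the URL above (for example with urllib.request) and passed to parse_do_names.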

Changes to DAF

Changes official as of the DWG-DQM call, July 5th, 2017

  • Column 3: DB (the database from which the identifiers in 'DB Object ID' and 'DB Object Symbol' are drawn) is now deleted; Column 3 is now DB Object ID.
  • Column 4: DB Object Symbol does not need the database prefix 'WB:', because it is tied to the DB Object ID field, which does.
  • Column 7: new, for additional genetic components. Note that the DQM script will fill in the symbol for the required format: <DB_object_id>;<DB_object_symbol>
  • Column 9: Association Type has only 3 allowed values; the following values are no longer valid: causes_or_contributes_to_condition, causes_condition, contributes_to_condition
  • Column 12: With Ortholog, mostly for SGD
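The `<DB_object_id>;<DB_object_symbol>` format for additional genetic components is simple to produce once IDs can be looked up to symbols. The Python sketch below shows the idea. The `symbols` lookup is hypothetical and stands in for whatever source the DQM script uses to fill in symbols. Joining several components with a comma is also an assumption.

```python
def format_genetic_component(object_id: str, symbols: dict[str, str]) -> str:
    """Return one value in the '<DB_object_id>;<DB_object_symbol>' format.

    `symbols` is a hypothetical ID -> symbol lookup standing in for the
    source the DQM script uses to fill in the symbol.
    """
    symbol = symbols.get(object_id)
    if symbol is None:
        raise KeyError(f"no symbol known for {object_id}")
    return f"{object_id};{symbol}"


def format_genetic_components(object_ids: list[str], symbols: dict[str, str]) -> str:
    """Join several components; the ',' separator is an assumption."""
    return ",".join(format_genetic_component(i, symbols) for i in object_ids)


symbols = {"WB:WBGene00004887": "smn-1"}
print(format_genetic_components(["WB:WBGene00004887"], symbols))
# WB:WBGene00004887;smn-1
```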

Rules for DAF/JSON

  1. Selecting DB Object:
    • In the majority of our annotations Disease_relevant_gene will be present; pick this (Gene) as the DB Object for DAF/JSON if Strain, Variation and Transgene are all absent.
    • If any of Strain, Variation or Transgene is present, pick that as the DB Object for DAF, even though Disease_relevant_gene is present. If more than one of these is present, the priority order is Strain > Variation > Transgene.
    • Note that Inferred Gene will most likely be present for the majority of our annotations; its value, a gene, will be redundant with the Disease_relevant_gene. This is okay and required for AGR.
    • The data will be cleaned up so that only one of Strain, Variation or Transgene is present.
  2. For WS262:
    • This will have to be hacked for DAF/JSON: change the Association type values causes_condition, causes_or_contributes_to_condition and contributes_to_condition to is_implicated_in (all three have been deprecated by the AGR DWG); this will be changed in the data for WS262.
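The DB Object selection rule and the association-type remapping can be sketched in Python as follows. This is a minimal illustration, not the DQM script. It assumes each annotation is available as a dict that maps an Acedb tag name (Strain, Variation, Transgene, Disease_relevant_gene) to a single ID. The DB Object Type labels returned (strain, allele, transgene, gene) are an assumption based on the object types WB uses.

```python
# Acedb tags that can supply the DB Object, in the priority order given
# above (Strain > Variation > Transgene). The DB Object Type label paired
# with each tag is an assumption for illustration.
PRIORITY = (("Strain", "strain"), ("Variation", "allele"), ("Transgene", "transgene"))

# Association types deprecated by the AGR DWG, remapped to is_implicated_in.
DEPRECATED_ASSOCIATIONS = {
    "causes_condition",
    "causes_or_contributes_to_condition",
    "contributes_to_condition",
}


def select_db_object(annotation: dict[str, str]) -> tuple[str, str]:
    """Return (db_object_type, object_id) for one disease annotation.

    `annotation` is a hypothetical dict from Acedb tag name to a single ID.
    """
    for tag, object_type in PRIORITY:
        value = annotation.get(tag)
        if value:
            return object_type, value
    # None of Strain/Variation/Transgene is present: fall back to the gene.
    gene = annotation.get("Disease_relevant_gene")
    if gene:
        return "gene", gene
    raise ValueError("no Strain, Variation, Transgene or Disease_relevant_gene")


def remap_association_type(association: str) -> str:
    """Replace a deprecated association type with is_implicated_in."""
    if association in DEPRECATED_ASSOCIATIONS:
        return "is_implicated_in"
    return association


# Usage with made-up IDs: Variation outranks Transgene.
example = {"Disease_relevant_gene": "WBGene00004887",
           "Variation": "WBVar_example", "Transgene": "WBTransgene_example"}
print(select_db_object(example))                   # ('allele', 'WBVar_example')
print(remap_association_type("causes_condition"))  # is_implicated_in
```

Checking the tags in a fixed tuple order keeps the priority rule in one place. If the data is later cleaned up so that only one of the three tags is present, the function still works unchanged.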

DAF columns: definitions and content

DAF columns, allowed values and mapping of columns from old WB DAF

Column | Content | Required (R) / Optional (O) | Cardinality | Example | Definition | Old DAF column | Acedb tag | Required format/Restriction
1 | Taxon | R | 1 | taxon:6239 | NCBI taxonomic identifier for the organism | 13 | No acedb tag | taxon:####
2 | DB Object Type | R | 1 | gene, allele, transgene, genotype, fish | The type of object being annotated | 2 | Modeled_by | WB uses gene, allele, strain or transgene
3 | DB | R | 1 | WB | The database from which the identifiers in 'DB Object ID' and 'DB Object Symbol' are drawn | | No acedb tag | WB
4 | DB Object ID | R | 1 | WBGene00004887 | A unique identifier from the database in DB for the entity being annotated; note, no DB prefix | 2 | Modeled_by | DB:<ID>
5 | DB Object Symbol | R | 1 | smn-1 | A (unique and valid) symbol to which DB Object ID is matched | 3 | Modeled_by | <symbol>
6 | Inferred Gene Association | O | 0 or greater | WB:WBGene00004887 | Database ID for the inferred gene/marker association that can be made based on the DB Object ID | Does not exist | Inferred_gene | WBGene ID; repeat the WBGene ID here even if 'DB Object Type' is gene
7 | Gene Product Form ID | O | 0 or 1 | UniProtKB ID or PRO ID | This field allows the annotation of specific variants of that gene or gene product | Does not exist | No acedb tag | Blank, WB not using this for now
8 | Experimental Conditions (to create the model) | O | 0 or greater | standard conditions; chemical/drug treatments (ChEBI ID, ZECO); dietary manipulations (specify entity); surgery/amputation; bacterial/virus exposure (taxon ID) | Experimental/environmental (i.e. non-genetic) conditions required for the model, used particularly for induced models | Does not exist | Inducing_chemical, Inducing_agent | WB is using free text for now; will move to an ontology soon
9 | Association Type | R | 1 | is_model_of; is_implicated_in; is_marker_for | Relationship between the DB object and the disease | Does not exist | Association_type | is_model_of only for DB Object Types genotype, strain, organism, fish; is_implicated_in only for alleles, genes, transgenes. The model change did not make it into WS261, so for now replace causes_or_contributes_to_condition, causes_condition and contributes_to_condition with is_implicated_in
10 | Qualifier | O | 0 or 1 | NOT | Used to indicate that the DB object is not associated with the DO term/association type | Does not exist | Qualifier | NOT
11 | DO ID | R | 1 | DOID:12858 | DO identifier for the disease | 5 | Disease_term |
12 | With | O | 0 or greater | DB:gene_symbol; DB:gene_id; DB:gene_symbol[allele_symbol]; DB:allele_id | EITHER specifies additional genetic components of the model (where DB Object Type is not genotype/strain/fish) OR specifies the orthologous (usually human) gene in annotations with 'ISS/ISO' evidence code | Does not exist | No acedb tag | Blank, mostly for SGD; mandatory for ISS/ISO evidence codes
13 | Modifier Association Type | O | 0 or 1 | condition_ameliorated_by; condition_exacerbated_by | Relationship between the modifier and the disease model | Does not exist | condition_ameliorated_by, condition_exacerbated_by | WB has started using
14 | Modifier - qualifier | O | 0 or 1 | NOT | Used to indicate that the DB object is not associated with the DO term/association type | Does not exist | Need to add tag for WS262 | WB has started using
15 | Modifier (genetic) | O | 0 or greater | DB:gene_symbol; DB:gene_symbol(allele_symbol); DB:gene_id; DB:allele_id | Specifies a genetic object (allele or gene) that modifies the disease model | Does not exist | Modifier_transgene, Modifier_variation, Modifier_strain, Modifier_gene | WB will start using
16 | Modifier - experimental conditions | O | 0 or greater | standard conditions; chemical/drug treatments (ChEBI ID, ZECO); dietary manipulations (specify entity); surgery/amputation; bacterial/virus exposure (taxon ID) | Specifies a non-genetic object (an experimental condition) that modifies the disease model | Does not exist | Modifier_molecule, Other_modifier | WB will start using
17 | Evidence Code | R | 1 or greater | EXP, IMP, IPM, IGI, IDA, IED, IEP, IAGP, ISS, ISO, TAS, IC, IEA | From GO: indicates the kind of evidence in the cited source that supports the disease annotation. If a reference describes multiple methods that each provide evidence, make multiple annotations with the same DO term and different evidence codes | 7 | Evidence_code | 'IMP' for now, other evidence codes coming soon
18 | Genetic sex | O | 0 or 1 | male/female/hermaphrodite | Genetic sex of the model | Does not exist | Genetic_sex | Use the value 'Hermaphrodite' for all rows; may use 'male' or 'female' in the future if authors specify
19 | DB:Reference | R | 1 | PMID:14978262 | Unique identifier(s) for a single source cited as an authority for the attribution of the DO ID to the DB Object ID | 6 | Paper_evidence | Use only PMIDs for AGR 0.3; will get to individual DB identifiers later on
20 | Date | R | 1 | 20090118 | Date on which the annotation was made; format is YYYYMMDD | 14 | Date_last_updated |
21 | Assigned By | R | 1 | WB | The database that made the annotation; one of the values from the set of GO database cross-references | 15 | No acedb tag |
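One way to check rows against this table before submission is a small validator like the Python sketch below. It is not an AGR or DQM tool. It assumes one annotation per tab-delimited line, with all 21 columns present in table order. It checks the required columns, the taxon, DOID and date patterns, the Qualifier value, the Association Type restrictions, and a single Evidence Code. It does not check cardinality for multi-valued columns, because the table does not give their separator.

```python
import re

# Column names in file order, copied from the 'Content' column above.
COLUMNS = [
    "Taxon", "DB Object Type", "DB", "DB Object ID", "DB Object Symbol",
    "Inferred Gene Association", "Gene Product Form ID",
    "Experimental Conditions", "Association Type", "Qualifier", "DO ID",
    "With", "Modifier Association Type", "Modifier - qualifier",
    "Modifier (genetic)", "Modifier - experimental conditions",
    "Evidence Code", "Genetic sex", "DB:Reference", "Date", "Assigned By",
]
# Columns marked R (required) in the table.
REQUIRED = {
    "Taxon", "DB Object Type", "DB", "DB Object ID", "DB Object Symbol",
    "Association Type", "DO ID", "Evidence Code", "DB:Reference", "Date",
    "Assigned By",
}
ASSOCIATION_TYPES = {"is_model_of", "is_implicated_in", "is_marker_for"}
MODEL_OBJECT_TYPES = {"genotype", "strain", "organism", "fish"}
IMPLICATED_OBJECT_TYPES = {"gene", "allele", "transgene"}
EVIDENCE_CODES = {"EXP", "IMP", "IPM", "IGI", "IDA", "IED", "IEP",
                  "IAGP", "ISS", "ISO", "TAS", "IC", "IEA"}


def validate_daf_row(line: str) -> list[str]:
    """Return a list of problems found in one tab-delimited DAF data line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(COLUMNS):
        return [f"expected {len(COLUMNS)} columns, got {len(fields)}"]
    row = dict(zip(COLUMNS, fields))
    errors = [f"missing required column: {name}"
              for name in COLUMNS if name in REQUIRED and not row[name].strip()]

    if row["Taxon"] and not re.fullmatch(r"taxon:\d+", row["Taxon"]):
        errors.append("Taxon must look like taxon:####")
    if row["DO ID"] and not re.fullmatch(r"DOID:\d+", row["DO ID"]):
        errors.append("DO ID must look like DOID:####")
    if row["Date"] and not re.fullmatch(r"\d{8}", row["Date"]):
        errors.append("Date must be YYYYMMDD")
    if row["Qualifier"] not in ("", "NOT"):
        errors.append("Qualifier must be empty or NOT")

    # Association Type must be allowed, and must fit the DB Object Type.
    assoc = row["Association Type"]
    obj_type = row["DB Object Type"].lower()
    if assoc and assoc not in ASSOCIATION_TYPES:
        errors.append(f"unknown Association Type: {assoc}")
    elif assoc == "is_model_of" and obj_type not in MODEL_OBJECT_TYPES:
        errors.append("is_model_of is only allowed for genotype/strain/organism/fish")
    elif assoc == "is_implicated_in" and obj_type not in IMPLICATED_OBJECT_TYPES:
        errors.append("is_implicated_in is only allowed for gene/allele/transgene")

    # Only a single evidence code is checked (separator for several is unknown).
    code = row["Evidence Code"]
    if code and code not in EVIDENCE_CODES:
        errors.append(f"unknown Evidence Code: {code}")
    return errors
```

Returning a list of messages, rather than stopping at the first failure, makes it possible to report every problem in a row at once.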

Old pre-AGR WormBase DAF

DAF 2.0 for gene-disease data includes all genes with the Experimental_model and/or Potential_model tags.

Format: The gene-disease association file is a 17-column tab-delimited file, in which 11 columns must contain data and 6 are optional.


Mapping of GAF column to gene-disease data

Column | Column Name | Required? | Cardinality | Example
1 | DB | required | 1 | WB
2 | DB Object ID | required | 1 | WBGene00007799
3 | DB Object Symbol | required | 1 | nrx-1
4 | Qualifier | optional | 0 or greater | (no value, leave empty)
5 | GO ID | required | 1 | DOID:0060041
6 | DB:Reference (|DB:Reference) | required | 1 or greater (separate values with pipes) | WBPaper00041363|WBPaperXXXXXXXX
7 | Evidence code | required | 1 | IMP or ISS (use 'IMP' for Experimental_model genes and 'ISS' for Potential_model genes)
8 | With (or) From | optional | 0 or greater (separate values with pipes) | OMIM:600565|OMIM:600566
9 | Aspect | required | 1 | D (D for Disease Ontology; all annotations have this value)
10 | DB Object Name | optional | 0 or 1 | (no value, leave empty)
11 | DB Object Synonym (|Synonym) | optional | 0 or greater | (no value, leave empty)
12 | DB Object Type | required | 1 | gene (all annotations have this value)
13 | Taxon | required | 1 or 2 | taxon:6239
14 | Date | required | 1 | 20130422 (the date of annotation dumped under Date_last_updated; for Potential_model genes, use the date on which the OMIM homology script is run)
15 | Assigned By | required | 1 | WB
16 | Annotation Extension | optional | 0 or greater | (no value, leave empty)
17 | Gene Product Form ID | optional | 0 or greater | (no value, leave empty)
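For comparison, a row in the old 17-column format could be assembled as in the Python sketch below. The function name and its parameters are hypothetical. The fixed values (WB, D, gene, taxon:6239) and the IMP/ISS choice by model tag come from the table above. References and With/From values are joined with pipes, as the table describes.

```python
# Evidence code per WormBase model tag, from column 7 of the old format.
EVIDENCE_BY_MODEL = {"Experimental_model": "IMP", "Potential_model": "ISS"}


def old_daf_row(gene_id: str, symbol: str, doid: str, papers: list[str],
                model_tag: str, date: str, with_ids: tuple[str, ...] = ()) -> str:
    """Build one tab-delimited line in the old 17-column WB DAF format."""
    columns = [
        "WB",                          # 1  DB
        gene_id,                       # 2  DB Object ID
        symbol,                        # 3  DB Object Symbol
        "",                            # 4  Qualifier (left empty)
        doid,                          # 5  GO ID column, holding the DO ID
        "|".join(papers),              # 6  DB:Reference, pipe-separated
        EVIDENCE_BY_MODEL[model_tag],  # 7  Evidence code
        "|".join(with_ids),            # 8  With (or) From, pipe-separated
        "D",                           # 9  Aspect
        "",                            # 10 DB Object Name (left empty)
        "",                            # 11 DB Object Synonym (left empty)
        "gene",                        # 12 DB Object Type
        "taxon:6239",                  # 13 Taxon
        date,                          # 14 Date (YYYYMMDD)
        "WB",                          # 15 Assigned By
        "",                            # 16 Annotation Extension (left empty)
        "",                            # 17 Gene Product Form ID (left empty)
    ]
    return "\t".join(columns)


row = old_daf_row("WBGene00007799", "nrx-1", "DOID:0060041",
                  ["WBPaper00041363"], "Experimental_model", "20130422")
print(len(row.split("\t")))  # 17
```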


