Model changes to capture and consolidate human disease data
A new tag, ‘Disease_info’, is proposed to consolidate disease-related data in WormBase:
?Gene DB_info Database ?Database ?Database_field Text
      Disease_info Experimental_model_for_human ?DO_term XREF Gene_by_biology #Evidence
                   Potential_model_for_human ?DO_term XREF Gene_by_orthology #Evidence
                   Human_disease_relevance ?Text #Evidence //moved from ‘structured description’ tag.
Model for Disease Ontology Term:
?DO_term Name UNIQUE ?Text
         Status UNIQUE Valid
                       Obsolete
         Alternate_id ?Text
         Definition UNIQUE ?Text
         Comment Text
         Synonymn ?Text Scope_modifier UNIQUE Broad
                                              Exact
                                              Narrow
                                              Related
         Relationship Is_a_child ?DO_term XREF Is_a_parent
                      Is_a_parent ?DO_term XREF Is_a_child
         DB_info Database ?Database ?Database_field Text
         Replaced_by ?DO_term
         Subset Text
         Created_by Text
         Creation_date Text
         Attribute_of Gene_by_biology ?Gene XREF DO_term
                      Gene_by_orthology ?Gene XREF DO_term
                      Phenotype ?Phenotype XREF DO_term
                      WBProcess ?WBProcess XREF DO_term
                      Reference ?Paper XREF DO_term
         Index Ancestor ?DO_term XREF Descendent
               Descendent ?DO_term XREF Ancestor
         Version UNIQUE Text
Paul D's suggestions/corrections for the 'Disease_info' tag under ?Gene:
- Add ?Species tag instead of 'Experimental_model_for_human' type tags, to indicate species information.
Suggestions/Corrections for the ?DO_term model:
- Drop 'Created_by' and 'Creation_date'; this is not useful information to import, and the original source can be checked if needed.
Paul: If there is a genuine reason to capture the complete file, then by all means capture the data. It's just that when boiling the data down leaves you with some scrap of data that isn't informative, I would vote for not incorporating it, as the raw data is always available in the source file if someone really wants to see it.
Kevin: I would definitely advocate pick-and-choose over include-everything. We are not the maintainers/developers of this ontology, so I don't see why we should be recording the name of the curator responsible for adding a particular term, or precisely when they added it (to give two examples). This ontology is still in the early stages of development, and is likely to change subtly in definition. By capturing only the "core" parts of it, we give ourselves at least a fighting chance of defining an Acedb model that will not have to be changed each time they release a new version of the DO.
- For the 'subset' tag, follow standard procedure across models:
?DO_term Type GOLD
              gram-negative_bacterial_infectious_disease
              gram-positive_bacterial_infectious_disease
              sexually_transmitted_infectious_disease
              tick-borne_infectious_disease
              zoonotic_infectious_disease
Ideally this would have been UNIQUE but there are 178 DO_terms with multiple subset lines populated.
Or is there something better that maintains flexibility but controls the vocabulary better than Text?
Ranjana--If both ?GO_term and ?DO_term were to use 'Subset' and follow this structure, for the sake of uniformity across ontology models, would this be fine or should I switch the tag to 'Type'?
- This tag structure isn't workable as there isn't a general DO_term tag in the ?Gene model and it's a 2:1.
?DO_term Attribute_of Gene_by_biology ?Gene XREF DO_term
                      Gene_by_orthology ?Gene XREF DO_term
- This is also not a permitted tag combination: you aren't allowed to build a branch off a Text/?Text tag. This might be a bug in the acedb code, so I have raised a ticket with the acedb dev team, but as of now it's a show stopper. Could you re-think this section?
?DO_term Synonymn ?Text Scope_modifier UNIQUE Broad
                                              Exact
                                              Narrow
                                              Related
Would this work?
?DO_term Synonymn UNIQUE Broad ?Text
                         Exact ?Text
                         Narrow ?Text
                         Related ?Text
OR
?DO_term Synonymn_broad ?Text
         Synonymn_exact ?Text
         Synonymn_narrow ?Text
         Synonymn_related ?Text
OR
?DO_term Synonymn Broad_synonymn ?Text
                  Exact_synonymn ?Text
                  Narrow_synonymn ?Text
                  Related_synonymn ?Text
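Whichever of these layouts is chosen, the loader has to sort each synonym by its scope. A minimal Python sketch of that step follows, assuming the source is an OBO 1.2 flat file in which synonym lines look like `synonym: "text" SCOPE [dbxrefs]`. The function name `group_synonyms` and the example stanza (including its IDs and cross-reference) are made up for illustration and are not part of any existing WormBase script.

```python
import re
from collections import defaultdict

# Assumed OBO 1.2 synonym syntax: synonym: "text" SCOPE [dbxrefs]
# SCOPE is one of EXACT, BROAD, NARROW, RELATED.
SYNONYM_RE = re.compile(
    r'^synonym:\s*"((?:[^"\\]|\\.)*)"\s+(EXACT|BROAD|NARROW|RELATED)\b'
)


def group_synonyms(stanza_lines):
    """Group the synonym texts of one [Term] stanza by scope.

    Returns a dict keyed by the scope tag names used in the models
    above (Broad, Exact, Narrow, Related).
    """
    grouped = defaultdict(list)
    for line in stanza_lines:
        match = SYNONYM_RE.match(line.strip())
        if match:
            text, scope = match.groups()
            # "EXACT" -> "Exact", matching the proposed tag names
            grouped[scope.capitalize()].append(text)
    return dict(grouped)


# Hypothetical stanza, for illustration only
example = [
    'id: DOID:0000001',
    'name: example disease',
    'synonym: "example disorder" EXACT []',
    'synonym: "example syndrome" RELATED [EX:123]',
    'synonym: "example illness" EXACT []',
]

if __name__ == "__main__":
    for scope, texts in group_synonyms(example).items():
        print(scope, texts)
```

The grouped result maps directly onto any of the three layouts, since each one keeps a separate ?Text list per scope.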
- Relationships--Could we also modify the Relationship section? The proposed tag names are new, and relationships are used in multiple other classes. Could you copy one of the other models? I just created some test data and had the relationship reversed because of the tag names and because I wasn't being careful. We should try to re-use common tag structures; that way, if there are enough of them, we can move them into a Hash to simplify the models file.
?Anatomy_term Lineage Parent_term UNIQUE ?Anatomy_term XREF Daughter_term
                      Daughter_term ?Anatomy_term XREF Parent_term

?SO_term Parent Is_a ?SO_term XREF Is
                Part_of ?SO_term XREF Part
                Derived_from ?SO_term XREF Derives
                Member_of ?SO_term XREF Member
         Child Is ?SO_term XREF Is_a
               Part ?SO_term XREF Part_of
               Derives ?SO_term XREF Derived_from
               Member ?SO_term XREF Member_of

?Cell Lineage Parent UNIQUE ?Cell XREF Daughter
              Daughter ?Cell XREF Parent

?GO_term Child Instance ?GO_term XREF Instance_of
               Component ?GO_term XREF Component_of
         Parent Instance_of ?GO_term XREF Instance
                Component_of ?GO_term XREF Component
- Index and Relationship--Are these not storing the same data?
Ranjana--I think Relationship is for the immediate parent/child information, whereas under Index we would list every ancestor and descendant so as to be able to get the complete ancestry.
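The difference between the two sections can be illustrated with a short Python sketch: the Relationship data holds only immediate Is_a parents, and the Index data is the transitive closure derived from it. This is a sketch under stated assumptions, not the actual load script; the function names `ancestors_of` and `build_index` and the term IDs are hypothetical.

```python
def ancestors_of(term, parents):
    """Return every ancestor of `term`, following immediate Is_a links.

    `parents` maps a term ID to the set of its immediate parents
    (the Relationship data).
    """
    found = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in found:  # also guards against cycles
            found.add(current)
            stack.extend(parents.get(current, ()))
    return found


def build_index(parents):
    """Derive the Index data (Ancestor and Descendent) for all terms."""
    terms = set(parents) | {p for ps in parents.values() for p in ps}
    ancestor = {t: ancestors_of(t, parents) for t in terms}
    descendent = {t: set() for t in terms}
    for term, ancs in ancestor.items():
        for anc in ancs:
            descendent[anc].add(term)
    return ancestor, descendent


# Hypothetical term IDs: C is_a B, B is_a A, D is_a A
parents = {
    "DOID:C": {"DOID:B"},
    "DOID:B": {"DOID:A"},
    "DOID:D": {"DOID:A"},
}

if __name__ == "__main__":
    ancestor, descendent = build_index(parents)
    print(sorted(ancestor["DOID:C"]))
    print(sorted(descendent["DOID:A"]))
```

Here DOID:C has a single immediate parent but two ancestors, which is the extra information the Index section would carry.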
So as of 11/29/2012 we have:
?Gene DB_info Database ?Database ?Database_field Text //for pointing to OMIM ortholog and disease
      Disease_info Experimental_model ?DO_term XREF Gene_by_biology ?Species #Evidence
                   Potential_model ?DO_term XREF Gene_by_orthology ?Species #Evidence
                   Disease_relevance ?Text ?Species #Evidence
?DO_term Name UNIQUE ?Text
         Status UNIQUE Valid
                       Obsolete
         Alternate_id ?Text
         Definition UNIQUE ?Text
         Comment Text
         Synonymn UNIQUE Broad ?Text
                         Exact ?Text
                         Narrow ?Text
                         Related ?Text
         Parent Is_a ?DO_term XREF Is
         Child Is ?DO_term XREF Is_a
         DB_info Database ?Database ?Database_field Text
         Type GOLD
              gram-negative_bacterial_infectious_disease
              gram-positive_bacterial_infectious_disease
              sexually_transmitted_infectious_disease
              tick-borne_infectious_disease
              zoonotic_infectious_disease
         Attribute_of Gene_by_biology ?Gene XREF Experimental_model
                      Gene_by_orthology ?Gene XREF Potential_model
                      Phenotype ?Phenotype XREF DO_term
                      WBProcess ?WBProcess XREF DO_term
                      Reference ?Paper XREF DO_term
         Index Ancestor ?DO_term XREF Descendent
               Descendent ?DO_term XREF Ancestor
         Version UNIQUE Text
And we will need to add a DO_term tag and object to the Phenotype, WBProcess and Reference models.
?Phenotype DO_term ?DO_term #Evidence

?WBProcess DO_term ?DO_term #Evidence

?Paper DO_term ?DO_term #Evidence