Difference between revisions of "Model changes to capture and consolidate human disease data"

From WormBaseWiki

Revision as of 22:34, 5 December 2012

Original Model proposals

A new tag ‘Disease_info’ proposed to consolidate disease-related data in WormBase:

 ?Gene
 DB_info      Database                     ?Database ?Database_field Text
 Disease_info Experimental_model_for_human ?DO_term XREF Gene_by_biology   #Evidence
              Potential_model_for_human    ?DO_term XREF Gene_by_orthology #Evidence
              Human_disease_relevance      ?Text #Evidence // moved from ‘structured description’ tag.

Model for Disease Ontology Term:

 ?DO_term 
 Name  UNIQUE               ?Text
 Status UNIQUE              Valid
                            Obsolete
 Alternate_id               ?Text
 Definition UNIQUE          ?Text
 Comment                    Text
 Synonymn                   ?Text Scope_modifier UNIQUE Broad
                                                        Exact
                                                        Narrow
                                                        Related
 Relationship      Is_a_child  	?DO_term  XREF  Is_a_parent
                   Is_a_parent 	?DO_term  XREF  Is_a_child 
 DB_info           Database     ?Database  ?Database_field   Text              
 Replaced_by                    ?DO_term
 Subset                         Text                  
 Created_by                     Text
 Creation_date                  Text             
 Attribute_of	Gene_by_biology    ?Gene       XREF   DO_term 
                Gene_by_orthology  ?Gene       XREF   DO_term
                Phenotype  ?Phenotype  XREF   DO_term
                WBProcess  ?WBProcess  XREF   DO_term
                Reference  ?Paper      XREF   DO_term 
 Index	Ancestor   ?DO_term   XREF Descendent      
        Descendent ?DO_term   XREF Ancestor   
 Version UNIQUE Text

Suggestions/Corrections from Hinxton

Paul D's suggestions/corrections for the 'Disease_info' tag under ?Gene:

  • Add ?Species tag instead of 'Experimental_model_for_human' type tags, to indicate species information.

Suggestions/Corrections for the ?DO_term model:

  • Drop 'Created_by' and 'Creation_date', not useful information to import, check original source of information if needed.

Paul: If there is a genuine reason to capture the complete file, then by all means capture the data. But when boiling the data down leaves you with some scrap that isn't informative, I would vote for not incorporating it, as the raw data is always available in the source file if someone really wants to see it.

Kevin: I would definitely advocate pick-and-choose over include-everything. We are not the maintainers/developers of this ontology, so I don't see why we should be recording the name of the curator responsible for adding a particular term, or precisely when they added it (to give two examples). This ontology is still in the early stages of development, and is likely to change subtly in definition. By capturing only the "core" parts of it, we give ourselves at least a fighting chance of defining an AceDB model that will not have to be changed each time they release a new version of the DO.

  • For the 'subset' tag, follow standard procedure across models:
?DO_term
Type       GOLD
           gram-negative_bacterial_infectious_disease
           gram-positive_bacterial_infectious_disease
           sexually_transmitted_infectious_disease
           tick-borne_infectious_disease
           zoonotic_infectious_disease

Ideally this would have been UNIQUE but there are 178 DO_terms with multiple subset lines populated.

Or is there something better that maintains flexibility but controls the vocabulary better than Text?

Ranjana--If both ?GO_term and ?DO_term were to use 'Subset' and follow this structure, for the sake of uniformity across ontology models, would this be fine or should I switch the tag to 'Type'?

  • This tag structure isn't workable as there isn't a general DO_term tag in the ?Gene model and it's a 2:1 mapping (two ?DO_term tags XREF to a single ?Gene tag).
?DO_term
Attribute_of         Gene_by_biology    ?Gene       XREF   DO_term 
                     Gene_by_orthology  ?Gene       XREF   DO_term
  • This is also not a permitted tag combination: you aren't allowed to build a branch off a Text/?Text tag. This might be a bug in the acedb code, so I have raised a ticket with the acedb dev team, but as of now it's a show stopper. Could you re-think this section?
?DO_term
Synonymn ?Text Scope_modifier UNIQUE Broad
                                     Exact
                                     Narrow
                                     Related

Would this work?

?DO_term
 Synonymn UNIQUE Broad   ?Text
                 Exact   ?Text
                 Narrow  ?Text 
                 Related ?Text
OR 
?DO_term
Synonymn_broad ?Text
Synonymn_exact ?Text
Synonymn_narrow ?Text
Synonymn_related ?Text 
OR
?DO_term
Synonymn Broad_synonymn ?Text
         Exact_synonymn ?Text
         Narrow_synonymn ?Text
         Related_synonymn ?Text
           
  • Relationships--Could we also modify the Relationship section? The proposed tag names are new, and relationships are used in multiple other classes. Could you copy one of the other models? I just created some test data and had the relationship reversed because of the tag names and because I wasn't being careful. We should try to re-use common tag structures; that way, if there are enough of them, we can move them into a Hash to simplify the models file.
 ?Anatomy_term
 Lineage      Parent_term  UNIQUE  ?Anatomy_term XREF Daughter_term
              Daughter_term        ?Anatomy_term XREF Parent_term
 ?SO_term
         Parent Is_a ?SO_term XREF Is
                Part_of ?SO_term XREF Part
                Derived_from ?SO_term XREF Derives
                Member_of ?SO_term XREF Member
         Child  Is ?SO_term XREF Is_a
                Part ?SO_term XREF Part_of
                Derives ?SO_term XREF Derived_from
                Member ?SO_term XREF Member_of
 ?Cell
        Lineage Parent  UNIQUE  ?Cell XREF Daughter
                Daughter        ?Cell XREF Parent
 ?GO_term
        Child     Instance ?GO_term XREF Instance_of
                  Component ?GO_term XREF Component_of
        Parent    Instance_of ?GO_term XREF Instance
                  Component_of ?GO_term XREF Component

  • Index and Relationship--Are these not storing the same data?

Ranjana--I think Relationship is for the immediate parent/child information, whereas under Index we would list every ancestor and descendant so as to be able to get the complete ancestry.
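
The distinction can be illustrated with a small Python sketch. It assumes only the immediate Is_a parents are stored and derives the full ancestor set that an Index section would otherwise hold. The term IDs and the `ancestors` helper are made up for illustration and are not part of any WormBase code.

```python
from collections import deque


def ancestors(term, parents):
    """Return every term reachable from `term` by following parent links upward."""
    seen = set()
    queue = deque(parents.get(term, ()))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited via another path
        seen.add(current)
        queue.extend(parents.get(current, ()))
    return seen


# Hypothetical immediate parents, i.e. what a Relationship/Parent tag would store.
IS_A = {
    "TERM:leaf": ["TERM:middle"],
    "TERM:middle": ["TERM:root"],
    "TERM:root": [],
}

print(sorted(ancestors("TERM:leaf", IS_A)))
```

If the complete ancestry can be computed this way at load or query time, storing it under Index is a convenience rather than new information.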

Current version of models

So as of 5 December 2012 we have:

Gene

 ?Gene
 DB_info      Database           ?Database ?Database_field Text // for pointing to OMIM ortholog and disease
 Disease_info Experimental_model ?DO_term XREF Gene_by_biology   ?Species #Evidence
              Potential_model    ?DO_term XREF Gene_by_orthology ?Species #Evidence
              Disease_relevance  ?Text ?Species #Evidence

Note: The 'Human_disease_relevance' tag is being moved from the 'Structured_description' tag under ?Gene to the 'Disease_info' tag and is renamed 'Disease_relevance'.

DO_term

 ?DO_term 
 Name  UNIQUE               ?Text
 Status UNIQUE              Valid
                            Obsolete
 Alternate_id               ?Text
 Definition UNIQUE          ?Text
 Comment                    Text
 Synonymn      Broad   ?Text     
               Exact   ?Text     
               Narrow  ?Text     
               Related ?Text
 Parent             Is_a         ?DO_term  XREF  Is
 Child              Is           ?DO_term  XREF  Is_a
 DB_info            Database     ?Database  ?Database_field   Text              
 Type               GOLD                   
                    gram-negative_bacterial_infectious_disease
                    gram-positive_bacterial_infectious_disease
                    sexually_transmitted_infectious_disease
                    tick-borne_infectious_disease
                    zoonotic_infectious_disease
 Attribute_of       Gene_by_biology    ?Gene       XREF   Experimental_model
                    Gene_by_orthology  ?Gene       XREF   Potential_model
                    Phenotype  ?Phenotype  XREF   DO_term
                    WBProcess  ?WBProcess  XREF   DO_term
                    Reference  ?Paper      XREF   DO_term 
 Version            UNIQUE Text

Changes to other data models

We will also need to add a DO_term tag and object to the Phenotype, WBProcess, Molecule and Reference models.

?Phenotype
DO_term ?DO_term #Evidence
 
?WBProcess
DO_term ?DO_term #Evidence

?Paper 
DO_term ?DO_term #Evidence

?Molecule
DO_term ?DO_term #Evidence

Disease Ontology .obo to AceDB tag mapping

SourceForge repo of multiple DO files.

http://diseaseontology.svn.sourceforge.net/viewvc/diseaseontology/trunk/HumanDO.obo disease ontology .obo file
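
As a rough illustration of what such a mapping could look like, the Python sketch below turns one .obo [Term] stanza into .ace-style lines using tag names from the current ?DO_term model above. The choice of which .obo keys map to which tags, the helper names, the sample stanza, and the convention of writing only the leaf tag name on each .ace line are all assumptions, not an agreed mapping.

```python
import re

# Matches a double-quoted OBO string, allowing backslash escapes inside it.
QUOTED = r'"((?:[^"\\]|\\.)*)"'

# Assumed mapping from OBO synonym scopes to the tags under Synonymn.
SCOPE_TAGS = {"BROAD": "Broad", "EXACT": "Exact", "NARROW": "Narrow", "RELATED": "Related"}


def obo_unescape(text):
    # Drops the backslash from OBO escapes such as \: and \" ; \n is not handled.
    return re.sub(r"\\(.)", r"\1", text)


def ace_quote(text):
    # Escape backslash, double quote and slash, as in the Database entry's URL.
    for ch in ("\\", '"', "/"):
        text = text.replace(ch, "\\" + ch)
    return '"' + text + '"'


def stanza_to_ace(stanza):
    term_id, lines = None, []
    for raw in stanza.strip().splitlines():
        raw = raw.strip()
        if not raw or raw.startswith("["):
            continue  # skip blank lines and the [Term] header
        key, _, value = raw.partition(": ")
        if key == "id":
            term_id = value
        elif key == "name":
            lines.append("Name " + ace_quote(value))
        elif key == "alt_id":
            lines.append("Alternate_id " + ace_quote(value))
        elif key == "def":
            m = re.match(QUOTED, value)
            if m:
                lines.append("Definition " + ace_quote(obo_unescape(m.group(1))))
        elif key == "synonym":
            m = re.match(QUOTED + r"\s+(\w+)", value)
            if m and m.group(2) in SCOPE_TAGS:
                lines.append(SCOPE_TAGS[m.group(2)] + " " + ace_quote(obo_unescape(m.group(1))))
        elif key == "is_a":
            # "DOID:4 ! disease" -> keep only the ID before the "!" comment.
            lines.append("Is_a " + ace_quote(value.split("!")[0].strip()))
    if term_id is None:
        raise ValueError("stanza has no id line")
    return "\n".join(["DO_term : " + ace_quote(term_id)] + lines) + "\n"


# Made-up stanza for illustration; only DOID:4 (disease) is a real ID from this page.
SAMPLE = """
[Term]
id: DOID:0000000
name: example disease
def: "A made-up definition." []
synonym: "sample disorder" EXACT []
is_a: DOID:4 ! disease
"""

print(stanza_to_ace(SAMPLE))
```

Keys not listed here (for example created_by and creation_date) are simply ignored, in line with the pick-and-choose approach discussed above.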


Adding Disease Ontology as a ?Database object - DONE

We should probably add the DO as a Database object:

Disease_ontology
Name Disease Ontology
Description The Disease Ontology has been developed as a standardized
ontology for human disease with the purpose of providing the biomedical 
community with consistent, reusable and sustainable descriptions of
human disease concepts through collaborative efforts of researchers at 
Northwestern University, Center for Genetic Medicine and the University
of Maryland School of Medicine, Institute for Genome Sciences. 
URL http://www.disease-ontology.org
URL_constructor - is there such a thing for this database?

I'll add this into geneace as this is the primary source of this class. (Paul D.)

Database : "Disease_ontology"
Name "Disease Ontology"
Description "The Disease Ontology has been developed as a standardized ontology for human disease with the purpose of providing the biomedical community with consistent, reusable and sustainable descriptions of human disease concepts through collaborative efforts of researchers at Northwestern University, Center for Genetic Medicine and the University of Maryland School of Medicine, Institute for Genome Sciences."
URL "http:\/\/www.disease-ontology.org"

Linking back to the Disease Ontology

Primarily for the DB_info tags: currently it is not possible to link back to http://disease-ontology.org, as they employ a Java browser within the page to serve tabs of data :(

I was in contact with Cesar Arze about that issue, and they provided the following URL, which might be OK for the short term until they develop the functionality.

http://disease-ontology.org/term/DOID%3A<DOID>

Example for DOID1115

Might be usable for now... ask Todd.
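
A minimal sketch of a URL constructor for this pattern, in Python. It assumes the template above is the only input needed; the `term_url` helper is hypothetical, and whether the site still serves this path is not checked here.

```python
import re
from urllib.parse import quote

# Template taken from the URL pattern above; <DOID> is the numeric part of the ID.
TERM_URL = "http://disease-ontology.org/term/{}"


def term_url(doid):
    """Build the short-term link for a DO term given 'DOID:1115', 'DOID1115' or '1115'."""
    number = re.sub(r"^DOID:?", "", str(doid).strip(), flags=re.IGNORECASE)
    # safe="" makes quote() percent-encode the colon as %3A.
    return TERM_URL.format(quote("DOID:" + number, safe=""))


print(term_url("DOID:1115"))
```

Something along these lines could sit behind a URL_constructor for the Disease_ontology ?Database object, if one is added.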

Disease Ontology Browser REST API

  • Might be worth mentioning this to Todd as he might be able to code something funky from their API.
---- snippet taken from their FAQ ----

A RESTful API service is also offered to users of the Disease Ontology browser website to allow for programmatic access to the metadata found in the database.

Metadata returned in JSON format may currently be requested by use of the following URL:

http://www.disease-ontology.org/api/metadata/<DOID> 

Example usage here would be querying for metadata from the term "Disease" (DOID:4):

http://www.disease-ontology.org/api/metadata/DOID:4 

which would return the following JSON:

 {
   "definition": "\"A disease is a disposition (i) to undergo pathological processes that (ii) exists in an organism because of one or more disorders in that organism.\" [url:http\\://ontology.buffalo.edu/medo/Disease_and_Diagnosis.pdf]",
   "xrefs": [
     "MSH2010_2010_02_22:D004194",
     "NCI2009_04D:C2991",
     "SNOMEDCT_2010_1_31:64572001",
     "UMLS_CUI:C0012634"
   ],
   "children": [
     ["disease of cellular proliferation", "DOID:14566"],
     ["medical disorder", "DOID:0060035"],
     ["disease of anatomical entity", "DOID:7"],
     ["disease of metabolism", "DOID:0014667"],
     ["genetic disease", "DOID:630"],
     ["disease of mental health", "DOID:150"],
     ["disease by infectious agent", "DOID:0050117"],
     ["syndrome", "DOID:225"]
   ],
   "name": "disease",
   "id": "DOID:4"
 }

In the future we hope to offer all the services that are available on the Disease Ontology browser website through the REST API as well.
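
A hedged Python sketch of how the API described in the FAQ snippet might be queried. The URL pattern and the shape of the "children" list (pairs of name and ID) are taken from the snippet above; the function names are hypothetical, and the service may have changed or been withdrawn since the FAQ was written.

```python
import json
from urllib.request import urlopen

# URL pattern quoted in the FAQ snippet above.
API_URL = "http://www.disease-ontology.org/api/metadata/{}"


def metadata_url(doid):
    """Return the metadata URL for an ID such as 'DOID:4'."""
    return API_URL.format(doid)


def fetch_metadata(doid, timeout=10):
    """Fetch and decode the JSON metadata for one term (performs a network request)."""
    with urlopen(metadata_url(doid), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def child_ids(metadata):
    """Extract child term IDs, assuming 'children' is a list of [name, id] pairs."""
    return [pair[1] for pair in metadata.get("children", [])]


# Offline example using a fragment shaped like the FAQ response.
sample = {"id": "DOID:4", "name": "disease", "children": [["syndrome", "DOID:225"]]}
print(metadata_url(sample["id"]))
print(child_ids(sample))
```

`fetch_metadata` is deliberately not called in the example, so the sketch runs without network access.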