Difference between revisions of "OA and scripts for disease data"

From WormBaseWiki
(67 intermediate revisions by 2 users not shown)
Line 1: Line 1:
==Ontology Annotator for disease data==
+
==Disease OA or Disease Model Annotation OA (Feb 2017)==
 
+
The gene-disease OA (connected in AceDB models to ?Gene class) and the newer Disease_model_annotation data are now in the Disease OA.
Note: All disease_relevant descriptions have been removed from the concise descriptions OA and moved to the disease OA.
+
===Disease OA fields===
 
 
====Fields====
 
One gene can be attached to more than one Experimental_Model and one Disease_Relevance (and their related papers, databases and species); these are grouped together in one instance of the Editor and in one line of the data-table. This is similar to a gene being attached to more than one GO term. If a gene needs to be attached to an unrelated disease, enter all data on a new line by clicking 'New' in the OA.
 
 
 
Editor:
 
'''Tab 1:'''
 
 
 
'''Field 1 WBGene (dis_wbgene):''' <br/ >
 
Behavior of field: Autocomplete obo <br/ >
 
Source: WBGene obo <br/ >
 
Similar to: WBGene in the GO OA or concise descrips OA <br/ >
 
As the curator starts typing a locus name (e.g., lin-10) or a cosmid name (e.g., C09H6), the script autocompletes and fills in the WBGene ID. <br/ >
 
 
 
'''Field 2 Curator (dis_curator):''' <br/ >
 
Behavior of field: Auto-complete drop-down with ready values <br/ >
 
Similar to: Curator field in GO OA <br/ >
 
 
 
'''Field 3 Curator History (dis_curhistory):''' <br/>
 
Behavior of field: Same as in the concise OA; this field cannot be changed manually. <br/ >

Similar to: concise OA <br/ >
 
 
 
'''Field 4 Experimental model for (dis_humandoid):''' <br/ >
 
Behavior:Autocomplete obo <br/ >
 
Obo file to be used: DO_term obo <br/ >
 
Source: https://diseaseontology.svn.sourceforge.net/svnroot/diseaseontology/trunk/HumanDO.obo <br/ >
 
Similar to: GO term field in the GO OA. <br/ >
 
For example, the curator starts typing 'Alz', picks 'Alzheimer's disease' from the drop-down, and the script populates the field with 'Alzheimer's disease (DOID:10652)'; this is similar to the GO term field in the GO OA. <br/ >
 
 
 
This is a multi value ontology field, but in general use only one disease/row signifying one experiment.
 
 
 
Q: Updating: How do we update this obo file, and how frequently do other obo files get updated? <br/ >

A: Every day at 8pm. If the file has the proper .obo format, it should be easy to add to the cronjob that picks them up: <br/ >

/home/postgres/work/pgpopulation/obo_oa_ontologies/update_obo_oa_ontologies.pl <br/ >
 
 
 
'''Field 5 Variation (dis_variation)''': <br/>
 
Autocomplete drop-down. Enter only one variation per row/experiment; there is currently no way to represent a double mutant, i.e., no such object exists. Do not enter a transgene in the same row.
 
 
 
'''Field 6 Disease Phenotype (dis_phenotypedisease):''' <br/>
 
Multi-ontology field, several disease phenotypes can be entered for a single variation OR a single transgene, but not both.
 
 
 
'''Field 7 Transgene (dis_transgene):''' <br/>
 
Autocomplete drop-down, allows only one, if transgene is entered do not enter a variation and vice versa.
 
 
 
'''Field 8 Paper for Exp Mod (dis_paperexpmod):''' <br/ >

Behavior: Autocomplete obo <br/ >

Obo file to be used: WBPaper obo <br/ >

Similar to: The Paper field in the GO OA <br/ >

This field is multivalue.
 
 
 
'''Field 9 OMIM disease for Exp Mod(dis_dbexpmod):''' Database for Exp Mod <br/ >
 
Behavior: Free text, multiple values comma-separated <br/ >
 
Q: Will they dump in separate lines in the output ?  Usually those are pipe-separated. 
 
If they'll dump literally as pasted in, then commas are good. <br/ >
 
A: Per latest conversation, using commas is fine, as long as there never will be a comma in the data itself, which is not likely to happen as these are OMIM IDs <br/ >
 
 
 
'''Field 10 Species(dis_species)''' : <br/ >
 
Behavior: Auto-complete drop-down with ready values <br/ >
 
Similar to: Project field in the GO OA <br/ >
 
Current values: Homo sapiens <br/ >
 
 
 
'''Field 11 Last Updated for Exp Model (dis_lastupdateexpmod):''' <br/ >
 
Script autopopulates date when data is a New line, i.e when the "New" button is used.  <br />
 
 
 
'''Field 12 Disease relevance (dis_diseaserelevance)''' <br/ >
 
Behavior: Big Text box (big text-box, keeps expanding) <br/ >
 
Similar To: 'Description Text' field in the Concise OA. <br/ >
 
This is the Human_disease_relevance description, which appears as one of the drop-down values for the 'Description Type' field in the Concise OA. <br/ >
 
 
 
'''Field 13 Paper for Disease Rel(dis_paperdisrel)''' <br/ >
 
Behavior: Autocomplete obo <br/ >
 
Obo file to be used: WBPaper obo <br/ >
 
Similar to: The Paper field in the GO OA <br/ >
 
 
 
Q:So there's two papers fields.  Are they both required, or it must have at least one, or nothing is required ? <br/ >
 
A: Both are required. <br/ >
 
Q:single/multi value ? <br/ >
 
A: Multivalue <br/ >
 
 
 
'''Field 14 OMIM disease for Disease Rel (dis_dbdisrel):''' OMIM disease for Disease Rel <br/ >
 
Behavior: Free text, multiple values comma-separated <br/ >
 
 
 
Q:Same as xref Database, but a different field ? <br/ >
 
A: Exactly, again I will pipe-separate multiple values. <br/ >
 
 
 
'''Field 15 OMIM gene for Disease Rel:'''
 
Free text, comma separated
 
 
 
'''Field 16 Last Updated for Disease Rel (dis_lastupdatedisrel):''' <br/ >
 
Behavior: The script fills in the current date for a new annotation (a new data line); when changed manually, the date is entered as YYYY-MM-DD. <br/ >
 
 
 
'''Field 17 Comment (dis_comment):''' <br/ >
 
Behavior: Free text <br/ >
 
 
 
'''Field 18 pgid''' <br/ >
 
 
 
'''Tab 2 of Editor'''<br/ >
 
 
 
'''Field 19 Molecule Type (dis_moleculetype):'''<br/ >
 
Autocomplete with fixed values--Therapeutic_molecule, Toxic_molecule, Exacerbating_molecule
 
Choose only one 'type' per row/per pgid/per disease/per variation/per transgene/per phenotype
 
 
 
'''Field 20 Molecule (dis_molecule):'''<br/ >
 
Autocomplete drop-down from 'Molecule' class in WormBase
 
 
 
'''Field 21 Affected Phenotype (dis_phenotypeaffected):'''<br/ >
 
Autocomplete single ontology field, from WormBase Phenotype ontology.
 
 
 
====Data constraints====
 
Constraints are checked at the tool level only, to alert curators when required fields are not filled. <br/ >

The following dis_ tables are required: wbgene curator humandoid paperexpmod species diseaserelevance paperdisrel lastupdatedisrel
 
WBGene <br/ >
 
Curator <br/ >
 
Experimental model for <br/ >
 
Paper for Exp Mod <br/ >
 
Species <br/ >
 
Disease relevance <br/ >
 
Paper for Disease Rel <br/ >
 
Last Updated <br/ >
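The required-field check above could be sketched as follows. This is only an illustration, not the OA's actual constraint code: the function name and the dict-based row (table suffix mapped to value for one pgid) are assumptions; the table names come from the list above.

```python
# Required dis_ tables, as listed in the data constraints above.
REQUIRED_TABLES = [
    "wbgene", "curator", "humandoid", "paperexpmod",
    "species", "diseaserelevance", "paperdisrel", "lastupdatedisrel",
]


def missing_required(row):
    """Return the required dis_ tables that are empty or absent in `row`.

    `row` is a hypothetical dict: table suffix -> value for one pgid.
    """
    return [t for t in REQUIRED_TABLES if not str(row.get(t) or "").strip()]


# Example row based on the lov-1 entry, with two fields left unfilled.
example_row = {
    "wbgene": "WBGene00003058",
    "humandoid": "DOID:5937",
    "paperexpmod": "WBPaper00038373",
    "species": "Homo sapiens",
    "diseaserelevance": "lov-1 and pkd-2 encode the orthologs of ...",
    "paperdisrel": "WBPaper00038373",
}

print(missing_required(example_row))
```

Returning the list of missing tables, rather than a yes/no answer, lets the tool tell the curator exactly which fields still need a value.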
 
 
 
'''To make live:''' <br/ >
 
at : /home/postgres/work/pgpopulation/dis_disease/ <br/ >
 
create_dis_tables.pl -- create new postgres tables for dis_ disease OA <br/ >
 
synchronize OA <br/ >
 
transfer_concise_disease.pl -- take 95 entries that have con_desctype = 'Human_disease_relevance' and add them to dis_ tables starting with pgid 1. <br/ >
 
Ranjana, manually delete the Human_disease_relevance entries from the concise OA. <br/ >
 
remove the Human_disease_relevance option from the OA, resynchronize. <br/ >
 
 
 
====Dumper specifications====
 
 
 
Dumper module in sandbox at /home/postgres/work/citace_upload/dis_disease/get_dis_disease_ace.pm
 
Copy /home/postgres/work/citace_upload/dis_disease/use_package.pl to a directory you own and run it there.
 
 
 
====Mapping between OA fields and acedb tags====
 
Model:
 
 
 
?Gene
 
DB_info  Database ?Database ?Database_field Text
 
Disease_info Experimental_model ?DO_term XREF Gene_by_biology ?Species  #Evidence            
 
              Potential_model ?DO_term XREF Gene_by_orthology ?Species #Evidence
 
              Disease_relevance  ?Text ?Species #Evidence
 
 
 
We do not fill in Potential_model tag, Sanger does.
 
 
 
'''The example is lov-1 in the disease OA in the sandbox:'''
 
 
 
'''Model tag: ?Gene''' <br />
 
Use value: WBGene (take ID only) <br />
 
Eg: WBGene00003058 <br />
 
 
 
Model tag: DB_info  Database ?Database ?Database_field Text <br />
 
Use value(s) in 'xref Database' and in 'OMIM database' <br />
 
Eg: OMIM:173900 and OMIM:601313; do not take OMIM:173900 again from 'OMIM database', since it is a duplicate of the value in 'xref Database'. <br />
 
 
 
.ace:
 
Database "OMIM"   "disease" "173900"
 
Repeat line for each value if there are multiple values
 
 
 
'''Model tag: Experimental_model ?DO_term XREF Gene_by_biology ?Species  #Evidence'''<br />
 
Use value in 'Experimental Model for' <br />
 
Eg:autosomal dominant polycystic kidney (DOID:5937); take ID only <br />
 
Use value in 'Species' for ?Species <br />
 
Eg: Homo sapiens <br />
 
Use value(s) in 'Paper for Exp Mod' for #Evidence <br />
 
Eg.WBPaper00038373 <br />
 
Repeat .ace line for every paper if multiple papers are present. <br />
 
 
 
.ace: <br />
 
Experimental_model  DOID:5937  "Homo sapiens" Paper_evidence "WBPaper00038373"
 
 
 
'''Model tag: Disease_relevance  ?Text ?Species #Evidence''' <br />
 
Use value in 'Disease Relevance' for ?Text <br />
 
Eg:lov-1 and pkd-2 encode the orthologs of human Polycystin-1 and Polycystin-2, which are mutated in autosomal dominant polycystic kidney disease; the polycystins regulate signaling involved in normal renal tubular structure and function; studies in the worm C. elegans have contributed extensively to the finding that cystic kidney diseases can be considered ciliopathies; in elegans lov-1 and pkd-2 are expressed in male ciliary neurons, are required for normal male mating behavior, do not seem to be required for ciliogenesis, and each polycystin may actually have a potential inhibitory function on the other for ciliary function; lov-1 and pkd-1 interact with a single-pass transmembrane protein, CWP-5, though the significance of this interaction for polycystic kidney disease is unknown. <br />
 
 
 
Use value in 'Species' for ?Species <br />
 
Eg. Homo sapiens
 
 
 
Use value in 'Paper for Disease Rel' for #Evidence <br />
 
Eg: WBPaper00038373
 
 
 
<pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word">
 
.ace:
 
Disease_relevance "lov-1 and pkd-2 encode the orthologs of human Polycystin-1 and    Polycystin-2, which are mutated in autosomal dominant polycystic kidney disease; the polycystins regulate signaling involved in normal renal tubular structure and function; studies in the worm C. elegans have contributed extensively to the finding that cystic kidney diseases can be considered ciliopathies; in elegans lov-1 and pkd-2 are expressed in male ciliary neurons, are required for normal male mating behavior, do not seem to be required for ciliogenesis, and each polycystin may actually have a potential inhibitory function on the other for ciliary function; lov-1 and pkd-1 interact with a single-pass transmembrane protein, CWP-5, though the significance of this interaction for polycystic kidney disease is unknown." "Homo sapiens" Paper_evidence "WBPaper00038373"
 
 
 
(Repeat this line for every paper, if multiple papers are present).
 
</pre>
 
 
 
'''So put together, .ace file for lov-1 looks like:'''
 
<pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word">
 
Gene : "WBGene00003058"
 
Database "OMIM" "disease" "173900"
 
Database "OMIM" "disease" "601313"
 
Experimental_model DOID:5937 "Homo sapiens" Paper_evidence "WBPaper00038373"
 
Disease_relevance "lov-1 and pkd-2 encode the orthologs of human Polycystin-1 and Polycystin-2, which are mutated in autosomal dominant polycystic kidney disease; the polycystins regulate signaling involved in normal renal tubular structure and function; studies in the worm C. elegans have contributed extensively to the finding that cystic kidney diseases can be considered ciliopathies; in elegans lov-1 and pkd-2 are expressed in male ciliary neurons, are required for normal male mating behavior, do not seem to be required for ciliogenesis, and each polycystin may actually have a potential inhibitory function on the other for ciliary function; lov-1 and pkd-1 interact with a single-pass transmembrane protein, CWP-5, though the significance of this interaction for polycystic kidney disease is unknown." "Homo sapiens" Paper_evidence "WBPaper00038373"
 
</pre>
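As an illustration of how the per-tag rules combine into one .ace entry, here is a minimal Python sketch. It is not the code in get_dis_disease_ace.pm; the function name and its parameters are hypothetical, papers are passed separately for each tag, and no escaping of quotes inside the description text is attempted.

```python
def gene_ace(gene_id, omim_diseases, doid, species,
             exp_papers, relevance, rel_papers):
    """Assemble one ?Gene .ace entry from already-extracted OA values."""
    lines = ['Gene : "%s"' % gene_id]
    # One Database line per OMIM disease ID.
    for omim_id in omim_diseases:
        lines.append('Database "OMIM" "disease" "%s"' % omim_id)
    # One Experimental_model line per paper.
    for paper in exp_papers:
        lines.append('Experimental_model %s "%s" Paper_evidence "%s"'
                     % (doid, species, paper))
    # One Disease_relevance line per paper.
    for paper in rel_papers:
        lines.append('Disease_relevance "%s" "%s" Paper_evidence "%s"'
                     % (relevance, species, paper))
    return "\n".join(lines) + "\n"


entry = gene_ace(
    gene_id="WBGene00003058",
    omim_diseases=["173900", "601313"],
    doid="DOID:5937",
    species="Homo sapiens",
    exp_papers=["WBPaper00038373"],
    relevance="lov-1 and pkd-2 encode the orthologs of human Polycystin-1 "
              "and Polycystin-2 (text shortened for this example)",
    rel_papers=["WBPaper00038373"],
)
print(entry)
```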
 
 
 
====When to dump data====
 
 
 
If data is present in Field 4-- (dis_expmodelfor) Experimental model for, dump this field and the related fields: <br />
 
Field 5 Name:(dis_paperexpmod) Paper for Exp Mod <br />
 
Field 6 Name:(dis_xrefdb) Database for Exp Mod <br />
 
Field 7 Name:(dis_species) Species <br />
 
 
 
If data is present in Field 9 Name:(dis_diseaserelevance) Disease relevance, dump this and the related fields: <br />
 
Field 10 Name:(dis_paperdisrel) Paper for Disease Rel <br />
 
Field 11 Name:(dis_omimdb) Database for Disease Rel <br />
 
Field 7 Name:(dis_species) Species <br />
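The dump rule above can be sketched as a small selection function. This is an assumption-laden illustration: it uses the table names from the field list (dis_humandoid, dis_dbexpmod, dis_dbdisrel) rather than the older names in this section, and the dict-based row is hypothetical.

```python
def fields_to_dump(row):
    """Return the dis_ table suffixes to dump for one row, in order.

    `row` is a hypothetical dict: table suffix -> value for one pgid.
    """
    selected = []
    if row.get("humandoid"):
        # Experimental model present: dump it with its related fields.
        selected += ["humandoid", "paperexpmod", "dbexpmod", "species"]
    if row.get("diseaserelevance"):
        # Disease relevance present: dump it with its related fields.
        selected += ["diseaserelevance", "paperdisrel", "dbdisrel", "species"]
    # Species belongs to both groups; keep the first occurrence only.
    unique = []
    for name in selected:
        if name not in unique:
            unique.append(name)
    return unique


print(fields_to_dump({"humandoid": "DOID:5937"}))
```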
 
 
 
====Code annotation====
 
[[Annotation of Disease scripts]]
 
 
 
====Counting script specifications====
 
The counting script counts entries directly in Postgres at the time it is run, not from the .ace file.
 
 
 
Script at : /home/acedb/ranjana/human_disease/count_disease.pl
 
 
 
1. No. of genes (dis_wbgene): Counts all genes including duplicates, lists PGIDs of duplicate genes
 
 
 
2. No. of unique genes : Counts all genes, only once
 
 
 
3. No. of Experimental Models or DO_terms (dis_humandoid): counts all DO_terms
 
 
 
4. No. of unique Experimental models or DO_terms: does not count repeated DO_terms
 
 
 
5. No. of papers for Experimental models or DO_terms (dis_paperexpmod): counts all papers
 
 
 
6. No. of papers for Disease Relevance (dis_paperdisrel)
 
 
 
7. No. of unique papers in all of disease curation: no. of papers in dis_paperexpmod + no. of papers in dis_paperdisrel, counts a paper only once in both categories, no duplicates
 
 
 
8. No. of disease relevance descriptions (dis_diseaserelevance)
 
 
 
9. No. of OMIM genes connected to (WB)genes: from field 12 in OA-'OMIM gene for Disease Rel' entries look like 'OMIM:607485' or just '607485'; entries are comma separated  (What is the Postgres table name? -- dis_genedisrel)
 
 
 
10. No. of OMIM diseases connected to WB genes: from the OA fields 'OMIM disease for Exp Mod' (dis_dbexpmod) and 'OMIM disease for Disease Rel' (dis_dbdisrel); counts a disease only once if it appears in both categories; entries look like 'OMIM:607485' or just '607485'; entries are comma separated
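A few of the counts above (total and unique genes, unique papers across both paper fields, unique OMIM diseases) can be sketched in Python as follows. This is not count_disease.pl: the row layout (one dict per pgid, papers held as lists) and the function names are assumptions made for the example.

```python
def split_omim(value):
    """Split a comma-separated OMIM field; accept 'OMIM:607485' or '607485'."""
    ids = []
    for part in (value or "").split(","):
        part = part.strip()
        if part.startswith("OMIM:"):
            part = part[len("OMIM:"):]
        if part:
            ids.append(part)
    return ids


def count_summary(rows):
    """Count genes, papers and OMIM diseases over hypothetical OA rows."""
    genes = [r["wbgene"] for r in rows if r.get("wbgene")]
    exp_papers = {p for r in rows for p in r.get("paperexpmod", [])}
    rel_papers = {p for r in rows for p in r.get("paperdisrel", [])}
    omim = set()
    for r in rows:
        omim.update(split_omim(r.get("dbexpmod")))
        omim.update(split_omim(r.get("dbdisrel")))
    return {
        "genes_total": len(genes),
        "genes_unique": len(set(genes)),
        # A paper found in both categories is counted once.
        "papers_unique": len(exp_papers | rel_papers),
        # A disease found in both OMIM fields is counted once.
        "omim_diseases_unique": len(omim),
    }


rows = [
    {"wbgene": "WBGene00003058",
     "paperexpmod": ["WBPaper00038373"], "paperdisrel": ["WBPaper00038373"],
     "dbexpmod": "OMIM:173900", "dbdisrel": "173900, 601313"},
    {"wbgene": "WBGene00003058",  # same gene on a second line
     "paperexpmod": [], "paperdisrel": [], "dbexpmod": "", "dbdisrel": ""},
]
print(count_summary(rows))
```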
 
 
 
==Ontology Annotator for Disease Term==
 
[[OA for disease term]]
 
 
 
==Dumping data for citace upload==
 
--All scripts are under: /home/acedb/ranjana/human_disease
 
 
 
--A symlink to the script has been created: ln -s /home/postgres/work/citace_upload/dis_disease/use_package.pl <br />
 
--The disease ontology file for the OA is updated by a cron job that runs at 8pm every day.

(Crontab entry: 0 20 * * * /home/postgres/work/pgpopulation/obo_oa_ontologies/update_obo_oa_ontologies.pl)
 
 
 
 
 
(This no longer exists: Source:http://www.berkeleybop.org/ontologies/doid.obo, 08.08.2013)
 
 
 
 
 
1. '''Ontology file:'''
 
 
 
Run parseHuman.pl:
 
Feb. 24th 2017: Use the following source: https://raw.githubusercontent.com/DiseaseOntology/HumanDiseaseOntology/master/src/ontology/doid-non-classified.obo
 
 
 
 
 
Old: Downloads the HumanDO.obo from http://diseaseontology.svn.sourceforge.net/viewvc/diseaseontology/trunk/HumanDO.obo and converts it to HumanDO.ace.
 
 
 
Upload to Spica under Data_for_citace/Data_from_Ranjana/. Change name to HumanDO_WSXXX.ace
 
(Source URL now changed to http://www.berkeleybop.org/ontologies/doid.obo, 08.08.2013)
 
 
 
2. '''Gene-disease annotation file'''
 
 
 
Run use_package.pl at /home/acedb/ranjana/human_disease:
 
 
 
Dumps disease data from the disease OA into disease_<date>.ace. Copy the file to your local machine with scp, rename it to disease_WSXXX.ace, and upload it to Spica at citpub, under Data_for_citace/Data_from_Ranjana/.
 
 
 
The script also checks whether all DOIDs in postgres are valid and writes invalid DOIDs to the err.out.<date> file. Invalid DOIDs are not displayed in the OA, so find the affected annotation by its PGID and then add the valid DOID to it.
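The DOID validity check can be illustrated with a short Python sketch. It is not the check in use_package.pl; it assumes that "valid" means the ID appears as a non-obsolete [Term] in the OBO file, and the function names, the sample OBO text and the two DOIDs marked hypothetical are made up for the example.

```python
def valid_doids(obo_text):
    """Collect IDs of non-obsolete [Term] stanzas from OBO-format text."""
    valid = set()
    for stanza in obo_text.split("\n\n"):
        lines = [l.strip() for l in stanza.splitlines() if l.strip()]
        if "[Term]" not in lines or "is_obsolete: true" in lines:
            continue
        for line in lines:
            if line.startswith("id: "):
                valid.add(line[len("id: "):].strip())
    return valid


def invalid_annotations(pgid_to_doid, valid):
    """Return (pgid, DOID) pairs whose DOID is not in the valid set."""
    return sorted((pgid, doid) for pgid, doid in pgid_to_doid.items()
                  if doid not in valid)


SAMPLE_OBO = """format-version: 1.2

[Term]
id: DOID:5937
name: autosomal dominant polycystic kidney

[Term]
id: DOID:10652
name: Alzheimer's disease

[Term]
id: DOID:0000000
name: hypothetical obsolete term
is_obsolete: true
"""

# pgids 1-3 and DOID:9999999 are hypothetical values for this example.
annotations = {1: "DOID:5937", 2: "DOID:0000000", 3: "DOID:9999999"}
print(invalid_annotations(annotations, valid_doids(SAMPLE_OBO)))
```

Reporting the PGID alongside the bad DOID matches the workflow described above, where the curator has to locate the annotation by PGID because the invalid term does not display in the OA.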
 
 
 
3. '''DO_term-Worm_model_description annotation file'''
 
 
 
Run use_package.pl at /home/acedb/ranjana/human_disease/diseaseterm
 
 
 
Dumps disease data from the disease term OA into diseaseterm_<date>.ace. Copy the file to your local machine with scp, rename it to diseaseterm_WSXXX.ace, and upload it to Spica at citpub, under Data_for_citace/Data_from_Ranjana/.
 
 
 
 
 
4. Download the HumanDO.obo file from http://www.berkeleybop.org/ontologies/doid.obo and rename as disease_ontology.WSXXX.obo.
 
 
 
 
 
'''All files should be deposited to:'''
 
 
 
/home/citpub/Data_for_citace/
 
 
 
/home/citpub/Data_for_Ontology/
 
 
 
==Changes required/issues by release==
 
*For the WS251 release: the use_package.pl script reports that WBGene00004724 is dead and merged into WBGene00013742; query the annotation out using pgid 347 (sas-1) and then move the data to the right gene.
 
 
 
 
 
==Changes to OA May 2013==
 
 
 
*Database for Exp Mod  changes to 'OMIM disease for Exp Mod', data can be entered as IDs without the 'OMIM:' as prefix, multiple values comma-separated.
 
 
 
*'Database for Disease Rel' changes to 'OMIM disease for Disease Rel', multiple values are comma-separated, data can be entered as IDs without the 'OMIM:' prefix.
 
 
 
*Extra free-text field called 'OMIM gene for Disease Rel' added, data can be entered as IDs without the 'OMIM:' prefix, multiple values comma-separated.
 
 
 
*When data is present in either the 'OMIM disease for Disease Rel' or 'OMIM gene for Disease Rel' fields, script dumps the following line in .ace for each entry as:
 
 
 
Database "OMIM" "disease" "456789" <br>
 
Database "OMIM" "gene" "456789"
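The conversion from the comma-separated OMIM fields to Database lines might look like the sketch below. It is an illustration under stated assumptions, not the dumper's code: the function name is hypothetical, and dropping repeated IDs within a field is an assumption.

```python
def omim_database_lines(disease_field, gene_field):
    """Build .ace Database lines from the two free-text OMIM fields.

    IDs may be entered as 'OMIM:456789' or '456789', comma separated.
    """
    lines = []
    for kind, field in (("disease", disease_field), ("gene", gene_field)):
        seen = set()
        for part in (field or "").split(","):
            omim_id = part.strip()
            if omim_id.startswith("OMIM:"):
                omim_id = omim_id[len("OMIM:"):]
            # Skip empty pieces and IDs already emitted for this field.
            if omim_id and omim_id not in seen:
                seen.add(omim_id)
                lines.append('Database "OMIM" "%s" "%s"' % (kind, omim_id))
    return lines


for line in omim_database_lines("456789", "OMIM:456789"):
    print(line)
```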
 
 
 
==Changes to gene-disease dumper, Sept 2014: moving OMIM Ids to Accession_evidence==
 
*Reenable part of script that dumps OMIM ids under the 'Database' tag
 
*Start dumping the 'Accession_evidence' tag:
 
*for the Experimental_model tag, look at the Ids either entered as 'OMIM:XXXXX', or just 'XXXXX' in the 'OMIM disease for Exp Mod (dis_dbexpmod)'
 
*For the Disease_relevance tag, look at the OMIM Ids either as 'OMIM:XXXXX' or just 'XXXXX' in 'OMIM disease for Disease Rel (dis_dbdisrel)' and 'OMIM gene for Disease Relevance (gene_disrel)'
 
 
 
*For each unique OMIM ID the .ace syntax for the gene would be:
 
<pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word">
 
Gene : "WBGene00003052"
 
Database "OMIM" "disease" "115200"
 
Database "OMIM" "disease" "151660"
 
Database "OMIM" "disease" "159001"
 
Database "OMIM" "disease" "176670"
 
Database "OMIM" "disease" "181350"
 
Database "OMIM" "disease" "212112"
 
Database "OMIM" "disease" "248370"
 
Database "OMIM" "disease" "275210"
 
Database "OMIM" "disease" "605588"
 
Database "OMIM" "disease" "610140"
 
Database "OMIM" "disease" "613205"
 
Experimental_model "DOID:3911" "Homo sapiens" Accession_evidence  "OMIM"  "176670"
 
Experimental_model "DOID:0050557" "Homo sapiens" Accession_evidence  "OMIM"  "613205"
 
Experimental_model "DOID:11726" "Homo sapiens" Accession_evidence  "OMIM"  "181350"
 
Disease_relevance  "Mutations in human lamin, LMNA, are found in several diseases referred to as the laminopathic diseases, which include Emery-Dreifuss muscular dystrophy (EDMD), LMNA-related congenital muscular dystrophy (L-CMD), limb-girdle muscular dystrophy (L-CMD), Hutchison-Gilford progeria syndrome (HGPS), dilated cardiomyopathy (DCM), Charcot-Marie-Tooth disorder and atypical Werner syndrome; elegans B-type lamin, lmn-1, performs both A and B-type vertebrate lamin functions; similar to A-type lamins, it has roles in development, organization of nuclear pore complexes, and interacts with lamina and nuclear components; similar to B-type lamins, it is expressed widely throughout development, except for sperm, and interacts with B-type lamin-binding proteins; much of the knowledge of the organization and assembly of the nuclear lamina has come from studies in elegans; disease-causing mutations in human LMNA when introduced into elegans lmn-1/lamin alter nuclear lamina organization and dynamics, leading to phenotypes such as decreased fertility and muscle lesions; a mutation found in Hutchison-Gilford progeria syndrome disrupts the supramolecular structure of the lamin filaments in elegans; LMNA mutations that are found in EDMD, DCM and HGPS, when introduced into elegans lmn-1/lamin cause disruption in lamin filament assembly and nuclear localization; also, work in elegans has revealed that lamins are involved in the normal aging process, as worms mutant for lamin age faster."  "Homo sapiens"  Accession_evidence  "OMIM"  "115200"
 
 
 
 
 
(will be repeated for the rest of the 10 OMIM Ids in 'OMIM disease for Disease Rel (dis_dbdisrel)', no genes in 'OMIM gene for Disease Relevance (gene_disrel)').
 
</pre>
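The repetition of the Disease_relevance line for each unique OMIM ID can be sketched as follows. This is a minimal illustration, not the dumper itself: the function name and arguments are hypothetical, and the description text is shortened.

```python
def disease_relevance_lines(text, species, dbdisrel, genedisrel):
    """Emit one Disease_relevance line per unique OMIM ID.

    IDs come from 'OMIM disease for Disease Rel' and
    'OMIM gene for Disease Rel', entered with or without 'OMIM:'.
    """
    unique_ids = []
    for field in (dbdisrel, genedisrel):
        for part in (field or "").split(","):
            omim_id = part.strip()
            if omim_id.startswith("OMIM:"):
                omim_id = omim_id[len("OMIM:"):]
            if omim_id and omim_id not in unique_ids:
                unique_ids.append(omim_id)
    return [
        'Disease_relevance "%s" "%s" Accession_evidence "OMIM" "%s"'
        % (text, species, omim_id)
        for omim_id in unique_ids
    ]


for line in disease_relevance_lines(
        "Mutations in human lamin, LMNA, ... (shortened)",
        "Homo sapiens", "115200, OMIM:151660, 115200", ""):
    print(line)
```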
 
 
 
Old way of dumping OMIM IDs for genes:
 
Gene : "WBGene00003052"
 
Database "OMIM" "disease" "176670"
 
Database "OMIM" "disease" "613205"
 
Database "OMIM" "disease" "181350"
 
 
 
==To do==
 
*Need to tell the EBI team that from the WS239 upload (mid-July) we will be dumping Date_last_updated and Curator_confirmed data into citace and they should pick up.
 
*Disease ontology file location has changed, need to alert JC to change the locations for OA and scripts (done, 08.08.2013):
 
DO group lists two locations:
 
Sourceforge: http://sourceforge.net/p/diseaseontology/code/2599/tree/trunk/
 
 
 
OBO Foundry: http://www.berkeleybop.org/ontologies/doid.obo  (will use this source)
 
 
 
==Converting the Disease OA into Disease Model Annotation OA (Feb 2017)==
 
 
#'''Curator (dis_curator)'''
#'''Curator History (dis_curator)'''
Line 370: Line 10:
 
#*Single value, constrain
#*Have Homo sapiens as default value
+
#*Comes from the obo_name_species table
#'''Disease relevant gene (dis_wbgene):''' this is the causative gene of the disease, the disease_relevant_gene in ace model, DB_Object_ID in the DAF
#'''Variation (dis_variation):''' Autocomplete drop-down with WB Variation list
Line 377: Line 18:
 
#'''Transgene (dis_transgene)'''
#*single value, constrain
#'''New: Inferred gene (dis_inferredgene):''' Autocomplete drop-down with WBGene list
+
#'''Genotype (dis_genotype)'''
+
#'''Asserted Gene (dis_inferredgene):''' Autocomplete drop-down with WBGene list
#*To indicate the gene that the Variation or Strain refers to, if known; in elegans usually authors state this
#*Can be multiple values, e.g., if the Strain or Transgene that models the disease has more than one gene.
+
#'''Asserted Variation (dis_assertedvariation):''' Autocomplete drop-down with WBVariation list
+
#'''Asserted human gene (dis_assertedhumangene)'''
+
#*for referring to the human gene, e.g., Tau, in a transgenic strain model; HGNC gene names drop-down (formerly free text)
#*For the 'Modeled by' section at least one of Disease_relevant_gene, Variation, Strain or Transgene is required, constrain
#'''New: Association Type (dis_associationtype):''' Relationship between the genetic entity (disease_relevant_gene, variation, transgene, or strain in ace model; DB Object in DAF) and the disease.
+
#'''Association Type (dis_associationtype):''' Relationship between the genetic entity (disease_relevant_gene, variation, transgene, or strain in ace model; DB Object in DAF) and the disease.
#*drop-down with the following controlled vocabulary: is_model_of, causes_or_contributes_to_condition, causes_condition, contributes_to_condition, is_marker_for
+
#*drop-down with the following controlled vocabulary:
+
#**is_model_of
+
#**'''is_implicated_in'''
+
#**is_marker_for
+
#**'''is_ameliorated_model_of'''
+
#**'''is_exacerbated_model_of'''
#*single value, constrain
#*If the genetic entity dumped for DB Object ID is 'Disease_relevant_gene' then is_model_of is not allowed, constrain
#'''New: Evidence Code (dis_goinference):''' Autocomplete drop-down with GO codes for now (will adopt ECO later)
+
#'''Evidence Code (dis_goinference):''' Autocomplete drop-down with GO codes for now (will adopt ECO later)
#*allow multiple values
#*multiple evidence codes allowed only from one publication, for one model
#'''New: Qualifier (dis_qualifier):''' Autocomplete dropdown with only one value 'NOT'
+
#'''Qualifier (dis_qualifier):''' Autocomplete dropdown with only one value 'NOT'
#*Indicates that the genetic entity is 'NOT' a model for disease.
#*the default value is blank, with 'NOT' as the only drop-down choice
#'''New: Genetic sex (dis_geneticsex):''' Autocomplete dropdown with the following values: hermaphrodite, male, female
+
#*Note that the qualifiers are tied together in postgres, any change here will affect Modifier_qualifier as well, #32.
+
#'''Genetic sex (dis_geneticsex):''' Autocomplete dropdown with the following values: hermaphrodite, male, female
#*single value
#*have 'hermaphrodite' as the default value
Line 397: Line 48:
 
#*for now will leave as multi-value ontology, until data is cleaned up
#*later on for WS260 will have single value only
+
#'''Disease Model Description (dis_diseasemodeldesc)'''
#'''Date_last updated (dis_lastupdateexpmod):''' Date of original annotation or date last modified
#*single value only, constrain
#'''Remark (dis_comment)'''
#'''pgid'''
#'''New: Inducing Chemical (dis_inducingchemical):''' Drop-down with WB Molecule list/ontology
+
#'''Inducing Chemical (dis_inducingchemical):''' Drop-down with WB Molecule list/ontology
#*Allow multiple values
#*for curator: enter multiple values only if multiple molecules are used to induce the same disease in one experiment.
#'''New: Inducing agent (dis_inducingagent):''' Free text, for inducers not in WB molecule ontology
+
#'''Inducing agent (dis_inducingagent):''' Free text, for inducers not in WB molecule ontology
#*multiple values, will we comma separate?
#*for curator: enter multiple values only if multiple agents were used as inducers for the same disease in one experiment/model.
#'''New: Experimental condition comment (dis_commentexpcond):''' free big text
+
#'''Experimental condition comment (dis_commentexpcond):''' free big text
#'''New: Modifier transgene (dis_modtransgene):''' Autocomplete dropdown with WB Transgene list
+
#*for curator: meant for internal curator comments only, in case a WBMol does not exist, or other comments
+
#'''Modifier transgene (dis_modtransgene):''' Autocomplete dropdown with WB Transgene list
#*multiple values
#*for curator: enter multiple values, only if multiple transgenes were used as modifiers in one experiment.
#'''New: Modifier variation (dis_modvariation):''' Autocomplete dropdown with WB Transgene list
+
#'''Modifier variation (dis_modvariation):''' Autocomplete dropdown with WB Transgene list
#*multiple values
#'''New: Modifier strain (dis_modstrain):''' Autocomplete dropdown with WB Strain list
+
#'''Modifier strain (dis_modstrain):''' Autocomplete dropdown with WB Strain list
#*multiple values
#'''New: Modifier gene (dis_modgene):''' Autocomplete dropdown with WB Gene list
+
#'''Modifier gene (dis_modgene):''' Autocomplete dropdown with WB Gene list
#*multiple values, to indicate the gene in the modifying Transgene, Variation, Strain.
#'''New: Modifier molecule (dis_modmolecule):''' Autocomplete dropdown with WB Molecule list
+
#'''Modifier human gene (dis_modhumangene):''' Autocomplete dropdown with HGNC human gene list, multivalue, to indicate the inferred modifier human gene in the strain or transgene used as modifier
+
#'''Modifier Genotype (dis_modgenotype):''' Autocomplete dropdown with WBGenotype list used as modifier
+
#'''Modifier molecule (dis_modmolecule):''' Autocomplete dropdown with WB Molecule list
#*multiple values
#'''New: Other modifier (dis_modother):''' Free Big Text to indicate other modifiers of the disease, e.g., diet, radiation, surgery
+
#'''Other modifier (dis_modother):''' Free Big Text to indicate other modifiers of the disease, e.g., diet, radiation, surgery
#*multiple values
#*comma separate multiple values
Line 427: Line 82:
 
#*for curators: Use multiple values for each type of modifier only if they were used in a single experiment to model a single disease from a single paper; they should all be consistent with the modifier_association_type chosen
#*have condition_ameliorated_by as default value, as this is the most common
#'''Disease phenotype (dis_phenotypedisease):'''Autocomplete dropdown of WB Phenotype ontology terms
+
#'''Modifier qualifier (dis_modqualifier):''' Autocomplete drop-down with one value: NOT
#*multiple values
+
#**Note that the qualifiers are tied together in postgres, any change here will affect Qualifier as well, #16.
#*for curator: multiple disease phenotypes from same experiment allowed
 
#'''New: Ameliorated_phenotype (dis_phenotypeameliorated):''' Autocomplete dropdown of WB Phenotype ontology terms
 
#*multiple values
 
#*for curator: multiple ameliorated phenotypes from same experiment allowed
 
#'''New: Exacerbated Phenotype (dis_phenotypeexacerbated):''' Autocomplete dropdown of WB Phenotype ontology terms
 
#*multiple values
 
#*for curator: multiple exacerbated phenotypes from same experiment allowed
 
#'''New: Disease phenotype comment (dis_commentdisphen)''' (free big text)
 
 
#'''Disease relevance description (dis_diseaserelevance):''' Free big text box, keep as is
 
#'''Disease relevance description (dis_diseaserelevance):''' Free big text box, keep as is
 
#'''Paper for disease relevance (dis_paperdisrel)''': keep as is
 
#'''Paper for disease relevance (dis_paperdisrel)''': keep as is
Line 447: Line 94:
 
#*multiple values, will be comma separated
 
#*multiple values, will be comma separated
 
#OMIM disease for Disease Rel (dis_dbdisrel): leave as is
 
#OMIM disease for Disease Rel (dis_dbdisrel): leave as is
#'''Requested Strain (dis_suggested_strain):''' free text, keep as is, to hold a non-WB strain until it becomes one.
+
#pgid
 +
 
 +
==='''Deleted fields from OA UI'''===
 +
#'''Molecule (dis_molecule)'''
 +
#'''Affected_phenotype (dis_phenotypeaffected)'''
 +
#'''Disease Relevant Gene Text  (dis_wbgenetext)'''
 +
#'''Variation Text (dis_variationtext)'''
 +
#'''Strain text (dis_straintext)'''
 +
#'''Requested Strain (dis_suggested_strain)''': free text, keep as is, to hold a non-WB strain until it becomes one.
 +
#'''Transgene text (dis_transgenetext)'''
 +
#'''Interacting variation (dis_interactvariation)'''
 +
#*auto-complete drop-down with WB variation list
 +
#*allow multiple values
 +
#'''Interacting gene (dis_interactgene)'''
 +
#*auto-complete drop-down with WB gene list
 +
#*allow multiple values
 +
#'''Interacting transgene (dis_interacttransgene)'''
 +
#*auto-complete drop-down with WB transgene list
 +
#*allow multiple values
 +
#'''RNAi experiment (dis_rnaiexperiment)'''
 +
#*autocomplete drop-down with WB RNAi experiment list
 +
#*allow multiple values
 +
#'''Model Remark''' (dis_modelremark): Free big-text field, to contain remarks specifically related to the model.
 +
#**causes_or_contributes_to_condition
 +
#**causes_condition
 +
#**contributes_to_condition
 +
#'''Disease phenotype (dis_phenotypedisease):'''Autocomplete dropdown of WB Phenotype ontology terms
 +
#*multiple values
 +
#*for curator: multiple disease phenotypes from same experiment allowed
 +
#'''Ameliorated_phenotype (dis_phenotypeameliorated):''' Autocomplete dropdown of WB Phenotype ontology terms
 +
#*multiple values
 +
#*for curator: multiple ameliorated phenotypes from same experiment allowed
 +
#'''Exacerbated Phenotype (dis_phenotypeexacerbated):''' Autocomplete dropdown of WB Phenotype ontology terms
 +
#*multiple values
 +
#*for curator: multiple exacerbated phenotypes from same experiment allowed
 +
#'''Disease phenotype comment (dis_commentdisphen)''' (free big text)
 
#'''Requested Phenotype (dis_suggested_phenotype)''': keep as is
 
#'''Requested Phenotype (dis_suggested_phenotype)''': keep as is
 
#'''Requested Phenotype Definition (dis_suggested_definition)''':keep as is
 
#'''Requested Phenotype Definition (dis_suggested_definition)''':keep as is
 
#'''Child of Phenotype (dis_child_of)''': keep as is
 
#'''Child of Phenotype (dis_child_of)''': keep as is
 
'''Deleted tables''':
 
1. Molecule (dis_molecule)
 
2. Affected_phenotype (dis_phenotypeaffected)
 
  
 
==Mapping of fields from old Disease OA to new disease OA (Feb 2017)==
 
==Mapping of fields from old Disease OA to new disease OA (Feb 2017)==
Line 517: Line 195:
 
====Dumper specifications Part 1====
 
====Dumper specifications Part 1====
 
April, 2017
 
April, 2017
 +
#Data comes from Disease OA data tables of postgres <br/>
 +
#All disease-related files are at: /home/acedb/ranjana/human_disease
 +
#Run parseHuman.pl to generate the HumanDO.ace file (the disease ontology .ace file)
 +
#Run the use_package.pl script to generate the disease data (disease_<date>.ace); it runs pm1 for old-style data and pm2 for the Disease_model_annotation-style data.<br/>
 +
#For each annotation, generate an ID in this form, starting with: Disease_model_annotation : "00000001" <br/>
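
The ID format in the last step can be sketched as follows. This is only an illustration in Python (the real dumper is the Perl script use_package.pl); the helper names <code>annotation_id</code> and <code>ace_header</code> are hypothetical, and the only thing taken from this page is the 8-digit, zero-padded form starting at "00000001".

```python
def annotation_id(n: int) -> str:
    """Return the n-th annotation ID as an 8-digit, zero-padded string."""
    if n < 1:
        raise ValueError("annotation numbers start at 1")
    return f"{n:08d}"


def ace_header(n: int) -> str:
    """Format the object header line in the style of the .ace examples on this page."""
    return f'Disease_model_annotation : "{annotation_id(n)}"'


# Print the headers for the first three annotations.
for i in range(1, 4):
    print(ace_header(i))
```

How the counter is mapped to pgids is not specified here, so the sketch simply numbers annotations in the order they are dumped.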
  
1. Data comes from Disease Ontology Annotator data tables of postgres <br/>
+
====Changes for WS262 (Aug 2017)====
2. Rule for dumping the Disease_model_annotation data: If OA field Variation (dis_variation) OR Strain (dis_strain) OR Transgene (dis_transgene) has data<br/>
+
#*Rule for dumping the Disease_model_annotation data: dump if any value exists for Association_type. Deprecated rule: dump if OA field Variation (dis_variation) OR Strain (dis_strain) OR Transgene (dis_transgene) has data.<br/>
3. Disease_model_annotation model:  Most of the acedb model tags match names of OA fields.<br/>
+
#*new Association_type 'is_implicated_in', new tags: 'Qualifier_not' and 'Modifier_qualifier_not'.
4.For each annotation generate an ID in the form of, starting with: Disease_model_annotation : "00000001" <br/>
 
5. All disease related files at: home/acedb/ranjana/human_disease, I run parseHuman.pl to generate the HumanDO.ace file (the disease ontology .ace file) and the use_package.pl script to generate the disease data (disease_<date>.ace).<br/>
 
6. Please deposit new script and data file here, call file disease_model_annotation_<date>.ace<br/>.  After the new script works, maybe you can fold it into ./use_package.pl.
 
  
 
{|Class="wikitable"
 
{|Class="wikitable"
 
|+Mapping of acedb tags, OA fields and postgres tables
 
|+Mapping of acedb tags, OA fields and postgres tables
 
|-
 
|-
! !!acedb tag!!OA Field!!Postgres table!!Example .ace syntax
+
! !!acedb tag!!OA Field!!Postgres table!!Example .ace syntax!!Required or not
 
|-
 
|-
|1 || Disease_term || Disease Name||dis_humandoid||"DOID:3429"
+
|1 || Disease_term || Disease Name||dis_humandoid||"DOID:3429"||required
 
|-
 
|-
|2 || Disease_of_species || Disease of Species||dis_species||"Homo sapiens"
+
|2 || Disease_of_species || Disease of Species||dis_species||"Homo sapiens"||not required
 
|-
 
|-
|3 || Strain||Strain||dis_strain||"ANM30"
+
|3 || Strain||Strain||dis_strain||"ANM30"||required only if Variation, Transgene and Disease_relevant gene absent
 
|-  
 
|-  
|4 || Variation|| Variation||dis_variation||"WBVar00242728"
+
|4 || Variation|| Variation||dis_variation||"WBVar00242728"||required only if Strain, Transgene and Disease_relevant_gene absent
 +
|-
 +
|5 || Transgene || Transgene||dis_transgene||"WBTransgene00006901"||required only if Strain, Variation and Disease_relevant_gene are absent
 +
|-
 +
|6 || Disease_relevant_gene||Disease Relevant Gene ||dis_wbgene||"WBGene00003001"||required only if Strain, Variation and Transgene are absent
 +
|-
 +
|7|| Interacting_variation || Interacting Variation|| dis_interactvariation|| "WBVar00242728"|| not required
 +
|-
 +
|8|| Interacting_transgene || Interacting Transgene|| dis_interacttransgene|| "WBTransgene00006901"|| not required
 
|-
 
|-
|5 || Transgene || Transgene||dis_transgene||"WBTransgene00006901"
+
|9|| Interacting_gene || Interacting Gene || dis_interactgene||"WBGene00003001"|| not required
 
|-
 
|-
|6 || Disease_relevant_gene||Disease Relevant Gene ||dis_wbgene||"WBGene00003001"
+
|10|| RNAi_experiment || RNAi experiment ||dis_Rnaiexpt|| "WBRNAi00000171"||not required
 
|-
 
|-
|7|| Inferred_gene ||Inferred Gene ||dis_inferredgene||"WBGene00002990"
+
|11|| Inferred_gene ||Inferred Gene ||dis_inferredgene||"WBGene00002990"||not required
 
|-
 
|-
|8 || Association_type||Association Type||dis_associationtype||causes_condition
+
|12 || Association_type||Association Type||dis_associationtype||causes_condition||required
 
|-  
 
|-  
|9|| Evidence_code ||Evidence Code||dis_goinference||"IMP"
+
|13|| Evidence_code ||Evidence Code||dis_goinference||"IMP"||required
 
|-
 
|-
|10|| Qualifier||Qualifier||dis_qualifier||NOT
+
|14|| Change for WS262: Qualifier_not||Qualifier||dis_qualifier||Qualifier_not||not required
 
|-
 
|-
|12||Inducing_chemical || Inducing Chemical||dis_inducingchemical||"WBMol:00003650"
+
|15||Inducing_chemical || Inducing Chemical||dis_inducingchemical||"WBMol:00003650"||not required
 
|-
 
|-
|13 || Inducing_agent || Inducing Agent||dis_inducingagent||"Glucose"
+
|16 || Inducing_agent || Inducing Agent||dis_inducingagent||"Glucose"||not required
 
|-
 
|-
|14 || Modifier_transgene || Modifier Transgene||dis_modtransgene||"WBTransgene00006901"
+
|17 || Modifier_transgene || Modifier Transgene||dis_modtransgene||"WBTransgene00006901"||not required
 
|-
 
|-
|15 || Modifier_variation||Modifier Variation||dis_modvariation||"WBVar00242728"
+
|18 || Modifier_variation||Modifier Variation||dis_modvariation||"WBVar00242728"||not required
 
|-  
 
|-  
|16 || Modifier_strain|| Modifier Strain||dis_modstrain||"AN170"
+
|19 || Modifier_strain|| Modifier Strain||dis_modstrain||"AN170"||not required
 
|-
 
|-
|17|| Modifier_gene|| Modifier Gene||dis_modgene||"WBGene00004310"
+
|20|| Modifier_gene|| Modifier Gene||dis_modgene||"WBGene00004310"||not required
 
|-
 
|-
|18 || Modifier_molecule|| Modifier molecule||dis_modmolecule||"WBMol:00002660"
+
|21 || Modifier_molecule|| Modifier molecule||dis_modmolecule||"WBMol:00002660"||not required
 
|-
 
|-
|19|| Other_modifier|| Other Modifier||dis_modother||"Sugar"
+
|22|| Other_modifier|| Other Modifier||dis_modother||"Sugar"||not required
 
|-
 
|-
|20|| Modifier_association_type||Modifier Association Type||dis_moleculetype||condition_ameliorated_by
+
|23|| Modifier_association_type||Modifier Association Type||dis_moleculetype||condition_ameliorated_by||required only if any one of these is present: Modifier Transgene, Modifier Variation, Modifier Strain, Modifier gene, Modifier molecule, Other Modifier
 
|-  
 
|-  
|21|| Genetic sex ||Genetic Sex||dis_geneticsex||hermaphrodite
+
|24||Change for WS262: Modifier_qualifier_not||Modifier Qualifier||dis_modqualifier||Modifier_qualifier_not||not required
 +
|-
 +
|25||Genetic sex ||Genetic Sex||dis_geneticsex||hermaphrodite||not required
 
|-
 
|-
|22|| Disease_phenotype|| Disease Phenotype||dis_phenotypedisease||"WBPhenotype:0000884"
+
|26|| Disease_phenotype|| Disease Phenotype||dis_phenotypedisease||"WBPhenotype:0000884"||not required
 
|-
 
|-
|23||Ameliorated_phenotype||Ameliorated Phenotype||dis_phenotypeameliorated||"WBPhenotype:0003884"
+
|27||Ameliorated_phenotype||Ameliorated Phenotype||dis_phenotypeameliorated||"WBPhenotype:0003884"||not required
 
|-
 
|-
|24 || Exacerbated_phenotype||Exacerbated Phenotype||dis_phenotypeexacerbated||"WBPhenotype:0001884"
+
|28|| Exacerbated_phenotype||Exacerbated Phenotype||dis_phenotypeexacerbated||"WBPhenotype:0001884"||not required
 
|-  
 
|-  
|25|| Phenotype_comment||Disease Phenotype Comment ||dis_commentdisphen||"Neurons looked sick"
+
|29|| Phenotype_comment||Disease Phenotype Comment ||dis_commentdisphen||"Neurons looked sick"||not required
 
|-
 
|-
|26 || Paper_evidence || Reference||dis_paperexpmod||"WBPaper00033160"
+
|30|| Paper_evidence || Reference||dis_paperexpmod||"WBPaper00033160"||required
 
|-
 
|-
|27||Disease_model_description||Disease Model Description||dis_diseasemodeldesc||"C. elegans model of Alzheimer's where Abeta is expressed."
+
|31||Disease_model_description||Disease Model Description||dis_diseasemodeldesc||"C. elegans model of Alzheimer's where Abeta is expressed."||not required
 
|-
 
|-
|28|| Database "OMIM" "gene" || OMIM Gene||dis_genedisrel||"OMIM"    "gene"    "608441"
+
|32|| Database "OMIM" "gene" || OMIM Gene||dis_genedisrel||"OMIM"    "gene"    "608441"||not required
 
|-
 
|-
|29|| Database "OMIM" "disease" || OMIM Disease||dis_dbexpmod||"OMIM"    "disease"    "610743"
+
|33|| Database "OMIM" "disease" || OMIM Disease||dis_dbexpmod||"OMIM"    "disease"    "610743"||not required
 
|-
 
|-
|30|| Curator_confirmed || Curator||dis_curator||"WBPerson324"
+
|34|| Curator_confirmed || Curator||dis_curator||"WBPerson324"||required
 
|-
 
|-
|30|| Date_last_updated||Date Last Updated||dis_lastupdateexpmod||"2013-02-21"
+
|35|| Date_last_updated||Date Last Updated||dis_lastupdateexpmod||"2013-02-21"||required
 
|}
 
|}
 +
 +
====QC script for Disease_model_annotation_class data====
 +
*Script will check whether all required data/fields are present and print the pgid and errors to a file, disease_model_annotation_errors_<date>
 +
*Need to decide if script will be part of dumping script or separate --can be part of dumping script
 +
*Need to decide if script will check the postgres database or the .ace dump --can check postgres database
 +
*Required fields:
 +
**Disease Name
 +
**One or more of Disease relevant gene, Variation, Strain or Transgene
 +
**If Disease relevant gene or Variation or Transgene is present, Inferred gene has to be present
 +
**Association Type
 +
**Evidence Code
 +
**Reference
 +
**Date Last Updated
 +
**If any one of-- Modifier Transgene, Modifier Variation, Modifier Strain, Modifier Gene, Modifier Molecule, or Other Modifier is present, then Modifier Association Type has to be present
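
The required-field rules listed above could be checked along these lines. This is a minimal sketch in Python, not the actual QC script: the dictionary keys reuse the postgres table names from the mapping table on this page, while the row-as-dictionary layout, the <code>check_annotation</code> helper and the error-message wording are assumptions made for illustration.

```python
# Sketch only: keys reuse the postgres table names from the mapping table;
# the row dict and the helper names are hypothetical.
REQUIRED = [
    ("dis_humandoid", "Disease Name"),
    ("dis_associationtype", "Association Type"),
    ("dis_goinference", "Evidence Code"),
    ("dis_paperexpmod", "Reference"),
    ("dis_lastupdateexpmod", "Date Last Updated"),
]
SUBJECTS = ["dis_wbgene", "dis_variation", "dis_strain", "dis_transgene"]
NEEDS_INFERRED_GENE = ["dis_wbgene", "dis_variation", "dis_transgene"]
MODIFIERS = [
    "dis_modtransgene", "dis_modvariation", "dis_modstrain",
    "dis_modgene", "dis_modmolecule", "dis_modother",
]


def has_value(row, key):
    """True if the field is present and not blank."""
    value = row.get(key)
    return value is not None and str(value).strip() != ""


def check_annotation(pgid, row):
    """Return a list of error lines (pgid, tab, message) for one annotation."""
    errors = []
    for key, label in REQUIRED:
        if not has_value(row, key):
            errors.append(f"{pgid}\tmissing {label}")
    if not any(has_value(row, k) for k in SUBJECTS):
        errors.append(f"{pgid}\tmissing Disease relevant gene, Variation, Strain or Transgene")
    if any(has_value(row, k) for k in NEEDS_INFERRED_GENE) and not has_value(row, "dis_inferredgene"):
        errors.append(f"{pgid}\tmissing Inferred gene")
    if any(has_value(row, k) for k in MODIFIERS) and not has_value(row, "dis_moleculetype"):
        errors.append(f"{pgid}\tmissing Modifier Association Type")
    return errors


# Example: a modifier gene is given, but no Modifier Association Type.
example = {
    "dis_humandoid": "DOID:3429",
    "dis_strain": "ANM30",
    "dis_associationtype": "causes_condition",
    "dis_goinference": "IMP",
    "dis_paperexpmod": "WBPaper00033160",
    "dis_lastupdateexpmod": "2013-02-21",
    "dis_modgene": "WBGene00004310",
}
for line in check_annotation("456", example):
    print(line)
```

Collecting every error for a pgid, rather than stopping at the first, keeps the error file useful for fixing an annotation in one pass.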
  
 
====Model for Disease_model_annotation data====
 
====Model for Disease_model_annotation data====
Line 595: Line 299:
 
?Disease_model_annotation Disease_term UNIQUE ?DO_term XREF Disease_model_annotation
 
?Disease_model_annotation Disease_term UNIQUE ?DO_term XREF Disease_model_annotation
 
                           Disease_of_species ?Species
 
                           Disease_of_species ?Species
                           Modeled_by Strain UNIQUE ?Strain XREF Disease_model_annotation       // genetic entity that
+
                           Modeled_by Strain UNIQUE ?Strain XREF Disease_model_annotation   ?Text    // genetic entity that
 
                                                                                                   models the disease
 
                                                                                                   models the disease
                                     Variation UNIQUE ?Variation XREF Disease_model_annotation  // genetic entity that                                                                                                                                     
+
                                     Variation UNIQUE ?Variation XREF Disease_model_annotation   ?Text // genetic entity that                                                                                                                                     
 
                                                                                                   models the disease
 
                                                                                                   models the disease
                                     Transgene UNIQUE ?Transgene XREF Disease_model_annotation  // genetic entity that
+
                                     Transgene UNIQUE ?Transgene XREF Disease_model_annotation ?Text // genetic entity that
 
                                                                                                   models the disease
 
                                                                                                   models the disease
                                     Disease_relevant_gene UNIQUE ?Gene XREF Disease_model_annotation//when the
+
                                     Disease_relevant_gene UNIQUE ?Gene XREF Disease_model_annotation   ?Text  //when the
 
                                                                                                     genetic entity is a gene
 
                                                                                                     genetic entity is a gene
                          Inferred_gene    ?Gene XREF  Disease_model_annotation //to indicate the gene in Variation or
+
                                    Modeled_by_remark UNIQUE  ?Text
                                                                                      Strain, when the authors tell us
+
                                    Inferred_gene    ?Gene XREF  Disease_model_annotation //to indicate the associated gene
                           Association_type UNIQUE is_model_of            // All 5 tags describe the relationship
+
                           Association_type UNIQUE is_model_of            // All 5 tags describe the relationship between the genetic entity and the disease  
                                                                          between the genetic entity and the disease  
+
                                                                      is_implicated_in      //new for WS262                                                           
                                                                                                          (condition)
+
                                                                      is_marker_for
                                                  causes_or_contributes_to_condition
 
                                                  causes_condition
 
                                                  contributes_to_condition
 
                                                  is_marker_for
 
 
                           Evidence_code ?GO_code                              // will use ECO later on
 
                           Evidence_code ?GO_code                              // will use ECO later on
                           Qualifier      NOT                                  //to indicate that a disease is NOT  
+
                           Qualifier_not                              //new for WS262, to indicate that a disease is NOT modeled by X
                                                                                modeled by X
 
 
                           Experimental_condition Inducing_chemical ?Molecule XREF Disease_model_inducer  //to indicate  
 
                           Experimental_condition Inducing_chemical ?Molecule XREF Disease_model_inducer  //to indicate  
 
                                                                                               the disease-inducing agent
 
                                                                                               the disease-inducing agent
                                                Inducing_agent    ?Text    //e.g. diet, radiation,etc not in  
+
                                                                  Inducing_agent    ?Text    //e.g. diet, radiation,etc not in  
 
                                                                                     Molecule class
 
                                                                                     Molecule class
                           Modifier_info Modifier_transgene ?Transgene //genetic entity that modifies the disease
+
                           Modifier_info                 Modifier_transgene ?Transgene //genetic entity that modifies the disease
                                        Modifier_variation ?Variation //(same as above)
+
                                                                Modifier_variation ?Variation //(same as above)
                                        Modifier_strain    ?Strain    //(same as above)
+
                                                                Modifier_strain    ?Strain    //(same as above)
                                        Modifier_gene      ?Gene      //to indicate the gene in the modifying  
+
                                                                Modifier_gene      ?Gene      //to indicate the gene in the modifying Transgene, Variation, Strain.
                                                                        Transgene, Variation, Strain.
+
                                                                Modifier_molecule  ?Molecule XREF Disease_model_modifier  // to indicate chemical modifiers of the disease
                                        Modifier_molecule  ?Molecule XREF Disease_model_modifier  // to indicate  
+
                                                                Other_modifier    ?Text      //to indicate other modifiers of the disease eg. diet,radiation, surgery etc
                                                                                      chemical modifiers of the disease
+
                    Modifier_association_type UNIQUE condition_ameliorated_by  //to indicate the association type between modifiers and disease
                                        Other_modifier    ?Text      //to indicate other modifiers of the disease eg.
+
                                                                                condition_exacerbated_by
                                                                        diet,radiation, surgery etc
+
                                                                                condition_not_ameliorated_by //request for WS265 or WS266
                        Modifier_association_type UNIQUE condition_ameliorated_by  //to indicate the association type
+
                                                                                condition_not_exacerbated_by //request for WS265 or WS266
                                                                                      between modifiers and disease
+
                          Modifier_qualifier_not    //new for WS262
                                                          condition_exacerbated_by
 
 
                           Genetic_sex UNIQUE      hermaphrodite  //indicates genetic sex of the disease model
 
                           Genetic_sex UNIQUE      hermaphrodite  //indicates genetic sex of the disease model
                                                  male
+
                                                                  male
                                                  female
+
                                                                  female
 
                           Disease_phenotype_info Disease_phenotype ?Phenotype    //Phenotypes similar to human disease  
 
                           Disease_phenotype_info Disease_phenotype ?Phenotype    //Phenotypes similar to human disease  
                                                Ameliorated_phenotype ?Phenotype // Phenotypes ameliorated by modifier
+
                                                                  Ameliorated_phenotype ?Phenotype // Phenotypes ameliorated by modifier
                                                Exacerbated_phenotype ?Phenotype // Phenotypes exacerbated by modifier
+
                                                                  Exacerbated_phenotype ?Phenotype // Phenotypes exacerbated by modifier
                                                Phenotype_comment ?Text  // To describe non-WPO phenotypes
+
                                                                  Phenotype_comment ?Text  // To describe non-WPO phenotypes
 
                           Paper_evidence        ?Paper
 
                           Paper_evidence        ?Paper
 
                           Disease_model_description  ?Text
 
                           Disease_model_description  ?Text
Line 662: Line 360:
 
Date_last_updated          "2017-03-15"
 
Date_last_updated          "2017-03-15"
  
This is fake data:
+
This is fake data with examples of the Qualifier_not and Modifier_qualifier_not tags (the tags went in for WS262; maybe one real data example for WS262):
  
 
Disease_model_annotation : "00000002"
 
Disease_model_annotation : "00000002"
Line 668: Line 366:
 
Disease_of_species        "Homo sapiens"
 
Disease_of_species        "Homo sapiens"
 
Strain                    "CL4176"
 
Strain                    "CL4176"
 +
Interacting_variation    "WBVar00000001"
 +
Interacting_transgene  "WBTransgene00000034"
 +
Interacting_gene          "WBGene00000045"
 +
RNAi_experiment          "WBRNAi00000171"
 
Association_type          is_model_of
 
Association_type          is_model_of
 
Evidence_code              "IMP"
 
Evidence_code              "IMP"
 +
Qualifier_not
 
Inducing_chemical          "WBMol:00002044"
 
Inducing_chemical          "WBMol:00002044"
 
Modifier_gene              "WBGene00010882"
 
Modifier_gene              "WBGene00010882"
 
Modifier_association_type  condition_ameliorated_by
 
Modifier_association_type  condition_ameliorated_by
 +
Modifier_qualifier_not
 
Genetic_sex                hermaphrodite
 
Genetic_sex                hermaphrodite
 
Disease_phenotype          "WBPhenotype:0001935"
 
Disease_phenotype          "WBPhenotype:0001935"
Line 700: Line 404:
 
Curator_confirmed          "WBPerson324"
 
Curator_confirmed          "WBPerson324"
 
Date_last_updated          "2017-03-13"
 
Date_last_updated          "2017-03-13"
</pre>  
+
</pre>
  
 
====Dumper Specifications, Part 2====
 
====Dumper Specifications, Part 2====
Line 739: Line 443:
 
Molecule_modifier                          "WBMol:00002044"
 
Molecule_modifier                          "WBMol:00002044"
 
</pre>
 
</pre>
 +
 +
===Ontology Annotator for Disease Term===
 +
[[OA for disease term]]
 +
 +
==Dumping data for citace upload==
 +
--All scripts are under: /home/acedb/ranjana/human_disease
 +
 +
--A symlink to the script has been created: ln -s /home/postgres/work/citace_upload/dis_disease/use_package.pl <br />
 +
--disease ontology file for the OA is updated by a cron job that runs at 8pm every day.
 +
(Script: 0 20 * * * /home/postgres/work/pgpopulation/obo_oa_ontologies/update_obo_oa_ontologies.pl)
 +
 +
4 disease related files are sent to citpub@spica.
 +
 +
'''1. Ontology files:'''
 +
 +
*Run parseHuman.pl:
 +
* (Aug 2018) Switched the source of the ontology to: http://purl.obolibrary.org/obo/doid.obo  which currently points to-
 +
https://raw.githubusercontent.com/DiseaseOntology/HumanDiseaseOntology/master/src/ontology/doid.obo  (Aug 2018) to avoid using the single-inheritance hierarchy for is_a relationships and to use the poly hierarchy.
 +
*Uses the following source: https://raw.githubusercontent.com/DiseaseOntology/HumanDiseaseOntology/master/src/ontology/doid-non-classified.obo (Feb. 24th 2017)
 +
*Dumps the HumanDO.ace file; Upload to Spica under Data_for_citace/Data_from_Ranjana/. Change name to HumanDO_WSXXX.ace; test file in acedb database
 +
*Download the HumanDO obo file from the above link, always check that this is a text file, as sometimes just the link can be downloaded; transfer file to citpub@spica  /Data_for_Ontology; change name of file to disease_ontology.WSXXX.obo
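
The "always check that this is a text file" step above could be automated with a quick sanity check. The sketch below is in Python and is only a heuristic under stated assumptions: it treats a file as a plausible OBO file if it begins with a <code>format-version:</code> header tag or contains a <code>[Term]</code> stanza near the top, and rejects files that start like an HTML page. The function name <code>looks_like_obo</code> is hypothetical.

```python
import sys


def looks_like_obo(path):
    """Heuristic check that a downloaded file is OBO text, not an HTML page or link stub."""
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        head = fh.read(4096)
    stripped = head.lstrip()
    # An HTML page saved by mistake usually starts with a doctype or <html> tag.
    if stripped.lower().startswith(("<!doctype", "<html")):
        return False
    # Assumption: OBO files open with a 'format-version:' tag or show a [Term] stanza early.
    return stripped.startswith("format-version:") or "[Term]" in head


if __name__ == "__main__" and len(sys.argv) > 1:
    for name in sys.argv[1:]:
        status = "looks like OBO" if looks_like_obo(name) else "NOT an OBO text file"
        print(f"{name}: {status}")
```

Passing the downloaded file as a command-line argument before transferring it would catch the case where only the link was saved.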
 +
 +
'''2. Gene-disease annotation files'''
 +
 +
*Run use_package.pl:
 +
*Note changes (Sept 2018): On Tazendra, if a pgid does not have an association type, the whole pgid is skipped, so it does not get dumped at all. In the future, when this skip is removed, such entries will be dumped and the error messages will report that they need fixing.
 +
*On the sandbox, the skip above has been removed, so the script dumps all the lines that have the new errors and reports in the error messages that they need fixing. Tazendra will behave the same way, possibly for WS269.
 +
*Dumps disease data from the disease OA, into disease_<date>.ace and disease_annotation_<date>.ace
 +
*The script also checks whether all DOIDs in postgres are valid and outputs invalid DOIDs to the err.out.<date> file. Note that invalid DOIDs cannot be seen in the OA: identify the annotation by PGID and then add the valid DOID to it, as the invalid one will not show.
 +
*scp both files to citpub@spica, under Data_for_citace/Data_from_Ranjana/, add WSXXX to both files; test both files in acedb database
 +
 +
'''3. Testing files in cminus on spica'''
 +
*Transfer file/s to spica, under /home/citpub/Data_for_citace/Data_from_Ranjana
 +
*On local computer, ssh -Y citpub@spica.caltech.edu, call the database by using 'cminus'; read files and check for errors
 +
* Transferring multiple remote files to local machine: scp your_username@remote.edu:/some/remote/directory/\{a,b,c\} ./
 +
 +
==Changes by release==
 +
*for the WS251 release: use_package.pl script reports that WBGene00004724 is dead and merged into WBGene00013742, need to query out by using pgid 347, which is sas-1 and then move data to the right gene
 +
 +
 +
==Changes to OA==
 +
====March 2023====
 +
Checking script changes (check_disease_annotation.pl)
 +
*Remove check for inferred gene (dis_inferredgene), as this is now optional
 +
*Add "genotype" (dis_genotype?) to the check that verifies the presence of wbgene or variation or strain (subject of the annotation), meaning it has to check for the absence of all 4 before writing an error message
 +
 +
====Aug 2018====
 +
*Remove the following values from the Modifier Association type field in the OA drop-down (tab 2)
 +
**Does_not_exacerbate
 +
**Does_not_ameliorate
 +
**No_effect
 +
**Toxic
 +
**Not_toxic
 +
 +
====May 2013====
 +
*Database for Exp Mod  changes to 'OMIM disease for Exp Mod', data can be entered as IDs without the 'OMIM:' as prefix, multiple values comma-separated.
 +
 +
*'Database for Disease Rel' changes to 'OMIM disease for Disease Rel', multiple values are comma-separated, data can be entered as IDs without the 'OMIM:' prefix.
 +
 +
*Extra free-text field called 'OMIM gene for Disease Rel' added, data can be entered as IDs without the 'OMIM:' prefix, multiple values comma-separated.
 +
 +
*When data is present in either the 'OMIM disease for Disease Rel' or 'OMIM gene for Disease Rel' fields, script dumps the following line in .ace for each entry as:
 +
 +
Database "OMIM" "disease" "456789" <br>
 +
Database "OMIM" "gene" "456789"
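
The handling of these OMIM fields can be sketched as follows. This is an illustration in Python, not the dumper's actual code: it assumes each field holds a comma-separated string, accepts IDs with or without the 'OMIM:' prefix, and emits one <code>Database</code> line per unique ID. The function name <code>omim_database_lines</code> is hypothetical.

```python
def omim_database_lines(field_value, kind):
    """Turn a comma-separated OMIM field into .ace 'Database' lines.

    kind is "disease" or "gene"; IDs may be entered with or without 'OMIM:'.
    """
    ids = []
    for raw in field_value.split(","):
        token = raw.strip()
        if token.upper().startswith("OMIM:"):
            token = token[len("OMIM:"):].strip()
        # Keep the first occurrence of each ID and skip blanks.
        if token and token not in ids:
            ids.append(token)
    return [f'Database "OMIM" "{kind}" "{omim_id}"' for omim_id in ids]


# Example field values, reusing the placeholder ID from the lines above.
for line in omim_database_lines("OMIM:456789, 456789", "disease"):
    print(line)
for line in omim_database_lines("456789", "gene"):
    print(line)
```

Input order is preserved so that the dumped lines follow the order in which the curator entered the IDs.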
 +
 +
====Aug 2022 (for WS287)====
 +
*Added Asserted gene and Asserted Variation fields to OA, copied everything from Inferred gene to Asserted gene field
 +
*Need to do before uploading for WS287: Take out Asserted gene values if only Disease relevant gene are present without a variation
 +
*Inferred gene deleted from UI, will add if needed in the future as this is meant to be used for rollup rules only (by script)
 +
 +
==Changes to gene-disease dumper==
 +
===Aug 2018===
 +
*See example annotation of PGID 456, that has both 'condition_ameliorated_by' and 'Modifier_qualifier_not', ace file needs to dump as above in example (done)
 +
 +
====Sept 2014: moving OMIM Ids to Accession_evidence====
 +
*Reenable part of script that dumps OMIM ids under the 'Database' tag
 +
*Start dumping the 'Accession_evidence' tag:
 +
*for the Experimental_model tag, look at the Ids either entered as 'OMIM:XXXXX', or just 'XXXXX' in the 'OMIM disease for Exp Mod (dis_dbexpmod)'
 +
*For the Disease_relevance tag, look at the OMIM Ids either as 'OMIM:XXXXX' or just 'XXXXX' in 'OMIM disease for Disease Rel (dis_dbdisrel)' and 'OMIM gene for Disease Relevance (gene_disrel)'
 +
 +
*For each unique OMIM ID the .ace syntax for the gene would be:
 +
<pre style="white-space: pre-wrap;
 +
white-space: -moz-pre-wrap;
 +
white-space: -pre-wrap;
 +
white-space: -o-pre-wrap;
 +
word-wrap: break-word">
 +
Gene : "WBGene00003052"
 +
Database "OMIM" "disease" "115200"
 +
Database "OMIM" "disease" "151660"
 +
Database "OMIM" "disease" "159001"
 +
Database "OMIM" "disease" "176670"
 +
Database "OMIM" "disease" "181350"
 +
Database "OMIM" "disease" "212112"
 +
Database "OMIM" "disease" "248370"
 +
Database "OMIM" "disease" "275210"
 +
Database "OMIM" "disease" "605588"
 +
Database "OMIM" "disease" "610140"
 +
Database "OMIM" "disease" "613205"
 +
Experimental_model "DOID:3911" "Homo sapiens" Accession_evidence  "OMIM"  "176670"
 +
Experimental_model "DOID:0050557" "Homo sapiens" Accession_evidence  "OMIM"  "613205"
 +
Experimental_model "DOID:11726" "Homo sapiens" Accession_evidence  "OMIM"  "181350"
 +
Disease_relevance  "Mutations in human lamin, LMNA, are found in several diseases referred to as the laminopathic diseases, which include Emery-Dreifuss muscular dystrophy (EDMD), LMNA-related congenital muscular dystrophy (L-CMD), limb-girdle muscular dystrophy (L-CMD), Hutchison-Gilford progeria syndrome (HGPS), dilated cardiomyopathy (DCM), Charcot-Marie-Tooth disorder and atypical Werner syndrome; elegans B-type lamin, lmn-1, performs both A and B-type vertebrate lamin functions; similar to A-type lamins, it has roles in development, organization of nuclear pore complexes, and interacts with lamina and nuclear components; similar to B-type lamins, it is expressed widely throughout development, except for sperm, and interacts with B-type lamin-binding proteins; much of the knowledge of the organization and assembly of the nuclear lamina has come from studies in elegans; disease-causing mutations in human LMNA when introduced into elegans lmn-1/lamin alter nuclear lamina organization and dynamics, leading to phenotypes such as decreased fertility and muscle lesions; a mutation found in Hutchison-Gilford progeria syndrome disrupts the supramolecular structure of the lamin filaments in elegans; LMNA mutations that are found in EDMD, DCM and HGPS, when introduced into elegans lmn-1/lamin cause disruption in lamin filament assembly and nuclear localization; also, work in elegans has revealed that lamins are involved in the normal aging process, as worms mutant for lamin age faster."  "Homo sapiens"  Accession_evidence  "OMIM"  "115200"

(The Disease_relevance line will be repeated for the rest of the 10 OMIM IDs in 'OMIM disease for Disease Rel (dis_dbdisrel)'; there are no genes in 'OMIM gene for Disease Relevance (gene_disrel)'.)
</pre>

Old way of dumping OMIM IDs for genes:
Gene : "WBGene00003052"
Database "OMIM" "disease" "176670"
Database "OMIM" "disease" "613205"
Database "OMIM" "disease" "181350"

===History===
*Need to tell the EBI team that, starting with the WS239 upload (mid-July), we will be dumping Date_last_updated and Curator_confirmed data into citace, and they should pick it up.
*The Disease Ontology file location has changed; need to alert JC to change the locations for the OA and scripts (done, 08.08.2013):
DO group lists two locations:
Sourceforge: http://sourceforge.net/p/diseaseontology/code/2599/tree/trunk/

OBO Foundry: http://www.berkeleybop.org/ontologies/doid.obo  (will use this source)

==Data checking scripts==
*err.out.<date> and err.annotation.out<date> files are ONLY generated when the data is dumped, so run use_package.pl first to see the errors. These checks are current as of 03.06.2023 (and follow Alliance standards as well).
*err.out.<date> checks only for invalid papers and invalid DO IDs.
*check_disease_annotation.pl checks an .ace file given as input. This looks like an old script that needs updating to match the scripts above; the difference is that it can take an .ace file as input. It is probably not needed going forward, once data is moved to the Alliance. It currently gives errors that are not valid (03.07.2023).

Change made Nov 5th, 2019, for the disease report: the script prints PGIDs with no DOID, but does not print other missing fields. There are 73 PGIDs with no DO term; need to check whether these are valid, or whether they will never have a DOID (because of the biology, such as the bah genes).

Check the following:
*#Curator - required
*#Disease Name - required
*#Disease of Species - required
*#One of the following is required - Disease Relevant Gene, Variation, Strain or Transgene
*#If Variation or Transgene is present, Disease Relevant Gene is required (Remove check, won't work for human transgenes)
*#If Disease Relevant Gene or Variation is present, Inferred Gene is required (keep this check)
*#Association Type - required
*#Evidence Code - required
*#Reference - at least one reference is required
*#Date Last Updated - required
*#If any one of these is present: Modifier Transgene, Modifier Variation, Modifier Strain, Modifier Gene, Modifier Molecule or Other Molecule, then Modifier Association Type is required.

Dump only those annotations into the disease_annotation.ace file for which checks 2, 4, 6, 7, 8, 9, 10 and 11 (from above) are satisfied.

==Old pipeline==
[[Old pipeline for disease data]]

Back To [[Disease and Drugs]]

Revision as of 18:10, 15 March 2023

Disease OA or Disease Model Annotation OA (Feb 2017)

The gene-disease OA (connected in AceDB models to ?Gene class) and the newer Disease_model_annotation data are now in the Disease OA.

Disease OA fields

  1. Curator (dis_curator)
  2. Curator History (dis_curhistory)
  3. Disease Name (dis_humandoid):Auto-complete drop-down with Disease Ontology (DO) terms
    • single value, constrain
  4. Disease of Species (dis_species)
    • Auto-complete drop-down with controlled vocabulary of species list
    • Single value, constrain
    • Have Homo sapiens as default value
    • Comes from the obo_name_species table
  5. Disease relevant gene (dis_wbgene):this is the causative gene of the disease, the disease_relevant_gene in ace model, DB_Object_ID in the DAF
  6. Variation (dis_variation):Autocomplete drop-down with WB Variation list
    • single value, constrain
  7. Strain (dis_strain):Autocomplete drop-down with WB Strain list
    • single value, constrain
  8. Transgene (dis_transgene)
    • single value, constrain
  9. Genotype (dis_genotype)
  10. Asserted Gene (dis_inferredgene):Autocomplete drop-down with WBGene list
    • To indicate the gene that the Variation or Strain refers to, if known; in elegans, authors usually state this
    • Can be multiple values, e.g. if the Strain or Transgene that models the disease has more than one gene.
  11. Asserted Variation (dis_assertedvariation):Autocomplete drop-down with WBVariation list
  12. Asserted human gene (dis_assertedhumangene)
    • for referring to the human gene, e.g. Tau, in a transgenic strain model; HGNC gene names drop-down (formerly free text)
    • For the 'Modeled by' section at least one of Disease_relevant_gene, Variation, Strain or Transgene is required, constrain
  13. Association Type (dis_associationtype):Relationship between the genetic entity (disease_relevant_gene, variation, transgene, or strain in ace model; DB Object in DAF) and the disease.
    • drop-down with the following controlled vocabulary:
      • is_model_of
      • is_implicated_in
      • is_marker_for
      • is_ameliorated_model_of
      • is_exacerbated_model_of
    • single value, constrain
    • If the genetic entity dumped for DB Object ID is 'Disease_relevant_gene', then is_model_of is not allowed, constrain
  14. Evidence Code (dis_goinference):Autocomplete drop-down with GO codes for now (will adopt ECO later)
    • allow multiple values
    • multiple evidence codes allowed only from one publication, for one model
  15. Qualifier (dis_qualifier):Autocomplete dropdown with only one value 'NOT'
    • Indicates that the genetic entity is 'NOT' a model for disease.
    • the default value is blank, with 'NOT' as the only drop-down choice
    • Note that the qualifiers are tied together in postgres; any change here will affect Modifier qualifier as well, #34.
  16. Genetic sex (dis_geneticsex):Autocomplete dropdown with the following values:hermaphrodite, male, female
    • single value
    • have 'hermaphrodite' as the default value
  17. Reference (dis_paperexpmod): Autocomplete drop-down with WB Paper list
    • for now will leave as multi-value ontology, until data is cleaned up
    • later on for WS260 will have single value only
  18. Disease Model Description (dis_diseasemodeldesc)
  19. Date_last updated (dis_lastupdateexpmod): Date of original annotation or date last modified
    • single value only, constrain
  20. Remark (dis_comment)
  21. pgid
  22. Inducing Chemical (dis_inducingchemical ):Drop-down with WB Molecule list/ontology
    • Allow multiple values
    • for curator: enter multiple values if only multiple molecules are used to induce the same disease in one experiment.
  23. Inducing agent (dis_inducingagent ): Free text, for inducers not in WB molecule ontology
    • multiple values, will we comma separate?
    • for curator: enter multiple values only if multiple agents were used as inducers for the same disease in one experiment/model.
  24. Experimental condition comment (dis_commentexpcond): free big text
    • for curator: meant for internal curator comments only, in case a WBMol does not exist, or other comments
  25. Modifier transgene (dis_modtransgene):Autocomplete dropdown with WB Transgene list
    • multiple values
    • for curator: enter multiple values, only if multiple transgenes were used as modifiers in one experiment.
  26. Modifier variation (dis_modvariation):Autocomplete dropdown with WB Variation list
    • multiple values
  27. Modifier strain (dis_modstrain):Autocomplete dropdown with WB Strain list
    • multiple values
  28. Modifier gene (dis_modgene):Autocomplete dropdown with WB Gene list
    • multiple values, to indicate the gene in the modifying Transgene, Variation, Strain.
  29. Modifier human gene (dis_modhumangene):Autocomplete dropdown with HGNC human gene list, multivalue, to indicate the inferred modifier human gene in the strain or transgene used as modifier
  30. Modifier Genotype (dis_modgenotype):Autocomplete dropdown with WBGenotype list used as modifier
  31. Modifier molecule (dis_modmolecule):Autocomplete dropdown with WB Molecule list
    • multiple values
  32. Other modifier (dis_modother): Free Big Text to indicate other modifiers of the disease eg., diet, radiation, surgery
    • multiple values
    • comma separate multiple values
  33. Modifier association type (dis_moleculetype):Autocomplete dropdown with the following new values:
    • condition_ameliorated_by, condition_exacerbated_by
    • Keep old values: Toxic, No_effect, Does_not_exacerbate, Does_not_ameliorate, Not_toxic
    • for curators: use multiple values for each type of modifier only if they were used in a single experiment to model a single disease from a single paper; they should all be consistent with the modifier_association_type chosen
    • have condition_ameliorated_by as default value, as this is the most common
  34. Modifier qualifier (dis_modqualifier): Autocomplete drop-down with one value: NOT
      • Note that the qualifiers are tied together in postgres; any change here will affect Qualifier as well, #15.
  35. Disease relevance description (dis_diseaserelevance): Free big text box, keep as is
  36. Paper for disease relevance (dis_paperdisrel): keep as is
  37. Last Updated for Disease Rel (dis_lastupdatedisrel): keep as is
    • Do not fill in date when 'New' (annotation) is clicked on
  38. OMIM gene (dis_genedisrel): this is the 'OMIM gene for Disease Rel' in the old OA
    • for OMIM ids, free text, not big, keep as is
    • multiple values, will be comma separated
  39. OMIM disease (dis_dbexpmod): for OMIM ids, free text, not big
    • multiple values, will be comma separated
  40. OMIM disease for Disease Rel (dis_dbdisrel): leave as is
  41. pgid

Deleted fields from OA UI

  1. Molecule (dis_molecule)
  2. Affected_phenotype (dis_phenotypeaffected)
  3. Disease Relevant Gene Text (dis_wbgenetext)
  4. Variation Text (dis_variationtext)
  5. Strain text (dis_straintext)
  6. Requested Strain (dis_suggested_strain): free text, keep as is, to hold a non-WB strain until it becomes one.
  7. Transgene text (dis_transgenetext)
  8. Interacting variation (dis_interactvariation)
    • auto-complete drop-down with WB variation list
    • allow multiple values
  9. Interacting gene (dis_interactgene)
    • auto-complete drop-down with WB gene list
    • allow multiple values
  10. Interacting transgene (dis_interacttransgene)
    • auto-complete drop-down with WB transgene list
    • allow multiple values
  11. RNAi experiment (dis_rnaiexperiment)
    • autocomplete drop-down with WB RNAi experiment list
    • allow multiple values
  12. Model Remark (dis_modelremark): Free big-text field, to contain remarks specifically related to the model.
      • causes_or_contributes_to_condition
      • causes_condition
      • contributes_to_condition
  13. Disease phenotype (dis_phenotypedisease):Autocomplete dropdown of WB Phenotype ontology terms
    • multiple values
    • for curator: multiple disease phenotypes from same experiment allowed
  14. Ameliorated_phenotype (dis_phenotypeameliorated): Autocomplete dropdown of WB Phenotype ontology terms
    • multiple values
    • for curator: multiple ameliorated phenotypes from same experiment allowed
  15. Exacerbated Phenotype (dis_phenotypeexacerbated): Autocomplete dropdown of WB Phenotype ontology terms
    • multiple values
    • for curator: multiple exacerbated phenotypes from same experiment allowed
  16. Disease phenotype comment (dis_commentdisphen) (free big text)
  17. Requested Phenotype (dis_suggested_phenotype): keep as is
  18. Requested Phenotype Definition (dis_suggested_definition):keep as is
  19. Child of Phenotype (dis_child_of): keep as is

Mapping of fields from old Disease OA to new disease OA (Feb 2017)

Mapping of fields from old Disease OA to new Disease OA
In old Disease OA In new Disease OA DAF column number and name Cardinality in DAF Comment
WB Gene Disease Relevant Gene Inferred gene 0 or more
Curator Curator 1 or more
Curator History Curator History 1 or more
Experimental Model for Disease Name 1
Strain Strain 1
Variation Variation 1
Disease Phenotype Disease Phenotype 0 or more
Transgene Transgene 1 or more
Paper for Exp Mod Reference 1
OMIM disease for Exp Mod OMIM disease
Last Updated for Exp Mod Date Last Updated
Species Disease of Species
Paper for Disease Rel
OMIM disease for Disease Rel Gene Product Form ID
OMIM gene for Disease Rel Experimental Conditions
(to create the model)
Last Updated for Disease Rel DB Object Type R 1 gene, allele, tra
Comment DB R 1 WB
DB Object ID R 1 WB:WBGene00004887
Molecule Type DB Object Symbol R 1 smn-1 A
Molecule Inferred Gene Association O 0 or greater
Affected Phenotype Gene Product Form ID O 0 or 1
Suggested Phenotype
Suggested Phenotype Definition
Child of Phenotype
Suggested Strain

Dumping Disease_model_annotation data

Dumper specifications Part 1

April, 2017

  1. Data comes from the Disease OA data tables of postgres
  2. All disease related files are at: /home/acedb/ranjana/human_disease
  3. Run parseHuman.pl to generate the HumanDO.ace file (the disease ontology .ace file)
  4. Run the use_package.pl script to generate the disease data (disease_<date>.ace); it runs pm1 for old style data and pm2 for the disease model annotation style data.
  5. For each annotation, generate an ID of the following form, starting with: Disease_model_annotation : "00000001"
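
A minimal sketch of the ID format in step 5, assuming the IDs are a simple counter in dump order, zero-padded to 8 digits (the sample below maps pgid 397 to "00000001", so the ID is not the pgid). The function names are hypothetical and are not taken from use_package.pl.

```python
def annotation_id(n: int) -> str:
    """Return the n-th annotation ID as an 8-digit, zero-padded string."""
    if n < 1:
        raise ValueError("annotation numbers start at 1")
    return f"{n:08d}"


def header_line(n: int) -> str:
    """Return the .ace object header line for the n-th annotation."""
    return f'Disease_model_annotation : "{annotation_id(n)}"'


print(header_line(1))
```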

Changes for WS262 (Aug 2017)

    • Rule for dumping the Disease_model_annotation data: dump if (any) value exists for Association_type. Deprecated rule: dump if OA field Variation (dis_variation) OR Strain (dis_strain) OR Transgene (dis_transgene) has data
    • New Association_type 'is_implicated_in'; new tags: 'Qualifier_not' and 'Modifier_qualifier_not'.
Mapping of acedb tags, OA fields and postgres tables
# | acedb tag | OA Field | Postgres table | Example .ace syntax | Required or not
1 | Disease_term | Disease Name | dis_humandoid | "DOID:3429" | required
2 | Disease_of_species | Disease of Species | dis_species | "Homo sapiens" | not required
3 | Strain | Strain | dis_strain | "ANM30" | required only if Variation, Transgene and Disease_relevant_gene are absent
4 | Variation | Variation | dis_variation | "WBVar00242728" | required only if Strain, Transgene and Disease_relevant_gene are absent
5 | Transgene | Transgene | dis_transgene | "WBTransgene00006901" | required only if Strain, Variation and Disease_relevant_gene are absent
6 | Disease_relevant_gene | Disease Relevant Gene | dis_wbgene | "WBGene00003001" | required only if Strain, Variation and Transgene are absent
7 | Interacting_variation | Interacting Variation | dis_interactvariation | "WBVar00242728" | not required
8 | Interacting_transgene | Interacting Transgene | dis_interacttransgene | "WBTransgene00006901" | not required
9 | Interacting_gene | Interacting Gene | dis_interactgene | "WBGene00003001" | not required
10 | RNAi_experiment | RNAi experiment | dis_Rnaiexpt | "WBRNAi00000171" | not required
11 | Inferred_gene | Inferred Gene | dis_inferredgene | "WBGene00002990" | not required
12 | Association_type | Association Type | dis_associationtype | causes_condition | required
13 | Evidence_code | Evidence Code | dis_goinference | "IMP" | required
14 | Change for WS262: Qualifier_NOT | Qualifier | dis_qualifier | Qualifier_not | not required
15 | Inducing_chemical | Inducing Chemical | dis_inducingchemical | "WBMol:00003650" | not required
16 | Inducing_agent | Inducing Agent | dis_inducingagent | "Glucose" | not required
17 | Modifier_transgene | Modifier Transgene | dis_modtransgene | "WBTransgene00006901" | not required
18 | Modifier_variation | Modifier Variation | dis_modvariation | "WBVar00242728" | not required
19 | Modifier_strain | Modifier Strain | dis_modstrain | "AN170" | not required
20 | Modifier_gene | Modifier Gene | dis_modgene | "WBGene00004310" | not required
21 | Modifier_molecule | Modifier molecule | dis_modmolecule | "WBMol:00002660" | not required
22 | Other_modifier | Other Modifier | dis_modother | "Sugar" | not required
23 | Modifier_association_type | Modifier Association Type | dis_moleculetype | condition_ameliorated_by | required only if any one of these is present: Modifier Transgene, Modifier Variation, Modifier Strain, Modifier gene, Modifier molecule, Other Modifier
24 | Change for WS262: Modifier_qualifier_not | Modifier Qualifier | dis_modqualifier | Modifier_qualifier_not | not required
25 | Genetic_sex | Genetic Sex | dis_geneticsex | hermaphrodite | not required
26 | Disease_phenotype | Disease Phenotype | dis_phenotypedisease | "WBPhenotype:0000884" | not required
27 | Ameliorated_phenotype | Ameliorated Phenotype | dis_phenotypeameliorated | "WBPhenotype:0003884" | not required
28 | Exacerbated_phenotype | Exacerbated Phenotype | dis_phenotypeexacerbated | "WBPhenotype:0001884" | not required
29 | Phenotype_comment | Disease Phenotype Comment | dis_commentdisphen | "Neurons looked sick" | not required
30 | Paper_evidence | Reference | dis_paperexpmod | "WBPaper00033160" | required
31 | Disease_model_description | Disease Model Description | dis_diseasemodeldesc | "C. elegans model of Alzheimer's where Abeta expressed." | not required
32 | Database "OMIM" "gene" | OMIM Gene | dis_genedisrel | "OMIM" "gene" "608441" | not required
33 | Database "OMIM" "disease" | OMIM Disease | dis_dbexpmod | "OMIM" "disease" "610743" | not required
34 | Curator_confirmed | Curator | dis_curator | "WBPerson324" | required
35 | Date_last_updated | Date Last Updated | dis_lastupdateexpmod | "2013-02-21" | required

QC script for Disease_model_annotation_class data

  • Script will check if all required data/fields is present and print pgid and errors to a file disease_model_annotation_errors_<date>
  • Need to decide if script will be part of dumping script or separate --can be part of dumping script
  • Need to decide if script will check the postgres database or the .ace dump --can check postgres database
  • Required fields:
    • Disease Name
    • One or more of Disease relevant gene, Variation, Strain or Transgene
    • If Disease relevant gene or Variation or Transgene is present, Inferred gene has to be present
    • Association Type
    • Evidence Code
    • Reference
    • Date Last Updated
    • If any one of-- Modifier Transgene, Modifier Variation, Modifier Strain, Modifier Gene, Modifier Molecule, or Other Modifier is present, then Modifier Association Type has to be present
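
The required-field rules above can be sketched as a small check that runs on one OA row at a time. This is only an illustration under stated assumptions: each row is assumed to be a dict keyed by the postgres table names from the mapping table (dis_humandoid, dis_wbgene, and so on), and the names `check_row` and `has` are hypothetical, not taken from the actual dumping script.

```python
# Postgres table names are from the mapping table; the row layout is an assumption.
SUBJECT_FIELDS = ["dis_wbgene", "dis_variation", "dis_strain", "dis_transgene"]
MODIFIER_FIELDS = ["dis_modtransgene", "dis_modvariation", "dis_modstrain",
                   "dis_modgene", "dis_modmolecule", "dis_modother"]
ALWAYS_REQUIRED = {
    "dis_humandoid": "Disease Name",
    "dis_associationtype": "Association Type",
    "dis_goinference": "Evidence Code",
    "dis_paperexpmod": "Reference",
    "dis_lastupdateexpmod": "Date Last Updated",
}


def has(row, field):
    """True if the field holds a non-empty value."""
    return bool((row.get(field) or "").strip())


def check_row(row):
    """Return a list of error messages for one annotation row (empty if it passes)."""
    errors = []
    for field, label in ALWAYS_REQUIRED.items():
        if not has(row, field):
            errors.append(f"{label} is missing")
    if not any(has(row, f) for f in SUBJECT_FIELDS):
        errors.append("one of Disease relevant gene, Variation, Strain or Transgene is required")
    needs_inferred = any(has(row, f) for f in ("dis_wbgene", "dis_variation", "dis_transgene"))
    if needs_inferred and not has(row, "dis_inferredgene"):
        errors.append("Inferred gene is missing")
    if any(has(row, f) for f in MODIFIER_FIELDS) and not has(row, "dis_moleculetype"):
        errors.append("Modifier Association Type is missing")
    return errors


row = {"dis_humandoid": "DOID:3429", "dis_strain": "ANM30",
       "dis_associationtype": "is_model_of", "dis_goinference": "IMP",
       "dis_paperexpmod": "WBPaper00039877", "dis_lastupdateexpmod": "2017-03-15"}
for message in check_row(row):
    print(message)
```

Writing one line per pgid and error to disease_model_annotation_errors_<date> would then be a loop over the rows; that part is left out because the file layout is not specified here.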

Model for Disease_model_annotation data

?Disease_model_annotation Disease_term UNIQUE ?DO_term XREF Disease_model_annotation
                          Disease_of_species ?Species
                          Modeled_by Strain UNIQUE ?Strain XREF Disease_model_annotation ?Text  // genetic entity that models the disease
                                     Variation UNIQUE ?Variation XREF Disease_model_annotation ?Text  // genetic entity that models the disease
                                     Transgene UNIQUE ?Transgene XREF Disease_model_annotation ?Text  // genetic entity that models the disease
                                     Disease_relevant_gene UNIQUE ?Gene XREF Disease_model_annotation ?Text  // when the genetic entity is a gene
                                     Modeled_by_remark UNIQUE  ?Text
                                     Inferred_gene     ?Gene XREF  Disease_model_annotation //to indicate the associated gene
                          Association_type UNIQUE is_model_of             // All 5 tags describe the relationship between the genetic entity and the disease 
                                                                      is_implicated_in      //new for WS262                                                             
                                                                      is_marker_for
                          Evidence_code ?GO_code                              // will use ECO later on
                          Qualifier_not                               //new for WS262, to indicate that a disease is NOT modeled by X
                          Experimental_condition Inducing_chemical ?Molecule XREF Disease_model_inducer  //to indicate the disease-inducing agent
                                                 Inducing_agent    ?Text     //e.g. diet, radiation, etc. not in Molecule class
                          Modifier_info                  Modifier_transgene ?Transgene //genetic entity that modifies the disease
                                                                 Modifier_variation ?Variation //(same as above)
                                                                 Modifier_strain    ?Strain    //(same as above)
                                                                 Modifier_gene      ?Gene      //to indicate the gene in the modifying Transgene, Variation, Strain.
                                                                 Modifier_molecule  ?Molecule XREF Disease_model_modifier  // to indicate chemical modifiers of the disease
                                                                 Other_modifier     ?Text      //to indicate other modifiers of the disease eg. diet,radiation, surgery etc
                     Modifier_association_type UNIQUE condition_ameliorated_by  //to indicate the association type between modifiers and disease
                                                                                condition_exacerbated_by
                                                                                condition_not_ameliorated_by //request for WS265 or WS266
                                                                                condition_not_exacerbated_by //request for WS265 or WS266
                          Modifier_qualifier_not    //new for WS262
                          Genetic_sex UNIQUE      hermaphrodite   //indicates genetic sex of the disease model
                                                                   male
                                                                   female
                          Disease_phenotype_info Disease_phenotype ?Phenotype     //Phenotypes similar to human disease 
                                                                   Ameliorated_phenotype ?Phenotype // Phenotypes ameliorated by modifier
                                                                   Exacerbated_phenotype ?Phenotype // Phenotypes exacerbated by modifier
                                                                   Phenotype_comment ?Text  // To describe non-WPO phenotypes
                          Paper_evidence         ?Paper
                          Disease_model_description  ?Text
                          DB_info Database ?Database ?Database_field ?Text //To indicate the OMIM gene/disease
                          Curator_confirmed ?Curator
                          Date_last_updated UNIQUE DateType

Sample .ace file

This is pgid 397 in postgres:

Disease_model_annotation : "00000001"
Disease_term               "DOID:3429"
Disease_of_species         "Homo sapiens"
Strain                     "ANM30"
Association_type           is_model_of
Evidence_code              "IMP"
Genetic_sex                hermaphrodite
Disease_phenotype          "WBPhenotype:0001408"
Paper_evidence             "WBPaper00039877"
Database                   "OMIM"       "disease"       "147421"
Curator_confirmed          "WBPerson324"
Date_last_updated          "2017-03-15"

This is fake data, with examples of the Qualifier_not and Modifier_qualifier_not tags (the tags went in for WS262; there may be one real data example for WS262):

Disease_model_annotation : "00000002"
Disease_term               "DOID:14330"
Disease_of_species         "Homo sapiens"
Strain                     "CL4176"
Interacting_variation     "WBVar00000001"
Interacting_transgene  "WBTransgene00000034"
Interacting_gene           "WBGene00000045"
RNAi_experiment          "WBRNAi00000171"
Association_type           is_model_of
Evidence_code              "IMP"
Qualifier_not
Inducing_chemical          "WBMol:00002044"
Modifier_gene              "WBGene00010882"
Modifier_association_type  condition_ameliorated_by
Modifier_qualifier_not
Genetic_sex                hermaphrodite
Disease_phenotype          "WBPhenotype:0001935"
Disease_phenotype          "WBPhenotype:0002426"
Ameliorated_phenotype      "WBPhenotype:0001935"
Paper_evidence             "WBPaper00031384"
Disease_model_description  "In an elegans model of Parkinson's disease, where human alpha-synuclein was overexpressed, RNA interference studies showed that the elegans atg-7/ATG7 (ubiquitin-activating E1 enzyme-like protein), significantly protected against age and dose-dependent degeneration in the dopamine neurons of transgenic worms."
Database                   "OMIM"       "gene"          "608309"
Database                   "OMIM"       "disease"       "605909"
Curator_confirmed          "WBPerson324"
Date_last_updated          "2017-03-13"

Disease_model_annotation : "00000003"
Disease_term               "DOID:10652"
Disease_of_species         "Homo sapiens"
Strain                     "CL4176"
Association_type           is_model_of
Evidence_code              "IMP"
Genetic_sex                hermaphrodite
Modifier_molecule          "WBMol:00004468"
Modifier_association_type  condition_ameliorated_by
Ameliorated_phenotype      "WBPhenotype:0001935"
Paper_evidence             "WBPaper00028904"
Disease_model_description  "A transgenic model of Alzheimer's disease in C. elegans where the expression of human beta amyloid protein (Abeta) in muscle cells causes Abeta aggregation induced paralysis; treatment with Ginkgo biloba extract ginkgolide J alleviates Abeta induced paralysis and inhibits Abeta oligomerization."
Database                   "OMIM"       "gene"          "608309"
Database                   "OMIM"       "disease"       "605909"
Curator_confirmed          "WBPerson324"
Date_last_updated          "2017-03-13"
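
The samples above use three kinds of lines: tags with quoted values, tags with bare controlled-vocabulary values (Association_type, Modifier_association_type, Genetic_sex), and value-less flag tags (Qualifier_not, Modifier_qualifier_not). A minimal sketch of producing such a block follows. The function names and the list-of-pairs input are hypothetical, and escaping of double quotes inside values is not handled.

```python
# Tags whose values are written bare (unquoted) in the samples above.
BARE_VALUE_TAGS = {"Association_type", "Modifier_association_type", "Genetic_sex"}


def ace_line(tag, value=None):
    """Render one .ace line; value may be None (flag tag), a string, or a tuple of columns."""
    if value is None:
        return tag
    values = value if isinstance(value, (list, tuple)) else [value]
    if tag in BARE_VALUE_TAGS:
        rendered = " ".join(values)
    else:
        rendered = " ".join(f'"{v}"' for v in values)
    return f"{tag} {rendered}"


def to_ace(object_id, pairs):
    """Render a Disease_model_annotation object from (tag, value) pairs."""
    lines = [f'Disease_model_annotation : "{object_id}"']
    lines += [ace_line(tag, value) for tag, value in pairs]
    return "\n".join(lines) + "\n"


print(to_ace("00000001", [
    ("Disease_term", "DOID:3429"),
    ("Strain", "ANM30"),
    ("Association_type", "is_model_of"),
    ("Database", ("OMIM", "disease", "147421")),
]))
```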

Dumper Specifications, Part 2

  1. All data comes from the Disease OA
  2. For every DO_term in the Disease Name field of the OA, take data from the Variation, Strain, Transgene, Inducing Chemical and Modifier Molecule fields, if present, and dump it under the 'Attribute_of' tag of the ?DO_term class, according to the following acedb model:
?DO_term
Attribute_of     Disease_model_variation  ?Variation  XREF  Models_disease    //To associate Variations to a disease
                 Disease_model_strain     ?Strain     XREF  Models_disease    //To associate Strains to a disease
                 Disease_model_transgene  ?Transgene  XREF  Models_disease    //To associate Transgenes to a disease
                 Chemical_inducer         ?Molecule   XREF  Induces_disease   //To associate inducing chemicals to a disease
                 Molecule_modifier        ?Molecule   XREF  Modifies_disease  //To associate modifying chemicals to a disease
Mapping of acedb tags, OA fields and postgres tables
# | acedb tag | OA Field | Postgres table | Example .ace syntax
1 | Disease_model_variation | Variation | dis_variation | "WBVar00242728"
2 | Disease_model_strain | Strain | dis_strain | "ANM30"
3 | Disease_model_transgene | Transgene | dis_transgene | "WBTransgene00006901"
4 | Chemical_inducer | Inducing Chemical | dis_inducingchemical | "WBMol:00002044"
5 | Molecule_modifier | Modifier Molecule | dis_modmolecule | "WBMol:00004468"

Sample .ace file for ?DO_term class

 
DO_term : "DOID:14330"
Disease_model_variation  "WBVar00242728"
Disease_model_strain     "CL4176"
Disease_model_transgene  "WBTransgene00006901"
Chemical_inducer         "WBMol:00002044"
Molecule_modifier        "WBMol:00002044"
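
A rough sketch of the Part 2 grouping step, assuming each OA row is a dict keyed by postgres table name, with multi-value fields already split into lists. The tag mapping comes from the table above; the function name, the row layout and the sorted output order are assumptions, not the dumper's actual behaviour.

```python
from collections import defaultdict

# Postgres table -> ?DO_term tag, as in the mapping table above.
FIELD_TO_TAG = {
    "dis_variation": "Disease_model_variation",
    "dis_strain": "Disease_model_strain",
    "dis_transgene": "Disease_model_transgene",
    "dis_inducingchemical": "Chemical_inducer",
    "dis_modmolecule": "Molecule_modifier",
}


def do_term_ace(rows):
    """Collect unique (tag, value) pairs per DO term and render ?DO_term .ace blocks."""
    by_term = defaultdict(set)
    for row in rows:
        doid = row.get("dis_humandoid")
        if not doid:
            continue  # no Disease Name, nothing to attach the data to
        for field, tag in FIELD_TO_TAG.items():
            for value in row.get(field, []):
                by_term[doid].add((tag, value))
    blocks = []
    for doid in sorted(by_term):
        lines = [f'DO_term : "{doid}"']
        lines += [f'{tag} "{value}"' for tag, value in sorted(by_term[doid])]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + ("\n" if blocks else "")


rows = [
    {"dis_humandoid": "DOID:14330", "dis_strain": ["CL4176"],
     "dis_inducingchemical": ["WBMol:00002044"]},
    {"dis_humandoid": "DOID:14330", "dis_strain": ["CL4176"]},
]
print(do_term_ace(rows))
```

Using a set per DO term means a strain or molecule that appears in several annotations of the same disease is dumped only once.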

Ontology Annotator for Disease Term

OA for disease term

Dumping data for citace upload

--All scripts are under: /home/acedb/ranjana/human_disease

--A symlink to the script has been created: ln -s /home/postgres/work/citace_upload/dis_disease/use_package.pl
--The disease ontology file for the OA is updated by a cron job that runs at 8pm every day (crontab entry: 0 20 * * * /home/postgres/work/pgpopulation/obo_oa_ontologies/update_obo_oa_ontologies.pl)

4 disease related files are sent to citpub@spica.

1. Ontology files:

https://raw.githubusercontent.com/DiseaseOntology/HumanDiseaseOntology/master/src/ontology/doid.obo (used since Aug 2018, to get the poly-hierarchy for is_a relationships rather than the single-inheritance hierarchy).

2. Gene-disease annotation files

  • Run use_package.pl:
  • Note changes (Sept 2018): On Tazendra, if a pgid doesn't have the association type, we skip that whole pgid, so it doesn't get dumped at all. In the future, when we get rid of this, the script will dump such annotations and report in the error messages that they need fixing.
  • On the sandbox, we've removed the above skip, so the script dumps all the lines that have the new errors and reports in the error messages that they need fixing. It will be like this on Tazendra, possibly for WS269.
  • Dumps disease data from the disease OA into disease_<date>.ace and disease_annotation_<date>.ace
  • The script also checks whether all DOIDs in postgres are valid and outputs invalid DOIDs to the err.out.<date> file. Note that invalid DOIDs cannot be seen in the OA; identify the annotation by PGID and then add the valid DOID to it, as the invalid one will not show.
  • scp both files to citpub@spica, under Data_for_citace/Data_from_Ranjana/; add WSXXX to both file names; test both files in an acedb database
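
The DOID validity check can be pictured roughly as below: read the term IDs from the OBO file, then report every pgid whose DOID is not among them. This is a sketch only. It assumes that obsolete terms (`is_obsolete: true`) count as invalid and that annotations arrive as a pgid-to-DOID dict; both assumptions, the function names, and the term DOID:0000000 in the example are made up for illustration.

```python
def valid_doids(obo_text):
    """Return the IDs of all non-obsolete [Term] stanzas in an OBO file."""
    valid = set()
    for stanza in obo_text.split("\n\n"):
        lines = [line.strip() for line in stanza.strip().splitlines()]
        if not lines or lines[0] != "[Term]":
            continue  # header or [Typedef] stanza
        term_id = None
        obsolete = False
        for line in lines[1:]:
            if line.startswith("id: "):
                term_id = line[4:].strip()
            elif line.startswith("is_obsolete: true"):
                obsolete = True
        if term_id and not obsolete:
            valid.add(term_id)
    return valid


def invalid_annotations(pgid_to_doid, valid):
    """Return {pgid: doid} for annotations whose DOID is not a valid term."""
    return {pgid: doid for pgid, doid in pgid_to_doid.items() if doid not in valid}


sample_obo = """format-version: 1.2

[Term]
id: DOID:14330
name: Parkinson's disease

[Term]
id: DOID:0000000
name: hypothetical retired term
is_obsolete: true

[Typedef]
id: part_of
"""
valid = valid_doids(sample_obo)
print(invalid_annotations({397: "DOID:14330", 12: "DOID:0000000"}, valid))
```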

3. Testing files in cminus on spica

  • Transfer file/s to spica, under /home/citpub/Data_for_citace/Data_from_Ranjana
  • On local computer, ssh -Y citpub@spica.caltech.edu, call the database by using 'cminus'; read files and check for errors
  • Transferring multiple remote files to local machine: scp your_username@remote.edu:/some/remote/directory/\{a,b,c\} ./

Changes by release

  • for the WS251 release: the use_package.pl script reports that WBGene00004724 is dead and merged into WBGene00013742; need to query it out using pgid 347, which is sas-1, and then move the data to the right gene


Changes to OA

March 2023

Checking script changes (check_disease_annotation.pl)

  • Remove check for inferred gene (dis_inferredgene), as this is now optional
  • Add "genotype" (dis_genotype?) to the check that verifies the presence of wbgene, variation or strain (the subject of the annotation), meaning the script has to check for the absence of all 4 before writing an error message

Aug 2018

  • Remove the following values from the Modifier Association type field in the OA drop-down (tab 2)
    • Does_not_exacerbate
    • Does_not_ameliorate
    • No_effect
    • Toxic
    • Not_toxic

May 2013

  • 'Database for Exp Mod' changes to 'OMIM disease for Exp Mod'; data can be entered as IDs without the 'OMIM:' prefix, multiple values comma-separated.
  • 'Database for Disease Rel' changes to 'OMIM disease for Disease Rel'; data can be entered as IDs without the 'OMIM:' prefix, multiple values comma-separated.
  • Extra free-text field called 'OMIM gene for Disease Rel' added; data can be entered as IDs without the 'OMIM:' prefix, multiple values comma-separated.
  • When data is present in either the 'OMIM disease for Disease Rel' or 'OMIM gene for Disease Rel' fields, the script dumps one of the following lines in .ace for each entry:

Database "OMIM" "disease" "456789"
Database "OMIM" "gene" "456789"
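
As a small illustration of the parsing described above, the following sketch splits a comma-separated field value, drops an optional 'OMIM:' prefix, removes duplicates, and emits one Database line per ID. The function names are hypothetical, and no validation of the ID itself is attempted.

```python
def omim_ids(field_value):
    """Split a comma-separated OA field into unique OMIM IDs, without the 'OMIM:' prefix."""
    ids = []
    for part in (field_value or "").split(","):
        part = part.strip()
        if part.upper().startswith("OMIM:"):
            part = part[5:].strip()
        if part and part not in ids:
            ids.append(part)
    return ids


def database_lines(field_value, kind):
    """Render one Database line per ID; kind is 'disease' or 'gene'."""
    return [f'Database "OMIM" "{kind}" "{omim_id}"' for omim_id in omim_ids(field_value)]


for line in database_lines("OMIM:176670, 613205,176670", "disease"):
    print(line)
```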

Aug 2022 (for WS287)

  • Added Asserted gene and Asserted Variation fields to the OA; copied everything from the Inferred gene field to the Asserted gene field
  • Need to do before uploading for WS287: take out Asserted gene values where only a Disease relevant gene is present, without a variation
  • Inferred gene deleted from the UI; will add it back if needed in the future, as this is meant to be used for rollup rules only (by script)

Changes to gene-disease dumper

Aug 2018

  • See the example annotation of PGID 456, which has both 'condition_ameliorated_by' and 'Modifier_qualifier_not'; the .ace file needs to dump it as in the example above (done)

Sept 2014: moving OMIM Ids to Accession_evidence

  • Re-enable the part of the script that dumps OMIM IDs under the 'Database' tag
  • Start dumping the 'Accession_evidence' tag:
    • For the Experimental_model tag, look at the IDs entered either as 'OMIM:XXXXX' or as just 'XXXXX' in 'OMIM disease for Exp Mod (dis_dbexpmod)'
    • For the Disease_relevance tag, look at the OMIM IDs entered either as 'OMIM:XXXXX' or as just 'XXXXX' in 'OMIM disease for Disease Rel (dis_dbdisrel)' and 'OMIM gene for Disease Relevance (gene_disrel)'
  • For each unique OMIM ID, the .ace syntax for the gene would be:
 
Gene : "WBGene00003052"
Database	"OMIM"	"disease"	"115200"
Database	"OMIM"	"disease"	"151660"
Database	"OMIM"	"disease"	"159001"
Database	"OMIM"	"disease"	"176670"
Database	"OMIM"	"disease"	"181350"
Database	"OMIM"	"disease"	"212112"
Database	"OMIM"	"disease"	"248370"
Database	"OMIM"	"disease"	"275210"
Database	"OMIM"	"disease"	"605588"
Database	"OMIM"	"disease"	"610140"
Database	"OMIM"	"disease"	"613205"
Experimental_model "DOID:3911" "Homo sapiens" Accession_evidence  "OMIM"  "176670"
Experimental_model "DOID:0050557" "Homo sapiens" Accession_evidence  "OMIM"  "613205"
Experimental_model "DOID:11726" "Homo sapiens" Accession_evidence  "OMIM"  "181350"
Disease_relevance   "Mutations in human lamin, LMNA, are found in several diseases referred to as the laminopathic diseases, which include Emery-Dreifuss muscular dystrophy (EDMD), LMNA-related congenital muscular dystrophy (L-CMD), limb-girdle muscular dystrophy (L-CMD), Hutchison-Gilford progeria syndrome (HGPS), dilated cardiomyopathy (DCM), Charcot-Marie-Tooth disorder and atypical Werner syndrome; elegans B-type lamin, lmn-1, performs both A and B-type vertebrate lamin functions; similar to A-type lamins, it has roles in development, organization of nuclear pore complexes, and interacts with lamina and nuclear components; similar to B-type lamins, it is expressed widely throughout development, except for sperm, and interacts with B-type lamin-binding proteins; much of the knowledge of the organization and assembly of the nuclear lamina has come from studies in elegans; disease-causing mutations in human LMNA when introduced into elegans lmn-1/lamin alter nuclear lamina organization and dynamics, leading to phenotypes such as decreased fertility and muscle lesions; a mutation found in Hutchison-Gilford progeria syndrome disrupts the supramolecular structure of the lamin filaments in elegans; LMNA mutations that are found in EDMD, DCM and HGPS, when introduced into elegans lmn-1/lamin cause disruption in lamin filament assembly and nuclear localization; also, work in elegans has revealed that lamins are involved in the normal aging process, as worms mutant for lamin age faster."  "Homo sapiens"  Accession_evidence  "OMIM"  "115200"


(The Disease_relevance line is repeated for each of the remaining 10 OMIM IDs in 'OMIM disease for Disease Rel (dis_dbdisrel)'; this gene has no entries in 'OMIM gene for Disease Relevance (gene_disrel)'.)
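
The repetition of the Disease_relevance line per unique OMIM ID could look roughly like the Python sketch below. The function names and arguments are hypothetical, and the sketch does not escape quotes inside the description text. Only the line layout and the "one line per unique OMIM ID" rule are taken from the example above.

```python
def unique_omim_ids(*field_values):
    """Collect OMIM IDs from several fields, without prefix or duplicates."""
    seen = []
    for value in field_values:
        for token in (value or "").split(","):
            omim_id = token.strip()
            if omim_id.upper().startswith("OMIM:"):
                omim_id = omim_id[len("OMIM:"):].strip()
            if omim_id and omim_id not in seen:
                seen.append(omim_id)
    return seen


def disease_relevance_lines(description, species, disease_field, gene_field):
    """Return one Disease_relevance line per unique OMIM ID."""
    return [
        f'Disease_relevance "{description}" "{species}" '
        f'Accession_evidence "OMIM" "{omim_id}"'
        for omim_id in unique_omim_ids(disease_field, gene_field)
    ]
```

For the lmn-1 example, passing the 11 IDs from the disease field and an empty gene field would give 11 lines that differ only in the final OMIM ID.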

Old way of dumping OMIM IDs for genes:

Gene : "WBGene00003052"
Database	"OMIM"	"disease"	"176670"
Database	"OMIM"	"disease"	"613205"
Database	"OMIM"	"disease"	"181350"

History

  • Need to tell the EBI team that, starting with the WS239 upload (mid-July), we will be dumping Date_last_updated and Curator_confirmed data into citace, and they should pick these up.
  • The Disease Ontology file location has changed; need to alert JC to change the locations for the OA and the scripts (done, 08.08.2013):

The DO group lists two locations:

  • Sourceforge: http://sourceforge.net/p/diseaseontology/code/2599/tree/trunk/
  • OBO Foundry: http://www.berkeleybop.org/ontologies/doid.obo (will use this source)

Data checking scripts

  • The err.out.<date> and err.annotation.out<date> files are generated ONLY when the data is dumped, so run use_package.pl first to see the errors. These checks are current as of 03.06.2023 (to Alliance standards as well).
  • err.out.<date> checks only for invalid papers and invalid DO IDs.
  • check_disease_annotation.pl checks an .ace file given as input. It appears to be an old script that needs updating to match the scripts above; the difference is that it can take an .ace file as input. It is probably not needed going forward, once data is moved to the Alliance. It currently gives errors that are not valid (03.07.2023).

Change made Nov 5th, 2019, for the disease report: the script prints PGIDs with no DOID, but does not print other missing fields. There are 73 PGIDs with no DO term; need to check whether these are valid or whether they will never have a DOID (because of the biology, such as the bah genes).

Check the following:

    1. Curator - required
    2. Disease Name - required
    3. Disease of Species - required
    4. One of the following is required: Disease Relevant Gene, Variation, Strain or Transgene
    5. If Variation or Transgene is present, Disease Relevant Gene is required (Remove check, won't work for human transgenes)
    6. If Disease Relevant Gene or Variation is present, Inferred Gene is required (keep this check)
    7. Association Type - required
    8. Evidence Code - required
    9. Reference - at least one reference is required
    10. Date Last Updated - required
    11. If either one of these is present: Modifier Transgene, Modifier Variation, Modifier Strain, Modifier Gene, Modifier Molecule or Other Molecule, then Modifier Association Type is required.

Dump an annotation into the disease_annotation.ace file only when checks 2, 4, 6, 7, 8, 9, 10 and 11 (from the list above) are satisfied.
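
The dump filter can be sketched in Python as below. All dictionary keys are hypothetical stand-ins for the OA fields named in the checklist, and the code is an illustration, not the dumper itself. It applies only the checks that the dump rule lists (2, 4, 6, 7, 8, 9, 10 and 11), so Curator (1) and species (3) are not tested here.

```python
# Hypothetical keys; check numbers follow the checklist above.
REQUIRED_FIELDS = {
    2: "disease_name",
    7: "association_type",
    8: "evidence_code",
    9: "references",
    10: "date_last_updated",
}
SUBJECTS = ("disease_relevant_gene", "variation", "strain", "transgene")
MODIFIERS = (
    "modifier_transgene", "modifier_variation", "modifier_strain",
    "modifier_gene", "modifier_molecule", "other_molecule",
)


def failed_checks(annotation):
    """Return the sorted numbers of the dump checks that fail."""
    def has(key):
        return bool(annotation.get(key))

    failed = [n for n, key in REQUIRED_FIELDS.items() if not has(key)]
    if not any(has(key) for key in SUBJECTS):
        failed.append(4)  # no subject at all
    if (has("disease_relevant_gene") or has("variation")) and not has("inferred_gene"):
        failed.append(6)  # gene or variation without an inferred gene
    if any(has(key) for key in MODIFIERS) and not has("modifier_association_type"):
        failed.append(11)  # modifier without its association type
    return sorted(failed)


def should_dump(annotation):
    """An annotation is dumped only when every listed check passes."""
    return not failed_checks(annotation)
```

Returning the failing check numbers, and not just a yes/no answer, makes it straightforward to write them to an error file next to the pgid.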

Old pipeline

Old pipeline for disease data
