OA and scripts for disease data
Contents
- 1 Disease OA or Disease Model Annotation OA (Feb 2017)
- 2 Mapping of fields from old Disease OA to new disease OA (Feb 2017)
- 3 Dumping Disease_model_annotation data
- 4 Dumping data for citace upload
- 5 Old SOP for upload
- 6 Changes by release
- 7 Changes to OA
- 8 Changes to gene-disease dumper
- 9 Data checking scripts
- 10 Old pipeline
Disease OA or Disease Model Annotation OA (Feb 2017)
The gene-disease OA (connected in AceDB models to ?Gene class) and the newer Disease_model_annotation data are now in the Disease OA.
Disease OA fields
- Curator (dis_curator)
- Curator History (dis_curator)
- Disease Name (dis_humandoid):Auto-complete drop-down with Disease Ontology (DO) terms
- single value, constrain
- Disease of Species (dis_species)
- Auto-complete drop-down with controlled vocabulary of species list
- Single value, constrain
- Has Homo sapiens as the default value
- Comes from the obo_name_species table
- Disease relevant gene (dis_wbgene):this is the causative gene of the disease, the disease_relevant_gene in ace model, DB_Object_ID in the DAF
- Variation (dis_variation):Autocomplete drop-down with WB Variation list
- single value, constrain
- Strain (dis_strain):Autocomplete drop-down with WB Strain list
- single value, constrain
- Transgene (dis_transgene)
- single value, constrain
- Genotype (dis_genotype)
- Asserted Gene (dis_inferredgene):Autocomplete drop-down with WBGene list
- To indicate the gene that the Variation or Strain refers to, if known; in C. elegans papers the authors usually state this
- Can be multiple values, e.g. if the Strain or Transgene that models the disease has more than one gene.
- Asserted Variation (dis_assertedvariation):Autocomplete drop-down with WBVariation list
- Asserted human gene (dis_assertedhumangene)
- for referring to the human gene, e.g. Tau, in a transgenic strain model; HGNC gene names drop-down (formerly free text)
- For the 'Modeled by' section at least one of Disease_relevant_gene, Variation, Strain or Transgene is required, constrain
- Association Type (dis_associationtype):Relationship between the genetic entity (disease_relevant_gene, variation, transgene, or strain in ace model; DB Object in DAF) and the disease.
- drop-down with the following controlled vocabulary:
- is_model_of
- is_implicated_in
- is_marker_for
- is_ameliorated_model_of
- is_exacerbated_model_of
- single value, constrain
- If the genetic entity dumped for DB Object ID is 'Disease_relevant_gene', then is_model_of is not allowed, constrain
- Evidence Code (dis_goinference):Autocomplete drop-down with GO codes for now (will adopt ECO later)
- allow multiple values
- multiple evidence codes allowed only from one publication, for one model
- Qualifier (dis_qualifier):Autocomplete dropdown with only one value 'NOT'
- Indicates that the genetic entity is 'NOT' a model for disease.
- the default value is blank, with 'NOT' as the only drop-down choice
- Note that the qualifiers are tied together in postgres, any change here will affect Modifier_qualifier as well, #32.
- Genetic sex (dis_geneticsex):Autocomplete dropdown with the following values:hermaphrodite, male, female
- single value
- have 'hermaphrodite' as the default value
- Reference (dis_paperexpmod): Autocomplete drop-down with WB Paper list
- for now will leave as multi-value ontology, until data is cleaned up
- later on for WS260 will have single value only
- Disease Model Description (dis_diseasemodeldesc)
- Date_last updated (dis_lastupdateexpmod): Date of original annotation or date last modified
- single value only, constrain
- Remark (dis_comment)
- pgid
- Inducing Chemical (dis_inducingchemical):Drop-down with WB Molecule list/ontology
- Allow multiple values
- for curator: enter multiple values only if multiple molecules are used to induce the same disease in one experiment.
- Inducing agent (dis_inducingagent): Free text, for inducers not in WB molecule ontology
- multiple values, will we comma separate?
- for curator: enter multiple values only if multiple agents were used as inducers for the same disease in one experiment/model.
- Experimental condition comment (dis_commentexpcond): free big text
- for curator: meant for internal curator comments only, in case a WBMol does not exist, or other comments
- Modifier transgene (dis_modtransgene):Autocomplete dropdown with WB Transgene list
- multiple values
- for curator: enter multiple values, only if multiple transgenes were used as modifiers in one experiment.
- Modifier variation (dis_modvariation):Autocomplete dropdown with WB Variation list
- multiple values
- Modifier strain (dis_modstrain):Autocomplete dropdown with WB Strain list
- multiple values
- Modifier gene (dis_modgene):Autocomplete dropdown with WB Gene list
- multiple values, to indicate the gene in the modifying Transgene, Variation, Strain.
- Modifier human gene (dis_modhumangene):Autocomplete dropdown with HGNC human gene list, multivalue, to indicate the inferred modifier human gene in the strain or transgene used as modifier
- Modifier Genotype (dis_modgenotype):Autocomplete dropdown with WBGenotype list used as modifier
- During the build of WS295 (Nov 2024), it was discovered that no modifier_genotypes were being dumped; the script was changed to fix this (will be reflected in WS296)
- Comment from JC as follows: I see that the perl module was not dumping it. get_dis_disease_ace_annotation.pm
- It was getting read from the database, but it wasn't being dumped as its own tag. I've made the mapping from table name to .ace tag.
- There's also a separate section for "extraTableToTag" which already was using the data from that table and dumping:
- $extraTableToTag{"modgenotype"} = "Disease_modifier_genotype"; but .ace file didn't have that data
- Modifier molecule (dis_modmolecule):Autocomplete dropdown with WB Molecule list
- multiple values
- Other modifier (dis_modother): Free Big Text to indicate other modifiers of the disease eg., diet, radiation, surgery
- multiple values
- comma separate multiple values
- Modifier association type (dis_moleculetype):Autocomplete dropdown with the following new values:
- condition_ameliorated_by, condition_exacerbated_by
- Keep old values: Toxic, No_effect, Does_not_exacerbate, Does_not_ameliorate, Not_toxic
- for curators: Use multiple values for each type of modifier only if they were used in a single experiment to model a single disease from a single paper; they should all be consistent with the modifier_association_type chosen
- have condition_ameliorated_by as default value, as this is the most common
- Modifier qualifier (dis_modqualifier): Autocomplete drop-down with one value: NOT
- Note that the qualifiers are tied together in postgres, any change here will affect Qualifier as well, #16.
- Disease relevance description (dis_diseaserelevance): Free big text box, keep as is
- Paper for disease relevance (dis_paperdisrel): keep as is
- Last Updated for Disease Rel (dis_lastupdatedisrel): keep as is
- Do not fill in date when 'New' (annotation) is clicked on
- OMIM gene (dis_genedisrel): this is the 'OMIM gene for Disease Rel' in the old OA
- for OMIM ids, free text, not big, keep as is
- multiple values, will be comma separated
- OMIM disease (dis_dbexpmod): for OMIM ids, free text, not big
- multiple values, will be comma separated
- OMIM disease for Disease Rel (dis_dbdisrel): leave as is
- pgid
Deleted fields from OA UI
- Molecule (dis_molecule)
- Affected_phenotype (dis_phenotypeaffected)
- Disease Relevant Gene Text (dis_wbgenetext)
- Variation Text (dis_variationtext)
- Strain text (dis_straintext)
- Requested Strain (dis_suggested_strain): free text, keep as is, to hold a non-WB strain until it becomes one.
- Transgene text (dis_transgenetext)
- Interacting variation (dis_interactvariation)
- auto-complete drop-down with WB variation list
- allow multiple values
- Interacting gene (dis_interactgene)
- auto-complete drop-down with WB gene list
- allow multiple values
- Interacting transgene (dis_interacttransgene)
- auto-complete drop-down with WB transgene list
- allow multiple values
- RNAi experiment (dis_rnaiexperiment)
- autocomplete drop-down with WB RNAi experiment list
- allow multiple values
- Model Remark (dis_modelremark): Free big-text field, to contain remarks specifically related to the model.
- causes_or_contributes_to_condition
- causes_condition
- contributes_to_condition
- Disease phenotype (dis_phenotypedisease):Autocomplete dropdown of WB Phenotype ontology terms
- multiple values
- for curator: multiple disease phenotypes from same experiment allowed
- Ameliorated_phenotype (dis_phenotypeameliorated): Autocomplete dropdown of WB Phenotype ontology terms
- multiple values
- for curator: multiple ameliorated phenotypes from same experiment allowed
- Exacerbated Phenotype (dis_phenotypeexacerbated): Autocomplete dropdown of WB Phenotype ontology terms
- multiple values
- for curator: multiple exacerbated phenotypes from same experiment allowed
- Disease phenotype comment (dis_commentdisphen) (free big text)
- Requested Phenotype (dis_suggested_phenotype): keep as is
- Requested Phenotype Definition (dis_suggested_definition):keep as is
- Child of Phenotype (dis_child_of): keep as is
Mapping of fields from old Disease OA to new disease OA (Feb 2017)
In old Disease OA | In new Disease OA | DAF column number and name | Cardinality in DAF | Comment | |
---|---|---|---|---|---|
WB Gene | Disease Relevant Gene | Inferred gene | 0 or more | ||
Curator | Curator | 1 or more | |||
Curator History | Curator History | 1 or more | |||
Experimental Model for | Disease Name | 1 | |||
Strain | Strain | 1 | |||
Variation | Variation | 1 | |||
Disease Phenotype | Disease Phenotype | 0 or more | |||
Transgene | Transgene | 1 or more | |||
Paper for Exp Mod | Reference | 1 | |||
OMIM disease for Exp Mod | OMIM disease | ||||
Last Updated for Exp Mod | Date Last Updated | ||||
Species | Disease of Species | ||||
Paper for Disease Rel | |||||
OMIM disease for Disease Rel | Gene Product Form ID | ||||
OMIM gene for Disease Rel | Experimental Conditions (to create the model) | ||||
Last Updated for Disease Rel | DB Object Type | R | 1 | gene, allele, tra | |
Comment | DB | R | 1 | WB | |
DB Object ID | R | 1 | WB:WBGene00004887 | ||
Molecule Type | DB Object Symbol | R | 1 | smn-1 | A |
Molecule | Inferred Gene Association | O | 0 or greater | ||
Affected Phenotype | Gene Product Form ID | O | 0 or 1 | ||
Suggested Phenotype | |||||
Suggested Phenotype Definition | |||||
Child of Phenotype | |||||
Suggested Strain |
Dumping Disease_model_annotation data
Dumper specifications Part 1
April, 2017
- Data comes from Disease OA data tables of postgres
- All disease related files at: home/acedb/ranjana/human_disease
- Run parseHuman.pl to generate the HumanDO.ace file (the disease ontology .ace file)
- Run use_package.pl script to generate the disease data (disease_<date>.ace); it runs pm1 for old-style data and pm2 for the Disease_model_annotation-style data.
- For each annotation, generate an ID in the following form, starting with: Disease_model_annotation : "00000001"
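The ID format above can be sketched as follows. This is a minimal illustration in Python, not the logic of use_package.pl: it assumes the IDs are a plain running counter zero-padded to eight digits, and the function names `annotation_id` and `ace_header` are hypothetical.

```python
def annotation_id(counter: int) -> str:
    """Format a running counter as an 8-digit, zero-padded ID string.

    Assumption: IDs are sequential and start at 1, as in "00000001".
    """
    if counter < 1:
        raise ValueError("counter must start at 1")
    return f"{counter:08d}"


def ace_header(counter: int) -> str:
    """Build the object header line that opens one annotation in the .ace file."""
    return f'Disease_model_annotation : "{annotation_id(counter)}"'


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(ace_header(n))
```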
Changes for WS262 (Aug 2017)
- Rule for dumping the Disease_model_annotation data: If (any) value exists for Association_type. Deprecate rule: If OA field Variation (dis_variation) OR Strain (dis_strain) OR Transgene (dis_transgene) has data
- new Association_type 'is_implicated_in', new tags: 'Qualifier_not' and 'Modifier_qualifier_not'.
No. | acedb tag | OA Field | Postgres table | Example .ace syntax | Required or not |
---|---|---|---|---|---|
1 | Disease_term | Disease Name | dis_humandoid | "DOID:3429" | required |
2 | Disease_of_species | Disease of Species | dis_species | "Homo sapiens" | not required |
3 | Strain | Strain | dis_strain | "ANM30" | required only if Variation, Transgene and Disease_relevant_gene are absent |
4 | Variation | Variation | dis_variation | "WBVar00242728" | required only if Strain, Transgene and Disease_relevant_gene absent |
5 | Transgene | Transgene | dis_transgene | "WBTransgene00006901" | required only if Strain, Variation and Disease_relevant_gene are absent |
6 | Disease_relevant_gene | Disease Relevant Gene | dis_wbgene | "WBGene00003001" | required only if Strain, Variation and Transgene are absent |
7 | Interacting_variation | Interacting Variation | dis_interactvariation | "WBVar00242728" | not required |
8 | Interacting_transgene | Interacting Transgene | dis_interacttransgene | "WBTransgene00006901" | not required |
9 | Interacting_gene | Interacting Gene | dis_interactgene | "WBGene00003001" | not required |
10 | RNAi_experiment | RNAi experiment | dis_Rnaiexpt | "WBRNAi00000171" | not required |
11 | Inferred_gene | Inferred Gene | dis_inferredgene | "WBGene00002990" | not required |
12 | Association_type | Association Type | dis_associationtype | causes_condition | required |
13 | Evidence_code | Evidence Code | dis_goinference | "IMP" | required |
14 | Change for WS262: Qualifier_NOT | Qualifier | dis_qualifier | Qualifier_not | not required |
15 | Inducing_chemical | Inducing Chemical | dis_inducingchemical | "WBMol:00003650" | not required |
16 | Inducing_agent | Inducing Agent | dis_inducingagent | "Glucose" | not required |
17 | Modifier_transgene | Modifier Transgene | dis_modtransgene | "WBTransgene00006901" | not required |
18 | Modifier_variation | Modifier Variation | dis_modvariation | "WBVar00242728" | not required |
19 | Modifier_strain | Modifier Strain | dis_modstrain | "AN170" | not required |
20 | Modifier_gene | Modifier Gene | dis_modgene | "WBGene00004310" | not required |
21 | Modifier_molecule | Modifier molecule | dis_modmolecule | "WBMol:00002660" | not required |
22 | Other_modifier | Other Modifier | dis_modother | "Sugar" | not required |
23 | Modifier_association_type | Modifier Association Type | dis_moleculetype | condition_ameliorated_by | required only if any one of these is present: Modifier Transgene, Modifier Variation, Modifier Strain, Modifier gene, Modifier molecule, Other Modifier |
24 | Change for WS262: Modifier_qualifier_not | Modifier Qualifier | dis_modqualifier | Modifier_qualifier_not | not required |
25 | Genetic_sex | Genetic Sex | dis_geneticsex | hermaphrodite | not required |
26 | Disease_phenotype | Disease Phenotype | dis_phenotypedisease | "WBPhenotype:0000884" | not required |
27 | Ameliorated_phenotype | Ameliorated Phenotype | dis_phenotypeameliorated | "WBPhenotype:0003884" | not required |
28 | Exacerbated_phenotype | Exacerbated Phenotype | dis_phenotypeexacerbated | "WBPhenotype:0001884" | not required |
29 | Phenotype_comment | Disease Phenotype Comment | dis_commentdisphen | "Neurons looked sick" | not required |
30 | Paper_evidence | Reference | dis_paperexpmod | "WBPaper00033160" | required |
31 | Disease_model_description | Disease Model Description | dis_diseasemodeldesc | "C. elegans model of Alzheimer's where Abeta expressed." | not required |
32 | Database "OMIM" "gene" | OMIM Gene | dis_genedisrel | "OMIM" "gene" "608441" | not required |
33 | Database "OMIM" "disease" | OMIM Disease | dis_dbexpmod | "OMIM" "disease" "610743" | not required |
34 | Curator_confirmed | Curator | dis_curator | "WBPerson324" | required |
35 | Date_last_updated | Date Last Updated | dis_lastupdateexpmod | "2013-02-21" | required |
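As an illustration of how the table above maps postgres tables to .ace lines, here is a minimal Python sketch. It is not the Perl dumper: the row format (a dict of table name to list of values), the `TABLE_TO_TAG` subset and the `to_ace` function are assumptions made for this example. It applies the WS262 rule that an annotation is dumped only when Association_type has a value.

```python
# Subset of the mapping table: (postgres table, acedb tag, value is quoted?).
# Only a few rows are included to keep the sketch short.
TABLE_TO_TAG = [
    ("dis_humandoid", "Disease_term", True),
    ("dis_species", "Disease_of_species", True),
    ("dis_strain", "Strain", True),
    ("dis_variation", "Variation", True),
    ("dis_associationtype", "Association_type", False),
    ("dis_goinference", "Evidence_code", True),
    ("dis_geneticsex", "Genetic_sex", False),
    ("dis_paperexpmod", "Paper_evidence", True),
    ("dis_curator", "Curator_confirmed", True),
    ("dis_lastupdateexpmod", "Date_last_updated", True),
]


def to_ace(annotation_id, row):
    """Return the .ace text for one annotation, or None if it must not be dumped.

    `row` is a hypothetical in-memory form of one pgid: table name -> list of values.
    """
    # WS262 rule: dump only if Association_type has any value.
    if not row.get("dis_associationtype"):
        return None
    lines = [f'Disease_model_annotation : "{annotation_id}"']
    for table, tag, quoted in TABLE_TO_TAG:
        for value in row.get(table, []):
            lines.append(f'{tag} "{value}"' if quoted else f"{tag} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = {
        "dis_humandoid": ["DOID:3429"],
        "dis_species": ["Homo sapiens"],
        "dis_strain": ["ANM30"],
        "dis_associationtype": ["is_model_of"],
        "dis_goinference": ["IMP"],
        "dis_paperexpmod": ["WBPaper00039877"],
        "dis_curator": ["WBPerson324"],
        "dis_lastupdateexpmod": ["2017-03-15"],
    }
    print(to_ace("00000001", example))
```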
QC script for Disease_model_annotation_class data
- Script will check if all required data/fields is present and print pgid and errors to a file disease_model_annotation_errors_<date>
- Need to decide if script will be part of dumping script or separate --can be part of dumping script
- Need to decide if script will check the postgres database or the .ace dump --can check postgres database
- Required fields:
- Disease Name
- One or more of Disease relevant gene, Variation, Strain or Transgene
- If Disease relevant gene or Variation or Transgene is present, Inferred gene has to be present
- Association Type
- Evidence Code
- Reference
- Date Last Updated
- If any one of-- Modifier Transgene, Modifier Variation, Modifier Strain, Modifier Gene, Modifier Molecule, or Other Modifier is present, then Modifier Association Type has to be present
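The required-field rules listed above could be checked along these lines. This is a sketch in Python under stated assumptions, not the actual QC script: each pgid is assumed to be available as a dict of postgres table name to list of values, and `check_row` is a hypothetical name. The inferred-gene rule follows this list as written; the March 2023 notes further down this page make that field optional.

```python
# Fields that must always have a value, with the OA label used in error messages.
REQUIRED = {
    "dis_humandoid": "Disease Name",
    "dis_associationtype": "Association Type",
    "dis_goinference": "Evidence Code",
    "dis_paperexpmod": "Reference",
    "dis_lastupdateexpmod": "Date Last Updated",
}
MODELED_BY = ("dis_wbgene", "dis_variation", "dis_strain", "dis_transgene")
NEEDS_INFERRED_GENE = ("dis_wbgene", "dis_variation", "dis_transgene")
MODIFIERS = (
    "dis_modtransgene", "dis_modvariation", "dis_modstrain",
    "dis_modgene", "dis_modmolecule", "dis_modother",
)


def has(row, table):
    """True if the table holds at least one non-empty value for this pgid."""
    return any(str(value).strip() for value in row.get(table, []))


def check_row(pgid, row):
    """Return a list of 'pgid<TAB>message' error strings for one annotation."""
    errors = [
        f"{pgid}\tmissing {label}"
        for table, label in REQUIRED.items()
        if not has(row, table)
    ]
    if not any(has(row, t) for t in MODELED_BY):
        errors.append(f"{pgid}\tneeds Disease relevant gene, Variation, Strain or Transgene")
    if any(has(row, t) for t in NEEDS_INFERRED_GENE) and not has(row, "dis_inferredgene"):
        errors.append(f"{pgid}\tmissing Inferred gene")
    if any(has(row, t) for t in MODIFIERS) and not has(row, "dis_moleculetype"):
        errors.append(f"{pgid}\tmissing Modifier Association Type")
    return errors


if __name__ == "__main__":
    bad = {"dis_variation": ["WBVar00242728"], "dis_modgene": ["WBGene00004310"]}
    print("\n".join(check_row(397, bad)))
```

Writing the returned lines to a file named disease_model_annotation_errors_<date> would match the output described above.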
Model for Disease_model_annotation data
?Disease_model_annotation Disease_term UNIQUE ?DO_term XREF Disease_model_annotation
                          Disease_of_species ?Species
                          Modeled_by Strain UNIQUE ?Strain XREF Disease_model_annotation ?Text // genetic entity that models the disease
                                     Variation UNIQUE ?Variation XREF Disease_model_annotation ?Text // genetic entity that models the disease
                                     Transgene UNIQUE ?Transgene XREF Disease_model_annotation ?Text // genetic entity that models the disease
                                     Disease_relevant_gene UNIQUE ?Gene XREF Disease_model_annotation ?Text // when the genetic entity is a gene
                                     Modeled_by_remark UNIQUE ?Text
                          Inferred_gene ?Gene XREF Disease_model_annotation // to indicate the associated gene
                          Association_type UNIQUE is_model_of // All 5 tags describe the relationship between the genetic entity and the disease
                                                  is_implicated_in // new for WS262
                                                  is_marker_for
                          Evidence_code ?GO_code // will use ECO later on
                          Qualifier_not // new for WS262, to indicate that a disease is NOT modeled by X
                          Experimental_condition Inducing_chemical ?Molecule XREF Disease_model_inducer // to indicate the disease-inducing agent
                                                 Inducing_agent ?Text // e.g. diet, radiation, etc. not in Molecule class
                          Modifier_info Modifier_transgene ?Transgene // genetic entity that modifies the disease
                                        Modifier_variation ?Variation // (same as above)
                                        Modifier_strain ?Strain // (same as above)
                                        Modifier_gene ?Gene // to indicate the gene in the modifying Transgene, Variation, Strain.
                                        Modifier_molecule ?Molecule XREF Disease_model_modifier // to indicate chemical modifiers of the disease
                                        Other_modifier ?Text // to indicate other modifiers of the disease e.g. diet, radiation, surgery etc.
                                        Modifier_association_type UNIQUE condition_ameliorated_by // to indicate the association type between modifiers and disease
                                                                         condition_exacerbated_by
                                                                         condition_not_ameliorated_by // request for WS265 or WS266
                                                                         condition_not_exacerbated_by // request for WS265 or WS266
                                        Modifier_qualifier_not // new for WS262
                          Genetic_sex UNIQUE hermaphrodite // indicates genetic sex of the disease model
                                             male
                                             female
                          Disease_phenotype_info Disease_phenotype ?Phenotype // Phenotypes similar to human disease
                                                 Ameliorated_phenotype ?Phenotype // Phenotypes ameliorated by modifier
                                                 Exacerbated_phenotype ?Phenotype // Phenotypes exacerbated by modifier
                                                 Phenotype_comment ?Text // To describe non-WPO phenotypes
                          Paper_evidence ?Paper
                          Disease_model_description ?Text
                          DB_info Database ?Database ?Database_field ?Text // To indicate the OMIM gene/disease
                          Curator_confirmed ?Curator
                          Date_last_updated UNIQUE DateType
Sample .ace file
This is pgid 397 in postgres:

Disease_model_annotation : "00000001"
Disease_term "DOID:3429"
Disease_of_species "Homo sapiens"
Strain "ANM30"
Association_type is_model_of
Evidence_code "IMP"
Genetic_sex hermaphrodite
Disease_phenotype "WBPhenotype:0001408"
Paper_evidence "WBPaper00039877"
Database "OMIM" "disease" "147421"
Curator_confirmed "WBPerson324"
Date_last_updated "2017-03-15"

This is fake data with examples of Qualifier_not and Modifier_qualifier_not (the tags went in for WS262; maybe one real data example for WS262):

Disease_model_annotation : "00000002"
Disease_term "DOID:14330"
Disease_of_species "Homo sapiens"
Strain "CL4176"
Interacting_variation "WBVar00000001"
Interacting_transgene "WBTransgene00000034"
Interacting_gene "WBGene00000045"
RNAi_experiment "WBRNAi00000171"
Association_type is_model_of
Evidence_code "IMP"
Qualifier_not
Inducing_chemical "WBMol:00002044"
Modifier_gene "WBGene00010882"
Modifier_association_type condition_ameliorated_by
Modifier_qualifier_not
Genetic_sex hermaphrodite
Disease_phenotype "WBPhenotype:0001935"
Disease_phenotype "WBPhenotype:0002426"
Ameliorated_phenotype "WBPhenotype:0001935"
Paper_evidence "WBPaper00031384"
Disease_model_description "In an elegans model of Parkinson's disease, where human alpha-synuclein was overexpressed, RNA interference studies showed that the elegans atg-7/ATG7 (ubiquitin-activating E1 enzyme-like protein), significantly protected against age and dose-dependent degeneration in the dopamine neurons of transgenic worms."
Database "OMIM" "gene" "608309"
Database "OMIM" "disease" "605909"
Curator_confirmed "WBPerson324"
Date_last_updated "2017-03-13"

Disease_model_annotation : "00000003"
Disease_term "DOID:10652"
Disease_of_species "Homo sapiens"
Strain "CL4176"
Association_type is_model_of
Evidence_code "IMP"
Genetic_sex hermaphrodite
Modifier_molecule "WBMol:00004468"
Modifier_association_type condition_ameliorated_by
Ameliorated_phenotype "WBPhenotype:0001935"
Paper_evidence "WBPaper00028904"
Disease_model_description "A transgenic model of Alzheimer's disease in C. elegans where the expression of human beta amyloid protein (Abeta) in muscle cells causes Abeta aggregation induced paralysis; treatment with Ginkgo biloba extract ginkgolide J alleviates Abeta induced paralysis and inhibits Abeta oligomerization."
Database "OMIM" "gene" "608309"
Database "OMIM" "disease" "605909"
Curator_confirmed "WBPerson324"
Date_last_updated "2017-03-13"
Dumper Specifications, Part 2
- All data comes from the Disease OA
- For every DO_term in the Disease Name field of the OA, take data from the Variation, Strain, Transgene, Inducing Chemical and Modifier Molecule fields, if present, and dump it under the 'Attribute_of' tag of the ?DO_term class, according to the following acedb model:
?DO_term Attribute_of Disease_model_variation ?Variation XREF Models_disease // To associate Variations to a disease
                      Disease_model_strain ?Strain XREF Models_disease // To associate Strains to a disease
                      Disease_model_transgene ?Transgene XREF Models_disease // To associate Transgenes to a disease
                      Chemical_inducer ?Molecule XREF Induces_disease // To associate inducing chemicals to a disease
                      Molecule_modifier ?Molecule XREF Modifies_disease // To associate modifying chemicals to a disease
No. | acedb tag | OA Field | Postgres table | Example .ace syntax |
---|---|---|---|---|
1 | Disease_model_variation | Variation | dis_variation | "WBVar00242728" |
2 | Disease_model_strain | Strain | dis_strain | "ANM30" |
3 | Disease_model_transgene | Transgene | dis_transgene | "WBTransgene00006901" |
4 | Chemical_inducer | Inducing Chemical | dis_inducingchemical | "WBMol:00002044" |
5 | Molecule_modifier | Modifier Molecule | dis_modmolecule | "WBMol:00004468" |
Sample .ace file for ?DO_term class
DO_term : "DOID:14330"
Disease_model_variation "WBVar00242728"
Disease_model_strain "CL4176"
Disease_model_transgene "WBTransgene00006901"
Chemical_inducer "WBMol:00002044"
Molecule_modifier "WBMol:00002044"
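The Part 2 grouping, which collects the genetic entities and molecules for each DO term, might look like the following Python sketch. It is illustrative only: the input format (one dict per pgid, table name to list of values) and the function name `do_term_ace` are assumptions, and dropping repeated lines per DO term is a choice made here rather than something the specification states.

```python
from collections import defaultdict

# Postgres table -> tag under Attribute_of in the ?DO_term model.
FIELD_TO_TAG = {
    "dis_variation": "Disease_model_variation",
    "dis_strain": "Disease_model_strain",
    "dis_transgene": "Disease_model_transgene",
    "dis_inducingchemical": "Chemical_inducer",
    "dis_modmolecule": "Molecule_modifier",
}


def do_term_ace(rows):
    """Group values by DO term and return .ace text, one object per DO term."""
    grouped = defaultdict(list)
    for row in rows:
        for doid in row.get("dis_humandoid", []):
            for table, tag in FIELD_TO_TAG.items():
                for value in row.get(table, []):
                    line = f'{tag} "{value}"'
                    # Keep first-seen order and skip repeats for the same DO term.
                    if line not in grouped[doid]:
                        grouped[doid].append(line)
    blocks = [
        "\n".join([f'DO_term : "{doid}"'] + grouped[doid])
        for doid in sorted(grouped)
    ]
    return "\n\n".join(blocks) + ("\n" if blocks else "")


if __name__ == "__main__":
    rows = [
        {"dis_humandoid": ["DOID:14330"], "dis_strain": ["CL4176"],
         "dis_inducingchemical": ["WBMol:00002044"]},
        {"dis_humandoid": ["DOID:14330"], "dis_strain": ["CL4176"],
         "dis_modmolecule": ["WBMol:00004468"]},
    ]
    print(do_term_ace(rows))
```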
Ontology Annotator for Disease Term
Dumping data for citace upload
Scripts
--All scripts are under: acedb@caltech-curation.textpressolab.com
- for genotypes:/usr/caltech_curation_files/ranjana/genotype
- for disease: /usr/caltech_curation_files/ranjana/human_disease
- gene descriptions files (generated by Valerio): /usr/caltech_curation_files/pub/gene_descriptions
- for the disease ontology obo file, run wget http://purl.obolibrary.org/obo/doid.obo directly in /usr/caltech_curation_files/Data_for_Ontology
--A symlink to the script has been created: ln -s /home/postgres/work/citace_upload/dis_disease/use_package.pl
--The disease ontology file for the OA is updated by a cron job that runs at 8pm every day.
(Script: 0 20 * * * /home/postgres/work/pgpopulation/obo_oa_ontologies/update_obo_oa_ontologies.pl)
Upload location
All ace files (total of 7 files) are copied for upload to (on the same machine): /usr/caltech_curation_files/Data_for_citace/Data_from_Ranjana
- 4 disease related files
- 1 genotype file
- 2 gene descriptions file
1 disease ontology.obo file to: /usr/caltech_curation_files/Data_for_Ontology, use wget http://purl.obolibrary.org/obo/doid.obo to download directly into this directory
Old SOP for upload
1. Ontology files:
- Run parseHuman.pl:
- (Aug 2018) Switched the source of the ontology to: http://purl.obolibrary.org/obo/doid.obo which currently points to-
https://raw.githubusercontent.com/DiseaseOntology/HumanDiseaseOntology/master/src/ontology/doid.obo (Aug 2018) to avoid using the single-inheritance hierarchy for is_a relationships and to use the poly hierarchy.
- Uses the following source: https://raw.githubusercontent.com/DiseaseOntology/HumanDiseaseOntology/master/src/ontology/doid-non-classified.obo (Feb. 24th 2017)
- Dumps the HumanDO.ace file; Change name to HumanDO_WSXXX.ace; test file in acedb database
- Download the HumanDO obo file from the above link, always check that this is a text file, as sometimes just the link can be downloaded;
2. Gene-disease annotation files
- Run use_package.pl:
- Note changes (Sept 2018): On Tazendra: if it doesn't have the association type, we just skip that whole pgid, so it doesn't get dumped at all. In the future, when we get rid of this, it will dump things and let you know in the error messages that they need fixing.
- On the sandbox, we've removed the above, so dumps all the lines that have the new errors and lets you know in the error messages that they need fixing. Will be like this on Tazendra, possibly for WS269.
- Dumps disease data from the disease OA, into disease_<date>.ace and disease_annotation_<date>.ace
- script also checks whether all DOIDs in postgres are valid, outputs invalid DOIDs to err.out.<date> file. Note that invalid DOIDs cannot be seen in the OA, identify by PGID and then add the valid DOID to annotation, as the invalid one will not show.
- scp both files to citpub@spica, under Data_for_citace/Data_from_Ranjana/, add WSXXX to both files; test both files in acedb database
3. Testing files in cminus on spica
- Transfer file/s to spica, under /home/citpub/Data_for_citace/Data_from_Ranjana
- On local computer, ssh -Y citpub@spica.caltech.edu, call the database by using 'cminus'; read files and check for errors
- Transferring multiple remote files to local machine: scp your_username@remote.edu:/some/remote/directory/\{a,b,c\} ./
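Step 1 of this SOP says to always check that the downloaded HumanDO obo file is really a text file. A quick check could be sketched in Python as below. It is a heuristic under stated assumptions: OBO files normally open with a `format-version:` header line, and an HTML page saved by mistake would start with a doctype or `<html` tag. The function names are hypothetical.

```python
def looks_like_obo_text(head: str) -> bool:
    """Heuristic check on the first characters of a downloaded file."""
    start = head.lstrip()
    # An HTML page (e.g. a saved link or error page) is not an ontology file.
    if start.lower().startswith(("<!doctype", "<html")):
        return False
    # OBO flat files normally begin with a 'format-version:' header tag.
    return start.startswith("format-version:")


def looks_like_obo_file(path: str) -> bool:
    """Read the start of the file as UTF-8 text and apply the heuristic."""
    try:
        with open(path, encoding="utf-8") as handle:
            head = handle.read(4096)
    except UnicodeDecodeError:
        # Binary or otherwise undecodable content is certainly not the obo file.
        return False
    return looks_like_obo_text(head)


if __name__ == "__main__":
    import sys
    for name in sys.argv[1:]:
        print(name, "OK" if looks_like_obo_file(name) else "NOT an obo text file")
```

Running it with the path to the downloaded doid.obo before parseHuman.pl would catch the case where only the link was saved.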
Changes by release
- for the WS251 release: use_package.pl script reports that WBGene00004724 is dead and merged into WBGene00013742, need to query out by using pgid 347, which is sas-1 and then move data to the right gene
Changes to OA
March 2023
Checking script changes (check_disease_annotation.pl)
- Remove check for inferred gene (dis_inferredgene), as this is now optional
- Add "genotype" (dis_genotype?) to the check that verifies the presence of wbgene or variation or strain (subject of the annotation), meaning it has to check for the absence of all 4 before writing an error message
Aug 2018
- Remove the following values from the Modifier Association type field in the OA drop-down (tab 2)
- Does_not_exacerbate
- Does_not_ameliorate
- No_effect
- Toxic
- Not_toxic
May 2013
- Database for Exp Mod changes to 'OMIM disease for Exp Mod', data can be entered as IDs without the 'OMIM:' as prefix, multiple values comma-separated.
- 'Database for Disease Rel' changes to 'OMIM disease for Disease Rel', multiple values are comma-separated, data be entered as IDs without the 'OMIM:' prefix.
- Extra free-text field called 'OMIM gene for Disease Rel' added, data can be entered as IDs without the 'OMIM:' prefix, multiple values comma-separated.
- When data is present in either the 'OMIM disease for Disease Rel' or 'OMIM gene for Disease Rel' fields, script dumps the following line in .ace for each entry as:
Database "OMIM" "disease" "456789"
Database "OMIM" "gene" "456789"
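The handling of the comma-separated OMIM fields can be illustrated with a short Python sketch. This is not the dumper's code: the function name `omim_lines` is hypothetical, and skipping repeated IDs is an assumption based on the "each unique OMIM ID" wording in the dumper notes further down this page. It accepts IDs with or without the 'OMIM:' prefix.

```python
def omim_lines(raw: str, kind: str) -> list:
    """Turn a comma-separated OMIM field into Database lines.

    `kind` is "disease" or "gene"; IDs may carry an optional 'OMIM:' prefix.
    """
    if kind not in ("disease", "gene"):
        raise ValueError("kind must be 'disease' or 'gene'")
    ids = []
    for token in raw.split(","):
        token = token.strip()
        if token.upper().startswith("OMIM:"):
            token = token[len("OMIM:"):].strip()
        # Skip empty entries and IDs already seen, keeping first-seen order.
        if token and token not in ids:
            ids.append(token)
    return [f'Database "OMIM" "{kind}" "{omim_id}"' for omim_id in ids]


if __name__ == "__main__":
    for line in omim_lines("OMIM:176670, 613205,176670", "disease"):
        print(line)
```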
Aug 2022 (for WS287)
- Added Asserted gene and Asserted Variation fields to OA, copied everything from Inferred gene to Asserted gene field
- Need to do before uploading for WS287: Take out Asserted gene values if only Disease relevant gene are present without a variation
- Inferred gene deleted from UI, will add if needed in the future as this is meant to be used for rollup rules only (by script)
Changes to gene-disease dumper
Aug 2018
- See example annotation of PGID 456, that has both 'condition_ameliorated_by' and 'Modifier_qualifier_not', ace file needs to dump as above in example (done)
Sept 2014: moving OMIM Ids to Accession_evidence
- Reenable part of script that dumps OMIM ids under the 'Database' tag
- Start dumping the 'Accession_evidence' tag:
- for the Experimental_model tag, look at the Ids either entered as 'OMIM:XXXXX', or just 'XXXXX' in the 'OMIM disease for Exp Mod (dis_dbexpmod)'
- For the Disease_relevance tag, look at the OMIM Ids either as 'OMIM:XXXXX' or just 'XXXXX' in 'OMIM disease for Disease Rel (dis_dbdisrel)' and 'OMIM gene for Disease Relevance (gene_disrel)'
- For each unique OMIM ID the .ace syntax for the gene would be:
Gene : "WBGene00003052"
Database "OMIM" "disease" "115200"
Database "OMIM" "disease" "151660"
Database "OMIM" "disease" "159001"
Database "OMIM" "disease" "176670"
Database "OMIM" "disease" "181350"
Database "OMIM" "disease" "212112"
Database "OMIM" "disease" "248370"
Database "OMIM" "disease" "275210"
Database "OMIM" "disease" "605588"
Database "OMIM" "disease" "610140"
Database "OMIM" "disease" "613205"
Experimental_model "DOID:3911" "Homo sapiens" Accession_evidence "OMIM" "176670"
Experimental_model "DOID:0050557" "Homo sapiens" Accession_evidence "OMIM" "613205"
Experimental_model "DOID:11726" "Homo sapiens" Accession_evidence "OMIM" "181350"
Disease_relevance "Mutations in human lamin, LMNA, are found in several diseases referred to as the laminopathic diseases, which include Emery-Dreifuss muscular dystrophy (EDMD), LMNA-related congenital muscular dystrophy (L-CMD), limb-girdle muscular dystrophy (L-CMD), Hutchison-Gilford progeria syndrome (HGPS), dilated cardiomyopathy (DCM), Charcot-Marie-Tooth disorder and atypical Werner syndrome; elegans B-type lamin, lmn-1, performs both A and B-type vertebrate lamin functions; similar to A-type lamins, it has roles in development, organization of nuclear pore complexes, and interacts with lamina and nuclear components; similar to B-type lamins, it is expressed widely throughout development, except for sperm, and interacts with B-type lamin-binding proteins; much of the knowledge of the organization and assembly of the nuclear lamina has come from studies in elegans; disease-causing mutations in human LMNA when introduced into elegans lmn-1/lamin alter nuclear lamina organization and dynamics, leading to phenotypes such as decreased fertility and muscle lesions; a mutation found in Hutchison-Gilford progeria syndrome disrupts the supramolecular structure of the lamin filaments in elegans; LMNA mutations that are found in EDMD, DCM and HGPS, when introduced into elegans lmn-1/lamin cause disruption in lamin filament assembly and nuclear localization; also, work in elegans has revealed that lamins are involved in the normal aging process, as worms mutant for lamin age faster." "Homo sapiens" Accession_evidence "OMIM" "115200"

(The Disease_relevance line will be repeated for the rest of the 10 OMIM Ids in 'OMIM disease for Disease Rel (dis_dbdisrel)'; there are no genes in 'OMIM gene for Disease Relevance (gene_disrel)'.)
Old way of dumping OMIM IDs for genes:
Gene : "WBGene00003052"
Database "OMIM" "disease" "176670"
Database "OMIM" "disease" "613205"
Database "OMIM" "disease" "181350"
History
- Need to tell the EBI team that from the WS239 upload (mid-July) we will be dumping Date_last_updated and Curator_confirmed data into citace and they should pick up.
- Disease ontology file location has changed, need to alert JC to change the locations for OA and scripts (done, 08.08.2013):
DO group lists two locations: Sourceforge: http://sourceforge.net/p/diseaseontology/code/2599/tree/trunk/
OBO Foundry: http://www.berkeleybop.org/ontologies/doid.obo (will use this source)
Data checking scripts
- err.out.<date> and err.annotation.out<date> files are ONLY generated when the data is dumped, meaning run use_package.pl first to see the errors; these are current as of 03.06.2023 (to Alliance standards as well).
- err.out.<date> checks for only invalid papers and invalid DO IDs.
- check_disease_annotation.pl checks an .ace file given as input. This looks like an old script that needs updating to match the scripts above; the difference is that it can take an .ace file as input. It is probably not needed going forward, once data is moved to the Alliance. As of 03.07.2023 it gives errors that are not valid.
Change made Nov 5th, 2019, for the disease report: the script prints PGIDs with no DOID, but does not print other missing fields. 73 PGIDs have no DO term; need to check whether these are valid or whether they will never have a DOID (because of the biology, such as the bah genes).
check the following
- Curator - required
- Disease Name - required
- Disease of Species - required
- One of the following is required- Disease Relevant Gene, Variation, Strain or Transgene
- If Variation or Transgene is present, Disease Relevant Gene is required (Remove check, won't work for human transgenes)
- If Disease Relevant Gene or Variation is present, Inferred Gene is required (keep this check)
- Association Type - required
- Evidence Code - required
- Reference - at least one reference is required
- Date Last Updated - required
- If any one of these is present: Modifier Transgene, Modifier Variation, Modifier Strain, Modifier Gene, Modifier Molecule or Other Modifier, then Modifier Association Type is required.
Dump only those annotations into the disease_annotation.ace file when 2, 4, 6,7,8,9,10 and 11 (from above) are satisfied.
Old pipeline