OA and scripts for disease data
Revision as of 19:58, 21 February 2017
Contents
- 1 Ontology Annotator for disease data
- 2 Ontology Annotator for Disease Term
- 3 Dumping data for citace upload
- 4 Changes required/issues by release
- 5 Changes to OA May 2013
- 6 Changes to gene-disease dumper, Sept 2014: moving OMIM Ids to Accession_evidence
- 7 To do
- 8 Disease Model Annotation OA (Feb 2017)
- 9 Changes to expand the Disease-Gene OA
- 10 Changes to expand the Disease Term OA
Ontology Annotator for disease data
Note: All disease_relevant descriptions have been removed from the concise descriptions OA and moved to the disease OA.
Fields
One gene can be attached to more than one Experimental_Model and one Disease_Relevance (and their related papers, databases and species); they will be grouped together in one instance of the Editor and in one line in the data-table. This is similar to a gene being attached to more than one GO term. If a gene needs to be attached to an unrelated disease, enter all data on a new line by hitting 'New' in the OA.
Editor: Tab 1:
Field 1 WBGene (dis_wbgene):
Behavior of field: Autocomplete obo
Source: WBGene obo
Similar to: WBGene in the GO OA or concise descrips OA
As one starts typing a locus name (e.g. lin-10) or a cosmid name (e.g. C09H6), the script autocompletes and fills in the WBGene ID.
Field 2 Curator (dis_curator):
Behavior of field: Auto-complete drop-down with ready values
Similar to: Curator field in GO OA
Field 3 Curator History (dis_curhistory):
Behavior of field: Same as in the Concise OA; this is not something that can be changed manually.
Similar to: Concise OA
Field 4 Experimental model for (dis_humandoid):
Behavior:Autocomplete obo
Obo file to be used: DO_term obo
Source: https://diseaseontology.svn.sourceforge.net/svnroot/diseaseontology/trunk/HumanDO.obo
Similar to: GO term field in the GO OA.
For example, the curator starts typing 'Alz', picks 'Alzheimer's disease' from the drop-down and the script populates the field with 'Alzheimer's disease (DOID:10652)'; similar to the GO term field in the GO OA.
This is a multi value ontology field, but in general use only one disease/row signifying one experiment.
Q: Updating: How do we update this obo file, and how frequently do other obo files get updated?
A: Every day at 8pm. If the file has the proper .obo format it should be easy to add to the cronjob that picks them up:
/home/postgres/work/pgpopulation/obo_oa_ontologies/update_obo_oa_ontologies.pl
Field 5 Variation (dis_variation):
Autocomplete dropdown. Enter only one variation per row/experiment; right now there is no way to represent a double mutant, i.e. no such object exists. Do not enter a transgene as well.
Field 6 Disease Phenotype (dis_phenotypedisease):
Multi-ontology field, several disease phenotypes can be entered for a single variation OR a single transgene, but not both.
Field 7 Transgene (dis_transgene):
Autocomplete drop-down, allows only one value. If a transgene is entered, do not enter a variation, and vice versa.
Field 8 Paper for Exp Mod (dis_paperexpmod):
Behavior: Autocomplete obo
Obo file to be used: WBPaper obo
Similar to: The Paper field in the GO OA
This field is multi-value.
Field 9 OMIM disease for Exp Mod (dis_dbexpmod), formerly 'Database for Exp Mod':
Behavior: Free text, multiple values comma-separated
Q: Will they dump in separate lines in the output ? Usually those are pipe-separated.
If they'll dump literally as pasted in, then commas are good.
A: Per latest conversation, using commas is fine, as long as there never will be a comma in the data itself, which is not likely to happen as these are OMIM IDs
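A minimal Perl sketch of how such a comma-separated field could be split, assuming each entry is an OMIM ID written as 'OMIM:173900' (a bare number is accepted too). The helper name parse_omim_ids is hypothetical and is not part of the OA code.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the OA code): split a free-text OMIM
# field on commas and return the numeric IDs in order of first appearance.
sub parse_omim_ids {
    my ($text) = @_;
    my (@ids, %seen);
    foreach my $value (split /,/, $text // '') {
        $value =~ s/^\s+|\s+$//g;                     # trim surrounding whitespace
        next unless $value =~ /^(?:OMIM:)?(\d+)$/;    # skip anything that is not an ID
        push @ids, $1 unless $seen{$1}++;             # keep each ID only once
    }
    return @ids;
}

# prints 173900|601313
print join('|', parse_omim_ids('OMIM:173900, OMIM:601313,OMIM:173900')), "\n";
```

Splitting on commas is only safe because, as noted above, the data itself never contains a comma.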
Field 10 Species(dis_species) :
Behavior: Auto-complete drop-down with ready values
Similar to: Project field in the GO OA
Current values: Homo sapiens
Field 11 Last Updated for Exp Model (dis_lastupdateexpmod):
Script autopopulates the date when the data is a new line, i.e. when the "New" button is used.
Field 12 Disease relevance (dis_diseaserelevance)
Behavior: Big Text box (big text-box, keeps expanding)
Similar To: 'Description Text' field in the Concise OA.
It is the Human_disease_relevance description, which appears as one of the drop-down values for the 'Description Type' field in the Concise OA.
Field 13 Paper for Disease Rel(dis_paperdisrel)
Behavior: Autocomplete obo
Obo file to be used: WBPaper obo
Similar to: The Paper field in the GO OA
Q: So there are two paper fields. Are they both required, must there be at least one, or is nothing required?
A: Both are required.
Q: Single or multi value?
A: Multi-value.
Field 14 OMIM disease for Disease Rel (dis_dbdisrel):
Behavior: Free text, multiple values comma-separated
Q: Same as xref Database, but a different field?
A: Exactly, again I will pipe-separate multiple values.
Field 15 OMIM gene for Disease Rel (dis_genedisrel):
Behavior: Free text, multiple values comma-separated
Field 16 Last Updated for Disease Rel (dis_lastupdatedisrel):
Behavior: Script fills in the current date when the annotation is a new data line; when changed manually, the date is entered as YYYY-MM-DD.
Field 17 Comment (dis_comment):
Behavior: Free text
Field 18 pgid
Tab 2 of Editor
Field 19 Molecule Type (dis_moleculetype):
Autocomplete with fixed values--Therapeutic_molecule, Toxic_molecule, Exacerbating_molecule
Choose only one 'type' per row/per pgid/per disease/per variation/per transgene/per phenotype
Field 20 Molecule (dis_molecule):
Autocomplete drop-down from 'Molecule' class in WormBase
Field 21 Affected Phenotype (dis_phenotypeaffected):
Autocomplete single ontology field, from WormBase Phenotype ontology.
Data constraints
For curators only: checked at the tool level, to verify that the required fields are filled.
Required dis_ tables: wbgene curator humandoid paperexpmod species diseaserelevance paperdisrel lastupdatedisrel
WBGene
Curator
Experimental model for
Paper for Exp Mod
Species
Disease relevance
Paper for Disease Rel
Last Updated
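The required-field check could be sketched as below. This is an illustration only, assuming one OA row is available as a hash of table name (without the dis_ prefix) to value; missing_required and the sample row are hypothetical.

```perl
use strict;
use warnings;

# Required dis_ tables, as listed above (without the dis_ prefix).
my @REQUIRED = qw( wbgene curator humandoid paperexpmod species
                   diseaserelevance paperdisrel lastupdatedisrel );

# Hypothetical check: given one row as a hash of table name => value,
# return the required tables that are missing or blank.
sub missing_required {
    my ($row) = @_;
    return grep { !defined $row->{$_} || $row->{$_} !~ /\S/ } @REQUIRED;
}

my %row = (
    wbgene    => 'WBGene00003058',
    curator   => 'Ranjana',
    humandoid => 'DOID:5937',
    species   => 'Homo sapiens',
);
# prints Missing: paperexpmod, diseaserelevance, paperdisrel, lastupdatedisrel
print "Missing: ", join(', ', missing_required(\%row)), "\n";
```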
To make live:
at : /home/postgres/work/pgpopulation/dis_disease/
create_dis_tables.pl -- create new postgres tables for dis_ disease OA
synchronize OA
transfer_concise_disease.pl -- take 95 entries that have con_desctype = 'Human_disease_relevance' and add them to dis_ tables starting with pgid 1.
Ranjana, manually delete the Human_disease_relevance entries from the concise OA.
remove the Human_disease_relevance option from the OA, resynchronize.
Dumper specifications
Dumper module in sandbox at /home/postgres/work/citace_upload/dis_disease/get_dis_disease_ace.pm
Copy /home/postgres/work/citace_upload/dis_disease/use_package.pl to a directory you own and run it there.
Mapping between OA fields and acedb tags
Model:
?Gene DB_info Database ?Database ?Database_field Text
      Disease_info Experimental_model ?DO_term XREF Gene_by_biology ?Species #Evidence
                   Potential_model ?DO_term XREF Gene_by_orthology ?Species #Evidence
                   Disease_relevance ?Text ?Species #Evidence
We do not fill in Potential_model tag, Sanger does.
The example is lov-1 in the disease OA in the sandbox:
Model tag: ?Gene
Use value: WBGene (take ID only)
Eg: WBGene00003058
Model tag: DB_info Database ?Database ?Database_field Text
Use value(s) in 'xref Database' and in 'OMIM database'
Eg: OMIM:173900 and OMIM:601313; do not take OMIM:173900 again from 'OMIM database' since it is a duplicate of that in 'xref Database'.
.ace: Database "OMIM" "disease" "173900"
Repeat the line for each value if there are multiple values.
Model tag: Experimental_model ?DO_term XREF Gene_by_biology ?Species #Evidence
Use value in 'Experimental Model for'
Eg:autosomal dominant polycystic kidney (DOID:5937); take ID only
Use value in 'Species' for ?Species
Eg: Homo sapiens
Use value(s) in 'Paper for Exp Mod' for #Evidence
Eg.WBPaper00038373
Repeat .ace line for every paper if multiple papers are present.
.ace:
Experimental_model DOID:5937 "Homo sapiens" Paper_evidence "WBPaper00038373"
Model tag: Disease_relevance ?Text ?Species #Evidence
Use value in 'Disease Relevance' for ?Text
Eg:lov-1 and pkd-2 encode the orthologs of human Polycystin-1 and Polycystin-2, which are mutated in autosomal dominant polycystic kidney disease; the polycystins regulate signaling involved in normal renal tubular structure and function; studies in the worm C. elegans have contributed extensively to the finding that cystic kidney diseases can be considered ciliopathies; in elegans lov-1 and pkd-2 are expressed in male ciliary neurons, are required for normal male mating behavior, do not seem to be required for ciliogenesis, and each polycystin may actually have a potential inhibitory function on the other for ciliary function; lov-1 and pkd-1 interact with a single-pass transmembrane protein, CWP-5, though the significance of this interaction for polycystic kidney disease is unknown.
Use value in 'Species' for ?Species
Eg. Homo sapiens
Use value in 'Paper for Disease Rel' for #Evidence
Eg: WBPaper00038373
.ace: Disease_relevance "lov-1 and pkd-2 encode the orthologs of human Polycystin-1 and Polycystin-2, which are mutated in autosomal dominant polycystic kidney disease; the polycystins regulate signaling involved in normal renal tubular structure and function; studies in the worm C. elegans have contributed extensively to the finding that cystic kidney diseases can be considered ciliopathies; in elegans lov-1 and pkd-2 are expressed in male ciliary neurons, are required for normal male mating behavior, do not seem to be required for ciliogenesis, and each polycystin may actually have a potential inhibitory function on the other for ciliary function; lov-1 and pkd-1 interact with a single-pass transmembrane protein, CWP-5, though the significance of this interaction for polycystic kidney disease is unknown." "Homo sapiens" Paper_evidence "WBPaper00038373" (Repeat this line for every paper, if multiple papers are present).
So put together, .ace file for lov-1 looks like:
Gene : "WBGene00003058"
Database "OMIM" "disease" "173900"
Database "OMIM" "disease" "601313"
Experimental_model DOID:5937 "Homo sapiens" Paper_evidence "WBPaper00038373"
Disease_relevance "lov-1 and pkd-2 encode the orthologs of human Polycystin-1 and Polycystin-2, which are mutated in autosomal dominant polycystic kidney disease; the polycystins regulate signaling involved in normal renal tubular structure and function; studies in the worm C. elegans have contributed extensively to the finding that cystic kidney diseases can be considered ciliopathies; in elegans lov-1 and pkd-2 are expressed in male ciliary neurons, are required for normal male mating behavior, do not seem to be required for ciliogenesis, and each polycystin may actually have a potential inhibitory function on the other for ciliary function; lov-1 and pkd-1 interact with a single-pass transmembrane protein, CWP-5, though the significance of this interaction for polycystic kidney disease is unknown." "Homo sapiens" Paper_evidence "WBPaper00038373"
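The assembly of such an object can be sketched in Perl as follows. This is not the dumper itself: gene_to_ace and the hash keys are hypothetical names, the disease relevance text is abbreviated, and the DOID is quoted the way the dumper module quotes it.

```perl
use strict;
use warnings;

# Hypothetical helper: assemble one Gene object in .ace syntax.
# The hash keys (id, omim, doid, species, text, papers) are illustrative names.
sub gene_to_ace {
    my ($g) = @_;
    my $ace = qq(Gene : "$g->{id}"\n);
    # one Database line per OMIM ID
    $ace .= qq(Database "OMIM" "disease" "$_"\n) for @{ $g->{omim} };
    # one Experimental_model line and one Disease_relevance line per paper
    $ace .= qq(Experimental_model "$g->{doid}" "$g->{species}" Paper_evidence "$_"\n)
        for @{ $g->{papers} };
    $ace .= qq(Disease_relevance "$g->{text}" "$g->{species}" Paper_evidence "$_"\n)
        for @{ $g->{papers} };
    return $ace;
}

print gene_to_ace({
    id      => 'WBGene00003058',
    omim    => [ '173900', '601313' ],
    doid    => 'DOID:5937',
    species => 'Homo sapiens',
    text    => 'lov-1 and pkd-2 encode the orthologs of human Polycystin-1 and Polycystin-2',  # abbreviated
    papers  => [ 'WBPaper00038373' ],
});
```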
When to dump data
If data is present in Field 4, Experimental model for (dis_humandoid), dump this field and the related fields:
Field 8: Paper for Exp Mod (dis_paperexpmod)
Field 9: OMIM disease for Exp Mod (dis_dbexpmod)
Field 10: Species (dis_species)
If data is present in Field 12, Disease relevance (dis_diseaserelevance), dump this field and the related fields:
Field 13: Paper for Disease Rel (dis_paperdisrel)
Field 14: OMIM disease for Disease Rel (dis_dbdisrel)
Field 10: Species (dis_species)
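The dump rule above can be written as a small decision function. This is a sketch only; it uses the table names from the dumper's @tables list (without the dis_ prefix), and tables_to_dump is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: given one row (table name => value), return the
# tables whose data should be dumped for that row.
sub tables_to_dump {
    my ($row) = @_;
    my %dump;
    if ($row->{humandoid}) {            # Experimental model and its related fields
        $dump{$_} = 1 for qw( humandoid paperexpmod dbexpmod species );
    }
    if ($row->{diseaserelevance}) {     # Disease relevance and its related fields
        $dump{$_} = 1 for qw( diseaserelevance paperdisrel dbdisrel species );
    }
    my @tables = sort keys %dump;
    return @tables;
}

# prints dbexpmod humandoid paperexpmod species
print join(' ', tables_to_dump({ humandoid => 'DOID:5937' })), "\n";
```

Species is shared by both groups, so it is dumped whenever either trigger field has data.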
Code annotation
For get_dis_disease_ace.pm
package get_dis_disease_ace;
require Exporter;

our @ISA     = qw(Exporter);
our @EXPORT  = qw( getDisease );
our $VERSION = 1.00;

# Dumper module to dump Ranjana's dis_ disease data.  2013 01 18

use strict;
use diagnostics;
use LWP;
use LWP::Simple;
use DBI;

# connect to the postgres database
my $dbh = DBI->connect ( "dbi:Pg:dbname=testdb", "", "") or die "Cannot connect to database!\n";
my $result;

my %theHash;            # all postgres data, table -> pgid -> value, to parse into .ace output
# list of postgres tables, as dis_wbgene, dis_humandoid; all begin with the prefix dis_
my @tables = qw( wbgene humandoid paperexpmod dbexpmod species diseaserelevance paperdisrel dbdisrel );

my $all_entry = '';     # the .ace output
my $err_text  = '';     # the error text

my %nameToIDs;          # type -> name -> ids -> count; maps WBGenes to pgids
my %ids;                # all the pgids that are relevant
my %deadObjects;        # all the dead objects

my %dataType;
$dataType{humandoid}   = 'multi';
$dataType{paperexpmod} = 'multi';
$dataType{paperdisrel} = 'multi';
$dataType{dbexpmod}    = 'comma';
$dataType{dbdisrel}    = 'comma';

1;

sub populateDeadObjects {       # store the invalid papers and dead genes in %deadObjects
  $result = $dbh->prepare( "SELECT * FROM pap_status WHERE pap_status = 'invalid';" );
  $result->execute();
  while (my @row = $result->fetchrow) { $deadObjects{paper}{invalid}{"WBPaper$row[0]"} = $row[1]; }
  $result = $dbh->prepare( "SELECT * FROM gin_dead;" );
  $result->execute();
  while (my @row = $result->fetchrow) {         # Ranjana doesn't care about hierarchy, just show her an error message
    if ($row[1]) { $deadObjects{gene}{"WBGene$row[0]"} = $row[1]; } }
} # sub populateDeadObjects

sub getValidPapers {            # match WBPaper\d+ in a paper table; invalid papers go to the error text
  my ($table, $pgid) = @_;
  my @papers;
  unless ($theHash{$table}{$pgid}) { return @papers; }
  my (@all_papers) = $theHash{$table}{$pgid} =~ m/(WBPaper\d+)/g;
  foreach my $paper (@all_papers) {
    if ($deadObjects{paper}{invalid}{$paper}) { $err_text .= "pgid $pgid has invalid paper $paper\n"; }
      else { push @papers, $paper; } }
  return @papers;
} # sub getValidPapers

sub getDisease {
  my ($flag) = shift;           # 'all', or a specific WBGene ID
  &populateDeadObjects();       # without this call %deadObjects stays empty and nothing is flagged
  if ( $flag eq 'all' ) { $result = $dbh->prepare( "SELECT * FROM dis_wbgene; " ); }                    # get all entries for all WBGenes
    else { $result = $dbh->prepare( "SELECT * FROM dis_wbgene WHERE dis_wbgene = '$flag';" ); }       # get entries for the WBGene named in $flag
  $result->execute();
  while (my @row = $result->fetchrow) {
    if ($deadObjects{gene}{$row[1]}) {          # dead wbgenes go to the error text
        $err_text .= "pgid $row[0] has $row[1] which is $deadObjects{gene}{$row[1]}\n"; }
      else {                                    # non-dead genes are added to the hashes
        $theHash{object}{$row[0]} = $row[1];
        $nameToIDs{object}{$row[1]}{$row[0]}++;
        $ids{$row[0]}++; } }

  # for a specific query, restrict all the tables listed above to the relevant pgids
  # by adding the qualifier WHERE joinkey IN ('$ids') to the postgres query
  my $ids = ''; my $qualifier = '';
  if ($flag ne 'all') { $ids = join"','", sort keys %ids; $qualifier = "WHERE joinkey IN ('$ids')"; }
  foreach my $table (@tables) {
    $result = $dbh->prepare( "SELECT * FROM dis_$table $qualifier;" );  # get data for table with qualifier (or not if not)
    $result->execute();
    # query results are stored in %theHash, e.g. $theHash{humandoid}{1} = 'DO:1234'
    while (my @row = $result->fetchrow) { $theHash{$table}{$row[0]} = $row[1]; }
  } # foreach my $table (@tables)

  foreach my $objName (sort keys %{ $nameToIDs{object} }) {
    my $entry = ''; my $has_data;
    $entry .= "\nGene : \"$objName\"\n";        # will dump empty gene objects, if no data present
    foreach my $pgid (sort {$a<=>$b} keys %{ $nameToIDs{object}{$objName} }) {
      my $species = ''; if ($theHash{species}{$pgid}) { $species = $theHash{species}{$pgid}; }
      my %omim = ();                            # filter OMIM results so no duplicates
      if ($theHash{humandoid}{$pgid}) {
        my (@doids) = $theHash{humandoid}{$pgid} =~ m/(DOID:\d+)/g;
        my @papers = &getValidPapers('paperexpmod', $pgid);
        foreach my $doid (@doids) {
          if (scalar @papers > 0) {
              foreach my $paper (@papers) { $entry .= qq(Experimental_model\t"$doid"\t"$species"\tPaper_evidence\t"$paper"\n); } }
            else { $entry .= qq(Experimental_model\t"$doid"\t"$species"\n); } }
        if ($theHash{dbexpmod}{$pgid}) {        # capture only the number, not the OMIM: prefix
          my (@om) = $theHash{dbexpmod}{$pgid} =~ m/OMIM:(\d+)/g; foreach (@om) { $omim{$_}++; } }
      }
      if ($theHash{diseaserelevance}{$pgid}) {
        my $disrel = $theHash{diseaserelevance}{$pgid};
        if ($disrel =~ m/\'/) { $disrel =~ s/\'/''/g; }         # quote conversion for acedb
        if ($disrel =~ m/\n/) { $disrel =~ s/\n/ /g; }          # converts line breaks into spaces
        my @papers = &getValidPapers('paperdisrel', $pgid);
        if (scalar @papers > 0) {
            foreach my $paper (@papers) { $entry .= qq(Disease_relevance\t"$disrel"\t"$species"\tPaper_evidence\t"$paper"\n); } }
          else { $entry .= qq(Disease_relevance\t"$disrel"\t"$species"\n); }
        if ($theHash{dbdisrel}{$pgid}) {
          my (@om) = $theHash{dbdisrel}{$pgid} =~ m/OMIM:(\d+)/g; foreach (@om) { $omim{$_}++; } }
      }
      foreach my $omim (sort keys %omim) { $entry .= qq(Database\t"OMIM"\t"disease"\t"$omim"\n); }     # print all the unique OMIM IDs
      if ($entry) { $has_data++; }
    } # foreach my $pgid (sort {$a<=>$b} keys %{ $nameToIDs{object}{$objName} })
    if ($has_data) { $all_entry .= $entry; }
  } # foreach my $objName (sort keys %{ $nameToIDs{object} })
  return( $all_entry, $err_text );
} # sub getDisease

__END__

# unused helper, kept after __END__ so it is never compiled
sub getData {
  my ($cur_entry, $table, $joinkey, $tag, $objName, $goodGenes_ref) = @_;
  if ($theHash{$table}{$joinkey}) {
    my $data = $theHash{$table}{$joinkey};
    if ($data =~ m/^\"/) { $data =~ s/^\"//; }
    if ($data =~ m/\"$/) { $data =~ s/\"$//; }
    if ($data =~ m/ /) { $data =~ s/ //g; }
    if ($data =~ m/\n/) { $data =~ s/\n/ /g; }
    if ($data =~ m/^\s+/) { $data =~ s/^\s+//g; }
    if ($data =~ m/\s+$/) { $data =~ s/\s+$//g; }
    my @data;
    if ($data =~ m/\",\"/) { @data = split/\",\"/, $data; }
      elsif ($pipeSplit{$table}) { @data = split/ \| /, $data; }
      else { push @data, $data; }
    foreach my $value (@data) {
      if ($value =~ m/\"/) { $value =~ s/\"/\\\"/g; }
    } # foreach my $value (@data)
  }
  return $cur_entry;
} # sub getData
use_package.pl
#!/usr/bin/perl

# use the get_paper_ace.pm module from /home/postgres/work/citace_upload/papers/
# to dump the papers, abstracts (LongText objects), and errors associated with
# them.  2005 07 13
#
# Change to default get all papers, not just valid ones.  2005 11 10

use strict;
use Jex;

my $date = &getSimpleSecDate();
my $start_time = time;
my $estimate_time = time + 697;
my ($sec, $min, $hour, $mday, $mon, $year, $wday, $yday, $isdst) = localtime($estimate_time);   # get time
if ($sec < 10) { $sec = "0$sec"; }      # add a zero if needed
print "START $date -> Estimate $hour:$min:$sec\n";

$date = &getSimpleDate();

use lib qw( /home/postgres/work/citace_upload/dis_disease/ );
use get_dis_disease_ace;        # tells script where to get the perl module packages

my $outfile = 'disease_' . $date . '.ace';
my $errfile = 'err.out.' . $date;       # has two outputs, .ace file and error files, changed file name to

open (OUT, ">$outfile") or die "Cannot create $outfile : $!\n";
open (ERR, ">$errfile") or die "Cannot create $errfile : $!\n";

my ($all_entry, $err_text) = &getDisease('all');        # all, gets all objects, need to specify WBGene if only that needed

print OUT "$all_entry\n";
if ($err_text) { print ERR "$err_text"; }       # will print error file, if errors are found, otherwise not

close (OUT) or die "Cannot close $outfile : $!";
close (ERR) or die "Cannot close $errfile : $!";

$date = &getSimpleSecDate();
my $end_time = time;
my $diff_time = $end_time - $start_time;
print "DIFF $diff_time\n";
print "END $date\n";
Counting script specifications
Counting script counts numbers in Postgres at any given instance and not from the .ace file.
Script at : /home/acedb/ranjana/human_disease/count_disease.pl
1. No. of genes (dis_wbgene): Counts all genes including duplicates, lists PGIDs of duplicate genes
2. No. of unique genes : Counts all genes, only once
3. No. of Experimental Models or DO_terms (dis_humandoid): counts all DO_terms
4. No. of unique Experimental models or DO_terms: does not count repeated DO_terms
5. No. of papers for Experimental models or DO_terms (dis_paperexpmod): counts all papers
6. No. of papers for Disease Relevance (dis_paperdisrel)
7. No. of unique papers in all of disease curation: no. of papers in dis_paperexpmod + no. of papers in dis_paperdisrel, counts a paper only once in both categories, no duplicates
8. No. of disease relevance descriptions (dis_diseaserelevance)
9. No. of OMIM genes connected to (WB)genes: from the OA field 'OMIM gene for Disease Rel' (Postgres table dis_genedisrel); entries look like 'OMIM:607485' or just '607485'; entries are comma separated
10. No. of OMIM diseases connected to WB genes: from OA Field 'OMIM disease for Exp Mod (dis_dbexpmod) plus (dis_dbdisrel) OA field-OMIM disease for Disease Rel, counts a disease only once, if it appears in both categories; entries look like 'OMIM:607485' or just '607485'; entries are comma separated
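Counts 1 and 2 could be sketched as below, assuming the dis_wbgene rows have already been read from Postgres as [pgid, WBGene ID] pairs. count_genes is a hypothetical name and is not taken from count_disease.pl; the pgids in the sample are made up.

```perl
use strict;
use warnings;

# Hypothetical counter over dis_wbgene rows, each given as [pgid, WBGene ID].
# Returns: total genes (with duplicates), unique genes, and a hash of
# duplicated gene => list of pgids.
sub count_genes {
    my (@rows) = @_;
    my %pgids_for;                                      # WBGene ID -> list of pgids
    push @{ $pgids_for{ $_->[1] } }, $_->[0] for @rows;
    my %duplicates = map  { ($_ => $pgids_for{$_}) }
                     grep { @{ $pgids_for{$_} } > 1 } keys %pgids_for;
    return ( scalar @rows, scalar keys %pgids_for, \%duplicates );
}

my ($total, $unique, $dups) = count_genes(
    [ 1, 'WBGene00003058' ], [ 2, 'WBGene00003052' ], [ 7, 'WBGene00003058' ],
);
print "genes: $total, unique genes: $unique\n";
print "$_ is duplicated at pgids @{ $dups->{$_} }\n" for sort keys %$dups;
```

The same pattern (a hash keyed by the value being counted) covers the unique DO_term, paper and OMIM counts.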
Ontology Annotator for Disease Term
Dumping data for citace upload
--All scripts are under: /home/acedb/ranjana/human_disease
--A symlink to the script has been created: ln -s /home/postgres/work/citace_upload/dis_disease/use_package.pl
--The disease ontology file for the OA is updated by a cron job that runs at 8pm every day.
Cron entry: 0 20 * * * /home/postgres/work/pgpopulation/obo_oa_ontologies/update_obo_oa_ontologies.pl
Source: http://www.berkeleybop.org/ontologies/doid.obo (08.08.2013)
1. Ontology file:
Run parseHuman.pl:
Downloads the HumanDO.obo from http://diseaseontology.svn.sourceforge.net/viewvc/diseaseontology/trunk/HumanDO.obo and converts it to HumanDO.ace. Upload to Spica under Data_for_citace/Data_from_Ranjana/. Change name to HumanDO_WSXXX.ace (Source URL now changed to http://www.berkeleybop.org/ontologies/doid.obo, 08.08.2013)
2. Gene-disease annotation file
Run use_package.pl at /home/acedb/ranjana/human_disease:
Dumps disease data from the disease OA into disease_<date>.ace. scp the file to the local machine, change the name to disease_WSXXX.ace and upload to Spica at citpub, under Data_for_citace/Data_from_Ranjana/.
Also checks whether all DOIDs in postgres are valid, and outputs invalid DOIDs to the err.out.<date> file. Note that invalid DOIDs cannot be seen in the OA: identify the annotation by PGID and then add the valid DOID to it, as the invalid one will not show.
3. DO_term-Worm_model_description annotation file
Run use_package.pl at /home/acedb/ranjana/human_disease/diseaseterm
Dumps disease data from the disease term OA into diseaseterm_<date>.ace. scp the file to the local machine, change the name to diseaseterm_WSXXX.ace and upload to Spica at citpub, under Data_for_citace/Data_from_Ranjana/.
4. Download the HumanDO.obo file from http://www.berkeleybop.org/ontologies/doid.obo and rename as disease_ontology.WSXXX.obo.
All files should be deposited to:
/home/citpub/Data_for_citace/
/home/citpub/Data_for_Ontology/
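The DOID validity check mentioned in step 2 could look roughly like this. It is a sketch under stated assumptions, not the actual check in use_package.pl: a DOID is treated as invalid when it is absent from the .obo file or belongs to a stanza marked is_obsolete: true. The function names and the second sample term (DOID:0000001) are made up for illustration.

```perl
use strict;
use warnings;

# Collect the IDs of non-obsolete [Term] stanzas from OBO-format text.
sub live_doids {
    my ($obo_text) = @_;
    my %live;
    foreach my $stanza (split /\n\s*\n/, $obo_text) {   # stanzas are separated by blank lines
        next unless $stanza =~ /^\[Term\]/m;            # skip the header and non-term stanzas
        next if $stanza =~ /^is_obsolete:\s*true/m;
        $live{$1} = 1 if $stanza =~ /^id:\s*(DOID:\d+)/m;
    }
    return \%live;
}

# Hypothetical report: one error line per pgid whose DOID is not live.
sub invalid_doid_report {
    my ($live, $doid_for_pgid) = @_;
    return map  { "pgid $_ has invalid $doid_for_pgid->{$_}\n" }
           grep { !$live->{ $doid_for_pgid->{$_} } }
           sort { $a <=> $b } keys %$doid_for_pgid;
}

my $obo = <<'OBO';
[Term]
id: DOID:5937
name: autosomal dominant polycystic kidney

[Term]
id: DOID:0000001
name: made-up obsolete term for illustration
is_obsolete: true
OBO

print invalid_doid_report( live_doids($obo), { 12 => 'DOID:5937', 40 => 'DOID:0000001' } );
```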
Changes required/issues by release
- For the WS251 release: the use_package.pl script reports that WBGene00004724 is dead and merged into WBGene00013742. Query the annotation out by using pgid 347 (sas-1), then move the data to the right gene.
Changes to OA May 2013
- Database for Exp Mod changes to 'OMIM disease for Exp Mod', data can be entered as IDs without the 'OMIM:' as prefix, multiple values comma-separated.
- 'Database for Disease Rel' changes to 'OMIM disease for Disease Rel', data can be entered as IDs without the 'OMIM:' prefix, multiple values comma-separated.
- Extra free-text field called 'OMIM gene for Disease Rel' added, data can be entered as IDs without the 'OMIM:' prefix, multiple values comma-separated.
- When data is present in either the 'OMIM disease for Disease Rel' or 'OMIM gene for Disease Rel' fields, script dumps the following line in .ace for each entry as:
Database "OMIM" "disease" "456789"
Database "OMIM" "gene" "456789"
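A sketch of how these Database lines could be produced from the free-text fields, accepting IDs with or without the 'OMIM:' prefix. omim_database_lines is a hypothetical helper, not the dumper's own code; it assumes the only digits in the field are OMIM IDs.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a free-text OMIM field into Database lines.
# $type is 'disease' or 'gene'; every run of digits is taken as one ID.
sub omim_database_lines {
    my ($type, $field) = @_;
    my %seen;
    my @ids = grep { !$seen{$_}++ } ($field =~ /(\d+)/g);   # unique IDs, first-seen order
    my @lines;
    push @lines, qq(Database "OMIM" "$type" "$_"\n) for @ids;
    return @lines;
}

print omim_database_lines('disease', 'OMIM:456789, 173900');
print omim_database_lines('gene',    '456789,456789');
```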
Changes to gene-disease dumper, Sept 2014: moving OMIM Ids to Accession_evidence
- Reenable part of script that dumps OMIM ids under the 'Database' tag
- Start dumping the 'Accession_evidence' tag:
- for the Experimental_model tag, look at the Ids either entered as 'OMIM:XXXXX', or just 'XXXXX' in the 'OMIM disease for Exp Mod (dis_dbexpmod)'
- For the Disease_relevance tag, look at the OMIM Ids either as 'OMIM:XXXXX' or just 'XXXXX' in 'OMIM disease for Disease Rel (dis_dbdisrel)' and 'OMIM gene for Disease Relevance (gene_disrel)'
- For each unique OMIM ID the .ace syntax for the gene would be:
Gene : "WBGene00003052"
Database "OMIM" "disease" "115200"
Database "OMIM" "disease" "151660"
Database "OMIM" "disease" "159001"
Database "OMIM" "disease" "176670"
Database "OMIM" "disease" "181350"
Database "OMIM" "disease" "212112"
Database "OMIM" "disease" "248370"
Database "OMIM" "disease" "275210"
Database "OMIM" "disease" "605588"
Database "OMIM" "disease" "610140"
Database "OMIM" "disease" "613205"
Experimental_model "DOID:3911" "Homo sapiens" Accession_evidence "OMIM" "176670"
Experimental_model "DOID:0050557" "Homo sapiens" Accession_evidence "OMIM" "613205"
Experimental_model "DOID:11726" "Homo sapiens" Accession_evidence "OMIM" "181350"
Disease_relevance "Mutations in human lamin, LMNA, are found in several diseases referred to as the laminopathic diseases, which include Emery-Dreifuss muscular dystrophy (EDMD), LMNA-related congenital muscular dystrophy (L-CMD), limb-girdle muscular dystrophy (L-CMD), Hutchison-Gilford progeria syndrome (HGPS), dilated cardiomyopathy (DCM), Charcot-Marie-Tooth disorder and atypical Werner syndrome; elegans B-type lamin, lmn-1, performs both A and B-type vertebrate lamin functions; similar to A-type lamins, it has roles in development, organization of nuclear pore complexes, and interacts with lamina and nuclear components; similar to B-type lamins, it is expressed widely throughout development, except for sperm, and interacts with B-type lamin-binding proteins; much of the knowledge of the organization and assembly of the nuclear lamina has come from studies in elegans; disease-causing mutations in human LMNA when introduced into elegans lmn-1/lamin alter nuclear lamina organization and dynamics, leading to phenotypes such as decreased fertility and muscle lesions; a mutation found in Hutchison-Gilford progeria syndrome disrupts the supramolecular structure of the lamin filaments in elegans; LMNA mutations that are found in EDMD, DCM and HGPS, when introduced into elegans lmn-1/lamin cause disruption in lamin filament assembly and nuclear localization; also, work in elegans has revealed that lamins are involved in the normal aging process, as worms mutant for lamin age faster." "Homo sapiens" Accession_evidence "OMIM" "115200"
(The Disease_relevance line will be repeated for the rest of the 10 OMIM Ids in 'OMIM disease for Disease Rel (dis_dbdisrel)'; there are no genes in 'OMIM gene for Disease Relevance (gene_disrel)'.)
Old way of dumping OMIM IDs for genes:
Gene : "WBGene00003052"
Database "OMIM" "disease" "176670"
Database "OMIM" "disease" "613205"
Database "OMIM" "disease" "181350"
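As a rough illustration of the dump pattern above (one Database line per unique OMIM ID, then one Experimental_model line per DO term), here is a minimal Python sketch. The function name `gene_omim_ace` and its argument layout are hypothetical; this is not the actual dump script, only a sketch of the output shape shown in the examples.

```python
def gene_omim_ace(gene_id, omim_disease_ids, experimental_models=()):
    """Build a .ace paragraph for one gene (illustrative sketch only).

    omim_disease_ids: iterable of OMIM ID strings; repeats are skipped.
    experimental_models: iterable of (doid, species, omim_id) tuples.
    """
    lines = [f'Gene : "{gene_id}"']
    seen = set()
    for omim_id in omim_disease_ids:
        if omim_id in seen:
            continue  # one Database line per unique OMIM ID
        seen.add(omim_id)
        lines.append(f'Database "OMIM" "disease" "{omim_id}"')
    for doid, species, omim_id in experimental_models:
        lines.append(
            f'Experimental_model "{doid}" "{species}" '
            f'Accession_evidence "OMIM" "{omim_id}"'
        )
    return "\n".join(lines)


if __name__ == "__main__":
    print(gene_omim_ace(
        "WBGene00003052",
        ["176670", "613205", "181350"],
        [("DOID:3911", "Homo sapiens", "176670")],
    ))
```

The Disease_relevance line is left out of the sketch; it would follow the same pattern, repeated once per OMIM ID.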
To do
- Need to tell the EBI team that, starting with the WS239 upload (mid-July), we will be dumping Date_last_updated and Curator_confirmed data into citace, and that they should pick these up.
- Disease ontology file location has changed, need to alert JC to change the locations for OA and scripts (done, 08.08.2013):
DO group lists two locations: Sourceforge: http://sourceforge.net/p/diseaseontology/code/2599/tree/trunk/
OBO Foundry: http://www.berkeleybop.org/ontologies/doid.obo (will use this source)
Disease Model Annotation OA (Feb 2017)
Tab1:
- Curator
- Curator History
- Disease Name:Auto-complete drop-down with Disease Ontology (DO) terms
- single value, constrain
- Disease of Species
- Auto-complete drop-down with controlled vocabulary of species list
- Single value, constrain
- Section Heading: Modeled by
- Disease relevant gene:this is the causative gene of the disease, the disease_relevant_gene in ace model, DB_Object_ID in the DAF
- Variation:Autocomplete drop-down with WB Variation list
- single value, constrain
- Strain:Autocomplete drop-down with WB Strain list
- single value, constrain
- Transgene
- single value, constrain
- Inferred gene: Autocomplete drop-down with WBGene list
- Indicates the gene that the Variation or Strain refers to, if known; for elegans, authors usually state this
- Can be multiple values, e.g. if the Strain or Transgene that models the disease has more than one gene.
- For the 'Modeled by' section at least one of Disease_relevant_gene, Variation, Strain or Transgene is required, constrain
- Association Type:Relationship between the genetic entity (disease_relevant_gene, variation, transgene, or strain in ace model; DB Object in DAF) and the disease.
- multi-value drop-down with the following controlled vocabulary:is_model_of, causes_or_contributes_to_condition, causes_condition, contributes_to_condition, is_marker_for
- single value, constrain
- If the genetic entity dumped for DB Object ID is 'Disease_relevant_gene', then is_model_of is not allowed, constrain
- Evidence Code:Autocomplete drop-down with GO codes for now (will adopt ECO later)
- allow multiple values
- multiple evidence codes allowed only from one publication, for one model
- Qualifier: Autocomplete dropdown with only one value, 'NOT'
- Indicates that the genetic entity is 'NOT' a model for disease.
- the default value is blank, with 'NOT' as the only drop-down choice
- Genetic sex:Autocomplete dropdown with the following values:hermaphrodite, male, female
- single value
- have 'hermaphrodite' as the default value
- Reference: Autocomplete drop-down with WB Paper list
- single value only, constrain
- Date_last updated: Date of original annotation or date last modified
- single value only, constrain
- Section Heading: Experimental Conditions
- Inducing Chemical:Drop-down with WB Molecule list/ontology
- Allow multiple values
- for curator: enter multiple values only if multiple molecules are used to induce the same disease in one experiment.
- Inducing agent: Free text, for inducers not in WB molecule ontology
- multiple values, will we comma separate?
- for curator: enter multiple values only if multiple agents were used as inducers for the same disease in one experiment/model.
- Section Heading: Modifiers of disease
- Modifier_transgene:Autocomplete dropdown with WB Transgene list
- multiple values
- for curator: enter multiple values, only if multiple transgenes were used as modifiers in one experiment.
- Modifier_variation: Autocomplete dropdown with WB Variation list
- multiple values
- Modifier_strain:Autocomplete dropdown with WB Strain list
- multiple values
- Modifier_gene:Autocomplete dropdown with WB Gene list
- multiple values
- to indicate the gene in the modifying Transgene, Variation, Strain.
- Modifier_molecule:Autocomplete dropdown with WB Molecule list
- multiple values
- Other_modifier: Free Text to indicate other modifiers of the disease eg., diet, radiation, surgery
- multiple values
- will we comma separate multiple values, or have separate boxes?
- Modifier_association_type:Autocomplete dropdown with the following values:condition_ameliorated_by, condition_exacerbated_by
- for curators: Use multiple values for each type of modifier only if they were used in a single experiment to model a single disease from a single paper; they should all be consistent with the modifier_association_type chosen
- have condition_ameliorated_by as default value, as this is the most common
- Disease phenotype
- New Disease phenotype comment (free big text)
- Change Transgene to Interacting Transgene (multi-choice WB transgene list): a transgene is introduced in addition to the genetic mutation that is the disease model. See WBPaper00039877: unc-29(e193) + Abeta-expressing transgene
- Paper for Exp Mod
- OMIM disease for Exp Mod
- Last updated for Exp Mod
- Species
- Disease relevance
- Paper for disease relevance
- OMIM disease for Disease Rel
- OMIM gene for Disease Rel
- Last Updated for Disease Rel
- Comment
- pgid
- Molecule Type
- Molecule
- New Other treatment (free text): free text field for now, use for other non-molecule treatments such as radiation etc.
- Affected Phenotype
- New Affected Phenotype comment: free text field
- New Molecule 2: (single value, WB molecule ontology) When there is a second molecule treatment together with the first one.
- New Affected phenotype comment 2 (free text field): To explain either exacerbation or amelioration of the phenotype caused by the first molecule, by the second molecule, or no effect.
- Other Strain (free-text): to hold a non-WB strain until it becomes one.
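The 'constrain' rules listed for the Feb 2017 Disease Model Annotation OA (at least one 'Modeled by' entity, the Association Type vocabulary, no is_model_of for a Disease_relevant_gene, Qualifier blank or 'NOT', Genetic sex defaulting to hermaphrodite) could be checked along the lines of the Python sketch below. The dictionary keys and the function name `validate_annotation` are hypothetical and are not the OA's actual table or field names. The sketch also assumes that the gene is the dumped DB Object only when no Variation, Strain or Transgene is filled in; the notes above do not spell this out.

```python
ASSOCIATION_TYPES = {
    "is_model_of",
    "causes_or_contributes_to_condition",
    "causes_condition",
    "contributes_to_condition",
    "is_marker_for",
}
MODELED_BY_FIELDS = ("disease_relevant_gene", "variation", "strain", "transgene")
QUALIFIERS = {"", "NOT"}  # blank is the default
GENETIC_SEXES = {"hermaphrodite", "male", "female"}


def validate_annotation(row):
    """Return a list of constraint violations for one OA row (a dict)."""
    errors = []

    # At least one 'Modeled by' entity is required.
    filled = [f for f in MODELED_BY_FIELDS if row.get(f)]
    if not filled:
        errors.append(
            "Modeled by: one of disease_relevant_gene, variation, "
            "strain or transgene is required"
        )

    # Association Type must come from the controlled vocabulary.
    assoc = row.get("association_type")
    if assoc and assoc not in ASSOCIATION_TYPES:
        errors.append(f"Association Type: unknown value {assoc!r}")

    # ASSUMPTION: the gene is the DB Object only if it is the sole
    # 'Modeled by' entity filled in.
    if filled == ["disease_relevant_gene"] and assoc == "is_model_of":
        errors.append(
            "Association Type: is_model_of is not allowed for a "
            "Disease_relevant_gene"
        )

    # Qualifier is either blank or 'NOT'.
    if (row.get("qualifier") or "") not in QUALIFIERS:
        errors.append("Qualifier: only blank or 'NOT' is allowed")

    # Genetic sex defaults to hermaphrodite when left empty.
    if (row.get("genetic_sex") or "hermaphrodite") not in GENETIC_SEXES:
        errors.append("Genetic sex: must be hermaphrodite, male or female")

    return errors
```

An empty list means the row passes these checks; the single-value constraints would normally be enforced by the OA field type itself and are not covered here.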
Changes to expand the Disease-Gene OA
Tab1:
- WBGene: this is the causative gene of the disease, the disease_relevant_gene in ace model, DB_Object_ID in the DAF
- Is unique, so don't need to change
- Curator
- Curator History
- Experimental Model For
- Change to 'Experimental Model For Disease' (value:DO term)
- Needs to be unique (single value), constrain
- Strain: WB Strain list
- Needs to be unique (single value)
- Variation:
- Needs to be unique (single value)
- New: Inferred gene: Indicates the gene that the Variation or Strain refers to, if known; for elegans, authors usually give this
- Can be multiple values, need to think about
- Association Type: Relationship between the genetic entity (disease_relevant_gene, variation, transgene, or strain in ace model; DB Object in DAF) and the disease.
- Should be a multi-value drop-down with the following controlled vocabulary:is_model_of, causes_or_contributes_to_condition, causes_condition, contributes_to_condition, is_marker_for
- Allow only single value (constrain)
- Evidence Code: GO codes for now, multi-value drop-down, allow multiple values.
- Qualifier: Indicates that the genetic entity is 'NOT' a model for disease.
- will have two values, blank value is default with 'NOT' as allowed drop-down choice
- Allow zero or only one value, constrain
- Experimental Conditions: Could we have this as a section heading?
- Disease phenotype
- New Disease phenotype comment (free big text)
- Change Transgene to Interacting Transgene (multi-choice WB transgene list): a transgene is introduced in addition to the genetic mutation that is the disease model. See WBPaper00039877: unc-29(e193) + Abeta-expressing transgene
- Paper for Exp Mod
- OMIM disease for Exp Mod
- Last updated for Exp Mod
- Species
- Disease relevance
- Paper for disease relevance
- OMIM disease for Disease Rel
- OMIM gene for Disease Rel
- Last Updated for Disease Rel
- Comment
- pgid
- Molecule Type
- Molecule
- New Other treatment (free text): free text field for now, use for other non-molecule treatments such as radiation etc.
- Affected Phenotype
- New Affected Phenotype comment: free text field
- New Molecule 2: (single value, WB molecule ontology) When there is a second molecule treatment together with the first one.
- New Affected phenotype comment 2 (free text field): To explain either exacerbation or amelioration of the phenotype caused by the first molecule, by the second molecule, or no effect.
- Other Strain (free-text): to hold a non-WB strain until it becomes one.
Changes to expand the Disease Term OA
Each row in this OA should represent one experiment, which is considered as one model. Fields: Lines with 'New' or 'Change' in bold are being considered.
- DO term
- Curator
- Curator history
- Model_type: Experimental, Predicted, NOT_model (could be Experimental by default)
- Variation: multivalue
- Transgene: together with Strain and Variation, and change to multivalue
- Strain: multivalue autocomplete strain drop-down.
- New Other strain (free text): to hold a non-WB strain until it becomes one.
- Disease phenotype, multi-value ontology
- Disease phenotype comment (free text field)
- Species
- Evidence code: will be the GO codes for now (single value drop-down)
- Worm model description
- Paper
- OMIM disease: will be OMIM ids, so multivalue, comma separated
- OMIM gene: will be OMIM ids, so multivalue, comma separated
- Last updated
- Comment (free text field)
- pgid
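For the comma-separated OMIM disease and OMIM gene fields, a script reading the OA could split and check the values roughly as in the Python sketch below. The function name `parse_omim_ids` is hypothetical, and the check assumes that OMIM numbers are six digits long.

```python
import re


def parse_omim_ids(field_value):
    """Split a comma-separated OMIM field into unique IDs, in input order."""
    ids = []
    for token in field_value.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate stray commas and trailing whitespace
        # ASSUMPTION: OMIM numbers are exactly six digits.
        if not re.fullmatch(r"\d{6}", token):
            raise ValueError(f"not a six-digit OMIM number: {token!r}")
        if token not in ids:
            ids.append(token)
    return ids
```

For example, `parse_omim_ids("176670, 613205,181350")` gives the three IDs as separate strings, ready to be dumped one per line.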
Have a heading for this section of the OA, call it 'Treatment'.
- Change Molecule type to Treatment Type: Exacerbates, Ameliorates, Toxic, No effect, Does_not_exacerbate, Does_not_ameliorate, Not_toxic (Open question: will these qualifiers apply to the entire treatment, whether it combines 1, 2 or 3 treatments, or to each treatment individually?)
- Change Molecule to Molecule 1 (autocomplete drop-down, WB molecule ontology)
- Other treatment 1: free text field for now, use for other non-molecule treatments such as radiation etc.
- Affected phenotype: WB Phenotype Ontology field, multivalue
- Affected phenotype comment:free text field, for comments about the affected phenotype
Have an 'Add molecule' button that creates the field 'Molecule 2' (WB molecule ontology); if pressed again, it creates the field 'Molecule 3' (WB molecule ontology).
If adding molecule fields on the fly is not possible, we will need at least 3 molecule fields.
- New Molecule 2: (single value, WB molecule ontology) When there is a second molecule treatment together with the first one.
- New Affected phenotype comment 2 (free text field): To explain either exacerbation or amelioration of the phenotype caused by the first molecule, by the second molecule, or no effect.
The below fields are still being worked out, will not be added to the OA as yet:
- Interacting gene:
- Interacting gene effect:
- Interacting other gene:
- Interacting other gene effect:
- Interacting variation:
- Interacting variation effect: