Difference between revisions of "Source and maintenance of non-WBGene info"

[[Caltech_documentation]]

__TOC__

==Geneace dump from Hinxton==
Information for gene, variation, clone, strain, rearrangement, and laboratory objects is provided in a nightly JSON dump from Hinxton. The gene information is discussed [[WBGene_information_and_status_pipeline | over here]]. This page outlines the processing of all non-gene information supplied through the dump.

==nightly_geneace.pl==
/home/postgres/work/pgpopulation/obo_oa_ontologies/geneace/nightly_geneace.pl<br>
<pre>
For variations:
-populates obo_name/data_<datatype> tables where <datatype> is variation, clone, strain, or rearrangement
-adds any WBVarID that is not in the geneace nightly dump but is on obo_tempfile at /home/azurebrd/public_html/cgi-bin/data/obo_tempfile
-compares the WBVar to Public_name mapping in both files, by both public_name and WBVar, and emails the curator (Karen) if they differ. The curator needs to edit the obo_tempfile to resolve the differences; otherwise an email will continue to be sent.
-NOTE: objects added to obo_tempfile are immediately available in the variation dropdown and will remain on obo_tempfile until the object shows up in the nightly geneace dump.
</pre>

<pre>
For Clone:
-script extracts Type: plasmid only from ftp://ftp.sanger.ac.uk/pub/consortia/wormbase/STAFF/mh6/nightly_geneace/clones2.ace.gz
-Need to change this to take all clones.  Construct curation will require annotating using cDNAs, Fosmids, Cosmids, etc.
</pre>

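The merge step described above for variations can be sketched as follows. This is only an illustration in Python, not the Perl code of nightly_geneace.pl. The function name <code>merge_variations</code>, the dictionary layout (WBVarID mapped to public name), and the second sample object (<code>WBVar99999999</code> / <code>xx1</code>) are assumptions made up for the example.

```python
def merge_variations(dump, tempfile):
    """Combine the nightly dump with curator-entered tempfile objects.

    Both arguments map WBVarID -> public_name (an assumed in-memory shape).
    Returns (merged, still_pending).
    """
    merged = dict(dump)
    still_pending = {}
    for wbvar, name in tempfile.items():
        if wbvar not in dump:
            # Not in geneace yet: keep it available and keep it on the tempfile.
            merged[wbvar] = name
            still_pending[wbvar] = name
    return merged, still_pending


dump = {"WBVar00000020": "ad487"}
tempfile = {"WBVar00000020": "ad487", "WBVar99999999": "xx1"}  # second entry is made up
merged, pending = merge_variations(dump, tempfile)
```

Here <code>pending</code> holds only the object that has not yet appeared in the dump, which matches the note that objects stay on obo_tempfile until they show up in geneace.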
===Variations===
For each variation with one of the Methods listed below, the following information is retrieved:
*WBVar ID
*public_name
*gene association
*references
*method
*status
<pre>
If the variation does not have one of the following attached methods, it is not retrieved.
  Allele
  Deletion_allele
  Deletion_and_insertion_allele
  Deletion_polymorphism
  Insertion_allele
  Insertion_polymorhism
  KO_consortium_allele
  Mos_insertion
  NBP_knockout_allele
  NemaGENETAG_consortium_allele
  Substitution_allele
  Transposon_insertion
  Engineered_allele
</pre>

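The Method filter above amounts to a simple membership test. The sketch below is a Python illustration under assumptions: the record shape (a dict with <code>id</code>, <code>public_name</code> and <code>method</code> keys), the Method values assigned to the sample records, and the second sample record are all hypothetical. The Method names are copied verbatim from the list above.

```python
# Method names copied verbatim from the list on this page.
RETRIEVED_METHODS = {
    "Allele",
    "Deletion_allele",
    "Deletion_and_insertion_allele",
    "Deletion_polymorphism",
    "Insertion_allele",
    "Insertion_polymorhism",
    "KO_consortium_allele",
    "Mos_insertion",
    "NBP_knockout_allele",
    "NemaGENETAG_consortium_allele",
    "Substitution_allele",
    "Transposon_insertion",
    "Engineered_allele",
}


def is_retrieved(variation):
    """Return True if the variation carries one of the accepted Methods."""
    return variation.get("method") in RETRIEVED_METHODS


# Sample records; the Method values here are assumed for illustration only.
variations = [
    {"id": "WBVar00088136", "public_name": "ju2", "method": "Allele"},
    {"id": "WBVar00000001", "public_name": "example1", "method": "SNP"},
]
kept = [v["id"] for v in variations if is_retrieved(v)]
```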
These data populate the following files in /home/postgres/work/pgpopulation/obo_oa_ontologies/geneace:<br>
obo_name_variation<br>
obo_data_variation<br>

In obo_name_variation, entries look like:
<pre>
WBVar00000020	ad487	2015-11-30 20:01:08
</pre>
In obo_data_variation, entries look like:
<pre>
WBVar00088136	id: WBVar00088136\nname: "ju2"\nspecies: "Caenorhabditis elegans"\nstatus: "Live"\ngene: "WBGene00006363 syd-1"\nreference: "WBPaper00005543"	2015-11-30 20:01:08
</pre>

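To make the obo_data_variation layout concrete, here is a minimal Python sketch that splits one such entry into its parts. It assumes that the three columns are tab-separated and that <code>\n</code> inside the middle column is a literal backslash followed by "n" rather than a real newline. The function name <code>parse_obo_data_line</code> is hypothetical.

```python
def parse_obo_data_line(line):
    """Split one obo_data_variation entry into (WBVarID, fields, timestamp).

    fields maps each tag to a list of values, since a tag may repeat.
    """
    wbvar, data, timestamp = line.rstrip("\n").split("\t")
    fields = {}
    for item in data.split("\\n"):  # literal backslash + n, as stored
        key, _, value = item.partition(": ")
        fields.setdefault(key, []).append(value.strip('"'))
    return wbvar, fields, timestamp


line = (
    'WBVar00088136\tid: WBVar00088136\\nname: "ju2"'
    '\\nspecies: "Caenorhabditis elegans"\\nstatus: "Live"'
    '\\ngene: "WBGene00006363 syd-1"\\nreference: "WBPaper00005543"'
    '\t2015-11-30 20:01:08'
)
wbvar, fields, timestamp = parse_obo_data_line(line)
```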
If a variation does not exist in the geneace dump, and hence is not in the obo_name/data_variation tables:
* retrieve a WBVarID from the variation nameserver at http://www.sanger.ac.uk/sanger/Worm_NameServer (you will need a login and password, which may take a while to be assigned)
* enter the public name and WBVarID, separated by a space or tab, into the TempVariationObo form at http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/generic.cgi?action=TempVariationObo
** note: the form can take columns of data (multiple rows) as long as each row is in the format <allele public name> <WBVarID>.
The information is added immediately to obo_name_variation and should be available through the OA variation field (a form reload may be necessary).

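The input format accepted by the TempVariationObo form can be checked with a few lines of Python before pasting. This is a sketch, not the form's own validation: the function name is hypothetical, and the WBVarID pattern (<code>WBVar</code> followed by eight digits) is inferred from the IDs shown on this page.

```python
import re

# Assumed ID shape, inferred from examples such as WBVar00604189.
WBVAR_RE = re.compile(r"^WBVar\d{8}$")


def parse_temp_variation_input(text):
    """Return (pairs, rejected) for '<public name> <WBVarID>' rows.

    Columns may be separated by spaces or tabs; blank rows are ignored.
    """
    pairs, rejected = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        parts = line.split()  # splits on any run of spaces or tabs
        if len(parts) == 2 and WBVAR_RE.match(parts[1]):
            pairs.append((parts[0], parts[1]))
        else:
            rejected.append(raw)
    return pairs, rejected


sample = "mn688\tWBVar00604189\notn567 WBVar00296275\nbadline\n"
pairs, rejected = parse_temp_variation_input(sample)
```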
If the allele already has a WBVarID but does not exist in the nightly geneace dump, curators should still enter the object through generic.cgi.<br>
When the object comes through in the geneace dump:
* the objects in geneace are compared against the objects on the obo_temp_variation
* if there is no discrepancy, the information is captured and overwritten in obo_data_variation
* if there is a discrepancy, that is, the WBVarID and public name on obo_temp_variation do not match the WBVarID and mapped public_name from geneace, an alert is sent with the discrepancy:
 nightly_geneace.pl@tazendra.caltech.edu
 8:01 PM (19 hours ago)
 to kyook
 WBVar00604189 in obo_tempfile_variation says mn688 geneace says m688
 WBVar00296275 in obo_tempfile_variation says otn567 geneace says ot567
* If a variation needs correcting, edit /home/azurebrd/public_html/cgi-bin/data/obo_tempfile_variation

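The discrepancy check can be sketched in Python as below. This is an assumption-laden illustration rather than the actual script: both inputs are taken to be dictionaries mapping WBVarID to public name, and the function name is hypothetical. The message for an ID mismatch follows the email shown above, while the wording of the by-name message is made up for the example.

```python
def find_discrepancies(tempfile, geneace):
    """Compare tempfile and geneace mappings by WBVarID and by public name."""
    messages = []
    # Same WBVarID, different public name.
    for wbvar, temp_name in sorted(tempfile.items()):
        geneace_name = geneace.get(wbvar)
        if geneace_name is not None and geneace_name != temp_name:
            messages.append(
                f"{wbvar} in obo_tempfile_variation says {temp_name} "
                f"geneace says {geneace_name}"
            )
    # Same public name, different WBVarID (message wording is hypothetical).
    geneace_by_name = {name: wbvar for wbvar, name in geneace.items()}
    for wbvar, temp_name in sorted(tempfile.items()):
        other = geneace_by_name.get(temp_name)
        if other is not None and other != wbvar:
            messages.append(
                f"{temp_name} in obo_tempfile_variation says {wbvar} "
                f"geneace says {other}"
            )
    return messages


tempfile = {"WBVar00604189": "mn688", "WBVar00296275": "otn567"}
geneace = {"WBVar00604189": "m688", "WBVar00296275": "ot567"}
messages = find_discrepancies(tempfile, geneace)
```

An empty list would mean nothing needs to be emailed. Any message persists on later runs until the curator edits the tempfile.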
===Clone===
*clone
*type
*strain
*general_remark
*location
*accession_number

===Strain===
*strain
*genotype
*location

===Laboratory===
*laboratory
*representative
*registered_lab_members
*allele_designation
*strain_designation
*mail

===Rearrangement===
*rearrangement
*gene_inside
*gene_outside
*map

==Non-WBGene objects retrieved through geneace==
 
Revision as of 22:29, 20 May 2018

{| class="wikitable" border="1"
!AceDB tag!!Postgres table!!Current - Nameserver nightly dump!!Current - WS bimonthly release!!Future - Geneace nightly dump!!Future - WS bimonthly release!!Use - Paper or meeting abstract gene connection!!Use - OA data type curation!!Use - OA term info!!Use - Dumping scripts!!Use - Text mining/SVM!!Use - Updating GSA Lexicon!!Comment
|-
|Variation||obo_name_variation<br>obo_data_variation||yes||yes||yes||no||no||yes||yes||yes||no||no||WBVariationID
|-
|Variation public_name||obo_name_variation<br>obo_data_variation||no||yes||yes||no||no||yes||yes||no||For Mary Ann's Variation first pass/SVM||For Variation lexicon||In multiple OAs
|-
|Variation- Gene||obo_data_variation||no||yes||yes||no||no||no||yes<br>Display WBGeneID and gin_locus||no||no||no||
|-
|Variation -Reference||obo_data_variation||no||yes||yes||no||no||no||yes||no||yes? for MA's scripts??||no||
|-
|Variation -Method||obo_data_variation||no||no<br>used to query for Variation type Allele and Transposon||yes||no||no||no||yes||no||no||no||Only take in data from Variation objects with these Methods:<br>"Allele"<br>"Deletion_allele"<br>"Deletion_and_insertion_allele"<br>"Deletion_polymorphism"<br>"Insertion_allele"<br>"Insertion_polymorhism"<br>"KO_consortium_allele"<br>"Mos_insertion"<br>"NBP_knockout_allele"<br>"NemaGENETAG_consortium_allele"<br>"Substitution_allele"<br>"Transposon_insertion"<br>"Engineered_allele"
|-
|Status||obo_data_variation||yes||yes||yes||no||no||no||yes||no||no||no||
|-
|Rearrangement||obo_name_rearrangement<br>obo_data_rearrangement||no||yes||yes||no||no||yes||yes||no||no||yes||
|-
|Rearrangement -map||obo_data_rearrangement||no||yes||yes||no||no||no||yes||no||no||no||
|-
|gene_inside||obo_data_rearrangement||no||yes||yes||no||no||no||yes<br>display gin_locus (do not need WBGeneID)||no||no||no||
|-
|gene_outside||obo_data_rearrangement||no||yes||yes||no||no||no||yes<br>display gin_locus (do not need WBGeneID)||no||no||no||
|-
|Strain||obo_name_strain<br>obo_data_strain||no||yes||yes||no||no||yes||yes||no||no||yes||
|-
|Strain -genotype||obo_data_strain||no||yes||yes||no||no||no||yes||no||no||no||
|-
|Strain- location||obo_data_strain||no||yes||yes||no||no||no||yes||no||no||no||
|-
|Clone||obo_name_clone<br>obo_data_clone||no||yes||yes||no||no||yes (expr_pattern)||yes||no?||no||yes||
|-
|Clone -Type||Not sure you need a table for this. All clones that populate the clone tables will be of one type = PLASMID||no||yes||yes||no||no||no||yes||no||no||no||
|-
|Clone -Transgene||obo_data_clone||no||yes||yes||no||no||no||yes||no||no||no||I don't think there is any data in this tag in the ftp cloness.ace
|-
|Clone -strain||obo_data_clone||no||yes||yes||no||no||no||yes||no||no||no||
|-
|Clone -general_remark||obo_data_clone||no||yes||yes||no||no||no||yes||no||no||no||
|-
|Clone -location||obo_data_clone||no||yes||yes||no||no||no||yes||no||no||no||
|-
|Clone -accession_number||obo_data_clone||no||yes||yes||no||no||no||yes||no||no||no||
|-
|Laboratory||obo_name_laboratory<br>obo_data_laboratory||no||yes||yes||no||no||yes||yes||no||no||no||
|-
|Laboratory -Representative||obo_name_laboratory<br>obo_data_laboratory||no||yes||yes||no||no||yes||yes||no||no||no||
|-
|Laboratory -Registered_lab_members||obo_data_laboratory - actually I don't know if this needs to be displayed in the term info||no||yes||yes||no||no||no||yes||no||no||no||
|-
|Laboratory - allele_designation||obo_data_laboratory||no||yes||yes||no||no||no||yes||no||yes MA's script||yes? use for text markup regex?||
|-
|Laboratory - strain_designation||obo_data_laboratory||no||yes||yes||no||no||no||yes||no||no||no||
|-
|Laboratory -Mail||obo_data_laboratory||no||yes||yes||no||no||no||yes||no||no||no||
|}