Difference between revisions of "WBGene information and status pipeline"

From WormBaseWiki

Revision as of 16:55, 1 November 2013

Table Summarizing Current/Future Postgres Population

AceDB tag Postgres table Current - Nameserver nightly dump Current - WS bimonthly release Future - Nameserver nightly dump Future - Geneace nightly dump Future - WS bimonthly release Use - Paper or meeting abstract gene connection Use - OA data type curation Use - Dumping scripts (could be wrong, but I don't think any gin_ tables are used in dumping scripts since we store WBGene IDs, except maybe gin_dead if people want those suppressed, or to have some kind of error message, or to map to Historical_gene or something like that) Use - Protein2GO data conversion Use - GSA Markup Comment
WBGene identifier gin_wbgene Yes Yes (First line in each entry) Yes Yes Yes
CGC_name gin_locus Yes (CGC) If it has this tag, gene is considered good (What does 'good' mean?) Yes (CGC) No No Yes No
Other_name gin_synonyms Yes (checking for different CGC name - see lines 132-137 in script) Yes Yes (checking for different CGC name) Yes - add to what is already populated from nightly nameserver No Yes No
Sequence_name gin_seqname Yes (Sequence) No Yes (Sequence) No No Yes No
Status gin_dead Yes (0) only if value is dead and species ~ elegans$ Yes (0) only if value is dead in the nameserver nightly; populate with Merged_into and Split_into values No Yes Yes Yes
Suppressed gin_dead If Status from nightly geneace = Suppressed, populate gin_dead with Dead Suppressed (if Status in nightly nameserver is 0 (dead)) or populate gin_dead with Suppressed (if Status in nightly nameserver is 1 (live)). Also add Merged_into and/or Split_into if nightly nameserver Status is 0 (dead).
Merged_into gin_dead No Yes No Yes - add only when status is dead from nightly nameserver No Historical_gene tag uses this when dumping files
Split_into gin_dead No Yes No Yes - add only when status is dead from nightly nameserver No
Corresponding_transcript gin_sequence No Yes No No Yes
Corresponding_CDS gin_sequence + gin_seqprot No Yes No No Yes
Corresponding_protein gin_protein, gin_seqprot No Yes No No Yes Yes, but we'll need isoform data in WB
Molecular_name gin_molname No Yes No No Yes No Maybe
Species This could perhaps be used to populate a future species tag for papers, but this is not an immediate need. Other use cases?
Version_change No Yes, to make sure we don't attach GO annotations to pseudogenes. One use case would be to know when genes change class, e.g. CDS ->Pseudogene. We may not need to actually store this in postgres, though.
Public_name gin_wbgene Yes (but only when no CGC_name or Sequence_name) If it has this tag, gene is considered good Don't need (Public_name also in Other_name - confirm this is always the case) No Not if also in Other_name Not if also in Other_name Not if also in Other_name No I think we can now ignore the Public_name tag as long as there's always an Other_name value as well, so if there is no Other_name then we'd look at Public_name? (Looking at the script, we're not doing anything with this value.)
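
The gin_dead rows of the table (Status, Suppressed, Merged_into, Split_into) combine the nightly nameserver status with the nightly geneace status. A minimal Python sketch of that combination is given here. The function name, its parameters, the "Dead" string for a plain dead gene, the lowercase merged_into / split_into prefixes and the space separator are all assumptions made for illustration; this is not taken from populate_pg_from_ws.pl.

```python
from typing import Optional, Sequence


def gin_dead_value(
    nameserver_live: bool,
    geneace_suppressed: bool,
    merged_into: Sequence[str] = (),
    split_into: Sequence[str] = (),
) -> Optional[str]:
    """Return the text for gin_dead, or None when no row is needed.

    nameserver_live     -- True for nameserver Status 1 (live), False for 0 (dead)
    geneace_suppressed  -- True when the geneace Status is Suppressed
    merged_into / split_into -- WBGene IDs from the geneace tags
    """
    parts = []
    if not nameserver_live:
        parts.append("Dead")
    if geneace_suppressed:
        parts.append("Suppressed")
    # Per the table, Merged_into / Split_into are added only for dead genes.
    if not nameserver_live:
        parts.extend(f"merged_into {gene}" for gene in merged_into)  # assumed format
        parts.extend(f"split_into {gene}" for gene in split_into)    # assumed format
    return " ".join(parts) if parts else None
```

With these assumptions, a live suppressed gene yields "Suppressed", a dead suppressed gene yields "Dead Suppressed" followed by any merge or split targets, and a live unsuppressed gene yields no gin_dead row at all.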

Previous (pre-nameserver move) Scripts:

  1. /home/acedb/cron/populate_gin_locus.pl - updates information from nameserver nightly dumps
  2. /home/acedb/cron/populate_gin.pl - updates information from WS releases

New (post-nameserver move) Scripts:

  1. /home/postgres/work/pgpopulation/obo_oa_ontologies/ws_updates/populate_pg_from_ws.pl

How this all works

For updating from nightly gene nameserver dumps:


For updating from nightly geneace dumps:


For updating from latest WS release:

There are 2 cronjobs and 3 scripts:

in the acedb account:

cronjob 1 on script 1 (runs daily at 3am PST) :

 0 3 * * * /home/acedb/cron/update_ws_tazendra.pl

Script 1 also triggers script 2:

 /home/acedb/cron/dump_from_ws.sh          # dump .ace files from WS to update postgres tables

and writes the date to a timestamp file:

 my $dateOfWsDumpFile = '/home3/acedb/cron/dump_from_ws/files/latestDate';   # timestamp file path
 `echo "$date" >> $dateOfWsDumpFile`;                                        # append the dump date via the shell


in the postgres account :

cronjob 2 on script 3 (runs daily at 5am PST) :

 0 5 * * * /home/postgres/work/pgpopulation/obo_oa_ontologies/ws_updates/populate_pg_from_ws.pl

This script generates the files used to populate some of the gin_ tables, plus the obo_ tables for exprcluster. It does so only when the date in the timestamp file is more recent than the latest timestamp in gin_molname.
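
The date check that gates script 3 can be sketched in Python as follows. This is only an illustration under stated assumptions: the timestamp file is taken to hold one date per line in YYYY-MM-DD format with the newest entry last (since script 1 appends to it), and the function names and the way the latest gin_molname timestamp is passed in are hypothetical.

```python
from datetime import date, datetime


def latest_dump_date(file_text: str, fmt: str = "%Y-%m-%d") -> date:
    """Return the date on the last non-empty line of the timestamp file.

    The date format is an assumption; adjust fmt to the real file contents.
    """
    lines = [line.strip() for line in file_text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("timestamp file is empty")
    return datetime.strptime(lines[-1], fmt).date()


def should_repopulate(file_text: str, latest_pg_timestamp: datetime) -> bool:
    """True when the WS dump is newer than the newest gin_molname row."""
    return latest_dump_date(file_text) > latest_pg_timestamp.date()
```

Comparing calendar dates rather than full timestamps means a dump written on the same day as the last population does not trigger a second run; whether the real script behaves that way is not stated here.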


We'll know whether this pipeline works automatically when WS141 comes out (expected mid-to-late December 2013) and cronjob 1 picks it up.

Some Relevant Postgres Queries:

SELECT * FROM gin_dead WHERE gin_dead ~ 'merged' AND gin_dead ~ 'split';

SELECT * FROM gin_dead WHERE gin_dead ~ 'split';
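
To make the matching behaviour of these queries concrete, here is a small Python sketch that applies the same case-sensitive regular-expression tests to a few rows. The sample gene IDs and gin_dead strings are hypothetical, as is the helper name; only the patterns come from the queries above.

```python
import re
from typing import Dict, Iterable, List


def genes_matching(rows: Dict[str, str], patterns: Iterable[str]) -> List[str]:
    """Return gene IDs whose gin_dead text matches every pattern.

    re.search is case-sensitive by default, like the Postgres ~ operator.
    """
    pats = list(patterns)
    return sorted(
        gene for gene, value in rows.items()
        if all(re.search(p, value) for p in pats)
    )


# Hypothetical sample data, for illustration only.
SAMPLE_ROWS = {
    "WBGene00000001": "Dead merged_into WBGene00000004",
    "WBGene00000002": "Dead split_into WBGene00000005",
    "WBGene00000003": "Dead merged_into WBGene00000006 split_into WBGene00000007",
    "WBGene00000008": "Suppressed",
}

both = genes_matching(SAMPLE_ROWS, ["merged", "split"])   # first query
split_only = genes_matching(SAMPLE_ROWS, ["split"])       # second query
```

Because the match is case-sensitive, a value stored as "Merged_into ..." would not be found by the pattern 'merged'; the case-insensitive operator ~* in Postgres would be needed for that.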


