Revision as of 16:55, 1 November 2013

Table Summarizing Current/Future Postgres Population

AceDB tag	Postgres table	Current - Nameserver nightly dump	Current - WS bimonthly release	Future - Nameserver nightly dump	Future - Geneace nightly dump	Future - WS bimonthly release	Use - Paper or meeting abstract gene connection	Use - OA data type curation	Use - Dumping scripts (could be wrong, but I don't think any gin_ tables are used in dumping scripts, since we store WBGene IDs; except maybe gin_dead, if people want those suppressed, or want some kind of error message, or want to map to Historical_gene or something like that)	Use - Protein2GO data conversion	Use - GSA Markup
WBGene identifier	gin_wbgene	Yes		Yes (First line in each entry)			Yes	Yes		Yes
CGC_name	gin_locus	Yes (CGC)	If it has this tag, gene is considered good (What does 'good' mean?)	Yes (CGC)	No	No	Yes		No
Other_name	gin_synonyms	Yes (checking for different CGC name - see lines 132-137 in script)	Yes	Yes (checking for different CGC name)	Yes - add to what is already populated from nightly nameserver	No	Yes		No
Sequence_name	gin_seqname	Yes (Sequence)	No	Yes (Sequence)	No	No	Yes		No
Status	gin_dead	Yes (0)	only if value is dead and species ~ elegans$	Yes (0)	only if value is dead in the nameserver nightly; populate with Merged_into and Split_into values	No	Yes	Yes	Yes
Suppressed	gin_dead				If Status from nightly geneace = Suppressed, populate gin_dead with Dead Suppressed (if Status in nightly nameserver is 0 (dead)) or populate gin_dead with Suppressed (if Status in nightly nameserver is 1 (live)). Also add Merged_into and/or Split_into if nightly nameserver Status is 0 (dead).
Merged_into	gin_dead	No	Yes	No	Yes - add only when status is dead from nightly nameserver	No		Historical_gene tag uses this when dumping files
Split_into	gin_dead	No	Yes	No	Yes - add only when status is dead from nightly nameserver	No
Corresponding_transcript	gin_sequence	No	Yes	No	No	Yes
Corresponding_CDS	gin_sequence + gin_seqprot	No	Yes	No	No	Yes
Corresponding_protein	gin_protein, gin_seqprot	No	Yes	No	No	Yes			Yes, but we'll need isoform data in WB
Molecular_name	gin_molname	No	Yes	No	No	Yes	No		Maybe
Species											This could perhaps be used to populate a future species tag for papers, but this is not an immediate need. Other use cases?
Version_change			No						Yes, to make sure we don't attach GO annotations to pseudogenes.		One use case would be to know when genes change class, e.g. CDS ->Pseudogene. We may not need to actually store this in postgres, though.
Public_name	gin_wbgene	Yes (but only when no CGC_name or Sequence_name)	If it has this tag, gene is considered good	Don't need (Public_name also in Other_name - confirm this is always the case)	No	Not if also in Other_name	Not if also in Other_name	Not if also in Other_name	No		I think we can now ignore the Public_name tag as long as there's always an Other_name value as well (so if there is no Other_name, would we then look at Public_name? Looking at the script, we're not doing anything with this value)
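
The "Future - Geneace nightly dump" rules for gin_dead (the Status, Suppressed, Merged_into and Split_into rows above) can be sketched as a single function. This is only an illustration of the decision logic as the table describes it, not the code in populate_pg_from_ws.pl. The function name, the "/" separator, the "merged_into"/"split_into" labels and the WBGene IDs in the usage note are all assumptions; the table does not specify the exact string stored in gin_dead.

```python
from typing import Optional, Sequence


def gin_dead_value(nameserver_status: int,
                   geneace_suppressed: bool,
                   merged_into: Sequence[str] = (),
                   split_into: Sequence[str] = ()) -> Optional[str]:
    """Build a hypothetical gin_dead string, or None when no row is needed.

    nameserver_status: 0 = dead, 1 = live (nightly nameserver dump).
    geneace_suppressed: True when Status in the nightly geneace dump is Suppressed.
    """
    is_dead = nameserver_status == 0
    parts = []

    if is_dead and geneace_suppressed:
        parts.append("Dead Suppressed")
    elif geneace_suppressed:
        parts.append("Suppressed")
    elif is_dead:
        parts.append("Dead")

    # Merged_into / Split_into are added only when the nameserver says dead.
    if is_dead:
        if merged_into:
            parts.append("merged_into " + " ".join(merged_into))
        if split_into:
            parts.append("split_into " + " ".join(split_into))

    # The separator and labels are assumed formatting, not the real one.
    return " / ".join(parts) if parts else None
```

For example, calling gin_dead_value(0, True, merged_into=["WBGene00000001"]) with a placeholder ID yields a value that starts with "Dead Suppressed" and carries the merge target, while a live, suppressed gene yields just "Suppressed" even if merge data is supplied.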

Previous (pre-nameserver move) Scripts:

/home/acedb/cron/populate_gin_locus.pl - updates information from nameserver nightly dumps
/home/acedb/cron/populate_gin.pl - updates information from WS releases

New (post-nameserver move) Scripts:

/home/postgres/work/pgpopulation/obo_oa_ontologies/ws_updates/populate_pg_from_ws.pl

How this all works

For updating from nightly gene nameserver dumps:

For updating from nightly geneace dumps:

For updating from latest WS release:

There are 2 cronjobs and 3 scripts:

in the acedb account:

cronjob 1 on script 1 (runs daily at 3am PST):

 0 3 * * * /home/acedb/cron/update_ws_tazendra.pl

Script 1 also triggers the 2nd script:

 /home/acedb/cron/dump_from_ws.sh          # dump .ace files from WS to update postgres tables

and writes the date to a timestamp file:

 my $dateOfWsDumpFile = '/home3/acedb/cron/dump_from_ws/files/latestDate';
 `echo "$date" >> $dateOfWsDumpFile`;

in the postgres account:

cronjob 2 on script 3 (runs daily at 5am PST):

 0 5 * * * /home/postgres/work/pgpopulation/obo_oa_ontologies/ws_updates/populate_pg_from_ws.pl

This script generates files to populate some of the gin_ tables, and the obo_ tables for exprcluster, whenever the date in the timestamp file is more recent than the latest timestamp in gin_molname.
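
The freshness check described above (timestamp file date versus latest gin_molname timestamp) can be sketched as follows. This is a minimal illustration, not the logic of populate_pg_from_ws.pl itself. It assumes the timestamp file holds one date per line in YYYYMMDD form with the newest entry last (the file is appended to with >>), and that an empty gin_molname table should trigger a repopulation. The function names are hypothetical.

```python
from datetime import date, datetime
from typing import Optional


def parse_dump_date(file_text: str) -> date:
    """Return the date on the last non-empty line of the timestamp file.

    Assumes one YYYYMMDD date per line; the real file format may differ.
    """
    lines = [ln.strip() for ln in file_text.splitlines() if ln.strip()]
    return datetime.strptime(lines[-1], "%Y%m%d").date()


def should_repopulate(dump_date: date, latest_pg: Optional[datetime]) -> bool:
    """True when the WS dump is newer than the newest gin_molname timestamp.

    latest_pg is None when gin_molname has no rows (assumed to mean: repopulate).
    """
    if latest_pg is None:
        return True
    return dump_date > latest_pg.date()
```

In practice latest_pg would come from a query for the newest timestamp in gin_molname; here it is passed in so the comparison can be checked without a database connection.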

We'll know whether this pipeline works automatically when WS141 comes out (should be mid-to-late December 2013) and cronjob 1 picks it up.

Some Relevant Postgres Queries:

SELECT * FROM gin_dead WHERE gin_dead ~ 'merged' AND gin_dead ~ 'split';

SELECT * FROM gin_dead WHERE gin_dead ~ 'split';
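
The ~ operator in these queries is Postgres's case-sensitive regular-expression match, so the first query returns only rows whose gin_dead value contains both "merged" and "split" in lower case. The same filter can be sketched in Python against a few made-up rows. The row values and gene IDs below are hypothetical and only imitate the kind of text gin_dead might hold.

```python
import re
from typing import List, Tuple


def matches_all(value: str, *patterns: str) -> bool:
    """Case-sensitive regex search for every pattern, like chained `~` with AND."""
    return all(re.search(p, value) is not None for p in patterns)


def select_gin_dead(rows: List[Tuple[str, str]], *patterns: str) -> List[str]:
    """Return the gene IDs whose gin_dead text matches all patterns."""
    return [gene for gene, value in rows if matches_all(value, *patterns)]


# Hypothetical rows: (joinkey, gin_dead value).
example_rows = [
    ("WBGene00000001", "Dead / merged_into WBGene00000002"),
    ("WBGene00000003", "Dead / split_into WBGene00000004 WBGene00000005"),
    ("WBGene00000006", "Dead / merged_into WBGene00000007 / split_into WBGene00000008"),
    ("WBGene00000009", "Suppressed"),
]
```

With these rows, select_gin_dead(example_rows, "merged", "split") picks out only the third gene, and select_gin_dead(example_rows, "split") picks out the second and third.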


Difference between revisions of "WBGene information and status pipeline"
