WBGene information and status pipeline
From WormBaseWiki
Revision as of 18:46, 18 October 2013 by Vanaukenk (talk | contribs) (→Table Summarizing Current/Future Postgres Population)
Table Summarizing Current/Future Postgres Population
AceDB tag | Postgres table | Current - Nameserver nightly dump | Current - WS bimonthly release | Future - Nameserver nightly dump | Future - Geneace nightly dump | Future - WS bimonthly release | Use - Paper or meeting abstract gene connection | Use - OA data type curation | Use - Dumping scripts (could be wrong, but I don't think any gin_ tables are used in dumping scripts since we store WBGene IDs, except maybe gin_dead if people want those suppressed, flagged with some kind of error message, or mapped to Historical_gene or something like that) | Use - Protein2GO data conversion | Use - GSA Markup | Comment |
---|---|---|---|---|---|---|---|---|---|---|---|---|
WBGene identifier | gin_wbgene | Yes | Yes (First line in each entry) | Yes | Yes | Yes |
CGC_name | gin_locus | Yes (CGC) | If it has this tag, gene is considered good (What does 'good' mean?) | Yes (CGC) | No | No | Yes | No |
Other_name | gin_synonyms | Yes (checking for different CGC name - see lines 132-137 in script) | Yes | Yes (checking for different CGC name) | Yes - add to what is already populated from nightly nameserver | No | Yes | No |
Sequence_name | gin_seqname | Yes (Sequence) | No | Yes (Sequence) | No | No | Yes | No |
Status | gin_dead | Yes (0) | only if value is dead and species ~ elegans$ | Yes (0) | only if value is dead in the nameserver nightly; populate with Merged_into and Split_into values | No | Yes | Yes | Yes |
Suppressed | _ |
Merged_into | gin_dead | No | Yes | No | Yes - add only when status is dead from nightly nameserver | No | Historical_gene tag uses this when dumping files |
Split_into | gin_dead | No | Yes | No | Yes - add only when status is dead from nightly nameserver | No |
Corresponding_transcript | gin_sequence | No | Yes | No | No | Yes |
Corresponding_CDS | gin_sequence + gin_seqprot | No | Yes | No | No | Yes |
Corresponding_protein | gin_protein, gin_seqprot | No | Yes | No | No | Yes | Yes, but we'll need isoform data in WB |
Molecular_name | gin_molname | No | Yes | No | No | Yes | No | Maybe |
Species | This could perhaps be used to populate a future species tag for papers, but this is not an immediate need. Other use cases? |
Version_change | No | Yes, to make sure we don't attach GO annotations to pseudogenes. | One use case would be to know when genes change class, e.g. CDS -> Pseudogene. We may not need to actually store this in postgres, though. |
Public_name | gin_wbgene | Yes (but only when no CGC_name or Sequence_name) | If it has this tag, gene is considered good | Don't need (Public_name also in Other_name - confirm this is always the case) | No | Not if also in Other_name | Not if also in Other_name | Not if also in Other_name | No | I think we can now ignore the Public_name tag as long as there's always an Other_name value as well (so if there is no Other_name then we'd look at Public_name? Looking at the script, we're not doing anything with this value) |
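The future rule for gin_dead in the table above (write a row only when the nameserver nightly dump says the gene is dead, then add any Merged_into and Split_into values from the Geneace nightly dump) can be sketched roughly as follows. This is only an illustration in Python, not the logic of the actual Perl scripts: the function name, the stored string format ("Dead / merged_into ... / split_into ...") and the gene IDs in the usage note are assumptions made for the example.

```python
from typing import Optional, Sequence


def build_gin_dead_value(
    nameserver_status: str,
    merged_into: Sequence[str] = (),
    split_into: Sequence[str] = (),
) -> Optional[str]:
    """Return a hypothetical gin_dead value, or None if no row should be written.

    The string layout is an assumption for illustration; the real
    scripts may store these values differently.
    """
    # Rule from the table: only populate gin_dead when the nameserver
    # nightly dump reports the gene as dead.
    if nameserver_status.strip().lower() != "dead":
        return None

    parts = ["Dead"]
    # Merged_into / Split_into are added only for dead genes.
    if merged_into:
        parts.append("merged_into " + " ".join(merged_into))
    if split_into:
        parts.append("split_into " + " ".join(split_into))
    return " / ".join(parts)
```

For example, with placeholder IDs, `build_gin_dead_value("Dead", merged_into=["WBGene00000002"])` gives `"Dead / merged_into WBGene00000002"`, while any status other than dead gives `None` and no row would be written.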
Previous (pre-nameserver move) Scripts:
- /home/acedb/cron/populate_gin_locus.pl - updates information from nameserver nightly dumps
- /home/acedb/cron/populate_gin.pl - updates information from WS releases
New (post-nameserver move) Scripts:
- /home/postgres/work/pgpopulation/obo_oa_ontologies/ws_updates/populate_pg_from_ws.pl
Some Relevant Postgres Queries:
SELECT * FROM gin_dead WHERE gin_dead ~ 'merged' AND gin_dead ~ 'split';
SELECT * FROM gin_dead WHERE gin_dead ~ 'split';
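To make clear what the two queries above select, here is a small Python sketch that applies the same tests to a few made-up gin_dead values. In Postgres, `~` is a case-sensitive regular-expression match, so `gin_dead ~ 'split'` is true for any value containing the lowercase substring "split". The sample values and the function name are hypothetical and are not taken from the real table.

```python
import re


def classify_gin_dead(value: str) -> str:
    """Mirror the 'merged' / 'split' pattern tests used in the queries."""
    # re.search with a plain literal behaves like Postgres `~` here:
    # case-sensitive, matching anywhere in the string.
    has_merged = re.search("merged", value) is not None
    has_split = re.search("split", value) is not None
    if has_merged and has_split:
        return "merged_and_split"  # first query
    if has_split:
        return "split"             # second query (also matches the case above)
    if has_merged:
        return "merged"
    return "neither"


# Hypothetical sample values, for illustration only.
samples = [
    "Dead",
    "Dead / merged_into WBGene00000002",
    "Dead / split_into WBGene00000003 WBGene00000004",
    "Dead / merged_into WBGene00000002 / split_into WBGene00000003",
]
labels = [classify_gin_dead(v) for v in samples]
```

Note that the second query is a superset of the first: every row that matches both 'merged' and 'split' also matches 'split' on its own.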