WBGene information and status pipeline
From WormBaseWiki
Revision as of 19:48, 25 September 2013
Table Summarizing Current/Future Postgres Population
AceDB tag | Postgres table | Current - Nameserver nightly dump | Current - WS bimonthly release | Future - Geneace nightly dump | Future - WS bimonthly release | Use - Paper or meeting abstract gene connection | Use - OA data type curation | Use - Dumping scripts (could be wrong, but I don't think any gin_ tables are used in dumping scripts, since we store WBGene IDs; the exception may be gin_dead, if people want dead genes suppressed, flagged with some kind of error message, or mapped to Historical_gene or something like that) | Use - Protein2GO data conversion | Use - GSA Markup | Comment |
---|---|---|---|---|---|---|---|---|---|---|---|
WBGene identifier | gin_wbgene | Yes | Yes | Yes | Yes | Yes | |||||
CGC_name | gin_locus | Yes | If it has this tag, gene is considered good (What does 'good' mean?) | Yes | Yes | Yes | No | ||||
Other_name | gin_synonyms | No | Yes | Yes | No | Yes | Yes | No | |||
Sequence_name | gin_seqname | Yes | No | Yes | Yes | Yes | No | ||||
Status | gin_dead | Yes | only if value is dead and species ~ elegans$ | Yes | only if value is dead | Yes | Yes | Yes | Yes | ||
Merged_into | gin_dead | No | Yes | Yes | No | Historical_gene tag? | |||||
Split_into | gin_dead | No | Yes | Yes | No | Historical_gene tag? | |||||
Corresponding_transcript | gin_sequence | No | Yes | No | Yes | Confirm | |||||
Corresponding_CDS | gin_sequence + gin_seqprot | No | Yes | No | Yes | Confirm | |||||
Corresponding_protein | gin_protein, gin_seqprot | No | Yes | No | Yes | Confirm | Yes, but we'll need isoform data in WB | ||||
Molecular_name | gin_molname | No | Yes | No | Yes | Yes | No | Maybe | |||
Species | This could perhaps be used to populate a future species tag for papers, but this is not an immediate need. Other use cases? | ||||||||||
Version_change | No | Yes, to make sure we don't attach GO annotations to pseudogenes. | One use case would be to know when genes change class, e.g. CDS -> Pseudogene. We may not need to actually store this in postgres, though. | ||||||||
Public_name | gin_wbgene | Yes (but only when no CGC_name or Sequence_name) | If it has this tag, gene is considered good | Don't need (Public_name also in Other_name - confirm this is always the case) | No | Not if also in Other_name | Not if also in Other_name | Not if also in Other_name | No | I think we can now ignore the Public_name tag as long as there's always an Other_name value as well; if there is no Other_name, would we then look at Public_name? (Looking at the script, we're not doing anything with this value.) |
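Two of the rules in the table can be sketched in code: the gin_dead filter from the Status row (currently "dead and species ~ elegans$", in future just "dead") and the Public_name fallback (used only when there is no CGC_name or Sequence_name). This is a minimal Python sketch, not the logic of the actual population scripts. The dict-per-gene record layout, the function names, the case-insensitive comparison of the Status value, and the preference of CGC_name over Sequence_name are all assumptions made for illustration.

```python
import re

# Assumption: each gene is a dict mapping AceDB tag names to string values.
ELEGANS = re.compile(r"elegans$")  # mirrors the "species ~ elegans$" condition


def belongs_in_gin_dead(gene, require_elegans=True):
    """Decide whether a gene should get a gin_dead entry.

    require_elegans=True  -> "only if value is dead and species ~ elegans$"
    require_elegans=False -> "only if value is dead"
    """
    # Assumption: the Status value is compared case-insensitively.
    if gene.get("Status", "").lower() != "dead":
        return False
    if require_elegans:
        return ELEGANS.search(gene.get("Species", "")) is not None
    return True


def fallback_name(gene):
    """Use Public_name only when neither CGC_name nor Sequence_name exists.

    Assumption: CGC_name is preferred over Sequence_name when both exist.
    """
    return gene.get("CGC_name") or gene.get("Sequence_name") or gene.get("Public_name")


# Hypothetical records, for illustration only.
dead_elegans = {"Status": "Dead", "Species": "Caenorhabditis elegans"}
dead_other = {"Status": "Dead", "Species": "Caenorhabditis briggsae"}
live_elegans = {"Status": "Live", "Species": "Caenorhabditis elegans"}
```

With these hypothetical records, `belongs_in_gin_dead(dead_other)` is false under the species-restricted rule but true with `require_elegans=False`, which is the difference between the two Status columns in the table.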
Current Scripts:
- /home/acedb/cron/populate_gin_locus.pl
- /home/acedb/cron/populate_gin.pl
New Scripts:
Some Relevant Postgres Queries:
SELECT * FROM gin_dead WHERE gin_dead ~ 'merged' AND gin_dead ~ 'split';
SELECT * FROM gin_dead WHERE gin_dead ~ 'split';
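The Postgres `~` operator used in the queries above is a case-sensitive, unanchored regular-expression match, so the same classification can be reproduced outside the database. The Python sketch below shows one way to do that. The function name and the sample strings are hypothetical, because the actual format of the values stored in gin_dead is not shown here.

```python
import re


def classify_dead_value(value):
    """Label a gin_dead value the way the two queries above would match it.

    re.search is case-sensitive and unanchored, like Postgres `~`.
    """
    merged = re.search("merged", value) is not None
    split = re.search("split", value) is not None
    if merged and split:
        return "merged and split"  # matched by the first query
    if split:
        return "split"  # matched by the second query only
    if merged:
        return "merged"
    return "other"


# Hypothetical values, for illustration only.
samples = [
    "merged_into GENE_A",
    "split_into GENE_B",
    "merged_into GENE_A split_into GENE_B",
    "Dead",
]
labels = [classify_dead_value(s) for s in samples]
```

Note that any row matched by the first query is also matched by the second, since the first adds a condition rather than replacing one.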
Back to Caltech documentation