Difference between revisions of "WBGene information and status pipeline"

From WormBaseWiki

Revision as of 19:48, 25 September 2013

Table Summarizing Current/Future Postgres Population

AceDB tag Postgres table Current - Nameserver nightly dump Current - WS bimonthly release Future - Geneace nightly dump Future - WS bimonthly release Use - Paper or meeting abstract gene connection Use - OA data type curation Use - Dumping scripts (could be wrong, but I don't think any gin_ tables are used in dumping scripts since we store WBGene IDs, except maybe gin_dead if people want those suppressed, or to have some kind of error message, or to map to Historical_gene or something like that) Use - Protein2GO data conversion Use - GSA Markup Comment
WBGene identifier gin_wbgene Yes Yes Yes Yes Yes
CGC_name gin_locus Yes If it has this tag, gene is considered good (What does 'good' mean?) Yes Yes Yes No
Other_name gin_synonyms No Yes Yes No Yes Yes No
Sequence_name gin_seqname Yes No Yes Yes Yes No
Status gin_dead Yes only if value is dead and species ~ elegans$ Yes only if value is dead Yes Yes Yes Yes
Merged_into gin_dead No Yes Yes No Historical_gene tag?
Split_into gin_dead No Yes Yes No Historical_gene tag?
Corresponding_transcript gin_sequence No Yes No Yes Confirm
Corresponding_CDS gin_sequence + gin_seqprot No Yes No Yes Confirm
Corresponding_protein gin_protein, gin_seqprot No Yes No Yes Confirm Yes, but we'll need isoform data in WB
Molecular_name gin_molname No Yes No Yes Yes No Maybe
Species This could perhaps be used to populate a future species tag for papers, but this is not an immediate need. Other use cases?
Version_change No Yes, to make sure we don't attach GO annotations to pseudogenes. One use case would be to know when genes change class, e.g. CDS -> Pseudogene. We may not need to actually store this in postgres, though.
Public_name gin_wbgene Yes (but only when no CGC_name or Sequence_name) If it has this tag, gene is considered good Don't need (Public_name also in Other_name - confirm this is always the case) No Not if also in Other_name Not if also in Other_name Not if also in Other_name No I think we can now ignore the Public_name tag as long as there's always an Other_name value as well, so if there is no Other_name then we'd look at Public_name? (Looking at the script, we're not doing anything with this value.)
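
Tag-to-table lookup (sketch):

The first two columns of the table above (AceDB tag, Postgres table) can be written down as a simple lookup, which may be handy when checking which tags feed a given gin_ table. This is only a sketch: the dictionary is copied from the table as it stands in this revision, and the helper names tables_for_tag and tags_for_table are made up for illustration and are not part of the existing scripts.

```python
# Tag -> Postgres table(s), copied from the first two columns of the summary table.
# An empty list means the table lists no Postgres table for that tag.
TAG_TO_TABLES = {
    "WBGene identifier": ["gin_wbgene"],
    "CGC_name": ["gin_locus"],
    "Other_name": ["gin_synonyms"],
    "Sequence_name": ["gin_seqname"],
    "Status": ["gin_dead"],
    "Merged_into": ["gin_dead"],
    "Split_into": ["gin_dead"],
    "Corresponding_transcript": ["gin_sequence"],
    "Corresponding_CDS": ["gin_sequence", "gin_seqprot"],
    "Corresponding_protein": ["gin_protein", "gin_seqprot"],
    "Molecular_name": ["gin_molname"],
    "Species": [],
    "Version_change": [],
    "Public_name": ["gin_wbgene"],
}


def tables_for_tag(tag):
    """Return the Postgres tables listed for an AceDB tag (hypothetical helper)."""
    return list(TAG_TO_TABLES.get(tag, []))


def tags_for_table(table):
    """Return the AceDB tags that populate a given table (hypothetical helper)."""
    return [tag for tag, tables in TAG_TO_TABLES.items() if table in tables]
```

For example, tags_for_table("gin_dead") lists Status, Merged_into and Split_into, which is why a single gin_dead value may mention more than one of them.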

Current Scripts:

  1. /home/acedb/cron/populate_gin_locus.pl
  2. /home/acedb/cron/populate_gin.pl


New Scripts:

Some Relevant Postgres Queries:

SELECT * FROM gin_dead WHERE gin_dead ~ 'merged' AND gin_dead ~ 'split';

SELECT * FROM gin_dead WHERE gin_dead ~ 'split';
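
What the two queries select (sketch):

The Postgres ~ operator is a case-sensitive, unanchored regular-expression match, so the first query returns rows whose gin_dead value mentions both 'merged' and 'split', and the second returns every row that mentions 'split' (including the ones from the first query). The Python below mimics that filtering in memory. The sample rows, the gene_id field and the value strings are all hypothetical, invented only to show the filter; the real format of gin_dead values should be checked in postgres.

```python
import re

# HYPOTHETICAL rows of (gene_id, gin_dead value); not real WormBase data.
SAMPLE_ROWS = [
    ("00000001", "Dead"),
    ("00000002", "merged_into WBGene00000003"),
    ("00000004", "split_into WBGene00000005"),
    ("00000006", "merged_into WBGene00000007 / split_into WBGene00000008"),
]


def pg_regex_match(value, pattern):
    """Approximate Postgres `value ~ pattern` for simple literal patterns:
    case-sensitive and matching anywhere in the string."""
    return re.search(pattern, value) is not None


def merged_and_split(rows):
    """Rows matching: gin_dead ~ 'merged' AND gin_dead ~ 'split'."""
    return [r for r in rows
            if pg_regex_match(r[1], "merged") and pg_regex_match(r[1], "split")]


def any_split(rows):
    """Rows matching: gin_dead ~ 'split'."""
    return [r for r in rows if pg_regex_match(r[1], "split")]
```

Because the match is case-sensitive, a value written as 'Split_into ...' would not be caught by ~ 'split'; the ~* operator would be the case-insensitive alternative in Postgres.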

