Textpresso-based automated extraction of concise descriptions
Revision as of 22:21, 27 May 2014
Generating gene sets with and without concise descriptions
Set of genes with a concise description
Query Postgres for all genes with a concise description. Relevant Postgres table names:
- con_wbgene: Stores the WBGene ID and gene names
- con_desctype: Type of description (relevant for us: Concise_description)
- con_desctext: Text of the concise description
Query for all WBGenes that have a concise description (in con_desctext AND con_desctype):
SELECT DISTINCT(con_wbgene)
FROM con_wbgene
WHERE joinkey IN (SELECT joinkey FROM con_desctext WHERE con_desctext IS NOT NULL)
  AND joinkey IN (SELECT joinkey FROM con_desctype WHERE con_desctype IS NOT NULL)
ORDER BY con_wbgene;
- Number of genes with a concise description (as of 05.07.2014): 6,624
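The query above can be tried out without access to the curation database. The following is a minimal sketch in Python that runs the same SQL against an in-memory SQLite database standing in for Postgres. The column layout (a joinkey column plus one value column named after its table) and the sample rows are assumptions made for illustration, not the real schema or data.

```python
import sqlite3

# Same query as in the text; only the line breaks differ.
QUERY = """
SELECT DISTINCT(con_wbgene)
FROM con_wbgene
WHERE joinkey IN (SELECT joinkey FROM con_desctext WHERE con_desctext IS NOT NULL)
  AND joinkey IN (SELECT joinkey FROM con_desctype WHERE con_desctype IS NOT NULL)
ORDER BY con_wbgene;
"""


def genes_with_description(conn):
    """Return gene IDs that have both a description text and a description type."""
    return [row[0] for row in conn.execute(QUERY)]


def build_toy_database():
    """Create an in-memory stand-in for the three con_* tables (assumed layout)."""
    conn = sqlite3.connect(":memory:")
    for table in ("con_wbgene", "con_desctype", "con_desctext"):
        # Assumption: each table holds a joinkey and a value column named after the table.
        conn.execute(f"CREATE TABLE {table} (joinkey TEXT, {table} TEXT)")
    # Placeholder rows, not real curation data.
    conn.executemany(
        "INSERT INTO con_wbgene VALUES (?, ?)",
        [("1", "WBGene00000001"), ("2", "WBGene00000002"), ("3", "WBGene00000003")],
    )
    conn.executemany(
        "INSERT INTO con_desctype VALUES (?, ?)",
        [("1", "Concise_description"), ("2", "Concise_description"), ("3", None)],
    )
    conn.executemany(
        "INSERT INTO con_desctext VALUES (?, ?)",
        [("1", "placeholder text"), ("2", None), ("3", "placeholder text")],
    )
    return conn


if __name__ == "__main__":
    print(genes_with_description(build_toy_database()))
```

In the toy data only the first gene has both a non-null text and a non-null type, so it is the only one returned. Note that the query as written accepts any non-null description type; restricting it to Concise_description would need an extra condition on con_desctype.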
Set of genes with no concise description
Set of genes with no concise description and at least one published paper
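One possible way to derive these two gene sets is plain set arithmetic on top of the list of genes that already have a concise description. The sketch below is in Python and rests on assumptions: all_genes (every WBGene ID under consideration), described (the genes returned by the concise description query) and papers_by_gene (a mapping from gene ID to its published papers) are hypothetical inputs whose sources are not specified on this page.

```python
def split_gene_sets(all_genes, described, papers_by_gene):
    """Return (genes without a description, those of them with at least one paper).

    All three inputs are hypothetical; where they come from is not defined here.
    """
    undescribed = set(all_genes) - set(described)
    # Keep only undescribed genes that map to a non-empty collection of papers.
    undescribed_with_paper = {g for g in undescribed if papers_by_gene.get(g)}
    return undescribed, undescribed_with_paper


if __name__ == "__main__":
    # Placeholder identifiers, not real curation data.
    no_desc, no_desc_with_paper = split_gene_sets(
        all_genes=["geneA", "geneB", "geneC"],
        described=["geneA"],
        papers_by_gene={"geneB": ["paper1"], "geneC": []},
    )
    print(sorted(no_desc), sorted(no_desc_with_paper))
```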
Semantic categories in a Concise Description
1. Molecular identity
2. Orthology/Similarity
3. Processes
4. Pathways
5. Mutant Phenotypes
6. Genetic Interaction
7. Physical Interaction
8. Molecular Function
9. Tissue expression (may include life-stage)
10. Sub-cellular localization (may include life-stage)
Template for a Concise Description
- Molecular identity: <Gene> encodes .....
- Orthology/Similarity: <Gene> is (orthologous, similar) to .....
- Process/Pathway: <Gene> is (required, functions, regulates, is involved in, is part of) ....., as mutants of <gene> exhibit <phenotypes>
- Genetic interaction with respect to Process or Pathway: <Gene> interacts genetically with <gene1, gene2> ..... in <Process, Pathway>
- Physical interaction: <Protein> physically interacts with (protein, DNA, RNA) .....
- Molecular Function: <Protein> has ..... activity in (in vitro, in vivo) assays
- Tissue Expression: <Gene/Protein> is expressed in ..... and expression in ..... is (positively, negatively) regulated by <Gene/Protein> .....
- Sub-cellular localization: <Protein> is localized to <cellular component> and expression in <cellular component> is <positively, negatively> regulated by .....
Note: Not all descriptions may follow the exact order or choice of words.
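To illustrate how such a template might be filled automatically, here is a minimal Python sketch that joins one clause per semantic category, skipping categories for which no data is available. The slot names, the simplified sentence patterns (one fixed verb per category) and the example gene are assumptions made for illustration; they are not an agreed format.

```python
# Simplified patterns adapted from the template; slot names are hypothetical.
TEMPLATES = [
    ("molecular_identity", "{gene} encodes {molecular_identity}"),
    ("orthology", "{gene} is orthologous to {orthology}"),
    ("process", "{gene} is involved in {process}"),
    ("molecular_function", "{protein} has {molecular_function} activity"),
    ("tissue_expression", "{gene} is expressed in {tissue_expression}"),
    ("localization", "{protein} is localized to {localization}"),
]


def render_description(gene, protein, facts):
    """Build a description from the categories present in `facts`, in template order."""
    clauses = []
    for slot, pattern in TEMPLATES:
        value = facts.get(slot)
        if value:  # skip categories with no data
            clauses.append(pattern.format(gene=gene, protein=protein, **{slot: value}))
    return "; ".join(clauses) + "." if clauses else ""


if __name__ == "__main__":
    # "xyz-1" is a made-up gene name used only as a placeholder.
    print(render_description(
        "xyz-1", "XYZ-1",
        {"molecular_identity": "a hypothetical kinase", "tissue_expression": "neurons"},
    ))
```

Keeping the patterns in an ordered list means the clause order follows the template, while missing categories simply drop out, in line with the note that not every description follows the exact order or wording.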
Data mining (mining data from Postgres and/or Acedb) for the semantic categories
1. Molecular identity and Orthology data (Homology, Orthology and Paralog data)
- Non-Caltech data, curated at Hinxton
- Ace tags: ?Gene Ortholog_other, Paralog
- Contact: Michael Paulini
3. Processes
- Caltech data: GO Biological Process and Topic
- 2 sources: GO data and Topic data
- GO OA, Postgres (PG) table name:
- Contact Person: Kimberly, Ranjana
- Topic OA:
- OA field: Gene, PG table name: pro_wbgene
- OA field: WBPaper, PG table name: pro_paper
- Contact Person: Karen
4. Pathway (No database source for now?)
5. Mutant Phenotypes
- Caltech data
- Phenotype OA, PG table name:
- Contact Person: Karen
(Maybe leave out mutant phenotypes, as it is hard to tell which mutant phenotypes go with which processes?)
6. Genetic Interaction and 7. Physical Interaction
8. Molecular Function
- Caltech data: GO Molecular Function
- File for GO data (for protein-encoding genes only):
- GO OA, PG table name:
- Contact Person: Kimberly, Ranjana
9. Tissue expression and life stage
- Caltech data
- Expression OA, PG table name:
10. Sub-cellular localization
- Caltech data