Textpresso-based automated extraction of concise descriptions
Revision as of 20:08, 27 May 2014
Generating gene sets with and without concise descriptions
Set of genes with a concise description
To query Postgres for all genes with a concise description, use the following tables:
- con_wbgene: Stores the WBGene ID and gene names
- con_desctype: Type of description (relevant for us: Concise_description)
- con_desctext: Text of the concise description
Query for all WBGene IDs that have a concise description (i.e., a non-null entry in both con_desctext and con_desctype):
SELECT DISTINCT(con_wbgene)
FROM con_wbgene
WHERE joinkey IN (SELECT joinkey FROM con_desctext WHERE con_desctext IS NOT NULL)
  AND joinkey IN (SELECT joinkey FROM con_desctype WHERE con_desctype IS NOT NULL)
ORDER BY con_wbgene;
- Number of genes with a concise description (as of May 7, 2014): 6,624
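To check what the query returns without access to the WormBase Postgres server, it can be run against a throwaway copy of the three tables. The sketch below is hypothetical: it uses Python's built-in sqlite3 module with an in-memory database instead of Postgres, infers the column layout (a joinkey plus one value column per table) from the query itself, and fills the tables with made-up rows. It also shows a variant that requires con_desctype = 'Concise_description', in case the table holds other description types (an assumption; the page only notes that Concise_description is the relevant type).

```python
import sqlite3

# Hypothetical toy copy of the three Postgres tables; columns inferred from the query.
SCHEMA = """
CREATE TABLE con_wbgene   (joinkey TEXT, con_wbgene   TEXT);
CREATE TABLE con_desctype (joinkey TEXT, con_desctype TEXT);
CREATE TABLE con_desctext (joinkey TEXT, con_desctext TEXT);
"""

# The query from this page (trailing semicolon omitted).
GENES_WITH_DESCRIPTION = """
SELECT DISTINCT(con_wbgene) FROM con_wbgene
WHERE joinkey IN (SELECT joinkey FROM con_desctext WHERE con_desctext IS NOT NULL)
  AND joinkey IN (SELECT joinkey FROM con_desctype WHERE con_desctype IS NOT NULL)
ORDER BY con_wbgene
"""

# Variant (assumption): only count rows whose type is exactly 'Concise_description'.
GENES_WITH_CONCISE_DESCRIPTION = """
SELECT DISTINCT(con_wbgene) FROM con_wbgene
WHERE joinkey IN (SELECT joinkey FROM con_desctext WHERE con_desctext IS NOT NULL)
  AND joinkey IN (SELECT joinkey FROM con_desctype
                  WHERE con_desctype = 'Concise_description')
ORDER BY con_wbgene
"""


def build_toy_db() -> sqlite3.Connection:
    """Create an in-memory database with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO con_wbgene VALUES (?, ?)",
                     [("1", "WBGene00000001"), ("2", "WBGene00000002"),
                      ("3", "WBGene00000003"), ("4", "WBGene00000004")])
    # joinkey 4 gets a hypothetical non-concise type
    conn.executemany("INSERT INTO con_desctype VALUES (?, ?)",
                     [("1", "Concise_description"), ("2", "Concise_description"),
                      ("3", "Concise_description"), ("4", "Other_type")])
    # joinkey 3 has no description text
    conn.executemany("INSERT INTO con_desctext VALUES (?, ?)",
                     [("1", "text 1"), ("2", "text 2"), ("3", None), ("4", "text 4")])
    return conn


def gene_ids(conn: sqlite3.Connection, sql: str) -> list:
    """Run a query and return the first column as a list."""
    return [row[0] for row in conn.execute(sql)]


if __name__ == "__main__":
    db = build_toy_db()
    print(gene_ids(db, GENES_WITH_DESCRIPTION))
    print(gene_ids(db, GENES_WITH_CONCISE_DESCRIPTION))
```

On the toy rows, the original query returns WBGene00000001, WBGene00000002 and WBGene00000004, while the variant drops WBGene00000004 because its (hypothetical) type is not Concise_description.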
Set of genes with no concise description
Set of genes with no concise description and at least one published paper
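Once the described set is known, both remaining gene sets reduce to set arithmetic. A minimal sketch, assuming three hypothetical inputs that this page does not specify: the full list of WBGene IDs, the IDs returned by the concise-description query, and a mapping from gene ID to associated paper IDs (how that mapping is pulled from Postgres is left open here).

```python
from typing import Dict, Iterable, Set


def genes_without_description(all_genes: Iterable[str],
                              described: Iterable[str]) -> Set[str]:
    """All genes minus those that already have a concise description."""
    return set(all_genes) - set(described)


def undescribed_with_papers(all_genes: Iterable[str],
                            described: Iterable[str],
                            gene_papers: Dict[str, Set[str]]) -> Set[str]:
    """Undescribed genes linked to at least one paper (missing or empty entry = no paper)."""
    undescribed = genes_without_description(all_genes, described)
    return {g for g in undescribed if gene_papers.get(g)}


# Hypothetical toy inputs
ALL = ["WBGene00000001", "WBGene00000002", "WBGene00000003", "WBGene00000004"]
DESCRIBED = ["WBGene00000001"]
PAPERS = {"WBGene00000002": {"WBPaper00000001"}, "WBGene00000003": set()}

if __name__ == "__main__":
    print(sorted(genes_without_description(ALL, DESCRIBED)))
    print(sorted(undescribed_with_papers(ALL, DESCRIBED, PAPERS)))
```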
Semantic categories targeted for extraction from the literature
1. Molecular identity
2. Orthology/Similarity
3. Processes
4. Pathways
5. Mutant Phenotypes
6. Genetic Interaction
7. Physical Interaction
8. Molecular Function
9. Tissue expression (may include life-stage)
10. Sub-cellular localization (may include life-stage)
Data mining (mining data from Postgres and/or Acedb)
1. Molecular identity and 2. Orthology/Similarity (homology, orthology and paralog data)
- Non-Caltech data, curated at Hinxton
- Ace tags (?Gene class): Ortholog_other, Paralog
- Contact: Michael Paulini
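The Hinxton data live in AceDB rather than Postgres. As a minimal sketch, assuming the ?Gene objects are available as text in the usual .ace layout (a `Gene : "ID"` header, one tag per line, blank lines between objects; how the dump is produced is outside this page), the two tags above can be collected per gene. The sample object, its IDs and the values after each tag are hypothetical; real Ortholog_other and Paralog entries may carry extra fields such as evidence, which this parser simply keeps as additional values.

```python
import shlex
from typing import Dict, List, Optional

WANTED_TAGS = ("Ortholog_other", "Paralog")


def parse_gene_tags(ace_text: str, wanted=WANTED_TAGS) -> Dict[str, Dict[str, List[List[str]]]]:
    """Return {gene_id: {tag: [values, ...]}} for the wanted tags of Gene objects.

    Assumes a simple .ace layout; objects of other classes are ignored.
    """
    result: Dict[str, Dict[str, List[List[str]]]] = {}
    current: Optional[str] = None
    for raw in ace_text.splitlines():
        line = raw.strip()
        if not line:
            current = None          # blank line ends the current object
            continue
        if line.startswith("//"):
            continue                # comment line
        tokens = shlex.split(line)  # honours double-quoted values
        if len(tokens) >= 3 and tokens[1] == ":":
            # object header such as: Gene : "WBGene00000001"
            current = tokens[2] if tokens[0] == "Gene" else None
            continue
        if current is not None and tokens[0] in wanted:
            result.setdefault(current, {}).setdefault(tokens[0], []).append(tokens[1:])
    return result


# Hypothetical sample dump
SAMPLE_ACE = '''
Gene : "WBGene00000001"
Paralog "WBGene00000002" "Caenorhabditis elegans"
Ortholog_other "XX:hypothetical_protein_1"
Remark "not one of the wanted tags"

Gene : "WBGene00000002"
Paralog "WBGene00000001" "Caenorhabditis elegans"
'''

if __name__ == "__main__":
    for gene, tags in parse_gene_tags(SAMPLE_ACE).items():
        print(gene, tags)
```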
3. Processes
- Caltech data: GO Biological Process and Topic
- File for GO data (for protein-encoding genes only):
- GO OA, Postgres (PG) table name:
- Contact Person: Kimberly, Ranjana
- Topic OA, PG table name:
- Contact Person: Karen
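The GO data file is not named above. Assuming it is a standard GO Annotation File (GAF 2.x: tab-separated, 17 columns, comment lines starting with `!`), Biological Process annotations (aspect `P`) can be grouped per gene as sketched below; the same function with aspect `F` would cover Molecular Function. The sample rows and IDs are placeholders, and skipping `NOT`-qualified annotations is a design choice of this sketch.

```python
import io
from collections import defaultdict
from typing import Dict, Set, TextIO

# GAF 2.x columns (0-based): DB Object ID, Qualifier, GO ID, Aspect
DB_OBJECT_ID, QUALIFIER, GO_ID, ASPECT = 1, 3, 4, 8


def go_terms_by_gene(handle: TextIO, aspect: str = "P") -> Dict[str, Set[str]]:
    """Map gene ID -> GO IDs for one aspect ('P', 'F' or 'C'), skipping NOT annotations."""
    terms: Dict[str, Set[str]] = defaultdict(set)
    for line in handle:
        line = line.rstrip("\r\n")
        if not line or line.startswith("!"):
            continue  # blank line or GAF header/comment
        cols = line.split("\t")
        if len(cols) <= ASPECT or cols[ASPECT] != aspect:
            continue
        if "NOT" in cols[QUALIFIER].split("|"):
            continue  # negated annotation: gene explicitly not linked to this term
        terms[cols[DB_OBJECT_ID]].add(cols[GO_ID])
    return dict(terms)


def _gaf_row(gene: str, qualifier: str, go_id: str, aspect: str) -> str:
    """Build a 17-column GAF line with only the fields this sketch reads filled in."""
    cols = [""] * 17
    cols[0] = "WB"
    cols[DB_OBJECT_ID], cols[QUALIFIER], cols[GO_ID], cols[ASPECT] = gene, qualifier, go_id, aspect
    return "\t".join(cols)


# Hypothetical sample; gene and GO IDs are placeholders, not real annotations
SAMPLE_GAF = "\n".join([
    "!gaf-version: 2.2",
    _gaf_row("WBGene00000001", "involved_in", "GO:0000001", "P"),
    _gaf_row("WBGene00000001", "enables", "GO:0000002", "F"),
    _gaf_row("WBGene00000002", "NOT|involved_in", "GO:0000003", "P"),
])

if __name__ == "__main__":
    print(go_terms_by_gene(io.StringIO(SAMPLE_GAF), "P"))  # Biological Process
    print(go_terms_by_gene(io.StringIO(SAMPLE_GAF), "F"))  # Molecular Function
```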
4. Pathways
- Caltech data: Pathways
- Pathway OA, PG table name:
- Contact Person: Karen
5. Mutant Phenotypes
- Caltech data
- Phenotype OA, PG table name:
- Contact Person: Karen
6. Genetic Interaction and 7. Physical Interaction
8. Molecular Function
- Caltech data: GO Molecular Function
- File for GO data (for protein-encoding genes only):
- GO OA, PG table name:
- Contact Person: Kimberly, Ranjana
9. Tissue expression and life stage
- Caltech data
- Expression OA, PG table name:
10. Sub-cellular localization
- Caltech data
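The source inventory above can also be kept as a small machine-readable table, which makes it easy to list the entries that still lack a Postgres table name, a resource, or a contact. This is a hypothetical sketch: the field names are illustrative, every value is transcribed from the list above, and `None` marks each value the page leaves blank rather than guessing it. Categories 6 and 7, which have no details yet, are omitted.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class DataSource:
    category: str            # semantic category, numbered as in the list above
    curated_by: str          # "Caltech" or "Hinxton"
    resource: Optional[str]  # OA name or Ace tags; None = not stated on this page
    pg_table: Optional[str]  # Postgres table name; None = still blank on this page
    contact: Optional[str]   # None = no contact listed


SOURCES: List[DataSource] = [
    DataSource("1. Molecular identity and 2. Orthology", "Hinxton",
               "Ace tags: ?Gene Ortholog_other, Paralog", None, "Michael Paulini"),
    DataSource("3. Processes", "Caltech", "GO OA", None, "Kimberly, Ranjana"),
    DataSource("3. Processes", "Caltech", "Topic OA", None, "Karen"),
    DataSource("4. Pathways", "Caltech", "Pathway OA", None, "Karen"),
    DataSource("5. Mutant Phenotypes", "Caltech", "Phenotype OA", None, "Karen"),
    DataSource("8. Molecular Function", "Caltech", "GO OA", None, "Kimberly, Ranjana"),
    DataSource("9. Tissue expression and life stage", "Caltech", "Expression OA", None, None),
    DataSource("10. Sub-cellular localization", "Caltech", None, None, None),
]


def open_items(sources: List[DataSource]) -> List[Tuple[str, str]]:
    """List (category, missing item) for every field the page still leaves blank."""
    items: List[Tuple[str, str]] = []
    for s in sources:
        if s.resource is None:
            items.append((s.category, "resource"))
        elif s.resource.endswith(" OA") and s.pg_table is None:
            # only OA-backed sources are expected to have a Postgres table
            items.append((s.category, f"{s.resource} table name"))
        if s.contact is None:
            items.append((s.category, "contact"))
    return items


if __name__ == "__main__":
    for category, missing in open_items(SOURCES):
        print(f"{category}: {missing}")
```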
Publications related to Text-mining methods
- Ling X, Jiang J, He X, Mei Q, Zhai C, Schatz B. Automatically generating gene summaries from biomedical literature. Pac Symp Biocomput. 2006:40-51. PMID:17094226
- Ling X, Jiang J, He X, Mei Q, Zhai C, Schatz B. Generating gene summaries from biomedical literature: a study of semi-structured summarization. Information Processing and Management. 2007;43:1777-1791.