Textpresso-based automated extraction of concise descriptions

From WormBaseWiki
==Generating gene sets with and without concise descriptions==
 
  
====Set of genes with a concise description====
 
Genes with a concise description are retrieved from Postgres.

Relevant Postgres table names:
 
*con_wbgene: Stores the WBGene ID and gene names
 
*con_desctype: Type of description (relevant for us: Concise_description)
 
*con_desctext: Text of the concise description
 
 
Query for all WBGenes that have a concise description (a non-null entry in both con_desctext and con_desctype):
 
 
 SELECT DISTINCT con_wbgene
 FROM con_wbgene
 WHERE joinkey IN (SELECT joinkey FROM con_desctext WHERE con_desctext IS NOT NULL)
   AND joinkey IN (SELECT joinkey FROM con_desctype WHERE con_desctype IS NOT NULL)
 ORDER BY con_wbgene;
 
 
*Number of genes with a concise description (as of 05.07.2014): 6,624
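The behaviour of the query above can be checked on toy data. The following is a minimal sketch, not the production setup: it uses an in-memory SQLite database in place of Postgres, assumes each table holds a joinkey column plus a value column named after the table (as the query implies), and uses invented gene IDs and description texts. Note that the query only tests that con_desctype is non-null; it does not restrict the type to Concise_description.

```python
import sqlite3

# Same logic as the query on this page, written for SQLite (stand-in for Postgres).
QUERY = """
SELECT DISTINCT con_wbgene
FROM con_wbgene
WHERE joinkey IN (SELECT joinkey FROM con_desctext WHERE con_desctext IS NOT NULL)
  AND joinkey IN (SELECT joinkey FROM con_desctype WHERE con_desctype IS NOT NULL)
ORDER BY con_wbgene;
"""


def build_toy_db():
    """Create toy tables; the (joinkey, value) layout is an assumption."""
    conn = sqlite3.connect(":memory:")
    for table in ("con_wbgene", "con_desctype", "con_desctext"):
        conn.execute(f"CREATE TABLE {table} (joinkey TEXT, {table} TEXT)")
    # Hypothetical rows: gene 1 appears twice, gene 2 has a NULL text,
    # gene 3 has no type or text row at all.
    conn.executemany("INSERT INTO con_wbgene VALUES (?, ?)", [
        ("1", "WBGene00000001"),
        ("2", "WBGene00000002"),
        ("3", "WBGene00000003"),
        ("4", "WBGene00000001"),
        ("5", "WBGene00000005"),
    ])
    conn.executemany("INSERT INTO con_desctype VALUES (?, ?)", [
        ("1", "Concise_description"),
        ("2", "Concise_description"),
        ("4", "Concise_description"),
        ("5", "Concise_description"),
    ])
    conn.executemany("INSERT INTO con_desctext VALUES (?, ?)", [
        ("1", "toy description A"),
        ("2", None),
        ("4", "toy description B"),
        ("5", "toy description C"),
    ])
    return conn


def genes_with_description(conn):
    """Return the sorted, de-duplicated gene IDs selected by QUERY."""
    return [row[0] for row in conn.execute(QUERY)]


if __name__ == "__main__":
    print(genes_with_description(build_toy_db()))
```

Running the same SQL against the real Postgres tables would need a Postgres client and connection details, which are not given here.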
 
 
====Set of genes with no concise description====
 
====Set of genes with no concise description and at least one published paper====
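This page does not yet specify how these two sets are built. One possible approach, sketched below under stated assumptions, is a set difference: start from a list of all gene IDs, remove the genes that already have a concise description, and then keep only genes linked to at least one paper. The function names, the input lists, and the gene-to-paper mapping are all hypothetical; where those inputs would come from is not stated on this page.

```python
def genes_without_description(all_genes, described_genes):
    """Genes present in all_genes but absent from described_genes, sorted."""
    return sorted(set(all_genes) - set(described_genes))


def genes_without_description_with_paper(all_genes, described_genes, gene_to_papers):
    """Subset of the undescribed genes that have at least one associated paper.

    gene_to_papers is a hypothetical dict mapping gene ID -> list of paper IDs.
    """
    return [
        gene
        for gene in genes_without_description(all_genes, described_genes)
        if gene_to_papers.get(gene)  # missing key or empty list counts as no paper
    ]


if __name__ == "__main__":
    # Invented example IDs, for illustration only.
    all_genes = ["WBGene00000001", "WBGene00000002", "WBGene00000003"]
    described = ["WBGene00000001"]
    papers = {"WBGene00000002": ["WBPaper00000010"], "WBGene00000003": []}
    print(genes_without_description(all_genes, described))
    print(genes_without_description_with_paper(all_genes, described, papers))
```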
 
 
==Semantic categories targeted for extraction from the literature==
 
 
1. Molecular identity <br />

2. Orthology/Similarity <br />

3. Processes <br />

4. Pathways <br />

5. Mutant Phenotypes <br />

6. Genetic Interaction <br />

7. Physical Interaction <br />

8. Molecular Function <br />

9. Tissue expression (may include life-stage) <br />

10. Sub-cellular localization (may include life-stage) <br />
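For scripts that need to refer to these categories, the list above could be held in a small lookup table. This is only an illustrative sketch: the variable and function names are hypothetical, and the numbering simply mirrors the list on this page.

```python
# Category numbers and labels copied from the list on this page.
SEMANTIC_CATEGORIES = {
    1: "Molecular identity",
    2: "Orthology/Similarity",
    3: "Processes",
    4: "Pathways",
    5: "Mutant Phenotypes",
    6: "Genetic Interaction",
    7: "Physical Interaction",
    8: "Molecular Function",
    9: "Tissue expression",
    10: "Sub-cellular localization",
}

# Categories the page marks as "may include life-stage".
MAY_INCLUDE_LIFE_STAGE = {9, 10}


def describe_category(number):
    """Return the label for a category number, with the life-stage note if any."""
    label = SEMANTIC_CATEGORIES[number]
    if number in MAY_INCLUDE_LIFE_STAGE:
        label += " (may include life-stage)"
    return label


if __name__ == "__main__":
    for number in sorted(SEMANTIC_CATEGORIES):
        print(number, describe_category(number))
```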
 
 
==Data mining (mining data from Postgres and/or Acedb)==
 
1. Molecular identity and 2. Orthology/Similarity (homology, orthology and paralog data)

*Non-Caltech data, curated at Hinxton
 
*Ace tags: ?Gene Ortholog_other, Paralog
 
*Contact: Michael Paulini
 
 
3. Processes
 
*Caltech data: GO Biological Process and Topic
 
*Two sources: GO data and Topic data
 
*GO OA, Postgres (PG) table name:
 
*Contact Person: Kimberly, Ranjana
 
 
*Topic OA:
 
*OA field: Gene; PG table name: pro_wbgene

*OA field: WBPaper; PG table name: pro_paper
 
*Contact Person: Karen
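The two Topic OA tables listed above could be combined to pair each gene with its paper. The sketch below is an assumption-laden illustration: it supposes that pro_wbgene and pro_paper follow the same (joinkey, value) layout as the con_ tables, that each row holds a single value, and it uses an in-memory SQLite database with invented IDs instead of the real Postgres instance.

```python
import sqlite3

# Join the two Topic tables on joinkey (layout assumed, see lead-in).
TOPIC_QUERY = """
SELECT g.pro_wbgene, p.pro_paper
FROM pro_wbgene AS g
JOIN pro_paper AS p ON p.joinkey = g.joinkey
ORDER BY g.pro_wbgene, p.pro_paper;
"""


def build_toy_topic_db():
    """Create toy Topic tables with hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pro_wbgene (joinkey TEXT, pro_wbgene TEXT)")
    conn.execute("CREATE TABLE pro_paper (joinkey TEXT, pro_paper TEXT)")
    conn.executemany("INSERT INTO pro_wbgene VALUES (?, ?)", [
        ("1", "WBGene00000001"),
        ("2", "WBGene00000002"),
    ])
    conn.executemany("INSERT INTO pro_paper VALUES (?, ?)", [
        ("1", "WBPaper00000010"),
        ("2", "WBPaper00000020"),
        ("3", "WBPaper00000030"),  # no matching gene row, dropped by the join
    ])
    return conn


def topic_gene_paper_pairs(conn):
    """Return (gene, paper) tuples for Topic entries that have both fields."""
    return [(gene, paper) for gene, paper in conn.execute(TOPIC_QUERY)]


if __name__ == "__main__":
    print(topic_gene_paper_pairs(build_toy_topic_db()))
```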
 
 
4. Pathways
 
*Caltech data: Pathways
 
*Pathway OA, PG Table name:
 
*Contact Person: Karen
 
 
5. Mutant Phenotypes
 
*Caltech data
 
*Phenotype OA, PG table name:
 
*Contact Person: Karen
 
 
6. Genetic Interaction and 7. Physical Interaction
 
 
8. Molecular Function
 
*Caltech data: GO Molecular Function
 
*File for GO data (for protein-encoding genes only):
 
*GO OA, PG table name:
 
*Contact Person: Kimberly, Ranjana
 
 
9. Tissue expression and life stage
 
*Caltech data
 
*Expression OA, PG table name:
 
 
10. Sub-cellular localization
 
*Caltech data
 
 
 
 
 
 
 
 
====Publications related to text-mining methods====
 
*Automatically generating gene summaries from biomedical literature.
 
 
Ling X, Jiang J, He X, Mei Q, Zhai C, Schatz B.
 
 
Pac Symp Biocomput. 2006:40-51.
 
 
PMID:17094226
 
 
*Generating gene summaries from biomedical literature: A study of semi-structured summarization
 
 
Ling X, Jiang J, He X, Mei Q, Zhai C, Schatz B.
 
 
Information Processing and Management 43 (2007) 1777–1791
 
 
 
Back To [[Concise Descriptions]]
 

Latest revision as of 23:01, 10 September 2014