Textpresso-based automated extraction of concise descriptions

From WormBaseWiki
 
Patterns of occurrence

('....' stands for an arbitrary run of words.)

Molecular identity
<Gene> encodes ....

Orthology/Similarity
<Gene> is (orthologous, similar) to ....

Process/Pathway
<Gene> (is required, functions, regulates, is involved in, is part of) ....

Genetic interaction with respect to Process or Pathway
<Gene> interacts genetically with (gene1, gene2) in <Process, Pathway>

Physical interaction
<Protein> physically interacts with (<protein>, DNA, RNA)

Molecular Function
<Protein> has .... activity in (in vitro, in vivo) assays

Tissue Expression
<Gene/Protein> is expressed in .... and expression in .... is (positively, negatively) regulated by <Gene/Protein>

Sub-cellular localization
<Gene/Protein> is localized to <cellular component> and expression in <cellular component> is (positively, negatively) regulated by ....
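The patterns of occurrence above read as sentence templates. The Python sketch below compiles a few of them into regular expressions and tags a sentence with every category whose template it fits. It is illustrative only and is not Textpresso's actual implementation. The `PATTERNS` table, the `match_categories` name, and the example sentences (including the gene name gene-1) are all hypothetical. In each regex, '....' becomes a lazy capture group and a parenthesized list of alternatives becomes a regex alternation.

```python
import re

# Hypothetical regexes for four of the templates above (not Textpresso's own).
# '....' -> lazy capture group; "(a, b, c)" alternatives -> (?:a|b|c).
PATTERNS = {
    "Molecular identity": r"^(?P<gene>\S+) encodes (?P<rest>.+?)\.?$",
    "Orthology/Similarity": r"^(?P<gene>\S+) is (?:orthologous|similar) to (?P<rest>.+?)\.?$",
    "Process/Pathway": (r"^(?P<gene>\S+) (?:is required|functions|regulates"
                        r"|is involved in|is part of)\b(?P<rest>.*?)\.?$"),
    "Physical interaction": r"^(?P<gene>\S+) physically interacts with (?P<rest>.+?)\.?$",
}

COMPILED = {cat: re.compile(rx) for cat, rx in PATTERNS.items()}


def match_categories(sentence: str) -> list[tuple[str, str, str]]:
    """Return (category, gene, remainder) for every template the sentence fits."""
    hits = []
    for category, regex in COMPILED.items():
        m = regex.match(sentence.strip())
        if m:
            hits.append((category, m.group("gene"), m.group("rest").strip()))
    return hits


if __name__ == "__main__":
    print(match_categories("gene-1 encodes a protein kinase."))
    # [('Molecular identity', 'gene-1', 'a protein kinase')]
```

Templates with two slots, such as Tissue Expression, would need one capture group per '....'. The single-slot case is kept here to stay short.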
 

Revision as of 19:10, 13 May 2014

Generating gene sets with and without concise descriptions

Set of genes with a concise description

The Postgres tables relevant to querying all genes with a concise description are:

  • con_wbgene: Stores the WBGene ID and gene names
  • con_desctype: Type of description (the relevant type here is Concise_description)
  • con_desctext: Text of the concise description

Query for all WBGenes that have a concise description, i.e. a non-null entry in both con_desctext and con_desctype:

SELECT DISTINCT con_wbgene
  FROM con_wbgene
 WHERE joinkey IN (SELECT joinkey FROM con_desctext WHERE con_desctext IS NOT NULL)
   AND joinkey IN (SELECT joinkey FROM con_desctype WHERE con_desctype IS NOT NULL)
 ORDER BY con_wbgene;
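The query above keeps a WBGene only if its joinkey has non-null rows in both con_desctext and con_desctype. The sketch below runs the same query, reformatted, against a toy in-memory SQLite copy of the three tables. It shows that a NULL description text excludes a gene and that DISTINCT collapses genes with several description rows into one. The two-column (joinkey, value) layout and all sample rows are hypothetical. The real Postgres tables may carry additional columns.

```python
import sqlite3

# Toy, hypothetical mirror of the three Postgres tables (one row per joinkey).
SCHEMA = """
CREATE TABLE con_wbgene   (joinkey TEXT, con_wbgene   TEXT);
CREATE TABLE con_desctype (joinkey TEXT, con_desctype TEXT);
CREATE TABLE con_desctext (joinkey TEXT, con_desctext TEXT);
"""

# The query from the page; SQLite accepts it unchanged.
QUERY = """
SELECT DISTINCT con_wbgene
  FROM con_wbgene
 WHERE joinkey IN (SELECT joinkey FROM con_desctext WHERE con_desctext IS NOT NULL)
   AND joinkey IN (SELECT joinkey FROM con_desctype WHERE con_desctype IS NOT NULL)
 ORDER BY con_wbgene;
"""


def genes_with_description(conn: sqlite3.Connection) -> list[str]:
    """Return the sorted, de-duplicated WBGene IDs selected by QUERY."""
    return [row[0] for row in conn.execute(QUERY)]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Hypothetical rows: joinkeys 1 and 2 describe the same gene;
# joinkey 3 has a NULL description text, so its gene is excluded.
conn.executemany("INSERT INTO con_wbgene VALUES (?, ?)",
                 [("1", "WBGene00000002"), ("2", "WBGene00000002"),
                  ("3", "WBGene00000003"), ("4", "WBGene00000001")])
conn.executemany("INSERT INTO con_desctype VALUES (?, ?)",
                 [(k, "Concise_description") for k in ("1", "2", "3", "4")])
conn.executemany("INSERT INTO con_desctext VALUES (?, ?)",
                 [("1", "text A"), ("2", "text B"), ("3", None), ("4", "text C")])

print(genes_with_description(conn))
# ['WBGene00000001', 'WBGene00000002']
```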

  • Number of genes with a concise description (as of 7 May 2014): 6,624

Set of genes with no concise description

Set of genes with no concise description and at least one published paper
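Both of these sets can be derived from the described-gene list by set difference. The sketch below assumes that the full gene list and a gene-to-paper mapping have already been exported from the database. The page does not say which tables supply them. The function names, the gene IDs, and the paper ID are hypothetical placeholders.

```python
def genes_without_description(all_genes: set[str], described: set[str]) -> set[str]:
    """Genes that lack a concise description."""
    return all_genes - described


def undescribed_with_papers(all_genes: set[str], described: set[str],
                            papers_by_gene: dict[str, set[str]]) -> set[str]:
    """Undescribed genes linked to at least one published paper."""
    return {g for g in genes_without_description(all_genes, described)
            if papers_by_gene.get(g)}  # empty or missing paper set -> excluded


# Hypothetical example data.
all_genes = {"WBGene00000001", "WBGene00000002", "WBGene00000003", "WBGene00000004"}
described = {"WBGene00000001", "WBGene00000002"}
papers_by_gene = {"WBGene00000003": {"WBPaper00000001"}, "WBGene00000004": set()}

print(sorted(genes_without_description(all_genes, described)))
print(sorted(undescribed_with_papers(all_genes, described, papers_by_gene)))
```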

Semantic categories targeted for extraction from the literature

1. Molecular identity
2. Orthology/Similarity
3. Processes
4. Pathways
5. Mutant Phenotypes
6. Genetic Interaction
7. Physical Interaction
8. Molecular Function
9. Tissue expression (may include life-stage)
10. Sub-cellular localization (may include life-stage)
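Because the categories form a fixed, ordered list, one possible use is to bucket extracted sentences by category and emit them in list order as a draft description. The sketch below assumes that each sentence has already been tagged with exactly one category. The tagging step, the `draft_description` name, the `per_category` cap, and the example sentences are all hypothetical. Nothing here is the method used on this page.

```python
from collections import defaultdict

# The ten target categories, in the order listed above.
CATEGORIES = [
    "Molecular identity",
    "Orthology/Similarity",
    "Processes",
    "Pathways",
    "Mutant Phenotypes",
    "Genetic Interaction",
    "Physical Interaction",
    "Molecular Function",
    "Tissue expression",
    "Sub-cellular localization",
]
RANK = {name: i for i, name in enumerate(CATEGORIES)}


def draft_description(tagged: list[tuple[str, str]], per_category: int = 1) -> str:
    """Join tagged sentences in category order, keeping at most `per_category`
    sentences per category; sentences with unknown categories are skipped."""
    buckets: dict[str, list[str]] = defaultdict(list)
    for category, sentence in tagged:
        if category in RANK and len(buckets[category]) < per_category:
            buckets[category].append(sentence)
    ordered = sorted(buckets, key=RANK.__getitem__)
    return " ".join(s for cat in ordered for s in buckets[cat])


tagged = [
    ("Tissue expression", "gene-1 is expressed in the pharynx."),
    ("Molecular identity", "gene-1 encodes a protein kinase."),
    ("Molecular identity", "gene-1 encodes a serine/threonine kinase."),
    ("Processes", "gene-1 is required for vulval development."),
]
print(draft_description(tagged))
```

Capping sentences per category keeps the draft concise. Sorting by the list index keeps its structure predictable regardless of the order in which sentences were extracted.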

Publications related to text-mining methods

  • Automatically generating gene summaries from biomedical literature.
    Ling X, Jiang J, He X, Mei Q, Zhai C, Schatz B.
    Pac Symp Biocomput. 2006:40-51. PMID:17094226

  • Generating gene summaries from biomedical literature: A study of semi-structured summarization.
    Ling X, Jiang J, He X, Mei Q, Zhai C, Schatz B.
    Information Processing and Management. 2007;43:1777-1791.