Textpresso-based automated extraction of concise descriptions

From WormBaseWiki
Revision as of 16:16, 28 May 2014

Generating gene sets with and without concise descriptions

Set of genes with a concise description

Genes with a concise description can be queried from Postgres. Relevant Postgres table names:

  • con_wbgene: Stores the WBGene ID and gene names
  • con_desctype: Type of description (relevant for us: Concise_description)
  • con_desctext: Text of the concise description

Query for all WBGenes that have a concise description (in con_desctext AND con_desctype):

SELECT DISTINCT(con_wbgene)
  FROM con_wbgene
  WHERE joinkey IN (SELECT joinkey FROM con_desctext WHERE con_desctext IS NOT NULL)
    AND joinkey IN (SELECT joinkey FROM con_desctype WHERE con_desctype IS NOT NULL)
  ORDER BY con_wbgene;

  • Number of genes with a concise description (as of 05.07.2014): 6,624
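The query above can be tried out on a toy database before running it against the curation Postgres. The following is a minimal sketch, assuming each table has a joinkey column plus a data column named after the table (as the query implies); it uses an in-memory SQLite database as a stand-in for Postgres, and the gene IDs, the description text and the helper names build_toy_db and genes_with_description are hypothetical.

```python
import sqlite3

# The query from the text, unchanged apart from line breaks.
QUERY = """
SELECT DISTINCT(con_wbgene)
  FROM con_wbgene
  WHERE joinkey IN (SELECT joinkey FROM con_desctext WHERE con_desctext IS NOT NULL)
    AND joinkey IN (SELECT joinkey FROM con_desctype WHERE con_desctype IS NOT NULL)
  ORDER BY con_wbgene;
"""


def build_toy_db():
    """Create an in-memory stand-in for the three con_* tables (toy data only)."""
    conn = sqlite3.connect(":memory:")
    # Assumed layout: a joinkey column plus a data column named after the table.
    for table in ("con_wbgene", "con_desctype", "con_desctext"):
        conn.execute(f"CREATE TABLE {table} (joinkey TEXT, {table} TEXT)")
    conn.executemany(
        "INSERT INTO con_wbgene VALUES (?, ?)",
        [("1", "WBGene99999901"), ("2", "WBGene99999902"), ("3", "WBGene99999903")],
    )
    conn.executemany(
        "INSERT INTO con_desctype VALUES (?, ?)",
        [("1", "Concise_description"), ("2", "Concise_description"), ("3", None)],
    )
    conn.executemany(
        "INSERT INTO con_desctext VALUES (?, ?)",
        [("1", "toy description"), ("2", None), ("3", "toy description")],
    )
    return conn


def genes_with_description(conn):
    """Return gene IDs that have both a description type and a description text."""
    return [row[0] for row in conn.execute(QUERY)]


if __name__ == "__main__":
    print(genes_with_description(build_toy_db()))
```

Note that the query only requires con_desctype to be non-NULL. If other description types share these tables, a filter such as con_desctype = 'Concise_description' may be needed; that is not confirmed here.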

Set of genes with no concise description

Set of genes with no concise description and at least one published paper

Semantic categories in a Concise Description

1. Molecular identity
2. Orthology/Similarity
3. Processes
4. Pathways
5. Mutant Phenotypes
6. Genetic Interaction
7. Physical Interaction
8. Molecular Function
9. Tissue expression (may include life-stage)
10. Sub-cellular localization (may include life-stage)

Template for a Concise Description

Molecular identity
<Gene> encodes .....;
Orthology/Similarity
<Gene> is (orthologous, similar) to .....;
Process/Pathway
<Gene> (is required for, functions in, regulates, is involved in, is part of) ....., as mutants of <gene> exhibit <phenotypes>;
Genetic interaction with respect to Process or Pathway
<Gene> interacts genetically with <gene1, gene2> ..... in <Process, Pathway>;
Physical interaction
<Protein> physically interacts with (protein, DNA, RNA) .....;
Molecular Function
<Protein> has ..... activity in (in vitro, in vivo) assays;
Tissue Expression
<Gene/Protein> is expressed in ..... and expression in ..... is (positively, negatively) regulated by <Gene/Protein>.....;
Sub-cellular localization
<Protein> is localized to <cellular component> and expression in <cellular component> is (positively, negatively) regulated by .....

Note: Not all descriptions may follow the exact order or choice of words.
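The template above could be filled in programmatically once data for each category has been mined. The following is a minimal sketch, not the project's actual method: it fixes one wording per category (picked from the options in the template), uses the gene name as the subject of every clause (the template uses the protein name for some), and skips categories without data. The function name build_description, the category keys, the gene name xyz-1 and the example values are all hypothetical.

```python
# One fixed wording per category, chosen from the options listed in the template.
# The order follows the template; "{value}" is the mined text for that category.
TEMPLATES = [
    ("molecular_identity", "{gene} encodes {value}"),
    ("orthology", "{gene} is orthologous to {value}"),
    ("process", "{gene} is involved in {value}"),
    ("genetic_interaction", "{gene} interacts genetically with {value}"),
    ("physical_interaction", "{gene} physically interacts with {value}"),
    ("molecular_function", "{gene} has {value} activity"),
    ("tissue_expression", "{gene} is expressed in {value}"),
    ("localization", "{gene} is localized to {value}"),
]


def build_description(gene, facts):
    """Join the clauses for all categories present in `facts`, in template order."""
    clauses = [
        template.format(gene=gene, value=facts[key])
        for key, template in TEMPLATES
        if facts.get(key)  # skip categories with no mined data
    ]
    if not clauses:
        return ""
    # Clauses are separated by semicolons, as in the template.
    return "; ".join(clauses) + "."


if __name__ == "__main__":
    print(build_description(
        "xyz-1",
        {"orthology": "human TOY1", "molecular_identity": "a toy kinase"},
    ))
```

Because the clause order comes from TEMPLATES rather than from the input dictionary, descriptions stay in the template's order regardless of how the mined data is supplied.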

Data mining (mining data from Postgres and/or Acedb) for the semantic categories

1. Molecular identity and 2. Orthology/Similarity (homology, orthology and paralog data)

  • Non-Caltech data, curated at Hinxton
  • Ace tags: ?Gene Ortholog_other, Paralog
  • Contact: Michael Paulini

3. Processes

  • Caltech data: GO Biological Process and Topic
  • Two sources: GO data and Topic data
  • GO OA:
      • Postgres (PG) table name:
      • Contact Person: Kimberly, Ranjana
  • Topic OA:
      • OA field: Gene, PG table name: pro_wbgene
      • OA field: WBPaper, PG table name: pro_paper
      • Contact Person: Karen
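The Topic OA tables listed above could be combined to pair each gene with the papers that support its process annotation. The following is a minimal sketch, assuming pro_wbgene and pro_paper follow the same layout as the con_* tables (a joinkey column plus a data column named after the table) and hold one value per row; neither assumption is confirmed here. SQLite stands in for Postgres, and the IDs and function names are hypothetical.

```python
import sqlite3


def build_toy_topic_db():
    """Create an in-memory stand-in for pro_wbgene and pro_paper (toy data only)."""
    conn = sqlite3.connect(":memory:")
    # Assumed layout: a joinkey column plus a data column named after the table.
    for table in ("pro_wbgene", "pro_paper"):
        conn.execute(f"CREATE TABLE {table} (joinkey TEXT, {table} TEXT)")
    conn.executemany(
        "INSERT INTO pro_wbgene VALUES (?, ?)",
        [("1", "WBGene99999901"), ("2", "WBGene99999902")],
    )
    # Only the first topic entry has a paper attached.
    conn.execute("INSERT INTO pro_paper VALUES (?, ?)", ("1", "WBPaper99999901"))
    return conn


def topic_gene_papers(conn):
    """Return (gene, paper) pairs for topic entries that have both fields."""
    query = """
    SELECT g.pro_wbgene, p.pro_paper
      FROM pro_wbgene AS g
      JOIN pro_paper AS p ON g.joinkey = p.joinkey
      ORDER BY g.pro_wbgene, p.pro_paper
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(topic_gene_papers(build_toy_topic_db()))
```

If the real tables store several genes or papers in a single field, the values would need to be split before joining.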

4. Pathways (no database source for now?)


5. Mutant Phenotypes

  • Caltech data
  • Phenotype OA, PG table name:
  • Contact Person: Karen



6. Genetic Interaction and 7. Physical Interaction

8. Molecular Function

  • Caltech data: GO Molecular Function
  • File for GO data (for protein-encoding genes only):
  • GO OA, PG table name:
  • Contact Person: Kimberly, Ranjana

9. Tissue expression and life stage

  • Caltech data
  • Expression OA, PG table name:

10. Sub-cellular localization

  • Caltech data




Publications related to Text-mining methods

  • Ling X, Jiang J, He X, Mei Q, Zhai C, Schatz B. Automatically generating gene summaries from biomedical literature. Pac Symp Biocomput. 2006:40-51. PMID: 17094226

  • Ling X, Jiang J, He X, Mei Q, Zhai C, Schatz B. Generating gene summaries from biomedical literature: A study of semi-structured summarization. Information Processing and Management 43 (2007) 1777–1791.

