Textpresso-based automated extraction of concise descriptions

From WormBaseWiki
==Generating gene sets with and without concise descriptions==
 
  
====Set of genes with a concise description====
 
Query for all genes with a concise description from Postgres:
 
Relevant Postgres table names:
 
*con_wbgene: Stores the WBGene ID and gene names
 
*con_desctype: Type of description (relevant for us: Concise_description)
 
*con_desctext: Text of the concise description
 
 
Query for all WBGenes that have a concise description (in con_desctext AND con_desctype):
 
 
SELECT DISTINCT(con_wbgene) FROM con_wbgene WHERE joinkey IN (SELECT joinkey FROM con_desctext WHERE con_desctext IS NOT NULL) AND joinkey IN (SELECT joinkey FROM con_desctype WHERE con_desctype IS NOT NULL) ORDER BY con_wbgene;
 
 
*Number of genes with a concise description (as of 05.07.2014): 6,624
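
A minimal sketch of running the query above. It assumes each table has a joinkey column plus a value column named after the table, as the query implies. An in-memory SQLite database with made-up rows stands in for the real Postgres instance, so the joinkeys, gene IDs and texts below are purely illustrative.

```python
import sqlite3

# The query from this page, unchanged apart from line breaks.
QUERY = """
SELECT DISTINCT(con_wbgene) FROM con_wbgene
WHERE joinkey IN (SELECT joinkey FROM con_desctext WHERE con_desctext IS NOT NULL)
  AND joinkey IN (SELECT joinkey FROM con_desctype WHERE con_desctype IS NOT NULL)
ORDER BY con_wbgene;
"""


def genes_with_description(conn):
    """Return con_wbgene values that have both a description text and a type."""
    return [row[0] for row in conn.execute(QUERY)]


def build_demo_db():
    """Create stand-in tables with hypothetical rows (not real WormBase data)."""
    conn = sqlite3.connect(":memory:")
    for table in ("con_wbgene", "con_desctype", "con_desctext"):
        conn.execute(f"CREATE TABLE {table} (joinkey TEXT, {table} TEXT)")
    conn.executemany(
        "INSERT INTO con_wbgene VALUES (?, ?)",
        [("1", "WBGene00000001"), ("2", "WBGene00000002"), ("3", "WBGene00000003")],
    )
    conn.executemany(
        "INSERT INTO con_desctype VALUES (?, ?)",
        [("1", "Concise_description"), ("2", "Concise_description")],
    )
    conn.executemany(
        "INSERT INTO con_desctext VALUES (?, ?)",
        [("1", "demo text"), ("2", None), ("3", "demo text")],
    )
    return conn


demo_genes = genes_with_description(build_demo_db())
print(demo_genes)
```

In the demo data only joinkey 1 has both a non-null text and a type, so only that gene is returned. Against the real database the same function could be pointed at a Postgres connection object instead, provided the driver exposes a compatible execute call.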
 
 
====Set of genes with no concise description====
 
====Set of genes with no concise description and at least one published paper====
 
 
==Location of project-related files on Textpresso==
 
http://textpresso-dev.caltech.edu/concise_descriptions/
 
 
 
==Semantic categories in a Concise Description==
 
1. Molecular identity

2. Orthology/Similarity

3. Mutant Phenotypes

4. Processes

5. Pathways

6. Genetic Interaction

7. Physical Interaction

8. Gene regulation data

9. Molecular Function

10. Tissue expression (may include life-stage)

11. Sub-cellular localization (may include life-stage)
 
 
==Template for a Concise Description==
 
'''Molecular identity'''
 
<Gene> encodes .....;
 
'''Orthology/Similarity'''
 
<Gene> is (orthologous, similar) to .....;
 
'''Phenotypes'''

<Gene> mutants exhibit the following phenotypes: <phenotypes>;
 
'''Process/Pathway'''
 
<Gene> is (required, functions, regulates, is involved in, is part of) ....., as mutants of <gene> exhibit <phenotypes>;
 
'''Genetic interaction with respect to Process or Pathway'''
 
<Gene> interacts genetically with <gene1, gene2> ..... in <Process, Pathway>;
 
'''Physical interaction'''
 
<Protein> physically interacts with (protein, DNA, RNA) .....;
 
'''Molecular Function'''
 
<Protein> has ..... activity in (in vitro, in vivo) assays;
 
'''Tissue Expression'''
 
<Gene/Protein> is expressed in ..... and expression in ..... is (positively, negatively) regulated by <Gene/Protein>.....;
 
'''Sub-cellular localization'''
 
<Protein> is localized to <cellular component> and expression in <cellular component> is (positively, negatively) regulated by .....
 
 
Note: Not all descriptions may follow the exact order or choice of words.
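
As a rough illustration of how the template could be filled programmatically, the sketch below joins one clause per category for which data is available. The category keys, the simplified sentence patterns, the function name render_description and the gene name xyz-1 are all hypothetical; the patterns only paraphrase a subset of the template and use the gene name throughout, even where the template says <Protein>.

```python
# Simplified sentence patterns paraphrased from the template on this page.
# "{gene}" and "{value}" are filled in by render_description().
TEMPLATES = [
    ("molecular_identity", "{gene} encodes {value}"),
    ("orthology", "{gene} is orthologous to {value}"),
    ("phenotypes", "{gene} mutants exhibit the following phenotypes: {value}"),
    ("physical_interaction", "{gene} physically interacts with {value}"),
    ("molecular_function", "{gene} has {value} activity"),
    ("tissue_expression", "{gene} is expressed in {value}"),
    ("localization", "{gene} is localized to {value}"),
]


def render_description(gene, facts):
    """Join the clauses for which data exists, in template order."""
    clauses = []
    for category, pattern in TEMPLATES:
        values = facts.get(category)
        if values:  # skip categories with no data
            clauses.append(pattern.format(gene=gene, value=", ".join(values)))
    return "; ".join(clauses) + "." if clauses else ""


# Hypothetical input, not real annotation data.
demo_description = render_description(
    "xyz-1",
    {
        "molecular_identity": ["a protein kinase"],
        "tissue_expression": ["neurons", "intestine"],
    },
)
print(demo_description)
```

Keeping the patterns in an ordered list means the category order can be changed in one place, which fits the note above that descriptions need not follow a fixed order.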
 
 
==Data mining (mining data from Postgres and/or Acedb) for the semantic categories==
 
1. Molecular identity and 2. Orthology/Similarity (homology, orthology and paralog data)

*Non-Caltech data, curated at Hinxton
 
*Ace tags: ?Gene Ortholog_other, Paralog
 
*Contact: Michael Paulini
 
 
4. Processes
 
*Caltech data
 
*Source 1: GO data in Postgres
 
**Paper -- gop_paper
 
**WBGene -- gop_wbgene
 
**GO -- gop_goontology
 
**GO Term -- gop_goid
 
*Source 2: GO file from Protein2GO called gp_association.ace
 
**
 
*Source 3: gene_association file
 
**ftp://ftp.wormbase.org/pub/wormbase/releases/WS243/ONTOLOGY/
 
**File name: gene_association.WS243.wb.c_elegans

**Rows with a 'P' in column 9 indicate a GO Biological Process annotation for the gene.
 
*Contact Person: Kimberly, Ranjana
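
A small sketch of the column-9 filter described for Source 3. It assumes the gene_association file is tab-delimited with comment lines starting with "!", and that column 2 holds the WBGene ID. The function name and the sample rows are hypothetical, not taken from the WS243 file.

```python
def genes_with_aspect(lines, aspect="P"):
    """Collect column-2 IDs from rows whose column 9 equals `aspect`."""
    genes = set()
    for line in lines:
        if line.startswith("!"):  # header/comment lines
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 9 and cols[8] == aspect:  # column 9 is index 8
            genes.add(cols[1])  # column 2 is index 1
    return genes


# Hypothetical rows with only the first nine columns filled in.
sample_rows = [
    "!gaf-version: 2.0\n",
    "\t".join(["WB", "WBGene00000001", "xyz-1", "", "GO:0008150",
               "WB_REF:WBPaper00000001", "IMP", "", "P"]) + "\n",
    "\t".join(["WB", "WBGene00000002", "xyz-2", "", "GO:0003674",
               "WB_REF:WBPaper00000002", "IDA", "", "F"]) + "\n",
]

process_genes = genes_with_aspect(sample_rows, "P")
print(process_genes)
```

With a real file, the same function could be called as genes_with_aspect(open(path)), since it only iterates over lines. Passing a different aspect letter selects the other GO branches.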
 
 
*Source 4: Topic data
 
**OA field: Gene, PG table name: pro_wbgene

**OA field: WBPaper, PG table name: pro_paper
 
*Contact Person: Karen
 
 
5. Pathways
 
(No database source for now?)
 
 
 
3. Mutant Phenotypes
 
*Caltech data
 
*Source 1: Phenotype OA, PG table name: (phenotypes are attached to variations, not to genes)

*Source 2: Acedb tags: under ?Gene, Reference_allele ?Variation and Allele ?Variation; under ?Variation, Phenotype

*Source 3: phenotype_association file: ftp://ftp.wormbase.org/pub/wormbase/releases/WS243/ONTOLOGY/

**File name: phenotype_association.WS243.wb
 
*Contact Person: Karen
 
 
6. Genetic Interaction and 7. Physical Interaction
 
*Caltech data
 
*Source 1: gene_association.wb
 
**ftp://ftp.wormbase.org/pub/wormbase/releases/WS243/ONTOLOGY/
 
**File name: gene_association.WS243.wb.c_elegans

**Rows with 'IGI' (Inferred from Genetic Interaction) in column 7 indicate a genetic interaction between the WBGenes in column 2/3 and column 8

**Rows with 'IPI' (Inferred from Physical Interaction) in column 7 indicate a physical interaction between the WBGenes in column 2/3 and column 8
 
*Source 2: Interaction OA and tables
 
**"Field Name" = Postgres Table:
 
**"Paper" = int_paper
 
**"Interaction Type" = int_type
 
**"Bait overlapping gene" = int_genebait
 
**"Target overlapping gene" = int_genetarget
 
**"Non-directional Gene(s)" = int_genenondir
 
**"Effector Gene(s)" = int_geneone
 
**"Affected Gene(s)" = int_genetwo
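
For Source 1, the evidence-code filter on column 7 could be sketched as follows. IGI and IPI are the GO evidence codes "Inferred from Genetic Interaction" and "Inferred from Physical Interaction". The sketch assumes a tab-delimited file with "!" comment lines and assumes that column 8 may list several partners separated by "|". The function name and sample rows are hypothetical.

```python
# GO evidence codes that imply an interaction partner in column 8.
EVIDENCE_KIND = {"IGI": "genetic", "IPI": "physical"}


def interaction_pairs(lines):
    """Yield (kind, gene, partner) for rows with IGI or IPI in column 7."""
    for line in lines:
        if line.startswith("!"):  # header/comment lines
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue
        kind = EVIDENCE_KIND.get(cols[6])  # column 7 is index 6
        if kind is None:
            continue
        # Assumption: several partners in column 8 are separated by "|".
        for partner in cols[7].split("|"):
            if partner:
                yield kind, cols[1], partner


def _row(evidence, partners, aspect):
    """Build a hypothetical nine-column row for the demo."""
    return "\t".join(["WB", "WBGene00000001", "xyz-1", "", "GO:0008150",
                      "WB_REF:WBPaper00000001", evidence, partners, aspect])


sample_rows = [
    _row("IGI", "WB:WBGene00000002", "P"),
    _row("IPI", "WB:WBGene00000003|WB:WBGene00000004", "F"),
    _row("IMP", "", "P"),  # no interaction evidence, skipped
]

pairs = list(interaction_pairs(sample_rows))
print(pairs)
```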
 
 
Example statements:
 
 
If int_type = "Physical"
 
 
<int_genebait> interacts physically with <int_genetarget> (and vice versa)
 
 
 
If int_type = "Genetic - Synthetic ( Synthetic )"
 
 
<int_genenondir> interacts with <other int_genenondir(s)> in a synthetic genetic interaction
 
 
 
If int_type = "Genetic - Suppression ( Suppression )"
 
 
<int_geneone> genetically suppresses <int_genetwo>
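
The three example statements above could be generated along these lines. The sketch assumes one interaction record is available as a dict keyed by the Postgres table names, with int_genenondir holding a list of genes; that record shape, the function name and the gene names are hypothetical.

```python
def interaction_statement(record):
    """Turn one interaction record into a sentence, or None if not covered."""
    kind = record["int_type"]
    if kind == "Physical":
        return (f'{record["int_genebait"]} interacts physically with '
                f'{record["int_genetarget"]}')
    if kind == "Genetic - Synthetic ( Synthetic )":
        first, *others = record["int_genenondir"]
        return (f'{first} interacts with {", ".join(others)} '
                f'in a synthetic genetic interaction')
    if kind == "Genetic - Suppression ( Suppression )":
        return (f'{record["int_geneone"]} genetically suppresses '
                f'{record["int_genetwo"]}')
    return None  # interaction types without an example statement


# Hypothetical records, not real interaction data.
physical = interaction_statement(
    {"int_type": "Physical", "int_genebait": "xyz-1", "int_genetarget": "xyz-2"}
)
synthetic = interaction_statement(
    {"int_type": "Genetic - Synthetic ( Synthetic )",
     "int_genenondir": ["xyz-1", "xyz-2", "xyz-3"]}
)
print(physical)
print(synthetic)
```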
 
 
 
8. Gene regulation data
 
*Caltech data
 
*Source: Gene regulation data in genereg OA
 
**Positive_regulate Anatomy_term "<grg_pos_anatomy>"
 
**Positive_regulate Life_stage "<grg_pos_lifestage>"
 
**Positive_regulate Subcellular_localization "<grg_pos_scl>"
 
**Positive_regulate Subcellular_localization_text "<grg_pos_scltext>"
 
**Negative_regulate Anatomy_term "<grg_neg_anatomy>"
 
**Negative_regulate Life_stage "<grg_neg_lifestage>"
 
**Negative_regulate Subcellular_localization "<grg_neg_scl>"
 
**Negative_regulate Subcellular_localization_text "<grg_neg_scltext>"
 
**Does_not_regulate Anatomy_term "<grg_not_anatomy>"
 
**Does_not_regulate Life_stage "<grg_not_lifestage>"
 
**Does_not_regulate Subcellular_localization "<grg_subcellloc>"
 
**Does_not_regulate Subcellular_localization_text "<grg_not_scltext>"
 
**Trans_regulated_gene "<grg_transregulated>"
 
**Trans_regulator_gene "<grg_transregulator>"
 
**No Subdata Result "<grg_result>"
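
The tag-to-table pairs listed above could be applied to one genereg row as in the sketch below. It assumes a row is available as a dict keyed by table name and that each non-empty value becomes one line of the form tag "value". Only a subset of the pairs is shown, and the function name and demo values are hypothetical.

```python
# A subset of the (table, tag) pairs listed on this page, in page order.
GENEREG_TAGS = [
    ("grg_pos_anatomy", "Positive_regulate Anatomy_term"),
    ("grg_pos_lifestage", "Positive_regulate Life_stage"),
    ("grg_neg_anatomy", "Negative_regulate Anatomy_term"),
    ("grg_neg_lifestage", "Negative_regulate Life_stage"),
    ("grg_not_anatomy", "Does_not_regulate Anatomy_term"),
    ("grg_not_lifestage", "Does_not_regulate Life_stage"),
    ("grg_transregulated", "Trans_regulated_gene"),
    ("grg_transregulator", "Trans_regulator_gene"),
]


def tag_lines(row):
    """Emit one 'tag "value"' line per non-empty field of a genereg row."""
    lines = []
    for table, tag in GENEREG_TAGS:
        value = row.get(table)
        if value:  # skip missing or empty fields
            lines.append(f'{tag} "{value}"')
    return lines


# Hypothetical row, not real gene regulation data.
demo_lines = tag_lines(
    {
        "grg_pos_anatomy": "demo anatomy term",
        "grg_neg_lifestage": "",
        "grg_transregulator": "WBGene00000001",
    }
)
print("\n".join(demo_lines))
```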
 
 
9. Molecular Function
 
*Caltech data: GO Molecular Function
 
*Source 1: GO OA, PG table name:
 
*Source 3: gene_association.WSXXX.c_elegans file:
 
**ftp://ftp.wormbase.org/pub/wormbase/releases/WS243/ONTOLOGY/
 
**Rows with an 'F' in column 9 indicate a GO Molecular Function annotation for the gene.
 
*Contact Person: Kimberly, Ranjana
 
 
10. Tissue expression and life stage
 
*Caltech data
 
*Source 1: Expression data
 
*OA (exprpat), PG table names:
 
**exp_anatomy for anatomy terms
 
**exp_goid for subcellular localization
 
**exp_lifestage for life stage
 
**exp_paper for paper
 
**exp_gene for gene
 
*Contact Person: Daniela
 
 
11. Sub-cellular localization
 
*Caltech data
 
*Source 1: GO cellular component
 
*Source 2: gene_association file
 
**ftp://ftp.wormbase.org/pub/wormbase/releases/WS243/ONTOLOGY/
 
**File name: gene_association.WS243.wb.c_elegans
 
 
==Publications related to Text-mining methods==
 
*Automatically generating gene summaries from biomedical literature.
 
 
Ling X, Jiang J, He X, Mei Q, Zhai C, Schatz B.
 
 
Pac Symp Biocomput. 2006:40-51.
 
 
PMID:17094226
 
 
*Generating gene summaries from biomedical literature: A study of semi-structured summarization.


Ling X, Jiang J, He X, Mei Q, Zhai C, Schatz B.


Information Processing and Management. 2007;43:1777-1791.
 
 
 
Back To [[Concise Descriptions]]
 

Latest revision as of 23:01, 10 September 2014