− | ==Generating gene sets with and without concise descriptions==
| |
| | | |
− | ====Set of genes with a concise description====
| |
− | Query for all genes with a concise description from Postgres:
| |
− | Relevant Postgres table names:
| |
− | *con_wbgene: Stores the WBGene ID and gene names
| |
− | *con_desctype: Type of description (relevant for us: Concise_description)
| |
− | *con_desctext: Text of the concise description
| |
− |
| |
− | Query for all WBGenes that have a concise description (in con_desctext AND con_desctype):
| |
− |
| |
− | SELECT DISTINCT(con_wbgene) FROM con_wbgene WHERE joinkey IN (SELECT joinkey FROM con_desctext WHERE con_desctext IS NOT NULL) AND joinkey IN (SELECT joinkey FROM con_desctype WHERE con_desctype IS NOT NULL) ORDER BY con_wbgene;
| |
− |
| |
− | *Number of genes with a concise description (as of 05.07.2014): 6,624
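The query above can be tried out without access to the curation database. The following is a minimal sketch, assuming each con_ table has a joinkey column plus a column named after the table (this layout is inferred from the query, not confirmed). It uses an in-memory SQLite database as a stand-in for Postgres, and the rows are made-up placeholders.

```python
import sqlite3

# Query from the text above, unchanged except for line breaks.
QUERY = """
SELECT DISTINCT(con_wbgene) FROM con_wbgene
WHERE joinkey IN (SELECT joinkey FROM con_desctext WHERE con_desctext IS NOT NULL)
  AND joinkey IN (SELECT joinkey FROM con_desctype WHERE con_desctype IS NOT NULL)
ORDER BY con_wbgene;
"""


def genes_with_description(conn):
    """Return the WBGene values that have both a description type and text."""
    return [row[0] for row in conn.execute(QUERY)]


def build_toy_db():
    """Build an in-memory stand-in; table layout and rows are assumptions."""
    conn = sqlite3.connect(":memory:")
    for table in ("con_wbgene", "con_desctype", "con_desctext"):
        conn.execute(f"CREATE TABLE {table} (joinkey TEXT, {table} TEXT)")
    conn.executemany(
        "INSERT INTO con_wbgene VALUES (?, ?)",
        [("1", "WBGene00000001"), ("2", "WBGene00000002"), ("3", "WBGene00000003")],
    )
    conn.executemany(
        "INSERT INTO con_desctype VALUES (?, ?)",
        [("1", "Concise_description"), ("2", "Concise_description")],
    )
    # joinkey 2 has a type but no text; joinkey 3 has neither.
    conn.executemany(
        "INSERT INTO con_desctext VALUES (?, ?)",
        [("1", "placeholder description text"), ("2", None)],
    )
    return conn


if __name__ == "__main__":
    print(genes_with_description(build_toy_db()))
```

Note that the query only checks that con_desctype is not NULL. If the same tables also hold other description types, an extra filter on con_desctype = 'Concise_description' may be needed; the text does not say either way.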
| |
− |
| |
− | ====Set of genes with no concise description====
| |
− | ====Set of genes with no concise description and at least one published paper====
| |
− |
| |
− | ==Semantic categories in a Concise Description==
| |
− |
| |
− | 1. Molecular identity <br />
| |
− | 2. Orthology/Similarity <br />
| |
− | 3. Processes <br />
| |
− | 4. Pathways <br />
| |
− | 5. Mutant Phenotypes <br />
| |
− | 6. Genetic Interaction <br />
| |
− | 7. Physical Interaction <br />
| |
− | 8. Molecular Function <br />
| |
− | 9. Tissue expression (may include life-stage) <br />
| |
− | 10. Sub-cellular localization (may include life-stage) <br />
| |
− |
| |
− | ==Template for a Concise Description==
| |
− | '''Molecular identity'''
| |
− | <Gene> encodes .....;
| |
− | '''Orthology/Similarity'''
| |
− | <Gene> is (orthologous, similar) to .....;
| |
− | '''Process/Pathway'''
| |
− | <Gene> (is required for, functions in, regulates, is involved in, is part of) ....., as mutants of <gene> exhibit <phenotypes>;
| |
− | '''Genetic interaction with respect to Process or Pathway'''
| |
− | <Gene> interacts genetically with <gene1, gene2> ..... in <Process, Pathway>;
| |
− | '''Physical interaction'''
| |
− | <Protein> physically interacts with (protein, DNA, RNA) .....;
| |
− | '''Molecular Function'''
| |
− | <Protein> has ..... activity in (in vitro, in vivo) assays;
| |
− | '''Tissue Expression'''
| |
− | <Gene/Protein> is expressed in ..... and expression in ..... is (positively, negatively)
| |
− | regulated by <Gene/Protein>.....;
| |
− | '''Sub-cellular localization'''
| |
− | <Protein> is localized to <cellular component> and expression in <cellular component>
| |
− | is (positively, negatively) regulated by .....
| |
− |
| |
− | Note: Not all descriptions may follow the exact order or choice of words.
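To illustrate how the template could be filled in programmatically, here is a minimal sketch covering four of the categories. The dictionary keys, placeholder names and the `build_description` helper are hypothetical and chosen only for this example; the gene and protein names in the usage note are made up.

```python
# Clause templates paraphrasing the template above; the placeholder
# names inside the braces are assumptions made for this sketch.
TEMPLATES = {
    "molecular_identity": "{gene} encodes {product}",
    "orthology": "{gene} is {relation} to {target}",
    "physical_interaction": "{protein} physically interacts with {partners}",
    "molecular_function": "{protein} has {activity} activity in {assay} assays",
}

# Clauses are emitted in the order of the semantic categories.
ORDER = [
    "molecular_identity",
    "orthology",
    "physical_interaction",
    "molecular_function",
]


def build_description(facts):
    """Join the clauses for which data is available, separated by ';'."""
    clauses = [
        TEMPLATES[category].format(**facts[category])
        for category in ORDER
        if category in facts
    ]
    if not clauses:
        return ""
    return "; ".join(clauses) + "."
```

For example, `build_description({"molecular_identity": {"gene": "gene-1", "product": "a protein kinase"}})` gives `gene-1 encodes a protein kinase.` Categories without data are skipped, which matches the note that not every description follows the full template.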
| |
− |
| |
− | ==Mining data from Postgres and/or Acedb for the semantic categories==
| |
− | 1. Molecular identity and 2. Orthology/Similarity (Homology, Orthology and Paralog data)
| |
− | *Non-Caltech data, curated at Hinxton
| |
− | *Ace tags: ?Gene Ortholog_other, Paralog
| |
− | *Contact: Michael Paulini
| |
− |
| |
− | 3. Processes
| |
− | *Caltech data
| |
− | *2 sources: GO data and Topic data
| |
− | *Sources for GO data: GO OA and GO file from Protein2GO
| |
− | *GO OA, Postgres (PG) table name:
| |
− | *Easier to use the gene_association file: ftp://ftp.wormbase.org/pub/wormbase/releases/WS243/ONTOLOGY/
| |
− | *File name: gene_association.WS243.wb.c_elegans
| |
− | *Rows with a 'P' in column 9 indicate a GO Biological Process annotation for a gene.
| |
− | *Contact Person: Kimberly, Ranjana
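The column 9 filter described above can be sketched as follows. This assumes the gene_association file is tab-delimited in the standard GO annotation (GAF) layout, with the gene identifier in column 2, the gene symbol in column 3, the GO ID in column 5 and header lines starting with "!". The function name is hypothetical.

```python
ASPECT_COLUMN = 8  # column 9 in the file, zero-based here


def annotations_by_aspect(lines, aspect="P"):
    """Yield (gene_id, symbol, go_id) for rows whose column 9 equals `aspect`.

    'P' selects Biological Process rows; the same filter works for the
    other aspect codes used in the file.
    """
    for line in lines:
        # Skip header/comment lines and blank lines.
        if line.startswith("!") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) > ASPECT_COLUMN and cols[ASPECT_COLUMN] == aspect:
            yield cols[1], cols[2], cols[4]
```

A possible usage, after downloading the file locally: `with open("gene_association.WS243.wb.c_elegans") as fh: rows = list(annotations_by_aspect(fh, "P"))`.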
| |
− |
| |
− | *Topic OA:
| |
− | *OA field: Gene, PG table name: pro_wbgene
| |
− | *OA field: WBPaper, PG table name: pro_paper
| |
− | *Contact Person: Karen
| |
− |
| |
− | 4. Pathway
| |
− | (No database source for now?)
| |
− |
| |
− |
| |
− | 5. Mutant Phenotypes
| |
− | *Caltech data
| |
− | *Phenotype OA, PG table name: (phenotypes are attached to variations, not genes)
| |
− | *Acedb tags: under ?Gene, Reference_allele ?Variation and Allele ?Variation; under ?Variation, Phenotype
| |
− | *Might be easier to use the phenotype_association file: ftp://ftp.wormbase.org/pub/wormbase/releases/WS243/ONTOLOGY/
| |
− | *File name: phenotype_association.WS243.wb
| |
− |
| |
− | *Contact Person: Karen
| |
− |
| |
− | 6. Genetic Interaction and 7. Physical Interaction
| |
− | *Caltech data
| |
− | *Source: Interaction OA and tables
| |
− | *"Field Name" = Postgres Table:
| |
− | *"Paper" = int_paper
| |
− | *"Interaction Type" = int_type
| |
− | *"Bait overlapping gene" = int_genebait
| |
− | *"Target overlapping gene" = int_genetarget
| |
− | *"Non-directional Gene(s)" = int_genenondir
| |
− | *"Effector Gene(s)" = int_geneone
| |
− | *"Affected Gene(s)" = int_genetwo
| |
− |
| |
− | Example statements:
| |
− |
| |
− | If int_type = "Physical"
| |
− |
| |
− | <int_genebait> interacts physically with <int_genetarget> (and vice versa)
| |
− |
| |
− |
| |
− | If int_type = "Genetic - Synthetic ( Synthetic )"
| |
− |
| |
− | <int_genenondir> interacts with <other int_genenondir(s)> in a synthetic
| |
− | genetic interaction
| |
− |
| |
− |
| |
− | If int_type = "Genetic - Suppression ( Suppression )"
| |
− |
| |
− | <int_geneone> genetically suppresses <int_genetwo>
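The example statements above can be turned into a small dispatch on int_type. This is a minimal sketch, assuming each interaction has already been fetched into a dictionary keyed by the Postgres table names listed above, with int_genenondir held as a list of gene names. How the values are actually stored in the tables is not covered here, and the function name is hypothetical.

```python
def interaction_statement(record):
    """Return a sentence for one interaction record, or None if unsupported."""
    int_type = record["int_type"]

    if int_type == "Physical":
        return (
            f'{record["int_genebait"]} interacts physically with '
            f'{record["int_genetarget"]}'
        )

    if int_type == "Genetic - Synthetic ( Synthetic )":
        genes = record["int_genenondir"]
        if len(genes) < 2:
            return None  # a synthetic interaction needs at least two genes
        first, *others = genes
        return (
            f'{first} interacts with {", ".join(others)} '
            "in a synthetic genetic interaction"
        )

    if int_type == "Genetic - Suppression ( Suppression )":
        return (
            f'{record["int_geneone"]} genetically suppresses '
            f'{record["int_genetwo"]}'
        )

    # Other interaction types are not handled in this sketch.
    return None
```

Returning None for unhandled types keeps the caller in control of whether to skip the record or fall back to a generic sentence.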
| |
− |
| |
− |
| |
− | 8. Molecular Function
| |
− | *Caltech data: GO Molecular Function
| |
− | *File for GO data (for protein-encoding genes only):
| |
− | *GO OA, PG table name:
| |
− | *Might be easier to use the gene_association.WSXXX.c_elegans file: ftp://ftp.wormbase.org/pub/wormbase/releases/WS243/ONTOLOGY/
| |
− | *Rows with an 'F' in column 9 indicate a GO Molecular Function annotation for a gene.
| |
− | *Contact Person: Kimberly, Ranjana
| |
− |
| |
− |
| |
− | 9. Tissue expression and life stage
| |
− | *Caltech data
| |
− | *Expression OA (exprpat), PG table names:
| |
− | *exp_anatomy for anatomy terms
| |
− | *exp_goid for sub-cellular localization
| |
− | *exp_lifestage for life stage
| |
− | *exp_paper for paper
| |
− | *exp_gene for gene
| |
− | *Contact Person: Daniela
| |
− |
| |
− |
| |
− | 10. Sub-cellular localization
| |
− | *Caltech data
| |
− | *Source: GO cellular component
| |
− | *Source 1: ftp://ftp.wormbase.org/pub/wormbase/releases/WS243/ONTOLOGY/
| |
− | *File name: gene_association.WS243.wb.c_elegans
| |
− |
| |
− |
| |
− |
| |
− |
| |
− |
| |
− |
| |
− | ====Publications related to Text-mining methods====
| |
− | *Automatically generating gene summaries from biomedical literature.
| |
− |
| |
− | Ling X, Jiang J, He X, Mei Q, Zhai C, Schatz B.
| |
− |
| |
− | Pac Symp Biocomput. 2006:40-51.
| |
− |
| |
− | PMID:17094226
| |
− |
| |
− | *Generating gene summaries from biomedical literature: A study of semi-structured summarization.
| |
− |
| |
− | Ling X, Jiang J, He X, Mei Q, Zhai C, Schatz B.
| |
− |
| |
− | Information Processing and Management. 2007;43:1777–1791.
| |
− |
| |
− |
| |
− | Back To [[Concise Descriptions]]
| |