Difference between revisions of "VariationConciseDescriptions"
Revision as of 04:34, 25 December 2015
Contents
- 1 Useful links
- 2 Variation concise descriptions
- 3 Building the summaries
- 4 Examples for concise descriptions for variations of different types
- 5 Prioritizing variations for automated descriptions
- 6 Semantic categories in an Automated Description
- 7 Protein domain mutations
- 8 Location of project-related files
- 9 Orthology/Homology
- 10 Order of sentences
- 11 Postgres sources
- 12 Preliminary results
- 13 Mapping of automated variation concise description data to OA fields
- 14 Tab-delimited file for OA insert
- 15 Directory structure for project
- 16 Inserting automated descriptions into postgres
- 17 Dumping to .ace
- 18 Tracking progress
- 19 Changes/Updates for each release
- 20 Issues to address
- 21 Automated descriptions software
- 22 Publications related to Text-mining methods
Useful links
app tables
variation model
Ranjana's wiki for creating automated concise descriptions of genes.
geneace upload info
geneace upload of nongene info
Variation concise descriptions
Human-readable summaries of alleles, each including a description of the allele's lesion, its effect on the gene's function, and the resulting phenotypes. These descriptions aim to recreate summaries like those in the C. elegans I & II books and enhance them with up-to-date data. The first step is to make and display a summary for each variation; the second step is to extract information from the summaries of a gene's variations, combine it, and display it on the corresponding gene page.
From C. elegans II e51 : paralysed kinky small irregular pharyngeal pumping able to lay eggs. ES3 ME0. NA > 30 (e450amber e312amber (non-null) e309 (see sup- 6) etc.; all similar to e51 or slightly weaker). See also e51, e328, e450, e973, e985, e2208, e2274 [C.elegansII] e51 : paralysed, kinky, small, irregular pharyngeal pumping; able to lay eggs. Ric, high acetylcholine levels; variable neuroanatomical defects.ES3 ME0. OA>30: e450amb, e312amb (non-null),e309 (suppressed by sup-6), s69, s178 etc. All alleles similar to e51 or slightly weaker.
MGI produces these summaries (do not know if they are automated): for MGI:95294
Mutations widely affect epithelial development. Null homozygote survival is strain dependent, with defects observed in skin, eye, brain, viscera, palate, tongue and other tissues. Other mutations produce an open eyed, curly whisker phenotype, while a dominant hypermorph yields a thickened epidermis.
Sample allele summaries:
ju2 is a null allele of syd-1(F35D2.5). The ju2 lesion is a nonsense point mutation that results in a truncation of all 3 SYD-1 isoforms. ju2 results in defects in axodendritic polarity of ASI and L1 DDs, neuron morphology of ASI but not DD or VD neurons, presynaptic component localization, synaptic remodeling of VDs in adults, and backward movement resulting in coiling. ju2 animals do not show defects in neurite development or postsynaptic component localization.
e1368 is a reduction-of-function/hypomorphic allele of the insulin/IGF receptor ortholog daf-2. The e1368 lesion is a missense mutation affecting 5 of 6 daf-2 coding transcripts. e1368 affects many, but not all, processes that require DAF-2 activity. Specifically, e1368 disrupts DAF-2 processes of embryonic and larval development, formation of the developmentally arrested dauer larval stage (diapause), adult longevity, fat storage, salt chemotaxis learning, and stress resistance, including response to high temperature. e1368 mutants are temperature sensitive and are dauer constitutive at 22.5°C. In addition, e1368 animals have extended life spans. e1368 animals do not show any defects in acetylcholine esterase activity, carbon dioxide avoidance, diacetyl chemotaxis, or DMPP response.
Building the summaries
Source files for project
- obo_name_variation tazendra /home/postgres/work/pgpopulation/obo_oa_ontologies/geneace/obo_name_variation
- obo_data_variation tazendra /home/postgres/work/pgpopulation/obo_oa_ontologies/geneace/obo_data_variation
- phenotype_ontology ftp://ftp.wormbase.org/pub/wormbase/releases/WS251/ONTOLOGY/phenotype_ontology.WS251.obo (correct for latest release)
- phenotype_association ftp://ftp.wormbase.org/pub/wormbase/releases/WS251/ONTOLOGY/phenotype_association.WS251.wb (correct for latest release)
- gene_association ftp://ftp.wormbase.org/pub/wormbase/releases/WS251/ONTOLOGY/gene_association.WS251.wb.c_elegans (correct for latest release)
- gin_seqname tazendra /home/postgres/work/pgpopulation/obo_oa_ontologies/geneace
Building molecular summaries of variations
Sources for molecular summary
object | example | source file | as appears in source file | path to source file | instructions |
---|---|---|---|---|---|
variation | ju2 | obo_data_variation | \nname: "ju2"\ | tazendra /home/postgres/work/pgpopulation/obo_oa_ontologies/geneace | |
variation ID | WBVar00088136 | obo_data_variation | id: WBVar00088136\ | tazendra /home/postgres/work/pgpopulation/obo_oa_ontologies/geneace | |
variation-genes | syd-1 | obo_data_variation | \ngene: "WBGene00006363 syd-1"\ | tazendra /home/postgres/work/pgpopulation/obo_oa_ontologies/geneace | |
geneID | WBGene00006363 | obo_data_variation | \ngene: "WBGene00006363 syd-1"\ | tazendra /home/postgres/work/pgpopulation/obo_oa_ontologies/geneace | |
gene CDS | F35D2.5 | gin_seqname.pg | 00006363 F35D2.5 2015-12-09 20:00:37 | tazendra /home/postgres/work/pgpopulation/obo_oa_ontologies/geneace | |
variation-nature | loss-of-function | app_function | Loss_of_function_undetermined_extent | postgres based on WBVarID | will need to pool, shorten and comma separate when multiple values present |
Molecular_change | nonsense | variation model tag Molecular_change | Nonsense | Mary Ann is helping with a source file | |
phenotype | dauer constitutive | app_phenotype or GFF? | WBPhenotype:0000012 | postgres or GFF? | will need to 1. remove 'variant' from public name 2. choose closest parent term from phenotype.obo when multiple related child terms 3. comma separate multiple phenotypes |
transcript count | 3 | variation model tag Affects->Transcript | | Mary Ann is helping with a source file | |
Does not exhibit phenotype | neurite development | app_phenotype or GFF? | WBPhenotype:0000944 | | |
"ju2 is a null allele of syd-1(F35D2.5)"
<obo_data_variation:public_name> is a <app_nature> allele of <obo_data_variation:associated gene>.
The <allele> lesion is a <Variation->Molecular_change> that results in <unsure of source> of <count of Variation->Affects->Transcript>.
<Allele> results in defects in <app_phenotype, where app_not is false>. <Allele> does not show defects in <app_phenotype, where app_not is true>.
Variation description tags needed for the above text:
- Molecular_change tags:
 Nonsense UNIQUE Amber_UAG Text #Evidence
                 Ochre_UAA Text #Evidence
                 Opal_UGA Text #Evidence
 Missense Text #Evidence // text fields stored details of codon change
 Silent Text #Evidence
 Splice_site Donor Text #Evidence
             Acceptor Text #Evidence
 Frameshift Text #Evidence // added sdm
 Readthrough Text #Evidence // klh WS228
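The sentence templates above can be sketched in Python as follows. This is only an illustration of the template filling, not the project's script: the record keys (`public_name`, `nature`, `gene`, `molecular_change`, `transcript_count`, `phenotypes`, `not_phenotypes`) and both function names are hypothetical, and the part of the template marked "unsure of source" is left out.

```python
def join_terms(terms):
    """Join phrases as 'a', 'a and b', or 'a, b, and c'."""
    terms = list(terms)
    if len(terms) < 3:
        return " and ".join(terms)
    return ", ".join(terms[:-1]) + ", and " + terms[-1]


def variation_sentences(rec):
    """Fill the summary templates from one variation record (a plain dict).

    The dict keys are hypothetical names chosen for this sketch; they stand
    in for values pulled from obo_data_variation, postgres and the model tags.
    """
    allele = rec["public_name"]
    nature = rec["nature"]
    # Pick "a" or "an" from the first letter of the nature term.
    article = "an" if nature[:1].lower() in "aeiou" else "a"
    out = [f"{allele} is {article} {nature} allele of {rec['gene']}."]
    out.append(
        f"The {allele} lesion is a {rec['molecular_change']} mutation "
        f"affecting {rec['transcript_count']} transcript(s)."
    )
    # Observed phenotypes and NOT phenotypes go into separate sentences.
    if rec.get("phenotypes"):
        out.append(
            f"{allele} results in defects in {join_terms(rec['phenotypes'])}."
        )
    if rec.get("not_phenotypes"):
        out.append(
            f"{allele} does not show defects in "
            f"{join_terms(rec['not_phenotypes'])}."
        )
    return " ".join(out)
```

Sentences for missing data are simply skipped, so a variation with no NOT phenotypes yields a shorter summary instead of an empty clause.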
e1368 is a reduction-of-function/hypomorphic allele of the insulin/IGF receptor ortholog daf-2. The e1368 lesion is a missense mutation affecting 5 of 6 daf-2 coding transcripts. e1368 affects many, but not all, processes that require DAF-2 activity. Specifically, e1368 disrupts DAF-2 processes of embryonic and larval development, formation of the developmentally arrested dauer larval stage (diapause), adult longevity, fat storage, salt chemotaxis learning, and stress resistance, including response to high temperature. e1368 mutants are temperature sensitive and are dauer constitutive at 22.5°C. In addition, e1368 animals have extended life spans. e1368 animals do not show any defects in acetylcholine esterase activity, carbon dioxide avoidance, diacetyl chemotaxis, or DMPP response.
Building the gene process effect of an allele
Sources
- obo_name_variation tazendra /home/postgres/work/pgpopulation/obo_oa_ontologies/geneace/obo_name_variation
- gene_association ftp://ftp.wormbase.org/pub/wormbase/releases/WS251/ONTOLOGY/gene_association.WS251.wb.c_elegans (correct for latest release)
- gin_seqname tazendra /home/postgres/work/pgpopulation/obo_oa_ontologies/geneace
Example gene_association file entries; column numbers are in < > (columns <4> and <10> are empty in these entries)
<1>DB <2>DB object ID <WBGene> <3>DB object symbol <gene name> <4>Qualifier <5>GO ID <6>DB reference <Paper> <7>Evidence code <8>With or from <9>Aspect <10>DB object name <11>DB object synonym <12>DB object type <13>Taxon <14>Date <15>Assigned by <16>Annotation extension <17>Gene Product Form ID
WB WBGene00006831 unc-104 GO:0048490 WB_REF:WBPaper00045884|PMID:25329901 IMP WB:WBVar02141295 P C52E12.2|klp-1 gene taxon:6239 20141212 WB
WB WBGene00019126 sam-4 GO:1903744 WB_REF:WBPaper00045884|PMID:25329901 IMP WB:WBVar02125688 P F59E12.11 gene taxon:6239 20141212 WB
WB WBGene00017696 polk-1 GO:0042276 WB_REF:WBPaper00041255|PMID:22761594 IMP WB:WBVar01473736 P F22B7.6 gene taxon:6239 20150611 WB
WB WBGene00013595 atg-4.1 GO:0006914 WB_REF:WBPaper00041282|PMID:22767594 IMP WB:WBVar01473704 P Y87G2A.3 gene taxon:6239 20140724 WB
WB WBGene00006652 ttx-1 GO:0009792 WB_REF:WBPaper00040681|PMID:22298710 IMP WB:WBVar00603928 P Y113G7A.6 gene taxon:6239 20140408 WB
WB WBGene00006652 ttx-1 GO:0045944 WB_REF:WBPaper00040681|PMID:22298710 IMP WB:WBVar00603924 P Y113G7A.6 gene taxon:6239 20140408 WB has_regulation_target<WB:WBGene00006894>,occurs_in<WBbt:0006754>,happens_during<GO:0009408>
WB WBGene00011333 nrde-2 GO:0031048 WB_REF:WBPaper00040602|PMID:22231482 IMP WB:WBVar00601048 P T01E8.5 gene taxon:6239 20150715 WB
WB WBGene00017066 maco-1 GO:0006935 WB_REF:WBPaper00038428|PMID:21589894 IMP WB:WBVar00597666|WB:WBVar00597667 P D2092.5 gene taxon:6239 20110823 WB
WB WBGene00017066 maco-1 GO:0023041 WB_REF:WBPaper00038428|PMID:21589894 IMP WB:WBVar00597666|WB:WBVar00597667 P D2092.5 gene taxon:6239 20110824 WB
For all lines with "IMP" in column <7>:
1. Map the WBVariation ID in column <8> to <allele public name>, using obo_name_variation.pg at tazendra /home/postgres/work/pgpopulation/obo_oa_ontologies/geneace
- WBVarID is in the first column of the geneace file; public_name is in the second column
- when there is more than one WBVarID in column <8>, map the other WBVarIDs and create a separate summary for those objects
2. Map the GO ID in column <5> to the GO name, using obo_goidprocess at tazendra /home/postgres/work/pgpopulation/obo_oa_ontologies
- the GO ID is in the line id: <GO:ID#######>; the name is in the line name: <GO name>
- if the GO term name starts with the qualifier "negative" or "positive", replace the qualifier with "the"
Display template: "<allele public name> affects <3> function in <GO name>."
Italicize <allele public name> and the <3> object.
Examples:
- ''js901'' affects ''unc-104'' function in anterograde synaptic vesicle transport.
Based on <8>WBVar02141295 -> js901, <3> unc-104, <5>GO:0048490
- ''js415'' affects ''sam-4'' function in the regulation of anterograde synaptic vesicle transport.
Based on <8>WBVar02125688 -> js415, <3> sam-4, <5>GO:1903744 -> positive regulation of anterograde synaptic vesicle transport -> replace "positive" with "the" -> the regulation of anterograde synaptic vesicle transport
- ''lf29'' affects ''polk-1'' function in error-prone translesion synthesis.
- ''bp501'' affects ''atg-4.1'' function in autophagy.
- ''ns260'' affects ''ttx-1'' function in embryo development ending in birth or egg hatching.
- ''ns235'' affects ''ttx-1'' function in the regulation of transcription from RNA polymerase II promoter.
- ''gg91'' affects ''nrde-2'' function in chromatin silencing by small RNA.
For the last two lines in the gene_association file above:
1. there are two variations listed in column <8>: in this case make a summary for each variation
2. the genes and alleles are the same in each line: in this case concatenate the GO names, comma separated or joined with "and"
- ''nj21'' affects ''maco-1'' function in chemotaxis and neuronal signal transduction.
- ''nj34'' affects ''maco-1'' function in chemotaxis and neuronal signal transduction.
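The mapping and pooling steps above can be sketched as below. This is a rough Python illustration, not the project's script. It assumes the gene_association lines are tab-separated, and that the WBVarID-to-public-name and GO-ID-to-name lookups have already been loaded into dictionaries; the function names `go_phrase` and `process_sentences` are made up for this sketch.

```python
import re
from collections import defaultdict


def go_phrase(go_name):
    """Replace a leading 'positive'/'negative' qualifier with 'the'."""
    return re.sub(r"^(positive|negative)\s+", "the ", go_name)


def process_sentences(gaf_lines, var_names, go_names):
    """Build one gene-process sentence per (allele, gene) pair.

    gaf_lines: iterable of tab-separated gene_association lines.
    var_names: dict, WBVar ID -> public name (assumed preloaded).
    go_names:  dict, GO ID -> GO term name (assumed preloaded).
    """
    pooled = defaultdict(list)  # (allele, gene symbol) -> GO phrases in file order
    for line in gaf_lines:
        cols = line.rstrip("\n").split("\t")
        # Skip comment lines, short lines, and anything that is not IMP.
        if line.startswith("!") or len(cols) < 8 or cols[6] != "IMP":
            continue
        gene, go_id = cols[2], cols[4]
        phrase = go_phrase(go_names.get(go_id, go_id))
        # Column <8> may hold several variations separated by '|';
        # each one gets its own summary.
        for ref in cols[7].split("|"):
            if not ref.startswith("WB:WBVar"):
                continue
            var_id = ref.split(":", 1)[1]
            allele = var_names.get(var_id, var_id)
            if phrase not in pooled[(allele, gene)]:
                pooled[(allele, gene)].append(phrase)
    # Wiki italics ('' '') are emitted around the allele and gene names.
    return [
        f"''{allele}'' affects ''{gene}'' function in {' and '.join(phrases)}."
        for (allele, gene), phrases in pooled.items()
    ]
```

IDs that are missing from either lookup pass through unchanged, which makes unmapped variations or GO terms easy to spot in the output.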
Building Phenotype summary sentences
Sources
- phenotype ontology ftp://ftp.wormbase.org/pub/wormbase/releases/WS251/ONTOLOGY/phenotype_ontology.WS251.obo (correct for latest release)
- phenotype association file ftp://ftp.wormbase.org/pub/wormbase/releases/WS251/ONTOLOGY/phenotype_association.WS251.wb
- obo_name_variation tazendra /home/postgres/work/pgpopulation/obo_oa_ontologies/geneace/obo_name_variation
From phenotype association file:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
WB WBGene00000898 daf-2 WBPhenotype:0001682 WB:WBVar00143949 IMP WB:WBPerson261 P Y55D5A.5 gene taxon:6239 20151027 WB
WB WBGene00000898 daf-2 NOT WBPhenotype:0001688 WB:WBVar00143949 IMP WB:WBPerson261 P Y55D5A.5 gene taxon:6239 20151027 WB
WB WBGene00000898 daf-2 WBPhenotype:0000190 WB_REF:WBPaper00002149 IMP WB:WBVar00088561 P Y55D5A.5 gene taxon:6239 20151027 WB
WB WBGene00000898 daf-2 NOT WBPhenotype:0001660 WB_REF:WBPaper00006052 IMP WB:WBVar00088561 P Y55D5A.5 gene taxon:6239 20151027 WB
WB WBGene00000898 daf-2 WBPhenotype:0000136 WB_REF:WBPaper00046188 IMP WB:WBVar00143947 P Y55D5A.5 gene taxon:6239 20151027 WB
WB WBGene00000898 daf-2 WBPhenotype:0000631 WB_REF:WBPaper00036280 IMP WB:WBVar00143947 P Y55D5A.5 gene taxon:6239 20151027 WB
WB WBGene00000898 daf-2 WBPhenotype:0000637 WB_REF:WBPaper00038179 IMP WB:WBVar00143947 P Y55D5A.5 gene taxon:6239 20151027 WB
WB WBGene00000898 daf-2 WBPhenotype:0001184 WB_REF:WBPaper00038379 IMP WB:WBVar00143947 P Y55D5A.5 gene taxon:6239 20151027 WB
WB WBGene00000898 daf-2 WBPhenotype:0001351 WB_REF:WBPaper00038379 IMP WB:WBVar00143947 P Y55D5A.5 gene taxon:6239 20151027 WB
WB WBGene00000898 daf-2 WBPhenotype:0001861 WB_REF:WBPaper00041295 IMP WB:WBVar00143947 P Y55D5A.5 gene taxon:6239 20151027 WB
WB WBGene00000898 daf-2 NOT WBPhenotype:0000114 WB_REF:WBPaper00039871 IMP WB:WBVar00143947 P Y55D5A.5 gene taxon:6239 20151027 WB
WB WBGene00000898 daf-2 NOT WBPhenotype:0000481 WB_REF:WBPaper00037970 IMP WB:WBVar00143947 P Y55D5A.5 gene taxon:6239 20151027 WB
WB WBGene00000898 daf-2 NOT WBPhenotype:0000637 WB_REF:WBPaper00038179 IMP WB:WBVar00143947 P Y55D5A.5 gene taxon:6239 20151027 WB
WB WBGene00000898 daf-2 NOT WBPhenotype:0001470 WB_REF:WBPaper00037970 IMP WB:WBVar00143947 P Y55D5A.5 gene taxon:6239 20151027 WB
WB WBGene00000898 daf-2 NOT WBPhenotype:0001987 WB_REF:WBPaper00001039 IMP WB:WBVar00143947 P Y55D5A.5 gene taxon:6239 20151027 WB
WB WBGene00000898 daf-2 WBPhenotype:0001740 WB_REF:WBPaper00032067 IMP WB:WBVar00143947|WB:WBVar00143949 P Y55D5A.5 gene taxon:6239 20151027 WB
WB WBGene00000898 daf-2 WBPhenotype:0001999 WB_REF:WBPaper00037970 IMP WB:WBVar00143947|WB:WBVar00143949 P Y55D5A.5 gene taxon:6239 20151027 WB
Instructions
1. For each line, extract
- variation(column 6 or 8),
- phenotype (column 5)
- paper(column 6)
- NOT(column 4)- if present.
- ex. "WB:WBVar00143947" "WBPhenotype:0001861" "WBPaper00041295"
- ex. "WB:WBVar00143947" "NOT" "WBPhenotype:0000114" "WBPaper00039871"
2. When a line lists more than one variation, split it into one line per variation, each keeping the phenotype, NOT (if present), and paper of the original line
3. Pool all non-NOT phenotypes for a given variation
4. Pool all NOT phenotypes for a given variation
5. Map WBVariationID to Variation public_name using obo_name_variation
6. Map WBPhenotype to Phenotype name using phenotype_ontology.WS251.obo
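Steps 1 to 6 can be sketched in Python as follows. This is a minimal illustration under a few assumptions: the phenotype association lines are tab-separated; a variation ID can sit in column 6 or column 8, as in the sample above; and the ID-to-name lookups are plain dictionaries. The helper names `obo_names` and `pool_phenotypes` are hypothetical.

```python
from collections import defaultdict


def obo_names(obo_text):
    """Collect id -> name pairs from OBO-formatted text (steps 5 and 6 lookups)."""
    names, current = {}, None
    for line in obo_text.splitlines():
        if line.startswith("id: "):
            current = line[4:].strip()
        elif line.startswith("name: ") and current:
            names[current] = line[6:].strip()
            current = None
    return names


def pool_phenotypes(lines, var_names, phen_names):
    """Pool observed and NOT phenotypes per variation.

    Returns {allele: {"observed": [(phenotype, paper)], "not_observed": [...]}}.
    paper is None when the line carries no WBPaper reference.
    """
    pools = defaultdict(lambda: {"observed": [], "not_observed": []})
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("!") or len(cols) < 8:
            continue
        # Step 1: NOT qualifier (column 4) and phenotype (column 5).
        key = "not_observed" if cols[3] == "NOT" else "observed"
        phenotype = phen_names.get(cols[4], cols[4])
        # Variation and paper references may appear in column 6 or column 8.
        refs = cols[5].split("|") + cols[7].split("|")
        papers = [r.split(":", 1)[1] for r in refs if r.startswith("WB_REF:WBPaper")]
        paper = papers[0] if papers else None
        # Step 2: one entry per variation when several are listed.
        for ref in refs:
            if ref.startswith("WB:WBVar"):
                var_id = ref.split(":", 1)[1]
                allele = var_names.get(var_id, var_id)
                # Steps 3 and 4: pool under the matching key.
                pools[allele][key].append((phenotype, paper))
    return dict(pools)
```

Passing empty dictionaries for `var_names` and `phen_names` leaves the raw IDs in place, which is convenient for checking the pooling before the name mapping is wired in.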
Examples for concise descriptions for variations of different types
- Classic alleles
notes: can add: mutagen, history of isolation
- Alleles with molecular data
e1368 is a reduction-of-function/hypomorphic allele of the insulin/IGF receptor ortholog daf-2. The e1368 lesion is a missense mutation affecting 5 of 6 daf-2 coding transcripts. e1368 affects many, but not all, processes that require DAF-2 activity. Specifically, e1368 disrupts DAF-2 processes of embryonic and larval development, formation of the developmentally arrested dauer larval stage (diapause), adult longevity, fat storage, salt chemotaxis learning, and stress resistance, including response to high temperature. e1368 mutants are temperature sensitive and are dauer constitutive at 22.5°C. In addition, e1368 animals have extended life spans. e1368 animals do not show any defects in acetylcholine esterase activity, carbon dioxide avoidance, diacetyl chemotaxis, or DMPP response.
- Alleles with only genetic data
- Other variation types
- Engineered alleles
- Integrated transgenes
other categories of alleles
- Most published alleles
- Alleles with most phenotypes
- Alleles of genes with concise descriptions (see concise description wiki to find these)
Prioritizing variations for automated descriptions
- Classic alleles
  - Variations with Variation_type Allele (see Variation model)
- Alleles with molecular data
  - Variation_type Allele + Type_of_mutation exists
- Alleles with only genetic data
  - Variation_type Allele and NO Type_of_mutation exists
- Variations of genes with concise descriptions
  - Variation Affects Gene + gene exists with cns_summary table entry; acedb model Gene->Gene_info->Concise_description
- Other variation types
  - Variation_type SNP, Confirmed_SNP
  - Variation_type Transposons, RFLPs
- Engineered variations
  - Variation_type Engineered_allele
- Integrated transgenes with allele name
- Variations that have been most published
  - textpresso search results
- Variations with most phenotypes
  - app_variation count
Semantic categories in an Automated Description
Molecular
- Rationale: Molecular details
- Example:
- Model tags:
- Source files:
- Template sentence:
<Variation> is a(n) <Variation_type> in the <Species> <Gene>. This variation results in a <molecular summary> <Type_of_mutation> in the gene.
Genetic
- Gene function (null, hypomorph, etc. ->app_func)
- Allele nature (recessive, dominant, etc. ->app_nature)
- Phenotype
- Affects tissue expression, subcellular localization
- Affects gene regulation
- Affects gene interaction
- Orthology to human gene mutations related to disease
Protein domain mutations
see Caltech group meeting May 7, 2015
Make connections between mutations and protein domains, and predict effects on function
We currently do not capture mutations in the context of affecting a conserved amino acid - how and who would do this? Can Hinxton generate these?
Many examples can be found with the Textpresso search for 'mutation conserved'
Examples
id : WBPaper00037661 name : WBPaper00037661 title : Sequential action of Caenorhabditis elegans Rab GTPases regulates phagolysosome formation during apoptotic cell degradation. Sequencing of rab-14 in qx18 mutants revealed a C to T transition, which resulted in substitution of the Threonine at codon 67 with Methionine (ACG > ATG; T67M). This mutation affects the phosphate/Mg2+ binding domain PM3, which is conserved in all members of the Ras GTPase superfamily
title : SYD-1, a presynaptic protein with PDZ, C2 and rhoGAP-like domains, specifies axon identity in C. elegans. id : WBPaper00005543 pdf : 5543_Hallam02.pdf syd-1(GAPdeletion) mutation interferes with neurite outgrowth. construct with various missense mutations as well as a deletion in the conserved rhoGAP domain in syd-1 were made and assessed for a phenotype in transgenic animals.
id : WBPaper00027028 name : WBPaper00027028 title : Conditional dominant mutations in the Caenorhabditis elegans gene act-2 identify cytoplasmic and muscle roles for a redundant actin isoform. semidominant and embryonic-lethal mutations in the C. elegans act-2 gene. These mutations alter conserved amino acids in the predicted ATP binding pocket of actin and promote contractile instabilities and ectopic furrowing in early embryonic cells, implicating ACT-2 as a cytoplasmic actin.
Title: The Caenorhabditis elegans Iodotyrosine Deiodinase Ortholog SUP-18 Functions through a Conserved Channel SC-Box to Regulate the Muscle Two-Pore Domain Potassium Channel SUP-9 . Authors: de la Cruz IP ; Ma L ; Horvitz HR Journal: PLoS Genet Year: 2014-02 Doc ID: WBPaper00044940 SECTION: discussion. Five other sup-18 mutations affecting highly conserved residues in the NADH oxidase / flavin reductase domain also behave like null mutations , consistent with the hypothesis that SUP-18 enzymatic activity is essential for its function. SECTION: discussion. While Kvb2 knockout mice have seizures and reduced lifespans , mice carrying a catalytic null mutation in Kvb2 have a wild-type phenotype , suggesting that if an enzymatic activity for Kvb2 exists , it is functionally dispensable in vivo [ 59 ] . IYDs across metazoan species share a similar enzymatic activity in reductive deiodination of diiodotyrosine [51], and it seems likely that SUP-18 acts similarly in C. elegans. Like mammalian IYDs, SUP-18 contains a presumptive N-terminal transmembrane domain that is required for full activity. Interestingly, the SUP-18 intracellular region lacking the transmembrane domain could still partially activate the SUP-9 channel, suggesting that membrane association is not absolutely required for SUP-9 activation by SUP-18. Membrane association is important for mammalian IYD enzymatic activities [5,52,53].
Reverse in vitro mutation analysis of elegans mutation on mammalian disease gene. Title: Introduction of a loss-of-function point mutation from the SH3 region of the Caenorhabditis elegans sem-5 gene activates the transforming ability of c-abl in vivo and abolishes binding of proline-rich ligands in vitro . Authors: Van Etten RA ; Debnath J ; Zhou H ; Casasnovas JM Journal: Oncogene Year: 1995-05-18 Doc ID: WBPaper00002191 When the n1619 mutation , which confers a lethal and highly penetrant vulvaless phenotype in C . elegans , is introduced into the c-abl SH3 domain , substituting a leucine for proline at AN amino acid number 131 , the resulting mutant transforms NIH3T3 fibroblasts with an efficiency about 10 % that of SH3-deleted c-abl .
Title: CED-9 and mitochondrial homeostasis in C . elegans muscle . Authors: Tan FJ ; Husain M ; Manlandro CM ; Koppenol M ; Fire AZ ; Hill RB Journal: J Cell Sci Year: 2008-10-15 Doc ID: WBPaper00032231 SECTION: results. This allele encodes a mutation where glycine 169 in the BH3 binding pocket is replaced with glutamate ( Fig . 4C ) ( Hengartner and Horvitz , 1994a ) , which inhibits EGL-1 from binding and triggering a conformational change in CED-9 ( del Peso et al . , 2000 ; Yan et al . , 2004 ) SECTION: results. In the gain-of-function ced-9 ( n1950sd ) allele , glycine 169 , which resides in the CED-9 BH3 binding pocket , is mutated to glutamate ( G169E ) . [Field: results, subscore: 3.00] SECTION: results. To test whether co-expression of DRP-1 modulates CED-9 via interactions with the BH3 binding pocket , we first created a construct corresponding to the ced- 9 ( n1950gf ) allele .
Schwartz, 2010, WBPaper00036020 “Since the HMT-1 polypeptide of gk161 allele lacked TMD and NBD that are required for the function of ABC transporters, we used this strain in our studies.” gk161 is hypersensitive to cadmium
WBPaper0002481 Wang 1996 identifies unc-86 binding sites in mec-3 promoter region, with accompanying evaluation of phenotypes resulting from mec-3 mutations in these regions “UNC-86 binding is blocked by certain mutations, as described above. When met-3-lacZ fusions with UNC-86 site mutations were introduced into C. elegans, mutations in Region III had a strong effect on expression, mutations in Region II had a significant effect, and mutations in Region I had no detectable effect”
WBPaper00029156 Modzelewska 2007 “the sy262 mutation lies within the Rac GEF Dbl domain , it is possible that the mutation acts through one or both of the GTPases .” “In conjunction with molecular modeling , our data suggest that the C . elegans mutation as well as an equivalent mutation in human SOS1 activate the MAPK pathway by disrupting an auto-inhibitory function of the Dbl domain on Ras activation” “A mutation equivalent to sy262 G322R activates hSOS1.” “...in every experiment we found that at some time point, hSOS1 C282R displayed two- to fourfold more activity than wild-type hSOS1. In conjunction with our genetic data, these data suggest that the G322R change in C. elegans SOS-1, as well as the equivalent C282R change in hSOS1, does indeed enhance EGF-dependent MAPK activation.”
Title: The genetics of ivermectin resistance in Caenorhabditis elegans . Authors: Dent JA ; Smith M ; Vassilatis DK ; Avery L Journal: Proc Natl Acad Sci U S A Year: 2000-03-14 Doc ID: WBPaper00003954 results. The region surrounding the conserved valine ( bold ) that is mutated to a glutamate in the ad1302 allele ( resulting from a T to A mutation in the second base of the V60 codon ) is shown lined up with the corresponding region in other GluCl subunits and in the rat glycine and γ-aminobutyric acid ( GABA ) type A channel subunits ( 29 , 30 ) . [Field: results]
Title: POP-1 controls axis formation during early gonadogenesis in C . elegans . Authors: Siegfried KR ; Kimble J Journal: Development Year: 2002-01 Doc ID: WBPaper00005116 SECTION: abstract. The pop-1 ( q624 ) allele is weakly penetrant for multiple defects and appears to be a partial loss-of-function mutation ; pop-1 ( q624 ) alters a conserved amino acid in the HMG-box DNA binding domain . [Field: abstract] SECTION: discussion. This mutation alters a conserved amino acid in the HMG box DNA binding domain which is conserved specifically in TCF / LEF-1 type HMG proteins ( Laudet et al . , 1993 ) , suggesting that the pop-1 ( q624 ) mutation may affect either recognition of the TCF / LEF-1 consensus sequence or DNA binding affinity , thereby lowering POP-1 activity . [Field: discussion] SECTION: discussion. As the only mutation isolated thus far in a developmental system that changes a highly conserved amino acid in the β-catenin binding domain , the pop-1 ( q645 ) missense mutation may shed new light on TCF / LEF-1 function during development . [Field: discussion] SECTION: introduction. The pop-1 ( q624 ) allele is weakly penetrant for multiple defects and appears to be a partial loss-of-function mutation ; pop-1 ( q624 ) alters a conserved amino acid in the HMG-box DNA binding domain . [Field: introduction] SECTION: results. The pop-1 ( q645 ) mutation carries a nucleotide substitution predicted to change an aspartic acid ( D ) to a glutamic acid ( E ) ( Fig . 2B ) ; this mutation resides within the pop-1 β-catenin binding domain and alters an amino acid conserved in all known TCF / LEF-1 proteins , including nematode , fly , and vertebrate homologues . [Field: results] SECTION: results. The pop-1 ( q624 ) mutation possesses a nucleotide change in the region encoding the HMG box ; the predicted amino acid change in this case also affects a conserved amino acid ( Fig . 2C ) . [Field: results]
Location of project-related files
on Textpresso-dev
on postgres
- The latest dump:
- Variation concise description pipeline:
- Scripts:
- Output location:
Orthology/Homology
see Caltech group meeting May 7, 2015
- to connect conserved/syntenic mutations
- link elegans gene variations and phenotypes to homologous human disease gene variations
- to link elegans mutations as a disease model, e.g. pdr-1 mutations used to model juvenile Parkinson's disease
Use the ortholog(s) of the gene as defined in the Orthology/Homology section of the Generation_of_automated_descriptions wiki page (Orthology, Homology and Paralog data in WormBase)
Order of sentences
- Molecular
- Protein effect
- Relation to human gene disruptions
- Phenotype
- Function
Postgres sources
Source files for Tissue expression data for gene concise description
- Source 1: Expression data
- OA (exprpat), PG table names:
  - for all these tables where the exp_endogenous table value is 'endogenous', grab the exp_gene, WBGeneXXXXXXXX
  - exp_name, values look like Expr1005.
  - exp_anatomy for anatomy terms, in the form of WBbt:0003679; translate these IDs using the anatomy ontology file
  - anatomy ontology file from ftp site: ftp://ftp.sanger.ac.uk/pub/wormbase/releases/WS244/ONTOLOGY/anatomy_ontology.WS244.obo
  - exp_paper for paper
  - exp_qualifier for the qualifiers 'certain', 'uncertain' and 'partial'.
- Contact Person: Daniela
Preliminary results
Mapping of automated variation concise description data to OA fields
OA field number | OA field name | Data to be inserted | Example of data to be inserted | Required or Not | OA table name |
---|---|---|---|---|---|
1 | WBVariation | Variation | WBVar00145853 OR gk448 | Required | vcd_variation |
2 | Species | Species | Onchocerca volvulus | Required | vcd_species |
3 | Curator | Name of Curator | James Done(first then replace with) Karen Yook (insert for all rows) | Required | vcd_curator |
4 | Curator History | Name of Curator | same as pgid (insert for all rows) | Required | vcd_curhistory |
5 | Description Type | Automated_concise_description (insert for all rows) | Automated_concise_description | Required | vcd_desctype |
6 | Description Text | the automated concise description | asp-19 encodes an ortholog... | Required | vcd_desctext |
7 | Reference | WBPaper | WBPaper00026979 | Required | vcd_paper |
8 | Last Updated | Date when the descriptions were last generated | 2014-09-11 | Required | vcd_lastupdate |
9 | pgid | pgid | 1149 (Postgres will generate) | Required | |
Tab-delimited file for OA insert
(for gene concise desc, for reference)
- One tab-delimited file per species
- Order of the data will be: WBGene, Date, Paper, Accession_evidence, Automated_concise_description, Species, Inferred_automatically text
- Format: tab-delimited file, comma separate the values when multiple values are present
- Date is the last date that the script was run to generate the automated descriptions (eg. 2014-05-28)
- File will be placed on textpresso-dev to be picked up by a cron job by JC
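The file layout above can be sketched in Python as below. This is an illustration of the stated column order and comma-separated multi-values only; the function name `oa_line`, the dictionary keys, and the choice to replace stray tabs with spaces are assumptions of this sketch, not part of the existing pipeline.

```python
# Column order as listed above for the gene concise description file.
COLUMNS = [
    "WBGene",
    "Date",
    "Paper",
    "Accession_evidence",
    "Automated_concise_description",
    "Species",
    "Inferred_automatically",
]


def oa_line(row):
    """Format one record (a dict keyed by COLUMNS) as a tab-delimited line.

    Lists and tuples are comma separated; missing columns become empty cells.
    """
    cells = []
    for col in COLUMNS:
        value = row.get(col, "")
        if isinstance(value, (list, tuple)):
            value = ",".join(str(v) for v in value)
        # A tab inside a value would shift the columns, so flatten it.
        cells.append(str(value).replace("\t", " "))
    return "\t".join(cells)
```

For example, a record with two papers produces a single Paper cell holding both IDs separated by a comma, and the line always has seven cells regardless of which fields are present.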
Directory structure for project
Use same structure as for gene concise descriptions (which follows)
- http://textpresso-dev.caltech.edu/concise_descriptions/ Top level parent directory for project
- http://textpresso-dev.caltech.edu/concise_descriptions/production_release.txt Indicates what release the file corresponds to
- http://textpresso-dev.caltech.edu/concise_descriptions/species.txt Indicates the different species we are producing description files for
- http://textpresso-dev.caltech.edu/concise_descriptions/release/WS247/c_elegans/descriptions/OA_concise_descriptions.WS247.txt WS247 elegans file for import into OA
Inserting automated descriptions into postgres
Populating script
Scripts for automated concise descriptions (for reference)
- /home/acedb/ranjana/concise_testing/populate_automated_concise_descriptions.pl -> /home/postgres/work/pgpopulation/concise_description/20140909_automated_concise/populate_automated_concise_descriptions.pl
which look at
- http://textpresso-dev.caltech.edu/concise_descriptions/production_release.txt for release number
- http://textpresso-dev.caltech.edu/concise_descriptions/species.txt for the different species
For testing on Mangolassi
Dumping to .ace
Tracking progress
Generate a report for numbers and place on Textpresso-dev
- Report for each upload:
- Total number of automated variation descriptions =
- Number of automated descriptions with molecular details =
- Number of automated descriptions with gene function/GO information =
- Number of automated descriptions with phenotype information =
- Number of automated descriptions with human disease reference =
Changes/Updates for each release
Issues to address
Automated descriptions software
Follow the gene concise description pipeline: Documentation for workflow and scripts
Publications related to Text-mining methods
- Automatically generating gene summaries from biomedical literature. Ling X, Jiang J, He X, Mei Q, Zhai C, Schatz B. Pac Symp Biocomput. 2006:40-51. PMID:17094226
- Generating gene summaries from biomedical literature: A study of semi-structured summarization. Ling X, Jiang J, He X, Mei Q, Zhai C, Schatz B.