Proposed model changes
Purpose: techniques for engineering mutations and performing gene replacement have now been developed in C. elegans. With these new research tools, the way we capture and display the molecular information for these alleles needs to be updated. We propose a new class, Construct, to capture the specifics of the DNA tool used to perform the replacement or engineering, while the Variation model is updated to record the engineering event itself and its impact on the genome. As a side benefit, the new Construct class can also be used to capture the genomic arrays used to create transgenes.
?Variation  Variation_type  Engineered_allele
            Variation_summary  //to house final engineered construct
            Derived_from ?Construct XREF Variation
            Method  Homologous_recombination
                    NHEJ
                    MosSci
                    Cas9
                    Crispr
            Expr_pattern ?Expr_pattern XREF Variation #Evidence
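In ACeDB, XREF makes a link two-way: filling Derived_from in a ?Variation object also fills the named tag (here Variation) in the referenced ?Construct object. A toy Python sketch of that bookkeeping, assuming a simple in-memory store; the class, method and object names are hypothetical and this is not ACeDB's own API:

```python
from collections import defaultdict


class ToyAceStore:
    """Toy in-memory store mimicking ACeDB XREF bookkeeping; not ACeDB's real API."""

    def __init__(self):
        # data[class_name][object_name][tag] -> set of referenced object names
        self.data = defaultdict(lambda: defaultdict(lambda: defaultdict(set)))

    def link(self, cls, obj, tag, target_cls, target, xref_tag):
        """Fill cls:obj.tag with target and, as XREF does, target_cls:target.xref_tag with obj."""
        self.data[cls][obj][tag].add(target)
        self.data[target_cls][target][xref_tag].add(obj)

    def get(self, cls, obj, tag):
        """Return the sorted names stored under a tag (empty list if none)."""
        return sorted(self.data[cls][obj][tag])


db = ToyAceStore()
# Model line: Derived_from ?Construct XREF Variation (object names are hypothetical)
db.link("Variation", "example_allele", "Derived_from",
        "Construct", "example_construct", "Variation")
print(db.get("Construct", "example_construct", "Variation"))  # ['example_allele']
```

Because the reverse link is written by the database, curators only need to fill one side.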
notes on variation model changes
NOTE: the variation model currently has the following tags
- Nature_of_variation UNIQUE
- Polymorphic //would this be complex fusions and chimeras?
- Synthetic //Would this be simple fusions?
Do you know what these were intended for? Could they be used to house engineered alleles?
Mary Ann says: "I do not know what the Nature_of_variation tag was intended for and, as I think you've noted, it is not populated. It might have been intended to describe whether the variation is naturally occurring (polymorphic) or man-made (synthetic). If so, we have since adopted the sub-tags SNP, Natural_variant or Allele (to the right of Variation_type) to indicate natural vs. man-made, and I think it would be redundant to use the Nature_of_variation tag as well. As you've proposed adding the new Variation_type Engineered_allele, I think this should be sufficient, and we could then remove the Nature_of_variation tag. I would update, e.g., the Mos1 insertions to have Engineered_allele. They currently have Variation_type Allele and Transposon_insertion."
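Mary Ann's suggestion amounts to a small data migration. A minimal sketch, assuming records are held as Python dicts of tag to value set (a hypothetical layout) and that a curator supplies the list of engineered variations, since Allele plus Transposon_insertion alone does not distinguish Mos1 insertions from other transposon alleles. Whether Allele should be kept alongside Engineered_allele is still open; this sketch keeps it:

```python
def migrate_variations(variations, engineered_names):
    """Apply the proposed ?Variation changes to a dict of records (illustrative sketch).

    variations: {object_name: {tag: set_of_values}}, a hypothetical in-memory layout.
    engineered_names: names a curator has flagged as engineered (e.g. Mos1 insertions).
    """
    migrated = {}
    for name, tags in variations.items():
        # Copy each tag's value set so the input records are left untouched.
        new_tags = {tag: set(values) for tag, values in tags.items()}
        # Nature_of_variation is unpopulated and redundant with Variation_type, so drop it.
        new_tags.pop("Nature_of_variation", None)
        if name in engineered_names:
            # Existing types (e.g. Allele, Transposon_insertion) are kept alongside the new one.
            new_tags.setdefault("Variation_type", set()).add("Engineered_allele")
        migrated[name] = new_tags
    return migrated


records = {
    "example_mos1_insertion": {"Variation_type": {"Allele", "Transposon_insertion"}},
    "example_snp": {"Variation_type": {"SNP"}, "Nature_of_variation": {"Polymorphic"}},
}
result = migrate_variations(records, {"example_mos1_insertion"})
print(sorted(result["example_mos1_insertion"]["Variation_type"]))
# ['Allele', 'Engineered_allele', 'Transposon_insertion']
```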
?Construct  Summary ?Text
            Sequence feature ?Feature XREF Construct
            Gene ?Gene
            Gene_site_targeted ?SO_term
            Fusion_reporter ?Text  //fluorescent proteins GFP, RFP, mCherry, etc.
            Other_reporter ?Text  //to add reporters, tags that aren’t included in model
            Purification_tag ?Text  //FLAG, HA, Myc, TAP, etc.
            Type_of_construct  Chimera
                               Domain_swap
                               Engineered mutation
                               Fusion  Complex  // complex changes (e.g. GFP fusion plus point mutations)
                                       Transcriptional_fusion
                                       Translational_fusion
                                       N-terminal_translational_fusion
                                       C-terminal_translational_fusion
                                       Internal_coding_fusion
            Selection_marker ?Text  //for unc-119(+), lin-15(+), drug selection
            Construction_summary ?Text  //Backbone vector, mol bio
            Used_for  Transgene_construction ?Transgene XREF Construct
                      Variation ?Variation XREF Engineered_allele
            Reference ?Paper XREF Construct
            Person ?Person XREF Construct
            Laboratory ?Laboratory #Lab_Location
            Remark ?Text #Evidence
notes on construct model
Chris’ thoughts: We could add a tag to capture the entire DNA sequence of the object (DNA_text or something). Maybe we could also add a Species tag to capture which species’ sequences are incorporated. As per our discussion at group meeting, I think it would ultimately be good to distinguish sequences/features that drive transcription (e.g. promoter sequences, enhancers, etc.) from those that direct post-transcriptional regulation (3’ UTRs) and non-regulatory (backbone) sequences. This way ?Expr_pattern, for example, can pull the relevant sequence/feature info from the ?Construct object.
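Chris's suggestion is to tag each piece of a construct by what it does, so that an ?Expr_pattern view can pull only the expression-driving sequence. A minimal Python sketch of that idea; the three roles come from the note above, while the class names and segment names are hypothetical:

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    TRANSCRIPTIONAL = "drives transcription"        # promoters, enhancers
    POST_TRANSCRIPTIONAL = "post-transcriptional"   # e.g. 3' UTRs
    BACKBONE = "non-regulatory backbone"            # vector sequence


@dataclass
class Segment:
    name: str
    role: Role


@dataclass
class Construct:
    name: str
    segments: list = field(default_factory=list)

    def segments_with_role(self, role):
        """Names of the segments that play the given role, in construct order."""
        return [s.name for s in self.segments if s.role is role]


example = Construct("example_construct", [
    Segment("example_promoter", Role.TRANSCRIPTIONAL),
    Segment("example_3prime_UTR", Role.POST_TRANSCRIPTIONAL),
    Segment("example_vector_backbone", Role.BACKBONE),
])
# What an ?Expr_pattern view would pull to describe the expression-driving sequence
print(example.segments_with_role(Role.TRANSCRIPTIONAL))  # ['example_promoter']
```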
Daniela: for the backbone vector we could use the ?Clone class. I was looking up some example scenarios, e.g. Expr1049_Ex, whose summary uses the vector pPD95.67. The only problem is that pPD95.67 exists on the site but has no sequence information.
Mary Ann says
- Could Fusion_reporter be a controlled vocabulary? This makes searching easier. Likewise Purification_tag?
- Type_of_construct has Engineered mutation (I presume this should be Engineered_mutation). Could this be Engineered_allele so that it is consistent with the Variation_type in ?Variation.
- I do not understand the format of the Complex tag and the ones underneath it (up to and including Internal_coding_fusion). Are these all different tags?
- Do you want to include the Strain in the Construct model? If so, then it might be good to have an Origin tag with the following sub-tags: Species, Laboratory and Person.
- I think it would be good to have Status.
- I like Chris' suggestion of capturing the entire DNA sequence. I was talking to Jonathan Ewbank at the Strasbourg meeting and he also suggested this (though he was thinking more of gbrowse representation). He suggested that if this was too big an overhead for us to do we could link to source databases which already display the DNA sequence. Something to think about.
- The model has no Name tag. Is this intentional?
Overall, I like the model and think it goes a long way to capturing the information we need. I have no idea whether the model works! Have you tested it? Paul D will certainly want you to have done this prior to the proposal, though I think he's seen earlier versions already.
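On Mary Ann's first point, a controlled vocabulary would let free-text input be normalized to a fixed term set, with anything unrecognized falling through to Other_reporter. A minimal Python sketch, using only the terms listed in the model comments; the enum and function names are hypothetical:

```python
from enum import Enum


class FusionReporter(Enum):
    GFP = "GFP"
    RFP = "RFP"
    MCHERRY = "mCherry"


class PurificationTag(Enum):
    FLAG = "FLAG"
    HA = "HA"
    MYC = "Myc"
    TAP = "TAP"


def normalize(term, vocab):
    """Map free text onto a controlled-vocabulary member, ignoring case and
    surrounding whitespace; return None so the caller can use Other_reporter."""
    wanted = term.strip().lower()
    for member in vocab:
        if member.value.lower() == wanted:
            return member
    return None


print(normalize(" mcherry", FusionReporter))  # FusionReporter.MCHERRY
print(normalize("YFP", FusionReporter))       # None
```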
dealing with precise ends
For mapping constructs to the genome: mainly for expression constructs, but this should also be used for rescuing constructs.
Notes and thoughts for incorporation of precise ends objects into the construct class (Daniela):
We have approximately 1,000 objects with 'precise ends' tags.
Sequence boundaries in annotations are sometimes murky, especially in very old annotations, and there is no primer information. For example, Expr1265: All constructs contain 3 kb of 5'UTR. dys-1::gfp VIII: 3' end in exon 5. Other constructs end at exon 1 or 3. --precise ends.
Expr1275: [clk-1::gfp] translational fusion with the clk-1 coding region and the upstream gene toc-1, plus 624 bp 5' of the toc-1 start region. --precise ends. No information on where the construct ends; presumably at the stop codon?
Expr1049: [rgs-2::gfp] translational fusion. GFP reporter construct was constructed by inserting genomic DNA fragments from rgs-2 into the vector pPD95.67. The construct contained the promoter regions and 5' coding sequences of the RGS gene, such that a coding exon for the gene was fused in frame to the coding sequence for GFP. The rgs-2 transgene contained sequences from -4770 to +3592 (relative to the rgs-2 translation start), and thus included the large first intron of rgs-2. --precise ends.
They don't specify the promoter region. How can we map precisely to the sequence?
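Where a paper gives ends relative to the translation start, as for rgs-2 (-4770 to +3592), the genomic interval follows once the ATG coordinate and strand are known for a given release. A minimal Python sketch, assuming the common convention that the A of the ATG is +1 and the base immediately upstream is -1 (no position 0); papers do not always state their convention, and the ATG coordinates below are hypothetical:

```python
def rel_to_genomic(rel, atg_pos, strand):
    """Map a position given relative to the translation start onto the genome.

    Assumed convention: the A of the ATG is +1, the base just upstream is -1
    (there is no position 0), and atg_pos is the 1-based genomic coordinate of that A.
    """
    if rel == 0:
        raise ValueError("position 0 does not exist in this convention")
    # Number of bases between the A of the ATG and the target, in the gene's direction
    offset = rel - 1 if rel > 0 else rel
    return atg_pos + offset if strand == "+" else atg_pos - offset


def construct_interval(start_rel, end_rel, atg_pos, strand):
    """Genomic (low, high) interval covered by a fragment such as -4770..+3592."""
    a = rel_to_genomic(start_rel, atg_pos, strand)
    b = rel_to_genomic(end_rel, atg_pos, strand)
    return min(a, b), max(a, b)


# Hypothetical ATG coordinates; look up the real one for the release the paper used.
print(construct_interval(-4770, 3592, atg_pos=10000, strand="+"))  # (5230, 13591)
print(construct_interval(-4770, 3592, atg_pos=20000, strand="-"))  # (16409, 24770)
```

Both intervals span 8,362 bp (4,770 upstream plus 3,592 from the A of the ATG onward), which is a quick sanity check on the convention.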
Another issue: sequence coordinates vary with gene models. Constructs published 10 years ago would need to be remapped; is it worth it? To cite Paul D: “..you need to establish what release of the database the data was generated against or have some other form of identifying how to correct the drift, once you can do that we can transform the coordinates forward if they are a large batch else you would have to do it manually.”
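Paul D's point is that a coordinate can only be carried forward once the release it was generated against is known. One way to picture the batch transform, assuming the sequence differences between two releases are available as insertions and deletions in old-release coordinates (a hypothetical input format; WormBase's actual remapping tooling may work differently):

```python
def lift_forward(coord, edits):
    """Carry a 1-based coordinate from an older release forward to a newer one.

    edits: non-overlapping (pos, delta) pairs in OLD-release coordinates (hypothetical format).
      delta > 0: delta bases were inserted immediately after pos.
      delta < 0: -delta bases were deleted, starting at pos.
    Returns None when the base itself was deleted and has no counterpart.
    """
    shift = 0
    for pos, delta in sorted(edits):
        if delta > 0:
            if coord > pos:
                shift += delta
        elif delta < 0:
            last_deleted = pos - delta - 1
            if pos <= coord <= last_deleted:
                return None
            if coord > last_deleted:
                shift += delta
    return coord + shift


edits = [(100, 5), (200, -10)]  # hypothetical differences between two releases
print([lift_forward(c, edits) for c in (50, 150, 205, 300)])  # [50, 155, None, 295]
```

Coordinates that come back as None are the ones that would have to be checked manually.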
incorporating sequence features
Expr_pattern | Feature | Reporter gene / Notes
Expr11274 | Feature "ceh-13.enh450" (WBsf919527) | enh450 (23256 to 26172) was amplified by PCR using primers RP3Cel.H.do and RP3Cel.H.up for cloning into pMF1DH3 (pRK24) or pPD107.94 (pRK23), and primers RP3D.K.up and RP3D.B.do for cloning into pCb. Could define ceh-13.enh450::pMF1DH3 or ceh-13.enh450::pPD107.94. In this case the construct model needs the WBsf Feature ceh-13.enh450 and the clones pMF1DH3 and pPD107.94.
Expr11275 | Feature "ceh-13.enh3.4" (WBsf919526) | enh3.4 (nucleotide positions 23256 to 26644) was cut from pMF1 for cloning into pPD107.94 (pASF43), or PCR amplified using primers 3.3up and 3.3down for cloning into pCb. Could define ceh-13.enh3.4::pPD107.94. In this case the construct model needs the WBsf Feature ceh-13.enh3.4 and the clone pPD107.94.
Expr11276 | Feature "ceh-13.enh740" (WBsf919528) | enh740 (nucleotide positions 24001 to 26644) was PCR amplified using primers CCCAAGCTTTCAGATCCCTCCACATGTC and TCTGGTAGACTGTGCAAGCAAC for cloning into pPD107.94 (pRK29), or primers GGGGTACCTCAGATCCCTCCACATGTC and CGGGATCCTGGATCTTAGGGAATTGTGG for cloning into pCb. Could define ceh-13.enh740::pRK29 and ceh-13.enh740::pCb. In this case the construct model needs the WBsf Feature ceh-13.enh740 and the clones pRK29 and pCb.
Expr11277 | Feature "ceh-22.proximal" | ceh-22.proximal::(del)Pes-10::lacZ. Could define ceh-22.proximal::(del)Pes-10::lacZ.
Expr11278 | Feature "ceh-22.PE1" | WBTransgene00019185 [PE1::(del)pes-10::lacZ]. The construct model needs the WBsf Feature ceh-22.PE1 and the clone pPD95.21 ((del)Pes-10::lacZ).
Expr11279 | Feature "ceh-22.pe39_pe41" | WBTransgene00018710, WBTransgene00018711 [pe39::(del)pes-10::lacZ], [pe41::(del)pes-10::lacZ]. The construct model needs the WBsf Feature ceh-22.pe39_pe41 and the clone pPD95.21 ((del)Pes-10::lacZ).
Expr11280 | Feature "ceh-22.pe27" | WBTransgene00019186 [pe27::(del)pes-10::lacZ]. The construct model needs the WBsf Feature ceh-22.pe27 and the clone pPD95.21 ((del)Pes-10::lacZ).
Expr11281 | Feature "ceh-24.vulval" | The DNA sequence from Feature "ceh-24.vulval" was assayed upstream of a truncated pes-10 promoter fragment driving lacZ (pPD95.18). Could define ceh-24.vulval::pPD95.18. In this case the construct model needs the WBsf Feature ceh-24.vulval and the clone pPD95.18.
Expr11282 | Feature "ceh-24.pm8" | The DNA sequence from Feature "ceh-24.pm8" was assayed in front of a truncated myo-2 promoter (pPD95.62). Could define ceh-24.pm8::pPD95.62. In this case the construct model needs the WBsf Feature ceh-24.pm8 and the clone pPD95.62.
Expr11283 | Feature "egl-17.vulDC" | A 64-bp fragment, located between 366 and 303 bp upstream of the egl-17 ATG, was inserted into the pPD122.53 vector, which contains the minimal pes-10 promoter. Could define egl-17.vulDC::pPD122.53. In this case the construct model needs the WBsf Feature egl-17.vulDC and the clone pPD122.53.
Expr11284 | Feature "egl-17.distal" | Distal enhancer inserted into the pPD122.53 vector, which contains the minimal pes-10 promoter. Could define egl-17.distal::pPD122.53. In this case the construct model needs the WBsf Feature egl-17.distal and the clone pPD122.53.
Expr11285 | Feature "egl-17.proximal" | Proximal enhancer inserted into the pPD122.53 vector, which contains the minimal pes-10 promoter. Could define egl-17.proximal::pPD122.53. In this case the construct model needs the WBsf Feature egl-17.proximal and the clone pPD122.53.
Expr11335 | Feature "ges-1.WGATAR" | Six or seven copies of WGATAR sites in either orientation were inserted into the test vector pJM77. The vector pJM77 used to test the enhancer activity of candidate sequences was constructed as follows: a 446-bp Sau3A fragment from the promoter of the C. elegans heat shock gene 16-48 was isolated from plasmid pPC16.48-1 (Stringham et al., 1992) and inserted in the correct orientation into BamHI-cleaved vector pPD96.04 (kindly provided by A. Fire, Carnegie Institute of Washington, Baltimore, MD). In this construct, the heat shock elements of the 16-48 gene are intact but can be removed either by PstI digestion or by double digestion with PstI and HindIII. pJM77 contains the transcription initiation site, the 5'-UTR, the ATG codon, and the first 15 amino acids of the 16-48 heat shock protein fused to a GFP-lacZ reporter incorporating 15 synthetic introns. Sequence elements to be tested for enhancer activity are first multimerized, cloned into the EcoRV site of pBluescript, and transferred as a HindIII-PstI fragment into HindIII-PstI-cleaved pJM77, thereby removing the original heat shock elements and preserving insert orientation. Several copies of Feature "ges-1.WGATAR" were cloned into pJM77.
Expr11336 | Feature "ges-1.3prime" | A single copy of the sequence from 7840 to 8160 bp of Ce-ges-1 was cloned in the forward orientation into pJM77 (vector constructed as described for Expr11335). Could define ges-1.3prime::pJM77. In this case the construct model needs the WBsf Feature ges-1.3prime and the clone pJM77.
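Each entry above reduces to the same pattern: one sequence Feature cloned into one vector Clone, named "feature::vector". A minimal Python sketch of that pattern, assuming the naming convention proposed in these notes; the class and method names are hypothetical, and the example values are taken from Expr11282 and Expr11274:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnhancerConstruct:
    """A tested sequence Feature cloned into a reporter vector (hypothetical class)."""
    feature: str   # WBsf Feature name, e.g. "ceh-24.pm8"
    vector: str    # ?Clone name of the vector, e.g. "pPD95.62"

    @property
    def name(self):
        # Follows the "feature::vector" pattern proposed in the notes
        return f"{self.feature}::{self.vector}"

    def required_objects(self):
        """The existing objects a ?Construct entry would need to reference."""
        return {"Feature": self.feature, "Clone": self.vector}


pm8 = EnhancerConstruct("ceh-24.pm8", "pPD95.62")
print(pm8.name)                 # ceh-24.pm8::pPD95.62
print(pm8.required_objects())   # {'Feature': 'ceh-24.pm8', 'Clone': 'pPD95.62'}

# One Feature tested in two vectors gives two constructs (Expr11274)
enh450 = [EnhancerConstruct("ceh-13.enh450", v) for v in ("pMF1DH3", "pPD107.94")]
print([c.name for c in enh450])  # ['ceh-13.enh450::pMF1DH3', 'ceh-13.enh450::pPD107.94']
```

Keeping the Feature and Clone as separate references, rather than only the combined name, means that a vector such as pPD95.67 that currently lacks sequence can be filled in later without renaming constructs.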
?Transgene  Summary UNIQUE ?Text
            Synonym ?Text
            Construction  //Strain_construction
                          Construct ?Construct XREF Transgene_construction
                          Fragment Text ?Text  //Can this be replaced by Construct?
                          Coinjection_marker ?Text  //remove?, replaced by selection_marker in ?Construct
                          Integration_method UNIQUE ?Text
                          Laboratory ?Laboratory #Lab_Location
                          Author ?Author
            Genetic_information  Extrachromosomal
                                 Integrated
            Map ?Map #Map_position
            Phenotype ?Phenotype XREF Transgene #Phenotype_info
            Phenotype_not_observed ?Phenotype XREF Not_in_Transgene #Phenotype_info
            Used_for  Expr_pattern ?Expr_pattern XREF Transgene
                      Marker_for ?Text #Evidence
                      Gene_regulation ?Gene_regulation XREF Transgene
                      Interactor ?Interaction
                      Topic_marker ?Process XREF Transgene
            Associated_with  Marked_rearrangement ?Rearrangement XREF By_transgene
                             Clone ?Clone XREF Transgene
                             Text
            Strain ?Strain XREF Transgene
            Reference ?Paper XREF Transgene
            Species UNIQUE ?Species
            Remark ?Text #Evidence
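The two comments in the ?Transgene model ask whether Fragment text and Coinjection_marker should move into ?Construct. If both answers are "yes", existing Transgene objects would be split as sketched below in Python. The dict layout is hypothetical, and mapping Fragment text to Construction_summary is an assumption rather than a decision recorded here:

```python
def split_transgene(transgene, construct_name):
    """Move construct-level fields out of a ?Transgene record into a new ?Construct record.

    Records are plain dicts keyed by tag name (a hypothetical layout). Assumes the open
    questions in the model comments are answered "yes": Fragment text moves to the
    Construct's Construction_summary and Coinjection_marker becomes Selection_marker.
    """
    tg = dict(transgene)  # shallow copy; the input record is not modified
    construct = {"name": construct_name,
                 "Used_for": {"Transgene_construction": [tg["name"]]}}
    if "Fragment" in tg:
        construct["Construction_summary"] = tg.pop("Fragment")
    if "Coinjection_marker" in tg:
        construct["Selection_marker"] = tg.pop("Coinjection_marker")
    # Reciprocal side of "Construct ?Construct XREF Transgene_construction"
    tg["Construct"] = [construct_name]
    return tg, construct


old = {"name": "example_transgene", "Fragment": "example fragment text",
       "Coinjection_marker": "unc-119(+)", "Species": "Caenorhabditis elegans"}
new_tg, new_construct = split_transgene(old, "example_construct")
print(sorted(new_tg))  # ['Construct', 'Species', 'name']
print(new_construct["Selection_marker"])  # unc-119(+)
```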