Talk:WormBase Model:Construct

Variation

?Variation
Variation_type 
Engineered_allele 
Variation_summary //to house final engineered construct
    Derived_from ?Construct XREF Variation
Corresponding_transgene UNIQUE ?Transgene XREF Identical_variation
Method 
    Homologous_recombination 
    NHEJ //Non-homologous DNA end-joining, imprecise DNA repair
    MosSci
    Cas9
    CRISPR 
    ZFN-NHEJ repair //Zinc-finger nuclease
    ZFN-HR repair
Expr_pattern ?Expr_pattern XREF Variation #Evidence

notes on variation model changes

NOTE: the variation model currently has the following tags

  • Nature_of_variation UNIQUE
    • Polymorphic //would this be complex fusions and chimeras?
    • Synthetic //Would this be simple fusions

Do you know what these were intended for? Could they be used to house engineered alleles?

Mary Ann says "I do not know what the Nature_of_variation tag was intended for and, as I think you've noted, it is not populated. It might be that it was intended to be used to describe whether the variation is naturally occurring (polymorphic) or manmade (synthetic). If this is the case, then we have since adopted the use of the sub-tags SNP, Natural_variant or Allele (to the right of Variation_type) to indicate natural vs. manmade and I think it might be redundant to use the Nature_of_variation tag as well. As you've proposed to add the new Variation_type Engineered_allele I think this should be sufficient. We could then remove the Nature_of_variation tag. I would update e.g. the Mos1 insertions to have Engineered_allele. They currently have Variation_type Allele and Transposon_insertion."


<Karen> Ok, I will ignore the Nature_of_variation.

Mary Ann says: NHEJ. Are we expecting a large number of these? This seems like a really specific Method. Likewise ZFN-NHEJ and ZFN-HR. Is the "repair" meant to be part of the Method?

Have changed Crispr to CRISPR

If Variation_summary is proposed to only ever house information about the final engineered construct, then I think the tag name should be more specific. Otherwise curators may see this as a place to add any other information.

Have changed Identical_transgene to Corresponding_transgene. NB: have not updated the Transgene model.

Construct (new)

?Construct //WBConstructID
Public_name ?Text //variation name (cp7); transgene name (oxIs12, oxEx432)
Other_name ?Text
Summary   ?Text //genotype [Pmyo-2::UNC-46::GFP]
Sequence_feature ?Feature XREF Construct //WBsf object, created by Hinxton when needed, with precise ends for mapping to genome 
Driven_by ?Gene XREF Drives_construct //this tag is currently Drives_Transgene in the Gene model, it should be changed to Drives_construct
Gene ?Gene  
Fusion_reporter ?Text //fluorescent proteins GFP, RFP, mCherry, etc.
Other_reporter ?Text //to add reporters, e.g., reporters not in Fusion_reporter list or genes from other species
Purification_tag ?Text //FLAG, HA, Myc, TAP, etc.  
Type_of_construct 
      Chimera
      Domain_swap
      Engineered_mutation
      Fusion
         Complex_fusion // complex changes (e.g. GFP fusion plus point mutations)  
         Transcriptional_fusion
         Translational_fusion 
         N-terminal_translational_fusion
         C-terminal_translational_fusion
         Internal_coding_fusion
Selection_marker     ?Text    //for unc-119(+), lin-15(+), drug selection
Construction_summary  ?Text    //Backbone vector, mol bio 
Used_for
    Transgene_construction ?Transgene XREF Construct
    Variation ?Variation XREF Engineered_allele    
Reference ?Paper XREF Construct  
Person ?Person XREF Construct
Laboratory ?Laboratory #Lab_Location 
Remark ?Text #Evidence

notes on construct model

Chris’ thoughts: We could add a tag to capture the entire DNA sequence of the object (DNA_text or something). Maybe we could also add a Species tag to capture which species’ sequences are incorporated. As per our discussion at group meeting, I think it would ultimately be good to distinguish sequences/features that drive transcription (e.g. promoter sequences, enhancers, etc.) from those that direct post-transcriptional regulation (3’ UTRs) and non-regulatory (backbone) sequences. This way ?Expr_pattern, for example, can pull the relevant sequence/feature info from the ?Construct object.

Xiaodong says

  • Wouldn't adding the DNA text be redundant with ?Feature curation, which basically positions sequences on the genome?
  • Aren't promoters/enhancers and/or 3' UTRs already captured in transgene curation?

--kjy (talk) 00:48, 8 March 2014 (UTC) yes, if you copy in the sequences that are used to define the seq_feature, but I think this field would be used for things that are not seq_feature curated objects.

Daniela: for the backbone vector we could use the ?Clone class. I was looking up some example scenarios e.g. Expr1049_Ex and in the summary they use vector pPD95.67. Only problem is that I checked on the site and pPD95.67 exists but has no sequence info.

Mary Ann says

  • Could Fusion_reporter be a controlled vocabulary? This makes searching easier. Likewise Purification_tag?

<karen>: they are a controlled vocabulary during curation. I don't like putting these lists into the model as they are not easy to update, but people disagree with me.

  • Type_of_construct has Engineered mutation (I presume this should be Engineered_mutation). Could this be Engineered_allele so that it is consistent with the Variation_type in ?Variation. <karen>: these are two different things. I've fixed "Engineered_mutation" in the model
  • I do not understand the format of the Complex tag and the ones underneath (up to and including Internal_coding_fusion). Are these all different tags? <karen>: yes, these are all different tags. They were proposed by Tim.
  • Do you want to include the Strain in the Construct model? If so, then it might be good to have an Origin tag with the following sub-tags: Species, Laboratory and Person.

<karen>: no, I don't think the strain should be in this model. The construct model describes the elements that make up the variation or the transgene, which are the objects that the strain should be containing.

  • I think it would be good to have Status. <karen>: what would Status signify?
  • I like Chris' suggestion of capturing the entire DNA sequence. I was talking to Jonathan Ewbank at the Strasbourg meeting and he also suggested this (though he was thinking more of gbrowse representation). He suggested that if this was too big an overhead for us to do we could link to source databases which already display the DNA sequence. Something to think about.
  • The model has no Name tag. Is this intentional? <karen>: no, added it

Overall, I like the model and think it goes a long way to capturing the info. we need. I have no idea whether the model works! Have you tested it? Paul D will certainly want you to have done this prior to proposal - though I think he's seen earlier versions already.
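
On the "Have you tested it?" point: one check that can be run before loading anything into a test database is whether each XREF names a tag that points back. Below is a minimal Python sketch. It assumes XREFs are meant to come in reciprocal pairs (ACeDB itself only needs the named tag to exist in the target class), and it works on hand-copied, simplified excerpts of the models on this page. MODELS, parse_xrefs and unreciprocated are made-up names for illustration.

```python
import re

# Simplified, hand-copied excerpts of the models on this page (assumption:
# one tag per line, only lines that carry an XREF are listed).
MODELS = {
    "Variation": [
        "Derived_from ?Construct XREF Variation",
        "Corresponding_transgene UNIQUE ?Transgene XREF Identical_variation",
    ],
    "Construct": [
        "Transgene_construction ?Transgene XREF Construct",
        "Variation ?Variation XREF Engineered_allele",
    ],
    "Transgene": [
        "Construct ?Construct XREF Transgene_construction",
        "Identical_variation UNIQUE ?Variation XREF Identical_transgene",
    ],
}

# Tag, optional UNIQUE, ?TargetClass, XREF, tag expected in the target class.
XREF_LINE = re.compile(
    r"^(\w+)\s+(?:UNIQUE\s+)?\?(\w+)\s+XREF\s+(\w+)", re.IGNORECASE
)


def parse_xrefs(models):
    """Return {(class, tag): (target_class, back_tag)} for every XREF line."""
    xrefs = {}
    for cls, lines in models.items():
        for line in lines:
            m = XREF_LINE.match(line.strip())
            if m:
                tag, target, back = m.groups()
                xrefs[(cls, tag)] = (target, back)
    return xrefs


def unreciprocated(models):
    """List XREFs whose target class has no matching XREF pointing back."""
    xrefs = parse_xrefs(models)
    problems = []
    for (cls, tag), (target, back) in sorted(xrefs.items()):
        if xrefs.get((target, back)) != (cls, tag):
            problems.append(f"?{cls} {tag} -> ?{target} {back}")
    return problems


for problem in unreciprocated(MODELS):
    print(problem)
```

On these excerpts the sketch flags four lines, including Derived_from in ?Variation (which expects a Variation tag in ?Construct pointing back to Derived_from) and the Corresponding_transgene / Identical_variation pair that is noted above as not yet updated in the Transgene model. Whether each of these is a real problem or an intentional one-way link is for the model owners to decide.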


dealing with precise ends

For mapping constructs to the genome - mainly for expression, but should also be used for rescuing constructs

Notes and thoughts for incorporation of precise ends objects into the construct class (Daniela):

We have approx 1000 objects with precise ends 'tags'.

Annotations sometimes have murky sequence boundaries, especially very old annotations, and give no primer info. E.g. Expr1265: All constructs contain 3kb of 5'UTR. dys-1::gfp VIII: 3' end in exon 5. Other constructs end at exon 1 or 3. --precise ends.


Expr1275: [clk-1::gfp] translational fusion with clk-1 coding region and upstream gene toc-1 and 624bp 5' of the toc-1 start region. --precise ends no info on where the construct ends. Presumably stop codon?


Expr1049: [rgs-2::gfp] translational fusion. GFP reporter construct was constructed by inserting genomic DNA fragments from rgs-2 into the vector pPD95.67. The construct contained the promoter regions and 5' coding sequences of the RGS gene, such that a coding exon for the gene was fused in frame to the coding sequence for GFP. The rgs-2 transgene contained sequences from -4770 to +3592 (relative to the rgs-2 translation start), and thus included the large first intron of rgs-2. --precise ends.

They don't specify the promoter region. How can we map precisely to the sequence?
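
Where a paper does give offsets relative to the translation start, as in Expr1049 (-4770 to +3592), the arithmetic for turning them into genomic coordinates can be sketched as below. This is only a sketch under stated assumptions: +1 is taken to be the A of the ATG and -1 the base immediately upstream (no position 0), coordinates are 1-based, and the ATG position of 100000 is a made-up placeholder, not the real rgs-2 coordinate. The function names are hypothetical.

```python
def rel_to_genomic(atg_pos, strand, rel):
    """Convert an offset relative to the translation start to a genomic position.

    Assumes +1 is the A of the ATG and -1 the base just upstream (no 0),
    with 1-based genomic coordinates.
    """
    if rel == 0:
        raise ValueError("offset 0 is undefined in the +1/-1 convention")
    # Downstream offsets count the A itself as +1, so step one base less.
    step = rel - 1 if rel > 0 else rel
    return atg_pos + step if strand == "+" else atg_pos - step


def construct_span(atg_pos, strand, rel_start, rel_end):
    """Return (low, high) genomic coordinates for a relative interval."""
    a = rel_to_genomic(atg_pos, strand, rel_start)
    b = rel_to_genomic(atg_pos, strand, rel_end)
    return (min(a, b), max(a, b))


# Placeholder ATG position; the offsets are the ones quoted for Expr1049.
low, high = construct_span(100000, "+", -4770, 3592)
print(low, high, high - low + 1)
```

The span length comes out as 4770 + 3592 bases on either strand, which is a quick sanity check on the convention. If the paper counts the A of the ATG as position 0 instead, the result shifts by one base, which is exactly the kind of ambiguity raised above.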

Issue of sequence coordinates varying with gene models. Something published 10 years ago should be remapped. Worth it? To cite Paul D: “..you need to establish what release of the database the data was generated against or have some other form of identifying how to correct the drift, once you can do that we can transform the coordinates forward if they are a large batch else you would have to do it manually.”
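
To make the "transform the coordinates forward" idea concrete, here is a small Python sketch. It assumes that, once the original release is known, the sequence changes between that release and the current one are available as a list of (position, delta) pairs in old-release coordinates, where a positive delta is an insertion after that position and a negative delta deletes the bases immediately after it. That list format and the function name are hypothetical, not an existing WormBase tool or file format.

```python
def remap_forward(pos, changes):
    """Shift an old-release coordinate forward to the new release.

    changes: iterable of (position, delta) in old-release coordinates.
    Returns None if the position itself falls inside a deleted stretch.
    """
    shift = 0
    for at, delta in sorted(changes):
        if pos <= at:
            break  # this change and all later ones lie downstream of pos
        if delta < 0 and pos <= at - delta:
            return None  # pos was one of the deleted bases
        shift += delta
    return pos + shift


# Made-up example: 3 bases inserted after 1000, 10 bases deleted after 5000.
example_changes = [(1000, 3), (5000, -10)]
for old in (500, 2000, 5005, 6000):
    print(old, "->", remap_forward(old, example_changes))
```

Coordinates that land inside a deleted stretch come back as None, and those are the cases that would still need a manual look.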

incorporating sequence features

For the examples I have come across so far, it seems that having Sequence_feature ?Feature XREF Construct and a ?Clone tag in the ?Construct model would suffice, as long as the ?Clone class is maintained. DR

examples:

Expr_pattern	Feature	Reporter gene	Notes
Expr11274	Feature : "ceh-13.enh450" WBsf919527	enh450 (23256 to 26172) was amplified by PCR using primers RP3Cel.H.do and RP3Cel.H.up for cloning into pMF1DH3 (pRK24) or pPD107.94 (pRK23), and primers RP3D.K.up and RP3D.B.do for cloning into pCb.	Could define ceh-13.enh450::pMF1DH3 or ceh-13.enh450::pPD107.94.  In this case in the construct model you need the WBsf -Feature-ceh-13.enh450- and the clone pMF1DH3 and pPD107.94
Expr11275	Feature : "ceh-13.enh3.4" WBsf919526	enh3.4 (nucleotide positions 23256 to 26644) was cut from pMF1 for cloning into pPD107.94 (pASF43), or PCR amplified using primers 3.3up and 3.3down for cloning into pCb.	Could define ceh-13.enh3.4::pPD107.94. In this case in the construct model you need the WBsf -Feature-ceh-13.enh3.4- and the clone pPD107.94 
Expr11276	Feature : "ceh-13.enh740" WBsf919528	enh740 (nucleotide positions 24001 to 26644) was PCR amplified using primers CCCAAGCTTTCAGATCCCTCCACATGTC and TCTGGTAGACTGTGCAAGCAAC for cloning into pPD107.94 (pRK29) or primers GGGGTACCTCAGATCCCTCCACATGTC and CGGGATCCTGGATCTTAGGGAATTGTGG for cloning into pCb.	Could define ceh-13.enh740::pRK29 and ceh-13.enh740::pCb.  In this case in the construct model you need the WBsf -Feature-ceh-13.enh740::pRK29- and the clone pRK29 and pCb 
Expr11277	Feature : "ceh-22.proximal"	ceh-22.proximal::(del)Pes-10::lacZ.	Could define ceh-22.proximal::(del)Pes-10::lacZ.
Expr11278	Feature : "ceh-22.PE1"	WBTransgene00019185. [PE1::(del)pes-10::lacZ]	In the construct model you need the WBsf -ceh-22.PE1- and the clone pPD95.21 ((del)Pes-10::lacZ).
Expr11279	Feature : "ceh-22.pe39_pe41"	WBTransgene00018710, WBTransgene00018711. [pe39::(del)pes-10::lacZ], [pe41::(del)pes-10::lacZ]	In the construct model you need the WBsf -ceh-22.pe39_pe41- and the clone pPD95.21 ((del)Pes-10::lacZ).
Expr11280	Feature : "ceh-22.pe27"	WBTransgene00019186. [pe27::(del)pes-10::lacZ]	In the construct model you need the WBsf -ceh-22.pe27- and the clone pPD95.21 ((del)Pes-10::lacZ).
Expr11281	Feature : "ceh-24.vulval"	The DNA sequence from Feature"ceh-24.vulval" was assayed upstream of a truncated pes-10 promoter fragment driving lacZ -pPD95.18.	Could define ceh-24.vulval::pPD95.18. In this case in the construct model you need the WBsf -ceh-24.vulval- and the clone pPD95.18. 
Expr11282	Feature : "ceh-24.pm8"	The DNA sequence from Feature "ceh-24.pm8" was assayed in front of a truncated myo-2 promoter -pPD95.62.	Could define ceh-24.pm8::pPD95.62. In this case in the construct model you need the WBsf -ceh-24.pm8- and the clone pPD95.62. 
Expr11283	Feature : "egl-17.vulDC"	A 64-bp fragment, located between 366 and 303 bp upstream of the egl-17 ATG was inserted into the pPD122.53 vector, which contains the minimal pes-10 promoter.	Could define: [Feature-egl-17.vulDC::pPD122.53]. In this case in the construct model you need the WBsf -Feature-egl-17.vulDC- and the clone pPD122.53 
Expr11284	Feature : "egl-17.distal"	Distal enhancer inserted into the pPD122.53 vector, which contains the minimal pes-10 promoter.	Could define: [egl-17.distal::pPD122.53]. In this case in the construct model you need the WBsf -Feature-egl-17.distal- and the clone pPD122.53 
Expr11285	Feature : "egl-17.proximal"	Proximal enhancer inserted into the pPD122.53 vector, which contains the minimal pes-10 promoter.	Could define: [egl-17.proximal::pPD122.53]. In this case in the construct model you need the WBsf -Feature-egl-17.proximal- and the clone pPD122.53 
Expr11335	Feature : "ges-1.WGATAR"	Six or seven copies of WGATAR sites in either orientation were inserted into the test vector pJM77. The vector pJM77 used to test the enhancer activity of candidate sequences was constructed as follows: a 446-bp Sau3A fragment from the promoter of the C. elegans heat shock gene 16–48 was isolated from plasmid pPC16.48-1 (Stringham et al., 1992) and inserted in the correct orientation into BamHI-cleaved vector pPD96.04 (kindly provided by A. Fire, Carnegie Institute of Washington, Baltimore, MD). In this construct, the heat shock elements of the 16–48 gene are intact but can be removed either by PstI digestion or by double digestion with PstI and HindIII. pJM77 contains the transcription initiation site, the 5'-UTR, the ATG codon, and the first 15 amino acids of the 16–48 heat shock protein fused to a GFP-lacZ reporter incorporating 15 synthetic introns. Sequence elements to be tested for enhancer activity are first multimerized, cloned into the EcoRV site of pBluescript, and transferred as a HindIII–PstI fragment into HindIII–PstI-cleaved pJM77, thereby removing the original heat shock elements and preserving insert orientation.	Several copies of Feature : "ges-1.WGATAR" were cloned into pJM77.
Expr11336	Feature : "ges-1.3prime"	A single copy of the sequence from 7840 to 8160 bp of Ce-ges-1 was cloned in the forward orientation into pJM77. The vector pJM77 used to test the enhancer activity of candidate sequences was constructed as follows: a 446-bp Sau3A fragment from the promoter of the C. elegans heat shock gene 16-48 was isolated from plasmid pPC16.48-1 (Stringham et al., 1992) and inserted in the correct orientation into BamHI-cleaved vector pPD96.04 (kindly provided by A. Fire, Carnegie Institute of Washington, Baltimore, MD). In this construct, the heat shock elements of the 16-48 gene are intact but can be removed either by PstI digestion or by double digestion with PstI and HindIII. pJM77 contains the transcription initiation site, the 5'-UTR, the ATG codon, and the first 15 amino acids of the 16-48 heat shock protein fused to a GFP-lacZ reporter incorporating 15 synthetic introns. Sequence elements to be tested for enhancer activity are first multimerized, cloned into the EcoRV site of pBluescript, and transferred as a HindIII-PstI fragment into HindIII-PstI-cleaved pJM77, thereby removing the original heat shock elements and preserving insert orientation.	Could define: [ges-1.3prime::pJM77]. In this case in the construct model you need the WBsf -Feature-ges-1.3prime- and the clone pJM77
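
The "feature::clone" pattern in the Notes column could be turned into draft .ace paragraphs roughly as follows. This is a sketch only: it assumes a Clone tag in ?Construct (implied by the proposed Construct ?Construct XREF Clone tag in ?Clone, but not in the ?Construct model above), it puts the bracketed name in Summary, and the construct ID is a made-up placeholder. The function name is hypothetical.

```python
def construct_ace(construct_id, feature_id, feature_name, clone):
    """Build one .ace paragraph for a feature-in-vector construct.

    Assumes ?Construct has Summary, Sequence_feature and a Clone tag;
    the Clone tag is not in the model proposed on this page.
    """
    lines = [
        f'Construct : "{construct_id}"',
        f'Summary "[{feature_name}::{clone}]"',
        f'Sequence_feature "{feature_id}"',
        f'Clone "{clone}"',
    ]
    # .ace paragraphs are separated by a blank line.
    return "\n".join(lines) + "\n"


# Feature and clone taken from the Expr11274 row; the ID is a placeholder.
print(construct_ace("WBCnstr00000001", "WBsf919527", "ceh-13.enh450", "pPD107.94"))
```

For rows with more than one vector (e.g. pMF1DH3 and pPD107.94 for ceh-13.enh450) the function would simply be called once per clone, giving one construct object per feature/vector combination.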

Expression_pattern

proposed addition

Variation ?Variation XREF Expression_pattern

Clone

add tag

Construct ?Construct XREF Clone

?Clone	Evidence	#Evidence
	Remark	General_remark	?Text
		Y_remark	?Text
		PCR_remark	?Text
	Position	Map	?Map	XREF	Clone	#Map_position
		Pmap	UNIQUE	?Contig	XREF	Clone	UNIQUE	Int	UNIQUE	Int
		Clone_left_end	?Sequence
		Clone_right_end	?Sequence
		Pos_neg_data	?Pos_neg_data
	Positive	Positive_gene	?Gene	XREF	Positive_clone
		Positive_variation	?Variation	XREF	Positive_clone	?Author
		Inside_rearr	?Rearrangement	XREF	Clone_inside	?Author
		Hybridizes_to	?Clone	XREF	Positive_probe	?Grid
		Hybridizes_weak	?Clone	XREF	Pos_probe_weak	?Grid
		Positive_probe	?Clone	XREF	Hybridizes_to	?Grid
		Pos_probe_weak	?Clone	XREF	Hybridizes_weak	?Grid
	Negative	Negative_gene	?Gene	XREF	Negative_clone
		Negative_locus	?Locus	XREF	Negative_clone	?Author
		Outside_rearr	?Rearrangement	XREF	Clone_outside	?Author
		Negative_probe	?Clone	XREF	Does_not_hybridize_to	?Grid
		Does_not_hybridize_to	?Clone	XREF	Negative_probe	?Grid
	In_strain	?Strain	XREF	Clone
	Species	UNIQUE	?Species
	Sequence	?Sequence	XREF	Clone
	PCR_product	?PCR_product	XREF	Clone
	Length	Seq_length	UNIQUE	Int
		Gel_length	UNIQUE	Float
	Location	?Laboratory	#Lab_Location
	URL	Text
	Gridded	?Grid
	Grid_data	?Grid_data
	FingerPrint	Gel_Number	UNIQUE	Int
		Approximate_Match_to	UNIQUE	?Clone	XREF	Canonical_for
		Exact_Match_to	UNIQUE	?Clone	XREF	Canonical_for
		Funny_Match_to	UNIQUE	?Clone	XREF	Canonical_for
		Canonical_for	?Clone	UNIQUE	Int	UNIQUE	Int
		Bands	UNIQUE	Int	UNIQUE	Int
		Gel	?Motif	#Lane
	Contig9	Chromosome	UNIQUE	?Map
		Vaxmap	UNIQUE	Float
		In_situ	UNIQUE	Int	UNIQUE	Int
		Cosmid_grid
		Canon_for_cosmid
		Flag	UNIQUE	Int
		Autopos
	Expression_construct	Pattern	?Text
	Reference	?Paper	XREF	Clone
	cDNA_group	Contains	?Clone	XREF	Contains	Text
		Contained_in	?Clone	XREF	Contained_in
		Best_match	UNIQUE	?Text
	Expr_pattern	?Expr_pattern	XREF	Clone
	Sequence_status	Shotgun	UNIQUE	DateType
		Finished	UNIQUE	DateType
		Accession_number	UNIQUE	?Text
	DB_info	Database	?Database	?Database_field	?Accession_number	XREF	Clone
	Type	UNIQUE	Cosmid
			Fosmid
			YAC
			cDNA
			Plasmid
			Other	Text
	Transgene	?Transgene	XREF	Clone
	Derived_from	?Clone

Transgene

?Transgene      
Summary UNIQUE ?Text                               
Synonym ?Text       
Identical_variation UNIQUE ?Variation XREF Identical_transgene    //added to unambiguously associate the allele and transgene objects. NOTE: these objects will have the same public_name and should be synchronized through the Engineered_allele tag in ?Variation; that is, any variation that has tag Engineered_allele will have to have a corresponding transgene ID, so perhaps this XREF Identical_variation tag is not necessary
Construction //Strain_construction
     Construct       ?Construct XREF Transgene_construction
     Fragment Text ?Text  //Can this be replaced by Construct?
     Coinjection_marker ?Text //remove?, replaced by selection_marker in ?Construct
     Integration_method UNIQUE ?Text                                                    
     Laboratory ?Laboratory #Lab_Location    
     Author ?Author                   
Genetic_information                            
     Extrachromosomal                
     Integrated
          Map ?Map  #Map_position  
Phenotype ?Phenotype XREF Transgene #Phenotype_info
Phenotype_not_observed ?Phenotype XREF Not_in_Transgene #Phenotype_info  
Used_for                                                     
     Expr_pattern ?Expr_pattern XREF Transgene  
     Marker_for   ?Text #Evidence 
     Gene_regulation ?Gene_regulation XREF Transgene 
     Interactor ?Interaction
     Topic_marker ?Process XREF Transgene
Associated_with                   
     Marked_rearrangement ?Rearrangement XREF By_transgene
     Clone ?Clone XREF Transgene Text 
     Strain ?Strain XREF Transgene 
Reference ?Paper XREF Transgene  
Species UNIQUE ?Species       
Remark ?Text #Evidence

sample transgene data

Transgene : "WBTransgene00000083"
Public_name	"axIs1140"
Summary	"[pie-1::gfp-mbk-2]"
Construction_summary	"Clone = pJP1.02. A genomic fragment spanning the ORF in mbk-2c was cloned into two vectors, pJH4.52 and pID3.01, to create amino-terminal GFP fusions. These vectors utilize pie-1 5' and 3' UTR sequences to drive expression of the transgene in the maternal germline. GFP lines were created by the complex array method or by the microparticle bombardment method (pID3.01). Several independent lines were established using each technique. Although expression levels varied from line to line, all lines showed the same overall distribution of GFP:MBK-2."
Reporter_product	"GFP"
Driven_by_gene	"WBGene00004027"
Gene	"WBGene00003150"
Integration_method	"Particle_bombardment"
Integrated
Strain	"JH1576"
Reference	"WBPaper00006085"
Reference	"WBPaper00026970"
Reference	"WBPaper00027048"
Reference	"WBPaper00028588"
Reference	"WBPaper00031014"
Reference	"WBPaper00031015"
Reference	"WBPaper00035426"
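
To see how a record like this would divide between ?Transgene and the new ?Construct, here is a rough Python sketch that parses one .ace paragraph and splits its tags. The mapping of old tag names to ?Construct tags (Reporter_product to Fusion_reporter, Driven_by_gene to Driven_by) is my assumption from the models above, not an agreed migration plan. The sample text is an abridged copy of the record above, and all function names are hypothetical.

```python
import re
from collections import defaultdict

# Tag name, then an optional double-quoted value (bare tags such as Integrated).
ACE_LINE = re.compile(r'^(\w+)(?:\s+"(.*)")?\s*$')
ACE_HEADER = re.compile(r'^(\w+)\s*:\s*"(.*)"\s*$')

# Assumed mapping of current Transgene tags onto proposed ?Construct tags.
TO_CONSTRUCT = {
    "Construction_summary": "Construction_summary",
    "Reporter_product": "Fusion_reporter",
    "Driven_by_gene": "Driven_by",
    "Gene": "Gene",
}


def parse_ace(text):
    """Parse one .ace paragraph into (class, name, {tag: [values]})."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    cls, name = ACE_HEADER.match(lines[0]).groups()
    tags = defaultdict(list)
    for ln in lines[1:]:
        tag, value = ACE_LINE.match(ln).groups()
        tags[tag].append(value)  # value is None for bare tags
    return cls, name, dict(tags)


def split_record(tags):
    """Split parsed tags into (construct_tags, transgene_tags)."""
    construct, transgene = {}, {}
    for tag, values in tags.items():
        if tag in TO_CONSTRUCT:
            construct[TO_CONSTRUCT[tag]] = values
        else:
            transgene[tag] = values
    return construct, transgene


SAMPLE = '''
Transgene : "WBTransgene00000083"
Public_name "axIs1140"
Summary "[pie-1::gfp-mbk-2]"
Reporter_product "GFP"
Driven_by_gene "WBGene00004027"
Gene "WBGene00003150"
Integration_method "Particle_bombardment"
Integrated
Strain "JH1576"
Reference "WBPaper00006085"
Reference "WBPaper00026970"
'''

cls, name, tags = parse_ace(SAMPLE)
construct_tags, transgene_tags = split_record(tags)
print(sorted(construct_tags))
print(sorted(transgene_tags))
```

Summary and Public_name are left on the transgene side here, although the ?Construct model has tags of the same name; which object should own them is still an open question.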