Sequence Feature

From WormBaseWiki

Flagging papers

Send them to worm-bug@sanger.ac.uk. This is where papers identified by SVMs/pattern matching are sent. We will be moving away from this ticketing system, but for the time being they will all be in the same place.


Rules for marking up regions (from GW)

  • If a region is necessary and sufficient to drive a reporter gene, then mark it as an 'enhancer' or 'silencer'.

(I don't think these are the classic definitions for enhancer/silencer, RL)

  • If a region is both an enhancer and a silencer, then it should have the SO_term tags for both of these.
  • If mobility shift experiments or similar experimental evidence is available to assert that a short region is a TF binding site, then mark it as a TF_binding_site.
  • Similarity to a known binding motif is not evidence of being a TF_binding_site.
  • If there is no evidence for a TF binding site and it has an effect on expression when mutated or deleted, but is not sufficient to drive a reporter gene, then we cannot assert that it is an enhancer or a TF binding site. Mark it as an anonymous 'regulatory_region'.
  • If a region has the properties of being both a TF binding site and an enhancer then mark it up as two Features, one a TF_binding_site and one an enhancer.
  • If a region is asserted to be a promoter region in the paper, it is within 200bp (or thereabouts?) of the 5' end of the target gene, and it is necessary and sufficient to drive a reporter gene, mark it as a promoter. If in doubt, consider marking it as an enhancer.
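
The rules above can be read as a small decision procedure. The following is a minimal sketch only, not a curation tool: the class RegionEvidence, its field names, and the function suggest_features are hypothetical, the 200bp figure is treated as a hard cutoff even though the rule says "or thereabouts", and a region with no stated direction of effect is assumed to default to 'enhancer'.

```python
from dataclasses import dataclass
from typing import List, Optional

# "200bp (or thereabouts?)" in the rules; treated here as a hard cutoff (assumption)
PROMOTER_MAX_DISTANCE_BP = 200


@dataclass
class RegionEvidence:
    """Hypothetical summary of what a paper reports about one region."""
    necessary_and_sufficient: bool = False   # region alone drives a reporter gene
    activates: bool = False                  # region increases reporter expression
    represses: bool = False                  # region decreases reporter expression
    binding_assay: bool = False              # mobility shift or similar evidence
    motif_similarity: bool = False           # recorded but deliberately ignored
    affects_expression_when_mutated: bool = False
    asserted_promoter: bool = False          # the paper calls it a promoter
    distance_to_gene_5prime_bp: Optional[int] = None


def suggest_features(ev: RegionEvidence) -> List[List[str]]:
    """Return one inner list of type labels per Feature object to create."""
    features: List[List[str]] = []

    # Experimental binding evidence gives a TF_binding_site Feature of its own.
    # Motif similarity alone never does.
    if ev.binding_assay:
        features.append(["TF_binding_site"])

    if ev.necessary_and_sufficient:
        near_gene = (ev.distance_to_gene_5prime_bp is not None
                     and ev.distance_to_gene_5prime_bp <= PROMOTER_MAX_DISTANCE_BP)
        if ev.asserted_promoter and near_gene:
            features.append(["promoter"])
        else:
            # One Feature that may carry both SO terms.
            labels = []
            if ev.activates or not ev.represses:
                labels.append("enhancer")  # default when no direction is given (assumption)
            if ev.represses:
                labels.append("silencer")
            features.append(labels)
    elif ev.affects_expression_when_mutated and not ev.binding_assay:
        # Has an effect but is not sufficient, and no binding evidence.
        features.append(["regulatory_region"])

    return features


# Hypothetical input: a repressing region with binding evidence.
print(suggest_features(RegionEvidence(necessary_and_sufficient=True,
                                      represses=True,
                                      binding_assay=True)))
```

Each inner list corresponds to one Feature object, so a region that is both a TF binding site and an enhancer yields two lists, while a region that is both an enhancer and a silencer yields a single list with two labels.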


Example for sequence feature curation

The example is from WBPaper00003631.


Feature : "egl-1_temp_1.1"
Sequence VF23B12L
Mapping_target VF23B12L
Flanking_sequences cagctcaattattaaattttattgggtattgttta cataaaattctattgtcccagatttaggatacatcg
DNA_text CTCCTAACCGGGTGGTC
Description "This is a TRA-1 binding site that represses egl-1."
Remark "This is the TF_binding_site for TRA-1 which silences egl-1. 
N.B. a 'silencer' Feature has also been made at this location to aid expression and interaction curation
[2013-07-23 gw3]"
Associated_with_gene WBGene00001170 // egl-1
Bound_by_product_of WBGene00006604 // tra-1
Transcription_factor WBTranscriptionFactor000029 // tra-1
Method  TF_binding_site
SO_term "SO:0000235" // TF_binding_site
Defined_by_paper WBPaper00003631
Public_name "TRA-1 binding site"

Feature : "egl-1_temp_1.2"
Sequence VF23B12L
Mapping_target VF23B12L
Flanking_sequences cagctcaattattaaattttattgggtattgttta cataaaattctattgtcccagatttaggatacatcg
DNA_text CTCCTAACCGGGTGGTC
Description "This is the silencer of egl-1, containing a single TF_binding_site bound by TRA-1."
Remark "Made this 'silencer' feature in addition to the TRA-1 TF_binding_site Feature to aid expression 
and interaction curation [2013-07-23 gw3]"
Associated_with_gene WBGene00001170 // egl-1
Method  silencer
SO_term "SO:0000625" // silencer
Defined_by_paper WBPaper00003631
Public_name "TRA-1 binding site silencer"
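
Both records above share the same Flanking_sequences and DNA_text. As a sanity check before submitting such records, a curator could confirm that the two flanks bracket exactly the DNA_text in the Mapping_target sequence. The sketch below assumes that the feature is the stretch lying between the left and right flank; the function locate_feature is hypothetical, and toy_target is a made-up stand-in built from the strings in the example, not the real VF23B12L sequence.

```python
from typing import Optional, Tuple


def locate_feature(target_seq: str, left_flank: str,
                   right_flank: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the region between the flanks.

    Coordinates are 0-based and end-exclusive. Returns None if the left
    flank is missing or not unique, or if the right flank does not occur
    after it.
    """
    seq = target_seq.lower()
    left, right = left_flank.lower(), right_flank.lower()

    start = seq.find(left)
    if start == -1 or seq.find(left, start + 1) != -1:
        return None  # left flank missing or ambiguous

    feature_start = start + len(left)
    feature_end = seq.find(right, feature_start)
    if feature_end == -1:
        return None  # right flank not found downstream of the left flank
    return feature_start, feature_end


# Values copied from the Feature records above.
LEFT = "cagctcaattattaaattttattgggtattgttta"
RIGHT = "cataaaattctattgtcccagatttaggatacatcg"
DNA_TEXT = "CTCCTAACCGGGTGGTC"

# Toy stand-in for the Mapping_target sequence (NOT the real clone sequence).
toy_target = "nnnnn" + LEFT + DNA_TEXT.lower() + RIGHT + "nnnnn"

span = locate_feature(toy_target, LEFT, RIGHT)
if span is None:
    print("flanks do not map uniquely")
else:
    start, end = span
    print(toy_target[start:end].upper() == DNA_TEXT)
```

With a real clone sequence in place of toy_target, a None result or a mismatch with DNA_text would indicate that the flanks need to be lengthened or corrected.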

Most Expr_pattern and Interaction objects will be attached to the 'enhancer/silencer' Features rather than the TF_binding_site Features

Link to Gene Regulation/Regulatory interactions

A regulatory interaction object is generated and added to the Sequence feature object. In the specific example:

//Associated_with_interaction WBInteraction000520178

Link to Expression pattern

When and how do we link sequence features to Expression Pattern objects?


Example 1 -from WBPaper00003631:

"The egl-1 gene appears to be expressed in the HSNs in males." The construct used is [Pegl-1::gfp] transcriptional fusion.

  • Curator creates an Expression object for egl-1 in the male's HSN and links it to pegl-1::GFP transgene.
Expr_pattern : "Expr11092"
Anatomy_term	"WBbt:0004757" Certain //HSNR
Anatomy_term	"WBbt:0004758" Certain //HSNL
Anatomy_term	"WBbt:0007850" Certain //male
Gene	"WBGene00001170"//egl-1
Pattern	"The egl-1 gene appears to be expressed in the HSNs in males, in which the HSNs normally undergo 
programmed cell death, but not in hermaphrodites, in which the HSNs normally survive."
Reference	"WBPaper00003631"
Reporter_gene	"[Pegl-1::gfp] transcriptional fusion. To construct Pegl-1::gfp, bases +174 to +5820 (5'-3') 
downstream of the stop codon of the egl-1 gene and bases -1914 to -837 (5'-3') upstream of the stop codon were
amplified with appropriate primers and cloned into the SpeI-ApaI (5'-3') and PstI-BamHI (5'-3') sites of 
vector pPD95.69, respectively (A. Fire et al., personal communication). --precise ends."

  • Sequence curator creates a sequence feature for that object -we are not there yet but we should aim for it.
  • In the sequence feature object there will be a link to the expression.

note that in this expression object we have, as per the Expression_pattern model

Expr_pattern	Expression_of	Gene ?Gene XREF Expr_pattern
				
  • The Expression pattern object in this case is linked to the gene as the authors hypothesize that the transcriptional fusion expression is the endogenous egl-1 expression.


Example 2 from WBPaper00003631 (a hypothetical, made-up example; this specific paper contains no such evidence, but the scenario could arise):

"This specific sequence of 80bp is expressed in the HSNL." The construct used is [80bp-egl-1::gfp].

1) One way to go is to link the expression to the sequence rather than to the gene. From the Expr_pattern model:

Expr_pattern	Expression_of	Gene ?Gene XREF Expr_pattern
				Sequence   ?Sequence XREF Expr_pattern 

Expr_pattern : "Expr11093"
Anatomy_term	"WBbt:0004758" Certain //HSNL
Sequence	"???"
Pattern	"This particular sequence::GFP was expressed in HSNL"
Reference	"WBPaper00003631"
Reporter_gene	"[80bp-egl-1::gfp]. To construct 80bp-egl-1::gfp..."
				
  • Sequence curator creates a sequence feature for that object.
  • In the sequence feature object there will be a link to the expression.
  • The Expression pattern object in this case is linked to the sequence as the artificial construct might not resemble the endogenous egl-1 expression.
  • It will generally be hard to determine where the boundary lies between artificial and endogenous expression if no other experimental evidence (IHC, ISH) is available.

* If we curate the objects this way we should determine how to display them on the site. Separate from other expression objects?

2) Another option would be to include those objects in Gene regulation rather than in expression: "That specific sequence is responsible for expression in..."

How were these kinds of objects curated in the past? Was it via gene_regulation Cis_regulated_seq?

Although 'Cis_regulated_seq' existed in the old gene_regulation model, it was never used for any objects, either by Wen or by Xiaodong. In the new Interaction model, this tag is gone. --XW

3) A third possibility is to add Drives_expression_in in the feature object

               Drives_expression_in
			
				Life_stage   ?Life_stage  
				Anatomy_term ?Anatomy_term 
				GO_term      ?GO_term     

This is an attractive option because it does not "contaminate" the expression pattern class while still capturing where the enhancer drives expression. In REDfly (Regulatory Element Database for Drosophila, http://redfly.ccr.buffalo.edu/) the enhancer region is annotated with anatomy terms, but that expression is not listed under the classic expression patterns. See for example the decapentaplegic gene (dpp) construct dpp_303lacZ.

In the example of Hwang and Sternberg, 2004 (WBPaper00006370), the feature object will be

Feature : 
Public_name "lin-3 enhancer"
Sequence F36H1
Description "lin-3 enhancer region, driving anchor cell (AC) specific expression"
Flanking_sequences "ctagaacttcccgtctctccctattcaatg" "cttaccaatgtctcaggcatttttggaaaa" 
Mapping_target F36H1
Associated_with_gene WBGene00002992 // lin-3
Species "Caenorhabditis elegans"
Defined_by_paper WBPaper00006370
SO_term SO:0000165 // enhancer
Method enhancer 
Associated_with_Interaction WBInteraction000501966// hlh-2 binds to lin-3
Associated_with_Interaction WBInteraction000520204// nhr-25 binds to lin-3
Anatomy_term	"WBbt:0004522"//Anchor cell

4) We could simply generate an Expr_pattern object and add the Associated_feature ?Feature. For display purposes on the site we can display objects that have Associated_feature in a separate section
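
The display idea in option 4 amounts to partitioning Expr_pattern objects by whether they carry an Associated_feature. A minimal sketch, assuming each object is available as a plain dictionary keyed by tag name; the function split_for_display and the feature name used in the sample data are hypothetical.

```python
from typing import Dict, List, Tuple

ExprPattern = Dict[str, object]


def split_for_display(
    patterns: List[ExprPattern],
) -> Tuple[List[ExprPattern], List[ExprPattern]]:
    """Split Expr_pattern objects into (classic, feature_driven).

    An object counts as feature_driven when it has a non-empty
    Associated_feature value.
    """
    classic: List[ExprPattern] = []
    feature_driven: List[ExprPattern] = []
    for pattern in patterns:
        if pattern.get("Associated_feature"):
            feature_driven.append(pattern)
        else:
            classic.append(pattern)
    return classic, feature_driven


patterns = [
    # Gene-linked object from Example 1.
    {"Expr_pattern": "Expr11092", "Gene": "WBGene00001170"},
    # Hypothetical object from Example 2; the feature name is a placeholder.
    {"Expr_pattern": "Expr11093", "Associated_feature": "placeholder_80bp_feature"},
]

classic, feature_driven = split_for_display(patterns)
print([p["Expr_pattern"] for p in classic])
print([p["Expr_pattern"] for p in feature_driven])
```

The site could then render the second list in its own section, leaving the classic expression pattern listing unchanged.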

Example 3 from WBPaper00003631:

"The egl-1 gene appears to be expressed in the HSNs in males (Pegl-1::GFP reporter)...if tra-1 is bound to egl-1 the expression in HSNs is repressed"

  • The region of tra-1 binding to egl-1 is known and 2 sequence features are created for it, one as TF_binding_site and one for silencer.
  • A gene regulation object is created -> egl-1 downregulation in HSN.
  • The object is added in the silencer sequence feature object.

Should we create an expression object for the tra-1 binding site? In this case we would have to create a negative expression: egl-1 is NOT expressed in HSNs if bound by tra-1. To me this falls under gene regulation -DR

Should we link to the existing expression pattern Expr_pattern : "Expr11092" -see above? This might not be appropriate as Expr11092 depicts expression in male HSNs. If we want to pull out that info we could do it anyway through the gene regulation object -DR

Should we just leave the gene regulation association?

As of now few Expression Patterns are linked to the Genome Browser (Vancouver set is the only data set). The ultimate goal is to map, whenever we can, expression constructs to the genome browser.

Top down approach

We are brainstorming in order to develop a model that will be suitable for accommodating curation of all the above.


The potential model should contain the following info

for Expression
  • sequence - the sequence could be any stretch of DNA from few bp to kbs

(?Feature, 1 or more)

  • reporter -GFP, RFP, YFP, mCherry, Venus,...

(+ Other: text, including when endogenous gene is used as the (part of, e.g. gfp fused in) reporter)

  • gene (the gene immediately downstream of the sequence) non unique because it could be associated to more than one gene

(NOT annotate gene because 1. the base model is about describing the pattern of expression, 2. location information intrinsically informs possible cis-targets, 3. if author asserts relevant genes, that should go in some ?Regulation)

  • reflects endogenous expression of ?Gene #if the author assumes that the expression reflects the endogenous one then we record it, otherwise not
  • anatomy term
  • life stage
  • (sex will be encoded in life stage and anatomy)
  • WBPaper
  • experimental info?
  • other info will be textual
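
To make the brainstormed field list concrete, here is one way it could be written down as a data structure. This is only a sketch of the list above, not an agreed model: the class name FeatureExpression and all field names are hypothetical, and it follows the parenthetical suggestion of not annotating the downstream gene directly.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FeatureExpression:
    """Hypothetical record for expression driven by one or more Features."""
    features: List[str]                      # one or more ?Feature names
    reporter: str                            # GFP, RFP, YFP, mCherry, Venus, or free text
    paper: str                               # WBPaper ID
    anatomy_terms: List[str] = field(default_factory=list)
    life_stages: List[str] = field(default_factory=list)  # sex is encoded here and in anatomy
    reflects_endogenous_expression_of: Optional[str] = None  # ?Gene, only if the author assumes it
    experimental_info: Optional[str] = None
    remark: Optional[str] = None             # any other info, as text

    def __post_init__(self) -> None:
        # The list above asks for "1 or more" Features.
        if not self.features:
            raise ValueError("at least one Feature is required")


# Values taken from Example 1; the feature name is a placeholder because
# no Feature object exists yet for this construct.
record = FeatureExpression(
    features=["placeholder_Pegl-1_feature"],
    reporter="GFP",
    paper="WBPaper00003631",
    anatomy_terms=["WBbt:0004757", "WBbt:0004758", "WBbt:0007850"],
    reflects_endogenous_expression_of="WBGene00001170",
)
print(record.features, record.anatomy_terms)
```

Keeping the gene only in reflects_endogenous_expression_of leaves author-asserted targets to be captured in a separate regulation object, as suggested in the notes above.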


Next topics: capturing post-transcriptional regulation