Sequence Feature
Flagging papers
Send them to worm-bug@sanger.ac.uk. This is where papers identified by SVMs/pattern matching are sent. We will be moving away from this ticketing system, but for the time being they will all be in the same place.
Rules for marking up regions (from GW)
- If a region is necessary and sufficient to drive a reporter gene, then mark it as an 'enhancer' or 'silencer'.
(I don't think these are the classic definitions for enhancer/silencer, RL)
- If a region is both an enhancer and a silencer, then it should have the SO_term tags for both of these.
- If mobility shift experiments or similar experimental evidence is available to assert that a short region is a TF binding site, then mark it as a TF_binding_site.
- Similarity to a known binding motif is not evidence of being a TF_binding_site.
- If there is no evidence for a TF binding site and it has an effect on expression when mutated or deleted, but is not sufficient to drive a reporter gene, then we cannot assert that it is an enhancer or a TF binding site. Mark it as an anonymous 'regulatory_region'.
- If a region has the properties of being both a TF binding site and an enhancer then mark it up as two Features, one a TF_binding_site and one an enhancer.
- If a region is asserted to be a promoter region in the paper, and it is within 200bp (or thereabouts?) of the 5' end of the target gene, and it is necessary and sufficient to promote a reporter gene, mark it as a promoter. If in doubt, consider marking it as an enhancer.
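The rules above can be sketched as a small decision function. This is only an illustration of how the rules combine, not an agreed curation tool: the `RegionEvidence` fields, the function name, and the hard 200bp cutoff are all assumptions made for the example. Each returned tuple stands for one Feature to create, and a tuple with two entries means a single Feature carrying both SO_term tags.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Approximate cutoff taken from the "within 200bp (or thereabouts?)" rule.
PROMOTER_MAX_DISTANCE_BP = 200


@dataclass
class RegionEvidence:
    """Evidence reported for one region. All field names are hypothetical."""
    binding_assay: bool = False            # e.g. mobility shift experiment
    motif_similarity: bool = False         # recorded, but never counts as evidence
    sufficient_as_enhancer: bool = False   # necessary and sufficient to activate a reporter
    sufficient_as_silencer: bool = False   # necessary and sufficient to repress a reporter
    affects_expression_when_mutated: bool = False
    asserted_promoter: bool = False        # the paper calls the region a promoter
    distance_to_gene_5prime_bp: Optional[int] = None


def features_to_create(ev: RegionEvidence) -> List[Tuple[str, ...]]:
    """Return one tuple of SO term names per Feature to create."""
    features: List[Tuple[str, ...]] = []

    # Experimental binding evidence always yields its own Feature.
    if ev.binding_assay:
        features.append(("TF_binding_site",))

    near_gene = (
        ev.distance_to_gene_5prime_bp is not None
        and ev.distance_to_gene_5prime_bp <= PROMOTER_MAX_DISTANCE_BP
    )
    is_promoter = ev.asserted_promoter and near_gene and ev.sufficient_as_enhancer
    if is_promoter:
        features.append(("promoter",))

    # Enhancer and silencer share one Feature carrying both SO terms.
    so_terms = []
    if ev.sufficient_as_enhancer and not is_promoter:
        so_terms.append("enhancer")  # also the "if in doubt" fallback for promoters
    if ev.sufficient_as_silencer:
        so_terms.append("silencer")
    if so_terms:
        features.append(tuple(so_terms))

    # Nothing asserted, but mutation or deletion changes expression.
    if not features and ev.affects_expression_when_mutated:
        features.append(("regulatory_region",))
    return features
```

For the TRA-1 site in egl-1, `features_to_create(RegionEvidence(binding_assay=True, sufficient_as_silencer=True))` gives two Features, a TF_binding_site and a silencer. A region supported only by motif similarity gives an empty list.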
Example for sequence feature curation
The example is from WBPaper00003631.
Feature : "egl-1_temp_1.1"
Sequence VF23B12L
Mapping_target VF23B12L
Flanking_sequences cagctcaattattaaattttattgggtattgttta cataaaattctattgtcccagatttaggatacatcg
DNA_text CTCCTAACCGGGTGGTC
Description "This is a TRA-1 binding site that represses egl-1."
Remark "This is the TF_binding_site for TRA-1 which silences egl-1. N.B. a 'silencer' Feature has also been made at this location to aid expression and interaction curation [2013-07-23 gw3]"
Associated_with_gene WBGene00001170 // egl-1
Bound_by_product_of WBGene00006604 // tra-1
Transcription_factor WBTranscriptionFactor000029 // tra-1
Method TF_binding_site
SO_term "SO:0000235" // TF_binding_site
Defined_by_paper WBPaper00003631
Public_name "TRA-1 binding site"

Feature : "egl-1_temp_1.2"
Sequence VF23B12L
Mapping_target VF23B12L
Flanking_sequences cagctcaattattaaattttattgggtattgttta cataaaattctattgtcccagatttaggatacatcg
DNA_text CTCCTAACCGGGTGGTC
Description "This is the silencer of egl-1, containing a single TF_binding_site bound by TRA-1."
Remark "Made this 'silencer' feature in addition to the TRA-1 TF_binding_site Feature to aid expression and interaction curation [2013-07-23 gw3]"
Associated_with_gene WBGene00001170 // egl-1
Method silencer
SO_term "SO:0000625" // silencer
Defined_by_paper WBPaper00003631
Public_name "TRA-1 binding site silencer"
Most Expr_pattern and Interaction objects will be attached to the 'enhancer/silencer' Features rather than the TF_binding_site Features
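Because the TF_binding_site and silencer Features share the same coordinates, the pair could be generated from one set of shared values. The following is a minimal sketch under that assumption: the helper name `feature_record` and its parameters are hypothetical, and it writes only a subset of the tags shown in the records above (no DNA_text, Description, or Remark).

```python
# SO identifiers as they appear in the records on this page.
SO_TERMS = {
    "TF_binding_site": "SO:0000235",
    "silencer": "SO:0000625",
    "enhancer": "SO:0000165",
}


def feature_record(name, method, sequence, flanks, gene_id, paper_id, public_name):
    """Render one Feature record as .ace-style text (hypothetical helper)."""
    left, right = flanks
    return "\n".join([
        f'Feature : "{name}"',
        f"Sequence {sequence}",
        f"Mapping_target {sequence}",
        f"Flanking_sequences {left} {right}",
        f"Associated_with_gene {gene_id}",
        f"Method {method}",
        f'SO_term "{SO_TERMS[method]}"',
        f"Defined_by_paper {paper_id}",
        f'Public_name "{public_name}"',
    ])


# Values shared by both Features at this location.
shared = dict(
    sequence="VF23B12L",
    flanks=("cagctcaattattaaattttattgggtattgttta",
            "cataaaattctattgtcccagatttaggatacatcg"),
    gene_id="WBGene00001170",
    paper_id="WBPaper00003631",
)
binding_site = feature_record("egl-1_temp_1.1", "TF_binding_site",
                              public_name="TRA-1 binding site", **shared)
silencer = feature_record("egl-1_temp_1.2", "silencer",
                          public_name="TRA-1 binding site silencer", **shared)
print(binding_site, silencer, sep="\n\n")
```

Keeping the coordinates in one place means the two records cannot drift apart if the flanking sequences are later corrected.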
Link to Gene Regulation/Regulatory interactions
A regulatory interaction object is generated and added to the Sequence feature object. In the specific example:
//Associated_with_interaction WBInteraction000520178
Link to Expression pattern
When do we link sequence features to Expression Pattern objects, and how?
Example 1, from WBPaper00003631:
"The egl-1 gene appears to be expressed in the HSNs in males." The construct used is [Pegl-1::gfp] transcriptional fusion.
- Curator creates an Expression object for egl-1 in the male's HSN and links it to pegl-1::GFP transgene.
Expr_pattern : "Expr11092"
Anatomy_term "WBbt:0004757" Certain //HSNR
Anatomy_term "WBbt:0004758" Certain //HSNL
Anatomy_term "WBbt:0007850" Certain //male
Gene "WBGene00001170" //egl-1
Pattern "The egl-1 gene appears to be expressed in the HSNs in males, in which the HSNs normally undergo programmed cell death, but not in hermaphrodites, in which the HSNs normally survive."
Reference "WBPaper00003631"
Reporter_gene "[Pegl-1::gfp] transcriptional fusion. To construct Pegl-1::gfp, bases +174 to +5820 (5'-3') downstream of the stop codon of the egl-1 gene and bases -1914 to -837 (5'-3') upstream of the stop codon were amplified with appropriate primers and cloned into the SpeI-ApaI (5'-3') and PstI-BamHI (5'-3') sites of vector pPD95.69, respectively (A. Fire et al., personal communication). --precise ends."
- Sequence curator creates a sequence feature for that object (we are not there yet, but we should aim for it).
- In the sequence feature object there will be a link to the expression.
Note that in this expression object we have, as per the Expression_pattern model:
Expr_pattern Expression_of Gene ?Gene XREF Expr_pattern
- The Expression pattern object in this case is linked to the gene as the authors hypothesize that the transcriptional fusion expression is the endogenous egl-1 expression.
Example 2, from WBPaper00003631 (a hypothetical, made-up example: this specific paper contains no such evidence, but the scenario could arise):
"This specific sequence of 80bp is expressed in the HSNL." The construct used is [80bp-egl-1::gfp].
1) One way to go is to link the expression to the sequence rather than to the gene. From the Expr_pattern model:
Expr_pattern Expression_of Gene ?Gene XREF Expr_pattern
                           Sequence ?Sequence XREF Expr_pattern
Expr_pattern : "Expr11093"
Anatomy_term "WBbt:0004758" Certain //HSNL
Sequence "???"
Pattern "This particular sequence::GFP was expressed in HSNL"
Reference "WBPaper00003631"
Reporter_gene "[80bp-egl-1::gfp]. To construct 80bp-egl-1::gfp..."
- Sequence curator creates a sequence feature for that object.
- In the sequence feature object there will be a link to the expression.
- The Expression pattern object in this case is linked to the sequence as the artificial construct might not resemble the endogenous egl-1 expression.
- It will generally be hard to determine where the boundary lies between artificial and endogenous expression if no other experimental evidence (IHC, ISH) is available.
* If we curate the objects this way we should determine how to display them on the site. Separate from other expression objects?
2) Another option would be to include those objects in Gene regulation rather than in expression: "That specific sequence is responsible for expression in ..."
How were these kinds of objects curated in the past? Was it via gene_regulation Cis_regulated_seq?
Although 'Cis_regulated_seq' existed in the old gene_regulation model, it was never used for any object by either Wen or Xiaodong. In the new Interaction model, this tag is gone. --XW
3) A third possibility is to add Drives_expression_in in the feature object
Drives_expression_in Life_stage ?Life_stage
                     Anatomy_term ?Anatomy_term
                     GO_term ?GO_term
This approach is attractive because it does not "contaminate" the expression pattern class, while the expression information for the enhancer is still captured. In REDfly (Regulatory Element Database for Drosophila, http://redfly.ccr.buffalo.edu/) the enhancer region is annotated to anatomy terms, but that expression is not listed under the classic expression patterns. See, for example, the decapentaplegic gene (dpp) construct dpp_303lacZ.
In the example of Hwang and Sternberg, 2004 (WBPaper00006370), the feature object will be
Feature :
Public_name "lin-3 enhancer"
Sequence F36H1
Description "lin-3 enhancer region, driving anchor cell (AC) specific expression"
Flanking_sequences "ctagaacttcccgtctctccctattcaatg" "cttaccaatgtctcaggcatttttggaaaa"
Mapping_target F36H1
Associated_with_gene WBGene00002992 // lin-3
Species "Caenorhabditis elegans"
Defined_by_paper WBPaper00006370
SO_term SO:0000165 // enhancer
Method enhancer
Associated_with_Interaction WBInteraction000501966 // hlh-2 binds to lin-3
Associated_with_Interaction WBInteraction000520204 // nhr-25 binds to lin-3
Anatomy_term "WBbt:0004522" //Anchor cell
4) We could simply generate an Expr_pattern object and add an Associated_feature ?Feature tag to it. On the site, objects that have Associated_feature could then be displayed in a separate section.
Example 3 from WBPaper00003631:
"The egl-1 gene appears to be expressed in the HSNs in males (Pegl-1::GFP reporter)...if tra-1 is bound to egl-1 the expression in HSNs is repressed"
- The region of tra-1 binding to egl-1 is known and 2 sequence features are created for it, one as TF_binding_site and one for silencer.
- A gene regulation object is created -> egl-1 downregulation in HSN.
- The object is added in the silencer sequence feature object.
Should we create an expression object for the tra-1 binding site? In this case we would have to create a negative expression object: egl-1 is NOT expressed in HSNs if bound by tra-1. To me this falls under gene regulation. --DR
Should we link to the existing expression pattern Expr_pattern : "Expr11092" (see above)? This might not be appropriate, as Expr11092 depicts expression in male HSNs. If we want to pull out that information, we could do so anyway through the gene regulation object. --DR
Should we just leave the gene regulation association?
As of now, few Expression Patterns are linked to the Genome Browser (the Vancouver set is the only such data set). The ultimate goal is to map expression constructs to the genome browser whenever we can.
Top down approach
We are brainstorming in order to develop a model that will be suitable for accommodating curation of all the above.
The potential model should contain the following info
for Expression
- sequence: any stretch of DNA, from a few bp to several kb
(?Feature, 1 or more)
- reporter: GFP, RFP, YFP, mCherry, Venus, ...
(+ Other: free text, including cases where the endogenous gene is used as the reporter or as part of it, e.g. with gfp fused in)
- gene (the gene immediately downstream of the sequence); not unique, because a sequence could be associated with more than one gene
(do NOT annotate the gene, because 1. the base model is about describing the pattern of expression, 2. location information intrinsically informs possible cis-targets, 3. if the author asserts relevant genes, that should go in some ?Regulation)
- reflects endogenous expression of ?Gene # recorded only if the authors assume that the expression reflects the endogenous expression
- anatomy term
- life stage
- (sex will be encoded in life stage and anatomy)
- WBPaper
- experimental info?
- other info will be textual
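To make the list above concrete, here is one possible shape for such a record, written as a plain data container. Nothing here is an agreed model: the class name, the field names, and the validation rule are assumptions made for illustration, and the example values mirror the hypothetical 80bp construct of Example 2 with a placeholder Feature name.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ExpressionAnnotation:
    """Hypothetical container for the items listed above; not an agreed model."""
    features: List[str]                       # ?Feature, one or more
    reporter: str                             # GFP, RFP, ... or free text
    anatomy_terms: List[str] = field(default_factory=list)
    life_stages: List[str] = field(default_factory=list)  # sex is encoded here and in anatomy
    paper: Optional[str] = None               # WBPaper ID
    # ?Gene, filled in only if the authors assume endogenous expression.
    reflects_endogenous_expression_of: Optional[str] = None
    remark: str = ""                          # any other info is textual

    def __post_init__(self):
        # The list above asks for "1 or more" Features.
        if not self.features:
            raise ValueError("at least one Feature is required")


# Placeholder Feature name; the anatomy term (HSNL) and paper follow Example 2.
example = ExpressionAnnotation(
    features=["hypothetical_feature_1"],
    reporter="GFP",
    anatomy_terms=["WBbt:0004758"],
    paper="WBPaper00003631",
)
```

Note that the gene is deliberately absent as a required field, in line with the argument above that asserted target genes belong in some ?Regulation object rather than in the expression record.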
Next topics: capture regulation, post-transcriptional regulation