Difference between revisions of "Working Group:Sequence Features"
From WormBaseWiki
(99 intermediate revisions by 4 users not shown) | |||
Line 1: | Line 1: | ||
+ | |||
+ | == Headline text == | ||
'''Topics''' | '''Topics''' | ||
* Display of Sequence Features on the website | * Display of Sequence Features on the website | ||
+ | ** See github ticket 2867 https://github.com/WormBase/website/issues/2867#issuecomment-48041365 | ||
** And Transcription factors and Gene_product binds. | ** And Transcription factors and Gene_product binds. | ||
+ | ** in the expression section on the gene page highlight the objects that are Sequence feature related. Add in the table a type called cis-regulatory-element? | ||
+ | ** add a sequence feature table in sequence widget to display all features related to the gene | ||
+ | ** in cytoscape display, sequence feature as an entity node | ||
+ | |||
* Streamline the workflow | * Streamline the workflow | ||
+ | ** How can we automatically identify Sequence Feature papers? | ||
+ | ** How do papers come in now? SVM? Textpresso string matches? | ||
+ | |||
+ | * Improving data flow | ||
+ | ** How can we make the data immediately available to all curators? | ||
+ | *** Add Paper and Public_name fields to the Features in the Nameserver? | ||
+ | *** geneace is available for download updated every day - a copy is taken by Caltech. | ||
+ | **** still need WBsf object details in order to relate/curate regulation/expression/construct objects | ||
+ | |||
+ | |||
* Sample papers curation | * Sample papers curation | ||
** Prepare a paper list from each person | ** Prepare a paper list from each person | ||
* assign meaningful names to the public_name field, e.g. 'distal enhancer 1' instead of 'DE1' | * assign meaningful names to the public_name field, e.g. 'distal enhancer 1' instead of 'DE1' | ||
** gw3 - I disagree with this. We should use the name that is used to describe the region in the paper, rather than making up our own names. I have had to go back and re-annotate regions after Xiaodong found problems with my stuff several months after I had originally done it and I really needed to be able to unambiguously identify which region the paper was talking about by using the Public_name field and matching this up to the name of the region in the paper. Users may well need to do this as well if they are reading the paper and looking at the Features marking up the regions. | ** gw3 - I disagree with this. We should use the name that is used to describe the region in the paper, rather than making up our own names. I have had to go back and re-annotate regions after Xiaodong found problems with my stuff several months after I had originally done it and I really needed to be able to unambiguously identify which region the paper was talking about by using the Public_name field and matching this up to the name of the region in the paper. Users may well need to do this as well if they are reading the paper and looking at the Features marking up the regions. | ||
− | + | ** dr - ok sounds good to me Gary | |
+ | |||
+ | '''Practical Issues''' | ||
+ | |||
+ | * Gary and I are staying at the Ramada - 800 Morrissey Boulevard, Freeport St, Boston, MA 02122 US | ||
+ | * We arrive in Boston on Saturday evening and fly home on Wednesday night. | ||
+ | * What time shall we meet on Monday morning and where do we go? | ||
+ | **I (Xiaodong) can pick up you guys at Harvard square T station (red-line, info center on the square) at say 9:30 am. We can walk to my office. | ||
+ | * Can you confirm that we have office space with internet access? I will bring my Macbook. Hopefully I can tunnel into my Sanger working space. | ||
+ | **Enough office space for sure. Internet access through Harvard guest. They won't allow you to access other ways. | ||
+ | * Thanks Xiaodong, I'm bringing my chromebook - Gary | ||
'''Pre-Jamboree prep''' | '''Pre-Jamboree prep''' | ||
Line 27: | Line 54: | ||
** Work through some papers which have already been curated and have different data types (e.g. promoter and gene regulation) to identify best curation practices. | ** Work through some papers which have already been curated and have different data types (e.g. promoter and gene regulation) to identify best curation practices. | ||
** browsing capabilities through WormMine? | ** browsing capabilities through WormMine? | ||
+ | ** Should we put the list of papers in the Caltech curation status form | ||
+ | |||
+ | |||
+ | '''Matters Arising''' | ||
+ | |||
+ | * things noted - not necessarily to do with the topic in hand | ||
+ | ** There are many duplicates of Interaction objects in geneace, with or without two leading digits. Sometimes there is only the shorter form. | ||
+ | ** The Features WBsf919641 and WBsf919607 are at the same position. Both are DAF-3 binding sites. WBsf919641 has Method "binding_site", WBsf919607 has Method "TF_binding_site" which I think is more correct as DAF-3 is a TF. WBsf919641 used the paper WBPaper00004526, WBsf919607 used the paper WBPaper00003384. WBsf919641 is associated with WBGene00202278, WBsf919607 is associated with WBGene00003514 (myo-2). The paper WBPaper00004526 appears to be about a PEB-1 binding site, not a DAF-3 site. I made the Feature WBsf919609 for the PEB-1 site based on WBPaper00004526 in the last round of curation. Something not right here??? | ||
+ | *** Follow-up from Mary Ann. I agree Method for WBsf919641 should be TF_binding_site. Associated_with_gene should be updated to myo-2 (this is an error). The SO_term is not right either. These then become duplicate entries. | ||
+ | *** Resolved: WBsf919641 merged into WBsf919607 - GW | ||
+ | ** The Features WBsf019227 and WBsf038813 are in the same position, but on opposite strands. They both are for a LAG-1 binding site associated with gene lin-11. One from paper: WBPaper00005357, other from WBPaper00032298. WBsf019227 needs to be retired and merged into WBsf038813. | ||
+ | *** Follow-up from Mary Ann. We have no merged_into tag structure in Feature. We have History tag and could add a Comment there. | ||
+ | **** We have a merged_into tag structure in Feature. - GW | ||
+ | *** Resolved: WBsf019227 merged into WBsf038813 - GW | ||
+ | ** WBsf038819 has a Method of 'regulatory_region', but the Public_name and Description describes it as an Enhancer. Which is it? | ||
+ | *** Fixed by Mary Ann. Changed Method to enhancer. | ||
+ | ** WBsf919669 is an Enhancer, but it has a tag for a Transcription_factor WBTranscriptionFactor000101. It is far too large to be a binding site for a transcription factor. I think this tag should be removed. | ||
+ | *** Follow-up from Mary Ann. lag-1 does bind somewhere in this region, but the paper does not allow me to specify exactly where. Q. Should we curate these then? I think we should, but should maybe add suitable Remarks. | ||
+ | ** 393 out of the 500 TF_binding_site Features have an incorrect SO_term tag. It should be set to: SO:0000235 | ||
+ | *** Follow-up by Mary Ann. This seems really high. I wonder whether TF_binding_site was not a recognised SO term when some of these were curated, especially the older ones. This (and other consistency checks) should be added to a script we can run. | ||
+ | *** Resolved: all TF_binding_sites with SO_term SO:0000409 changed to SO:0000235 - GW | ||
+ | ** 10 Features with Method = "TF_binding_site" are lacking a Transcription_factor tag. | ||
+ | *** Follow-up by Mary Ann. Maybe it's not clear in the paper what the TF is? Something to add to consistency check. | ||
+ | *** Resolved: the MEC-3/UNC-86 heterodimer has been added - GW | ||
+ | * How many 'Method's are there for Features? enhancer, regulatory_region, TF_binding_site, promoter…? | ||
+ | **How should these Methods be defined? | ||
+ | ** When displaying on JBrowse/GBrowse, do we want to display WBsfxxxxx or Methods? | ||
+ | * Gary does a quick literature search to see what other work has been done in the sites described. | ||
+ | **Along these lines, do we want to add WBPaper00002925 to the WBsf objects from WBPaper00002011? That is, associate Features with two papers when both mention these Features? | ||
+ | *** The Feature objects are used to describe 'real', functional regions of the genome in as much detail as possible. Each paper that provides evidence for that region being 'real' should be added as evidence for the Feature. The Primary object we are trying to describe here is the region of the genome, not the papers. (I may have missed the point you were making - Gary). | ||
+ | *** agree with Gary (unless I have also missed the point!). All relevant papers should be added. There is a loose convention in Allele curation that the paper listed under Evidence (which can only have one value) is the "primary" paper and the ones listed under Reference (including the primary) are all papers which describe or discuss the object as well. Also individual Evidence can be added to tags. Maybe we should look to extend this in the Feature model. | ||
+ | <pre> | ||
+ | ?Feature SMap S_parent UNIQUE Sequence UNIQUE ?Sequence XREF Feature_object | ||
+ | Name Public_name UNIQUE ?Text | ||
+ | Other_name ?Text | ||
+ | Sequence_details Flanking_sequences UNIQUE Text UNIQUE Text | ||
+ | Mapping_target UNIQUE ?Sequence | ||
+ | Source_location UNIQUE Int UNIQUE ?Sequence UNIQUE Int UNIQUE Int UNIQUE #Evidence //source data, <WSversion> ?Sequence pos1 pos2 Evidence(Paper/person etc. remarks) | ||
+ | DNA_text UNIQUE ?Text // for storing the sequence of the feature...can use IUPAC codes to be able | ||
+ | // store consensus sequences, e.g. binding site consensus sequence | ||
+ | Origin Species UNIQUE ?Species //added by pad, as we are moving towards multi species readyness. | ||
+ | Strain UNIQUE ?Strain//added by pad, as we are moving towards multi strain readyness. | ||
+ | History Merged_into UNIQUE ?Feature XREF Acquires_merge #Evidence | ||
+ | Acquires_merge ?Feature XREF Merged_into #Evidence | ||
+ | Deprecated Text #Evidence | ||
+ | Visible Description ?Text | ||
+ | SO_term ?SO_term | ||
+ | Defined_by Defined_by_sequence ?Sequence XREF Defines_feature #Evidence | ||
+ | Defined_by_paper ?Paper XREF Feature #Evidence | ||
+ | Defined_by_person ?Person | ||
+ | Defined_by_author ?Author | ||
+ | Defined_by_analysis ?Analysis Int | ||
+ | Score Float Text #Evidence // this would be a log score as indicated by the analysis used in gff dump | ||
+ | Associations Associated_with_gene ?Gene XREF Associated_feature #Evidence // richard | ||
+ | Associated_with_CDS ?CDS XREF Associated_feature #Evidence // richard | ||
+ | Associated_with_transcript ?Transcript XREF Associated_feature #Evidence // richard | ||
+ | Associated_with_pseudogene ?Pseudogene XREF Associated_feature #Evidence // richard | ||
+ | Associated_with_transposon ?Transposon XREF Associated_feature #Evidence //richard | ||
+ | Associated_with_variation ?Variation XREF Feature #Evidence | ||
+ | Associated_with_Position_Matrix ?Position_Matrix XREF Associated_feature #Evidence | ||
+ | Associated_with_operon ?Operon XREF Associated_feature #Evidence | ||
+ | Associated_with_Interaction ?Interaction XREF Feature_interactor | ||
+ | Associated_with_expression_pattern ?Expr_pattern XREF Associated_feature #Evidence | ||
+ | Associated_with_Feature ?Feature XREF Associated_with_Feature #Evidence | ||
+ | Associated_with_construct ?Construct XREF Sequence_feature | ||
+ | Bound_by_product_of ?Gene XREF Gene_product_binds #Evidence //pad added this to show what gene it binds | ||
+ | Transcription_factor UNIQUE ?Transcription_factor XREF Binding_site | ||
+ | Annotation UNIQUE ?LongText // added for data attribution [030220 dl] | ||
+ | Confidential_remark ?Text //pad | ||
+ | Remark ?Text #Evidence | ||
+ | Method UNIQUE ?Method | ||
+ | </pre> | ||
+ | |||
+ | |||
+ | '''Duplicated Features''' | ||
+ | The following are a set of duplicated regulatory Features. I (Gary) am probably the main culprit in not checking to see if there is a pre-existing Feature. | ||
+ | How can we avoid this in future? | ||
+ | * The list of duplicated Features has been moved to the Discussion page. | ||
'''Papers to curate''' | '''Papers to curate''' | ||
* I have taken the Sequence Feature Papers from RT and assigned the four of us 10 papers each. | * I have taken the Sequence Feature Papers from RT and assigned the four of us 10 papers each. | ||
+ | * Just in case there is any paper here that has already been curated for Interaction, I've added the number of Interaction objects linked to by the paper. If this is not helpful, feel free to remove it. | ||
+ | |||
<pre> | <pre> | ||
− | WBPaper00002925 Daniela | + | WBPaper00002925 Daniela Interaction: 8 |
− | WBPaper00004568 Daniela | + | Time: 10 mins |
+ | New objects: none. the paper was referring to a couple of sub-element enhancers already described in Okkema and Fire 1994- Development - B and C sub-elements -WBPaper00002011 | ||
+ | Location of information: - | ||
+ | Comments: | ||
+ | WBPaper00004568 Daniela Interaction: 2 | ||
+ | Time: 25 mins | ||
+ | New objects: 3 promoters- pA, pB, and pC | ||
+ | Location of information: body text and fig.3 for promoters. Figures 5, 6 and 7 and body text for TF binding sites. | ||
+ | Comments: Expression objects were already generated by Wen in the past. Daniela will add only the WBsf once ready. Also add WBsf object to the construct objects. See RT 418146 as well. | ||
+ | potential cis-regulation objects for Xiaodong | ||
WBPaper00005842 Daniela | WBPaper00005842 Daniela | ||
− | WBPaper00024328 Daniela | + | Time: 3 hours |
+ | New objects: | ||
+ | mk125-132 (4331-4474) is sufficient to drive strong expression in vulC and vulD. Expr11838 add feature and generate construct | ||
+ | |||
+ | mk84-148 contains two regions that together confer strong expression in VulE and VulF. Expr11839 add feature and generate construct | ||
+ | |||
+ | mk50-51 (1052-1438) is sufficient to confer AC, vulE, and vulA expression, as well as uterine cell expression. Expr11840 add feature and generate construct. | ||
+ | |||
+ | mk96-144 (2290-2522) expresses in AC in all animals observed. Expr11841 add feature and generate construct. | ||
+ | |||
+ | mk66-67 (4434-4997) is sufficient to confer vulval cell expression. Expr11842 add feature and generate construct. | ||
+ | |||
+ | mk135-134 (2412-3419) is sufficient to confer vulval cell expression. N.B.: in text is mk135-134 in fig 4C seems mk135-143. Expr11843 add feature and generate construct. | ||
+ | |||
+ | Location of information: The nucleotide sequences for pertinent regions are shown in Figs. 5, 6, and 7 of the Supplemental Material. | ||
+ | Comment: It would be good that a sequence curator will read the paper unbiased-w/o checking what I have curated for expression and see if we could identify the same regions | ||
+ | Comment: potential TF binding sites | ||
+ | Comment: Should we curate Briggsae? -Paragraph 'Analysis of C Briggsae upstream regions' | ||
+ | |||
+ | WBPaper00024328 Daniela Interaction: 4 | ||
+ | Time: 2.5 hours | ||
+ | New objects: Added 4 Expression objects for 5 different constructs-pPHA2::GFP-A present already: | ||
+ | |||
+ | pPHA2::GFP-A body text and Figs. 5 and 6 Expr3093 Add Sf to expr object and to construct | ||
+ | pPHA2::GFP-B body text and Figs. 5 and 6 Expr11835 Add Sf to expr object and to construct | ||
+ | pPHA2::GFP-C body text and Figs. 5 and 6 Expr11835 Add Sf to expr object and to construct | ||
+ | pPHA2::GFP-D body text and Figs. 5 and 6 Expr11837 Add Sf to expr object and to construct | ||
+ | pPHA2::GFP-E body text and Figs. 5 and 6 Expr11836 Add Sf to expr object and to construct | ||
+ | pPHA2::GFP-F body text and Figs. 5 and 6 Expr11834 Add Sf to expr object and to construct | ||
+ | Detailed construction of plasmids in Materials and Methods | ||
+ | Location of information: body text and Figs. 5 and 6 | ||
+ | Comments: not sure if this paper could be curatable as sequence feature per se due to the nature of constructs. Would be good to discuss | ||
WBPaper00028802 Daniela | WBPaper00028802 Daniela | ||
+ | Time: 5 min | ||
+ | Location: table 2 and body text | ||
+ | New objects: | ||
+ | Comment: potential paper on cis-regulation (no evidence found), WBPmat00000004 is associated with this paper - XW | ||
+ | Comment: Table 2 gives a list of experimentally verified GATA sites in other papers. These should be added to the curation status form - GW | ||
+ | Comment: no Features can be made from this paper. | ||
+ | Comment: In table 2 feature referring to: | ||
+ | WBPaper00001523 -> this paper has been already curated for features -> not entered in RT | ||
+ | WBPaper00001864 -> this paper has been already curated for features -> not entered in RT | ||
+ | WBPaper00002234 -> this paper has been already curated for features -> not entered in RT | ||
+ | WBPaper00003700 -> this paper has been already curated for features -> not entered in RT | ||
+ | WBPaper00006024 -> this paper has been already curated for features -> not entered in RT | ||
+ | WBPaper00024977 -> this paper has been already curated for features -> not entered in RT | ||
+ | features sent to RT, | ||
+ | please check WBPaper00003232, britton et al 1998. Found through WBPaper00028802 -table 2 | ||
+ | please check WBPaper00024333 and compare the feature associated with it to the one listed in table 2 of WBPaper00028802. Maybe a new GATA binding site feature should be generated? | ||
+ | please check WBPaper00024976, Fukushige et al. 2005. Found through WBPaper00028802 -table 2 | ||
+ | |||
WBPaper00028915 Daniela | WBPaper00028915 Daniela | ||
+ | Time: 15 mins | ||
+ | Comments: No features in this paper | ||
WBPaper00029140 Daniela | WBPaper00029140 Daniela | ||
− | WBPaper00029255 Daniela | + | Time: 10 minutes |
− | WBPaper00030829 Daniela | + | Comment: conserved intergenic inverted repeat structures in C. briggsae and C. remanei. Would be good to have a sequence curator check this one | ||
− | WBPaper00030933 Daniela | + | |
− | WBPaper00003929 Gary | + | WBPaper00029255 Daniela Interaction: 55 (21 in postgres only) |
− | WBPaper00005044 Gary | + | Time: 15 min |
− | WBPaper00005971 Gary This has already been done by Gary: Feature WBsf718850 | + | New objects: potential feature of NRE, LCS, and LCE |
− | WBPaper00024440 Gary | + | Location: figure 6, body text, page 560 |
+ | Comment: potential cis-regulation objects for xiaodong | ||
+ | |||
+ | WBPaper00030829 Daniela Interaction: 5 | ||
+ | WBPaper00030933 Daniela Interaction: 1 | ||
+ | WBPaper00003929 Gary Interaction: 25 | ||
+ | Time: 4 hours | ||
+ | New objects: none - 6 existing Features were corrected and updated. (WBsf019182, WBsf019181, WBsf019179, WBsf019177, WBsf019178, WBsf019180) | ||
+ | Location of information: body text and in a figure in the supplemental table of the paper WBPaper00006376 | ||
+ | Other curator data: "lin-41 and lin-42 are negatively regulated by let-7" | ||
+ | Comments: This is Gary Ruvkun's (Nature 2000) let-7 paper that started the miRNA field! | ||
+ | Comments: This is being marked as a pair of 'binding_site' Features because this is miRNA not TF binding. | ||
+ | See also: let-7, lin-41 binding WBPaper00006376 PubMed: 14729570 (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC324419/) - This is the paper that defined the binding sites. | ||
+ | Comments (XW): curated one new cis-regulation object with Gary's WBsfs | ||
+ | |||
+ | WBPaper00005044 Gary Interaction: 20 | ||
+ | Time: 3 hours | ||
+ | New objects: made ire-1 binding site Feature (WBsf977530) | ||
+ | Location of information: body of this paper and WBPaper00005036 | ||
+ | Other curator data: IRE-1 is a stress-activated endonuclease resident in the ER that is conserved in all known eukaryotes. IRE-1 mediated unconventional splicing of an intron from xbp-1 mRNA controls expression of the encoded transcription factor and is required for upregulation of most UPR target genes. | ||
+ | See also: WBPaper00005036 | ||
+ | WBPaper00005971 Gary Interaction: 16 This has already been done by Gary: Feature WBsf718850 | ||
+ | WBPaper00024440 Gary Interaction: 1 | ||
+ | Time: 3 hours | ||
+ | New objects: Made WBsf977532, WBsf977533 M-2 motif/daf-12 binding sites for ceh-22 | ||
+ | New objects: Made WBsf977534, WBsf977535 M-2 motif/daf-12 binding sites for myo-2 | ||
+ | Location of information: body of this paper and Figure Supplemental 3 | ||
+ | Other curator data: "We conclude that regulation of myo-2 and ceh-22 during dauer development depends critically on the M-2 motif." | ||
+ | Comments: lots of hypothetical binding sites for daf-12 in 90-odd genes, but these have not been experimentally confirmed. | ||
WBPaper00028816 Gary | WBPaper00028816 Gary | ||
− | WBPaper00028986 Gary | + | Time: 30 mins |
+ | New objects: none | ||
+ | Location of information: body of this paper | ||
+ | Other curator data: "recruitment sites are widely distributed along X to bind the DCC and nucleate DCC spreading to X regions lacking recruitment sites" | ||
+ | Comments: This paper determines the A and B motifs of the Dosage Compensation Complex (DCC). There are hundreds of thousands of sites of varying strength. | ||
+ | WBPaper00028986 Gary Interaction: 16 | ||
WBPaper00029181 Gary | WBPaper00029181 Gary | ||
− | WBPaper00029327 | + | WBPaper00029327 Gary Interaction: 33 |
WBPaper00030849 Gary | WBPaper00030849 Gary | ||
− | WBPaper00031355 Gary | + | WBPaper00031355 Gary Interaction: 5 |
− | WBPaper00004181 Mary Ann | + | WBPaper00004181 Mary Ann Interaction: 8 |
− | WBPaper00005056 Mary Ann | + | No Features. Took 5 mins to read. |
− | WBPaper00006429 Mary Ann | + | WBPaper00005056 Mary Ann |
+ | Time: 35 mins | ||
+ | New objects: 37 - not yet curated. | ||
+ | Location of information: Mainly in fig. 1, but in body text. | ||
+ | Comments: TRTTKRY element in promoter region of T05E11.3, D2096.6, C44H4.1, ZK816.4, ceh-22, | ||
+ | tph-1, M05B5.2, myo-2. Bound by pha-4 | ||
+ | WBPaper00006429 Mary Ann Interaction: 3 | ||
+ | Time: 1/2hr | ||
+ | New objects: 1 | ||
+ | Location of information: In body of text and fig. 3B | ||
+ | Comments: Alludes to GATA binding sites, but no experimental data. | ||
+ | Comments (dr): Added Expr11833 | ||
WBPaper00024505 Mary Ann | WBPaper00024505 Mary Ann | ||
+ | Time: | ||
+ | New objects: | ||
+ | Location of information: | ||
+ | Comments: | ||
WBPaper00028849 Mary Ann | WBPaper00028849 Mary Ann | ||
− | WBPaper00029058 Mary Ann | + | Time: 45 mins |
+ | New objects: 1 | ||
+ | Location of information: In body of text and fig. 3B | ||
+ | Comments(dr): Added Expr11832 | ||
+ | WBPaper00029058 Mary Ann Interaction: 50 | ||
WBPaper00029190 Mary Ann | WBPaper00029190 Mary Ann | ||
− | WBPaper00029406 Mary Ann | + | WBPaper00029406 Mary Ann Interaction: 145 |
− | WBPaper00030877 Mary Ann | + | WBPaper00030877 Mary Ann Interaction: 8 |
WBPaper00031471 Mary Ann | WBPaper00031471 Mary Ann | ||
− | WBPaper00004482 Xiaodong | + | Time: 1hr |
− | WBPaper00005609 Xiaodong This has already been done by | + | New objects: 1 |
+ | Location of information: In body of text and fig. 5f | ||
+ | Comments: Check that alleles of asd-2 in fig 3 are curated. yb1422, yb1415, yb1470, yb1419, yb1423 | ||
+ | WBPaper00004482 Xiaodong Interaction: 2 | ||
+ | Time: two hours | ||
+ | New objects: request new features for more interaction objects | ||
+ | Location of information: figure 5A, | ||
+ | Comments: HOX/CEH-20 binding sites in hlh-8 promoter. feature will be used in cis-regulation and physical interaction objects | ||
+ | Duplication: yes. only one interaction objects WBInteraction000001291 associated with the paper currently | ||
+ | WBPaper00005609 Xiaodong Interaction: 20 This has already been done by Margaret: Feature WBsf019097, WBsf038788, WBsf038789, WBsf038790, WBsf038791, WBsf038793, WBsf038794, WBsf019098, WBsf038792 | ||
+ | Time: 50 mins | ||
+ | New objects: three interaction objects | ||
+ | Location of information: In body of text and fig. 5 | ||
+ | Comments: only WBsf038790 and WBsf038791 are useful. not sure why other features exist? seems to be redundant? | ||
WBPaper00024189 Xiaodong | WBPaper00024189 Xiaodong | ||
+ | Time: 10 mins | ||
+ | Comments: no features in this paper | ||
WBPaper00024981 Xiaodong | WBPaper00024981 Xiaodong | ||
− | WBPaper00028910 Xiaodong | + | Time: 40 mins |
+ | New objects: request through RT | ||
+ | Location of information: figure 2 | ||
+ | Comments: MED-1 binding sites in end-1 and end-3 promoters | ||
+ | Comments (dr): Expr11831 generated for end-1 and end-3 promoter regions. Add WBsfID once available and relative construct. Paper checked as it came through RT | ||
+ | WBPaper00028910 Xiaodong Interaction: 7 | ||
+ | Time: 30 mins | ||
+ | New objects: request through RT | ||
+ | Location of information: in body text and figure 4 | ||
+ | Comments: MEF-2 direct binding site in str-1 promoter and a few minimal regulatory region mentioned in body text | ||
+ | |||
WBPaper00029109 Xiaodong | WBPaper00029109 Xiaodong | ||
− | WBPaper00029229 Xiaodong | + | |
+ | WBPaper00029229 Xiaodong Interaction: 3 | ||
WBPaper00030809 Xiaodong | WBPaper00030809 Xiaodong | ||
− | WBPaper00030931 Xiaodong | + | WBPaper00030931 Xiaodong Interaction: 18 |
− | WBPaper00031565 Xiaodong | + | WBPaper00031565 Xiaodong Interaction: 1 |
+ | Time: 30 mins | ||
+ | New objects: request through RT | ||
+ | Location of information: in body text and figure 2 and figure 3 | ||
+ | Comments: PBC-HOX binding sites S1 and S2; cis-regulatory elements E1 and E2 | ||
+ | |||
</pre> | </pre> | ||
* I suggest we curate 5 of our papers before the jamboree, taking notes as described above and any other things that arise during your curation. | * I suggest we curate 5 of our papers before the jamboree, taking notes as described above and any other things that arise during your curation. | ||
+ | |||
+ | '''Useful documents:''' | ||
+ | *meeting notes: https://docs.google.com/document/d/1gkxZjGyyxvPF6qwg6bBzntYCAPaCjLHCKoN0fqGQ48Y/edit | ||
+ | *github tickets | ||
+ | **create feature summary page:https://github.com/WormBase/website/issues/3161 | ||
+ | **create Feature widget on Gene summary page:https://github.com/WormBase/website/issues/3162 |
Latest revision as of 12:58, 22 September 2014
Headline text
Topics
- Display of Sequence Features on the website
- See github ticket 2867 https://github.com/WormBase/website/issues/2867#issuecomment-48041365
- And Transcription factors and Gene_product binds.
- in the expression section on the gene page highlight the objects that are Sequence feature related. Add in the table a type called cis-regulatory-element?
- add a sequence feature table in sequence widget to display all features related to the gene
- in cytoscape display, sequence feature as an entity node
- Streamline the workflow
- How can we automatically identify Sequence Feature papers?
- How do papers come in now? SVM? Textpresso string matches?
- Improving data flow
- How can we make the data immediately available to all curators?
- Add Paper and Public_name fields to the Features in the Nameserver?
- geneace is available for download updated every day - a copy is taken by Caltech.
- still need WBsf object details in order to relate/curate regulation/expression/construct objects
- Sample papers curation
- Prepare a paper list from each person
- assign meaningful names to the public_name field, e.g. 'distal enhancer 1' instead of 'DE1'
- gw3 - I disagree with this. We should use the name that is used to describe the region in the paper, rather than making up our own names. I have had to go back and re-annotate regions after Xiaodong found problems with my stuff several months after I had originally done it and I really needed to be able to unambiguously identify which region the paper was talking about by using the Public_name field and matching this up to the name of the region in the paper. Users may well need to do this as well if they are reading the paper and looking at the Features marking up the regions.
- dr - ok sounds good to me Gary
Practical Issues
- Gary and I are staying at the Ramada - 800 Morrissey Boulevard, Freeport St, Boston, MA 02122 US
- We arrive in Boston on Saturday evening and fly home on Wednesday night.
- What time shall we meet on Monday morning and where do we go?
- I (Xiaodong) can pick up you guys at Harvard square T station (red-line, info center on the square) at say 9:30 am. We can walk to my office.
- Can you confirm that we have office space with internet access? I will bring my Macbook. Hopefully I can tunnel into my Sanger working space.
- Enough office space for sure. Internet access through Harvard guest. They won't allow you to access other ways.
- Thanks Xiaodong, I'm bringing my chromebook - Gary
Pre-Jamboree prep
- Suggestions for work
- Curate some (5? 10?) papers from our lists and collate data as follows
- How long did each paper take to curate
- How many new objects did you add to WormBase
- Where in the paper was the information (e.g. supplementary data, figure legends, within text)
- Did you need to contact the author for more information
- Did you come across data for other curators
- gw3 - I don't know in detail what data other curators require. It would be useful to know this so I can summarise data for others.
The Jamboree 15-17th Sept 2014
- Suggestions for work
- Discuss result from above
- Work through some papers which have already been curated and have different data types (e.g. promoter and gene regulation) to identify best curation practices.
- browsing capabilities through WormMine?
- Should we put the list of papers in the Caltech curation status form
Matters Arising
- things noted - not necessarily to do with the topic in hand
- There are many duplicates of Interaction objects in geneace, with or without two leading digits. Sometimes there is only the shorter form.
- The Features WBsf919641 and WBsf919607 are at the same position. Both are DAF-3 binding sites. WBsf919641 has Method "binding_site", WBsf919607 has Method "TF_binding_site" which I think is more correct as DAF-3 is a TF. WBsf919641 used the paper WBPaper00004526, WBsf919607 used the paper WBPaper00003384. WBsf919641 is associated with WBGene00202278, WBsf919607 is associated with WBGene00003514 (myo-2). The paper WBPaper00004526 appears to be about a PEB-1 binding site, not a DAF-3 site. I made the Feature WBsf919609 for the PEB-1 site based on WBPaper00004526 in the last round of curation. Something not right here???
- Follow-up from Mary Ann. I agree Method for WBsf919641 should be TF_binding_site. Associated_with_gene should be updated to myo-2 (this is an error). The SO_term is not right either. These then become duplicate entries.
- Resolved: WBsf919641 merged into WBsf919607 - GW
- The Features WBsf019227 and WBsf038813 are in the same position, but on opposite strands. They both are for a LAG-1 binding site associated with gene lin-11. One from paper: WBPaper00005357, other from WBPaper00032298. WBsf019227 needs to be retired and merged into WBsf038813.
- Follow-up from Mary Ann. We have no merged_into tag structure in Feature. We have History tag and could add a Comment there.
- We have a merged_into tag structure in Feature. - GW
- Resolved: WBsf019227 merged into WBsf038813 - GW
- WBsf038819 has a Method of 'regulatory_region', but the Public_name and Description describes it as an Enhancer. Which is it?
- Fixed by Mary Ann. Changed Method to enhancer.
- WBsf919669 is an Enhancer, but it has a tag for a Transcription_factor WBTranscriptionFactor000101. It is far too large to be a binding site for a transcription factor. I think this tag should be removed.
- Follow-up from Mary Ann. lag-1 does bind somewhere in this region, but the paper does not allow me to specify exactly where. Q. Should we curate these then? I think we should, but should maybe add suitable Remarks.
- 393 out of the 500 TF_binding_site Features have an incorrect SO_term tag. It should be set to: SO:0000235
- Follow-up by Mary Ann. This seems really high. I wonder whether TF_binding_site was not a recognised SO term when some of these were curated, especially the older ones. This (and other consistency checks) should be added to a script we can run.
- Resolved: all TF_binding_sites with SO_term SO:0000409 changed to SO:0000235 - GW
- 10 Features with Method = "TF_binding_site" are lacking a Transcription_factor tag.
- Follow-up by Mary Ann. Maybe it's not clear in the paper what the TF is? Something to add to consistency check.
- Resolved: the MEC-3/UNC-86 heterodimer has been added - GW
- How many Methods are there for Features: enhancer, regulatory_region, TF_binding_site, promoter…?
- How should these Methods be defined?
- When displaying on JBrowse/GBrowse, do we want to display the WBsfxxxxx ID or the Method?
- Gary does a quick literature search to see what other work has been done in the sites described.
- Along these lines, do we want to add WBPaper00002925 to the WBsf objects from WBPaper00002011? That is, should a Feature be associated with two papers when both mention it?
- The Feature objects are used to describe 'real', functional regions of the genome in as much detail as possible. Each paper that provides evidence for that region being 'real' should be added as evidence for the Feature. The Primary object we are trying to describe here is the region of the genome, not the papers. (I may have missed the point you were making - Gary).
- agree with Gary (unless I have also missed the point!). All relevant papers should be added. There is a loose convention in Allele curation that the paper listed under Evidence (which can only have one value) is the "primary" paper and the ones listed under Reference (including the primary) are all papers which describe or discuss the object as well. Also individual Evidence can be added to tags. Maybe we should look to extend this in the Feature model.
 ?Feature SMap S_parent UNIQUE Sequence UNIQUE ?Sequence XREF Feature_object
          Name Public_name UNIQUE ?Text
               Other_name ?Text
          Sequence_details Flanking_sequences UNIQUE Text UNIQUE Text
                           Mapping_target UNIQUE ?Sequence
                           Source_location UNIQUE Int UNIQUE ?Sequence UNIQUE Int UNIQUE Int UNIQUE #Evidence //source data, <WSversion> ?Sequence pos1 pos2 Evidence(Paper/person etc. remarks)
                           DNA_text UNIQUE ?Text // for storing the sequence of the feature...can use IUPAC codes to be able
                                                 // store consensus sequences, e.g. binding site consensus sequence
          Origin Species UNIQUE ?Species //added by pad, as we are moving towards multi species readyness.
                 Strain UNIQUE ?Strain //added by pad, as we are moving towards multi strain readyness.
          History Merged_into UNIQUE ?Feature XREF Acquires_merge #Evidence
                  Acquires_merge ?Feature XREF Merged_into #Evidence
                  Deprecated Text #Evidence
          Visible Description ?Text
                  SO_term ?SO_term
          Defined_by Defined_by_sequence ?Sequence XREF Defines_feature #Evidence
                     Defined_by_paper ?Paper XREF Feature #Evidence
                     Defined_by_person ?Person
                     Defined_by_author ?Author
                     Defined_by_analysis ?Analysis Int
          Score Float Text #Evidence // this would be a log score as indicated by the analysis used in gff dump
          Associations Associated_with_gene ?Gene XREF Associated_feature #Evidence // richard
                       Associated_with_CDS ?CDS XREF Associated_feature #Evidence // richard
                       Associated_with_transcript ?Transcript XREF Associated_feature #Evidence // richard
                       Associated_with_pseudogene ?Pseudogene XREF Associated_feature #Evidence // richard
                       Associated_with_transposon ?Transposon XREF Associated_feature #Evidence //richard
                       Associated_with_variation ?Variation XREF Feature #Evidence
                       Associated_with_Position_Matrix ?Position_Matrix XREF Associated_feature #Evidence
                       Associated_with_operon ?Operon XREF Associated_feature #Evidence
                       Associated_with_Interaction ?Interaction XREF Feature_interactor
                       Associated_with_expression_pattern ?Expr_pattern XREF Associated_feature #Evidence
                       Associated_with_Feature ?Feature XREF Associated_with_Feature #Evidence
                       Associated_with_construct ?Construct XREF Sequence_feature
          Bound_by_product_of ?Gene XREF Gene_product_binds #Evidence //pad added this to show what gene it binds
          Transcription_factor UNIQUE ?Transcription_factor XREF Binding_site
          Annotation UNIQUE ?LongText // added for data attribution [030220 dl]
          Confidential_remark ?Text //pad
          Remark ?Text #Evidence
          Method UNIQUE ?Method
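
The consistency checks suggested in the discussion above (TF_binding_site Features should carry SO_term SO:0000235 and a Transcription_factor tag) could be scripted roughly as follows. This is only a sketch: the dictionary layout, the function name `check_tf_binding_sites` and the placeholder IDs are assumptions for illustration, not an existing WormBase script or real objects. Real input would have to be parsed from a geneace/ACeDB dump first.

```python
# Sketch of a Feature consistency check.
# ASSUMPTION: each Feature has already been parsed into a dict keyed by
# the model tag names (Method, SO_term, Transcription_factor).

TF_SO_TERM = "SO:0000235"  # SO term expected for TF_binding_site Features


def check_tf_binding_sites(features):
    """Return (feature_id, problem) tuples for TF_binding_site Features."""
    problems = []
    for feat in features:
        if feat.get("Method") != "TF_binding_site":
            continue  # only TF binding sites are checked here
        if feat.get("SO_term") != TF_SO_TERM:
            problems.append((feat["id"], "SO_term should be " + TF_SO_TERM))
        if not feat.get("Transcription_factor"):
            problems.append((feat["id"], "missing Transcription_factor tag"))
    return problems


# Placeholder records (hypothetical IDs, not real WBsf objects).
example = [
    {"id": "WBsf_A", "Method": "TF_binding_site", "SO_term": "SO:0000409",
     "Transcription_factor": "WBTranscriptionFactor_X"},
    {"id": "WBsf_B", "Method": "TF_binding_site", "SO_term": "SO:0000235"},
    {"id": "WBsf_C", "Method": "enhancer"},
]

report = check_tf_binding_sites(example)
for feature_id, problem in report:
    print(feature_id, problem)
```

Further checks (for example, a Method that disagrees with the Description) could be added as extra tests inside the same loop.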
Duplicated Features
The following are a set of duplicated regulatory Features. I (Gary) am probably the main culprit in not checking to see if there is a pre-existing Feature.
How can we avoid this in future?
- The list of duplicated Features has been moved to the Discussion page.
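
One possible way to catch duplicates before a new Feature is created is to compare its mapped position against existing Features. The sketch below groups Features by sequence and coordinates, ignoring strand, because one of the duplicate pairs discussed above was at the same position on opposite strands. The record layout, the function name `find_position_duplicates` and the coordinates are made up for illustration. It is also an assumption here that a reverse-strand Feature is stored with start greater than end.

```python
from collections import defaultdict

# Sketch of a positional duplicate check.
# ASSUMPTION: Features have been mapped to (sequence, start, end) and a
# reverse-strand Feature is stored with start > end.


def find_position_duplicates(features):
    """Group Feature IDs that share the same span, regardless of strand."""
    groups = defaultdict(list)
    for feat in features:
        low, high = sorted((feat["start"], feat["end"]))  # strand-independent
        groups[(feat["sequence"], low, high)].append(feat["id"])
    # Keep only spans that are occupied by more than one Feature.
    return {span: sorted(ids) for span, ids in groups.items() if len(ids) > 1}


# Placeholder records (hypothetical IDs and coordinates).
example = [
    {"id": "WBsf_1", "sequence": "CHROMOSOME_I", "start": 100, "end": 110},
    {"id": "WBsf_2", "sequence": "CHROMOSOME_I", "start": 110, "end": 100},
    {"id": "WBsf_3", "sequence": "CHROMOSOME_I", "start": 500, "end": 520},
]

duplicates = find_position_duplicates(example)
for span, ids in duplicates.items():
    print(span, ids)
```

Exact-match grouping would miss overlapping but non-identical regions; an overlap test would be needed for those, and a curator would still have to decide whether the hits are true duplicates.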
Papers to curate
- I have taken the Sequence Feature Papers from RT and assigned the four of us 10 papers each.
- Just in case there is any paper here that has already been curated for Interaction, I've added the number of Interaction objects linked to by the paper. If this is not helpful, feel free to remove it.
- WBPaper00002925, Daniela, Interaction: 8
  - Time: 10 mins
  - New objects: none. The paper refers to a couple of sub-element enhancers already described in Okkema and Fire 1994, Development (B and C sub-elements), WBPaper00002011
  - Location of information: -
  - Comments:
- WBPaper00004568, Daniela, Interaction: 2
  - Time: 25 mins
  - New objects: 3 promoters: pA, pB, and pC
  - Location of information: body text and fig. 3 for promoters. Figures 5, 6 and 7 and body text for TF binding sites.
  - Comments: Expression objects were already generated by Wen in the past. Daniela will add only the WBsf once ready. Also add WBsf object to the construct objects. See RT 418146 as well. Potential cis-regulation objects for Xiaodong
- WBPaper00005842, Daniela
  - Time: 3 hours
  - New objects:
    - mk125-132 (4331-4474) is sufficient to drive strong expression in vulC and vulD. Expr11838 add feature and generate construct.
    - mk84-148 contains two regions that together confer strong expression in VulE and VulF. Expr11839 add feature and generate construct.
    - mk50-51 (1052-1438) is sufficient to confer AC, vulE, and vulA expression, as well as uterine cell expression. Expr11840 add feature and generate construct.
    - mk96-144 (2290-2522) expresses in AC in all animals observed. Expr11841 add feature and generate construct.
    - mk66-67 (4434-4997) is sufficient to confer vulval cell expression. Expr11842 add feature and generate construct.
    - mk135-134 (2412-3419) is sufficient to confer vulval cell expression. N.B.: the text says mk135-134, but fig. 4C seems to say mk135-143. Expr11843 add feature and generate construct.
  - Location of information: The nucleotide sequences for pertinent regions are shown in Figs. 5, 6, and 7 of the Supplemental Material.
  - Comment: It would be good for a sequence curator to read the paper unbiased, without checking what I have curated for expression, and see if we identify the same regions
  - Comment: potential TF binding sites
  - Comment: Should we curate Briggsae? See paragraph 'Analysis of C Briggsae upstream regions'
- WBPaper00024328, Daniela, Interaction: 4
  - Time: 2.5 hours
  - New objects: Added 4 Expression objects for 5 different constructs (pPHA2::GFP-A present already):
    - pPHA2::GFP-A, body text and Figs. 5 and 6, Expr3093. Add Sf to expr object and to construct
    - pPHA2::GFP-B, body text and Figs. 5 and 6, Expr11835. Add Sf to expr object and to construct
    - pPHA2::GFP-C, body text and Figs. 5 and 6, Expr11835. Add Sf to expr object and to construct
    - pPHA2::GFP-D, body text and Figs. 5 and 6, Expr11837. Add Sf to expr object and to construct
    - pPHA2::GFP-E, body text and Figs. 5 and 6, Expr11836. Add Sf to expr object and to construct
    - pPHA2::GFP-F, body text and Figs. 5 and 6, Expr11834. Add Sf to expr object and to construct
    - Detailed construction of plasmids in Materials and Methods
  - Location of information: body text and Figs. 5 and 6
  - Comments: not sure if this paper is curatable as sequence feature per se due to the nature of the constructs. Would be good to discuss
- WBPaper00028802, Daniela
  - Time: 5 min
  - Location: table 2 and body text
  - New objects:
  - Comment: potential paper on cis-regulation (no evidence found), WBPmat00000004 is associated with this paper - XW
  - Comment: Table 2 gives a list of experimentally verified GATA sites in other papers. These should be added to the curation status form - GW
  - Comment: no Features can be made from this paper.
  - Comment: In table 2, features referring to:
    - WBPaper00001523 -> this paper has been already curated for features -> not entered in RT
    - WBPaper00001864 -> this paper has been already curated for features -> not entered in RT
    - WBPaper00002234 -> this paper has been already curated for features -> not entered in RT
    - WBPaper00003700 -> this paper has been already curated for features -> not entered in RT
    - WBPaper00006024 -> this paper has been already curated for features -> not entered in RT
    - WBPaper00024977 -> this paper has been already curated for features -> not entered in RT
    - features sent to RT, please check WBPaper00003232, Britton et al. 1998. Found through WBPaper00028802, table 2
    - please check WBPaper00024333 and compare the feature associated with it to the one listed in table 2 of WBPaper00028802. Maybe a new GATA binding site feature should be generated?
    - please check WBPaper00024976, Fukushige et al. 2005. Found through WBPaper00028802, table 2
- WBPaper00028915, Daniela
  - Time: 15 mins
  - Comments: No features in this paper
- WBPaper00029140, Daniela
  - Time: 10 minutes
  - Comment: conserved intergenic inverted repeat structures in C. briggsae and C. remanei. Would be good to have a sequence curator check this one
- WBPaper00029255, Daniela, Interaction: 55 (21 in postgres only)
  - Time: 15 min
  - New objects: potential feature of NRE, LCS, and LCE
  - Location: figure 6, body text, page 560
  - Comment: potential cis-regulation objects for Xiaodong
- WBPaper00030829, Daniela, Interaction: 5
- WBPaper00030933, Daniela, Interaction: 1
- WBPaper00003929, Gary, Interaction: 25
  - Time: 4 hours
  - New objects: none - 6 existing Features were corrected and updated. (WBsf019182, WBsf019181, WBsf019179, WBsf019177, WBsf019178, WBsf019180)
  - Location of information: body text and in a figure in the supplemental table of the paper WBPaper00006376
  - Other curator data: "lin-41 and lin-42 are negatively regulated by let-7"
  - Comments: This is Gary Ruvkun's (Nature 2000) let-7 paper that started the miRNA field!
  - Comments: This is being marked as a pair of 'binding_site' Features because this is miRNA not TF binding.
  - See also: let-7, lin-41 binding WBPaper00006376 PubMed: 14729570 (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC324419/) - This is the paper that defined the binding sites.
  - Comments (XW): curated one new cis-regulation object with Gary's WBsfs
- WBPaper00005044, Gary, Interaction: 20
  - Time: 3 hours
  - New objects: made ire-1 binding site Feature (WBsf977530)
  - Location of information: body of this paper and WBPaper00005036
  - Other curator data: IRE-1 is a stress-activated endonuclease resident in the ER that is conserved in all known eukaryotes. IRE-1 mediated unconventional splicing of an intron from xbp-1 mRNA controls expression of the encoded transcription factor and is required for upregulation of most UPR target genes.
  - See also: WBPaper00005036
- WBPaper00005971, Gary, Interaction: 16
  - This has already been done by Gary: Feature WBsf718850
- WBPaper00024440, Gary, Interaction: 1
  - Time: 3 hours
  - New objects: Made WBsf977532, WBsf977533 M-2 motif/daf-12 binding sites for ceh-22
  - New objects: Made WBsf977534, WBsf977535 M-2 motif/daf-12 binding sites for myo-2
  - Location of information: body of this paper and Figure Supplemental 3
  - Other curator data: "We conclude that regulation of myo-2 and ceh-22 during dauer development depends critically on the M-2 motif."
  - Comments: lots of hypothetical binding sites for daf-12 in 90-odd genes, but these have not been experimentally confirmed.
- WBPaper00028816, Gary
  - Time: 30 mins
  - New objects: none
  - Location of information: body of this paper
  - Other curator data: "recruitment sites are widely distributed along X to bind the DCC and nucleate DCC spreading to X regions lacking recruitment sites"
  - Comments: This paper determines the A and B motifs of the Dosage Compensation Complex (DCC). There are hundreds of thousands of sites of varying strength.
- WBPaper00028986, Gary, Interaction: 16
- WBPaper00029181, Gary
- WBPaper00029327, Gary, Interaction: 33
- WBPaper00030849, Gary
- WBPaper00031355, Gary, Interaction: 5
- WBPaper00004181, Mary Ann, Interaction: 8
  - No Features. Took 5 mins to read.
- WBPaper00005056, Mary Ann
  - Time: 35 mins
  - New objects: 37 - not yet curated.
  - Location of information: Mainly in fig. 1, but in body text.
  - Comments: TRTTKRY element in promoter region of T05E11.3, D2096.6, C44H4.1, ZK816.4, ceh-22, tph-1, M05B5.2, myo-2. Bound by pha-4
- WBPaper00006429, Mary Ann, Interaction: 3
  - Time: 1/2 hr
  - New objects: 1
  - Location of information: In body of text and fig. 3B
  - Comments: Alludes to GATA binding sites, but no experimental data.
  - Comments (dr): Added Expr11833
- WBPaper00024505, Mary Ann
  - Time:
  - New objects:
  - Location of information:
  - Comments:
- WBPaper00028849, Mary Ann
  - Time: 45 mins
  - New objects: 1
  - Location of information: In body of text and fig. 3B
  - Comments (dr): Added Expr11832
- WBPaper00029058, Mary Ann, Interaction: 50
- WBPaper00029190, Mary Ann
- WBPaper00029406, Mary Ann, Interaction: 145
- WBPaper00030877, Mary Ann, Interaction: 8
- WBPaper00031471, Mary Ann
  - Time: 1 hr
  - New objects: 1
  - Location of information: In body of text and fig. 5f
  - Comments: Check that alleles of asd-2 in fig. 3 are curated: yb1422, yb1415, yb1470, yb1419, yb1423
- WBPaper00004482, Xiaodong, Interaction: 2
  - Time: two hours
  - New objects: request new features for more interaction objects
  - Location of information: figure 5A
  - Comments: HOX/CEH-20 binding sites in hlh-8 promoter. Feature will be used in cis-regulation and physical interaction objects
  - Duplication: yes. Only one interaction object, WBInteraction000001291, is currently associated with the paper
- WBPaper00005609, Xiaodong, Interaction: 20
  - This has already been done by Margaret: Feature WBsf019097, WBsf038788, WBsf038789, WBsf038790, WBsf038791, WBsf038793, WBsf038794, WBsf019098, WBsf038792
  - Time: 50 mins
  - New objects: three interaction objects
  - Location of information: In body of text and fig. 5
  - Comments: only WBsf038790 and WBsf038791 are useful. Not sure why the other features exist? They seem to be redundant?
- WBPaper00024189, Xiaodong
  - Time: 10 mins
  - Comments: no features in this paper
- WBPaper00024981, Xiaodong
  - Time: 40 mins
  - New objects: request through RT
  - Location of information: figure 2
  - Comments: MED-1 binding sites in end-1 and end-3 promoters
  - Comments (dr): Expr11831 generated for end-1 and end-3 promoter regions. Add WBsfID once available and relative construct. Paper checked as it came through RT
- WBPaper00028910, Xiaodong, Interaction: 7
  - Time: 30 mins
  - New objects: request through RT
  - Location of information: in body text and figure 4
  - Comments: MEF-2 direct binding site in str-1 promoter and a few minimal regulatory regions mentioned in body text
- WBPaper00029109, Xiaodong
- WBPaper00029229, Xiaodong, Interaction: 3
- WBPaper00030809, Xiaodong
- WBPaper00030931, Xiaodong, Interaction: 18
- WBPaper00031565, Xiaodong, Interaction: 1
  - Time: 30 mins
  - New objects: request through RT
  - Location of information: in body text and figure 2 and figure 3
  - Comments: PBC-HOX binding sites S1 and S2; cis-regulatory elements E1 and E2
- I suggest we curate 5 of our papers before the jamboree, taking notes as described above and any other things that arise during your curation.
Useful documents:
- meeting notes: https://docs.google.com/document/d/1gkxZjGyyxvPF6qwg6bBzntYCAPaCjLHCKoN0fqGQ48Y/edit
- github tickets
- create feature summary page: https://github.com/WormBase/website/issues/3161
- create Feature widget on Gene summary page: https://github.com/WormBase/website/issues/3162