WormBaseWiki - User contributions [en] (Atom feed: https://wiki.wormbase.org/api.php?action=feedcontributions&user=Xdwang&feedformat=atom), updated 2024-03-28T10:00:54Z, MediaWiki 1.33.0
Antibody (https://wiki.wormbase.org/index.php?title=Antibody&diff=29079), 2016-03-22T19:13:44Z <p>Xdwang: /* Notes */</p>
<hr />
<div>
==Antibody Model==<br />
<pre style="white-space: pre-wrap; <br />
white-space: -moz-pre-wrap;<br />
white-space: -pre-wrap;<br />
white-space: -o-pre-wrap; <br />
word-wrap: break-word"><br />
?Antibody Summary UNIQUE ?Text #Evidence<br />
Other_name Text<br />
Gene ?Gene XREF Antibody #Evidence<br />
Isolation Original_publication UNIQUE ?Paper<br />
No_original_reference Text //proposed by Wen<br />
Person ?Person<br />
Location ?Laboratory<br />
Clonality UNIQUE Polyclonal Text<br />
Monoclonal Text<br />
Antigen UNIQUE Peptide Text<br />
Protein Text<br />
Other_antigen Text<br />
Animal UNIQUE Rabbit<br />
Mouse<br />
Rat<br />
Guinea_pig<br />
Chicken<br />
Goat<br />
Other_animal Text<br />
Historical_gene ?Gene Text <br />
Possible_pseudonym ?Antibody XREF Possible_pseudonym_of<br />
Possible_pseudonym_of ?Antibody XREF Possible_pseudonym<br />
Expr_pattern ?Expr_pattern XREF Antibody_info<br />
Interactor ?Interaction<br />
Reference ?Paper XREF Antibody<br />
Remark ?Text #Evidence <br />
</pre><br />
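As a rough illustration of the model above, the Python sketch below holds a subset of the ?Antibody tags in a dataclass and writes them out as a .ace object. It is not the dumper's code: the class, field, and function names are hypothetical, only a few tags are covered, and it assumes the usual .ace convention of writing leaf tags (Original_publication, Polyclonal, Rabbit) without their parent tags (Isolation, Clonality, Animal). The two enums mirror the UNIQUE constraint on Clonality and Animal.<br />

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Clonality(Enum):
    # Clonality is UNIQUE in the model, so an antibody carries at most one of these.
    POLYCLONAL = "Polyclonal"
    MONOCLONAL = "Monoclonal"


class Animal(Enum):
    # Animal is UNIQUE as well (Other_animal, which takes free text, is omitted here).
    RABBIT = "Rabbit"
    MOUSE = "Mouse"
    RAT = "Rat"
    GUINEA_PIG = "Guinea_pig"
    CHICKEN = "Chicken"
    GOAT = "Goat"


def quote(value: str) -> str:
    """Wrap a value in double quotes, backslash-escaping embedded quotes."""
    return '"' + value.replace('"', '\\"') + '"'


@dataclass
class Antibody:
    name: str
    summary: Optional[str] = None
    genes: List[str] = field(default_factory=list)
    original_publication: Optional[str] = None
    clonality: Optional[Clonality] = None
    animal: Optional[Animal] = None
    references: List[str] = field(default_factory=list)
    remarks: List[str] = field(default_factory=list)

    def to_ace(self) -> str:
        """Render the object as a .ace block, one tag per line."""
        lines = [f"Antibody : {quote(self.name)}"]
        if self.summary is not None:
            lines.append(f"Summary {quote(self.summary)}")
        lines += [f"Gene {quote(g)}" for g in self.genes]
        if self.original_publication is not None:
            lines.append(f"Original_publication {quote(self.original_publication)}")
        if self.clonality is not None:
            lines.append(self.clonality.value)
        if self.animal is not None:
            lines.append(self.animal.value)
        lines += [f"Reference {quote(p)}" for p in self.references]
        lines += [f"Remark {quote(r)}" for r in self.remarks]
        return "\n".join(lines) + "\n"
```

For example, Antibody("[WBPaper00036348]::anti-EPG-2", clonality=Clonality.POLYCLONAL, animal=Animal.RABBIT).to_ace() starts with the line Antibody : "[WBPaper00036348]::anti-EPG-2", followed by Polyclonal and Rabbit.<br />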
<br />
Model change proposed to associate human diseases with antibodies and other reagents (08/31/2015):<br />
<br />
<pre style="white-space: pre-wrap;<br />
white-space: -moz-pre-wrap;<br />
white-space: -pre-wrap;<br />
white-space: -o-pre-wrap; <br />
word-wrap: break-word"><br />
<br />
?Antibody<br />
Antibody_for_disease ?DO_term XREF Associated_antibody #Evidence<br />
<br />
?Construct<br />
Construct_for_disease ?DO_term XREF Associated_construct #Evidence<br />
<br />
?Transgene<br />
Transgene_for_disease ?DO_term XREF Associated_transgene #Evidence<br />
<br />
<br />
?DO_term<br />
Reagent_info Associated_antibody ?Antibody XREF Antibody_for_disease <br />
Associated_construct ?Construct XREF Construct_for_disease <br />
Associated_transgene ?Transgene XREF Transgene_for_disease<br />
</pre><br />
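The XREF declarations above mean that ACeDB keeps both directions of each link: setting Antibody_for_disease on an antibody also fills Associated_antibody on the DO_term, and vice versa. The hypothetical Python sketch below imitates that behaviour with an in-memory dictionary; it only illustrates the idea and is not how ACeDB stores data.<br />

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Set, Tuple

# (class, object name, tag) -> linked object names
Store = DefaultDict[Tuple[str, str, str], Set[str]]

# Forward tag and its inverse, as declared in the 2015 model proposal.
PAIRS = [
    ("Antibody", "Antibody_for_disease", "DO_term", "Associated_antibody"),
    ("Construct", "Construct_for_disease", "DO_term", "Associated_construct"),
    ("Transgene", "Transgene_for_disease", "DO_term", "Associated_transgene"),
]

# Index both directions so a link can be added from either side.
XREF: Dict[Tuple[str, str], Tuple[str, str]] = {}
for src_cls, src_tag, dst_cls, dst_tag in PAIRS:
    XREF[(src_cls, src_tag)] = (dst_cls, dst_tag)
    XREF[(dst_cls, dst_tag)] = (src_cls, src_tag)


def new_store() -> Store:
    return defaultdict(set)


def add_link(store: Store, cls: str, obj: str, tag: str, value: str) -> None:
    """Add obj.tag = value and, if the tag is an XREF, the inverse link too."""
    store[(cls, obj, tag)].add(value)
    inverse = XREF.get((cls, tag))
    if inverse is not None:
        other_cls, other_tag = inverse
        store[(other_cls, value, other_tag)].add(obj)
```

For example, linking [cgc4603]::anti-OSM-5 to DOID:898 (polycystic kidney disease, from the list in the Notes) also records the antibody under Associated_antibody on DOID:898.<br />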
<br />
== Antibody curation SOPs ==<br />
<br />
<br />
===Finding antibody papers===<br />
<br />
-Antibody papers are flagged by a text string-matching method. SVM also flags antibody papers, but I don't use the SVM results for curation.<br />
<br />
1. Go to the curation status form: http://tazendra.caltech.edu/~postgres/cgi-bin/curation_status.cgi<br />
<br />
2. Choose "Curation Statistics Options Page".<br />
<br />
3. Check "antibody" and "str flags cur_strdata".<br />
<br />
4. The number following "'''STR positive not validated'''" is the number of papers that need to be curated.<br />
<br />
5. Click on that number to get the list of papers that need to be curated.<br />
<br />
<br />
<strike>1. '''Antibody paper first pass:'''<br />
*Antibody papers are identified via a script written by Juancarlos. Here is the first pass results for antibody curation: <br />
http://textpresso-dev.caltech.edu/azurebrd/wen/anti_protein_wen<br />
*The results are in an HTML page that lists paper names and the antibodies associated with them. <br />
<br />
2. '''A few steps''' need to be done before starting curation:<br />
<br />
*save the file on desktop as 'anti_protein_20110113.txt' (add the date in file name)<br />
*open a terminal<br />
**cd to curation, then my_acefiles, then antibody_curation<br />
**cp /Users/xiaodongwang/Desktop/anti_protein_20110113.txt . (copy the file into antibody_curation directory)<br />
**[OBSOLETE STEP: scp anti_protein_20110113.txt anti_protein.txt (copy the contents into anti_protein.txt, which is input file 1 when running TextpressoAbFinder)]<br />
**run Yuling's new script (written on 3/12/2012) to get the new antibody papers:<br />
***./find_new.pl anti_protein_20120320.txt WBAbPaperList.ace AbCurationLog.txt > aaa<br />
****the script subtracts the papers listed in 'WBAbPaperList.ace' and 'AbCurationLog.txt' and writes the new papers to the file 'aaa'<br />
*copy and paste the new papers into 'Ab_curation_spreadsheet.xlsx' (my own curation log) in the antibody curation folder<br />
<br />
3. When curation is finished, copy the papers from the spreadsheet to the 'AbCurationLog.txt' file: <br />
<br />
Curators need to record the status of every paper in 'Ab_curation_spreadsheet.xlsx' in the curation log file '''AbCurationLog.txt''', so that the same papers do not appear again next time.<br />
<br />
'''AbCurationLog.txt''' is located at: /Users/xiaodongwang/curation/my_acefiles/antibody_curation<br />
<br />
<br />
4. '''A few important documents''':<br />
*'''AbCurationLog.txt''' -- the curation log of all antibody papers that have already been curated. It is located at: /Users/xiaodongwang/curation/my_acefiles/antibody_curation<br />
**This file needs to be updated after every curation round by appending the newly curated papers (the paper list can be copied from the Excel file and pasted into the file).<br />
*'''WBAbPaperList.ace''' -- antibody papers curated before Textpresso was used<br />
<br />
<br />
<br />
OBSOLETE STEP (3/20/2012): 3. The curation log file listed above documents only papers that were curated after the Textpresso first pass was applied. Antibodies curated before that are kept in this file: WBAbPaperList.ace<br />
<br />
OBSOLETE STEP (3/20/2012): 4. Wen wrote a script that screens file 1, filters out the papers in files 2 and 3 (which were already curated), and then outputs the new paper list. The script is called TextpressoAbFinder.pl.<br />
<br />
OBSOLETE STEP (3/20/2012):[<br />
----<br />
Here is how to use the TextpressoAbFinder.pl<br />
<br />
(wen@athena:~/TextPresso/TextPressoAb$ ./TextpressoAbFinder.pl)<br />
<br />
/Users/xiaodongwang/curation/my_acefiles/antibody_curation/ ./TextpressoAbFinder.pl<br />
<br />
<br />
This script checks the Textpresso results against the antibody paper list dumped from citace and looks for antibody papers that have not been curated.<br />
<br />
Input file 1: anti_protein.txt -- all antibody papers found by Textpresso<br />
<br />
Input file 2: WBAbPaperList.ace -- antibody papers curated before Textpresso was used<br />
<br />
Input file 3: CurationLog/AbCurationLog.txt -- Antibody curation log.<br />
<br />
Output file 1: NewAbPaper.txt -- New antibody papers <br />
**The output file 1 name can be changed in the script (e.g. to NewAbPaper_20110113).<br />
**It has to be changed in two places each time.<br />
<br />
Output file 2: TPAbFalsePositive.txt -- All false positive antibody papers.<br />
<br />
1789 papers flagged by Textpresso, 1734 curated, 55 need to be checked.<br />
Among the papers not yet curated, 30 have the anti-XXX pattern and 25 do not.<br />
1626 papers curated in citace, 1347 found by Textpresso, 279 not found by Textpresso. Recall is 0.828413284132841.<br />
518 papers identified by Textpresso are false positive. Precision is 0.710452766908888. <br />
<br />
5. The output of the script is NewAbPaper.txt, the list of antibody papers that need to be curated. <br />
I copy this text file to my desktop and keep my own Excel file at desktop/curation forms/antibody curation/AB_curation_spreadsheet.xlsx, with a separate sheet for each date.<br />
<br />
6. Add the curated papers to AbCurationLog.txt under /Users/xiaodongwang/curation/my_acefiles/antibody_curation, so that these papers are subtracted from NewAbPaper.txt next time.<br />
]</strike><br />
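The paper-list subtraction done by find_new.pl and TextpressoAbFinder.pl, and the recall and precision figures in the log above, can be sketched in Python as below. This is an assumption-laden sketch, not either script: paper IDs are taken as in-memory collections (reading WBAbPaperList.ace and AbCurationLog.txt is left out because their formats are not described here), and the precision formula (flagged papers minus false positives, divided by flagged papers) is inferred from the logged numbers, which it reproduces.<br />

```python
from typing import Iterable, Set


def new_papers(flagged: Iterable[str], *curated: Iterable[str]) -> Set[str]:
    """Flagged papers that appear in none of the already-curated lists."""
    done: Set[str] = set()
    for paper_list in curated:
        done.update(paper_list)
    return set(flagged) - done


def recall(found_and_curated: int, total_curated: int) -> float:
    """Share of curated antibody papers that the text search also flagged."""
    return found_and_curated / total_curated


def precision(flagged: int, false_positives: int) -> float:
    """Share of flagged papers that were not false positives."""
    return (flagged - false_positives) / flagged
```

With the counts from the log, recall(1347, 1626) gives about 0.8284 and precision(1789, 518) about 0.7105, matching the values printed by TextpressoAbFinder.pl.<br />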
<br />
===Antibody curation controlled vocabulary===<br />
<br />
Antibody controlled vocabulary:<br />
<br />
Remark "Commercial Antibody."<br />
Remark "Tissue Specific Antibody Marker."<br />
Summary "Rabbit polyclonal antibody against XXX recombinant protein."<br />
Summary "Rabbit polyclonal peptide antibody against XXX."<br />
Summary "Mouse monoclonal peptide antibody against XXX."<br />
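One way to keep Summary and Remark lines within the controlled vocabulary is to generate them from fixed templates instead of typing them. The sketch below is hypothetical (the dictionary and function names are illustrative) and covers only the phrasings listed above; XXX becomes the target name.<br />

```python
from typing import Dict, Tuple

# (animal, clonality, antigen type) -> Summary template from the list above.
SUMMARY_TEMPLATES: Dict[Tuple[str, str, str], str] = {
    ("Rabbit", "polyclonal", "protein"): "Rabbit polyclonal antibody against {target} recombinant protein.",
    ("Rabbit", "polyclonal", "peptide"): "Rabbit polyclonal peptide antibody against {target}.",
    ("Mouse", "monoclonal", "peptide"): "Mouse monoclonal peptide antibody against {target}.",
}

STANDARD_REMARKS = ("Commercial Antibody.", "Tissue Specific Antibody Marker.")


def summary_line(animal: str, clonality: str, antigen: str, target: str) -> str:
    """Return a .ace Summary line, or raise if no controlled phrasing exists."""
    key = (animal, clonality.lower(), antigen.lower())
    if key not in SUMMARY_TEMPLATES:
        raise ValueError(f"no controlled Summary for {key}; write it by hand")
    return f'Summary "{SUMMARY_TEMPLATES[key].format(target=target)}"'


def remark_line(text: str) -> str:
    """Return a .ace Remark line for one of the controlled remarks."""
    if text not in STANDARD_REMARKS:
        raise ValueError(f"not a controlled Remark: {text!r}")
    return f'Remark "{text}"'
```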
<br />
=== Antibody curation guideline===<br />
<br />
WormBase requires the following information for each antibody: <br />
<br />
1. Antibody name: for consistency, use [WBPaperID]::anti-GENENAME, with the gene name in capitals (e.g. '''[WBPaper00036348]::anti-EPG-2'''). If several antibodies are made against the same gene, append _1, _2, etc.<br />
<br />
2. Original_publication: the paper in which the antibody was first reported. For antibodies published for the first time, enter that paper as the Original_publication (such antibodies are good, valid antibody objects in WormBase), and add the paper to Reference as well.<br />
<br />
3. Target gene (abc-1, xyz-1, ...), clonality (polyclonal or monoclonal), and animal (rabbit, mouse, ...).<br />
<br />
4. Antigen used to generate the antibody (peptide or protein sequence).<br />
<br />
5. If the antibody was first described in another paper, find the original antibody object and add the current paper to its Reference field.<br />
<br />
6. If the antibody has no original reference, create a new antibody object and mark it as "No_original_reference". If you suspect that the antibody is the same as one that was previously published, fill in the "Possible_pseudonym" field.<br />
<br />
7. A new antibody usually belongs to the lab of the corresponding author.<br />
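The naming rule in item 1 can be expressed as a small helper. This is a hypothetical sketch: it accepts only WBPaper IDs with eight digits (older objects in the Notes use [cgcNNNN] prefixes and .a or _a suffixes, which it does not generate) and follows the _1, _2 suffix convention stated above.<br />

```python
import re
from typing import List

WBPAPER_ID = re.compile(r"WBPaper\d{8}")


def antibody_names(paper_id: str, gene_name: str, count: int = 1) -> List[str]:
    """Build [WBPaperID]::anti-GENENAME names, adding _1, _2, ... for several antibodies."""
    if not WBPAPER_ID.fullmatch(paper_id):
        raise ValueError(f"expected a WBPaper ID, got {paper_id!r}")
    if count < 1:
        raise ValueError("count must be at least 1")
    base = f"[{paper_id}]::anti-{gene_name.upper()}"
    if count == 1:
        return [base]
    return [f"{base}_{i}" for i in range(1, count + 1)]
```

For example, antibody_names("WBPaper00036348", "epg-2") returns ["[WBPaper00036348]::anti-EPG-2"].<br />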
<br />
=='''Antibody dumper'''==<br />
<br />
* The dumper is located on tazendra:<br />
**The module and use_package.pl are at:<br />
***/home/postgres/work/citace_upload/antibody/get_antibody_ace.pm<br />
***/home/postgres/work/citace_upload/antibody/use_package.pl<br />
**use_package.pl is symlinked on tazendra at:<br />
***/home/acedb/xiaodong/antibody/use_package.pl<br />
<br />
*The dumper checks for dead genes in the 'Gene' field and invalid papers in the 'Original_publication' and 'Reference' fields, and writes the results to the err.out file.<br />
*The cron job for dumping antibodies for upload was canceled.<br />
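The checks described above amount to looking up each Gene value in a set of dead genes and each Original_publication and Reference value in a set of valid papers. The Python sketch below shows that idea under stated assumptions: the input dictionary layout, the function names, and the tab-separated err.out message format are hypothetical, and the real module (get_antibody_ace.pm) presumably reads its data from the postgres tables instead.<br />

```python
from typing import Dict, Iterable, List, Set

# Hypothetical layout: antibody name -> tag -> list of values.
Antibodies = Dict[str, Dict[str, List[str]]]


def check_antibodies(antibodies: Antibodies, dead_genes: Set[str],
                     valid_papers: Set[str]) -> List[str]:
    """Return one tab-separated message per dead gene or unknown paper."""
    errors = []
    for name in sorted(antibodies):
        tags = antibodies[name]
        for gene in tags.get("Gene", []):
            if gene in dead_genes:
                errors.append(f"{name}\tGene\t{gene}\tdead gene")
        for tag in ("Original_publication", "Reference"):
            for paper in tags.get(tag, []):
                if paper not in valid_papers:
                    errors.append(f"{name}\t{tag}\t{paper}\tinvalid paper")
    return errors


def write_err_out(errors: Iterable[str], path: str = "err.out") -> None:
    """Write the messages to the error file, one per line."""
    with open(path, "w") as handle:
        for line in errors:
            handle.write(line + "\n")
```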
<br />
<br />
<strike>Old dumper location on tazendra: /home/acedb/wen/phenote-antibody/dump_antibody_ace.pl</strike><br />
<br />
<strike>/home/acedb/xiaodong/oa_antibody_dumper/dump_antibody_ace.pl<br />
<br />
<br />
'''To dump the file:''' ./dump_antibody_ace.pl<br />
<br />
Output file name: antibody.ace<br />
<br />
'''I usually copy it to a dated file name:''' cp antibody.ace antibody.ace.date_of_dump<br />
<br />
'''then copy file to spica:''' scp antibody.ace.20110503 citace@spica.caltech.edu:/home/citace/Data_for_citace/Data_from_Xiaodong/.</strike><br />
<br />
=== Handling Dead Genes During Dump Process ===<br />
As of May 2013, the dumper script runs an automatic check for dead genes in any gene field. Dead genes referenced by an object in the OA are handled as follows:<br />
<br />
1) If there is a replacement for the gene (i.e. the gene has been merged into another gene), the dead gene is dumped into a "Historical_gene" field in the .ace file, and the replacement gene fills the original gene field. A comment is added to the Historical_gene field via a 'Text' tag (updated as of 3-18-2015). The original gene field (now holding the replacement gene) is printed with an "Inferred_automatically" tag after the gene. For example, if WBGene00001234 is a dead gene that has been merged into WBGene00002345:<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
</pre><br />
<br />
becomes<br />
<br />
<pre><br />
Gene "WBGene00002345" Inferred_automatically<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to WBGene00001234.<br />
WBGene00001234 is now considered dead and has been merged into WBGene00002345. WBGene00002345 has <br />
replaced WBGene00001234 accordingly."<br />
</pre><br />
<br />
Also, because antibodies, transgenes, expression patterns, and variations are mapped to an interactor where possible (otherwise they are dumped as "Unaffiliated"), this mapping now uses only the newest genes that the interactor refers to.<br />
<br />
2) If there is no replacement for the gene (dead or suppressed), the following is dumped:<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to a gene<br />
(WBGene00001234) that is now considered dead. Please interpret with discretion."<br />
</pre><br />
<br />
OR<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to a gene<br />
(WBGene00001234) that has been suppressed. Please interpret with discretion."<br />
</pre><br />
<br />
Lastly,<br />
<br />
3) If the gene has been split, it is dumped as:<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to a gene <br />
(WBGene00001234) that is now considered split. Please interpret with discretion."<br />
</pre><br />
<br />
Split genes are also printed to the error output file of the dumping script, so that a curator can go back and change them manually according to best judgement.<br />
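The three rules above can be summarized as one function that turns a Gene value and its status into .ace lines, plus an optional message for the error file. This is a hypothetical Python sketch that reproduces the note texts shown above; how the real dumper determines a gene's status (from the gin_dead table, per the Notes) is left out, and the error message wording is an assumption.<br />

```python
from typing import List, Optional, Tuple

STATE_TEXT = {
    "dead": "is now considered dead",
    "suppressed": "has been suppressed",
    "split": "is now considered split",
}


def dump_gene(gene: str, status: str,
              merged_into: Optional[str] = None) -> Tuple[List[str], Optional[str]]:
    """Return the .ace lines for one Gene value and an optional err.out message.

    status is one of "live", "merged", "dead", "suppressed", "split".
    """
    if status == "live":
        return [f'Gene "{gene}"'], None
    if status == "merged":
        if not merged_into:
            raise ValueError(f"{gene} is marked merged but has no replacement")
        note = (f"Note: This object originally referred to {gene}. "
                f"{gene} is now considered dead and has been merged into {merged_into}. "
                f"{merged_into} has replaced {gene} accordingly.")
        # The replacement takes over the Gene field; the old ID moves to Historical_gene.
        return [f'Gene "{merged_into}" Inferred_automatically',
                f'Historical_gene "{gene}" "{note}"'], None
    if status in STATE_TEXT:
        note = (f"Note: This object originally referred to a gene ({gene}) "
                f"that {STATE_TEXT[status]}. Please interpret with discretion.")
        lines = [f'Gene "{gene}"', f'Historical_gene "{gene}" "{note}"']
        # Only splits need a curator's follow-up, so only they go to the error file.
        error = f"{gene} has been split; change manually" if status == "split" else None
        return lines, error
    raise ValueError(f"unknown gene status {status!r}")
```

The gene examples listed below (split, merged, dead, suppressed) cover each branch.<br />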
<br />
<br />
Gene Examples:<br><br />
A split gene: WBGene00012507<br><br />
A merged gene: WBGene00007524<br><br />
A dead gene: WBGene00007814<br><br />
A suppressed gene: WBGene00015490<br><br />
<br />
=='''Notes'''==<br />
<br />
'''Changed the postgres tables --- 05/22/2011''' <br />
<br />
reference -> paper<br />
<br />
location -> laboratory<br />
<br />
'''The cron job was canceled. Dumps are now done manually, after checking the err.out file for each dump. - 06/04/2012'''<br />
<br />
<strike>'''Email from Juancarlos about the cron job --- 06/06/2011'''<br />
<br />
Set the cronjob to run every Thursday :<br />
0 2 * * thu /home/acedb/xiaodong/oa_antibody_dumper/dump_antibody_ace.pl<br />
It puts the file at :<br />
/home/postgres/public_html/cgi-bin/data/antibody.ace<br />
So you can see it at :<br />
http://tazendra.caltech.edu/~postgres/cgi-bin/data/antibody.ace<br />
If you need to run it manually, just paste into the shell :<br />
/home/acedb/xiaodong/oa_antibody_dumper/dump_antibody_ace.pl<br />
Then log onto spica, cd into the directory where you want it, remove<br />
the existing antibody.ace file, and do : <br />
wget "http://tazendra.caltech.edu/~postgres/cgi-bin/data/antibody.ace"</strike><br />
<br />
'''Dumper change for the Historical_gene tag - 05/22/2013'''<br />
<br />
-For the model change, refer to Chris's document:<br />
https://docs.google.com/a/wormbase.org/document/d/1nnuQY9OfV2VsBORj01ocoC985-5kOgxYCGOAZDeeKLQ/edit<br />
<br />
-Dumper change, discussed via Skype with J:<br />
[5/22/13 1:57:44 PM] j chan: use_package.pl -> /home/postgres/work/citace_upload/antibody/use_package.pl<br />
<br />
[5/22/13 1:59:04 PM] j chan: http://mangolassi.caltech.edu/~postgres/cgi-bin/oa/ontology_annotator.cgi<br />
<br />
[5/22/13 2:08:06 PM] j chan: gin_dead<br />
<br />
[5/22/13 2:08:19 PM] j chan: Dead -> dead<br />
<br />
[5/22/13 2:08:27 PM] j chan: merged_into WBGene -> merged<br />
<br />
[5/22/13 2:08:31 PM] j chan: split_into -> split<br />
<br />
[CG added 10-21-2013] cgrove: Suppressed -> suppressed<br />
<br />
[5/22/13 2:09:06 PM] j chan: looping through the genes where something happened to make sure they don't also point at something else<br />
<br />
[5/22/13 2:09:16 PM] j chan: abp_gene<br />
<br />
[5/22/13 2:09:47 PM] j chan: merged -> historical_gene + remark AND gene <gene> Inferred_automatically<br />
<br />
[5/22/13 2:09:56 PM] j chan: dead -> historical_gene + remark<br />
<br />
[5/22/13 2:10:05 PM] j chan: split -> historical_gene + remark AND error message<br />
<br />
[CG added 10-21-2013] cgrove: Suppressed -> historical_gene + remark<br />
<br />
[5/22/13 2:10:12 PM] j chan: normal ones -> just tag + value<br />
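The Skype notes above describe two steps: mapping gin_dead values to the categories dead, merged, split, and suppressed, and following merged genes until reaching one that does not point anywhere else. A hypothetical Python sketch of both steps follows; the exact text stored in gin_dead (assumed here to look like "merged_into WBGene00002345") and the order of the checks are assumptions, and cycle detection is added as a safety measure.<br />

```python
import re
from typing import Dict, Optional, Set, Tuple

MERGED_INTO = re.compile(r"merged_into\s+(WBGene\d{8})")


def classify(gin_dead_value: str) -> Tuple[str, Optional[str]]:
    """Map a gin_dead value to (category, replacement gene or None)."""
    match = MERGED_INTO.search(gin_dead_value)
    if match:
        return "merged", match.group(1)
    if "split_into" in gin_dead_value:
        return "split", None
    if "Suppressed" in gin_dead_value:
        return "suppressed", None
    if "Dead" in gin_dead_value:
        return "dead", None
    raise ValueError(f"unrecognized gin_dead value: {gin_dead_value!r}")


def resolve(gene: str, gin_dead: Dict[str, str]) -> Tuple[str, str]:
    """Follow merges until reaching a gene that is live, dead, split, or suppressed.

    Genes absent from gin_dead are treated as live.
    """
    seen: Set[str] = set()
    current = gene
    while current in gin_dead:
        if current in seen:
            raise ValueError(f"merge cycle involving {current}")
        seen.add(current)
        category, target = classify(gin_dead[current])
        if category != "merged":
            return current, category
        current = target  # a merged value always carries a target gene
    return current, "live"
```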
<br />
<br />
-tested with genes:<br><br />
A split gene: WBGene00012507<br><br />
A merged gene: WBGene00007524<br><br />
A dead gene: WBGene00007814<br><br />
A suppressed gene: WBGene00015490<br><br />
<br />
-migrated to tazendra on the same day<br />
<br />
'''Dumper change for disease reagents - 09/21/2015'''<br />
<br />
-For the antibody model change, see the model proposal above.<br />
<br />
-We want to add a new tab called 'Tab2' to the antibody OA, with two new fields:<br />
* 'Antibody for disease' Multiontology on DOIDs<br />
**The DO_term obo is used for this field. The source is https://diseaseontology.svn.sourceforge.net/svnroot/diseaseontology/trunk/HumanDO.obo<br />
* 'Disease paper' Multiontology (or single ?) on WBPaper<br />
<br />
-New tables abp_humandoid and abp_diseasepaper were created by J on 09/22/2015.<br />
<br />
-When dumping, write these two fields on one line, e.g.: Antibody_for_disease "DOID:0060372" Paper_evidence "WBPaper00045516"<br />
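The one-line dump format above can be produced from a list of DOIDs (the field is multi-ontology) and an optional paper. The hypothetical sketch below writes one Antibody_for_disease line per DOID, all sharing the same Paper_evidence; the ID patterns it validates against are assumptions based on the examples on this page.<br />

```python
import re
from typing import Iterable, List, Optional

DOID_ID = re.compile(r"DOID:\d+")
WBPAPER_ID = re.compile(r"WBPaper\d{8}")


def disease_lines(doids: Iterable[str], paper: Optional[str] = None) -> List[str]:
    """One Antibody_for_disease line per DOID, each with the same Paper_evidence."""
    if paper is not None and not WBPAPER_ID.fullmatch(paper):
        raise ValueError(f"expected a WBPaper ID, got {paper!r}")
    lines = []
    for doid in doids:
        if not DOID_ID.fullmatch(doid):
            raise ValueError(f"expected a DOID, got {doid!r}")
        line = f'Antibody_for_disease "{doid}"'
        if paper is not None:
            line += f' Paper_evidence "{paper}"'
        lines.append(line)
    return lines
```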
<br />
'''Diseases with antibody reagents as of 03/22/2016'''<br />
<br />
WBPaper00004603 OSM-5 polycystic kidney disease ( DOID:898 ) [cgc4603]::anti-OSM-5<br />
<br />
WBPaper00004614 DYB-1 Duchenne muscular dystrophy ( DOID:11723 ) [cgc4614]::anti-DYB-1<br />
<br />
WBPaper00005324 HLH-8 Saethre-Chotzen syndrome ( DOID:14768 ) [WBPaper00005324]::anti-HLH-8<br />
<br />
WBPaper00005585 PQE-1 Huntington's disease ( DOID:12858 ) [cgc5585]::anti-PQE-1_a, b, c<br />
<br />
WBPaper00005722 TOR-2 dystonia ( DOID:543 ) [cgc5722]::anti-TOR-2<br />
<br />
WBPaper00006152 TOR-2 dystonia ( DOID:543 ) [WBPaper00006152]::anti-TOR-2<br />
<br />
WBPaper00024206 WRN-1 Werner syndrome ( DOID:5688 ) [WBPaper00024206]::anti-WRN-1<br />
<br />
WBPaper00024364 ATX-2 autosomal dominant cerebellar ataxia ( DOID:1441 ) [cgc6898]::anti-ATX-2_a, [cgc6898]::anti-ATX-2_b<br />
<br />
WBPaper00025052 ALR-1 non-syndromic X-linked intellectual disability ( DOID:0050776 ) [WBPaper00025052]::anti-ALR-1<br />
<br />
WBPaper00025083 TOR-2 Parkinson's disease ( DOID:14330 ) [cgc5722]::anti-TOR-2<br />
<br />
WBPaper00029200 LIS-1 lissencephaly ( DOID:0050453 ) [WBPaper00029200]::anti-LIS-1<br />
<br />
WBPaper00029440 SPAS-1 hereditary spastic paraplegia ( DOID:2476 ) [WBPaper00029440]::anti-SPAS-1<br />
<br />
WBPaper00030773 SUT-1 tauopathy ( DOID:680 ) [WBPaper00030773]::anti-SUT-1<br />
<br />
WBPaper00031344 DYC-1 Duchenne muscular dystrophy ( DOID:11723 ) [WBPaper00031344]::anti-DYC-1<br />
<br />
WBPaper00031430 LAD-2 hereditary spastic paraplegia ( DOID:2476 ) [WBPaper00031430]::anti-LAD-2_a, [WBPaper00031430]::anti-LAD-2_b<br />
<br />
WBPaper00032363 ELP-1 Duchenne muscular dystrophy ( DOID:11723 ) [WBPaper00032363]::anti-ELP-1<br />
<br />
WBPaper00032910 ASPM-1 microcephaly ( DOID:10907 ) [WBPaper00032910]::anti-ASPM-1<br />
<br />
WBPaper00032968 ALP-1 myopathy ( DOID:423 ), cardiomyopathy ( DOID:0050700 ) [WBPaper00032968]::anti-ALP-1_a, [WBPaper00032968]::anti-ALP-1_a<br />
<br />
WBPaper00032979 SUT-2 tauopathy ( DOID:680 ) b<br />
<br />
WBPaper00035062 HYLS-1 hydrolethalus syndrome ( DOID:0050779 ) [WBPaper00035062]::anti-HYLS-1<br />
<br />
WBPaper00035587 WRN-1 Werner syndrome ( DOID:5688 ) [WBPaper00035587]::anti-WRN-1_b, [WBPaper00035587]::anti-WRN-1_a<br />
<br />
WBPaper00036018 FRG-1 facioscapulohumeral muscular dystrophy ( DOID:11727 ) [WBPaper00036018]::anti-FRG-1<br />
<br />
WBPaper00040518 LEM-2 Emery-Dreifuss muscular dystrophy ( DOID:11726 ) [cgc4339]::anti-LEM-2_b<br />
<br />
WBPaper00040684 WRN-1 Werner syndrome ( DOID:5688 ) [WBPaper00040684]::anti-WRN-1<br />
<br />
WBPaper00041456 GLO-2 Hermansky-Pudlak syndrome ( DOID:3753 ) [WBPaper00041456]::anti-GLO-2.a, [WBPaper00041456]::anti-GLO-2.b<br />
<br />
WBPaper00041513 ZYG-8 dyslexia ( DOID:4428 ), intellectual disability ( DOID:1059 ), lissencephaly ( DOID:0050453 ) [WBPaper00041513]::anti-ZYG-8<br />
<br />
WBPaper00042056 ZYX-1 Duchenne muscular dystrophy ( DOID:11723 ) [WBPaper00042056]::anti-ZYX-1.a, [WBPaper00042056]::anti-ZYX-1.b, [WBPaper00042056]::anti-ZYX-1.c<br />
<br />
WBPaper00042360 DNJ-27 Alzheimer's disease ( DOID:10652 ), Huntington's disease ( DOID:12858 ), Parkinson's disease ( DOID:14330 ), neurodegenerative disease ( DOID:1289 ) [WBPaper00042360]::anti-DNJ-27<br />
<br />
WBPaper00045518 HIM-6 Bloom syndrome ( DOID:2717 ) [WBPaper00045518]::anti-HIM-6<br />
<br />
WBPaper00046032 SAS-1 orofaciodigital syndrome ( DOID:4501 ) [WBPaper00046032]::anti-SAS-1</div>Xdwang
Antibody (https://wiki.wormbase.org/index.php?title=Antibody&diff=29073), 2016-03-22T13:20:41Z <p>Xdwang: /* Antibody dumper */</p>
<hr />
<div>back to [[Caltech documentation]]<br />
==Antibody Model==<br />
<pre style="white-space: pre-wrap; <br />
white-space: -moz-pre-wrap;<br />
white-space: -pre-wrap;<br />
white-space: -o-pre-wrap; <br />
word-wrap: break-word"><br />
?Antibody Summary UNIQUE ?Text #Evidence<br />
Other_name Text<br />
Gene ?Gene XREF Antibody #Evidence<br />
Isolation Original_publication UNIQUE ?Paper<br />
No_original_reference Text //proposed by Wen<br />
Person ?Person<br />
Location ?Laboratory<br />
Clonality UNIQUE Polyclonal Text<br />
Monoclonal Text<br />
Antigen UNIQUE Peptide Text<br />
Protein Text<br />
Other_antigen Text<br />
Animal UNIQUE Rabbit<br />
Mouse<br />
Rat<br />
Guinea_pig<br />
Chicken<br />
Goat<br />
Other_animal Text<br />
Historical_gene ?Gene Text <br />
Possible_pseudonym ?Antibody XREF Possible_pseudonym_of<br />
Possible_pseudonym_of ?Antibody XREF Possible_pseudonym<br />
Expr_pattern ?Expr_pattern XREF Antibody_info<br />
Interactor ?Interaction<br />
Reference ?Paper XREF Antibody<br />
Remark ?Text #Evidence <br />
</pre><br />
<br />
modle change proposed to associate human disease to antibody (08/31/2015):<br />
<br />
<pre style="white-space: pre-wrap;<br />
white-space: -moz-pre-wrap;<br />
white-space: -pre-wrap;<br />
white-space: -o-pre-wrap; <br />
word-wrap: break-word"><br />
<br />
?Antibody<br />
Antibody_for_disease ?DO_term XREF Associated_antibody #Evidence<br />
<br />
?Construct<br />
Construct_for_disease ?DO_term XREF Associated_construct #Evidence<br />
<br />
?Transgene<br />
Transgene_for_disease ?DO_term XREF Associated_transgene #Evidence<br />
<br />
<br />
?DO_term<br />
Reagent_info Associated_antibody ?Antibody XREF Antibody_for_disease <br />
Associated_construct ?Construct XREF Construct_for_disease <br />
Associated_transgene ?Transgene XREF Transgene_for_disease<br />
</pre><br />
<br />
== Antibody curation SOPs ==<br />
<br />
<br />
===Finding antibody papers===<br />
<br />
-Antibody papers are flagged using text string match method (although SVM also flag antibody, I don't use this method for curation purpose).<br />
<br />
1. go to curation status form: http://tazendra.caltech.edu/~postgres/cgi-bin/curation_status.cgi<br />
<br />
2. choose "Curation Statistics Options Page"<br />
<br />
3. check "antibody" and "str flags cur_strdata" <br />
<br />
4. the number following " '''STR positive not validated'''" are the paper numbers need to be curated.<br />
<br />
5. click on the number will give a list of papers need to be curated<br />
<br />
<br />
<strike>1. '''Antibody paper first pass:'''<br />
*Antibody papers are identified via a script written by Juancarlos. Here is the first pass results for antibody curation: <br />
http://textpresso-dev.caltech.edu/azurebrd/wen/anti_protein_wen<br />
*The results is in html page. It lists paper names and the antibodies associated with them. <br />
<br />
2. '''A few steps''' need to be done before starting curation:<br />
<br />
*save the file on desktop as 'anti_protein_20110113.txt' (add the date in file name)<br />
*open a terminal<br />
**cd to curation, then my_acefiles, then antibody_curation<br />
**cp /Users/xiaodongwang/Desktop/anti_protein_20110113.txt . (copy the file into antibody_curation directory)<br />
**[OBSOLETE STEP: scp anti_protein_20110113.txt anti_protein.txt (copy the contents in to anti_protein.txt as input file 1 when run TextpressoABFinder)]<br />
**run Yuling's new script (written on 3/12/2012) to get new antibody paper:<br />
***./find_new.pl anti_protein_20120320.txt WBAbPaperList.ace AbCurationLog.txt > aaa<br />
****basically, script will minus papers from 'WBAbPaperList.ace' and 'AbCurationLog.txt' and output new papers into file 'aaa'<br />
*copy and paste new papers into 'Ab_curation_spreadsheet.xlsx' of my own curation log in antibody curation folder<br />
<br />
3. when curation is finished, copy papers from spreadsheet to 'AbCurationLog.txt' file: <br />
<br />
Curators need to document the status of every paper from 'Ab_curation_spreadsheet.xlsx' to the curation log file '''AbCurationLog.txt''', so that the same paper will not appear again next time.<br />
<br />
'''AbCurationLog.txt''' is located at: Users/xiaodongwang/curation/my_acefiles/antibody_curation<br />
<br />
<br />
4. '''A few important documents''':<br />
*'''AbCurationLog.txt''' -- Curator maintain a curation log file for all the antibody papers that were already curated. it is located at: Users/xiaodongwang/curation/my_acefiles/antibody_curation<br />
**this file needs to be updated every time by appending newly curated paper (may copy paper list from excel file and paste into the file)<br />
*'''WBAbPaperList.ace''' -- Antibody papers curated before Textpresso time<br />
<br />
<br />
<br />
OBSOLETE STEP (3/20/2012):3. The curation log file listed above only document papers that were curated after Texpresso first pass was applied. Antibodies curated before that are kept in this file: WBAbPaperList.ace<br />
<br />
OBSOLETE STEP (3/20/2012)4. There is a script written by Wen to screen file 1, and filter out papers in file 2 and 3 (which were already curated), then give the new paper list. The script is called: TextpressoAbFinder.pl<br />
<br />
OBSOLETE STEP (3/20/2012):[<br />
----<br />
Here is how to use the TextpressoAbFinder.pl<br />
<br />
(wen@athena:~/TextPresso/TextPressoAb$ ./TextpressoAbFinder.pl)<br />
<br />
/Users/xiaodongwang/curation/my_acefiles/antibody_curation/ ./TextpressoAbFinder.pl<br />
<br />
<br />
This script check the result of Textpresso, compare with the antibody paper list dumped from citace, and look For Antibody papers that were not curated.<br />
<br />
Input file 1: anti_protein.txt -- all antibody papers found by Textpresso<br />
<br />
Input file 2: WBAbPaperList.ace -- Antibody papers curated before Textpresso time<br />
<br />
Input file 3: CurationLog/AbCurationLog.txt -- Antibody curation log.<br />
<br />
Output file 1: NewAbPaper.txt -- New antibody papers <br />
**I can change the output file 1 name to NewAbPaper_20110113 in script.<br />
**two places need to be changed each time<br />
<br />
Output file 2: TPAbFalsePositive.txt -- All false positive antibody papers.<br />
<br />
1789 papers flagged by Textpresso, 1734 curated, 55 need to be checked.<br />
Among not curated papers, 30 has anti-XXX pattern, 25 has no anti-XXX pattern.<br />
1626 papers curated in citace, 1347 found by Textpresso, 279 not found by Textpresso. Recall is 0.828413284132841.<br />
518 papers identified by Textpresso are false positive. Precision is 0.710452766908888. <br />
<br />
5. The result of the script is NewAbPaper.txt. This is the list of antibody papers that need to be curated. <br />
I cp this txt onto my desktop and created my own excel file in desktop/curation forms/antibody curation/AB_curation_spreadsheet.xlsx, and name separate sheet in time manner<br />
<br />
6. add curated paper in AbCurationLog.txt under /Users/xiaodongwang/curation/my_acefiles/antibody_curation, so that these paper can be subtracted from NewABPaper.txt next time.<br />
]</strike><br />
<br />
===Antibody curation controlled vocabulary===<br />
<br />
Antibody control vocabulary<br />
<br />
Remark "Commercial Antibody."<br />
Remark "Tissue Specific Antibody Marker."<br />
Summary "Rabbit polyclonal antibody against XXX recombinant protein."<br />
Summary "Rabbit polyclonal peptide antibody against XXX."<br />
Summary "Mouse monoclonal peptide antibody against XXX."<br />
<br />
=== Antibody curation guideline===<br />
<br />
WormBase requires the following information for Antibody: <br />
<br />
1. Antibody Name: for consistance, use [WBPaperID]::anti-genename (_1, _2, etc, if several antibodies are made for same gene. genename is in CAPITALS. ig. '''[WBPaper00036348]::anti-EPG-2''')<br />
<br />
2. Original_publication where the antibody was first reported. For antibodies that are published for the first time, list the original publication and mark the antibody as "Original_publication" antibody (these are good and valid antibody objects in WormBase). Add the paper in reference as well.<br />
<br />
3. targeting gene (abc-1, xyz-1 ...), clonality (polyclonal or monoclonal) and animal (rabbit or mouse ...)<br />
<br />
4. Antigen used to generate antibody (peptide or protein sequence) <br />
<br />
5. If the antibody is from another paper, find the original antibody object and add the reference to it. <br />
<br />
6. If the antibody has no original reference, create a new antibody object and mark it as "No_original_reference". If you suspect the antibody is the same as another one that was previously published, enter the "Possible_pseudonym" field.<br />
<br />
7. If it is a new antibody, it usually belongs to the lab where corresponding author is.<br />
<br />
=='''Antibody dumper'''==<br />
<br />
* dumper is located on tazendra:<br />
**the module and a use_package.pl are on the tazendra at :<br />
***/home/postgres/work/citace_upload/antibody/get_antibody_ace.pm<br />
*** /home/postgres/work/citace_upload/antibody/use_package.pl<br />
**They symlinked the use_package.pl on the tazendra at :<br />
***/home/acedb/xiaodong/antibody/use_package.pl<br />
<br />
*the dumper checks dead gene from 'Gene' field, and invalid paper from 'Original publication' and 'Reference' field, and throws results into the err.out file.<br />
*cronjob was cancealed for antibody dumping for upload<br />
<br />
<br />
<strike>[is located in tazendra:( /home/acedb/wen/phenote-antibody/dump_antibody_ace.pl)</strike><br />
<br />
<strike>/home/acedb/xiaodong/oa_antibody_dumper/dump_antibody_ace.pl<br />
<br />
<br />
'''dump out file:''' ./dump_antibody_ace.pl<br />
<br />
file name: antibody.ace<br />
<br />
'''I usually change the file name:''' cp antibody.ace antibody.ace.date_of_dump<br />
<br />
'''then copy file to spica:''' scp antibody.ace.20110503 citace@spica.caltech.edu:/home/citace/Data_for_citace/Data_from_Xiaodong/.</strike><br />
<br />
=== Handling Dead Genes During Dump Process ===<br />
The dumper script will now (as of May, 2013) run an automatic check for dead genes in any gene field. Any genes that are considered dead that are referenced in an Interaction object in the OA will be handled in the following manner:<br />
<br />
1) If there is a replacement for the gene (i.e. the gene has merged into another gene), the dead gene will be dumped into a "Historical_gene" field in the .ACE file, the replacement gene will fill the original gene field. A comment will be added to the Historical_gene field via a 'Text' tag (updated as of 3-18-2015). The original gene field (now with the updated gene reference) will be printed with an "Inferred_automatically" tag after the gene. So, for example, if WBGene00001234 is now a dead gene that has been merged into WBGene00002345:<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
</pre><br />
<br />
becomes<br />
<br />
<pre><br />
Gene "WBGene00002345" Inferred_automatically<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to WBGene00001234.<br />
WBGene00001234 is now considered dead and has been merged into WBGene00002345. WBGene00002345 has <br />
replaced WBGene00001234 accordingly."<br />
</pre><br />
<br />
Also, since Antibodies, Transgenes, Expression patterns, Variations are mapped to an interactor where possible (or else they are dumped as "Unaffiliated"), this mapping will now occur to only the newest genes that the interactor refers to.<br />
<br />
2) If there is no replacement for the gene (Dead or Suppressed), we would dump the following:<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to a gene<br />
(WBGene00001234) that is now considered dead. Please interpret with discretion."<br />
</pre><br />
<br />
OR<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to a gene<br />
(WBGene00001234) that has been suppressed. Please interpret with discretion."<br />
</pre><br />
<br />
and lastly,<br />
<br />
3) If the gene has been split, it is dumped as:<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to a gene <br />
(WBGene00001234) that is now considered split. Please interpret with discretion."<br />
</pre><br />
<br />
The split gene is also written to the error output file of the dumping script, so that a curator can go back and correct it manually according to best judgement.<br />
<br />
<br />
Gene Examples:<br><br />
A split gene: WBGene00012507<br><br />
A merged gene: WBGene00007524<br><br />
A dead gene: WBGene00007814<br><br />
A suppressed gene: WBGene00015490<br><br />
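<br />
The four cases above can be summarised as one small function. This is a hedged Python sketch, not the actual dumper (get_antibody_ace.pm on tazendra is Perl): the status names "live", "merged", "dead", "suppressed" and "split" and the function name dump_gene are assumptions for illustration, and each Historical_gene note is written on a single line rather than wrapped as in the examples.<br />
<br />
```python
"""Hypothetical sketch of the dead-gene rules; the real dumper is Perl."""
from typing import List, Optional, Tuple

# Wording used in the Historical_gene note for genes with no replacement.
NO_REPLACEMENT = {
    "dead": "is now considered dead",
    "suppressed": "has been suppressed",
    "split": "is now considered split",
}


def dump_gene(gene: str, status: str,
              replacement: Optional[str] = None) -> Tuple[List[str], Optional[str]]:
    """Return the .ace lines for one Gene value plus an optional err.out message."""
    if status == "live":
        # Normal genes: just tag + value.
        return [f'Gene "{gene}"'], None
    if status == "merged":
        if replacement is None:
            raise ValueError(f"{gene} is merged but no replacement was given")
        note = (f"Note: This object originally referred to {gene}. "
                f"{gene} is now considered dead and has been merged into "
                f"{replacement}. {replacement} has replaced {gene} accordingly.")
        return [f'Gene "{replacement}" Inferred_automatically',
                f'Historical_gene "{gene}" "{note}"'], None
    if status not in NO_REPLACEMENT:
        raise ValueError(f"unknown gene status: {status!r}")
    note = (f"Note: This object originally referred to a gene ({gene}) "
            f"that {NO_REPLACEMENT[status]}. Please interpret with discretion.")
    lines = [f'Gene "{gene}"', f'Historical_gene "{gene}" "{note}"']
    # Split genes are also reported so a curator can fix them by hand.
    error = (f"{gene} has been split; please review manually"
             if status == "split" else None)
    return lines, error


if __name__ == "__main__":
    ace_lines, err = dump_gene("WBGene00001234", "merged", "WBGene00002345")
    print("\n".join(ace_lines))
```
<br />
Only the split case returns an error message, mirroring the rule that split genes also go to the error output for manual curation.<br />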
<br />
=='''Notes'''==<br />
<br />
'''Changed the postgres tables --- 05/22/2011''' <br />
<br />
reference -> paper<br />
<br />
location -> laboratory<br />
<br />
'''The cron job was cancelled; dumps are now done manually after checking the err.out file for each dump. - 06/04/2012'''<br />
<br />
<strike>'''email from Juancarlos related to cronjob --- 06/06/2011'''<br />
<br />
Set the cronjob to run every Thursday :<br />
0 2 * * thu /home/acedb/xiaodong/oa_antibody_dumper/dump_antibody_ace.pl<br />
It puts the file at :<br />
/home/postgres/public_html/cgi-bin/data/antibody.ace<br />
So you can see it at :<br />
http://tazendra.caltech.edu/~postgres/cgi-bin/data/antibody.ace<br />
If you need to run it manually, just paste into the shell :<br />
/home/acedb/xiaodong/oa_antibody_dumper/dump_antibody_ace.pl<br />
Then log onto spica, cd into the directory where you want it, remove<br />
the existing antibody.ace file, and do : <br />
wget "http://tazendra.caltech.edu/~postgres/cgi-bin/data/antibody.ace"</strike><br />
<br />
'''Dumper change for Historical_gene tag - 05/22/2013'''<br />
<br />
-Model change: refer to Chris's document:<br />
https://docs.google.com/a/wormbase.org/document/d/1nnuQY9OfV2VsBORj01ocoC985-5kOgxYCGOAZDeeKLQ/edit<br />
<br />
-Dumper change, discussed via Skype with J:<br />
[5/22/13 1:57:44 PM] j chan: use_package.pl -> /home/postgres/work/citace_upload/antibody/use_package.pl<br />
<br />
[5/22/13 1:59:04 PM] j chan: http://mangolassi.caltech.edu/~postgres/cgi-bin/oa/ontology_annotator.cgi<br />
<br />
[5/22/13 2:08:06 PM] j chan: gin_dead<br />
<br />
[5/22/13 2:08:19 PM] j chan: Dead -> dead<br />
<br />
[5/22/13 2:08:27 PM] j chan: merged_into WBGene -> merged<br />
<br />
[5/22/13 2:08:31 PM] j chan: split_into -> split<br />
<br />
[CG added 10-21-2013] cgrove: Suppressed -> suppressed<br />
<br />
[5/22/13 2:09:06 PM] j chan: looping through the genes where something happened to make sure they don't also point at something else<br />
<br />
[5/22/13 2:09:16 PM] j chan: abp_gene<br />
<br />
[5/22/13 2:09:47 PM] j chan: merged -> historical_gene + remark AND gene <gene> Inferred_automatically<br />
<br />
[5/22/13 2:09:56 PM] j chan: dead -> historical_gene + remark<br />
<br />
[5/22/13 2:10:05 PM] j chan: split -> historical_gene + remark AND error message<br />
<br />
[CG added 10-21-2013] cgrove: Suppressed -> historical_gene + remark<br />
<br />
[5/22/13 2:10:12 PM] j chan: normal ones -> just tag + value<br />
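<br />
The gin_dead status mapping and the "looping through the genes" step from this chat could look roughly like the Python sketch below. It assumes gin_dead values are plain strings such as "Dead", "Suppressed", "merged_into WBGene00002345" and "split_into ..."; the helper names classify and resolve are hypothetical. Following merge chains to the newest gene (and refusing cycles) is one way to make sure a merged gene does not still point at something else; the real dumper may do this differently.<br />
<br />
```python
"""Hypothetical sketch: classify gin_dead values and follow merge chains."""
import re
from typing import Dict, Optional, Tuple

GENE_RE = re.compile(r"WBGene\d{8}")  # assumed gene ID format


def classify(raw: str) -> Tuple[str, Optional[str]]:
    """Map a raw gin_dead value to (status, merge target or None)."""
    if raw.startswith("merged_into"):
        match = GENE_RE.search(raw)
        if match is None:
            raise ValueError(f"merged_into without a gene ID: {raw!r}")
        return "merged", match.group(0)
    if raw.startswith("split_into"):
        return "split", None
    if raw.startswith("Suppressed"):
        return "suppressed", None
    if raw.startswith("Dead"):
        return "dead", None
    raise ValueError(f"unrecognised gin_dead value: {raw!r}")


def resolve(gene: str, gin_dead: Dict[str, str]) -> Tuple[str, str]:
    """Follow merges to the newest gene; return (status, gene to report)."""
    seen = {gene}
    current = gene
    while current in gin_dead:
        status, target = classify(gin_dead[current])
        if status != "merged":
            # The chain ends at a dead, suppressed or split gene.
            return status, current
        if target in seen:
            raise ValueError(f"merge cycle involving {target}")
        seen.add(target)
        current = target
    # current has no gin_dead entry, so it is treated as live.
    return ("live" if current == gene else "merged"), current
```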
<br />
<br />
-tested with genes:<br><br />
A split gene: WBGene00012507<br><br />
A merged gene: WBGene00007524<br><br />
A dead gene: WBGene00007814<br><br />
A suppressed gene: WBGene00015490<br><br />
<br />
-migrated to tazendra on the same day<br />
<br />
'''Dumper change for disease reagent - 09/21/2015'''<br />
<br />
-Antibody model change: see the model change proposed above.<br />
<br />
-We want to add a new tab called 'Tab2' to the antibody OA, with two new fields:<br />
* 'Antibody for disease' Multiontology on DOIDs<br />
**the DO_term obo is to be used for this field. The source is https://diseaseontology.svn.sourceforge.net/svnroot/diseaseontology/trunk/HumanDO.obo<br />
* 'Disease paper' Multiontology (or single ?) on WBPaper<br />
<br />
-New tables abp_humandoid and abp_diseasepaper were created by J on 09/22/2015.<br />
<br />
-When dumping, dump these two fields on one line, as: Antibody_for_disease "DOID:0060372" Paper_evidence "WBPaper00045516"</div>Xdwanghttps://wiki.wormbase.org/index.php?title=Antibody&diff=29072Antibody2016-03-22T13:19:43Z<p>Xdwang: /* Antibody curation guideline */</p>
<hr />
<div>back to [[Caltech documentation]]<br />
==Antibody Model==<br />
<pre style="white-space: pre-wrap; <br />
white-space: -moz-pre-wrap;<br />
white-space: -pre-wrap;<br />
white-space: -o-pre-wrap; <br />
word-wrap: break-word"><br />
?Antibody Summary UNIQUE ?Text #Evidence<br />
Other_name Text<br />
Gene ?Gene XREF Antibody #Evidence<br />
Isolation Original_publication UNIQUE ?Paper<br />
No_original_reference Text //proposed by Wen<br />
Person ?Person<br />
Location ?Laboratory<br />
Clonality UNIQUE Polyclonal Text<br />
Monoclonal Text<br />
Antigen UNIQUE Peptide Text<br />
Protein Text<br />
Other_antigen Text<br />
Animal UNIQUE Rabbit<br />
Mouse<br />
Rat<br />
Guinea_pig<br />
Chicken<br />
Goat<br />
Other_animal Text<br />
Historical_gene ?Gene Text <br />
Possible_pseudonym ?Antibody XREF Possible_pseudonym_of<br />
Possible_pseudonym_of ?Antibody XREF Possible_pseudonym<br />
Expr_pattern ?Expr_pattern XREF Antibody_info<br />
Interactor ?Interaction<br />
Reference ?Paper XREF Antibody<br />
Remark ?Text #Evidence <br />
</pre><br />
<br />
Model change proposed to associate human diseases with antibodies (08/31/2015):<br />
<br />
<pre style="white-space: pre-wrap;<br />
white-space: -moz-pre-wrap;<br />
white-space: -pre-wrap;<br />
white-space: -o-pre-wrap; <br />
word-wrap: break-word"><br />
<br />
?Antibody<br />
Antibody_for_disease ?DO_term XREF Associated_antibody #Evidence<br />
<br />
?Construct<br />
Construct_for_disease ?DO_term XREF Associated_construct #Evidence<br />
<br />
?Transgene<br />
Transgene_for_disease ?DO_term XREF Associated_transgene #Evidence<br />
<br />
<br />
?DO_term<br />
Reagent_info Associated_antibody ?Antibody XREF Antibody_for_disease <br />
Associated_construct ?Construct XREF Construct_for_disease <br />
Associated_transgene ?Transgene XREF Transgene_for_disease<br />
</pre><br />
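<br />
Under this proposed model, Antibody_for_disease takes a DO term plus #Evidence, and the 09/21/2015 dumper note asks for the DOID and its paper on one line, e.g. Antibody_for_disease "DOID:0060372" Paper_evidence "WBPaper00045516". A minimal Python sketch of that line format follows; the ID-format checks and the one-line-per-(DOID, paper) pairing are assumptions, not confirmed dumper behaviour.<br />
<br />
```python
"""Hypothetical sketch of the one-line disease dump format."""
import re
from itertools import product
from typing import Iterable, List

DOID_RE = re.compile(r"DOID:\d+")       # assumed DOID format
PAPER_RE = re.compile(r"WBPaper\d{8}")  # assumed WBPaper format


def disease_line(doid: str, paper: str) -> str:
    """Render one Antibody_for_disease line with its Paper_evidence."""
    if not DOID_RE.fullmatch(doid):
        raise ValueError(f"not a DOID: {doid!r}")
    if not PAPER_RE.fullmatch(paper):
        raise ValueError(f"not a WBPaper ID: {paper!r}")
    return f'Antibody_for_disease "{doid}" Paper_evidence "{paper}"'


def disease_lines(doids: Iterable[str], papers: Iterable[str]) -> List[str]:
    """Assumption: pair every DOID with every disease paper, one line each."""
    return [disease_line(d, p) for d, p in product(doids, list(papers))]
```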
<br />
== Antibody curation SOPs ==<br />
<br />
<br />
===Finding antibody papers===<br />
<br />
-Antibody papers are flagged using a text string-matching method (SVM also flags antibody papers, but I don't use the SVM results for curation).<br />
<br />
1. Go to the curation status form: http://tazendra.caltech.edu/~postgres/cgi-bin/curation_status.cgi<br />
<br />
2. Choose "Curation Statistics Options Page".<br />
<br />
3. Check "antibody" and "str flags cur_strdata".<br />
<br />
4. The number following '''STR positive not validated''' is the number of papers that need to be curated.<br />
<br />
5. Clicking on that number gives the list of papers that need to be curated.<br />
<br />
<br />
<strike>1. '''Antibody paper first pass:'''<br />
*Antibody papers are identified via a script written by Juancarlos. Here are the first-pass results for antibody curation: <br />
http://textpresso-dev.caltech.edu/azurebrd/wen/anti_protein_wen<br />
*The results are on an HTML page that lists paper names and the antibodies associated with them. <br />
<br />
2. '''A few steps''' need to be done before starting curation:<br />
<br />
*save the file on desktop as 'anti_protein_20110113.txt' (add the date in file name)<br />
*open a terminal<br />
**cd to curation, then my_acefiles, then antibody_curation<br />
**cp /Users/xiaodongwang/Desktop/anti_protein_20110113.txt . (copy the file into antibody_curation directory)<br />
**[OBSOLETE STEP: scp anti_protein_20110113.txt anti_protein.txt (copy the contents into anti_protein.txt as input file 1 when running TextpressoAbFinder)]<br />
**run Yuling's new script (written on 3/12/2012) to get the new antibody papers:<br />
***./find_new.pl anti_protein_20120320.txt WBAbPaperList.ace AbCurationLog.txt > aaa<br />
****basically, the script subtracts the papers in 'WBAbPaperList.ace' and 'AbCurationLog.txt' and writes the new papers to the file 'aaa'<br />
*copy and paste the new papers into 'Ab_curation_spreadsheet.xlsx' (my own curation log) in the antibody curation folder<br />
<br />
3. When curation is finished, copy the papers from the spreadsheet to the 'AbCurationLog.txt' file: <br />
<br />
Curators need to record the status of every paper from 'Ab_curation_spreadsheet.xlsx' in the curation log file '''AbCurationLog.txt''', so that the same paper does not appear again next time.<br />
<br />
'''AbCurationLog.txt''' is located at: Users/xiaodongwang/curation/my_acefiles/antibody_curation<br />
<br />
<br />
4. '''A few important documents''':<br />
*'''AbCurationLog.txt''' -- the curator maintains a curation log of all antibody papers that have already been curated. It is located at: Users/xiaodongwang/curation/my_acefiles/antibody_curation<br />
**this file needs to be updated every time by appending the newly curated papers (the paper list can be copied from the Excel file and pasted into the file)<br />
*'''WBAbPaperList.ace''' -- Antibody papers curated before Textpresso time<br />
<br />
<br />
<br />
OBSOLETE STEP (3/20/2012): 3. The curation log file listed above only documents papers that were curated after the Textpresso first pass was applied. Antibodies curated before that are kept in this file: WBAbPaperList.ace<br />
<br />
OBSOLETE STEP (3/20/2012): 4. Wen wrote a script that screens file 1, filters out the papers in files 2 and 3 (which were already curated), and then gives the new paper list. The script is called: TextpressoAbFinder.pl<br />
<br />
OBSOLETE STEP (3/20/2012):[<br />
----<br />
Here is how to use the TextpressoAbFinder.pl<br />
<br />
(wen@athena:~/TextPresso/TextPressoAb$ ./TextpressoAbFinder.pl)<br />
<br />
/Users/xiaodongwang/curation/my_acefiles/antibody_curation/ ./TextpressoAbFinder.pl<br />
<br />
<br />
This script checks the Textpresso results, compares them with the antibody paper list dumped from citace, and looks for antibody papers that were not curated.<br />
<br />
Input file 1: anti_protein.txt -- all antibody papers found by Textpresso<br />
<br />
Input file 2: WBAbPaperList.ace -- Antibody papers curated before Textpresso time<br />
<br />
Input file 3: CurationLog/AbCurationLog.txt -- Antibody curation log.<br />
<br />
Output file 1: NewAbPaper.txt -- New antibody papers <br />
**The name of output file 1 can be changed in the script, e.g. to NewAbPaper_20110113.<br />
**It must be changed in two places each time.<br />
<br />
Output file 2: TPAbFalsePositive.txt -- All false positive antibody papers.<br />
<br />
1789 papers flagged by Textpresso, 1734 curated, 55 need to be checked.<br />
Among the papers not curated, 30 have the anti-XXX pattern and 25 do not.<br />
1626 papers curated in citace, 1347 found by Textpresso, 279 not found by Textpresso. Recall is 0.828413284132841.<br />
518 papers identified by Textpresso are false positive. Precision is 0.710452766908888. <br />
<br />
5. The output of the script, NewAbPaper.txt, is the list of antibody papers that need to be curated. <br />
I copied this file to my desktop and created my own Excel file at desktop/curation forms/antibody curation/AB_curation_spreadsheet.xlsx, with a separate sheet for each date.<br />
<br />
6. Add the curated papers to AbCurationLog.txt under /Users/xiaodongwang/curation/my_acefiles/antibody_curation, so that these papers can be subtracted from NewAbPaper.txt next time.<br />
]</strike><br />
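<br />
The find_new.pl step described above is essentially a set difference: papers flagged in the Textpresso file minus those already listed in WBAbPaperList.ace and AbCurationLog.txt. A hedged Python equivalent is sketched below. It is not Yuling's script, and it assumes every paper in the three files appears as "WBPaper" followed by eight digits, which may not match the real file formats.<br />
<br />
```python
"""Hypothetical Python equivalent of the find_new.pl set difference."""
import re
import sys
from pathlib import Path
from typing import List

PAPER_RE = re.compile(r"WBPaper\d{8}")  # assumed ID format in all input files


def paper_ids(text: str) -> List[str]:
    """All WBPaper IDs in a file's text, in order of appearance."""
    return PAPER_RE.findall(text)


def new_papers(flagged: str, *curated: str) -> List[str]:
    """Flagged papers not already listed in any curated file (no duplicates)."""
    done = set()
    for text in curated:
        done.update(paper_ids(text))
    result: List[str] = []
    for pid in paper_ids(flagged):
        if pid not in done:
            done.add(pid)  # also stops the same paper being reported twice
            result.append(pid)
    return result


if __name__ == "__main__" and len(sys.argv) > 1:
    # argv: flagged file first, then any number of already-curated lists
    texts = [Path(p).read_text() for p in sys.argv[1:]]
    for pid in new_papers(texts[0], *texts[1:]):
        print(pid)
```
<br />
Usage mirrors the original command, e.g. python find_new_sketch.py anti_protein_20120320.txt WBAbPaperList.ace AbCurationLog.txt > aaa (the script name is hypothetical).<br />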
<br />
===Antibody curation controlled vocabulary===<br />
<br />
Antibody controlled vocabulary:<br />
<br />
Remark "Commercial Antibody."<br />
Remark "Tissue Specific Antibody Marker."<br />
Summary "Rabbit polyclonal antibody against XXX recombinant protein."<br />
Summary "Rabbit polyclonal peptide antibody against XXX."<br />
Summary "Mouse monoclonal peptide antibody against XXX."<br />
<br />
=== Antibody curation guideline===<br />
<br />
WormBase requires the following information for each antibody: <br />
<br />
1. Antibody name: for consistency, use [WBPaperID]::anti-GENENAME, with the gene name in CAPITALS (add _1, _2, etc. if several antibodies are made against the same gene), e.g. '''[WBPaper00036348]::anti-EPG-2'''<br />
<br />
2. Original_publication: the paper in which the antibody was first reported. For antibodies that are published for the first time, list the original publication and mark the antibody as an "Original_publication" antibody (these are good, valid antibody objects in WormBase). Add the paper to Reference as well.<br />
<br />
3. Target gene (abc-1, xyz-1, ...), clonality (polyclonal or monoclonal), and animal (rabbit, mouse, ...).<br />
<br />
4. Antigen used to generate the antibody (peptide or protein sequence).<br />
<br />
5. If the antibody is from another paper, find the original antibody object and add the reference to it. <br />
<br />
6. If the antibody has no original reference, create a new antibody object and mark it as "No_original_reference". If you suspect the antibody is the same as a previously published one, fill in the "Possible_pseudonym" field.<br />
<br />
7. A new antibody usually belongs to the lab of the corresponding author.<br />
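<br />
The naming rule in item 1 can be sketched in Python as below. It assumes the _1, _2 suffix is appended after the gene name and is used only when one paper describes several antibodies against the same gene; the function names are hypothetical.<br />
<br />
```python
"""Hypothetical helper for the [WBPaperID]::anti-GENENAME naming rule."""
import re
from collections import Counter
from typing import List


def antibody_name(paper_id: str, gene_name: str, index: int = 0) -> str:
    """Build one antibody name; index > 0 adds the _1, _2, ... suffix."""
    if not re.fullmatch(r"WBPaper\d{8}", paper_id):
        raise ValueError(f"not a WBPaper ID: {paper_id!r}")
    name = f"[{paper_id}]::anti-{gene_name.upper()}"
    return f"{name}_{index}" if index > 0 else name


def names_for_paper(paper_id: str, gene_names: List[str]) -> List[str]:
    """Name every antibody from one paper, numbering repeated genes."""
    totals = Counter(g.upper() for g in gene_names)
    seen: Counter = Counter()
    names = []
    for gene in gene_names:
        key = gene.upper()
        if totals[key] > 1:
            seen[key] += 1
            names.append(antibody_name(paper_id, gene, seen[key]))
        else:
            names.append(antibody_name(paper_id, gene))
    return names
```
<br />
For example, names_for_paper("WBPaper00036348", ["epg-2"]) gives ["[WBPaper00036348]::anti-EPG-2"], matching the example in item 1.<br />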
<br />
=='''Antibody dumper'''==<br />
<br />
* The dumper is located on tazendra:<br />
**the module and use_package.pl are on tazendra at:<br />
***/home/postgres/work/citace_upload/antibody/get_antibody_ace.pm<br />
*** /home/postgres/work/citace_upload/antibody/use_package.pl<br />
**use_package.pl is symlinked on tazendra at:<br />
***/home/acedb/xiaodong/antibody/<br />
<br />
*The dumper checks for dead genes in the 'Gene' field and for invalid papers in the 'Original_publication' and 'Reference' fields, and writes the results to the err.out file.<br />
*The cron job for dumping antibodies for upload was cancelled.<br />
<br />
<br />
<strike>[Located on tazendra: /home/acedb/wen/phenote-antibody/dump_antibody_ace.pl]</strike><br />
<br />
<strike>/home/acedb/xiaodong/oa_antibody_dumper/dump_antibody_ace.pl<br />
<br />
<br />
'''Dump the file:''' ./dump_antibody_ace.pl<br />
<br />
file name: antibody.ace<br />
<br />
'''I usually change the file name:''' cp antibody.ace antibody.ace.date_of_dump<br />
<br />
'''then copy file to spica:''' scp antibody.ace.20110503 citace@spica.caltech.edu:/home/citace/Data_for_citace/Data_from_Xiaodong/.</strike><br />
<br />
=== Handling Dead Genes During Dump Process ===<br />
As of May 2013, the dumper script automatically checks every gene field for dead genes. Any dead gene that is referenced in an Interaction object in the OA is handled as follows:<br />
<br />
1) If there is a replacement for the gene (i.e. the gene has been merged into another gene), the dead gene is dumped into a "Historical_gene" field in the .ACE file, and the replacement gene fills the original gene field. A comment is added to the Historical_gene field as its 'Text' value (updated as of 3-18-2015). The original gene field (now with the updated gene reference) is printed with an "Inferred_automatically" tag after the gene. For example, if WBGene00001234 is now a dead gene that has been merged into WBGene00002345:<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
</pre><br />
<br />
becomes<br />
<br />
<pre><br />
Gene "WBGene00002345" Inferred_automatically<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to WBGene00001234.<br />
WBGene00001234 is now considered dead and has been merged into WBGene00002345. WBGene00002345 has <br />
replaced WBGene00001234 accordingly."<br />
</pre><br />
<br />
Also, because antibodies, transgenes, expression patterns, and variations are mapped to an interactor where possible (otherwise they are dumped as "Unaffiliated"), this mapping now uses only the newest genes that the interactor refers to.<br />
<br />
2) If there is no replacement for the gene (Dead or Suppressed), the dumper outputs one of the following:<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to a gene<br />
(WBGene00001234) that is now considered dead. Please interpret with discretion."<br />
</pre><br />
<br />
OR<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to a gene<br />
(WBGene00001234) that has been suppressed. Please interpret with discretion."<br />
</pre><br />
<br />
and lastly,<br />
<br />
3) If the gene has been split, it is dumped as:<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to a gene <br />
(WBGene00001234) that is now considered split. Please interpret with discretion."<br />
</pre><br />
<br />
The split gene is also written to the error output file of the dumping script, so that a curator can go back and correct it manually according to best judgement.<br />
<br />
<br />
Gene Examples:<br><br />
A split gene: WBGene00012507<br><br />
A merged gene: WBGene00007524<br><br />
A dead gene: WBGene00007814<br><br />
A suppressed gene: WBGene00015490<br><br />
<br />
=='''Notes'''==<br />
<br />
'''Changed the postgres tables --- 05/22/2011''' <br />
<br />
reference -> paper<br />
<br />
location -> laboratory<br />
<br />
'''The cron job was cancelled; dumps are now done manually after checking the err.out file for each dump. - 06/04/2012'''<br />
<br />
<strike>'''email from Juancarlos related to cronjob --- 06/06/2011'''<br />
<br />
Set the cronjob to run every Thursday :<br />
0 2 * * thu /home/acedb/xiaodong/oa_antibody_dumper/dump_antibody_ace.pl<br />
It puts the file at :<br />
/home/postgres/public_html/cgi-bin/data/antibody.ace<br />
So you can see it at :<br />
http://tazendra.caltech.edu/~postgres/cgi-bin/data/antibody.ace<br />
If you need to run it manually, just paste into the shell :<br />
/home/acedb/xiaodong/oa_antibody_dumper/dump_antibody_ace.pl<br />
Then log onto spica, cd into the directory where you want it, remove<br />
the existing antibody.ace file, and do : <br />
wget "http://tazendra.caltech.edu/~postgres/cgi-bin/data/antibody.ace"</strike><br />
<br />
'''Dumper change for Historical_gene tag - 05/22/2013'''<br />
<br />
-Model change: refer to Chris's document:<br />
https://docs.google.com/a/wormbase.org/document/d/1nnuQY9OfV2VsBORj01ocoC985-5kOgxYCGOAZDeeKLQ/edit<br />
<br />
-Dumper change, discussed via Skype with J:<br />
[5/22/13 1:57:44 PM] j chan: use_package.pl -> /home/postgres/work/citace_upload/antibody/use_package.pl<br />
<br />
[5/22/13 1:59:04 PM] j chan: http://mangolassi.caltech.edu/~postgres/cgi-bin/oa/ontology_annotator.cgi<br />
<br />
[5/22/13 2:08:06 PM] j chan: gin_dead<br />
<br />
[5/22/13 2:08:19 PM] j chan: Dead -> dead<br />
<br />
[5/22/13 2:08:27 PM] j chan: merged_into WBGene -> merged<br />
<br />
[5/22/13 2:08:31 PM] j chan: split_into -> split<br />
<br />
[CG added 10-21-2013] cgrove: Suppressed -> suppressed<br />
<br />
[5/22/13 2:09:06 PM] j chan: looping through the genes where something happened to make sure they don't also point at something else<br />
<br />
[5/22/13 2:09:16 PM] j chan: abp_gene<br />
<br />
[5/22/13 2:09:47 PM] j chan: merged -> historical_gene + remark AND gene <gene> Inferred_automatically<br />
<br />
[5/22/13 2:09:56 PM] j chan: dead -> historical_gene + remark<br />
<br />
[5/22/13 2:10:05 PM] j chan: split -> historical_gene + remark AND error message<br />
<br />
[CG added 10-21-2013] cgrove: Suppressed -> historical_gene + remark<br />
<br />
[5/22/13 2:10:12 PM] j chan: normal ones -> just tag + value<br />
<br />
<br />
-tested with genes:<br><br />
A split gene: WBGene00012507<br><br />
A merged gene: WBGene00007524<br><br />
A dead gene: WBGene00007814<br><br />
A suppressed gene: WBGene00015490<br><br />
<br />
-migrated to tazendra on the same day<br />
<br />
'''Dumper change for disease reagent - 09/21/2015'''<br />
<br />
-Antibody model change: see the model change proposed above.<br />
<br />
-We want to add a new tab called 'Tab2' to the antibody OA, with two new fields:<br />
* 'Antibody for disease' Multiontology on DOIDs<br />
**the DO_term obo is to be used for this field. The source is https://diseaseontology.svn.sourceforge.net/svnroot/diseaseontology/trunk/HumanDO.obo<br />
* 'Disease paper' Multiontology (or single ?) on WBPaper<br />
<br />
-New tables abp_humandoid and abp_diseasepaper were created by J on 09/22/2015.<br />
<br />
-When dumping, dump these two fields on one line, as: Antibody_for_disease "DOID:0060372" Paper_evidence "WBPaper00045516"</div>Xdwanghttps://wiki.wormbase.org/index.php?title=Antibody&diff=29071Antibody2016-03-22T13:17:04Z<p>Xdwang: /* Antibody curation guideline */</p>
<hr />
<div>back to [[Caltech documentation]]<br />
==Antibody Model==<br />
<pre style="white-space: pre-wrap; <br />
white-space: -moz-pre-wrap;<br />
white-space: -pre-wrap;<br />
white-space: -o-pre-wrap; <br />
word-wrap: break-word"><br />
?Antibody Summary UNIQUE ?Text #Evidence<br />
Other_name Text<br />
Gene ?Gene XREF Antibody #Evidence<br />
Isolation Original_publication UNIQUE ?Paper<br />
No_original_reference Text //proposed by Wen<br />
Person ?Person<br />
Location ?Laboratory<br />
Clonality UNIQUE Polyclonal Text<br />
Monoclonal Text<br />
Antigen UNIQUE Peptide Text<br />
Protein Text<br />
Other_antigen Text<br />
Animal UNIQUE Rabbit<br />
Mouse<br />
Rat<br />
Guinea_pig<br />
Chicken<br />
Goat<br />
Other_animal Text<br />
Historical_gene ?Gene Text <br />
Possible_pseudonym ?Antibody XREF Possible_pseudonym_of<br />
Possible_pseudonym_of ?Antibody XREF Possible_pseudonym<br />
Expr_pattern ?Expr_pattern XREF Antibody_info<br />
Interactor ?Interaction<br />
Reference ?Paper XREF Antibody<br />
Remark ?Text #Evidence <br />
</pre><br />
<br />
Model change proposed to associate human diseases with antibodies (08/31/2015):<br />
<br />
<pre style="white-space: pre-wrap;<br />
white-space: -moz-pre-wrap;<br />
white-space: -pre-wrap;<br />
white-space: -o-pre-wrap; <br />
word-wrap: break-word"><br />
<br />
?Antibody<br />
Antibody_for_disease ?DO_term XREF Associated_antibody #Evidence<br />
<br />
?Construct<br />
Construct_for_disease ?DO_term XREF Associated_construct #Evidence<br />
<br />
?Transgene<br />
Transgene_for_disease ?DO_term XREF Associated_transgene #Evidence<br />
<br />
<br />
?DO_term<br />
Reagent_info Associated_antibody ?Antibody XREF Antibody_for_disease <br />
Associated_construct ?Construct XREF Construct_for_disease <br />
Associated_transgene ?Transgene XREF Transgene_for_disease<br />
</pre><br />
<br />
== Antibody curation SOPs ==<br />
<br />
<br />
===Finding antibody papers===<br />
<br />
-Antibody papers are flagged using a text string-matching method (SVM also flags antibody papers, but I don't use the SVM results for curation).<br />
<br />
1. Go to the curation status form: http://tazendra.caltech.edu/~postgres/cgi-bin/curation_status.cgi<br />
<br />
2. Choose "Curation Statistics Options Page".<br />
<br />
3. Check "antibody" and "str flags cur_strdata".<br />
<br />
4. The number following '''STR positive not validated''' is the number of papers that need to be curated.<br />
<br />
5. Clicking on that number gives the list of papers that need to be curated.<br />
<br />
<br />
<strike>1. '''Antibody paper first pass:'''<br />
*Antibody papers are identified via a script written by Juancarlos. Here are the first-pass results for antibody curation: <br />
http://textpresso-dev.caltech.edu/azurebrd/wen/anti_protein_wen<br />
*The results are on an HTML page that lists paper names and the antibodies associated with them. <br />
<br />
2. '''A few steps''' need to be done before starting curation:<br />
<br />
*save the file on desktop as 'anti_protein_20110113.txt' (add the date in file name)<br />
*open a terminal<br />
**cd to curation, then my_acefiles, then antibody_curation<br />
**cp /Users/xiaodongwang/Desktop/anti_protein_20110113.txt . (copy the file into antibody_curation directory)<br />
**[OBSOLETE STEP: scp anti_protein_20110113.txt anti_protein.txt (copy the contents into anti_protein.txt as input file 1 when running TextpressoAbFinder)]<br />
**run Yuling's new script (written on 3/12/2012) to get the new antibody papers:<br />
***./find_new.pl anti_protein_20120320.txt WBAbPaperList.ace AbCurationLog.txt > aaa<br />
****basically, the script subtracts the papers in 'WBAbPaperList.ace' and 'AbCurationLog.txt' and writes the new papers to the file 'aaa'<br />
*copy and paste the new papers into 'Ab_curation_spreadsheet.xlsx' (my own curation log) in the antibody curation folder<br />
<br />
3. When curation is finished, copy the papers from the spreadsheet to the 'AbCurationLog.txt' file: <br />
<br />
Curators need to record the status of every paper from 'Ab_curation_spreadsheet.xlsx' in the curation log file '''AbCurationLog.txt''', so that the same paper does not appear again next time.<br />
<br />
'''AbCurationLog.txt''' is located at: Users/xiaodongwang/curation/my_acefiles/antibody_curation<br />
<br />
<br />
4. '''A few important documents''':<br />
*'''AbCurationLog.txt''' -- the curator maintains a curation log of all antibody papers that have already been curated. It is located at: Users/xiaodongwang/curation/my_acefiles/antibody_curation<br />
**this file needs to be updated every time by appending the newly curated papers (the paper list can be copied from the Excel file and pasted into the file)<br />
*'''WBAbPaperList.ace''' -- Antibody papers curated before Textpresso time<br />
<br />
<br />
<br />
OBSOLETE STEP (3/20/2012): 3. The curation log file listed above only documents papers that were curated after the Textpresso first pass was applied. Antibodies curated before that are kept in this file: WBAbPaperList.ace<br />
<br />
OBSOLETE STEP (3/20/2012): 4. Wen wrote a script that screens file 1, filters out the papers in files 2 and 3 (which were already curated), and then gives the new paper list. The script is called: TextpressoAbFinder.pl<br />
<br />
OBSOLETE STEP (3/20/2012):[<br />
----<br />
Here is how to use the TextpressoAbFinder.pl<br />
<br />
(wen@athena:~/TextPresso/TextPressoAb$ ./TextpressoAbFinder.pl)<br />
<br />
/Users/xiaodongwang/curation/my_acefiles/antibody_curation/ ./TextpressoAbFinder.pl<br />
<br />
<br />
This script checks the Textpresso results, compares them with the antibody paper list dumped from citace, and looks for antibody papers that were not curated.<br />
<br />
Input file 1: anti_protein.txt -- all antibody papers found by Textpresso<br />
<br />
Input file 2: WBAbPaperList.ace -- Antibody papers curated before Textpresso time<br />
<br />
Input file 3: CurationLog/AbCurationLog.txt -- Antibody curation log.<br />
<br />
Output file 1: NewAbPaper.txt -- New antibody papers <br />
**The name of output file 1 can be changed in the script, e.g. to NewAbPaper_20110113.<br />
**It must be changed in two places each time.<br />
<br />
Output file 2: TPAbFalsePositive.txt -- All false positive antibody papers.<br />
<br />
1789 papers flagged by Textpresso, 1734 curated, 55 need to be checked.<br />
Among the papers not curated, 30 have the anti-XXX pattern and 25 do not.<br />
1626 papers curated in citace, 1347 found by Textpresso, 279 not found by Textpresso. Recall is 0.828413284132841.<br />
518 papers identified by Textpresso are false positive. Precision is 0.710452766908888. <br />
<br />
5. The output of the script, NewAbPaper.txt, is the list of antibody papers that need to be curated. <br />
I copied this file to my desktop and created my own Excel file at desktop/curation forms/antibody curation/AB_curation_spreadsheet.xlsx, with a separate sheet for each date.<br />
<br />
6. Add the curated papers to AbCurationLog.txt under /Users/xiaodongwang/curation/my_acefiles/antibody_curation, so that these papers can be subtracted from NewAbPaper.txt next time.<br />
]</strike><br />
<br />
===Antibody curation controlled vocabulary===<br />
<br />
Antibody controlled vocabulary:<br />
<br />
Remark "Commercial Antibody."<br />
Remark "Tissue Specific Antibody Marker."<br />
Summary "Rabbit polyclonal antibody against XXX recombinant protein."<br />
Summary "Rabbit polyclonal peptide antibody against XXX."<br />
Summary "Mouse monoclonal peptide antibody against XXX."<br />
<br />
=== Antibody curation guideline===<br />
<br />
WormBase requires the following information for each antibody: <br />
<br />
1. Antibody name: for consistency, use [WBPaperID]::anti-GENENAME, with the gene name in CAPITALS (add _1, _2, etc. if several antibodies are made against the same gene), e.g. '''[WBPaper00036348]::anti-EPG-2'''<br />
<br />
2. Original_publication: the paper in which the antibody was first reported. For antibodies that are published for the first time, list the original publication and mark the antibody as an "Original_publication" antibody (these are good, valid antibody objects in WormBase). Add the paper to Reference as well.<br />
<br />
3. Target gene (abc-1, xyz-1, ...), clonality (polyclonal or monoclonal), and animal (rabbit, mouse, ...).<br />
<br />
4. Antigen used to generate the antibody (peptide or protein sequence).<br />
<br />
5. If the antibody is from another paper, find the original antibody object and add the reference to it. <br />
<br />
6. If the antibody has no original reference, create a new antibody object and mark it as "No_original_reference". If you suspect the antibody is the same as a previously published one, fill in the "Possible_pseudonym" field.<br />
<br />
=='''Antibody dumper'''==<br />
<br />
* The dumper is located on tazendra:<br />
**the module and use_package.pl are on tazendra at:<br />
***/home/postgres/work/citace_upload/antibody/get_antibody_ace.pm<br />
*** /home/postgres/work/citace_upload/antibody/use_package.pl<br />
**use_package.pl is symlinked on tazendra at:<br />
***/home/acedb/xiaodong/antibody/<br />
<br />
*The dumper checks for dead genes in the 'Gene' field and for invalid papers in the 'Original_publication' and 'Reference' fields, and writes the results to the err.out file.<br />
*The cron job for dumping antibodies for upload was cancelled.<br />
<br />
<br />
<strike>[Located on tazendra: /home/acedb/wen/phenote-antibody/dump_antibody_ace.pl]</strike><br />
<br />
<strike>/home/acedb/xiaodong/oa_antibody_dumper/dump_antibody_ace.pl<br />
<br />
<br />
'''Dump the file:''' ./dump_antibody_ace.pl<br />
<br />
file name: antibody.ace<br />
<br />
'''I usually change the file name:''' cp antibody.ace antibody.ace.date_of_dump<br />
<br />
'''then copy file to spica:''' scp antibody.ace.20110503 citace@spica.caltech.edu:/home/citace/Data_for_citace/Data_from_Xiaodong/.</strike><br />
<br />
=== Handling Dead Genes During Dump Process ===<br />
As of May 2013, the dumper script automatically checks every gene field for dead genes. Any dead gene that is referenced in an Interaction object in the OA is handled as follows:<br />
<br />
1) If there is a replacement for the gene (i.e. the gene has been merged into another gene), the dead gene is dumped into a "Historical_gene" field in the .ACE file, and the replacement gene fills the original gene field. A comment is added to the Historical_gene field as its 'Text' value (updated as of 3-18-2015). The original gene field (now with the updated gene reference) is printed with an "Inferred_automatically" tag after the gene. For example, if WBGene00001234 is now a dead gene that has been merged into WBGene00002345:<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
</pre><br />
<br />
becomes<br />
<br />
<pre><br />
Gene "WBGene00002345" Inferred_automatically<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to WBGene00001234.<br />
WBGene00001234 is now considered dead and has been merged into WBGene00002345. WBGene00002345 has <br />
replaced WBGene00001234 accordingly."<br />
</pre><br />
<br />
Also, because Antibodies, Transgenes, Expression patterns and Variations are mapped to an interactor where possible (otherwise they are dumped as "Unaffiliated"), this mapping now uses only the newest genes that the interactor refers to.<br />
<br />
2) If the gene has no replacement (it is dead or suppressed), the dumper writes one of the following:<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to a gene<br />
(WBGene00001234) that is now considered dead. Please interpret with discretion."<br />
</pre><br />
<br />
OR<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to a gene<br />
(WBGene00001234) that has been suppressed. Please interpret with discretion."<br />
</pre><br />
<br />
and lastly,<br />
<br />
3) If the gene has been split, it is dumped as:<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to a gene <br />
(WBGene00001234) that is now considered split. Please interpret with discretion."<br />
</pre><br />
<br />
and the gene is also written to the error output file of the dumping script, so that a curator can go back and correct it manually using their best judgement.<br />
<br />
<br />
Gene Examples:<br><br />
A split gene: WBGene00012507<br><br />
A merged gene: WBGene00007524<br><br />
A dead gene: WBGene00007814<br><br />
A suppressed gene: WBGene00015490<br><br />
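The three rules above can be summarized as one small function. This is a hypothetical Python sketch, assuming each gene's status has already been classified as merged, dead, suppressed or split. The function name and return shape are illustrative, and the note text follows the examples above but is written on a single line.<br />

```python
# Hypothetical sketch of the dead-gene rules above; not the actual dumper code.

NO_REPLACEMENT_PHRASES = {
    "dead": "is now considered dead",
    "suppressed": "has been suppressed",
    "split": "is now considered split",
}


def dump_gene_lines(gene, status=None, replacement=None):
    """Return (ace_lines, error_message) for one value of an antibody's Gene field.

    status is None for a live gene, otherwise 'merged', 'dead', 'suppressed'
    or 'split'; replacement is needed only for 'merged'.
    """
    if status is None:
        # Normal genes: just tag + value.
        return [f'Gene "{gene}"'], None
    if status == "merged":
        note = (f"Note: This object originally referred to {gene}. "
                f"{gene} is now considered dead and has been merged into {replacement}. "
                f"{replacement} has replaced {gene} accordingly.")
        return [f'Gene "{replacement}" Inferred_automatically',
                f'Historical_gene "{gene}" "{note}"'], None
    if status not in NO_REPLACEMENT_PHRASES:
        raise ValueError(f"unknown gene status: {status!r}")
    note = (f"Note: This object originally referred to a gene ({gene}) "
            f"that {NO_REPLACEMENT_PHRASES[status]}. Please interpret with discretion.")
    lines = [f'Gene "{gene}"', f'Historical_gene "{gene}" "{note}"']
    # Split genes are also reported in the error output for manual curation.
    error = f"{gene} has been split; fix manually" if status == "split" else None
    return lines, error


lines, error = dump_gene_lines("WBGene00001234", "merged", "WBGene00002345")
print("\n".join(lines))
```

Returning the error message separately keeps the .ace output and the err.out report apart, matching the rule that only split genes need a curator's attention.<br />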
<br />
=='''Notes'''==<br />
<br />
'''Changed the postgres tables --- 05/22/2011'''<br />
<br />
reference -> paper<br />
<br />
location -> laboratory<br />
<br />
'''The cronjob was cancelled. The dump is now run manually, after checking the err.out file for each dump. - 06/04/2012'''<br />
<br />
<strike>'''Email from Juancarlos about the cronjob --- 06/06/2011'''<br />
<br />
Set the cronjob to run every Thursday :<br />
0 2 * * thu /home/acedb/xiaodong/oa_antibody_dumper/dump_antibody_ace.pl<br />
It puts the file at :<br />
/home/postgres/public_html/cgi-bin/data/antibody.ace<br />
So you can see it at :<br />
http://tazendra.caltech.edu/~postgres/cgi-bin/data/antibody.ace<br />
If you need to run it manually, just paste into the shell :<br />
/home/acedb/xiaodong/oa_antibody_dumper/dump_antibody_ace.pl<br />
Then log onto spica, cd into the directory where you want it, remove<br />
the existing antibody.ace file, and do : <br />
wget "http://tazendra.caltech.edu/~postgres/cgi-bin/data/antibody.ace"</strike><br />
<br />
'''Dumper change for the Historical_gene tag --- 05/22/2013'''<br />
<br />
-For the model change, refer to Chris's document:<br />
https://docs.google.com/a/wormbase.org/document/d/1nnuQY9OfV2VsBORj01ocoC985-5kOgxYCGOAZDeeKLQ/edit<br />
<br />
-Dumper change, discussed via Skype with J:<br />
[5/22/13 1:57:44 PM] j chan: use_package.pl -> /home/postgres/work/citace_upload/antibody/use_package.pl<br />
<br />
[5/22/13 1:59:04 PM] j chan: http://mangolassi.caltech.edu/~postgres/cgi-bin/oa/ontology_annotator.cgi<br />
<br />
[5/22/13 2:08:06 PM] j chan: gin_dead<br />
<br />
[5/22/13 2:08:19 PM] j chan: Dead -> dead<br />
<br />
[5/22/13 2:08:27 PM] j chan: merged_into WBGene -> merged<br />
<br />
[5/22/13 2:08:31 PM] j chan: split_into -> split<br />
<br />
[CG added 10-21-2013] cgrove: Suppressed -> suppressed<br />
<br />
[5/22/13 2:09:06 PM] j chan: looping through the genes where something happened to make sure they don't also point at something else<br />
<br />
[5/22/13 2:09:16 PM] j chan: abp_gene<br />
<br />
[5/22/13 2:09:47 PM] j chan: merged -> historical_gene + remark AND gene <gene> Inferred_automatically<br />
<br />
[5/22/13 2:09:56 PM] j chan: dead -> historical_gene + remark<br />
<br />
[5/22/13 2:10:05 PM] j chan: split -> historical_gene + remark AND error message<br />
<br />
[CG added 10-21-2013] cgrove: Suppressed -> historical_gene + remark<br />
<br />
[5/22/13 2:10:12 PM] j chan: normal ones -> just tag + value<br />
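The status mapping in the chat above could be sketched in Python as follows. The exact text stored in gin_dead is an assumption here (a value containing "Dead", "Suppressed", "merged_into WBGene..." or "split_into WBGene..."). Reading "make sure they don't also point at something else" as following chains of merges to the final gene is an interpretation, not confirmed by the chat.<br />

```python
import re

# Hypothetical sketch of the gin_dead mapping discussed above.
# The exact gin_dead text format is an assumption.
WBGENE = re.compile(r"WBGene\d{8}")


def classify_gin_dead(value):
    """Map a gin_dead value to (status, target gene IDs)."""
    # Check merges and splits first, since those values carry target gene IDs.
    if "merged_into" in value:
        return "merged", WBGENE.findall(value)
    if "split_into" in value:
        return "split", WBGENE.findall(value)
    if "Suppressed" in value:
        return "suppressed", []
    if "Dead" in value:
        return "dead", []
    return None, []  # no recognized entry: treat the gene as live


def resolve_merges(gene, gin_dead, max_hops=20):
    """Follow merged_into links until reaching a gene that was not itself merged.

    gin_dead maps gene IDs to their gin_dead value (assumed structure).
    """
    seen = {gene}
    current = gene
    for _ in range(max_hops):
        status, targets = classify_gin_dead(gin_dead.get(current, ""))
        if status != "merged" or not targets:
            return current
        current = targets[0]
        if current in seen:
            raise ValueError(f"merge cycle involving {current}")
        seen.add(current)
    raise ValueError(f"more than {max_hops} merges starting from {gene}")
```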
<br />
<br />
-tested with genes:<br><br />
A split gene: WBGene00012507<br><br />
A merged gene: WBGene00007524<br><br />
A dead gene: WBGene00007814<br><br />
A suppressed gene: WBGene00015490<br><br />
<br />
-migrated to tazendra on the same day<br />
<br />
'''Dumper change for disease reagents --- 09/21/2015'''<br />
<br />
-For the antibody model change, see the proposal above.<br />
<br />
-Add a new tab called 'Tab2' to the antibody OA, with two new fields:<br />
* 'Antibody for disease' Multiontology on DOIDs<br />
**the DO_term obo is to be used for this field. The source is https://diseaseontology.svn.sourceforge.net/svnroot/diseaseontology/trunk/HumanDO.obo<br />
* 'Disease paper' Multiontology (or single ?) on WBPaper<br />
<br />
-New tables abp_humandoid and abp_diseasepaper were created by J on 09/22/2015<br />
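The one-line dump format specified for these two fields could be produced roughly as below. This is a hypothetical Python sketch: it assumes the abp_humandoid and abp_diseasepaper values for one antibody are already available as lists, and that every DOID is paired with every disease paper. The real dumper may pair them differently.<br />

```python
# Hypothetical sketch: build one-line Antibody_for_disease entries from the
# values of abp_humandoid and abp_diseasepaper for one antibody.


def disease_lines(doids, papers):
    """Return one .ace line per (DOID, paper) pair, with Paper_evidence on the same line."""
    lines = []
    for doid in doids:
        if not papers:
            # Assumption: a DOID without a disease paper is dumped without evidence.
            lines.append(f'Antibody_for_disease "{doid}"')
        for paper in papers:
            lines.append(f'Antibody_for_disease "{doid}" Paper_evidence "{paper}"')
    return lines


print("\n".join(disease_lines(["DOID:0060372"], ["WBPaper00045516"])))
```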
<br />
-When dumping, write these two fields on one line, e.g.: Antibody_for_disease "DOID:0060372" Paper_evidence "WBPaper00045516"</div>Xdwanghttps://wiki.wormbase.org/index.php?title=Antibody&diff=29070Antibody2016-03-22T13:14:43Z<p>Xdwang: /* Antibody curation */</p>
<hr />
<div>back to [[Caltech documentation]]<br />
==Antibody Model==<br />
<pre style="white-space: pre-wrap; <br />
white-space: -moz-pre-wrap;<br />
white-space: -pre-wrap;<br />
white-space: -o-pre-wrap; <br />
word-wrap: break-word"><br />
?Antibody Summary UNIQUE ?Text #Evidence<br />
Other_name Text<br />
Gene ?Gene XREF Antibody #Evidence<br />
Isolation Original_publication UNIQUE ?Paper<br />
No_original_reference Text //proposed by Wen<br />
Person ?Person<br />
Location ?Laboratory<br />
Clonality UNIQUE Polyclonal Text<br />
Monoclonal Text<br />
Antigen UNIQUE Peptide Text<br />
Protein Text<br />
Other_antigen Text<br />
Animal UNIQUE Rabbit<br />
Mouse<br />
Rat<br />
Guinea_pig<br />
Chicken<br />
Goat<br />
Other_animal Text<br />
Historical_gene ?Gene Text <br />
Possible_pseudonym ?Antibody XREF Possible_pseudonym_of<br />
Possible_pseudonym_of ?Antibody XREF Possible_pseudonym<br />
Expr_pattern ?Expr_pattern XREF Antibody_info<br />
Interactor ?Interaction<br />
Reference ?Paper XREF Antibody<br />
Remark ?Text #Evidence <br />
</pre><br />
<br />
Model change proposed to associate human diseases with antibodies (08/31/2015):<br />
<br />
<pre style="white-space: pre-wrap;<br />
white-space: -moz-pre-wrap;<br />
white-space: -pre-wrap;<br />
white-space: -o-pre-wrap; <br />
word-wrap: break-word"><br />
<br />
?Antibody<br />
Antibody_for_disease ?DO_term XREF Associated_antibody #Evidence<br />
<br />
?Construct<br />
Construct_for_disease ?DO_term XREF Associated_construct #Evidence<br />
<br />
?Transgene<br />
Transgene_for_disease ?DO_term XREF Associated_transgene #Evidence<br />
<br />
<br />
?DO_term<br />
Reagent_info Associated_antibody ?Antibody XREF Antibody_for_disease <br />
Associated_construct ?Construct XREF Construct_for_disease <br />
Associated_transgene ?Transgene XREF Transgene_for_disease<br />
</pre><br />
<br />
== Antibody curation SOPs ==<br />
<br />
<br />
===Finding antibody papers===<br />
<br />
-Antibody papers are flagged using a text string matching method (SVM also flags antibody papers, but I don't use the SVM results for curation).<br />
<br />
1. Go to the curation status form: http://tazendra.caltech.edu/~postgres/cgi-bin/curation_status.cgi<br />
<br />
2. Choose "Curation Statistics Options Page".<br />
<br />
3. Check "antibody" and "str flags cur_strdata".<br />
<br />
4. The number following "'''STR positive not validated'''" is the number of papers that need to be curated.<br />
<br />
5. Click on that number to get the list of papers to be curated.<br />
<br />
<br />
<strike>1. '''Antibody paper first pass:'''<br />
*Antibody papers are identified by a script written by Juancarlos. The first-pass results for antibody curation are here:<br />
http://textpresso-dev.caltech.edu/azurebrd/wen/anti_protein_wen<br />
*The results are an HTML page listing paper names and the antibodies associated with them.<br />
<br />
2. '''A few steps''' need to be done before starting curation:<br />
<br />
*Save the file on the desktop as 'anti_protein_20110113.txt' (include the date in the file name)<br />
*Open a terminal<br />
**cd into curation, then my_acefiles, then antibody_curation<br />
**cp /Users/xiaodongwang/Desktop/anti_protein_20110113.txt . (copies the file into the antibody_curation directory)<br />
**[OBSOLETE STEP: scp anti_protein_20110113.txt anti_protein.txt (copy the contents into anti_protein.txt, used as input file 1 when running TextpressoAbFinder)]<br />
**Run Yuling's new script (written on 3/12/2012) to get the new antibody papers:<br />
***./find_new.pl anti_protein_20120320.txt WBAbPaperList.ace AbCurationLog.txt > aaa<br />
****The script subtracts the papers listed in 'WBAbPaperList.ace' and 'AbCurationLog.txt' and writes the remaining new papers to the file 'aaa'<br />
*Copy and paste the new papers into 'Ab_curation_spreadsheet.xlsx', my own curation log in the antibody curation folder<br />
<br />
3. When curation is finished, copy the papers from the spreadsheet to the 'AbCurationLog.txt' file:<br />
<br />
Curators need to record the status of every paper from 'Ab_curation_spreadsheet.xlsx' in the curation log file '''AbCurationLog.txt''', so that the same paper does not appear again next time.<br />
<br />
'''AbCurationLog.txt''' is located at: /Users/xiaodongwang/curation/my_acefiles/antibody_curation<br />
<br />
<br />
4. '''A few important documents''':<br />
*'''AbCurationLog.txt''' -- the curation log of all antibody papers that have already been curated. It is located at: /Users/xiaodongwang/curation/my_acefiles/antibody_curation<br />
**Update this file after each round of curation by appending the newly curated papers (you can copy the paper list from the Excel file and paste it into the file)<br />
*'''WBAbPaperList.ace''' -- Antibody papers curated before Textpresso time<br />
<br />
<br />
<br />
OBSOLETE STEP (3/20/2012): 3. The curation log file listed above only documents papers curated after the Textpresso first pass was applied. Antibodies curated before that are kept in this file: WBAbPaperList.ace<br />
<br />
OBSOLETE STEP (3/20/2012): 4. A script written by Wen, TextpressoAbFinder.pl, screens file 1, filters out the papers in files 2 and 3 (which were already curated), and outputs the list of new papers.<br />
<br />
OBSOLETE STEP (3/20/2012):[<br />
----<br />
Here is how to use TextpressoAbFinder.pl:<br />
<br />
(wen@athena:~/TextPresso/TextPressoAb$ ./TextpressoAbFinder.pl)<br />
<br />
/Users/xiaodongwang/curation/my_acefiles/antibody_curation/ ./TextpressoAbFinder.pl<br />
<br />
<br />
This script checks the Textpresso results, compares them with the antibody paper list dumped from citace, and looks for antibody papers that have not been curated.<br />
<br />
Input file 1: anti_protein.txt -- all antibody papers found by Textpresso<br />
<br />
Input file 2: WBAbPaperList.ace -- Antibody papers curated before Textpresso time<br />
<br />
Input file 3: CurationLog/AbCurationLog.txt -- Antibody curation log.<br />
<br />
Output file 1: NewAbPaper.txt -- New antibody papers <br />
**The output file 1 name can be changed in the script (e.g. to NewAbPaper_20110113)<br />
**It has to be changed in two places each time<br />
<br />
Output file 2: TPAbFalsePositive.txt -- All false positive antibody papers.<br />
<br />
1789 papers flagged by Textpresso, 1734 curated, 55 need to be checked.<br />
Among not curated papers, 30 has anti-XXX pattern, 25 has no anti-XXX pattern.<br />
1626 papers curated in citace, 1347 found by Textpresso, 279 not found by Textpresso. Recall is 0.828413284132841.<br />
518 papers identified by Textpresso are false positive. Precision is 0.710452766908888. <br />
<br />
5. The result of the script is NewAbPaper.txt. This is the list of antibody papers that need to be curated. <br />
I copy this file to my desktop and paste the list into my own Excel file, desktop/curation forms/antibody curation/AB_curation_spreadsheet.xlsx, using a separate sheet named by date for each round.<br />
<br />
6. Add the curated papers to AbCurationLog.txt under /Users/xiaodongwang/curation/my_acefiles/antibody_curation, so that these papers are subtracted from NewAbPaper.txt next time.<br />
]</strike><br />
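The recall and precision figures in the struck-out TextpressoAbFinder.pl output above are consistent with the following arithmetic, where precision counts the flagged papers that were not false positives:<br />

```latex
\text{Recall} = \frac{1347}{1626} \approx 0.8284,
\qquad
\text{Precision} = \frac{1789 - 518}{1789} = \frac{1271}{1789} \approx 0.7105
```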
<br />
===Antibody curation controlled vocabulary===<br />
<br />
Antibody controlled vocabulary:<br />
<br />
Remark "Commercial Antibody."<br />
Remark "Tissue Specific Antibody Marker."<br />
Summary "Rabbit polyclonal antibody against XXX recombinant protein."<br />
Summary "Rabbit polyclonal peptide antibody against XXX."<br />
Summary "Mouse monoclonal peptide antibody against XXX."<br />
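A hypothetical helper that fills in the Summary templates above, assuming XXX stands for the target gene or protein name. It covers only the two template shapes listed here (recombinant protein and peptide); the function name is illustrative.<br />

```python
# Hypothetical helper for the Summary templates above; XXX is assumed to be the
# target gene or protein name.


def cv_summary(animal, clonality, antigen_type, target):
    """Return a controlled-vocabulary Summary such as
    'Rabbit polyclonal antibody against XXX recombinant protein.'

    antigen_type is 'protein' (recombinant protein) or 'peptide'.
    """
    if antigen_type == "protein":
        return f"{animal} {clonality} antibody against {target} recombinant protein."
    if antigen_type == "peptide":
        return f"{animal} {clonality} peptide antibody against {target}."
    raise ValueError(f"no Summary template for antigen type {antigen_type!r}")
```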
<br />
=== Antibody curation guideline===<br />
<br />
WormBase requires the following information for each antibody:<br />
<br />
1. Antibody name: for consistency, use [WBPaperID]::anti-GENENAME, with the gene name in capitals (e.g. '''[WBPaper00036348]::anti-EPG-2'''). If several antibodies are made against the same gene, add _1, _2, etc.<br />
<br />
2. The original reference in which the antibody was first reported. For antibodies published for the first time, enter that paper in the "Original_publication" field (these are good, valid antibody objects in WormBase), and add the paper to Reference as well.<br />
<br />
3. Target gene (abc-1, xyz-1, ...), clonality (polyclonal or monoclonal) and host animal (rabbit, mouse, ...).<br />
<br />
4. The antigen used to generate the antibody (peptide or protein sequence).<br />
<br />
5. If the antibody is from another paper, find the original antibody object and add the new reference to it.<br />
<br />
6. If the antibody has no original reference, create a new antibody object and mark it as "No_original_reference". If you suspect the antibody is the same as one that was previously published, fill in the "Possible_pseudonym" field.<br />
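The naming convention in item 1 can be expressed as a small Python helper. This is a sketch only: the function name is hypothetical, and numbering every antibody from _1 when a paper makes several against the same gene is an assumption about how the suffixes are applied.<br />

```python
# Hypothetical helper for naming convention item 1: [WBPaperID]::anti-GENENAME,
# with _1, _2, ... when one paper makes several antibodies against the same gene.


def antibody_names(paper_id, gene_name, count=1):
    """Return the antibody object name(s) for one paper and one target gene."""
    if count < 1:
        raise ValueError("count must be at least 1")
    base = f"[{paper_id}]::anti-{gene_name.upper()}"
    if count == 1:
        return [base]
    return [f"{base}_{i}" for i in range(1, count + 1)]


print(antibody_names("WBPaper00036348", "epg-2"))
```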
<br />
=='''Antibody dumper'''==<br />
<br />
* The dumper is located on tazendra:<br />
**The module and use_package.pl are on tazendra at:<br />
***/home/postgres/work/citace_upload/antibody/get_antibody_ace.pm<br />
***/home/postgres/work/citace_upload/antibody/use_package.pl<br />
**use_package.pl is symlinked on tazendra at:<br />
***/home/acedb/xiaodong/antibody/<br />
<br />
*The dumper checks the 'Gene' field for dead genes and the 'Original publication' and 'Reference' fields for invalid papers, and writes the results to the err.out file.<br />
*The cronjob that dumped antibodies for upload has been cancelled.<br />
<br />
<br />
<strike>Formerly located on tazendra at /home/acedb/wen/phenote-antibody/dump_antibody_ace.pl</strike><br />
<br />
<strike>/home/acedb/xiaodong/oa_antibody_dumper/dump_antibody_ace.pl<br />
<br />
<br />
'''To dump the file:''' ./dump_antibody_ace.pl<br />
<br />
Output file name: antibody.ace<br />
<br />
'''I usually keep a dated copy of the file:''' cp antibody.ace antibody.ace.date_of_dump<br />
<br />
'''Then copy the file to spica:''' scp antibody.ace.20110503 citace@spica.caltech.edu:/home/citace/Data_for_citace/Data_from_Xiaodong/.</strike><br />
<br />
=== Handling Dead Genes During Dump Process ===<br />
As of May 2013, the dumper script automatically checks every gene field for dead genes. Any dead gene referenced in an Interaction object in the OA is handled as follows:<br />
<br />
1) If the gene has a replacement (i.e. it has been merged into another gene), the dead gene is dumped into a "Historical_gene" field in the .ace file and the replacement gene fills the original gene field. A comment is added to the Historical_gene field via a 'Text' tag (updated as of 3-18-2015). The original gene field, now holding the replacement gene, is printed with an "Inferred_automatically" tag after the gene. For example, if WBGene00001234 is now dead and has been merged into WBGene00002345:<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
</pre><br />
<br />
becomes<br />
<br />
<pre><br />
Gene "WBGene00002345" Inferred_automatically<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to WBGene00001234.<br />
WBGene00001234 is now considered dead and has been merged into WBGene00002345. WBGene00002345 has <br />
replaced WBGene00001234 accordingly."<br />
</pre><br />
<br />
Also, because Antibodies, Transgenes, Expression patterns and Variations are mapped to an interactor where possible (otherwise they are dumped as "Unaffiliated"), this mapping now uses only the newest genes that the interactor refers to.<br />
<br />
2) If the gene has no replacement (it is dead or suppressed), the dumper writes one of the following:<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to a gene<br />
(WBGene00001234) that is now considered dead. Please interpret with discretion."<br />
</pre><br />
<br />
OR<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to a gene<br />
(WBGene00001234) that has been suppressed. Please interpret with discretion."<br />
</pre><br />
<br />
and lastly,<br />
<br />
3) If the gene has been split, it is dumped as:<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to a gene <br />
(WBGene00001234) that is now considered split. Please interpret with discretion."<br />
</pre><br />
<br />
and the gene is also written to the error output file of the dumping script, so that a curator can go back and correct it manually using their best judgement.<br />
<br />
<br />
Gene Examples:<br><br />
A split gene: WBGene00012507<br><br />
A merged gene: WBGene00007524<br><br />
A dead gene: WBGene00007814<br><br />
A suppressed gene: WBGene00015490<br><br />
<br />
=='''Notes'''==<br />
<br />
'''Changed the postgres tables --- 05/22/2011'''<br />
<br />
reference -> paper<br />
<br />
location -> laboratory<br />
<br />
'''The cronjob was cancelled. The dump is now run manually, after checking the err.out file for each dump. - 06/04/2012'''<br />
<br />
<strike>'''Email from Juancarlos about the cronjob --- 06/06/2011'''<br />
<br />
Set the cronjob to run every Thursday :<br />
0 2 * * thu /home/acedb/xiaodong/oa_antibody_dumper/dump_antibody_ace.pl<br />
It puts the file at :<br />
/home/postgres/public_html/cgi-bin/data/antibody.ace<br />
So you can see it at :<br />
http://tazendra.caltech.edu/~postgres/cgi-bin/data/antibody.ace<br />
If you need to run it manually, just paste into the shell :<br />
/home/acedb/xiaodong/oa_antibody_dumper/dump_antibody_ace.pl<br />
Then log onto spica, cd into the directory where you want it, remove<br />
the existing antibody.ace file, and do : <br />
wget "http://tazendra.caltech.edu/~postgres/cgi-bin/data/antibody.ace"</strike><br />
<br />
'''Dumper change for the Historical_gene tag --- 05/22/2013'''<br />
<br />
-For the model change, refer to Chris's document:<br />
https://docs.google.com/a/wormbase.org/document/d/1nnuQY9OfV2VsBORj01ocoC985-5kOgxYCGOAZDeeKLQ/edit<br />
<br />
-Dumper change, discussed via Skype with J:<br />
[5/22/13 1:57:44 PM] j chan: use_package.pl -> /home/postgres/work/citace_upload/antibody/use_package.pl<br />
<br />
[5/22/13 1:59:04 PM] j chan: http://mangolassi.caltech.edu/~postgres/cgi-bin/oa/ontology_annotator.cgi<br />
<br />
[5/22/13 2:08:06 PM] j chan: gin_dead<br />
<br />
[5/22/13 2:08:19 PM] j chan: Dead -> dead<br />
<br />
[5/22/13 2:08:27 PM] j chan: merged_into WBGene -> merged<br />
<br />
[5/22/13 2:08:31 PM] j chan: split_into -> split<br />
<br />
[CG added 10-21-2013] cgrove: Suppressed -> suppressed<br />
<br />
[5/22/13 2:09:06 PM] j chan: looping through the genes where something happened to make sure they don't also point at something else<br />
<br />
[5/22/13 2:09:16 PM] j chan: abp_gene<br />
<br />
[5/22/13 2:09:47 PM] j chan: merged -> historical_gene + remark AND gene <gene> Inferred_automatically<br />
<br />
[5/22/13 2:09:56 PM] j chan: dead -> historical_gene + remark<br />
<br />
[5/22/13 2:10:05 PM] j chan: split -> historical_gene + remark AND error message<br />
<br />
[CG added 10-21-2013] cgrove: Suppressed -> historical_gene + remark<br />
<br />
[5/22/13 2:10:12 PM] j chan: normal ones -> just tag + value<br />
<br />
<br />
-tested with genes:<br><br />
A split gene: WBGene00012507<br><br />
A merged gene: WBGene00007524<br><br />
A dead gene: WBGene00007814<br><br />
A suppressed gene: WBGene00015490<br><br />
<br />
-migrated to tazendra on the same day<br />
<br />
'''Dumper change for disease reagents --- 09/21/2015'''<br />
<br />
-For the antibody model change, see the proposal above.<br />
<br />
-Add a new tab called 'Tab2' to the antibody OA, with two new fields:<br />
* 'Antibody for disease' Multiontology on DOIDs<br />
**the DO_term obo is to be used for this field. The source is https://diseaseontology.svn.sourceforge.net/svnroot/diseaseontology/trunk/HumanDO.obo<br />
* 'Disease paper' Multiontology (or single ?) on WBPaper<br />
<br />
-New tables abp_humandoid and abp_diseasepaper were created by J on 09/22/2015<br />
<br />
-When dumping, write these two fields on one line, e.g.: Antibody_for_disease "DOID:0060372" Paper_evidence "WBPaper00045516"</div>Xdwanghttps://wiki.wormbase.org/index.php?title=Antibody&diff=29069Antibody2016-03-22T13:13:27Z<p>Xdwang: /* Antibody curation */</p>
<hr />
<div>back to [[Caltech documentation]]<br />
==Antibody Model==<br />
<pre style="white-space: pre-wrap; <br />
white-space: -moz-pre-wrap;<br />
white-space: -pre-wrap;<br />
white-space: -o-pre-wrap; <br />
word-wrap: break-word"><br />
?Antibody Summary UNIQUE ?Text #Evidence<br />
Other_name Text<br />
Gene ?Gene XREF Antibody #Evidence<br />
Isolation Original_publication UNIQUE ?Paper<br />
No_original_reference Text //proposed by Wen<br />
Person ?Person<br />
Location ?Laboratory<br />
Clonality UNIQUE Polyclonal Text<br />
Monoclonal Text<br />
Antigen UNIQUE Peptide Text<br />
Protein Text<br />
Other_antigen Text<br />
Animal UNIQUE Rabbit<br />
Mouse<br />
Rat<br />
Guinea_pig<br />
Chicken<br />
Goat<br />
Other_animal Text<br />
Historical_gene ?Gene Text <br />
Possible_pseudonym ?Antibody XREF Possible_pseudonym_of<br />
Possible_pseudonym_of ?Antibody XREF Possible_pseudonym<br />
Expr_pattern ?Expr_pattern XREF Antibody_info<br />
Interactor ?Interaction<br />
Reference ?Paper XREF Antibody<br />
Remark ?Text #Evidence <br />
</pre><br />
<br />
Model change proposed to associate human diseases with antibodies (08/31/2015):<br />
<br />
<pre style="white-space: pre-wrap;<br />
white-space: -moz-pre-wrap;<br />
white-space: -pre-wrap;<br />
white-space: -o-pre-wrap; <br />
word-wrap: break-word"><br />
<br />
?Antibody<br />
Antibody_for_disease ?DO_term XREF Associated_antibody #Evidence<br />
<br />
?Construct<br />
Construct_for_disease ?DO_term XREF Associated_construct #Evidence<br />
<br />
?Transgene<br />
Transgene_for_disease ?DO_term XREF Associated_transgene #Evidence<br />
<br />
<br />
?DO_term<br />
Reagent_info Associated_antibody ?Antibody XREF Antibody_for_disease <br />
Associated_construct ?Construct XREF Construct_for_disease <br />
Associated_transgene ?Transgene XREF Transgene_for_disease<br />
</pre><br />
<br />
== Antibody curation SOPs ==<br />
<br />
=== Antibody curation===<br />
<br />
'''Finding antibody papers:'''<br />
Antibody papers are flagged using a text string matching method (SVM also flags antibody papers, but I don't use the SVM results for curation).<br />
<br />
1. Go to the curation status form: http://tazendra.caltech.edu/~postgres/cgi-bin/curation_status.cgi<br />
<br />
2. Choose "Curation Statistics Options Page".<br />
<br />
3. Check "antibody" and "str flags cur_strdata".<br />
<br />
4. The number following "'''STR positive not validated'''" is the number of papers that need to be curated.<br />
<br />
5. Click on that number to get the list of papers to be curated.<br />
<br />
<br />
<strike>1. '''Antibody paper first pass:'''<br />
*Antibody papers are identified by a script written by Juancarlos. The first-pass results for antibody curation are here:<br />
http://textpresso-dev.caltech.edu/azurebrd/wen/anti_protein_wen<br />
*The results are an HTML page listing paper names and the antibodies associated with them.<br />
<br />
2. '''A few steps''' need to be done before starting curation:<br />
<br />
*Save the file on the desktop as 'anti_protein_20110113.txt' (include the date in the file name)<br />
*Open a terminal<br />
**cd into curation, then my_acefiles, then antibody_curation<br />
**cp /Users/xiaodongwang/Desktop/anti_protein_20110113.txt . (copies the file into the antibody_curation directory)<br />
**[OBSOLETE STEP: scp anti_protein_20110113.txt anti_protein.txt (copy the contents into anti_protein.txt, used as input file 1 when running TextpressoAbFinder)]<br />
**Run Yuling's new script (written on 3/12/2012) to get the new antibody papers:<br />
***./find_new.pl anti_protein_20120320.txt WBAbPaperList.ace AbCurationLog.txt > aaa<br />
****The script subtracts the papers listed in 'WBAbPaperList.ace' and 'AbCurationLog.txt' and writes the remaining new papers to the file 'aaa'<br />
*Copy and paste the new papers into 'Ab_curation_spreadsheet.xlsx', my own curation log in the antibody curation folder<br />
<br />
3. When curation is finished, copy the papers from the spreadsheet to the 'AbCurationLog.txt' file:<br />
<br />
Curators need to record the status of every paper from 'Ab_curation_spreadsheet.xlsx' in the curation log file '''AbCurationLog.txt''', so that the same paper does not appear again next time.<br />
<br />
'''AbCurationLog.txt''' is located at: /Users/xiaodongwang/curation/my_acefiles/antibody_curation<br />
<br />
<br />
4. '''A few important documents''':<br />
*'''AbCurationLog.txt''' -- the curation log of all antibody papers that have already been curated. It is located at: /Users/xiaodongwang/curation/my_acefiles/antibody_curation<br />
**Update this file after each round of curation by appending the newly curated papers (you can copy the paper list from the Excel file and paste it into the file)<br />
*'''WBAbPaperList.ace''' -- Antibody papers curated before Textpresso time<br />
<br />
<br />
<br />
OBSOLETE STEP (3/20/2012): 3. The curation log file listed above only documents papers curated after the Textpresso first pass was applied. Antibodies curated before that are kept in this file: WBAbPaperList.ace<br />
<br />
OBSOLETE STEP (3/20/2012): 4. A script written by Wen, TextpressoAbFinder.pl, screens file 1, filters out the papers in files 2 and 3 (which were already curated), and outputs the list of new papers.<br />
<br />
OBSOLETE STEP (3/20/2012):[<br />
----<br />
Here is how to use TextpressoAbFinder.pl:<br />
<br />
(wen@athena:~/TextPresso/TextPressoAb$ ./TextpressoAbFinder.pl)<br />
<br />
/Users/xiaodongwang/curation/my_acefiles/antibody_curation/ ./TextpressoAbFinder.pl<br />
<br />
<br />
This script checks the Textpresso results, compares them with the antibody paper list dumped from citace, and looks for antibody papers that have not been curated.<br />
<br />
Input file 1: anti_protein.txt -- all antibody papers found by Textpresso<br />
<br />
Input file 2: WBAbPaperList.ace -- Antibody papers curated before Textpresso time<br />
<br />
Input file 3: CurationLog/AbCurationLog.txt -- Antibody curation log.<br />
<br />
Output file 1: NewAbPaper.txt -- New antibody papers <br />
**The output file 1 name can be changed in the script (e.g. to NewAbPaper_20110113)<br />
**It has to be changed in two places each time<br />
<br />
Output file 2: TPAbFalsePositive.txt -- All false positive antibody papers.<br />
<br />
1789 papers flagged by Textpresso, 1734 curated, 55 need to be checked.<br />
Among not curated papers, 30 has anti-XXX pattern, 25 has no anti-XXX pattern.<br />
1626 papers curated in citace, 1347 found by Textpresso, 279 not found by Textpresso. Recall is 0.828413284132841.<br />
518 papers identified by Textpresso are false positive. Precision is 0.710452766908888. <br />
<br />
5. The result of the script is NewAbPaper.txt. This is the list of antibody papers that need to be curated. <br />
I copy this file to my desktop and paste the list into my own Excel file, desktop/curation forms/antibody curation/AB_curation_spreadsheet.xlsx, using a separate sheet named by date for each round.<br />
<br />
6. Add the curated papers to AbCurationLog.txt under /Users/xiaodongwang/curation/my_acefiles/antibody_curation, so that these papers are subtracted from NewAbPaper.txt next time.<br />
]</strike><br />
<br />
===Antibody curation controlled vocabulary===<br />
<br />
Antibody controlled vocabulary:<br />
<br />
Remark "Commercial Antibody."<br />
Remark "Tissue Specific Antibody Marker."<br />
Summary "Rabbit polyclonal antibody against XXX recombinant protein."<br />
Summary "Rabbit polyclonal peptide antibody against XXX."<br />
Summary "Mouse monoclonal peptide antibody against XXX."<br />
<br />
=== Antibody curation guideline===<br />
<br />
WormBase requires the following information for each antibody:<br />
<br />
1. Antibody name: for consistency, use [WBPaperID]::anti-GENENAME, with the gene name in capitals (e.g. '''[WBPaper00036348]::anti-EPG-2'''). If several antibodies are made against the same gene, add _1, _2, etc.<br />
<br />
2. The original reference in which the antibody was first reported. For antibodies published for the first time, enter that paper in the "Original_publication" field (these are good, valid antibody objects in WormBase), and add the paper to Reference as well.<br />
<br />
3. Target gene (abc-1, xyz-1, ...), clonality (polyclonal or monoclonal) and host animal (rabbit, mouse, ...).<br />
<br />
4. The antigen used to generate the antibody (peptide or protein sequence).<br />
<br />
5. If the antibody is from another paper, find the original antibody object and add the new reference to it.<br />
<br />
6. If the antibody has no original reference, create a new antibody object and mark it as "No_original_reference". If you suspect the antibody is the same as one that was previously published, fill in the "Possible_pseudonym" field.<br />
<br />
=='''Antibody dumper'''==<br />
<br />
* The dumper is located on tazendra:<br />
**The module and use_package.pl are on tazendra at:<br />
***/home/postgres/work/citace_upload/antibody/get_antibody_ace.pm<br />
***/home/postgres/work/citace_upload/antibody/use_package.pl<br />
**use_package.pl is symlinked on tazendra at:<br />
***/home/acedb/xiaodong/antibody/<br />
<br />
*The dumper checks the 'Gene' field for dead genes and the 'Original publication' and 'Reference' fields for invalid papers, and writes the results to the err.out file.<br />
*The cronjob that dumped antibodies for upload has been cancelled.<br />
<br />
<br />
<strike>Formerly located on tazendra at /home/acedb/wen/phenote-antibody/dump_antibody_ace.pl</strike><br />
<br />
<strike>/home/acedb/xiaodong/oa_antibody_dumper/dump_antibody_ace.pl<br />
<br />
<br />
'''To dump the file:''' ./dump_antibody_ace.pl<br />
<br />
Output file name: antibody.ace<br />
<br />
'''I usually keep a dated copy of the file:''' cp antibody.ace antibody.ace.date_of_dump<br />
<br />
'''Then copy the file to spica:''' scp antibody.ace.20110503 citace@spica.caltech.edu:/home/citace/Data_for_citace/Data_from_Xiaodong/.</strike><br />
<br />
=== Handling Dead Genes During Dump Process ===<br />
As of May 2013, the dumper script automatically checks every gene field for dead genes. Any dead gene referenced in an Interaction object in the OA is handled as follows:<br />
<br />
1) If the gene has a replacement (i.e. it has been merged into another gene), the dead gene is dumped into a "Historical_gene" field in the .ace file and the replacement gene fills the original gene field. A comment is added to the Historical_gene field via a 'Text' tag (updated as of 3-18-2015). The original gene field, now holding the replacement gene, is printed with an "Inferred_automatically" tag after the gene. For example, if WBGene00001234 is now dead and has been merged into WBGene00002345:<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
</pre><br />
<br />
becomes<br />
<br />
<pre><br />
Gene "WBGene00002345" Inferred_automatically<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to WBGene00001234.<br />
WBGene00001234 is now considered dead and has been merged into WBGene00002345. WBGene00002345 has <br />
replaced WBGene00001234 accordingly."<br />
</pre><br />
<br />
Also, because Antibodies, Transgenes, Expression patterns and Variations are mapped to an interactor where possible (otherwise they are dumped as "Unaffiliated"), this mapping now uses only the newest genes that the interactor refers to.<br />
<br />
2) If the gene has no replacement (it is dead or suppressed), the following is dumped:<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to a gene<br />
(WBGene00001234) that is now considered dead. Please interpret with discretion."<br />
</pre><br />
<br />
OR<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to a gene<br />
(WBGene00001234) that has been suppressed. Please interpret with discretion."<br />
</pre><br />
<br />
and lastly,<br />
<br />
3) If the gene has been split, it is dumped as:<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to a gene <br />
(WBGene00001234) that is now considered split. Please interpret with discretion."<br />
</pre><br />
<br />
Split genes are also printed to the error output file of the dumping script, so that a curator can go back and correct them manually using their best judgement.<br />
<br />
<br />
Gene Examples:<br><br />
A split gene: WBGene00012507<br><br />
A merged gene: WBGene00007524<br><br />
A dead gene: WBGene00007814<br><br />
A suppressed gene: WBGene00015490<br><br />
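The three cases above amount to a lookup from gene status to .ace lines. The following is a minimal Python sketch, not the actual dumper (which is the Perl module get_antibody_ace.pm); the status names follow the gin_dead categories in the Notes, the err.out message wording is hypothetical, and each note is written on a single line.<br />

```python
from typing import List, Optional, Tuple

# Note texts copied from the examples above; {g} is the original gene ID,
# {r} the replacement gene ID. Each note is emitted on a single line.
MERGED_NOTE = ("Note: This object originally referred to {g}. {g} is now considered dead "
               "and has been merged into {r}. {r} has replaced {g} accordingly.")
NOTES = {
    "dead": "Note: This object originally referred to a gene ({g}) that is now "
            "considered dead. Please interpret with discretion.",
    "suppressed": "Note: This object originally referred to a gene ({g}) that has "
                  "been suppressed. Please interpret with discretion.",
    "split": "Note: This object originally referred to a gene ({g}) that is now "
             "considered split. Please interpret with discretion.",
}


def gene_to_ace(gene: str, status: str = "live",
                replacement: Optional[str] = None) -> Tuple[List[str], Optional[str]]:
    """Return (.ace lines, err.out message or None) for one Gene value."""
    if status == "live":
        # Normal genes: just tag + value.
        return [f'Gene "{gene}"'], None
    if status == "merged":
        if replacement is None:
            raise ValueError(f"merged gene {gene} needs a replacement gene")
        note = MERGED_NOTE.format(g=gene, r=replacement)
        return [f'Gene "{replacement}" Inferred_automatically',
                f'Historical_gene "{gene}" "{note}"'], None
    if status in NOTES:
        lines = [f'Gene "{gene}"',
                 f'Historical_gene "{gene}" "{NOTES[status].format(g=gene)}"']
        # Split genes are also reported so that a curator can fix them by hand
        # (hypothetical message wording).
        err = f"{gene} has been split; please check manually" if status == "split" else None
        return lines, err
    raise ValueError(f"unknown gene status: {status}")


lines, err = gene_to_ace("WBGene00001234", "merged", "WBGene00002345")
print("\n".join(lines))
```

The usage call produces the Gene and Historical_gene lines of the merged example, with the note on one line.<br />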
<br />
=='''Notes'''==<br />
<br />
'''Changes to the postgres tables (05/22/2011):'''<br />
<br />
reference -> paper<br />
<br />
location -> laboratory<br />
<br />
'''The cronjob was cancelled; the dump is now done manually after checking the err.out file for each dump (06/04/2012).'''<br />
<br />
<strike>'''Email from Juancarlos about the cronjob (06/06/2011):'''<br />
<br />
Set the cronjob to run every Thursday :<br />
0 2 * * thu /home/acedb/xiaodong/oa_antibody_dumper/dump_antibody_ace.pl<br />
It puts the file at :<br />
/home/postgres/public_html/cgi-bin/data/antibody.ace<br />
So you can see it at :<br />
http://tazendra.caltech.edu/~postgres/cgi-bin/data/antibody.ace<br />
If you need to run it manually, just paste into the shell :<br />
/home/acedb/xiaodong/oa_antibody_dumper/dump_antibody_ace.pl<br />
Then log onto spica, cd into the directory where you want it, remove<br />
the existing antibody.ace file, and do : <br />
wget "http://tazendra.caltech.edu/~postgres/cgi-bin/data/antibody.ace"</strike><br />
<br />
'''Dumper change for the Historical_gene tag (05/22/2013):'''<br />
<br />
-For the model change, refer to Chris's document:<br />
https://docs.google.com/a/wormbase.org/document/d/1nnuQY9OfV2VsBORj01ocoC985-5kOgxYCGOAZDeeKLQ/edit<br />
<br />
-The dumper change was worked out via Skype with J:<br />
[5/22/13 1:57:44 PM] j chan: use_package.pl -> /home/postgres/work/citace_upload/antibody/use_package.pl<br />
<br />
[5/22/13 1:59:04 PM] j chan: http://mangolassi.caltech.edu/~postgres/cgi-bin/oa/ontology_annotator.cgi<br />
<br />
[5/22/13 2:08:06 PM] j chan: gin_dead<br />
<br />
[5/22/13 2:08:19 PM] j chan: Dead -> dead<br />
<br />
[5/22/13 2:08:27 PM] j chan: merged_into WBGene -> merged<br />
<br />
[5/22/13 2:08:31 PM] j chan: split_into -> split<br />
<br />
[CG added 10-21-2013] cgrove: Suppressed -> suppressed<br />
<br />
[5/22/13 2:09:06 PM] j chan: looping through the genes where something happened to make sure they don't also point at something else<br />
<br />
[5/22/13 2:09:16 PM] j chan: abp_gene<br />
<br />
[5/22/13 2:09:47 PM] j chan: merged -> historical_gene + remark AND gene <gene> Inferred_automatically<br />
<br />
[5/22/13 2:09:56 PM] j chan: dead -> historical_gene + remark<br />
<br />
[5/22/13 2:10:05 PM] j chan: split -> historical_gene + remark AND error message<br />
<br />
[CG added 10-21-2013] cgrove: Suppressed -> historical_gene + remark<br />
<br />
[5/22/13 2:10:12 PM] j chan: normal ones -> just tag + value<br />
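The chat log maps gin_dead values to four statuses and mentions looping through changed genes so that they do not also point at something else. One possible reading, sketched below in Python, is to classify each gin_dead value and follow merged_into links until a gene that was not merged is reached. The exact layout of the gin_dead values and the helper names are assumptions; only the keywords come from the log.<br />

```python
import re
from typing import Dict, Optional, Tuple

# Assumed gin_dead layouts, based on the keywords in the chat log:
#   "Dead", "Suppressed", "merged_into WBGene...", "split_into WBGene... WBGene..."
MERGED_RE = re.compile(r"merged_into\s+(WBGene\d+)")


def classify(gin_dead: str) -> Tuple[str, Optional[str]]:
    """Map a gin_dead value to (status, replacement gene or None)."""
    text = gin_dead.strip()
    match = MERGED_RE.match(text)
    if match:
        return "merged", match.group(1)
    if text.startswith("split_into"):
        return "split", None
    if text.startswith("Suppressed"):
        return "suppressed", None
    if text.startswith("Dead"):
        return "dead", None
    return "live", None


def newest_gene(gene: str, gin_dead: Dict[str, str]) -> str:
    """Follow merged_into links until reaching a gene that was not merged."""
    seen = {gene}
    while gene in gin_dead:
        status, target = classify(gin_dead[gene])
        if status != "merged" or target is None:
            break
        if target in seen:
            raise ValueError(f"merge cycle involving {target}")
        seen.add(target)
        gene = target
    return gene


# Hypothetical IDs: 01 was merged into 02, which was later merged into 03.
history = {"WBGene00000001": "merged_into WBGene00000002",
           "WBGene00000002": "merged_into WBGene00000003"}
print(newest_gene("WBGene00000001", history))  # WBGene00000003
```

Tracking visited genes keeps a malformed merge chain from looping forever.<br />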
<br />
<br />
-tested with genes:<br><br />
A split gene: WBGene00012507<br><br />
A merged gene: WBGene00007524<br><br />
A dead gene: WBGene00007814<br><br />
A suppressed gene: WBGene00015490<br><br />
<br />
-migrated to tazendra on the same day<br />
<br />
'''Dumper change for disease reagents (09/21/2015):'''<br />
<br />
-For the antibody model change, see the proposed model above.<br />
<br />
-We want to add a new tab called 'Tab2' to the antibody OA, with two new fields:<br />
* 'Antibody for disease' Multiontology on DOIDs<br />
**the DO_term obo is to be used for this field. The source is https://diseaseontology.svn.sourceforge.net/svnroot/diseaseontology/trunk/HumanDO.obo<br />
* 'Disease paper' Multiontology (or single?) on WBPaper<br />
<br />
-New tables abp_humandoid and abp_diseasepaper were created by J (09/22/2015).<br />
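Values from the two new tables could be combined into dumped lines roughly as follows. This Python sketch assumes the DOIDs and papers for one antibody are already available as lists and that every DOID is paired with every paper; how the real dumper pairs multiple values, and what it writes when no paper is given, are not documented here.<br />

```python
from typing import Iterable, List


def disease_lines(doids: Iterable[str], papers: Iterable[str]) -> List[str]:
    """Build Antibody_for_disease lines from abp_humandoid / abp_diseasepaper values.

    Assumption: every DOID is paired with every paper; with no paper,
    the DOID is written without evidence.
    """
    paper_list = list(papers)
    lines = []
    for doid in doids:
        if not paper_list:
            lines.append(f'Antibody_for_disease "{doid}"')
        for paper in paper_list:
            lines.append(f'Antibody_for_disease "{doid}" Paper_evidence "{paper}"')
    return lines


for line in disease_lines(["DOID:0060372"], ["WBPaper00045516"]):
    print(line)
```

With one DOID and one paper, the usage prints the single example line given in these notes.<br />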
<br />
-When dumping, write these two fields on one line, e.g.: Antibody_for_disease "DOID:0060372" Paper_evidence "WBPaper00045516"</div>Xdwanghttps://wiki.wormbase.org/index.php?title=Antibody&diff=29068Antibody2016-03-22T12:59:01Z<p>Xdwang: /* Antibody curation */</p>
<hr />
<div>back to [[Caltech documentation]]<br />
==Antibody Model==<br />
<pre style="white-space: pre-wrap; <br />
white-space: -moz-pre-wrap;<br />
white-space: -pre-wrap;<br />
white-space: -o-pre-wrap; <br />
word-wrap: break-word"><br />
?Antibody Summary UNIQUE ?Text #Evidence<br />
Other_name Text<br />
Gene ?Gene XREF Antibody #Evidence<br />
Isolation Original_publication UNIQUE ?Paper<br />
No_original_reference Text //proposed by Wen<br />
Person ?Person<br />
Location ?Laboratory<br />
Clonality UNIQUE Polyclonal Text<br />
Monoclonal Text<br />
Antigen UNIQUE Peptide Text<br />
Protein Text<br />
Other_antigen Text<br />
Animal UNIQUE Rabbit<br />
Mouse<br />
Rat<br />
Guinea_pig<br />
Chicken<br />
Goat<br />
Other_animal Text<br />
Historical_gene ?Gene Text <br />
Possible_pseudonym ?Antibody XREF Possible_pseudonym_of<br />
Possible_pseudonym_of ?Antibody XREF Possible_pseudonym<br />
Expr_pattern ?Expr_pattern XREF Antibody_info<br />
Interactor ?Interaction<br />
Reference ?Paper XREF Antibody<br />
Remark ?Text #Evidence <br />
</pre><br />
<br />
Model change proposed to associate human diseases with antibodies (08/31/2015):<br />
<br />
<pre style="white-space: pre-wrap;<br />
white-space: -moz-pre-wrap;<br />
white-space: -pre-wrap;<br />
white-space: -o-pre-wrap; <br />
word-wrap: break-word"><br />
<br />
?Antibody<br />
Antibody_for_disease ?DO_term XREF Associated_antibody #Evidence<br />
<br />
?Construct<br />
Construct_for_disease ?DO_term XREF Associated_construct #Evidence<br />
<br />
?Transgene<br />
Transgene_for_disease ?DO_term XREF Associated_transgene #Evidence<br />
<br />
<br />
?DO_term<br />
Reagent_info Associated_antibody ?Antibody XREF Antibody_for_disease <br />
Associated_construct ?Construct XREF Construct_for_disease <br />
Associated_transgene ?Transgene XREF Transgene_for_disease<br />
</pre><br />
<br />
== Antibody curation SOPs ==<br />
<br />
=== Antibody curation===<br />
<strike>1. '''Antibody paper first pass:'''<br />
*Antibody papers are identified by a script written by Juancarlos. The first-pass results for antibody curation are here:<br />
http://textpresso-dev.caltech.edu/azurebrd/wen/anti_protein_wen<br />
*The results are an HTML page listing paper names and the antibodies associated with them.<br />
<br />
2. '''A few steps''' need to be done before starting curation:<br />
<br />
*Save the file on the desktop as 'anti_protein_20110113.txt' (include the date in the file name).<br />
*Open a terminal:<br />
**cd to curation, then my_acefiles, then antibody_curation<br />
**cp /Users/xiaodongwang/Desktop/anti_protein_20110113.txt . (copy the file into the antibody_curation directory)<br />
**[OBSOLETE STEP: scp anti_protein_20110113.txt anti_protein.txt (copy the contents into anti_protein.txt, which was input file 1 when running TextpressoAbFinder)]<br />
**Run Yuling's new script (written on 3/12/2012) to get the new antibody papers:<br />
***./find_new.pl anti_protein_20120320.txt WBAbPaperList.ace AbCurationLog.txt > aaa<br />
****The script removes from the Textpresso list any papers already listed in 'WBAbPaperList.ace' or 'AbCurationLog.txt' and writes the remaining new papers to the file 'aaa'.<br />
*Copy and paste the new papers into 'Ab_curation_spreadsheet.xlsx', my own curation log in the antibody curation folder.<br />
<br />
3. When curation is finished, copy the papers from the spreadsheet to the 'AbCurationLog.txt' file:<br />
<br />
Curators need to record the status of every paper in 'Ab_curation_spreadsheet.xlsx' in the curation log file '''AbCurationLog.txt''', so that the same paper does not appear again next time.<br />
<br />
'''AbCurationLog.txt''' is located at: /Users/xiaodongwang/curation/my_acefiles/antibody_curation<br />
<br />
<br />
4. '''A few important documents''':<br />
*'''AbCurationLog.txt''' -- the curation log of all antibody papers that have already been curated. It is located at: /Users/xiaodongwang/curation/my_acefiles/antibody_curation<br />
**Update this file every time by appending the newly curated papers (the paper list can be copied from the Excel file and pasted in).<br />
*'''WBAbPaperList.ace''' -- antibody papers curated before Textpresso was used<br />
<br />
<br />
<br />
OBSOLETE STEP (3/20/2012): 3. The curation log file listed above only documents papers that were curated after the Textpresso first pass was introduced. Antibodies curated before that are kept in WBAbPaperList.ace.<br />
<br />
OBSOLETE STEP (3/20/2012): 4. A script written by Wen, TextpressoAbFinder.pl, screens file 1, filters out the papers in files 2 and 3 (which were already curated), and outputs the list of new papers.<br />
<br />
OBSOLETE STEP (3/20/2012):[<br />
----<br />
Here is how to use TextpressoAbFinder.pl:<br />
<br />
(wen@athena:~/TextPresso/TextPressoAb$ ./TextpressoAbFinder.pl)<br />
<br />
/Users/xiaodongwang/curation/my_acefiles/antibody_curation/ ./TextpressoAbFinder.pl<br />
<br />
<br />
This script checks the Textpresso results against the antibody paper list dumped from citace and looks for antibody papers that were not curated.<br />
<br />
Input file 1: anti_protein.txt -- all antibody papers found by Textpresso<br />
<br />
Input file 2: WBAbPaperList.ace -- Antibody papers curated before Textpresso time<br />
<br />
Input file 3: CurationLog/AbCurationLog.txt -- Antibody curation log.<br />
<br />
Output file 1: NewAbPaper.txt -- New antibody papers <br />
**The name of output file 1 can be changed in the script (e.g. to NewAbPaper_20110113).<br />
**Two places in the script need to be changed each time.<br />
<br />
Output file 2: TPAbFalsePositive.txt -- All false positive antibody papers.<br />
<br />
1789 papers flagged by Textpresso, 1734 curated, 55 need to be checked.<br />
Among not curated papers, 30 has anti-XXX pattern, 25 has no anti-XXX pattern.<br />
1626 papers curated in citace, 1347 found by Textpresso, 279 not found by Textpresso. Recall is 0.828413284132841.<br />
518 papers identified by Textpresso are false positive. Precision is 0.710452766908888. <br />
<br />
5. The result of the script is NewAbPaper.txt. This is the list of antibody papers that need to be curated. <br />
I copy this file to my desktop and paste it into my own Excel file, desktop/curation forms/antibody curation/AB_curation_spreadsheet.xlsx, using a separate sheet for each round, named by date.<br />
<br />
6. Add the curated papers to AbCurationLog.txt under /Users/xiaodongwang/curation/my_acefiles/antibody_curation, so that these papers are subtracted from NewAbPaper.txt next time.<br />
]</strike><br />
<br />
===Antibody curation controlled vocabulary===<br />
<br />
Antibody controlled vocabulary:<br />
<br />
Remark "Commercial Antibody."<br />
Remark "Tissue Specific Antibody Marker."<br />
Summary "Rabbit polyclonal antibody against XXX recombinant protein."<br />
Summary "Rabbit polyclonal peptide antibody against XXX."<br />
Summary "Mouse monoclonal peptide antibody against XXX."<br />
<br />
=== Antibody curation guideline===<br />
<br />
WormBase requires the following information for each antibody:<br />
<br />
1. Antibody name: for consistency, use [WBPaperID]::anti-GENENAME, with the gene name in CAPITALS, e.g. '''[WBPaper00036348]::anti-EPG-2'''. If several antibodies are made against the same gene, add _1, _2, etc.<br />
<br />
2. The original reference in which the antibody was first reported. For antibodies published for the first time, list the original publication and mark the antibody as an "Original_publication" antibody (these are good, valid antibody objects in WormBase). Add the paper to Reference as well.<br />
<br />
3. Target gene (abc-1, xyz-1, ...), clonality (polyclonal or monoclonal) and animal (rabbit, mouse, ...).<br />
<br />
4. The antigen used to generate the antibody (peptide or protein sequence).<br />
<br />
5. If the antibody is from another paper, find the original antibody object and add the reference to it.<br />
<br />
6. If the antibody has no original reference, create a new antibody object and mark it as "No_original_reference". If you suspect that the antibody is the same as one previously published, fill in the "Possible_pseudonym" field.<br />
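The naming rule in item 1 can be expressed as a small helper. This is a hypothetical Python sketch; it assumes the _1, _2 suffix goes at the end of the name and that a WBPaper ID is "WBPaper" followed by eight digits.<br />

```python
import re
from typing import List

PAPER_ID = re.compile(r"WBPaper\d{8}")


def antibody_names(paper_id: str, gene: str, count: int = 1) -> List[str]:
    """Build antibody names of the form [WBPaperID]::anti-GENENAME.

    Assumption: when several antibodies target the same gene,
    the _1, _2, ... suffix is appended at the end of the name.
    """
    if not PAPER_ID.fullmatch(paper_id):
        raise ValueError(f"not a WBPaper ID: {paper_id}")
    if count < 1:
        raise ValueError("count must be at least 1")
    base = f"[{paper_id}]::anti-{gene.upper()}"  # gene name in CAPITALS
    if count == 1:
        return [base]
    return [f"{base}_{i}" for i in range(1, count + 1)]


print(antibody_names("WBPaper00036348", "epg-2"))  # ['[WBPaper00036348]::anti-EPG-2']
```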
<br />
=='''Antibody dumper'''==<br />
<br />
* The dumper is located on tazendra:<br />
** The module and use_package.pl are at:<br />
*** /home/postgres/work/citace_upload/antibody/get_antibody_ace.pm<br />
*** /home/postgres/work/citace_upload/antibody/use_package.pl<br />
** use_package.pl is also symlinked on tazendra at:<br />
*** /home/acedb/xiaodong/antibody/<br />
<br />
* The dumper checks the 'Gene' field for dead genes and the 'Original publication' and 'Reference' fields for invalid papers, and writes the results to the err.out file.<br />
* The cronjob that dumped antibodies for upload was cancelled.<br />
<br />
<br />
<strike>Located on tazendra at: /home/acedb/wen/phenote-antibody/dump_antibody_ace.pl</strike><br />
<br />
<strike>/home/acedb/xiaodong/oa_antibody_dumper/dump_antibody_ace.pl<br />
<br />
<br />
'''To dump the file:''' ./dump_antibody_ace.pl<br />
<br />
Output file name: antibody.ace<br />
<br />
'''I usually rename the file with the dump date:''' cp antibody.ace antibody.ace.date_of_dump<br />
<br />
'''Then copy the file to spica:''' scp antibody.ace.20110503 citace@spica.caltech.edu:/home/citace/Data_for_citace/Data_from_Xiaodong/.</strike><br />
<br />
=== Handling Dead Genes During Dump Process ===<br />
As of May 2013, the dumper script automatically checks every gene field for dead genes. Dead genes referenced in an Interaction object in the OA are handled as follows:<br />
<br />
1) If the gene has a replacement (i.e. it has been merged into another gene), the dead gene is dumped into a "Historical_gene" field in the .ace file and the replacement gene fills the original gene field. A comment is added to the Historical_gene field via a 'Text' tag (updated as of 3-18-2015). The original gene field (now holding the updated gene reference) is printed with an "Inferred_automatically" tag after the gene. For example, if WBGene00001234 is a dead gene that has been merged into WBGene00002345:<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
</pre><br />
<br />
becomes<br />
<br />
<pre><br />
Gene "WBGene00002345" Inferred_automatically<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to WBGene00001234.<br />
WBGene00001234 is now considered dead and has been merged into WBGene00002345. WBGene00002345 has <br />
replaced WBGene00001234 accordingly."<br />
</pre><br />
<br />
Also, because Antibodies, Transgenes, Expression patterns and Variations are mapped to an interactor where possible (otherwise they are dumped as "Unaffiliated"), this mapping now uses only the newest genes that the interactor refers to.<br />
<br />
2) If the gene has no replacement (it is dead or suppressed), the following is dumped:<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to a gene<br />
(WBGene00001234) that is now considered dead. Please interpret with discretion."<br />
</pre><br />
<br />
OR<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to a gene<br />
(WBGene00001234) that has been suppressed. Please interpret with discretion."<br />
</pre><br />
<br />
and lastly,<br />
<br />
3) If the gene has been split, it is dumped as:<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to a gene <br />
(WBGene00001234) that is now considered split. Please interpret with discretion."<br />
</pre><br />
<br />
Split genes are also printed to the error output file of the dumping script, so that a curator can go back and correct them manually using their best judgement.<br />
<br />
<br />
Gene Examples:<br><br />
A split gene: WBGene00012507<br><br />
A merged gene: WBGene00007524<br><br />
A dead gene: WBGene00007814<br><br />
A suppressed gene: WBGene00015490<br><br />
<br />
=='''Notes'''==<br />
<br />
'''Changes to the postgres tables (05/22/2011):'''<br />
<br />
reference -> paper<br />
<br />
location -> laboratory<br />
<br />
'''The cronjob was cancelled; the dump is now done manually after checking the err.out file for each dump (06/04/2012).'''<br />
<br />
<strike>'''Email from Juancarlos about the cronjob (06/06/2011):'''<br />
<br />
Set the cronjob to run every Thursday :<br />
0 2 * * thu /home/acedb/xiaodong/oa_antibody_dumper/dump_antibody_ace.pl<br />
It puts the file at :<br />
/home/postgres/public_html/cgi-bin/data/antibody.ace<br />
So you can see it at :<br />
http://tazendra.caltech.edu/~postgres/cgi-bin/data/antibody.ace<br />
If you need to run it manually, just paste into the shell :<br />
/home/acedb/xiaodong/oa_antibody_dumper/dump_antibody_ace.pl<br />
Then log onto spica, cd into the directory where you want it, remove<br />
the existing antibody.ace file, and do : <br />
wget "http://tazendra.caltech.edu/~postgres/cgi-bin/data/antibody.ace"</strike><br />
<br />
'''Dumper change for the Historical_gene tag (05/22/2013):'''<br />
<br />
-For the model change, refer to Chris's document:<br />
https://docs.google.com/a/wormbase.org/document/d/1nnuQY9OfV2VsBORj01ocoC985-5kOgxYCGOAZDeeKLQ/edit<br />
<br />
-The dumper change was worked out via Skype with J:<br />
[5/22/13 1:57:44 PM] j chan: use_package.pl -> /home/postgres/work/citace_upload/antibody/use_package.pl<br />
<br />
[5/22/13 1:59:04 PM] j chan: http://mangolassi.caltech.edu/~postgres/cgi-bin/oa/ontology_annotator.cgi<br />
<br />
[5/22/13 2:08:06 PM] j chan: gin_dead<br />
<br />
[5/22/13 2:08:19 PM] j chan: Dead -> dead<br />
<br />
[5/22/13 2:08:27 PM] j chan: merged_into WBGene -> merged<br />
<br />
[5/22/13 2:08:31 PM] j chan: split_into -> split<br />
<br />
[CG added 10-21-2013] cgrove: Suppressed -> suppressed<br />
<br />
[5/22/13 2:09:06 PM] j chan: looping through the genes where something happened to make sure they don't also point at something else<br />
<br />
[5/22/13 2:09:16 PM] j chan: abp_gene<br />
<br />
[5/22/13 2:09:47 PM] j chan: merged -> historical_gene + remark AND gene <gene> Inferred_automatically<br />
<br />
[5/22/13 2:09:56 PM] j chan: dead -> historical_gene + remark<br />
<br />
[5/22/13 2:10:05 PM] j chan: split -> historical_gene + remark AND error message<br />
<br />
[CG added 10-21-2013] cgrove: Suppressed -> historical_gene + remark<br />
<br />
[5/22/13 2:10:12 PM] j chan: normal ones -> just tag + value<br />
<br />
<br />
-tested with genes:<br><br />
A split gene: WBGene00012507<br><br />
A merged gene: WBGene00007524<br><br />
A dead gene: WBGene00007814<br><br />
A suppressed gene: WBGene00015490<br><br />
<br />
-migrated to tazendra on the same day<br />
<br />
'''Dumper change for disease reagents (09/21/2015):'''<br />
<br />
-For the antibody model change, see the proposed model above.<br />
<br />
-We want to add a new tab called 'Tab2' to the antibody OA, with two new fields:<br />
* 'Antibody for disease' Multiontology on DOIDs<br />
**the DO_term obo is to be used for this field. The source is https://diseaseontology.svn.sourceforge.net/svnroot/diseaseontology/trunk/HumanDO.obo<br />
* 'Disease paper' Multiontology (or single?) on WBPaper<br />
<br />
-New tables abp_humandoid and abp_diseasepaper were created by J (09/22/2015).<br />
<br />
-When dumping, write these two fields on one line, e.g.: Antibody_for_disease "DOID:0060372" Paper_evidence "WBPaper00045516"</div>Xdwanghttps://wiki.wormbase.org/index.php?title=Antibody&diff=29067Antibody2016-03-22T12:54:15Z<p>Xdwang: /* Antibody curation guideline */</p>
<hr />
<div>back to [[Caltech documentation]]<br />
==Antibody Model==<br />
<pre style="white-space: pre-wrap; <br />
white-space: -moz-pre-wrap;<br />
white-space: -pre-wrap;<br />
white-space: -o-pre-wrap; <br />
word-wrap: break-word"><br />
?Antibody Summary UNIQUE ?Text #Evidence<br />
Other_name Text<br />
Gene ?Gene XREF Antibody #Evidence<br />
Isolation Original_publication UNIQUE ?Paper<br />
No_original_reference Text //proposed by Wen<br />
Person ?Person<br />
Location ?Laboratory<br />
Clonality UNIQUE Polyclonal Text<br />
Monoclonal Text<br />
Antigen UNIQUE Peptide Text<br />
Protein Text<br />
Other_antigen Text<br />
Animal UNIQUE Rabbit<br />
Mouse<br />
Rat<br />
Guinea_pig<br />
Chicken<br />
Goat<br />
Other_animal Text<br />
Historical_gene ?Gene Text <br />
Possible_pseudonym ?Antibody XREF Possible_pseudonym_of<br />
Possible_pseudonym_of ?Antibody XREF Possible_pseudonym<br />
Expr_pattern ?Expr_pattern XREF Antibody_info<br />
Interactor ?Interaction<br />
Reference ?Paper XREF Antibody<br />
Remark ?Text #Evidence <br />
</pre><br />
<br />
Model change proposed to associate human diseases with antibodies (08/31/2015):<br />
<br />
<pre style="white-space: pre-wrap;<br />
white-space: -moz-pre-wrap;<br />
white-space: -pre-wrap;<br />
white-space: -o-pre-wrap; <br />
word-wrap: break-word"><br />
<br />
?Antibody<br />
Antibody_for_disease ?DO_term XREF Associated_antibody #Evidence<br />
<br />
?Construct<br />
Construct_for_disease ?DO_term XREF Associated_construct #Evidence<br />
<br />
?Transgene<br />
Transgene_for_disease ?DO_term XREF Associated_transgene #Evidence<br />
<br />
<br />
?DO_term<br />
Reagent_info Associated_antibody ?Antibody XREF Antibody_for_disease <br />
Associated_construct ?Construct XREF Construct_for_disease <br />
Associated_transgene ?Transgene XREF Transgene_for_disease<br />
</pre><br />
<br />
== Antibody curation SOPs ==<br />
<br />
=== Antibody curation===<br />
1. '''Antibody paper first pass:'''<br />
*Antibody papers are identified by a script written by Juancarlos. The first-pass results for antibody curation are here:<br />
http://textpresso-dev.caltech.edu/azurebrd/wen/anti_protein_wen<br />
*The results are an HTML page listing paper names and the antibodies associated with them.<br />
<br />
2. '''A few steps''' need to be done before starting curation:<br />
<br />
*Save the file on the desktop as 'anti_protein_20110113.txt' (include the date in the file name).<br />
*Open a terminal:<br />
**cd to curation, then my_acefiles, then antibody_curation<br />
**cp /Users/xiaodongwang/Desktop/anti_protein_20110113.txt . (copy the file into the antibody_curation directory)<br />
**[OBSOLETE STEP: scp anti_protein_20110113.txt anti_protein.txt (copy the contents into anti_protein.txt, which was input file 1 when running TextpressoAbFinder)]<br />
**Run Yuling's new script (written on 3/12/2012) to get the new antibody papers:<br />
***./find_new.pl anti_protein_20120320.txt WBAbPaperList.ace AbCurationLog.txt > aaa<br />
****The script removes from the Textpresso list any papers already listed in 'WBAbPaperList.ace' or 'AbCurationLog.txt' and writes the remaining new papers to the file 'aaa'.<br />
*Copy and paste the new papers into 'Ab_curation_spreadsheet.xlsx', my own curation log in the antibody curation folder.<br />
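The subtraction done by find_new.pl can be sketched as a set difference over paper IDs. This Python version is a hypothetical stand-in for Yuling's script, assuming every paper in all three files appears as a WBPaper ID ("WBPaper" plus eight digits); the real script may parse the files differently.<br />

```python
import re
import sys
from pathlib import Path
from typing import List, Set

PAPER_ID = re.compile(r"WBPaper\d{8}")


def paper_ids(text: str) -> Set[str]:
    """Collect every WBPaper ID mentioned in a file's text."""
    return set(PAPER_ID.findall(text))


def new_papers(candidates: str, *curated: str) -> List[str]:
    """Papers in the Textpresso list that are not in any curated list, sorted."""
    done: Set[str] = set()
    for text in curated:
        done |= paper_ids(text)
    return sorted(paper_ids(candidates) - done)


if __name__ == "__main__" and len(sys.argv) > 2:
    # argv: the Textpresso list, then the already-curated lists
    # (e.g. WBAbPaperList.ace and AbCurationLog.txt).
    texts = [Path(p).read_text() for p in sys.argv[1:]]
    for pid in new_papers(texts[0], *texts[1:]):
        print(pid)
```

It would be run like the Perl command above, e.g. python find_new_sketch.py anti_protein_20120320.txt WBAbPaperList.ace AbCurationLog.txt > aaa (the sketch's file name is hypothetical).<br />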
<br />
3. When curation is finished, copy the papers from the spreadsheet to the 'AbCurationLog.txt' file:<br />
<br />
Curators need to record the status of every paper in 'Ab_curation_spreadsheet.xlsx' in the curation log file '''AbCurationLog.txt''', so that the same paper does not appear again next time.<br />
<br />
'''AbCurationLog.txt''' is located at: /Users/xiaodongwang/curation/my_acefiles/antibody_curation<br />
<br />
<br />
4. '''A few important documents''':<br />
*'''AbCurationLog.txt''' -- the curation log of all antibody papers that have already been curated. It is located at: /Users/xiaodongwang/curation/my_acefiles/antibody_curation<br />
**Update this file every time by appending the newly curated papers (the paper list can be copied from the Excel file and pasted in).<br />
*'''WBAbPaperList.ace''' -- antibody papers curated before Textpresso was used<br />
<br />
<br />
<br />
OBSOLETE STEP (3/20/2012): 3. The curation log file listed above only documents papers that were curated after the Textpresso first pass was introduced. Antibodies curated before that are kept in WBAbPaperList.ace.<br />
<br />
OBSOLETE STEP (3/20/2012): 4. A script written by Wen, TextpressoAbFinder.pl, screens file 1, filters out the papers in files 2 and 3 (which were already curated), and outputs the list of new papers.<br />
<br />
OBSOLETE STEP (3/20/2012):[<br />
----<br />
Here is how to use TextpressoAbFinder.pl:<br />
<br />
(wen@athena:~/TextPresso/TextPressoAb$ ./TextpressoAbFinder.pl)<br />
<br />
/Users/xiaodongwang/curation/my_acefiles/antibody_curation/ ./TextpressoAbFinder.pl<br />
<br />
<br />
This script checks the Textpresso results against the antibody paper list dumped from citace and looks for antibody papers that were not curated.<br />
<br />
Input file 1: anti_protein.txt -- all antibody papers found by Textpresso<br />
<br />
Input file 2: WBAbPaperList.ace -- Antibody papers curated before Textpresso time<br />
<br />
Input file 3: CurationLog/AbCurationLog.txt -- Antibody curation log.<br />
<br />
Output file 1: NewAbPaper.txt -- New antibody papers <br />
**The name of output file 1 can be changed in the script (e.g. to NewAbPaper_20110113).<br />
**Two places in the script need to be changed each time.<br />
<br />
Output file 2: TPAbFalsePositive.txt -- All false positive antibody papers.<br />
<br />
1789 papers flagged by Textpresso, 1734 curated, 55 need to be checked.<br />
Among not curated papers, 30 has anti-XXX pattern, 25 has no anti-XXX pattern.<br />
1626 papers curated in citace, 1347 found by Textpresso, 279 not found by Textpresso. Recall is 0.828413284132841.<br />
518 papers identified by Textpresso are false positive. Precision is 0.710452766908888. <br />
<br />
5. The result of the script is NewAbPaper.txt. This is the list of antibody papers that need to be curated. <br />
I copy this file to my desktop and paste it into my own Excel file, desktop/curation forms/antibody curation/AB_curation_spreadsheet.xlsx, using a separate sheet for each round, named by date.<br />
<br />
6. Add the curated papers to AbCurationLog.txt under /Users/xiaodongwang/curation/my_acefiles/antibody_curation, so that these papers are subtracted from NewAbPaper.txt next time.<br />
]<br />
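The recall and precision values in the obsolete TextpressoAbFinder.pl output above are consistent with the following arithmetic. This is a reconstruction from the printed numbers only: recall uses the 1626 papers curated in citace as the reference set, while the printed precision matches the 1789 flagged papers minus the 518 false positives.<br />

```latex
\begin{aligned}
\text{Recall}    &= \frac{1347}{1626} \approx 0.8284 \\
\text{Precision} &= \frac{1789 - 518}{1789} = \frac{1271}{1789} \approx 0.7105
\end{aligned}
```

Because the precision denominator is the 1789 flagged papers, it does not equal 1347/(1347 + 518).<br />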
<br />
===Antibody curation controlled vocabulary===<br />
<br />
Antibody controlled vocabulary:<br />
<br />
Remark "Commercial Antibody."<br />
Remark "Tissue Specific Antibody Marker."<br />
Summary "Rabbit polyclonal antibody against XXX recombinant protein."<br />
Summary "Rabbit polyclonal peptide antibody against XXX."<br />
Summary "Mouse monoclonal peptide antibody against XXX."<br />
<br />
=== Antibody curation guideline===<br />
<br />
WormBase requires the following information for each antibody:<br />
<br />
1. Antibody name: for consistency, use [WBPaperID]::anti-GENENAME, with the gene name in CAPITALS, e.g. '''[WBPaper00036348]::anti-EPG-2'''. If several antibodies are made against the same gene, add _1, _2, etc.<br />
<br />
2. The original reference in which the antibody was first reported. For antibodies published for the first time, list the original publication and mark the antibody as an "Original_publication" antibody (these are good, valid antibody objects in WormBase). Add the paper to Reference as well.<br />
<br />
3. Target gene (abc-1, xyz-1, ...), clonality (polyclonal or monoclonal) and animal (rabbit, mouse, ...).<br />
<br />
4. The antigen used to generate the antibody (peptide or protein sequence).<br />
<br />
5. If the antibody is from another paper, find the original antibody object and add the reference to it.<br />
<br />
6. If the antibody has no original reference, create a new antibody object and mark it as "No_original_reference". If you suspect that the antibody is the same as one previously published, fill in the "Possible_pseudonym" field.<br />
<br />
=='''Antibody dumper'''==<br />
<br />
* The dumper is located on tazendra:<br />
** The module and use_package.pl are at:<br />
*** /home/postgres/work/citace_upload/antibody/get_antibody_ace.pm<br />
*** /home/postgres/work/citace_upload/antibody/use_package.pl<br />
** use_package.pl is also symlinked on tazendra at:<br />
*** /home/acedb/xiaodong/antibody/<br />
<br />
* The dumper checks the 'Gene' field for dead genes and the 'Original publication' and 'Reference' fields for invalid papers, and writes the results to the err.out file.<br />
* The cronjob that dumped antibodies for upload was cancelled.<br />
<br />
<br />
<strike>Located on tazendra at: /home/acedb/wen/phenote-antibody/dump_antibody_ace.pl</strike><br />
<br />
<strike>/home/acedb/xiaodong/oa_antibody_dumper/dump_antibody_ace.pl<br />
<br />
<br />
'''To dump the file:''' ./dump_antibody_ace.pl<br />
<br />
Output file name: antibody.ace<br />
<br />
'''I usually rename the file with the dump date:''' cp antibody.ace antibody.ace.date_of_dump<br />
<br />
'''Then copy the file to spica:''' scp antibody.ace.20110503 citace@spica.caltech.edu:/home/citace/Data_for_citace/Data_from_Xiaodong/.</strike><br />
<br />
=== Handling Dead Genes During Dump Process ===<br />
As of May 2013, the dumper script automatically checks every gene field for dead genes. Dead genes referenced in an Interaction object in the OA are handled as follows:<br />
<br />
1) If the gene has a replacement (i.e. it has been merged into another gene), the dead gene is dumped into a "Historical_gene" field in the .ace file and the replacement gene fills the original gene field. A comment is added to the Historical_gene field via a 'Text' tag (updated as of 3-18-2015). The original gene field (now holding the updated gene reference) is printed with an "Inferred_automatically" tag after the gene. For example, if WBGene00001234 is a dead gene that has been merged into WBGene00002345:<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
</pre><br />
<br />
becomes<br />
<br />
<pre><br />
Gene "WBGene00002345" Inferred_automatically<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to WBGene00001234.<br />
WBGene00001234 is now considered dead and has been merged into WBGene00002345. WBGene00002345 has <br />
replaced WBGene00001234 accordingly."<br />
</pre><br />
<br />
Also, because Antibodies, Transgenes, Expression patterns and Variations are mapped to an interactor where possible (otherwise they are dumped as "Unaffiliated"), this mapping now uses only the newest genes that the interactor refers to.<br />
<br />
2) If there is no replacement for the gene (Dead or Suppressed), the dumper writes one of the following:<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to a gene<br />
(WBGene00001234) that is now considered dead. Please interpret with discretion."<br />
</pre><br />
<br />
OR<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to a gene<br />
(WBGene00001234) that has been suppressed. Please interpret with discretion."<br />
</pre><br />
<br />
and lastly,<br />
<br />
3) If the gene has been split, it is dumped as:<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to a gene <br />
(WBGene00001234) that is now considered split. Please interpret with discretion."<br />
</pre><br />
<br />
and is also printed to the dumper's error output file, so that a curator can go back and correct it manually using their best judgement.<br />
<br />
<br />
Gene Examples:<br><br />
A split gene: WBGene00012507<br><br />
A merged gene: WBGene00007524<br><br />
A dead gene: WBGene00007814<br><br />
A suppressed gene: WBGene00015490<br><br />
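The Historical_gene rules above can be summarized as one small function. This is a hypothetical Python sketch, not the actual dumper (which is a Perl module); it assumes each gene's status has already been classified as merged, dead, suppressed, or split, with the replacement ID known for merged genes, and it writes each note on a single line. The function name dump_gene_lines and the gene_status mapping are illustrative only.<br />
```python
"""Hypothetical sketch of the Historical_gene dump rules (not the real Perl module)."""


def _quote(value):
    # ACE text values are written in double quotes.
    return f'"{value}"'


def dump_gene_lines(gene_id, gene_status):
    """Return (ace_lines, error_messages) for one Gene value.

    gene_status is an assumed mapping: gene ID -> (status, replacement_id),
    with status in {"merged", "dead", "suppressed", "split"}.
    Genes missing from the mapping are treated as live.
    """
    status, replacement = gene_status.get(gene_id, (None, None))
    if status is None:
        # Normal gene: just tag + value.
        return [f"Gene {_quote(gene_id)}"], []
    if status == "merged":
        note = (f"Note: This object originally referred to {gene_id}. "
                f"{gene_id} is now considered dead and has been merged into "
                f"{replacement}. {replacement} has replaced {gene_id} accordingly.")
        return [f"Gene {_quote(replacement)} Inferred_automatically",
                f"Historical_gene {_quote(gene_id)} {_quote(note)}"], []
    phrases = {"dead": "is now considered dead",
               "suppressed": "has been suppressed",
               "split": "is now considered split"}
    note = (f"Note: This object originally referred to a gene ({gene_id}) "
            f"that {phrases[status]}. Please interpret with discretion.")
    lines = [f"Gene {_quote(gene_id)}",
             f"Historical_gene {_quote(gene_id)} {_quote(note)}"]
    # Split genes are also reported so a curator can fix them by hand.
    errors = [f"{gene_id} has been split; please fix manually"] if status == "split" else []
    return lines, errors
```
Only the merged case replaces the Gene value; dead, suppressed, and split genes keep their original ID alongside the Historical_gene note.<br />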
<br />
=='''Notes'''==<br />
<br />
'''Renamed postgres tables --- 05/22/2011''' <br />
<br />
reference -> paper<br />
<br />
location -> laboratory<br />
<br />
'''Cronjob was canceled. The dump is now done manually, after checking the err.out file for each dump. - 06/04/2012'''<br />
<br />
<strike>'''Email from Juancarlos related to the cronjob --- 06/06/2011'''<br />
<br />
Set the cronjob to run every Thursday :<br />
0 2 * * thu /home/acedb/xiaodong/oa_antibody_dumper/dump_antibody_ace.pl<br />
It puts the file at :<br />
/home/postgres/public_html/cgi-bin/data/antibody.ace<br />
So you can see it at :<br />
http://tazendra.caltech.edu/~postgres/cgi-bin/data/antibody.ace<br />
If you need to run it manually, just paste into the shell :<br />
/home/acedb/xiaodong/oa_antibody_dumper/dump_antibody_ace.pl<br />
Then log onto spica, cd into the directory where you want it, remove<br />
the existing antibody.ace file, and do : <br />
wget "http://tazendra.caltech.edu/~postgres/cgi-bin/data/antibody.ace"</strike><br />
<br />
'''Dumper change for historical_gene tag:-05/22/2013'''<br />
<br />
-for the model change, refer to Chris's document:<br />
https://docs.google.com/a/wormbase.org/document/d/1nnuQY9OfV2VsBORj01ocoC985-5kOgxYCGOAZDeeKLQ/edit<br />
<br />
-dumper change via Skype with J:<br />
[5/22/13 1:57:44 PM] j chan: use_package.pl -> /home/postgres/work/citace_upload/antibody/use_package.pl<br />
<br />
[5/22/13 1:59:04 PM] j chan: http://mangolassi.caltech.edu/~postgres/cgi-bin/oa/ontology_annotator.cgi<br />
<br />
[5/22/13 2:08:06 PM] j chan: gin_dead<br />
<br />
[5/22/13 2:08:19 PM] j chan: Dead -> dead<br />
<br />
[5/22/13 2:08:27 PM] j chan: merged_into WBGene -> merged<br />
<br />
[5/22/13 2:08:31 PM] j chan: split_into -> split<br />
<br />
[CG added 10-21-2013] cgrove: Suppressed -> suppressed<br />
<br />
[5/22/13 2:09:06 PM] j chan: looping through the genes where something happened to make sure they don't also point at something else<br />
<br />
[5/22/13 2:09:16 PM] j chan: abp_gene<br />
<br />
[5/22/13 2:09:47 PM] j chan: merged -> historical_gene + remark AND gene <gene> Inferred_automatically<br />
<br />
[5/22/13 2:09:56 PM] j chan: dead -> historical_gene + remark<br />
<br />
[5/22/13 2:10:05 PM] j chan: split -> historical_gene + remark AND error message<br />
<br />
[CG added 10-21-2013] cgrove: Suppressed -> historical_gene + remark<br />
<br />
[5/22/13 2:10:12 PM] j chan: normal ones -> just tag + value<br />
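The gin_dead status mapping described in the chat (Dead -> dead, merged_into WBGene -> merged, split_into -> split, Suppressed -> suppressed) might look like the following hypothetical Python sketch. The exact text stored in gin_dead is an assumption here; merged and split are checked before dead, on the assumption that a gene flagged Dead may also point at a replacement gene. The function name classify_gin_dead is illustrative only.<br />
```python
import re

# Hypothetical classifier for gin_dead values; the value formats are assumed.
MERGED_RE = re.compile(r"merged_into\s+(WBGene\d{8})")


def classify_gin_dead(value):
    """Return (category, replacement_gene_or_None) for a gin_dead value."""
    merged = MERGED_RE.search(value)
    if merged:
        return "merged", merged.group(1)
    if "split_into" in value:
        return "split", None
    if "Suppressed" in value:
        return "suppressed", None
    if "Dead" in value:
        return "dead", None
    # Nothing happened to this gene: dump it as a normal tag + value.
    return "normal", None
```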
<br />
<br />
-tested with genes:<br><br />
A split gene: WBGene00012507<br><br />
A merged gene: WBGene00007524<br><br />
A dead gene: WBGene00007814<br><br />
A suppressed gene: WBGene00015490<br><br />
<br />
-migrated to tazendra on the same day<br />
<br />
'''Dumper change for disease reagent:-09/21/2015'''<br />
<br />
-for the antibody model change, refer to the model section above<br />
<br />
-add a new tab called 'Tab2' to the antibody OA, with two new fields:<br />
* 'Antibody for disease' Multiontology on DOIDs<br />
**the DO_term obo is to be used for this field. The source is https://diseaseontology.svn.sourceforge.net/svnroot/diseaseontology/trunk/HumanDO.obo<br />
* 'Disease paper' Multiontology (or single ?) on WBPaper<br />
<br />
-New tables abp_humandoid and abp_diseasepaper created by J on 09/22/2015<br />
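A hypothetical Python sketch of writing the two Tab2 fields as single ACE lines of the form Antibody_for_disease "DOID:0060372" Paper_evidence "WBPaper00045516". How multiple DOIDs and multiple disease papers should be combined is not stated; this sketch assumes every DOID gets every disease paper as evidence, and that a DOID with no paper is written without evidence. The function name disease_lines is illustrative only.<br />
```python
def disease_lines(doids, papers):
    """Return Antibody_for_disease lines, one per (DOID, paper) pair.

    Assumption: each DOID is paired with every disease paper; with no
    paper, the DOID is written without evidence.
    """
    if not papers:
        return [f'Antibody_for_disease "{doid}"' for doid in doids]
    return [f'Antibody_for_disease "{doid}" Paper_evidence "{paper}"'
            for doid in doids for paper in papers]


if __name__ == "__main__":
    for line in disease_lines(["DOID:0060372"], ["WBPaper00045516"]):
        print(line)
```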
<br />
-When dumping, write these two fields on one line, as: Antibody_for_disease "DOID:0060372" Paper_evidence "WBPaper00045516"</div>Xdwanghttps://wiki.wormbase.org/index.php?title=Antibody&diff=27810Antibody2015-09-22T20:03:03Z<p>Xdwang: /* Notes */</p>
<hr />
<div>back to [[Caltech documentation]]<br />
==Antibody Model==<br />
<pre style="white-space: pre-wrap; <br />
white-space: -moz-pre-wrap;<br />
white-space: -pre-wrap;<br />
white-space: -o-pre-wrap; <br />
word-wrap: break-word"><br />
?Antibody Summary UNIQUE ?Text #Evidence<br />
Other_name Text<br />
Gene ?Gene XREF Antibody #Evidence<br />
Isolation Original_publication UNIQUE ?Paper<br />
No_original_reference Text //proposed by Wen<br />
Person ?Person<br />
Location ?Laboratory<br />
Clonality UNIQUE Polyclonal Text<br />
Monoclonal Text<br />
Antigen UNIQUE Peptide Text<br />
Protein Text<br />
Other_antigen Text<br />
Animal UNIQUE Rabbit<br />
Mouse<br />
Rat<br />
Guinea_pig<br />
Chicken<br />
Goat<br />
Other_animal Text<br />
Historical_gene ?Gene Text <br />
Possible_pseudonym ?Antibody XREF Possible_pseudonym_of<br />
Possible_pseudonym_of ?Antibody XREF Possible_pseudonym<br />
Expr_pattern ?Expr_pattern XREF Antibody_info<br />
Interactor ?Interaction<br />
Reference ?Paper XREF Antibody<br />
Remark ?Text #Evidence <br />
</pre><br />
<br />
Model change proposed to associate human diseases with antibodies (08/31/2015):<br />
<br />
<pre style="white-space: pre-wrap;<br />
white-space: -moz-pre-wrap;<br />
white-space: -pre-wrap;<br />
white-space: -o-pre-wrap; <br />
word-wrap: break-word"><br />
<br />
?Antibody<br />
Antibody_for_disease ?DO_term XREF Associated_antibody #Evidence<br />
<br />
?Construct<br />
Construct_for_disease ?DO_term XREF Associated_construct #Evidence<br />
<br />
?Transgene<br />
Transgene_for_disease ?DO_term XREF Associated_transgene #Evidence<br />
<br />
<br />
?DO_term<br />
Reagent_info Associated_antibody ?Antibody XREF Antibody_for_disease <br />
Associated_construct ?Construct XREF Construct_for_disease <br />
Associated_transgene ?Transgene XREF Transgene_for_disease<br />
</pre><br />
<br />
== Antibody curation SOPs ==<br />
<br />
=== Antibody curation===<br />
1. '''Antibody paper first pass:'''<br />
*Antibody papers are identified by a script written by Juancarlos. Here are the first-pass results for antibody curation: <br />
http://textpresso-dev.caltech.edu/azurebrd/wen/anti_protein_wen<br />
*The results are on an HTML page that lists paper names and the antibodies associated with them. <br />
<br />
2. '''A few steps''' need to be done before starting curation:<br />
<br />
*save the file on desktop as 'anti_protein_20110113.txt' (add the date in file name)<br />
*open a terminal<br />
**cd to curation, then my_acefiles, then antibody_curation<br />
**cp /Users/xiaodongwang/Desktop/anti_protein_20110113.txt . (copy the file into antibody_curation directory)<br />
**[OBSOLETE STEP: scp anti_protein_20110113.txt anti_protein.txt (copy the contents into anti_protein.txt, input file 1 when running TextpressoAbFinder.pl)]<br />
**run Yuling's new script (written on 3/12/2012) to get the new antibody papers:<br />
***./find_new.pl anti_protein_20120320.txt WBAbPaperList.ace AbCurationLog.txt > aaa<br />
****the script subtracts the papers listed in 'WBAbPaperList.ace' and 'AbCurationLog.txt' from the Textpresso list and writes the new papers to file 'aaa'<br />
*copy and paste the new papers into 'Ab_curation_spreadsheet.xlsx' (my own curation log) in the antibody curation folder<br />
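The subtraction done by find_new.pl can be sketched as a set difference. This is a hypothetical Python stand-in, not Yuling's script; it assumes every input file mentions papers as WBPaper IDs (WBPaper followed by 8 digits) and keeps the Textpresso order. The file name find_new_sketch.py and the function names are illustrative only.<br />
```python
import re
import sys

# Assumption: every file mentions papers as WBPaper IDs (WBPaper + 8 digits).
PAPER_RE = re.compile(r"WBPaper\d{8}")


def paper_ids(text):
    """Return the WBPaper IDs in text, de-duplicated, in order of first appearance."""
    return list(dict.fromkeys(PAPER_RE.findall(text)))


def find_new(textpresso_text, *curated_texts):
    """Return papers flagged by Textpresso that are absent from every curated list."""
    curated = set()
    for text in curated_texts:
        curated.update(PAPER_RE.findall(text))
    return [pid for pid in paper_ids(textpresso_text) if pid not in curated]


def main(paths):
    texts = []
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as handle:
            texts.append(handle.read())
    for pid in find_new(texts[0], *texts[1:]):
        print(pid)


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python find_new_sketch.py anti_protein_20120320.txt WBAbPaperList.ace AbCurationLog.txt > aaa
    main(sys.argv[1:])
```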
<br />
3. When curation is finished, copy the papers from the spreadsheet to the 'AbCurationLog.txt' file: <br />
<br />
Curators need to record the status of every paper in 'Ab_curation_spreadsheet.xlsx' in the curation log file '''AbCurationLog.txt''', so that the same paper does not appear again next time.<br />
<br />
'''AbCurationLog.txt''' is located at: /Users/xiaodongwang/curation/my_acefiles/antibody_curation<br />
<br />
<br />
4. '''A few important documents''':<br />
*'''AbCurationLog.txt''' -- The curator maintains a log of all antibody papers that have already been curated. It is located at: /Users/xiaodongwang/curation/my_acefiles/antibody_curation<br />
**this file must be updated after each round of curation by appending the newly curated papers (the paper list can be copied from the Excel file and pasted into the file)<br />
*'''WBAbPaperList.ace''' -- Antibody papers curated before the Textpresso first pass was introduced<br />
<br />
<br />
<br />
OBSOLETE STEP (3/20/2012): 3. The curation log file listed above only documents papers that were curated after the Textpresso first pass was applied. Antibody papers curated before that are kept in this file: WBAbPaperList.ace<br />
<br />
OBSOLETE STEP (3/20/2012): 4. Wen wrote a script that screens file 1, filters out the papers in files 2 and 3 (which were already curated), and outputs the new paper list. The script is called: TextpressoAbFinder.pl<br />
<br />
OBSOLETE STEP (3/20/2012):[<br />
----<br />
Here is how to use TextpressoAbFinder.pl:<br />
<br />
(wen@athena:~/TextPresso/TextPressoAb$ ./TextpressoAbFinder.pl)<br />
<br />
/Users/xiaodongwang/curation/my_acefiles/antibody_curation/ ./TextpressoAbFinder.pl<br />
<br />
<br />
This script checks the Textpresso results against the antibody paper list dumped from citace and looks for antibody papers that have not been curated.<br />
<br />
Input file 1: anti_protein.txt -- all antibody papers found by Textpresso<br />
<br />
Input file 2: WBAbPaperList.ace -- Antibody papers curated before the Textpresso first pass was introduced<br />
<br />
Input file 3: CurationLog/AbCurationLog.txt -- Antibody curation log.<br />
<br />
Output file 1: NewAbPaper.txt -- New antibody papers <br />
**I can change the name of output file 1 to NewAbPaper_20110113 in the script.<br />
**the name must be changed in two places in the script each time<br />
<br />
Output file 2: TPAbFalsePositive.txt -- All false positive antibody papers.<br />
<br />
1789 papers flagged by Textpresso, 1734 curated, 55 need to be checked.<br />
Among the papers not yet curated, 30 have an anti-XXX pattern and 25 have no anti-XXX pattern.<br />
1626 papers curated in citace, 1347 found by Textpresso, 279 not found by Textpresso. Recall is 0.828413284132841.<br />
518 papers identified by Textpresso are false positives. Precision is 0.710452766908888. <br />
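As a quick check, the reported recall and precision can be reproduced from the counts above. Recall is 1347 / 1626; the precision figure matches (1789 - 518) / 1789, which suggests (an inference, not stated in the script output) that precision was computed over all 1789 flagged papers.<br />
```python
# Figures reported by TextpressoAbFinder.pl above.
flagged = 1789              # papers flagged by Textpresso
false_positives = 518       # flagged papers judged not to be antibody papers
curated_in_citace = 1626    # antibody papers already in citace
found_by_textpresso = 1347  # of those, papers Textpresso also flagged

recall = found_by_textpresso / curated_in_citace
precision = (flagged - false_positives) / flagged

print(f"missed: {curated_in_citace - found_by_textpresso}")
print(f"recall: {recall:.4f}, precision: {precision:.4f}")
```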
<br />
5. The result of the script is NewAbPaper.txt. This is the list of antibody papers that need to be curated. <br />
I copied this file to my desktop and created my own Excel file at desktop/curation forms/antibody curation/AB_curation_spreadsheet.xlsx, with separate sheets named in chronological order<br />
<br />
6. Add the curated papers to AbCurationLog.txt under /Users/xiaodongwang/curation/my_acefiles/antibody_curation, so that these papers can be subtracted from NewAbPaper.txt next time.<br />
]<br />
<br />
===Antibody curation controlled vocabulary===<br />
<br />
Antibody controlled vocabulary:<br />
<br />
Remark "Commercial Antibody."<br />
Remark "Tissue Specific Antibody Marker."<br />
Summary "Rabbit polyclonal antibody against XXX recombinant protein."<br />
Summary "Rabbit polyclonal peptide antibody against XXX."<br />
Summary "Mouse monoclonal peptide antibody against XXX."<br />
<br />
=== Antibody curation guideline===<br />
<br />
WormBase requires the following information for each antibody: <br />
<br />
1. Antibody name: for consistency, use [WBPaperID]:anti-GENENAME, with the gene name in CAPITALS (e.g. '''[WBPaper00036348]:anti-EPG-2'''). If several antibodies are made against the same gene, append _1, _2, etc.<br />
<br />
2. The original reference in which the antibody was first reported. For antibodies published for the first time, list that paper as the "Original_publication" (these are good, valid antibody objects in WormBase).<br />
<br />
3. Target gene (abc-1, xyz-1, ...), clonality (polyclonal or monoclonal), and host animal (rabbit, mouse, ...)<br />
<br />
4. Antigen used to generate the antibody (peptide or protein sequence) <br />
<br />
5. If the antibody is from another paper, find the original antibody object and add the reference to it. <br />
<br />
6. If the antibody has no original reference, create a new antibody object and mark it as "No_original_reference". If you suspect the antibody is the same as one that was previously published, link the two with the "Possible_pseudonym" field.<br />
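The naming rule in item 1 can be sketched as a small helper. This is a hypothetical Python example, not part of any WormBase tool; the function name antibody_name is illustrative, and whether the first of several antibodies also gets a _1 suffix is left to the caller.<br />
```python
import re

WBPAPER_RE = re.compile(r"WBPaper\d{8}")


def antibody_name(paper_id, gene_name, index=None):
    """Build an antibody name as [WBPaperID]:anti-GENENAME, optionally with _N."""
    if not WBPAPER_RE.fullmatch(paper_id):
        raise ValueError(f"not a WBPaper ID: {paper_id!r}")
    name = f"[{paper_id}]:anti-{gene_name.upper()}"
    if index is not None:
        # Several antibodies against the same gene: _1, _2, ...
        name += f"_{index}"
    return name


print(antibody_name("WBPaper00036348", "epg-2"))
```
With the guideline's own example, this prints [WBPaper00036348]:anti-EPG-2.<br />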
<br />
=='''Antibody dumper'''==<br />
<br />
* The dumper is located on tazendra:<br />
**the module and use_package.pl are on tazendra at:<br />
***/home/postgres/work/citace_upload/antibody/get_antibody_ace.pm<br />
***/home/postgres/work/citace_upload/antibody/use_package.pl<br />
**use_package.pl is symlinked on tazendra at:<br />
***/home/acedb/xiaodong/antibody/<br />
<br />
*the dumper checks the 'Gene' field for dead genes and the 'Original publication' and 'Reference' fields for invalid papers, and writes the results to the err.out file.<br />
*the cronjob that dumped antibodies for upload was canceled<br />
<br />
<br />
<strike>[is located on tazendra: /home/acedb/wen/phenote-antibody/dump_antibody_ace.pl]</strike><br />
<br />
<strike>/home/acedb/xiaodong/oa_antibody_dumper/dump_antibody_ace.pl<br />
<br />
<br />
'''dump out file:''' ./dump_antibody_ace.pl<br />
<br />
file name: antibody.ace<br />
<br />
'''I usually change the file name:''' cp antibody.ace antibody.ace.date_of_dump<br />
<br />
'''then copy file to spica:''' scp antibody.ace.20110503 citace@spica.caltech.edu:/home/citace/Data_for_citace/Data_from_Xiaodong/.</strike><br />
<br />
=== Handling Dead Genes During Dump Process ===<br />
As of May 2013, the dumper script automatically checks every gene field for dead genes. Any dead genes referenced in an Interaction object in the OA are handled as follows:<br />
<br />
1) If there is a replacement for the gene (i.e. the gene has been merged into another gene), the dead gene is dumped into a "Historical_gene" field in the .ace file and the replacement gene fills the original gene field. A comment is added to the Historical_gene field via a 'Text' tag (updated as of 3-18-2015). The original gene field (now holding the replacement gene) is printed with an "Inferred_automatically" tag after the gene. For example, if WBGene00001234 is a dead gene that has been merged into WBGene00002345:<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
</pre><br />
<br />
becomes<br />
<br />
<pre><br />
Gene "WBGene00002345" Inferred_automatically<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to WBGene00001234.<br />
WBGene00001234 is now considered dead and has been merged into WBGene00002345. WBGene00002345 has <br />
replaced WBGene00001234 accordingly."<br />
</pre><br />
<br />
Also, because antibodies, transgenes, expression patterns, and variations are mapped to an interactor where possible (otherwise they are dumped as "Unaffiliated"), this mapping now uses only the newest genes that the interactor refers to.<br />
<br />
2) If there is no replacement for the gene (Dead or Suppressed), the dumper writes one of the following:<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to a gene<br />
(WBGene00001234) that is now considered dead. Please interpret with discretion."<br />
</pre><br />
<br />
OR<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to a gene<br />
(WBGene00001234) that has been suppressed. Please interpret with discretion."<br />
</pre><br />
<br />
and lastly,<br />
<br />
3) If the gene has been split, it is dumped as:<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to a gene <br />
(WBGene00001234) that is now considered split. Please interpret with discretion."<br />
</pre><br />
<br />
and is also printed to the dumper's error output file, so that a curator can go back and correct it manually using their best judgement.<br />
<br />
<br />
Gene Examples:<br><br />
A split gene: WBGene00012507<br><br />
A merged gene: WBGene00007524<br><br />
A dead gene: WBGene00007814<br><br />
A suppressed gene: WBGene00015490<br><br />
<br />
=='''Notes'''==<br />
<br />
'''Renamed postgres tables --- 05/22/2011''' <br />
<br />
reference -> paper<br />
<br />
location -> laboratory<br />
<br />
'''Cronjob was canceled. The dump is now done manually, after checking the err.out file for each dump. - 06/04/2012'''<br />
<br />
<strike>'''Email from Juancarlos related to the cronjob --- 06/06/2011'''<br />
<br />
Set the cronjob to run every Thursday :<br />
0 2 * * thu /home/acedb/xiaodong/oa_antibody_dumper/dump_antibody_ace.pl<br />
It puts the file at :<br />
/home/postgres/public_html/cgi-bin/data/antibody.ace<br />
So you can see it at :<br />
http://tazendra.caltech.edu/~postgres/cgi-bin/data/antibody.ace<br />
If you need to run it manually, just paste into the shell :<br />
/home/acedb/xiaodong/oa_antibody_dumper/dump_antibody_ace.pl<br />
Then log onto spica, cd into the directory where you want it, remove<br />
the existing antibody.ace file, and do : <br />
wget "http://tazendra.caltech.edu/~postgres/cgi-bin/data/antibody.ace"</strike><br />
<br />
'''Dumper change for historical_gene tag:-05/22/2013'''<br />
<br />
-for the model change, refer to Chris's document:<br />
https://docs.google.com/a/wormbase.org/document/d/1nnuQY9OfV2VsBORj01ocoC985-5kOgxYCGOAZDeeKLQ/edit<br />
<br />
-dumper change via Skype with J:<br />
[5/22/13 1:57:44 PM] j chan: use_package.pl -> /home/postgres/work/citace_upload/antibody/use_package.pl<br />
<br />
[5/22/13 1:59:04 PM] j chan: http://mangolassi.caltech.edu/~postgres/cgi-bin/oa/ontology_annotator.cgi<br />
<br />
[5/22/13 2:08:06 PM] j chan: gin_dead<br />
<br />
[5/22/13 2:08:19 PM] j chan: Dead -> dead<br />
<br />
[5/22/13 2:08:27 PM] j chan: merged_into WBGene -> merged<br />
<br />
[5/22/13 2:08:31 PM] j chan: split_into -> split<br />
<br />
[CG added 10-21-2013] cgrove: Suppressed -> suppressed<br />
<br />
[5/22/13 2:09:06 PM] j chan: looping through the genes where something happened to make sure they don't also point at something else<br />
<br />
[5/22/13 2:09:16 PM] j chan: abp_gene<br />
<br />
[5/22/13 2:09:47 PM] j chan: merged -> historical_gene + remark AND gene <gene> Inferred_automatically<br />
<br />
[5/22/13 2:09:56 PM] j chan: dead -> historical_gene + remark<br />
<br />
[5/22/13 2:10:05 PM] j chan: split -> historical_gene + remark AND error message<br />
<br />
[CG added 10-21-2013] cgrove: Suppressed -> historical_gene + remark<br />
<br />
[5/22/13 2:10:12 PM] j chan: normal ones -> just tag + value<br />
<br />
<br />
-tested with genes:<br><br />
A split gene: WBGene00012507<br><br />
A merged gene: WBGene00007524<br><br />
A dead gene: WBGene00007814<br><br />
A suppressed gene: WBGene00015490<br><br />
<br />
-migrated to tazendra on the same day<br />
<br />
'''Dumper change for disease reagent:-09/21/2015'''<br />
<br />
-for the antibody model change, refer to the model section above<br />
<br />
-add a new tab called 'Tab2' to the antibody OA, with two new fields:<br />
* 'Antibody for disease' Multiontology on DOIDs<br />
**the DO_term obo is to be used for this field. The source is https://diseaseontology.svn.sourceforge.net/svnroot/diseaseontology/trunk/HumanDO.obo<br />
* 'Disease paper' Multiontology (or single ?) on WBPaper<br />
<br />
-New tables abp_humandoid and abp_diseasepaper created by J on 09/22/2015<br />
<br />
-When dumping, write these two fields on one line, as: Antibody_for_disease "DOID:0060372" Paper_evidence "WBPaper00045516"</div>Xdwanghttps://wiki.wormbase.org/index.php?title=Antibody&diff=27809Antibody2015-09-22T19:01:51Z<p>Xdwang: /* Notes */</p>
<hr />
<div>back to [[Caltech documentation]]<br />
==Antibody Model==<br />
<pre style="white-space: pre-wrap; <br />
white-space: -moz-pre-wrap;<br />
white-space: -pre-wrap;<br />
white-space: -o-pre-wrap; <br />
word-wrap: break-word"><br />
?Antibody Summary UNIQUE ?Text #Evidence<br />
Other_name Text<br />
Gene ?Gene XREF Antibody #Evidence<br />
Isolation Original_publication UNIQUE ?Paper<br />
No_original_reference Text //proposed by Wen<br />
Person ?Person<br />
Location ?Laboratory<br />
Clonality UNIQUE Polyclonal Text<br />
Monoclonal Text<br />
Antigen UNIQUE Peptide Text<br />
Protein Text<br />
Other_antigen Text<br />
Animal UNIQUE Rabbit<br />
Mouse<br />
Rat<br />
Guinea_pig<br />
Chicken<br />
Goat<br />
Other_animal Text<br />
Historical_gene ?Gene Text <br />
Possible_pseudonym ?Antibody XREF Possible_pseudonym_of<br />
Possible_pseudonym_of ?Antibody XREF Possible_pseudonym<br />
Expr_pattern ?Expr_pattern XREF Antibody_info<br />
Interactor ?Interaction<br />
Reference ?Paper XREF Antibody<br />
Remark ?Text #Evidence <br />
</pre><br />
<br />
Model change proposed to associate human diseases with antibodies (08/31/2015):<br />
<br />
<pre style="white-space: pre-wrap;<br />
white-space: -moz-pre-wrap;<br />
white-space: -pre-wrap;<br />
white-space: -o-pre-wrap; <br />
word-wrap: break-word"><br />
<br />
?Antibody<br />
Antibody_for_disease ?DO_term XREF Associated_antibody #Evidence<br />
<br />
?Construct<br />
Construct_for_disease ?DO_term XREF Associated_construct #Evidence<br />
<br />
?Transgene<br />
Transgene_for_disease ?DO_term XREF Associated_transgene #Evidence<br />
<br />
<br />
?DO_term<br />
Reagent_info Associated_antibody ?Antibody XREF Antibody_for_disease <br />
Associated_construct ?Construct XREF Construct_for_disease <br />
Associated_transgene ?Transgene XREF Transgene_for_disease<br />
</pre><br />
<br />
== Antibody curation SOPs ==<br />
<br />
=== Antibody curation===<br />
1. '''Antibody paper first pass:'''<br />
*Antibody papers are identified by a script written by Juancarlos. Here are the first-pass results for antibody curation: <br />
http://textpresso-dev.caltech.edu/azurebrd/wen/anti_protein_wen<br />
*The results are on an HTML page that lists paper names and the antibodies associated with them. <br />
<br />
2. '''A few steps''' need to be done before starting curation:<br />
<br />
*save the file on desktop as 'anti_protein_20110113.txt' (add the date in file name)<br />
*open a terminal<br />
**cd to curation, then my_acefiles, then antibody_curation<br />
**cp /Users/xiaodongwang/Desktop/anti_protein_20110113.txt . (copy the file into antibody_curation directory)<br />
**[OBSOLETE STEP: scp anti_protein_20110113.txt anti_protein.txt (copy the contents into anti_protein.txt, input file 1 when running TextpressoAbFinder.pl)]<br />
**run Yuling's new script (written on 3/12/2012) to get the new antibody papers:<br />
***./find_new.pl anti_protein_20120320.txt WBAbPaperList.ace AbCurationLog.txt > aaa<br />
****the script subtracts the papers listed in 'WBAbPaperList.ace' and 'AbCurationLog.txt' from the Textpresso list and writes the new papers to file 'aaa'<br />
*copy and paste the new papers into 'Ab_curation_spreadsheet.xlsx' (my own curation log) in the antibody curation folder<br />
<br />
3. When curation is finished, copy the papers from the spreadsheet to the 'AbCurationLog.txt' file: <br />
<br />
Curators need to record the status of every paper in 'Ab_curation_spreadsheet.xlsx' in the curation log file '''AbCurationLog.txt''', so that the same paper does not appear again next time.<br />
<br />
'''AbCurationLog.txt''' is located at: /Users/xiaodongwang/curation/my_acefiles/antibody_curation<br />
<br />
<br />
4. '''A few important documents''':<br />
*'''AbCurationLog.txt''' -- The curator maintains a log of all antibody papers that have already been curated. It is located at: /Users/xiaodongwang/curation/my_acefiles/antibody_curation<br />
**this file must be updated after each round of curation by appending the newly curated papers (the paper list can be copied from the Excel file and pasted into the file)<br />
*'''WBAbPaperList.ace''' -- Antibody papers curated before the Textpresso first pass was introduced<br />
<br />
<br />
<br />
OBSOLETE STEP (3/20/2012): 3. The curation log file listed above only documents papers that were curated after the Textpresso first pass was applied. Antibody papers curated before that are kept in this file: WBAbPaperList.ace<br />
<br />
OBSOLETE STEP (3/20/2012): 4. Wen wrote a script that screens file 1, filters out the papers in files 2 and 3 (which were already curated), and outputs the new paper list. The script is called: TextpressoAbFinder.pl<br />
<br />
OBSOLETE STEP (3/20/2012):[<br />
----<br />
Here is how to use TextpressoAbFinder.pl:<br />
<br />
(wen@athena:~/TextPresso/TextPressoAb$ ./TextpressoAbFinder.pl)<br />
<br />
/Users/xiaodongwang/curation/my_acefiles/antibody_curation/ ./TextpressoAbFinder.pl<br />
<br />
<br />
This script checks the Textpresso results against the antibody paper list dumped from citace and looks for antibody papers that have not been curated.<br />
<br />
Input file 1: anti_protein.txt -- all antibody papers found by Textpresso<br />
<br />
Input file 2: WBAbPaperList.ace -- Antibody papers curated before the Textpresso first pass was introduced<br />
<br />
Input file 3: CurationLog/AbCurationLog.txt -- Antibody curation log.<br />
<br />
Output file 1: NewAbPaper.txt -- New antibody papers <br />
**I can change the name of output file 1 to NewAbPaper_20110113 in the script.<br />
**the name must be changed in two places in the script each time<br />
<br />
Output file 2: TPAbFalsePositive.txt -- All false positive antibody papers.<br />
<br />
1789 papers flagged by Textpresso, 1734 curated, 55 need to be checked.<br />
Among the papers not yet curated, 30 have an anti-XXX pattern and 25 have no anti-XXX pattern.<br />
1626 papers curated in citace, 1347 found by Textpresso, 279 not found by Textpresso. Recall is 0.828413284132841.<br />
518 papers identified by Textpresso are false positives. Precision is 0.710452766908888. <br />
<br />
5. The result of the script is NewAbPaper.txt. This is the list of antibody papers that need to be curated. <br />
I copied this file to my desktop and created my own Excel file at desktop/curation forms/antibody curation/AB_curation_spreadsheet.xlsx, with separate sheets named in chronological order<br />
<br />
6. Add the curated papers to AbCurationLog.txt under /Users/xiaodongwang/curation/my_acefiles/antibody_curation, so that these papers can be subtracted from NewAbPaper.txt next time.<br />
]<br />
<br />
===Antibody curation controlled vocabulary===<br />
<br />
Antibody controlled vocabulary:<br />
<br />
Remark "Commercial Antibody."<br />
Remark "Tissue Specific Antibody Marker."<br />
Summary "Rabbit polyclonal antibody against XXX recombinant protein."<br />
Summary "Rabbit polyclonal peptide antibody against XXX."<br />
Summary "Mouse monoclonal peptide antibody against XXX."<br />
<br />
=== Antibody curation guideline===<br />
<br />
WormBase requires the following information for each antibody: <br />
<br />
1. Antibody name: for consistency, use [WBPaperID]:anti-GENENAME, with the gene name in CAPITALS (e.g. '''[WBPaper00036348]:anti-EPG-2'''). If several antibodies are made against the same gene, append _1, _2, etc.<br />
<br />
2. The original reference in which the antibody was first reported. For antibodies published for the first time, list that paper as the "Original_publication" (these are good, valid antibody objects in WormBase).<br />
<br />
3. Target gene (abc-1, xyz-1, ...), clonality (polyclonal or monoclonal), and host animal (rabbit, mouse, ...)<br />
<br />
4. Antigen used to generate the antibody (peptide or protein sequence) <br />
<br />
5. If the antibody is from another paper, find the original antibody object and add the reference to it. <br />
<br />
6. If the antibody has no original reference, create a new antibody object and mark it as "No_original_reference". If you suspect the antibody is the same as one that was previously published, link the two with the "Possible_pseudonym" field.<br />
<br />
=='''Antibody dumper'''==<br />
<br />
* The dumper is located on tazendra:<br />
**the module and use_package.pl are on tazendra at:<br />
***/home/postgres/work/citace_upload/antibody/get_antibody_ace.pm<br />
***/home/postgres/work/citace_upload/antibody/use_package.pl<br />
**use_package.pl is symlinked on tazendra at:<br />
***/home/acedb/xiaodong/antibody/<br />
<br />
*the dumper checks the 'Gene' field for dead genes and the 'Original publication' and 'Reference' fields for invalid papers, and writes the results to the err.out file.<br />
*the cronjob that dumped antibodies for upload was canceled<br />
<br />
<br />
<strike>[is located on tazendra: /home/acedb/wen/phenote-antibody/dump_antibody_ace.pl]</strike><br />
<br />
<strike>/home/acedb/xiaodong/oa_antibody_dumper/dump_antibody_ace.pl<br />
<br />
<br />
'''dump out file:''' ./dump_antibody_ace.pl<br />
<br />
file name: antibody.ace<br />
<br />
'''I usually change the file name:''' cp antibody.ace antibody.ace.date_of_dump<br />
<br />
'''then copy file to spica:''' scp antibody.ace.20110503 citace@spica.caltech.edu:/home/citace/Data_for_citace/Data_from_Xiaodong/.</strike><br />
<br />
=== Handling Dead Genes During Dump Process ===<br />
As of May 2013, the dumper script automatically checks every gene field for dead genes. Any dead genes referenced in an Interaction object in the OA are handled as follows:<br />
<br />
1) If there is a replacement for the gene (i.e. the gene has been merged into another gene), the dead gene is dumped into a "Historical_gene" field in the .ace file and the replacement gene fills the original gene field. A comment is added to the Historical_gene field via a 'Text' tag (updated as of 3-18-2015). The original gene field (now holding the replacement gene) is printed with an "Inferred_automatically" tag after the gene. For example, if WBGene00001234 is a dead gene that has been merged into WBGene00002345:<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
</pre><br />
<br />
becomes<br />
<br />
<pre><br />
Gene "WBGene00002345" Inferred_automatically<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to WBGene00001234.<br />
WBGene00001234 is now considered dead and has been merged into WBGene00002345. WBGene00002345 has <br />
replaced WBGene00001234 accordingly."<br />
</pre><br />
<br />
Also, because antibodies, transgenes, expression patterns, and variations are mapped to an interactor where possible (otherwise they are dumped as "Unaffiliated"), this mapping now uses only the newest genes that the interactor refers to.<br />
<br />
2) If there is no replacement for the gene (Dead or Suppressed), the dumper writes one of the following:<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to a gene<br />
(WBGene00001234) that is now considered dead. Please interpret with discretion."<br />
</pre><br />
<br />
OR<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to a gene<br />
(WBGene00001234) that has been suppressed. Please interpret with discretion."<br />
</pre><br />
<br />
and lastly,<br />
<br />
3) If the gene has been split, it is dumped as:<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to a gene <br />
(WBGene00001234) that is now considered split. Please interpret with discretion."<br />
</pre><br />
<br />
and also printed in the error output file of the dumping script, so that a curator can go back and correct the entry manually using their best judgement.<br />
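The three cases above can be summarised as one small function. This is a minimal Python sketch, not the actual dumper code (which lives in get_antibody_ace.pm); the function name, the status labels and the wording of the error message are assumptions, and each note is emitted on a single line.<br />

```python
# Hypothetical sketch of the dead-gene rules above; the function name,
# status labels and error-message wording are illustrative assumptions,
# not the actual get_antibody_ace.pm code.


def dump_gene_lines(gene, status, replacement=None):
    """Return (ace_lines, error_message) for one Gene value.

    status is one of "live", "merged", "dead", "suppressed" or "split";
    replacement is only used when status == "merged".
    """
    if status == "live":
        return [f'Gene "{gene}"'], None

    if status == "merged":
        if replacement is None:
            raise ValueError("a merged gene needs a replacement gene")
        note = (f"Note: This object originally referred to {gene}. "
                f"{gene} is now considered dead and has been merged into "
                f"{replacement}. {replacement} has replaced {gene} accordingly.")
        return [f'Gene "{replacement}" Inferred_automatically',
                f'Historical_gene "{gene}" "{note}"'], None

    phrases = {
        "dead": "is now considered dead",
        "suppressed": "has been suppressed",
        "split": "is now considered split",
    }
    if status not in phrases:
        raise ValueError(f"unknown gene status: {status!r}")
    note = (f"Note: This object originally referred to a gene ({gene}) "
            f"that {phrases[status]}. Please interpret with discretion.")
    lines = [f'Gene "{gene}"', f'Historical_gene "{gene}" "{note}"']

    # Split genes are also reported for manual review by a curator.
    error = None
    if status == "split":
        error = f"{gene} has been split; please review manually"
    return lines, error


if __name__ == "__main__":
    for line in dump_gene_lines("WBGene00001234", "merged", "WBGene00002345")[0]:
        print(line)
```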
<br />
<br />
Gene Examples:<br><br />
A split gene: WBGene00012507<br><br />
A merged gene: WBGene00007524<br><br />
A dead gene: WBGene00007814<br><br />
A suppressed gene: WBGene00015490<br><br />
<br />
=='''Notes'''==<br />
<br />
'''Changed the postgres tables for ---05/22/2011''' <br />
<br />
reference -> paper<br />
<br />
location -> laboratory<br />
<br />
'''Cronjob was cancelled. The dump is now done manually, after checking the err.out file for each dump. - 06/04/2012'''<br />
<br />
<strike>'''email from Juancarlos related to cronjob --- 06/06/2011'''<br />
<br />
Set the cronjob to run every Thursday :<br />
0 2 * * thu /home/acedb/xiaodong/oa_antibody_dumper/dump_antibody_ace.pl<br />
It puts the file at :<br />
/home/postgres/public_html/cgi-bin/data/antibody.ace<br />
So you can see it at :<br />
http://tazendra.caltech.edu/~postgres/cgi-bin/data/antibody.ace<br />
If you need to run it manually, just paste into the shell :<br />
/home/acedb/xiaodong/oa_antibody_dumper/dump_antibody_ace.pl<br />
Then log onto spica, cd into the directory where you want it, remove<br />
the existing antibody.ace file, and do : <br />
wget "http://tazendra.caltech.edu/~postgres/cgi-bin/data/antibody.ace"</strike><br />
<br />
'''Dumper change for historical_gene tag:-05/22/2013'''<br />
<br />
-model change: refer to Chris's document:<br />
https://docs.google.com/a/wormbase.org/document/d/1nnuQY9OfV2VsBORj01ocoC985-5kOgxYCGOAZDeeKLQ/edit<br />
<br />
-dumper change via Skype with J:<br />
[5/22/13 1:57:44 PM] j chan: use_package.pl -> /home/postgres/work/citace_upload/antibody/use_package.pl<br />
<br />
[5/22/13 1:59:04 PM] j chan: http://mangolassi.caltech.edu/~postgres/cgi-bin/oa/ontology_annotator.cgi<br />
<br />
[5/22/13 2:08:06 PM] j chan: gin_dead<br />
<br />
[5/22/13 2:08:19 PM] j chan: Dead -> dead<br />
<br />
[5/22/13 2:08:27 PM] j chan: merged_into WBGene -> merged<br />
<br />
[5/22/13 2:08:31 PM] j chan: split_into -> split<br />
<br />
[CG added 10-21-2013] cgrove: Suppressed -> suppressed<br />
<br />
[5/22/13 2:09:06 PM] j chan: looping through the genes where something happened to make sure they don't also point at something else<br />
<br />
[5/22/13 2:09:16 PM] j chan: abp_gene<br />
<br />
[5/22/13 2:09:47 PM] j chan: merged -> historical_gene + remark AND gene <gene> Inferred_automatically<br />
<br />
[5/22/13 2:09:56 PM] j chan: dead -> historical_gene + remark<br />
<br />
[5/22/13 2:10:05 PM] j chan: split -> historical_gene + remark AND error message<br />
<br />
[CG added 10-21-2013] cgrove: Suppressed -> historical_gene + remark<br />
<br />
[5/22/13 2:10:12 PM] j chan: normal ones -> just tag + value<br />
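The status mapping in this chat can be sketched in Python as below. The stored gin_dead strings are taken loosely from the chat and may not match the table exactly; resolve_merged() is a hypothetical reading of "make sure they don't also point at something else", following merge chains to the newest gene.<br />

```python
import re

# Hypothetical sketch of the gin_dead classification described in the chat.
# The raw strings ("Dead", "Suppressed", "merged_into WBGene...",
# "split_into WBGene...") follow the chat notes; the exact stored values
# and the merge-chain resolution are assumptions.

GENE_ID = re.compile(r"WBGene\d{8}")

# What the dumper writes for each category, as summarised in the chat.
ACTIONS = {
    "normal": "tag + value",
    "merged": "Historical_gene + remark AND Gene <replacement> Inferred_automatically",
    "dead": "Historical_gene + remark",
    "suppressed": "Historical_gene + remark",
    "split": "Historical_gene + remark AND error message",
}


def classify_gin_dead(value):
    """Map a raw gin_dead value (or None for a live gene) to (category, targets)."""
    if value is None:
        return "normal", []
    targets = GENE_ID.findall(value)
    if value.startswith("merged_into"):
        return "merged", targets
    if value.startswith("split_into"):
        return "split", targets
    if value.startswith("Suppressed"):
        return "suppressed", []
    if value.startswith("Dead"):
        return "dead", []
    raise ValueError(f"unrecognised gin_dead value: {value!r}")


def resolve_merged(gene, merged_into):
    """Follow merged_into links until reaching a gene that was not merged."""
    seen = {gene}
    while gene in merged_into:
        gene = merged_into[gene]
        if gene in seen:
            raise ValueError(f"merge cycle involving {gene}")
        seen.add(gene)
    return gene
```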
<br />
<br />
-tested with genes:<br><br />
A split gene: WBGene00012507<br><br />
A merged gene: WBGene00007524<br><br />
A dead gene: WBGene00007814<br><br />
A suppressed gene: WBGene00015490<br><br />
<br />
-migrated to tazendra on the same day<br />
<br />
'''Dumper change for disease reagent:-09/21/2015'''<br />
<br />
-antibody model change: see the proposed model change above<br />
<br />
-we want to add a new tab called 'Tab2' to the antibody OA, with two new fields:<br />
* 'Antibody for disease' Multiontology on DOIDs<br />
**the DO_term obo is to be used for this field. The source is https://diseaseontology.svn.sourceforge.net/svnroot/diseaseontology/trunk/HumanDO.obo<br />
* 'Disease paper' Multiontology (or single ?) on WBPaper<br />
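Per these notes, the two new fields are dumped together on a single ACE line (one DO term with its Paper_evidence). A minimal Python sketch of that formatting, assuming both fields hold lists of IDs; pairing every DOID with every paper is an assumption, since the notes only show one of each:<br />

```python
import re
from itertools import product

# Hypothetical sketch: format the 'Antibody for disease' and 'Disease paper'
# fields as single ACE lines. Pairing every DOID with every paper is an
# assumption; the notes only give a one-DOID, one-paper example.

DOID = re.compile(r"DOID:\d+")
PAPER = re.compile(r"WBPaper\d{8}")


def dump_disease_lines(doids, papers):
    """Return one Antibody_for_disease line per (DOID, paper) pair."""
    lines = []
    for doid, paper in product(doids, papers):
        if not DOID.fullmatch(doid):
            raise ValueError(f"not a DOID: {doid!r}")
        if not PAPER.fullmatch(paper):
            raise ValueError(f"not a WBPaper ID: {paper!r}")
        lines.append(f'Antibody_for_disease "{doid}" Paper_evidence "{paper}"')
    return lines
```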
<br />
-when dumping, dump these two fields on one line, e.g.: Antibody_for_disease "DOID:0060372" Paper_evidence "WBPaper00045516"</div>Xdwanghttps://wiki.wormbase.org/index.php?title=Antibody&diff=27808Antibody2015-09-22T18:58:15Z<p>Xdwang: /* Notes */</p>
<hr />
<div>back to [[Caltech documentation]]<br />
==Antibody Model==<br />
<pre style="white-space: pre-wrap; <br />
white-space: -moz-pre-wrap;<br />
white-space: -pre-wrap;<br />
white-space: -o-pre-wrap; <br />
word-wrap: break-word"><br />
?Antibody Summary UNIQUE ?Text #Evidence<br />
Other_name Text<br />
Gene ?Gene XREF Antibody #Evidence<br />
Isolation Original_publication UNIQUE ?Paper<br />
No_original_reference Text //proposed by Wen<br />
Person ?Person<br />
Location ?Laboratory<br />
Clonality UNIQUE Polyclonal Text<br />
Monoclonal Text<br />
Antigen UNIQUE Peptide Text<br />
Protein Text<br />
Other_antigen Text<br />
Animal UNIQUE Rabbit<br />
Mouse<br />
Rat<br />
Guinea_pig<br />
Chicken<br />
Goat<br />
Other_animal Text<br />
Historical_gene ?Gene Text <br />
Possible_pseudonym ?Antibody XREF Possible_pseudonym_of<br />
Possible_pseudonym_of ?Antibody XREF Possible_pseudonym<br />
Expr_pattern ?Expr_pattern XREF Antibody_info<br />
Interactor ?Interaction<br />
Reference ?Paper XREF Antibody<br />
Remark ?Text #Evidence <br />
</pre><br />
<br />
Model change proposed to associate human diseases with antibodies (08/31/2015):<br />
<br />
<pre style="white-space: pre-wrap;<br />
white-space: -moz-pre-wrap;<br />
white-space: -pre-wrap;<br />
white-space: -o-pre-wrap; <br />
word-wrap: break-word"><br />
<br />
?Antibody<br />
Antibody_for_disease ?DO_term XREF Associated_antibody #Evidence<br />
<br />
?Construct<br />
Construct_for_disease ?DO_term XREF Associated_construct #Evidence<br />
<br />
?Transgene<br />
Transgene_for_disease ?DO_term XREF Associated_transgene #Evidence<br />
<br />
<br />
?DO_term<br />
Reagent_info Associated_antibody ?Antibody XREF Antibody_for_disease <br />
Associated_construct ?Construct XREF Construct_for_disease <br />
Associated_transgene ?Transgene XREF Transgene_for_disease<br />
</pre><br />
<br />
== Antibody curation SOPs ==<br />
<br />
=== Antibody curation===<br />
1. '''Antibody paper first pass:'''<br />
*Antibody papers are identified via a script written by Juancarlos. Here are the first-pass results for antibody curation: <br />
http://textpresso-dev.caltech.edu/azurebrd/wen/anti_protein_wen<br />
*The results are on an HTML page listing paper names and the antibodies associated with them. <br />
<br />
2. '''A few steps''' need to be done before starting curation:<br />
<br />
*save the file on the desktop as 'anti_protein_20110113.txt' (include the date in the file name)<br />
*open a terminal<br />
**cd to curation, then my_acefiles, then antibody_curation<br />
**cp /Users/xiaodongwang/Desktop/anti_protein_20110113.txt . (copy the file into antibody_curation directory)<br />
**[OBSOLETE STEP: scp anti_protein_20110113.txt anti_protein.txt (copy the contents into anti_protein.txt, used as input file 1 when running TextpressoAbFinder)]<br />
**run Yuling's new script (written on 3/12/2012) to get the new antibody papers:<br />
***./find_new.pl anti_protein_20120320.txt WBAbPaperList.ace AbCurationLog.txt > aaa<br />
****the script subtracts the papers listed in 'WBAbPaperList.ace' and 'AbCurationLog.txt' and writes the new papers to the file 'aaa'<br />
*copy and paste the new papers into 'Ab_curation_spreadsheet.xlsx' (my own curation log) in the antibody curation folder<br />
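The subtraction performed by find_new.pl can be illustrated with Python sets. This is a hypothetical sketch, not Yuling's script: it assumes every input file identifies papers by WBPaper IDs (WBPaper followed by eight digits), which may not hold for the Textpresso output.<br />

```python
import re
import sys

# Hypothetical re-implementation of what find_new.pl is described as doing;
# it is not Yuling's script. It assumes every input file identifies papers
# by WBPaper IDs (WBPaper followed by 8 digits).

PAPER_ID = re.compile(r"WBPaper\d{8}")


def paper_ids(text):
    """Collect the set of WBPaper IDs mentioned anywhere in text."""
    return set(PAPER_ID.findall(text))


def find_new(candidates_text, *curated_texts):
    """Return sorted IDs in candidates_text that appear in none of curated_texts."""
    curated = set()
    for text in curated_texts:
        curated |= paper_ids(text)
    return sorted(paper_ids(candidates_text) - curated)


if __name__ == "__main__" and len(sys.argv) > 1:
    # First argument: Textpresso results; remaining arguments: curated lists.
    texts = []
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            texts.append(handle.read())
    for paper in find_new(*texts):
        print(paper)
```

It would be run like the Perl script, e.g. python find_new_sketch.py anti_protein_20120320.txt WBAbPaperList.ace AbCurationLog.txt > aaa (the file name find_new_sketch.py is hypothetical).<br />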
<br />
3. When curation is finished, copy the papers from the spreadsheet to the 'AbCurationLog.txt' file: <br />
<br />
Curators need to document the status of every paper from 'Ab_curation_spreadsheet.xlsx' to the curation log file '''AbCurationLog.txt''', so that the same paper will not appear again next time.<br />
<br />
'''AbCurationLog.txt''' is located at: /Users/xiaodongwang/curation/my_acefiles/antibody_curation<br />
<br />
<br />
4. '''A few important documents''':<br />
*'''AbCurationLog.txt''' -- curators maintain this log of all antibody papers that have already been curated. It is located at: /Users/xiaodongwang/curation/my_acefiles/antibody_curation<br />
**this file needs to be updated every time by appending the newly curated papers (the paper list can be copied from the Excel file and pasted into the file)<br />
*'''WBAbPaperList.ace''' -- Antibody papers curated before Textpresso time<br />
<br />
<br />
<br />
OBSOLETE STEP (3/20/2012): 3. The curation log file listed above only documents papers that were curated after the Textpresso first pass was applied. Antibody papers curated before that are kept in this file: WBAbPaperList.ace<br />
<br />
OBSOLETE STEP (3/20/2012): 4. There is a script written by Wen that screens file 1, filters out the papers in files 2 and 3 (which were already curated), and outputs the new paper list. The script is called TextpressoAbFinder.pl<br />
<br />
OBSOLETE STEP (3/20/2012):[<br />
----<br />
Here is how to use the TextpressoAbFinder.pl<br />
<br />
(wen@athena:~/TextPresso/TextPressoAb$ ./TextpressoAbFinder.pl)<br />
<br />
/Users/xiaodongwang/curation/my_acefiles/antibody_curation/ ./TextpressoAbFinder.pl<br />
<br />
<br />
This script checks the Textpresso results, compares them with the antibody paper list dumped from citace, and looks for antibody papers that were not curated.<br />
<br />
Input file 1: anti_protein.txt -- all antibody papers found by Textpresso<br />
<br />
Input file 2: WBAbPaperList.ace -- Antibody papers curated before Textpresso time<br />
<br />
Input file 3: CurationLog/AbCurationLog.txt -- Antibody curation log.<br />
<br />
Output file 1: NewAbPaper.txt -- New antibody papers <br />
**the name of output file 1 can be changed in the script, e.g. to NewAbPaper_20110113<br />
**it needs to be changed in two places each time<br />
<br />
Output file 2: TPAbFalsePositive.txt -- All false positive antibody papers.<br />
<br />
1789 papers flagged by Textpresso, 1734 curated, 55 need to be checked.<br />
Among the uncurated papers, 30 have the anti-XXX pattern and 25 do not.<br />
1626 papers curated in citace, 1347 found by Textpresso, 279 not found by Textpresso. Recall is 0.828413284132841.<br />
518 papers identified by Textpresso are false positive. Precision is 0.710452766908888. <br />
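The two reported figures can be reproduced from the counts above. This assumes recall is the share of citace-curated papers that Textpresso found (1347 + 279 = 1626) and that precision uses all 1789 flagged papers as the denominator; the latter is inferred from the numbers rather than stated in the notes:<br />

```latex
\text{Recall} = \frac{1347}{1626} \approx 0.8284, \qquad
\text{Precision} = \frac{1789 - 518}{1789} = \frac{1271}{1789} \approx 0.7105
```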
<br />
5. The output of the script is NewAbPaper.txt, the list of antibody papers that need to be curated. <br />
I copied this file to my desktop and created my own Excel file at desktop/curation forms/antibody curation/AB_curation_spreadsheet.xlsx, naming the separate sheets chronologically<br />
<br />
6. Add curated papers to AbCurationLog.txt under /Users/xiaodongwang/curation/my_acefiles/antibody_curation, so that these papers can be subtracted from NewAbPaper.txt next time.<br />
]<br />
<br />
===Antibody curation controlled vocabulary===<br />
<br />
Antibody controlled vocabulary<br />
<br />
Remark "Commercial Antibody."<br />
Remark "Tissue Specific Antibody Marker."<br />
Summary "Rabbit polyclonal antibody against XXX recombinant protein."<br />
Summary "Rabbit polyclonal peptide antibody against XXX."<br />
Summary "Mouse monoclonal peptide antibody against XXX."<br />
<br />
=== Antibody curation guideline===<br />
<br />
WormBase requires the following information for Antibody: <br />
<br />
1. Antibody name: for consistency, use [WBPaperID]:anti-GENENAME, appending _1, _2, etc. if several antibodies are made against the same gene. The gene name is in CAPITALS, e.g. '''[WBPaper00036348]:anti-EPG-2'''<br />
<br />
2. Original reference where the antibody was first reported. For antibodies that are published for the first time, list the original publication and mark the antibody as "Original_publication" antibody (these are good and valid antibody objects in WormBase.)<br />
<br />
3. Target gene (abc-1, xyz-1, ...), clonality (polyclonal or monoclonal) and host animal (rabbit, mouse, ...)<br />
<br />
4. Antigen used to generate the antibody (peptide or protein sequence) <br />
<br />
5. If the antibody is from another paper, find the original antibody object and add the reference to it. <br />
<br />
6. If the antibody has no original reference, create a new antibody object and mark it as "No_original_reference". If you suspect the antibody is the same as one that was previously published, fill in the "Possible_pseudonym" field.<br />
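The naming convention in item 1 can be captured in a small Python helper. This is a hypothetical illustration, not part of any WormBase tool; numbering every antibody (_1, _2, ...) only when more than one is made against the same gene in a paper is an assumption about how the suffixes are applied.<br />

```python
# Hypothetical helper for the naming convention in item 1; it is not part
# of any WormBase tool. Numbering every antibody (_1, _2, ...) when more
# than one is made against the same gene in a paper is an assumption.


def antibody_names(paper_id, gene, count=1):
    """Return antibody object names for `count` antibodies against `gene`."""
    if count < 1:
        raise ValueError("count must be at least 1")
    # Gene names are written in capitals, e.g. epg-2 -> EPG-2.
    base = f"[{paper_id}]:anti-{gene.upper()}"
    if count == 1:
        return [base]
    return [f"{base}_{i}" for i in range(1, count + 1)]
```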
<br />
=='''Antibody dumper'''==<br />
<br />
* The dumper is located on tazendra:<br />
**the module and use_package.pl are on tazendra at:<br />
***/home/postgres/work/citace_upload/antibody/get_antibody_ace.pm<br />
***/home/postgres/work/citace_upload/antibody/use_package.pl<br />
**use_package.pl is symlinked on tazendra at:<br />
***/home/acedb/xiaodong/antibody/<br />
<br />
*The dumper checks for dead genes in the 'Gene' field and for invalid papers in the 'Original publication' and 'Reference' fields, and writes the results to the err.out file.<br />
*The cronjob for dumping antibodies for upload was cancelled.<br />
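The checks described above can be sketched in Python as follows. This is an illustration only, not the code in get_antibody_ace.pm: the inputs (a set of dead gene IDs and a set of valid paper IDs), the function names and the err.out message format are all assumptions.<br />

```python
import re

# Hypothetical sketch of the dumper checks; the real logic lives in
# get_antibody_ace.pm. The inputs (a set of dead gene IDs and a set of valid
# paper IDs) and the message format are assumptions.

PAPER_ID = re.compile(r"WBPaper\d{8}")


def check_antibody(antibody, genes, papers_by_field, dead_genes, valid_papers):
    """Return err.out-style messages for one antibody object."""
    errors = []
    for gene in genes:
        if gene in dead_genes:
            errors.append(f"{antibody}\tGene\tdead gene {gene}")
    # papers_by_field maps e.g. "Original_publication" / "Reference" to IDs.
    for field, papers in papers_by_field.items():
        for paper in papers:
            if not PAPER_ID.fullmatch(paper) or paper not in valid_papers:
                errors.append(f"{antibody}\t{field}\tinvalid paper {paper}")
    return errors


def write_err_out(path, messages):
    """Write one message per line, e.g. to err.out."""
    with open(path, "w", encoding="utf-8") as handle:
        for message in messages:
            handle.write(message + "\n")
```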
<br />
<br />
<strike>[is located in tazendra:( /home/acedb/wen/phenote-antibody/dump_antibody_ace.pl)</strike><br />
<br />
<strike>/home/acedb/xiaodong/oa_antibody_dumper/dump_antibody_ace.pl<br />
<br />
<br />
'''dump out file:''' ./dump_antibody_ace.pl<br />
<br />
file name: antibody.ace<br />
<br />
'''I usually change the file name:''' cp antibody.ace antibody.ace.date_of_dump<br />
<br />
'''then copy file to spica:''' scp antibody.ace.20110503 citace@spica.caltech.edu:/home/citace/Data_for_citace/Data_from_Xiaodong/.</strike><br />
<br />
=== Handling Dead Genes During Dump Process ===<br />
The dumper script will now (as of May, 2013) run an automatic check for dead genes in any gene field. Any genes that are considered dead that are referenced in an Interaction object in the OA will be handled in the following manner:<br />
<br />
1) If there is a replacement for the gene (i.e. the gene has been merged into another gene), the dead gene will be dumped into a "Historical_gene" field in the .ace file and the replacement gene will fill the original Gene field. A comment will be added to the Historical_gene field via a 'Text' tag (updated as of 3-18-2015). The original Gene field (now holding the replacement gene) will be printed with an "Inferred_automatically" tag after the gene. For example, if WBGene00001234 is now a dead gene that has been merged into WBGene00002345:<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
</pre><br />
<br />
becomes<br />
<br />
<pre><br />
Gene "WBGene00002345" Inferred_automatically<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to WBGene00001234.<br />
WBGene00001234 is now considered dead and has been merged into WBGene00002345. WBGene00002345 has <br />
replaced WBGene00001234 accordingly."<br />
</pre><br />
<br />
Also, since Antibodies, Transgenes, Expression patterns and Variations are mapped to an interactor where possible (otherwise they are dumped as "Unaffiliated"), this mapping will now be made only to the newest genes that the interactor refers to.<br />
<br />
2) If there is no replacement for the gene (Dead or Suppressed), the following is dumped:<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to a gene<br />
(WBGene00001234) that is now considered dead. Please interpret with discretion."<br />
</pre><br />
<br />
OR<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to a gene<br />
(WBGene00001234) that has been suppressed. Please interpret with discretion."<br />
</pre><br />
<br />
and lastly,<br />
<br />
3) If the gene has undergone a split, such genes will be dumped as:<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to a gene <br />
(WBGene00001234) that is now considered split. Please interpret with discretion."<br />
</pre><br />
<br />
and also printed in the error output file of the dumping script, so that a curator can go back and correct the entry manually using their best judgement.<br />
<br />
<br />
Gene Examples:<br><br />
A split gene: WBGene00012507<br><br />
A merged gene: WBGene00007524<br><br />
A dead gene: WBGene00007814<br><br />
A suppressed gene: WBGene00015490<br><br />
<br />
=='''Notes'''==<br />
<br />
'''Changed the postgres tables for ---05/22/2011''' <br />
<br />
reference -> paper<br />
<br />
location -> laboratory<br />
<br />
'''Cronjob was cancelled. The dump is now done manually, after checking the err.out file for each dump. - 06/04/2012'''<br />
<br />
<strike>'''email from Juancarlos related to cronjob --- 06/06/2011'''<br />
<br />
Set the cronjob to run every Thursday :<br />
0 2 * * thu /home/acedb/xiaodong/oa_antibody_dumper/dump_antibody_ace.pl<br />
It puts the file at :<br />
/home/postgres/public_html/cgi-bin/data/antibody.ace<br />
So you can see it at :<br />
http://tazendra.caltech.edu/~postgres/cgi-bin/data/antibody.ace<br />
If you need to run it manually, just paste into the shell :<br />
/home/acedb/xiaodong/oa_antibody_dumper/dump_antibody_ace.pl<br />
Then log onto spica, cd into the directory where you want it, remove<br />
the existing antibody.ace file, and do : <br />
wget "http://tazendra.caltech.edu/~postgres/cgi-bin/data/antibody.ace"</strike><br />
<br />
'''Dumper change for historical_gene tag:-05/22/2013'''<br />
<br />
-model change: refer to Chris's document:<br />
https://docs.google.com/a/wormbase.org/document/d/1nnuQY9OfV2VsBORj01ocoC985-5kOgxYCGOAZDeeKLQ/edit<br />
<br />
-dumper change via Skype with J:<br />
[5/22/13 1:57:44 PM] j chan: use_package.pl -> /home/postgres/work/citace_upload/antibody/use_package.pl<br />
<br />
[5/22/13 1:59:04 PM] j chan: http://mangolassi.caltech.edu/~postgres/cgi-bin/oa/ontology_annotator.cgi<br />
<br />
[5/22/13 2:08:06 PM] j chan: gin_dead<br />
<br />
[5/22/13 2:08:19 PM] j chan: Dead -> dead<br />
<br />
[5/22/13 2:08:27 PM] j chan: merged_into WBGene -> merged<br />
<br />
[5/22/13 2:08:31 PM] j chan: split_into -> split<br />
<br />
[CG added 10-21-2013] cgrove: Suppressed -> suppressed<br />
<br />
[5/22/13 2:09:06 PM] j chan: looping through the genes where something happened to make sure they don't also point at something else<br />
<br />
[5/22/13 2:09:16 PM] j chan: abp_gene<br />
<br />
[5/22/13 2:09:47 PM] j chan: merged -> historical_gene + remark AND gene <gene> Inferred_automatically<br />
<br />
[5/22/13 2:09:56 PM] j chan: dead -> historical_gene + remark<br />
<br />
[5/22/13 2:10:05 PM] j chan: split -> historical_gene + remark AND error message<br />
<br />
[CG added 10-21-2013] cgrove: Suppressed -> historical_gene + remark<br />
<br />
[5/22/13 2:10:12 PM] j chan: normal ones -> just tag + value<br />
<br />
<br />
-tested with genes:<br><br />
A split gene: WBGene00012507<br><br />
A merged gene: WBGene00007524<br><br />
A dead gene: WBGene00007814<br><br />
A suppressed gene: WBGene00015490<br><br />
<br />
-migrated to tazendra on the same day<br />
<br />
'''Dumper change for disease reagent:-09/21/2015'''<br />
<br />
-antibody model change: see the proposed model change above<br />
<br />
-we want to add a new tab called 'Tab2' to the antibody OA, with two new fields:<br />
* 'Antibody for disease' Multiontology on DOIDs<br />
* 'Disease paper' Multiontology (or single ?) on WBPaper<br />
<br />
-when dumping, dump these two fields on one line, e.g.: Antibody_for_disease "DOID:0060372" Paper_evidence "WBPaper00045516"</div>Xdwanghttps://wiki.wormbase.org/index.php?title=Antibody&diff=27807Antibody2015-09-21T20:08:34Z<p>Xdwang: /* Notes */</p>
<hr />
<div>back to [[Caltech documentation]]<br />
==Antibody Model==<br />
<pre style="white-space: pre-wrap; <br />
white-space: -moz-pre-wrap;<br />
white-space: -pre-wrap;<br />
white-space: -o-pre-wrap; <br />
word-wrap: break-word"><br />
?Antibody Summary UNIQUE ?Text #Evidence<br />
Other_name Text<br />
Gene ?Gene XREF Antibody #Evidence<br />
Isolation Original_publication UNIQUE ?Paper<br />
No_original_reference Text //proposed by Wen<br />
Person ?Person<br />
Location ?Laboratory<br />
Clonality UNIQUE Polyclonal Text<br />
Monoclonal Text<br />
Antigen UNIQUE Peptide Text<br />
Protein Text<br />
Other_antigen Text<br />
Animal UNIQUE Rabbit<br />
Mouse<br />
Rat<br />
Guinea_pig<br />
Chicken<br />
Goat<br />
Other_animal Text<br />
Historical_gene ?Gene Text <br />
Possible_pseudonym ?Antibody XREF Possible_pseudonym_of<br />
Possible_pseudonym_of ?Antibody XREF Possible_pseudonym<br />
Expr_pattern ?Expr_pattern XREF Antibody_info<br />
Interactor ?Interaction<br />
Reference ?Paper XREF Antibody<br />
Remark ?Text #Evidence <br />
</pre><br />
<br />
Model change proposed to associate human diseases with antibodies (08/31/2015):<br />
<br />
<pre style="white-space: pre-wrap;<br />
white-space: -moz-pre-wrap;<br />
white-space: -pre-wrap;<br />
white-space: -o-pre-wrap; <br />
word-wrap: break-word"><br />
<br />
?Antibody<br />
Antibody_for_disease ?DO_term XREF Associated_antibody #Evidence<br />
<br />
?Construct<br />
Construct_for_disease ?DO_term XREF Associated_construct #Evidence<br />
<br />
?Transgene<br />
Transgene_for_disease ?DO_term XREF Associated_transgene #Evidence<br />
<br />
<br />
?DO_term<br />
Reagent_info Associated_antibody ?Antibody XREF Antibody_for_disease <br />
Associated_construct ?Construct XREF Construct_for_disease <br />
Associated_transgene ?Transgene XREF Transgene_for_disease<br />
</pre><br />
<br />
== Antibody curation SOPs ==<br />
<br />
=== Antibody curation===<br />
1. '''Antibody paper first pass:'''<br />
*Antibody papers are identified via a script written by Juancarlos. Here are the first-pass results for antibody curation: <br />
http://textpresso-dev.caltech.edu/azurebrd/wen/anti_protein_wen<br />
*The results are on an HTML page listing paper names and the antibodies associated with them. <br />
<br />
2. '''A few steps''' need to be done before starting curation:<br />
<br />
*save the file on the desktop as 'anti_protein_20110113.txt' (include the date in the file name)<br />
*open a terminal<br />
**cd to curation, then my_acefiles, then antibody_curation<br />
**cp /Users/xiaodongwang/Desktop/anti_protein_20110113.txt . (copy the file into antibody_curation directory)<br />
**[OBSOLETE STEP: scp anti_protein_20110113.txt anti_protein.txt (copy the contents into anti_protein.txt, used as input file 1 when running TextpressoAbFinder)]<br />
**run Yuling's new script (written on 3/12/2012) to get the new antibody papers:<br />
***./find_new.pl anti_protein_20120320.txt WBAbPaperList.ace AbCurationLog.txt > aaa<br />
****the script subtracts the papers listed in 'WBAbPaperList.ace' and 'AbCurationLog.txt' and writes the new papers to the file 'aaa'<br />
*copy and paste the new papers into 'Ab_curation_spreadsheet.xlsx' (my own curation log) in the antibody curation folder<br />
<br />
3. When curation is finished, copy the papers from the spreadsheet to the 'AbCurationLog.txt' file: <br />
<br />
Curators need to document the status of every paper from 'Ab_curation_spreadsheet.xlsx' to the curation log file '''AbCurationLog.txt''', so that the same paper will not appear again next time.<br />
<br />
'''AbCurationLog.txt''' is located at: /Users/xiaodongwang/curation/my_acefiles/antibody_curation<br />
<br />
<br />
4. '''A few important documents''':<br />
*'''AbCurationLog.txt''' -- curators maintain this log of all antibody papers that have already been curated. It is located at: /Users/xiaodongwang/curation/my_acefiles/antibody_curation<br />
**this file needs to be updated every time by appending the newly curated papers (the paper list can be copied from the Excel file and pasted into the file)<br />
*'''WBAbPaperList.ace''' -- Antibody papers curated before Textpresso time<br />
<br />
<br />
<br />
OBSOLETE STEP (3/20/2012): 3. The curation log file listed above only documents papers that were curated after the Textpresso first pass was applied. Antibody papers curated before that are kept in this file: WBAbPaperList.ace<br />
<br />
OBSOLETE STEP (3/20/2012): 4. There is a script written by Wen that screens file 1, filters out the papers in files 2 and 3 (which were already curated), and outputs the new paper list. The script is called TextpressoAbFinder.pl<br />
<br />
OBSOLETE STEP (3/20/2012):[<br />
----<br />
Here is how to use the TextpressoAbFinder.pl<br />
<br />
(wen@athena:~/TextPresso/TextPressoAb$ ./TextpressoAbFinder.pl)<br />
<br />
/Users/xiaodongwang/curation/my_acefiles/antibody_curation/ ./TextpressoAbFinder.pl<br />
<br />
<br />
This script checks the Textpresso results, compares them with the antibody paper list dumped from citace, and looks for antibody papers that were not curated.<br />
<br />
Input file 1: anti_protein.txt -- all antibody papers found by Textpresso<br />
<br />
Input file 2: WBAbPaperList.ace -- Antibody papers curated before Textpresso time<br />
<br />
Input file 3: CurationLog/AbCurationLog.txt -- Antibody curation log.<br />
<br />
Output file 1: NewAbPaper.txt -- New antibody papers <br />
**the name of output file 1 can be changed in the script, e.g. to NewAbPaper_20110113<br />
**it needs to be changed in two places each time<br />
<br />
Output file 2: TPAbFalsePositive.txt -- All false positive antibody papers.<br />
<br />
1789 papers flagged by Textpresso, 1734 curated, 55 need to be checked.<br />
Among the uncurated papers, 30 have the anti-XXX pattern and 25 do not.<br />
1626 papers curated in citace, 1347 found by Textpresso, 279 not found by Textpresso. Recall is 0.828413284132841.<br />
518 papers identified by Textpresso are false positive. Precision is 0.710452766908888. <br />
<br />
5. The output of the script is NewAbPaper.txt, the list of antibody papers that need to be curated. <br />
I copied this file to my desktop and created my own Excel file at desktop/curation forms/antibody curation/AB_curation_spreadsheet.xlsx, naming the separate sheets chronologically<br />
<br />
6. Add curated papers to AbCurationLog.txt under /Users/xiaodongwang/curation/my_acefiles/antibody_curation, so that these papers can be subtracted from NewAbPaper.txt next time.<br />
]<br />
<br />
===Antibody curation controlled vocabulary===<br />
<br />
Antibody controlled vocabulary<br />
<br />
Remark "Commercial Antibody."<br />
Remark "Tissue Specific Antibody Marker."<br />
Summary "Rabbit polyclonal antibody against XXX recombinant protein."<br />
Summary "Rabbit polyclonal peptide antibody against XXX."<br />
Summary "Mouse monoclonal peptide antibody against XXX."<br />
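The XXX placeholder in these phrases can be filled programmatically. A hypothetical Python helper, assuming the target is given as a gene or protein name and that only the phrases listed above are allowed:<br />

```python
# Hypothetical helper that fills the XXX placeholder in the controlled
# phrases above; the lookup keys are assumptions, and only the listed
# phrases are supported.

SUMMARY_TEMPLATES = {
    ("rabbit", "polyclonal", "protein"): "Rabbit polyclonal antibody against {target} recombinant protein.",
    ("rabbit", "polyclonal", "peptide"): "Rabbit polyclonal peptide antibody against {target}.",
    ("mouse", "monoclonal", "peptide"): "Mouse monoclonal peptide antibody against {target}.",
}

REMARKS = {
    "commercial": "Commercial Antibody.",
    "tissue_marker": "Tissue Specific Antibody Marker.",
}


def summary_line(animal, clonality, antigen, target):
    """Return an ACE Summary line using the controlled vocabulary."""
    key = (animal.lower(), clonality.lower(), antigen.lower())
    if key not in SUMMARY_TEMPLATES:
        raise ValueError(f"no controlled Summary phrase for {key}")
    return f'Summary "{SUMMARY_TEMPLATES[key].format(target=target)}"'


def remark_line(kind):
    """Return an ACE Remark line for one of the controlled remarks."""
    return f'Remark "{REMARKS[kind]}"'
```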
<br />
=== Antibody curation guideline===<br />
<br />
WormBase requires the following information for Antibody: <br />
<br />
1. Antibody name: for consistency, use [WBPaperID]:anti-GENENAME, appending _1, _2, etc. if several antibodies are made against the same gene. The gene name is in CAPITALS, e.g. '''[WBPaper00036348]:anti-EPG-2'''<br />
<br />
2. Original reference where the antibody was first reported. For antibodies that are published for the first time, list the original publication and mark the antibody as "Original_publication" antibody (these are good and valid antibody objects in WormBase.)<br />
<br />
3. Target gene (abc-1, xyz-1, ...), clonality (polyclonal or monoclonal) and host animal (rabbit, mouse, ...)<br />
<br />
4. Antigen used to generate the antibody (peptide or protein sequence) <br />
<br />
5. If the antibody is from another paper, find the original antibody object and add the reference to it. <br />
<br />
6. If the antibody has no original reference, create a new antibody object and mark it as "No_original_reference". If you suspect the antibody is the same as one that was previously published, fill in the "Possible_pseudonym" field.<br />
<br />
=='''Antibody dumper'''==<br />
<br />
* The dumper is located on tazendra:<br />
**the module and use_package.pl are on tazendra at:<br />
***/home/postgres/work/citace_upload/antibody/get_antibody_ace.pm<br />
***/home/postgres/work/citace_upload/antibody/use_package.pl<br />
**use_package.pl is symlinked on tazendra at:<br />
***/home/acedb/xiaodong/antibody/<br />
<br />
*The dumper checks for dead genes in the 'Gene' field and for invalid papers in the 'Original publication' and 'Reference' fields, and writes the results to the err.out file.<br />
*The cronjob for dumping antibodies for upload was cancelled.<br />
<br />
<br />
<strike>[is located in tazendra:( /home/acedb/wen/phenote-antibody/dump_antibody_ace.pl)</strike><br />
<br />
<strike>/home/acedb/xiaodong/oa_antibody_dumper/dump_antibody_ace.pl<br />
<br />
<br />
'''dump out file:''' ./dump_antibody_ace.pl<br />
<br />
file name: antibody.ace<br />
<br />
'''I usually change the file name:''' cp antibody.ace antibody.ace.date_of_dump<br />
<br />
'''then copy file to spica:''' scp antibody.ace.20110503 citace@spica.caltech.edu:/home/citace/Data_for_citace/Data_from_Xiaodong/.</strike><br />
<br />
=== Handling Dead Genes During Dump Process ===<br />
The dumper script will now (as of May, 2013) run an automatic check for dead genes in any gene field. Any genes that are considered dead that are referenced in an Interaction object in the OA will be handled in the following manner:<br />
<br />
1) If there is a replacement for the gene (i.e. the gene has been merged into another gene), the dead gene will be dumped into a "Historical_gene" field in the .ace file and the replacement gene will fill the original Gene field. A comment will be added to the Historical_gene field via a 'Text' tag (updated as of 3-18-2015). The original Gene field (now holding the replacement gene) will be printed with an "Inferred_automatically" tag after the gene. For example, if WBGene00001234 is now a dead gene that has been merged into WBGene00002345:<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
</pre><br />
<br />
becomes<br />
<br />
<pre><br />
Gene "WBGene00002345" Inferred_automatically<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to WBGene00001234.<br />
WBGene00001234 is now considered dead and has been merged into WBGene00002345. WBGene00002345 has <br />
replaced WBGene00001234 accordingly."<br />
</pre><br />
<br />
Also, since Antibodies, Transgenes, Expression patterns and Variations are mapped to an interactor where possible (otherwise they are dumped as "Unaffiliated"), this mapping will now be made only to the newest genes that the interactor refers to.<br />
<br />
2) If there is no replacement for the gene (Dead or Suppressed), the following is dumped:<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to a gene<br />
(WBGene00001234) that is now considered dead. Please interpret with discretion."<br />
</pre><br />
<br />
OR<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to a gene<br />
(WBGene00001234) that has been suppressed. Please interpret with discretion."<br />
</pre><br />
<br />
and lastly,<br />
<br />
3) If the gene has undergone a split, such genes will be dumped as:<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to a gene <br />
(WBGene00001234) that is now considered split. Please interpret with discretion."<br />
</pre><br />
<br />
and also printed in the error output file of the dumping script, so that a curator can go back and correct the entry manually using their best judgement.<br />
<br />
<br />
Gene Examples:<br><br />
A split gene: WBGene00012507<br><br />
A merged gene: WBGene00007524<br><br />
A dead gene: WBGene00007814<br><br />
A suppressed gene: WBGene00015490<br><br />
<br />
=='''Notes'''==<br />
<br />
'''Changed the postgres tables for ---05/22/2011''' <br />
<br />
reference -> paper<br />
<br />
location -> laboratory<br />
<br />
'''Cronjob was cancelled. The dump is now done manually, after checking the err.out file for each dump. - 06/04/2012'''<br />
<br />
<strike>'''email from Juancarlos related to cronjob --- 06/06/2011'''<br />
<br />
Set the cronjob to run every Thursday :<br />
0 2 * * thu /home/acedb/xiaodong/oa_antibody_dumper/dump_antibody_ace.pl<br />
It puts the file at :<br />
/home/postgres/public_html/cgi-bin/data/antibody.ace<br />
So you can see it at :<br />
http://tazendra.caltech.edu/~postgres/cgi-bin/data/antibody.ace<br />
If you need to run it manually, just paste into the shell :<br />
/home/acedb/xiaodong/oa_antibody_dumper/dump_antibody_ace.pl<br />
Then log onto spica, cd into the directory where you want it, remove<br />
the existing antibody.ace file, and do : <br />
wget "http://tazendra.caltech.edu/~postgres/cgi-bin/data/antibody.ace"</strike><br />
<br />
'''Dumper change for historical_gene tag - 05/22/2013'''<br />
<br />
-model change: refer to Chris's document:<br />
https://docs.google.com/a/wormbase.org/document/d/1nnuQY9OfV2VsBORj01ocoC985-5kOgxYCGOAZDeeKLQ/edit<br />
<br />
-dumper change via Skype with J:<br />
[5/22/13 1:57:44 PM] j chan: use_package.pl -> /home/postgres/work/citace_upload/antibody/use_package.pl<br />
<br />
[5/22/13 1:59:04 PM] j chan: http://mangolassi.caltech.edu/~postgres/cgi-bin/oa/ontology_annotator.cgi<br />
<br />
[5/22/13 2:08:06 PM] j chan: gin_dead<br />
<br />
[5/22/13 2:08:19 PM] j chan: Dead -> dead<br />
<br />
[5/22/13 2:08:27 PM] j chan: merged_into WBGene -> merged<br />
<br />
[5/22/13 2:08:31 PM] j chan: split_into -> split<br />
<br />
[CG added 10-21-2013] cgrove: Suppressed -> suppressed<br />
<br />
[5/22/13 2:09:06 PM] j chan: looping through the genes where something happened to make sure they don't also point at something else<br />
<br />
[5/22/13 2:09:16 PM] j chan: abp_gene<br />
<br />
[5/22/13 2:09:47 PM] j chan: merged -> historical_gene + remark AND gene <gene> Inferred_automatically<br />
<br />
[5/22/13 2:09:56 PM] j chan: dead -> historical_gene + remark<br />
<br />
[5/22/13 2:10:05 PM] j chan: split -> historical_gene + remark AND error message<br />
<br />
[CG added 10-21-2013] cgrove: Suppressed -> historical_gene + remark<br />
<br />
[5/22/13 2:10:12 PM] j chan: normal ones -> just tag + value<br />
<br />
<br />
-tested with genes:<br><br />
A split gene: WBGene00012507<br><br />
A merged gene: WBGene00007524<br><br />
A dead gene: WBGene00007814<br><br />
A suppressed gene: WBGene00015490<br><br />
<br />
-migrated to tazendra on the same day<br />
<br />
'''Dumper change for disease reagent - 09/21/2015'''<br />
<br />
-antibody model change: see the model change proposal above<br />
<br />
-we want to add a new tab called 'Disease' to the antibody OA, with two new fields in it: 'Antibody for disease' and 'Disease paper'.<br />
*when dumping, write these two fields on one line, e.g. Antibody_for_disease "DO_term" WBPaper00012345</div>Xdwanghttps://wiki.wormbase.org/index.php?title=Antibody&diff=27610Antibody2015-08-31T18:13:49Z<p>Xdwang: </p>
<hr />
<div>back to [[Caltech documentation]]<br />
==Antibody Model==<br />
<pre style="white-space: pre-wrap; <br />
white-space: -moz-pre-wrap;<br />
white-space: -pre-wrap;<br />
white-space: -o-pre-wrap; <br />
word-wrap: break-word"><br />
?Antibody Summary UNIQUE ?Text #Evidence<br />
Other_name Text<br />
Gene ?Gene XREF Antibody #Evidence<br />
Isolation Original_publication UNIQUE ?Paper<br />
No_original_reference Text //proposed by Wen<br />
Person ?Person<br />
Location ?Laboratory<br />
Clonality UNIQUE Polyclonal Text<br />
Monoclonal Text<br />
Antigen UNIQUE Peptide Text<br />
Protein Text<br />
Other_antigen Text<br />
Animal UNIQUE Rabbit<br />
Mouse<br />
Rat<br />
Guinea_pig<br />
Chicken<br />
Goat<br />
Other_animal Text<br />
Historical_gene ?Gene Text <br />
Possible_pseudonym ?Antibody XREF Possible_pseudonym_of<br />
Possible_pseudonym_of ?Antibody XREF Possible_pseudonym<br />
Expr_pattern ?Expr_pattern XREF Antibody_info<br />
Interactor ?Interaction<br />
Reference ?Paper XREF Antibody<br />
Remark ?Text #Evidence <br />
</pre><br />
<br />
Model change proposed to associate human disease with antibodies (08/31/2015):<br />
<br />
<pre style="white-space: pre-wrap;<br />
white-space: -moz-pre-wrap;<br />
white-space: -pre-wrap;<br />
white-space: -o-pre-wrap; <br />
word-wrap: break-word"><br />
<br />
?Antibody<br />
Antibody_for_disease ?DO_term XREF Associated_antibody #Evidence<br />
<br />
?Construct<br />
Construct_for_disease ?DO_term XREF Associated_construct #Evidence<br />
<br />
?Transgene<br />
Transgene_for_disease ?DO_term XREF Associated_transgene #Evidence<br />
<br />
<br />
?DO_term<br />
Reagent_info Associated_antibody ?Antibody XREF Antibody_for_disease <br />
Associated_construct ?Construct XREF Construct_for_disease <br />
Associated_transgene ?Transgene XREF Transgene_for_disease<br />
</pre><br />
<br />
== Antibody curation SOPs ==<br />
<br />
=== Antibody curation===<br />
1. '''Antibody paper first pass:'''<br />
*Antibody papers are identified via a script written by Juancarlos. Here are the first-pass results for antibody curation: <br />
http://textpresso-dev.caltech.edu/azurebrd/wen/anti_protein_wen<br />
*The results are on an HTML page that lists paper names and the antibodies associated with them. <br />
<br />
2. '''A few steps''' need to be done before starting curation:<br />
<br />
*save the file to the desktop as 'anti_protein_20110113.txt' (include the date in the file name)<br />
*open a terminal<br />
**cd to curation, then my_acefiles, then antibody_curation<br />
**cp /Users/xiaodongwang/Desktop/anti_protein_20110113.txt . (copy the file into the antibody_curation directory)<br />
**[OBSOLETE STEP: scp anti_protein_20110113.txt anti_protein.txt (copy the contents into anti_protein.txt, which is input file 1 when running TextpressoAbFinder.pl)]<br />
**run Yuling's new script (written on 3/12/2012) to get the new antibody papers:<br />
***./find_new.pl anti_protein_20120320.txt WBAbPaperList.ace AbCurationLog.txt > aaa<br />
****the script removes the papers already listed in 'WBAbPaperList.ace' and 'AbCurationLog.txt' from the Textpresso list and writes the new papers to the file 'aaa'<br />
*copy and paste new papers into 'Ab_curation_spreadsheet.xlsx' of my own curation log in antibody curation folder<br />
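<br />
The subtraction that find_new.pl performs can be sketched as a set difference. This is a hypothetical Python re-implementation, not Yuling's script: it assumes that every input file mentions paper IDs in the form WBPaper followed by eight digits, and the function names are illustrative.<br />
<br />
```python
import re
import sys
from pathlib import Path

# Assumption: paper IDs appear in each file as "WBPaper" + 8 digits.
PAPER_ID = re.compile(r"WBPaper\d{8}")


def paper_ids(text: str) -> set:
    """Return the set of WBPaper IDs mentioned anywhere in the text."""
    return set(PAPER_ID.findall(text))


def find_new_papers(textpresso_text: str, *curated_texts: str) -> list:
    """Papers flagged by Textpresso minus papers already listed as curated."""
    already_curated = set()
    for text in curated_texts:
        already_curated |= paper_ids(text)
    return sorted(paper_ids(textpresso_text) - already_curated)


if __name__ == "__main__" and len(sys.argv) > 1:
    # First argument: Textpresso list; remaining arguments: curated lists.
    texts = [Path(name).read_text() for name in sys.argv[1:]]
    for paper in find_new_papers(texts[0], *texts[1:]):
        print(paper)
```
<br />
Called with the same three file names as find_new.pl (e.g. python find_new.py anti_protein_20120320.txt WBAbPaperList.ace AbCurationLog.txt > aaa, where find_new.py is a hypothetical file name), it prints one new paper ID per line.<br />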
<br />
3. When curation is finished, copy the papers from the spreadsheet to the 'AbCurationLog.txt' file: <br />
<br />
Curators need to record the status of every paper from 'Ab_curation_spreadsheet.xlsx' in the curation log file '''AbCurationLog.txt''', so that the same paper does not appear again next time.<br />
<br />
'''AbCurationLog.txt''' is located at: /Users/xiaodongwang/curation/my_acefiles/antibody_curation<br />
<br />
<br />
4. '''A few important documents''':<br />
*'''AbCurationLog.txt''' -- the curation log, maintained by the curator, of all antibody papers that have already been curated. It is located at: /Users/xiaodongwang/curation/my_acefiles/antibody_curation<br />
**this file needs to be updated after every curation round by appending the newly curated papers (the paper list can be copied from the Excel file and pasted into it)<br />
*'''WBAbPaperList.ace''' -- Antibody papers curated before Textpresso time<br />
<br />
<br />
<br />
OBSOLETE STEP (3/20/2012): 3. The curation log file listed above only documents papers that were curated after the Textpresso first pass was applied. Antibodies curated before that are kept in this file: WBAbPaperList.ace<br />
<br />
OBSOLETE STEP (3/20/2012): 4. Wen wrote a script, TextpressoAbFinder.pl, that screens file 1, filters out the papers in files 2 and 3 (which were already curated), and then outputs the new paper list.<br />
<br />
OBSOLETE STEP (3/20/2012):[<br />
----<br />
Here is how to use the TextpressoAbFinder.pl<br />
<br />
(wen@athena:~/TextPresso/TextPressoAb$ ./TextpressoAbFinder.pl)<br />
<br />
/Users/xiaodongwang/curation/my_acefiles/antibody_curation/ ./TextpressoAbFinder.pl<br />
<br />
<br />
This script checks the Textpresso results, compares them with the antibody paper list dumped from citace, and looks for antibody papers that have not been curated.<br />
<br />
Input file 1: anti_protein.txt -- all antibody papers found by Textpresso<br />
<br />
Input file 2: WBAbPaperList.ace -- Antibody papers curated before Textpresso time<br />
<br />
Input file 3: CurationLog/AbCurationLog.txt -- Antibody curation log.<br />
<br />
Output file 1: NewAbPaper.txt -- New antibody papers <br />
**the name of output file 1 can be changed in the script, e.g. to NewAbPaper_20110113<br />
**two places need to be changed each time<br />
<br />
Output file 2: TPAbFalsePositive.txt -- All false positive antibody papers.<br />
<br />
1789 papers flagged by Textpresso, 1734 curated, 55 need to be checked.<br />
Among the papers not yet curated, 30 have the anti-XXX pattern and 25 do not.<br />
1626 papers curated in citace, 1347 found by Textpresso, 279 not found by Textpresso. Recall is 0.828413284132841.<br />
518 papers identified by Textpresso are false positive. Precision is 0.710452766908888. <br />
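<br />
The recall and precision reported above can be reproduced from the counts in the script output. The precision formula here is inferred from the numbers (the reported value matches (1789 - 518) / 1789); it is not taken from TextpressoAbFinder.pl itself.<br />
<br />
```python
# Counts copied from the TextpressoAbFinder.pl summary above.
flagged_by_textpresso = 1789  # papers flagged by Textpresso
curated_in_citace = 1626      # papers curated in citace
found_by_textpresso = 1347    # curated papers that Textpresso also flagged
false_positives = 518         # flagged papers that are not antibody papers

# Recall: share of curated papers that Textpresso found.
recall = found_by_textpresso / curated_in_citace
# Precision: share of flagged papers that are true antibody papers.
precision = (flagged_by_textpresso - false_positives) / flagged_by_textpresso

print(f"Recall is {recall:.6f}. Precision is {precision:.6f}.")
```
<br />
This prints "Recall is 0.828413. Precision is 0.710453.", in agreement with the values above.<br />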
<br />
5. The result of the script is NewAbPaper.txt. This is the list of antibody papers that need to be curated. <br />
I copied this txt file onto my desktop, created my own Excel file at desktop/curation forms/antibody curation/AB_curation_spreadsheet.xlsx, and named the separate sheets chronologically.<br />
<br />
6. Add curated papers to AbCurationLog.txt under /Users/xiaodongwang/curation/my_acefiles/antibody_curation, so that these papers can be subtracted from NewAbPaper.txt next time.<br />
]<br />
<br />
===Antibody curation controlled vocabulary===<br />
<br />
Antibody controlled vocabulary<br />
<br />
Remark "Commercial Antibody."<br />
Remark "Tissue Specific Antibody Marker."<br />
Summary "Rabbit polyclonal antibody against XXX recombinant protein."<br />
Summary "Rabbit polyclonal peptide antibody against XXX."<br />
Summary "Mouse monoclonal peptide antibody against XXX."<br />
<br />
=== Antibody curation guideline===<br />
<br />
WormBase requires the following information for each antibody: <br />
<br />
1. Antibody name: for consistency, use [WBPaperID]:anti-GENENAME, with the gene name in CAPITALS; add _1, _2, etc. if several antibodies are made for the same gene. E.g. '''[WBPaper00036348]:anti-EPG-2'''<br />
<br />
2. Original reference where the antibody was first reported. For antibodies that are published for the first time, list the original publication and mark the antibody as "Original_publication" antibody (these are good and valid antibody objects in WormBase.)<br />
<br />
3. Target gene (abc-1, xyz-1, ...), clonality (polyclonal or monoclonal) and animal (rabbit, mouse, ...)<br />
<br />
4. Antigen used to generate antibody (peptide or protein sequence) <br />
<br />
5. If the antibody is from another paper, find the original antibody object and add the reference to it. <br />
<br />
6. If the antibody has no original reference, create a new antibody object and mark it as "No_original_reference". If you suspect the antibody is the same as another one that was previously published, enter the "Possible_pseudonym" field.<br />
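<br />
A minimal sketch of the naming convention in item 1, plus an .ace stanza for a newly published antibody. It assumes the _1, _2 suffix goes at the end of the name, that the Summary follows the controlled-vocabulary templates, and that leaf tags such as Polyclonal and Rabbit are written bare; the helper names and the WBGene placeholder are hypothetical.<br />
<br />
```python
from typing import Optional


def antibody_name(paper_id: str, gene_name: str, index: Optional[int] = None) -> str:
    """Build a name such as [WBPaper00036348]:anti-EPG-2 (gene name in capitals)."""
    name = f"[{paper_id}]:anti-{gene_name.upper()}"
    if index is not None:
        # Assumption: the _1, _2 ... suffix goes at the very end of the name.
        name += f"_{index}"
    return name


def antibody_ace(paper_id: str, gene_name: str, gene_id: str, animal: str,
                 clonality: str, antigen: str, index: Optional[int] = None) -> str:
    """Return an .ace stanza for an antibody first reported in paper_id.

    animal is e.g. "Rabbit" or "Mouse", clonality "Polyclonal" or "Monoclonal",
    antigen "Protein" or "Peptide".
    """
    target = gene_name.upper()
    # Summary text follows the controlled-vocabulary templates.
    if antigen == "Protein":
        summary = f"{animal} {clonality.lower()} antibody against {target} recombinant protein."
    elif antigen == "Peptide":
        summary = f"{animal} {clonality.lower()} peptide antibody against {target}."
    else:
        raise ValueError(f"unexpected antigen type: {antigen!r}")
    lines = [
        f'Antibody : "{antibody_name(paper_id, gene_name, index)}"',
        f'Summary "{summary}"',
        f'Gene "{gene_id}"',
        f'Original_publication "{paper_id}"',
        clonality,  # leaf tag under Clonality (assumed bare)
        animal,     # leaf tag under Animal (assumed bare)
    ]
    return "\n".join(lines) + "\n"


# WBGene00000000 is a placeholder, not the real ID for epg-2.
print(antibody_ace("WBPaper00036348", "epg-2", "WBGene00000000",
                   "Rabbit", "Polyclonal", "Protein"))
```
<br />
For a rabbit polyclonal antibody raised against a recombinant protein, the Summary line comes out as "Rabbit polyclonal antibody against EPG-2 recombinant protein.", matching the first Summary template.<br />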
<br />
=='''Antibody dumper'''==<br />
<br />
* the dumper is located on tazendra:<br />
**the module and use_package.pl are on tazendra at:<br />
***/home/postgres/work/citace_upload/antibody/get_antibody_ace.pm<br />
***/home/postgres/work/citace_upload/antibody/use_package.pl<br />
**use_package.pl is symlinked on tazendra at:<br />
***/home/acedb/xiaodong/antibody/<br />
<br />
*the dumper checks for dead genes in the 'Gene' field and invalid papers in the 'Original publication' and 'Reference' fields, and writes the results to the err.out file.<br />
*the cronjob for antibody dumping for upload was cancelled<br />
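<br />
The err.out check described above can be sketched as follows. This is not the code in get_antibody_ace.pm; it is a hypothetical Python version that assumes the set of dead genes and the set of valid paper IDs have already been loaded (e.g. from postgres), and the record layout and message wording are illustrative.<br />
<br />
```python
from dataclasses import dataclass, field


@dataclass
class AntibodyRecord:
    """Hypothetical in-memory view of one antibody row from the OA."""
    name: str
    genes: list = field(default_factory=list)
    original_publication: str = ""
    references: list = field(default_factory=list)


def check_antibody(record, dead_genes, valid_papers):
    """Return err.out lines for dead genes and invalid papers in one record."""
    errors = []
    for gene in record.genes:
        if gene in dead_genes:
            errors.append(f"{record.name}\tdead gene in Gene: {gene}")
    papers = [("Original publication", record.original_publication)]
    papers += [("Reference", paper) for paper in record.references]
    for field_name, paper in papers:
        if paper and paper not in valid_papers:
            errors.append(f"{record.name}\tinvalid paper in {field_name}: {paper}")
    return errors


def write_err_out(records, dead_genes, valid_papers, path="err.out"):
    """Write all problems to err.out so a curator can review them before upload."""
    with open(path, "w") as out:
        for record in records:
            for line in check_antibody(record, dead_genes, valid_papers):
                out.write(line + "\n")
```
<br />
Keeping the check separate from the file writing makes it easy to review problems on screen before deciding whether the dump is ready for upload.<br />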
<br />
<br />
<strike>[located on tazendra at: /home/acedb/wen/phenote-antibody/dump_antibody_ace.pl]</strike><br />
<br />
<strike>/home/acedb/xiaodong/oa_antibody_dumper/dump_antibody_ace.pl<br />
<br />
<br />
'''dump out file:''' ./dump_antibody_ace.pl<br />
<br />
file name: antibody.ace<br />
<br />
'''I usually change the file name:''' cp antibody.ace antibody.ace.date_of_dump<br />
<br />
'''then copy file to spica:''' scp antibody.ace.20110503 citace@spica.caltech.edu:/home/citace/Data_for_citace/Data_from_Xiaodong/.</strike><br />
<br />
=== Handling Dead Genes During Dump Process ===<br />
As of May 2013, the dumper script runs an automatic check for dead genes in any gene field. Any dead genes referenced in an Interaction object in the OA are handled in the following manner:<br />
<br />
1) If there is a replacement for the gene (i.e. the gene has been merged into another gene), the dead gene will be dumped into a "Historical_gene" field in the .ACE file, and the replacement gene will fill the original gene field. A comment will be added to the Historical_gene field via a 'Text' tag (updated as of 3-18-2015). The original gene field (now with the updated gene reference) will be printed with an "Inferred_automatically" tag after the gene. For example, if WBGene00001234 is now a dead gene that has been merged into WBGene00002345:<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
</pre><br />
<br />
becomes<br />
<br />
<pre><br />
Gene "WBGene00002345" Inferred_automatically<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to WBGene00001234.<br />
WBGene00001234 is now considered dead and has been merged into WBGene00002345. WBGene00002345 has <br />
replaced WBGene00001234 accordingly."<br />
</pre><br />
<br />
Also, because Antibodies, Transgenes, Expression patterns, and Variations are mapped to an interactor where possible (otherwise they are dumped as "Unaffiliated"), this mapping will now apply only to the newest genes that the interactor refers to.<br />
<br />
2) If there is no replacement for the gene (Dead or Suppressed), the following is dumped:<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to a gene<br />
(WBGene00001234) that is now considered dead. Please interpret with discretion."<br />
</pre><br />
<br />
OR<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to a gene<br />
(WBGene00001234) that has been suppressed. Please interpret with discretion."<br />
</pre><br />
<br />
and lastly,<br />
<br />
3) If the gene has undergone a split, it will be dumped as:<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to a gene <br />
(WBGene00001234) that is now considered split. Please interpret with discretion."<br />
</pre><br />
<br />
and also written to the error output file of the dumping script, so that a curator can go back and correct it manually according to best judgement.<br />
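<br />
The three cases above can be condensed into one function. This is a hypothetical Python sketch, not the dumper's Perl code; it assumes the gene's status ("merged", "dead", "suppressed" or "split") and any replacement gene have already been looked up, and it reproduces the note texts shown in the examples.<br />
<br />
```python
from typing import List, Optional, Tuple

# Note endings taken from the examples above.
NO_REPLACEMENT = {
    "dead": "is now considered dead",
    "suppressed": "has been suppressed",
    "split": "is now considered split",
}


def dump_gene(gene: str, status: Optional[str] = None,
              replacement: Optional[str] = None) -> Tuple[List[str], Optional[str]]:
    """Return (.ace lines, error message or None) for one Gene value."""
    if status is None:  # normal gene: just tag + value
        return [f'Gene "{gene}"'], None
    if status == "merged":
        note = (f"Note: This object originally referred to {gene}. "
                f"{gene} is now considered dead and has been merged into {replacement}. "
                f"{replacement} has replaced {gene} accordingly.")
        return [f'Gene "{replacement}" Inferred_automatically',
                f'Historical_gene "{gene}" "{note}"'], None
    if status not in NO_REPLACEMENT:
        raise ValueError(f"unknown gene status: {status!r}")
    note = (f"Note: This object originally referred to a gene ({gene}) that "
            f"{NO_REPLACEMENT[status]}. Please interpret with discretion.")
    lines = [f'Gene "{gene}"', f'Historical_gene "{gene}" "{note}"']
    # Split genes also go to the error output for manual curation.
    error = f"{gene} has been split; please fix manually" if status == "split" else None
    return lines, error


lines, error = dump_gene("WBGene00001234", "merged", "WBGene00002345")
print("\n".join(lines))
```
<br />
Only the split case returns an error message, mirroring the rule that split genes are also reported for a curator to fix by hand.<br />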
<br />
<br />
Gene Examples:<br><br />
A split gene: WBGene00012507<br><br />
A merged gene: WBGene00007524<br><br />
A dead gene: WBGene00007814<br><br />
A suppressed gene: WBGene00015490<br><br />
<br />
=='''Notes'''==<br />
<br />
'''Changed the postgres tables --- 05/22/2011''' <br />
<br />
reference -> paper<br />
<br />
location -> laboratory<br />
<br />
'''Cronjob was cancelled. Dumping is now done manually after checking the err.out file for each dump. - 06/04/2012'''<br />
<br />
<strike>'''email from Juancarlos related to cronjob --- 06/06/2011'''<br />
<br />
Set the cronjob to run every Thursday :<br />
0 2 * * thu /home/acedb/xiaodong/oa_antibody_dumper/dump_antibody_ace.pl<br />
It puts the file at :<br />
/home/postgres/public_html/cgi-bin/data/antibody.ace<br />
So you can see it at :<br />
http://tazendra.caltech.edu/~postgres/cgi-bin/data/antibody.ace<br />
If you need to run it manually, just paste into the shell :<br />
/home/acedb/xiaodong/oa_antibody_dumper/dump_antibody_ace.pl<br />
Then log onto spica, cd into the directory where you want it, remove<br />
the existing antibody.ace file, and do : <br />
wget "http://tazendra.caltech.edu/~postgres/cgi-bin/data/antibody.ace"</strike><br />
<br />
'''Dumper change for historical_gene tag - 05/22/2013'''<br />
<br />
-model change: refer to Chris's document:<br />
https://docs.google.com/a/wormbase.org/document/d/1nnuQY9OfV2VsBORj01ocoC985-5kOgxYCGOAZDeeKLQ/edit<br />
<br />
-dumper change via Skype with J:<br />
[5/22/13 1:57:44 PM] j chan: use_package.pl -> /home/postgres/work/citace_upload/antibody/use_package.pl<br />
<br />
[5/22/13 1:59:04 PM] j chan: http://mangolassi.caltech.edu/~postgres/cgi-bin/oa/ontology_annotator.cgi<br />
<br />
[5/22/13 2:08:06 PM] j chan: gin_dead<br />
<br />
[5/22/13 2:08:19 PM] j chan: Dead -> dead<br />
<br />
[5/22/13 2:08:27 PM] j chan: merged_into WBGene -> merged<br />
<br />
[5/22/13 2:08:31 PM] j chan: split_into -> split<br />
<br />
[CG added 10-21-2013] cgrove: Suppressed -> suppressed<br />
<br />
[5/22/13 2:09:06 PM] j chan: looping through the genes where something happened to make sure they don't also point at something else<br />
<br />
[5/22/13 2:09:16 PM] j chan: abp_gene<br />
<br />
[5/22/13 2:09:47 PM] j chan: merged -> historical_gene + remark AND gene <gene> Inferred_automatically<br />
<br />
[5/22/13 2:09:56 PM] j chan: dead -> historical_gene + remark<br />
<br />
[5/22/13 2:10:05 PM] j chan: split -> historical_gene + remark AND error message<br />
<br />
[CG added 10-21-2013] cgrove: Suppressed -> historical_gene + remark<br />
<br />
[5/22/13 2:10:12 PM] j chan: normal ones -> just tag + value<br />
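<br />
The gin_dead mapping in the chat log can be sketched as below. The exact format of gin_dead values is an assumption here (a status word, with merged_into/split_into followed by WBGene IDs), and following chains of merges is one reading of "make sure they don't also point at something else"; the function names are hypothetical.<br />
<br />
```python
import re
from typing import Dict, List, Tuple

WBGENE = re.compile(r"WBGene\d{8}")


def classify_gin_dead(value: str) -> Tuple[str, List[str]]:
    """Map a gin_dead value to (status, WBGene IDs it points to)."""
    targets = WBGENE.findall(value)
    if "merged_into" in value:
        return "merged", targets
    if "split_into" in value:
        return "split", targets
    if "Suppressed" in value:
        return "suppressed", []
    if "Dead" in value:
        return "dead", []
    raise ValueError(f"unrecognised gin_dead value: {value!r}")


def resolve_merges(gene: str, gin_dead: Dict[str, str]) -> str:
    """Follow merged_into links until reaching a gene that was not merged again."""
    seen = {gene}
    while gene in gin_dead:
        status, targets = classify_gin_dead(gin_dead[gene])
        if status != "merged" or len(targets) != 1:
            break  # dead, suppressed or split: stop here
        gene = targets[0]
        if gene in seen:
            raise ValueError(f"merge cycle involving {gene}")
        seen.add(gene)
    return gene
```
<br />
If gene A was merged into B and B was later merged into C, resolve_merges returns C, so the dumped Gene tag would point at the current gene rather than an intermediate dead one.<br />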
<br />
<br />
-tested with genes:<br><br />
A split gene: WBGene00012507<br><br />
A merged gene: WBGene00007524<br><br />
A dead gene: WBGene00007814<br><br />
A suppressed gene: WBGene00015490<br><br />
<br />
-migrated to tazendra on the same day</div>Xdwanghttps://wiki.wormbase.org/index.php?title=WormBase-Caltech_Weekly_Calls&diff=27452WormBase-Caltech Weekly Calls2015-08-06T16:53:11Z<p>Xdwang: /* July 30, 2015 */</p>
<hr />
<div>= Previous Years =<br />
<br />
[[WormBase-Caltech_Weekly_Calls_2009|2009 Meetings]]<br />
<br />
[[WormBase-Caltech_Weekly_Calls_2011|2011 Meetings]]<br />
<br />
[[WormBase-Caltech_Weekly_Calls_2012|2012 Meetings]]<br />
<br />
[[WormBase-Caltech_Weekly_Calls_2013|2013 Meetings]]<br />
<br />
[[WormBase-Caltech_Weekly_Calls_2014|2014 Meetings]]<br />
<br />
<br />
= 2015 Meetings =<br />
<br />
[[WormBase-Caltech_Weekly_Calls_January_2015|January]]<br />
<br />
[[WormBase-Caltech_Weekly_Calls_February_2015|February]]<br />
<br />
[[WormBase-Caltech_Weekly_Calls_March_2015|March]]<br />
<br />
[[WormBase-Caltech_Weekly_Calls_April_2015|April]]<br />
<br />
[[WormBase-Caltech_Weekly_Calls_May_2015|May]]<br />
<br />
[[WormBase-Caltech_Weekly_Calls_June_2015|June]]<br />
<br />
<br />
== July 2, 2015 ==<br />
<br />
=== Discussion Topics from IWM ===<br />
* Explaining job posting options via the forum in new Worm Breeder's Gazette article<br />
* Display of CRISPR data<br />
** Alleles with multiple lesions (one name, many mutations), need to be curated and mapped<br />
* Ontology term enrichment analysis, using ontologies other than gene ontology<br />
** Discussed on GO call yesterday; we can/should follow up with Paul Thomas<br />
** Would be good to have a single central tool/resource for enrichment analysis<br />
** PantherDB vs DAVID<br />
<br />
=== WormBook Chapters ===<br />
* Paul S will review them over the next week and provide feedback<br />
<br />
=== Outreach ===<br />
* Sending out e-mails to all labs/PIs reminding about new data forms<br />
* Could also do more personalized outreach to a smaller subset of PIs/labs<br />
** Could focus on PIs not at the IWM<br />
<br />
=== Anatomy ===<br />
* Embryonic development, cell division timing<br />
** Sulston timing<br />
** Waterson datasets<br />
** Zhao cell lineage timing datasets<br />
** Bao lab?<br />
* New EM reconstructions from David Hall, Scott Emmons, etc.<br />
* Neuronal connectivity, collaborative database with Scott Emmons and colleagues<br />
<br />
=== Citace upload ===<br />
* Curators submit data to Wen on Tuesday, July 28th<br />
<br />
=== Taking over Gene Orienteer ===<br />
* Xiaodong and Sibyl are working on this<br />
<br />
=== RNASeq data ===<br />
* Gary Williams only using high quality data, taking care of all curation (including meta data)<br />
* Public archive of rejected datasets?<br />
<br />
=== WormGuides ===<br />
* Bill Mohler et al working on desktop application<br />
<br />
<br />
== July 9, 2015 ==<br />
<br />
=== Expression Pattern ===<br />
* Certain/uncertain qualifiers not annotated before some date<br />
* ~3,000 ?Expr_pattern objects without that annotation/tag<br />
* Daniela will work on bringing these up to date; hopefully this won't take long<br />
<br />
=== Expression Clusters to Anatomy & Life Stage annotations ===<br />
* Many large scale datasets with tissue-specific expression data<br />
* Much of what is in SPELL is not annotated to ?Anatomy or ?Life_stage terms<br />
* A goal: make expression data queryable via ?Anatomy terms/pages<br />
* Wen will make the model change proposal<br />
* We may not want to show explicitly in widget<br />
* There is a need for a condensed display of expression data (per gene)<br />
* Some datasets, like the EPIC data, explicitly mention each embryonic cell name<br />
* Need for a condensed ontology browser per gene/anatomy and gene/life stage<br />
<br />
=== Proteomic analysis ===<br />
* Encyclopedia of Proteomic Dynamics, contacted Wen to share data<br />
* Wen will meet/discuss with group soon to determine what the goals are<br />
* It isn't clear what format the data has<br />
* Should include Gary Williams on discussions as he already processes Mass Spectrometry data<br />
<br />
=== External Databases ===<br />
* To what extent can we take care of the data and display of other lab/publication databases<br />
* Many authors want to share and make links to their database/website via WormBase<br />
* What is the best way to handle large scale dataset sharing requests that don't necessarily (for the time being) fit our data model<br />
* We can take advantage of the "External Links" display on WBPaper pages to link out to the external databases affiliated with the paper, including a link to our FTP site with shared data files, maybe?<br />
* At least a stop gap measure until we can properly model the data<br />
<br />
=== Cis-regulatory site nomenclature? ===<br />
* Barbara Meyer's lab published many "rex" (Recruitment Elements on X) sites, numbered sequentially<br />
* Tim Schedl wondering about others' thoughts/opinions on how to, possibly, standardize the names of cis-regulatory elements<br />
* Could be like gene names, without dash, e.g. "rex1", "rex2"<br />
* We may want to try "WBsf-" prefix, on all element names like "WBsf-rex1", although may be only used in-house<br />
<br />
=== Phenotypes ===<br />
* Were there any conclusions about phenotype lookup from the Allele-Phenotype form?<br />
* Chris spoke with Harald Hutter and others at the meeting about how to improve the lookup for phenotypes<br />
* Would be good to provide an explicit option to see phenotypes of related (or allele-affiliated) genes, perhaps by shared GO-term annotation<br />
* Need to think more on how to best compress display of phenotypes on gene pages as well<br />
* We do already provide links to the Variation and Gene pages (with Phenotypes displayed) in the term information box of the form<br />
<br />
<br />
== July 16, 2015 ==<br />
<br />
=== Anatomy term page expression ===<br />
* Raymond and Juancarlos are working on a display of genes that may be exclusively expressed in that anatomy object<br />
<br />
=== Construct/Transgene curation ===<br />
* Karen trying to make the curation of constructs & transgenes easier<br />
* May consider merging the transgene and construct OAs<br />
* Possibly add a construct/transgene request functionality in other OAs<br />
** Would those need multiple input fields?<br />
** Karen would take care of the details<br />
<br />
=== Molecule model ===<br />
* Exogenous/endogenous tags issue<br />
* Scraping data from external chemical databases versus adding biologically relevant data from papers<br />
* We pull data from, e.g. CHEBI, but not all molecules fall under their purview, e.g. proteins<br />
<br />
=== Micropublication ===<br />
* Promotion and outreach<br />
* Micropublications discoverable in PubMed?<br />
* Publisher = WormBase? Caltech?<br />
* Minimal standards for publication?<br />
<br />
<br />
== July 23, 2015 ==<br />
<br />
=== Worm model for autism ===<br />
* Would want to take human variations implicated in autism; look for orthologous genes in C. elegans/nematodes and find/make synonymous mutations<br />
* Prioritize based on worm phenotypes<br />
* Generally applies to human disease variants<br />
<br />
=== Database Migration ===<br />
* Thomas Down leaving WormBase in September<br />
* Moving ahead with Datomic<br />
* Good starting use-case for Datomic is querying Datomic-version of GeneACE<br />
* Need to make sure documentation for migration to Datomic is available and comprehensible<br />
* Point-people at each site: Sibyl @ OICR, Juancarlos @ Caltech<br />
* Now need to work out the mechanics of curating into Datomic<br />
<br />
=== WormBase ParaSite ===<br />
* Reciprocal searches (WB <-> PS) are working well<br />
<br />
=== Microarray datasets & modSeek ===<br />
* Some earlier datasets were re-processed (log-transformed, or re-annotated into original replicates instead of averaged results)<br />
* Need to try out different methods of processing raw-data (WB usually only takes in processed data)<br />
* One pipeline can feed data into SPELL and modSeek<br />
* It's difficult to establish/determine gold standards for assessing process performance<br />
<br />
=== WormBook chapter reviewers ===<br />
* Send reviewer suggestions to Paul ASAP<br />
<br />
=== C. elegans proteome in UniProt ===<br />
* Not a complete correspondence between WormBase and UniProt<br />
* Cases: UniProt has entry for a protein that differs by one or two amino acids from WormBase<br />
** Made from translations of what cDNAs etc. have been submitted<br />
** Partial data, e.g. partial cDNAs translated<br />
* Anything we can do to achieve greater consistency?<br />
* Protein data sets are important<br />
* Hinxton can use discrepancies as a flag to check on the gene/protein models<br />
* Would be good to have more reciprocal linkage between UniProt and WormBase<br />
* For AVR-15, UniProt has two additional entries compared to WormBase, differing in only 1 or 2 amino acids<br />
* Should we pick up different entries from UniProt and store/display the data; how to reconcile?<br />
* Possible use case: enter a UniProt ID into the BLAST/BLAT tool to identify WormBase matches<br />
<br />
=== Gene Orienteer Data ===<br />
* Sibyl and Xiaodong looking at data and scripts from Gene Orienteer<br />
<br />
=== Precanned queries for exclusive expression ===<br />
* Raymond & Juancarlos working on final details<br />
* Intent is to display genes that may be specifically/exclusively expressed in e.g. an anatomy term<br />
<br />
=== Embryonic developmental timing ===<br />
* Sulston, Murray timing data sets for wild type embryonic cell division timing<br />
* Mutant data sets are coming in as well<br />
<br />
=== Genetic Interaction Ontology (GIO) ===<br />
* Latest version of the GIO complete<br />
* Juancarlos and Chris built a "genetic interaction calculator" to determine interaction types from quantitative phenotype inequalities<br />
** http://mangolassi.caltech.edu/~azurebrd/cgi-bin/forms/gi_calculator.cgi<br />
* Sending out to other MODs, etc.<br />
* Seems that although there is buy in conceptually, most curators can't afford the time for such detailed curation<br />
<br />
=== Phenotype (ontology) display ===<br />
* Problems with display of phenotypes (and other annotations) on WormBase, as pointed out by several people at the IWM<br />
* Karen would like to start creating allele concise descriptions<br />
* We need compact, intelligently ordered annotation lists, not just alphabetical lists of ontology annotations<br />
* It would be good to show ancestors for relatedness and order<br />
* Chris working on Python script to display all annotations in the context of the entire ontology<br />
* We will need to see if this approach is feasible/beneficial<br />
<br />
=== PATO-style EQ (Entity-Quality) phenotype annotations ===<br />
* It is clear that some phenotype annotations require details, e.g. "drug sensitivity" annotations should have the drug involved<br />
** This drug/molecule annotation should be present in the details if not directly in the term itself<br />
* Raises the issue of a number of cases where we need PATO-style EQ annotations, not just explicit phenotype terms for all possible scenarios<br />
* This would be helpful in annotating embryonic timing and identity phenotype datasets<br />
<br />
<br />
== July 30, 2015 ==<br />
<br />
=== Wen Chen helped Wen Chen ===<br />
* Wen Chen (lab) has list of genes to analyze<br />
* Wen Chen (WB) helped process the list<br />
* Would be good to have a simple CGI to process a list of genes in a variety of ways<br />
** http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/fraqmine.cgi<br />
** For GeneTissueLifeStage and GeneConciseDescription more datatypes easily slotted in if curator makes a file<br />
* Is this redundant with WormMine? <br />
** Not for data that doesn't exist (in WormMine) yet; more agile: could be up and running within a matter of days<br />
<br />
=== Interconnections between WormBase and FlyBase ===<br />
* We could create more inter-connectivity between the two databases<br />
* Sharing concise descriptions of genes<br />
* Would be good for FlyBase and WB curators (Xiaodong?) to talk about where the links should exist at each site<br />
<br />
== August 6, 2015 ==<br />
<br />
=== WormMine ===<br />
*prioritize new data types into WormMine<br />
**RNAi phenotype, interactions, human disease...<br />
**WormMine wiki page: http://wiki.wormbase.org/index.php/WormMine<br />
<br />
=== WormMart machine ===<br />
*Wen wants to use the machine when WormMart retires<br />
=== UniProt/WormBase gene class ===<br />
*need to talk to the UniProt C. elegans curator<br />
<br />
=== Raymond, Chris and Juancarlos are working on phenotype viewer ===<br />
<br />
=== James: list of genes, enrich in what tissues ===<br />
*python code <br />
*biotype ontology, tissue expression from postgres as input</div>Xdwanghttps://wiki.wormbase.org/index.php?title=WBConfCall_2015.06.04-Agenda_and_Minutes&diff=26851WBConfCall 2015.06.04-Agenda and Minutes2015-06-04T17:46:47Z<p>Xdwang: /* Minutes */</p>
<hr />
<div>=Agenda=<br />
please submit your agenda items here<br />
<br />
== Community Annotation Forms ==<br />
* Mary Ann and Sibyl have setup the community curation portal page on staging: http://staging.wormbase.org/about/userguide/submit_data#0--10<br />
* We want to make sure that when WS248 goes live (this weekend?) that the links to the forms are appropriately updated<br />
* Question for webteam: It looks like these form links will be editable at any time by curators, is that correct?<br />
** If so, then we can update the links at our leisure<br />
** If not, and the links need to be updated before WS248, we need to make sure the links point to Tazendra, not Mangolassi<br />
* I will be making my poster and preparing the workshop talk next week so the sooner we finalise what we intend to promote (to a greater or lesser degree) the better. (mary ann)<br />
<br />
== Templates for IWM ==<br />
* Do we have a template for posters and talks for the meeting? (mary ann)<br />
<br />
== GO term enrichment (TE) tool for WormBase == <br />
* I've done some groundwork on how we could implement the Panther GO TE tool for both C. elegans and the other core species. (Jane Lomax)<br />
<br />
== Models for WS250 ==<br />
* Proposed change to ?GO_term model for WS250<br />
#Remove the Term tag. The name of the GO_term would be populated in the Name tag. This change will have downstream effects for the build and web display.<br />
#Add an Alt_id tag to populate secondary IDs as listed in the GO obo file. This would allow anyone searching WB with a secondary, or alternate, GO ID to still arrive at the correct GO_term page.<br />
#Although this doesn't affect any tags in the model, starting with WS250 we will populate the Version tag with the information in the 'data-version:' tag in the obo file, not the 'remark' tag that holds the SVN revision number. All terms have the Version tag populated, so we should remove the comment in the ?GO_term model about version only being stored on the 3 parent terms.<br />
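Both the Alt_id and Version changes above draw on fields in the GO OBO file. The minimal Python sketch below assumes the standard OBO 1.2 layout: a header containing a `data-version:` line, followed by `[Term]` stanzas with `id:`, `name:` and `alt_id:` lines. It shows how those values could be extracted and how a secondary ID could be resolved to its primary GO_term. The function names and the toy input are hypothetical, and this is not the WormBase build code.

```python
from typing import Dict, Iterable, Optional, Tuple


def parse_obo(lines: Iterable[str]) -> Tuple[Optional[str], Dict[str, dict]]:
    """Return (data_version, terms) from OBO-formatted lines.

    terms maps each primary term ID to {"name": str, "alt_ids": [str, ...]}.
    Only [Term] stanzas are kept; [Typedef] and other stanzas are skipped.
    """
    data_version: Optional[str] = None
    terms: Dict[str, dict] = {}
    current: Optional[dict] = None
    in_header = True
    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            # A stanza header ends the file header and starts a new record.
            in_header = False
            current = {"name": None, "alt_ids": []} if line == "[Term]" else None
            continue
        if ":" not in line:
            continue  # blank or malformed line
        tag, value = (part.strip() for part in line.split(":", 1))
        if in_header:
            if tag == "data-version":
                data_version = value  # candidate value for the Version tag
        elif current is not None:
            if tag == "id":
                terms[value] = current
            elif tag == "name":
                current["name"] = value  # candidate value for the Name tag
            elif tag == "alt_id":
                current["alt_ids"].append(value)  # candidate Alt_id values
    return data_version, terms


def alt_id_index(terms: Dict[str, dict]) -> Dict[str, str]:
    """Map every secondary (alternate) ID to its primary term ID."""
    return {alt: primary for primary, t in terms.items() for alt in t["alt_ids"]}


# Toy OBO snippet with made-up IDs and version, for illustration only.
SAMPLE = """format-version: 1.2
data-version: releases/2015-06-01

[Term]
id: GO:0000001
name: example term one
alt_id: GO:0000901
alt_id: GO:0000902

[Term]
id: GO:0000002
name: example term two

[Typedef]
id: part_of
name: part of
"""

version, terms = parse_obo(SAMPLE.splitlines())
print(version)                            # releases/2015-06-01
print(alt_id_index(terms)["GO:0000902"])  # GO:0000001
```

An index like `alt_id_index` is what would let a search for a secondary GO ID land on the primary GO_term page.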
<br />
* Proposed WS250 models dates (from Paul D.'s 13/05/15 message)<br />
**12/06/15 - Models Deadline<br />
**19/06/15 - Models CVS<br />
**23-29/06/15 - IWM<br />
<br />
== Link Outs to Same Site, Different Base URL ==<br />
* See github [https://github.com/WormBase/website/issues/3915 #3915]<br />
* Can we accommodate different base URLs for the same site, e.g. UniProt or RefSeq?<br />
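One way to accommodate several base URLs for a single site is to key each URL template on a pattern in the accession. The minimal Python sketch below illustrates this approach. The `LINK_OUTS` table, the `link_out` function and the chosen patterns are illustrative assumptions, not the WormBase website's actual configuration. The URL layouts (NCBI `protein`/`nuccore`, UniProt `uniprot`/`uniparc`) should be checked against each site before use.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical configuration: each external site maps to an ordered list of
# (accession pattern, URL template) pairs. The first matching pattern wins.
LINK_OUTS: Dict[str, List[Tuple[re.Pattern, str]]] = {
    "RefSeq": [
        # RefSeq proteins (NP_, XP_) and transcripts (NM_, NR_, XM_, XR_)
        (re.compile(r"^[NX]P_"), "https://www.ncbi.nlm.nih.gov/protein/{id}"),
        (re.compile(r"^[NX][MR]_"), "https://www.ncbi.nlm.nih.gov/nuccore/{id}"),
    ],
    "UniProt": [
        # UniParc identifiers start with "UPI"; anything else goes to UniProtKB.
        (re.compile(r"^UPI"), "https://www.uniprot.org/uniparc/{id}"),
        (re.compile(r"^[A-Z0-9]+"), "https://www.uniprot.org/uniprot/{id}"),
    ],
}


def link_out(site: str, accession: str) -> Optional[str]:
    """Return the URL for an accession, or None if no base URL applies."""
    for pattern, template in LINK_OUTS.get(site, []):
        if pattern.match(accession):
            return template.format(id=accession)
    return None
```

Because the patterns are tried in order, a more specific prefix (such as `UPI`) must be listed before a catch-all pattern for the same site.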
<br />
= Minutes =<br />
* '''GO term enrichment (TE) tool for WormBase'''<br />
** Database can be taken wholesale; the transition is easy for C. elegans, but we need to be the provider of the gene association files, including those for parasites and core species (e.g. C. briggsae), with no extra work: InterPro-to-GO, phenotype-to-GO (not from Tony). The GFF is generated at the end of the WormBase build.<br />
** Web interface: use AmiGO on the GO page and the PANTHER GO TE tool.<br />
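At the core of a GO term enrichment test is a separate over-representation test for each term. The plain-Python sketch below uses a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test. It is an illustration only, not PANTHER's implementation, and it applies no multiple-testing correction, which a real tool would need. The function names and inputs (a gene list, a background gene set and a term-to-genes mapping, such as one derived from a gene association file) are assumptions.

```python
from math import comb
from typing import Dict, List, Set, Tuple


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated, n drawn)."""
    upper = min(K, n)
    if k > upper:
        return 0.0
    # Sum the exact integer counts first, then divide once.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return hits / comb(N, n)


def enrichment(study: Set[str], background: Set[str],
               annotations: Dict[str, Set[str]]) -> List[Tuple[str, int, int, float]]:
    """Return (term, k, K, p) for each term hit by the study list, sorted by p.

    study: the input gene list (restricted to the background)
    background: all genes that could have been picked
    annotations: term ID mapped to the set of genes annotated to that term
    """
    study = study & background
    N, n = len(background), len(study)
    results = []
    for term, genes in annotations.items():
        genes = genes & background
        k, K = len(genes & study), len(genes)
        if k:
            results.append((term, k, K, hypergeom_sf(k, N, K, n)))
    return sorted(results, key=lambda r: r[3])
```

For example, if all 5 genes of a 10-gene background annotated to a term are also the 5 genes in the study list, the p-value is 1/252.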
<br />
<br />
*'''WS248 will most likely go live on Saturday morning'''<br />
<br />
<br />
*'''Community annotation forms'''<br />
**The concise description and allele forms are ready to go.<br />
**The micropublication form is waiting for more feedback from Oliver Hobert and Ian Hope.<br />
**The allele form link is partly hard-coded; Sibyl will help update it.<br />
**Links to the other forms can be edited by the curators who own them.<br />
<br />
<br />
*'''IWM preparation'''<br />
**Mary Ann and Scott Cain will present posters<br />
**Daniela will also talk about micropublication at the workshop<br />
<br />
<br />
*'''Link Outs to Same Site, Different Base URL'''<br />
**GitHub #3915<br />
**Todd and Sibyl are looking into it</div>Xdwanghttps://wiki.wormbase.org/index.php?title=WBConfCall_2015.06.04-Agenda_and_Minutes&diff=26849WBConfCall 2015.06.04-Agenda and Minutes2015-06-04T17:45:30Z<p>Xdwang: /* Minutes */</p>
<hr />
<div>=Agenda=<br />
Please submit your agenda items here.<br />
<br />
== Community Annotation Forms ==<br />
* Mary Ann and Sibyl have set up the community curation portal page on staging: http://staging.wormbase.org/about/userguide/submit_data#0--10<br />
* When WS248 goes live (this weekend?), we want to make sure the links to the forms are updated appropriately.<br />
* Question for the web team: it looks like curators will be able to edit these form links at any time. Is that correct?<br />
** If so, we can update the links at our leisure.<br />
** If not, and the links need to be updated before WS248, we need to make sure they point to Tazendra, not Mangolassi.<br />
* I will be making my poster and preparing the workshop talk next week, so the sooner we finalise what we intend to promote (to a greater or lesser degree), the better. (Mary Ann)<br />
<br />
== Templates for IWM ==<br />
* Do we have a template for posters and talks for the meeting? (Mary Ann)<br />
<br />
== GO term enrichment (TE) tool for WormBase == <br />
* I've done some groundwork on how we could implement the Panther GO TE tool for both C. elegans and the other core species. (Jane Lomax)<br />
<br />
== Models for WS250 ==<br />
* Proposed change to ?GO_term model for WS250<br />
#Remove the Term tag. The name of the GO_term would be populated in the Name tag. This change will have downstream effects for the build and web display.<br />
#Add an Alt_id tag to populate secondary IDs as listed in the GO obo file. This would allow anyone searching WB with a secondary, or alternate, GO ID to still arrive at the correct GO_term page.<br />
#Although this doesn't affect any tags in the model, starting with WS250 we will populate the Version tag with the information in the 'data-version:' tag in the obo file, not the 'remark' tag that holds the SVN revision number. All terms have the Version tag populated, so we should remove the comment in the ?GO_term model about version only being stored on the 3 parent terms.<br />
<br />
* Proposed WS250 models dates (from Paul D.'s 13/05/15 message)<br />
**12/06/15 - Models Deadline<br />
**19/06/15 - Models CVS<br />
**23-29/06/15 - IWM<br />
<br />
== Link Outs to Same Site, Different Base URL ==<br />
* See github [https://github.com/WormBase/website/issues/3915 #3915]<br />
* Can we accommodate different base URLs for the same site, e.g. UniProt or RefSeq?<br />
<br />
= Minutes =<br />
* '''GO term enrichment (TE) tool for WormBase'''<br />
** Database can be taken wholesale; the transition is easy for C. elegans, but we need to be the provider of the gene association files, including those for parasites and core species (e.g. C. briggsae), with no extra work: InterPro-to-GO, phenotype-to-GO (not from Tony). The GFF is generated at the end of the WormBase build; InterPro-to-GO for the other species.<br />
** Web interface: use AmiGO on the GO page and the PANTHER GO TE tool.<br />
<br />
<br />
*'''WS248 will most likely go live on Saturday morning'''<br />
<br />
<br />
*'''Community annotation forms'''<br />
**The concise description and allele forms are ready to go.<br />
**The micropublication form is waiting for more feedback from Oliver Hobert and Ian Hope.<br />
**The allele form link is partly hard-coded; Sibyl will help update it.<br />
**Links to the other forms can be edited by the curators who own them.<br />
<br />
<br />
*'''IWM preparation'''<br />
**Mary Ann and Scott Cain will present posters<br />
**Daniela will also talk about micropublication at the workshop<br />
<br />
<br />
*'''Link Outs to Same Site, Different Base URL'''<br />
**GitHub #3915<br />
**Todd and Sibyl are looking into it</div>Xdwanghttps://wiki.wormbase.org/index.php?title=WBConfCall_2015.06.04-Agenda_and_Minutes&diff=26848WBConfCall 2015.06.04-Agenda and Minutes2015-06-04T17:44:10Z<p>Xdwang: /* Minutes */</p>
<hr />
<div>=Agenda=<br />
Please submit your agenda items here.<br />
<br />
== Community Annotation Forms ==<br />
* Mary Ann and Sibyl have set up the community curation portal page on staging: http://staging.wormbase.org/about/userguide/submit_data#0--10<br />
* When WS248 goes live (this weekend?), we want to make sure the links to the forms are updated appropriately.<br />
* Question for the web team: it looks like curators will be able to edit these form links at any time. Is that correct?<br />
** If so, we can update the links at our leisure.<br />
** If not, and the links need to be updated before WS248, we need to make sure they point to Tazendra, not Mangolassi.<br />
* I will be making my poster and preparing the workshop talk next week, so the sooner we finalise what we intend to promote (to a greater or lesser degree), the better. (Mary Ann)<br />
<br />
== Templates for IWM ==<br />
* Do we have a template for posters and talks for the meeting? (Mary Ann)<br />
<br />
== GO term enrichment (TE) tool for WormBase == <br />
* I've done some groundwork on how we could implement the Panther GO TE tool for both C. elegans and the other core species. (Jane Lomax)<br />
<br />
== Models for WS250 ==<br />
* Proposed change to ?GO_term model for WS250<br />
#Remove the Term tag. The name of the GO_term would be populated in the Name tag. This change will have downstream effects for the build and web display.<br />
#Add an Alt_id tag to populate secondary IDs as listed in the GO obo file. This would allow anyone searching WB with a secondary, or alternate, GO ID to still arrive at the correct GO_term page.<br />
#Although this doesn't affect any tags in the model, starting with WS250 we will populate the Version tag with the information in the 'data-version:' tag in the obo file, not the 'remark' tag that holds the SVN revision number. All terms have the Version tag populated, so we should remove the comment in the ?GO_term model about version only being stored on the 3 parent terms.<br />
<br />
* Proposed WS250 models dates (from Paul D.'s 13/05/15 message)<br />
**12/06/15 - Models Deadline<br />
**19/06/15 - Models CVS<br />
**23-29/06/15 - IWM<br />
<br />
== Link Outs to Same Site, Different Base URL ==<br />
* See github [https://github.com/WormBase/website/issues/3915 #3915]<br />
* Can we accommodate different base URLs for the same site, e.g. UniProt or RefSeq?<br />
<br />
= Minutes =<br />
* GO term enrichment (TE) tool for WormBase<br />
** Database can be taken wholesale; the transition is easy for C. elegans, but we need to be the provider of the gene association files, including those for parasites and core species (e.g. C. briggsae), with no extra work: InterPro-to-GO, phenotype-to-GO (not from Tony). The GFF is generated at the end of the WormBase build; InterPro-to-GO for the other species.<br />
** Web interface: use AmiGO on the GO page and the PANTHER GO TE tool.<br />
<br />
<br />
*WS248 will most likely go live on Saturday morning<br />
<br />
<br />
*Community annotation forms<br />
**The concise description and allele forms are ready to go.<br />
**The micropublication form is waiting for more feedback from Oliver Hobert and Ian Hope.<br />
**The allele form link is partly hard-coded; Sibyl will help update it.<br />
**Links to the other forms can be edited by the curators who own them.<br />
<br />
<br />
*IWM preparation<br />
**Mary Ann and Scott Cain will present posters<br />
**Daniela will also talk about micropublication at the workshop<br />
<br />
<br />
*Link Outs to Same Site, Different Base URL<br />
**GitHub #3915<br />
**Todd and Sibyl are looking into it</div>Xdwanghttps://wiki.wormbase.org/index.php?title=WBConfCall_2015.06.04-Agenda_and_Minutes&diff=26847WBConfCall 2015.06.04-Agenda and Minutes2015-06-04T17:43:36Z<p>Xdwang: /* Minutes */</p>
<hr />
<div>=Agenda=<br />
Please submit your agenda items here.<br />
<br />
== Community Annotation Forms ==<br />
* Mary Ann and Sibyl have set up the community curation portal page on staging: http://staging.wormbase.org/about/userguide/submit_data#0--10<br />
* When WS248 goes live (this weekend?), we want to make sure the links to the forms are updated appropriately.<br />
* Question for the web team: it looks like curators will be able to edit these form links at any time. Is that correct?<br />
** If so, we can update the links at our leisure.<br />
** If not, and the links need to be updated before WS248, we need to make sure they point to Tazendra, not Mangolassi.<br />
* I will be making my poster and preparing the workshop talk next week, so the sooner we finalise what we intend to promote (to a greater or lesser degree), the better. (Mary Ann)<br />
<br />
== Templates for IWM ==<br />
* Do we have a template for posters and talks for the meeting? (Mary Ann)<br />
<br />
== GO term enrichment (TE) tool for WormBase == <br />
* I've done some groundwork on how we could implement the Panther GO TE tool for both C. elegans and the other core species. (Jane Lomax)<br />
<br />
== Models for WS250 ==<br />
* Proposed change to ?GO_term model for WS250<br />
#Remove the Term tag. The name of the GO_term would be populated in the Name tag. This change will have downstream effects for the build and web display.<br />
#Add an Alt_id tag to populate secondary IDs as listed in the GO obo file. This would allow anyone searching WB with a secondary, or alternate, GO ID to still arrive at the correct GO_term page.<br />
#Although this doesn't affect any tags in the model, starting with WS250 we will populate the Version tag with the information in the 'data-version:' tag in the obo file, not the 'remark' tag that holds the SVN revision number. All terms have the Version tag populated, so we should remove the comment in the ?GO_term model about version only being stored on the 3 parent terms.<br />
<br />
* Proposed WS250 models dates (from Paul D.'s 13/05/15 message)<br />
**12/06/15 - Models Deadline<br />
**19/06/15 - Models CVS<br />
**23-29/06/15 - IWM<br />
<br />
== Link Outs to Same Site, Different Base URL ==<br />
* See github [https://github.com/WormBase/website/issues/3915 #3915]<br />
* Can we accommodate different base URLs for the same site, e.g. UniProt or RefSeq?<br />
<br />
= Minutes =<br />
* GO term enrichment (TE) tool for WormBase<br />
** Database can be taken wholesale; the transition is easy for C. elegans, but we need to be the provider of the gene association files, including those for parasites and core species (e.g. C. briggsae), with no extra work: InterPro-to-GO, phenotype-to-GO (not from Tony). The GFF is generated at the end of the WormBase build; InterPro-to-GO for the other species.<br />
** Web interface: use AmiGO on the GO page and the PANTHER GO TE tool.<br />
<br />
<br />
*WS248 will most likely go live on Saturday morning<br />
<br />
<br />
*Community annotation forms<br />
**The concise description and allele forms are ready to go.<br />
**The micropublication form is waiting for more feedback from Oliver Hobert and Ian Hope.<br />
**The allele form link is partly hard-coded; Sibyl will help update it.<br />
**Links to the other forms can be edited by the curators who own them.<br />
<br />
<br />
*IWM preparation<br />
**Mary Ann and Scott Cain will present posters<br />
**Daniela will also talk about micropublication at the workshop<br />
<br />
<br />
*Link Outs to Same Site, Different Base URL<br />
**GitHub #3915<br />
**Todd and Sibyl are looking into it</div>Xdwanghttps://wiki.wormbase.org/index.php?title=WBConfCall_2015.06.04-Agenda_and_Minutes&diff=26846WBConfCall 2015.06.04-Agenda and Minutes2015-06-04T17:40:43Z<p>Xdwang: /* Minutes */</p>
<hr />
<div>=Agenda=<br />
Please submit your agenda items here.<br />
<br />
== Community Annotation Forms ==<br />
* Mary Ann and Sibyl have set up the community curation portal page on staging: http://staging.wormbase.org/about/userguide/submit_data#0--10<br />
* When WS248 goes live (this weekend?), we want to make sure the links to the forms are updated appropriately.<br />
* Question for the web team: it looks like curators will be able to edit these form links at any time. Is that correct?<br />
** If so, we can update the links at our leisure.<br />
** If not, and the links need to be updated before WS248, we need to make sure they point to Tazendra, not Mangolassi.<br />
* I will be making my poster and preparing the workshop talk next week, so the sooner we finalise what we intend to promote (to a greater or lesser degree), the better. (Mary Ann)<br />
<br />
== Templates for IWM ==<br />
* Do we have a template for posters and talks for the meeting? (Mary Ann)<br />
<br />
== GO term enrichment (TE) tool for WormBase == <br />
* I've done some groundwork on how we could implement the Panther GO TE tool for both C. elegans and the other core species. (Jane Lomax)<br />
<br />
== Models for WS250 ==<br />
* Proposed change to ?GO_term model for WS250<br />
#Remove the Term tag. The name of the GO_term would be populated in the Name tag. This change will have downstream effects for the build and web display.<br />
#Add an Alt_id tag to populate secondary IDs as listed in the GO obo file. This would allow anyone searching WB with a secondary, or alternate, GO ID to still arrive at the correct GO_term page.<br />
#Although this doesn't affect any tags in the model, starting with WS250 we will populate the Version tag with the information in the 'data-version:' tag in the obo file, not the 'remark' tag that holds the SVN revision number. All terms have the Version tag populated, so we should remove the comment in the ?GO_term model about version only being stored on the 3 parent terms.<br />
<br />
* Proposed WS250 models dates (from Paul D.'s 13/05/15 message)<br />
**12/06/15 - Models Deadline<br />
**19/06/15 - Models CVS<br />
**23-29/06/15 - IWM<br />
<br />
== Link Outs to Same Site, Different Base URL ==<br />
* See github [https://github.com/WormBase/website/issues/3915 #3915]<br />
* Can we accommodate different base URLs for the same site, e.g. UniProt or RefSeq?<br />
<br />
= Minutes =<br />
* GO term enrichment (TE) tool for WormBase<br />
** Database can be taken wholesale; the transition is easy for C. elegans, but we need to be the provider of the gene association files, including those for parasites and core species (e.g. C. briggsae), with no extra work: InterPro-to-GO, phenotype-to-GO (not from Tony). The GFF is generated at the end of the WormBase build; InterPro-to-GO for the other species.<br />
** Web interface: use AmiGO on the GO page and the PANTHER GO TE tool.<br />
<br />
<br />
*WS248 will most likely go live on Saturday morning<br />
<br />
<br />
*Community annotation forms<br />
**The concise description and allele forms are ready to go.<br />
**The micropublication form is waiting for more feedback from Oliver Hobert and Ian Hope.<br />
**The allele form link is partly hard-coded; Sibyl will help update it.<br />
**Links to the other forms can be edited by the curators who own them.<br />
<br />
<br />
*IWM preparation<br />
**Mary Ann and Scott Cain will present posters<br />
**Daniela will also talk about micropublication at the workshop<br />
<br />
<br />
*Link Outs to Same Site, Different Base URL<br />
**GitHub #3915<br />
**Todd and Sibyl are looking into it</div>Xdwanghttps://wiki.wormbase.org/index.php?title=WBConfCall_2015.06.04-Agenda_and_Minutes&diff=26835WBConfCall 2015.06.04-Agenda and Minutes2015-06-04T14:00:57Z<p>Xdwang: /* Agenda */</p>
<hr />
<div>=Agenda=<br />
Please submit your agenda items here.<br />
<br />
== Community Annotation Forms ==<br />
* Mary Ann and Sibyl have set up the community curation portal page on staging: http://staging.wormbase.org/about/userguide/submit_data#0--10<br />
* When WS248 goes live (this weekend?), we want to make sure the links to the forms are updated appropriately.<br />
* Question for the web team: it looks like curators will be able to edit these form links at any time. Is that correct?<br />
** If so, we can update the links at our leisure.<br />
** If not, and the links need to be updated before WS248, we need to make sure they point to Tazendra, not Mangolassi.<br />
* I will be making my poster and preparing the workshop talk next week, so the sooner we finalise what we intend to promote (to a greater or lesser degree), the better. (Mary Ann)<br />
<br />
== Templates for IWM ==<br />
* Do we have a template for posters and talks for the meeting? (Mary Ann)<br />
<br />
== GO term enrichment (TE) tool for WormBase == <br />
* I've done some groundwork on how we could implement the Panther GO TE tool for both C. elegans and the other core species. (Jane Lomax)<br />
<br />
= Minutes =</div>Xdwanghttps://wiki.wormbase.org/index.php?title=WBConfCall_2015.06.04-Agenda_and_Minutes&diff=26818WBConfCall 2015.06.04-Agenda and Minutes2015-06-02T17:12:37Z<p>Xdwang: Created page with "=Agenda= please submit your agenda items here = Minutes ="</p>
<hr />
<div>=Agenda=<br />
Please submit your agenda items here.<br />
<br />
= Minutes =</div>Xdwanghttps://wiki.wormbase.org/index.php?title=Help_desk_schedule&diff=26817Help desk schedule2015-06-02T17:11:20Z<p>Xdwang: /* 2015 Officers */</p>
<hr />
<div>__TOC__ <br />
<br />
==='''Working Document'''===<br />
<br />
A document to help the help desk officer under the current GitHub scheme:<br />
<br />
[[Github_Helpdesk_Working_Document]] <br />
<br />
==='''Queue'''===<br />
Help desk duty is a rotation: everyone serves once, then it loops again in the same order.<br />
<br />
Any new people get inserted between the current officer and the end of the queue. <br />
<br />
If anyone needs to be inserted somewhere in the middle of the existing schedule, please keep the dates matching with the rotation order (email Juancarlos if you don't want to line up all the dates yourself).<br />
<br />
The current queue is: Xiaodong Wang, Thomas Down, Michael Paulini, Gary Williams, Daniel Wang, Gary Schindelman, Todd Harris, Ranjana Kishore, James Done, Paul Davis, Chris Grove, Bruce Bolt, Kevin Howe, Daniela Raciti, Yuling Li, Karen Yook, Jane Lomax, Scott Cain, Raymond Lee, Kimberly Van Auken, Sibyl Gao, Wen Chen, Juancarlos Chan, Cecilia Nakamura, Mary Ann Tuli.<br />
<br />
PIs are not on the roster. Michael Mueller is handling all textpresso-help emails.<br />
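The rotation rules above can be written out as a small schedule generator, which helps when lining up dates for the tables below. This Python sketch assumes a 14-day shift starting on a Monday (inferred from the dates in the tables). `add_newcomer` reflects one reading of the insertion rule: the newcomer is placed just before the current officer in cyclic order, so they serve after everyone else in the current round. Both function names are hypothetical.

```python
from datetime import date, timedelta
from typing import List, Tuple


def rotation(queue: List[str], first_officer: str, first_day: date,
             shifts: int, days_per_shift: int = 14) -> List[Tuple[date, str]]:
    """List (first day on duty, officer) pairs, looping through the queue in order."""
    start = queue.index(first_officer)
    return [(first_day + timedelta(days=days_per_shift * s),
             queue[(start + s) % len(queue)])
            for s in range(shifts)]


def add_newcomer(queue: List[str], current_officer: str, name: str) -> List[str]:
    """Insert a newcomer just before the current officer (cyclic end of the round)."""
    i = queue.index(current_officer)
    return queue[:i] + [name] + queue[i:]


QUEUE = [
    "Xiaodong Wang", "Thomas Down", "Michael Paulini", "Gary Williams",
    "Daniel Wang", "Gary Schindelman", "Todd Harris", "Ranjana Kishore",
    "James Done", "Paul Davis", "Chris Grove", "Bruce Bolt", "Kevin Howe",
    "Daniela Raciti", "Yuling Li", "Karen Yook", "Jane Lomax", "Scott Cain",
    "Raymond Lee", "Kimberly Van Auken", "Sibyl Gao", "Wen Chen",
    "Juancarlos Chan", "Cecilia Nakamura", "Mary Ann Tuli",
]

for day, officer in rotation(QUEUE, "Karen Yook", date(2016, 1, 4), 3):
    print(day.isoformat(), officer)
# 2016-01-04 Karen Yook
# 2016-01-18 Jane Lomax
# 2016-02-01 Scott Cain
```

Running it for 26 shifts reproduces the 2016 table, ending with Karen Yook on 2016-12-19.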
<br />
=== 2016 Officers ===<br />
<br />
Date refers to first day on duty. <br />
<br />
{| border="1" class="wikitable"<br />
|-<br />
| bgcolor="gray" | Date<br />
| bgcolor="gray" | Curator<br />
| bgcolor="gray" | Agenda/Minutes<br />
|-<br />
|2016-01-04<br />
|Karen Yook<br />
|<br />
|-<br />
|2016-01-18<br />
|Jane Lomax<br />
|<br />
|-<br />
|2016-02-01<br />
|Scott Cain<br />
|<br />
|-<br />
|2016-02-15<br />
|Raymond Lee<br />
|<br />
|-<br />
|2016-02-29<br />
|Kimberly Van Auken<br />
|<br />
|-<br />
|2016-03-14<br />
|Sibyl Gao<br />
|<br />
|-<br />
|2016-03-28<br />
|Wen Chen<br />
|<br />
|-<br />
|2016-04-11<br />
|Juancarlos Chan<br />
|<br />
|-<br />
|2016-04-25<br />
|Cecilia Nakamura<br />
|<br />
|-<br />
|2016-05-09<br />
|Mary Ann Tuli<br />
|<br />
|-<br />
|2016-05-23<br />
|Xiaodong Wang<br />
|<br />
|-<br />
|2016-06-06<br />
|Thomas Down<br />
|<br />
|-<br />
|2016-06-20<br />
|Michael Paulini<br />
|<br />
|-<br />
|2016-07-04<br />
|Gary Williams<br />
|<br />
|-<br />
|2016-07-18<br />
|Daniel Wang<br />
|<br />
|-<br />
|2016-08-01<br />
|Gary Schindelman<br />
|<br />
|-<br />
|2016-08-15<br />
|Todd Harris<br />
|<br />
|-<br />
|2016-08-29<br />
|Ranjana Kishore<br />
|<br />
|-<br />
|2016-09-12<br />
|James Done<br />
|<br />
|-<br />
|2016-09-26<br />
|Paul Davis<br />
|<br />
|-<br />
|2016-10-10<br />
|Chris Grove<br />
|<br />
|-<br />
|2016-10-24<br />
|Bruce Bolt<br />
|<br />
|-<br />
|2016-11-07<br />
|Kevin Howe<br />
|<br />
|-<br />
|2016-11-21<br />
|Daniela Raciti<br />
|<br />
|-<br />
|2016-12-05<br />
|Yuling Li<br />
|<br />
|-<br />
|2016-12-19<br />
|Karen Yook<br />
|<br />
|}<br />
<br />
=== 2015 Officers ===<br />
<br />
Date refers to first day on duty. <br />
<br />
{| border="1" class="wikitable"<br />
|-<br />
| bgcolor="gray" | Date<br />
| bgcolor="gray" | Curator<br />
| bgcolor="gray" | Agenda/Minutes<br />
|-<br />
|<strike>2015-01-05</strike><br />
|Daniela Raciti<br />
|[[WBConfCall_2015.01.15-Agenda_and_Minutes | Conference Call Minutes]]<br />
|-<br />
|<strike>2015-01-19</strike><br />
|Yuling Li<br />
|no call<br />
|-<br />
|<strike>2015-02-02</strike><br />
|Karen Yook<br />
|[[WBConfCall_2015.02.5-Agenda_and_Minutes|Conference Call Minutes]]<br />
|-<br />
|<strike>2015-02-16</strike><br />
|Scott Cain<br />
|[[WBConfCall_2015.02.19-Agenda_and_Minutes|Conference Call Minutes]]<br />
|-<br />
|<strike>2015-03-02</strike><br />
|Raymond Lee<br />
|[[WBConfCall_2015.03.05-Agenda_and_Minutes|Conference Call Minutes]]<br />
|-<br />
|<strike>2015-03-16</strike><br />
|Kimberly Van Auken<br />
|[[WBConfCall_2015.03.19-Agenda_and_Minutes|Conference Call Minutes]]<br />
|-<br />
|<strike>2015-03-30</strike><br />
|Sibyl Gao<br />
|[[WBConfCall_2015.04.02-Agenda_and_Minutes|Conference Call Minutes]]<br />
|-<br />
|<strike>2015-04-13</strike><br />
|Wen Chen<br />
|[[WBConfCall_2015.04.16-Agenda_and_Minutes|Conference Call Minutes]]<br />
|-<br />
|<strike>2015-04-27</strike><br />
|Juancarlos Chan<br />
|[[WBConfCall_2015.05.07-Agenda_and_Minutes|Conference Call Minutes]]<br />
|-<br />
|<strike>2015-05-11</strike><br />
|Cecilia Nakamura<br />
|[[WBConfCall_2015.05.21-Agenda_and_Minutes|Conference Call Minutes]]<br />
|-<br />
|2015-05-25<br />
|Xiaodong Wang<br />
|[[WBConfCall_2015.06.04-Agenda_and_Minutes|Conference Call Minutes]]<br />
|-<br />
|2015-06-08<br />
|Mary Ann Tuli<br />
|<br />
|-<br />
|2015-06-22<br />
|Thomas Down<br />
|<br />
|-<br />
|2015-07-06<br />
|Michael Paulini<br />
|<br />
|-<br />
|2015-07-20<br />
|Gary Williams<br />
|<br />
|-<br />
|2015-08-03<br />
|Daniel Wang<br />
|<br />
|-<br />
|2015-08-17<br />
|Gary Schindelman<br />
|<br />
|-<br />
|2015-08-31<br />
|Todd Harris<br />
|<br />
|-<br />
|2015-09-14<br />
|Ranjana Kishore<br />
|<br />
|-<br />
|2015-09-28<br />
|James Done<br />
|<br />
|-<br />
|2015-10-12<br />
|Paul Davis<br />
|<br />
|-<br />
|2015-10-26<br />
|Chris Grove<br />
|<br />
|-<br />
|2015-11-09<br />
|Bruce Bolt<br />
|<br />
|-<br />
|2015-11-23<br />
|Kevin Howe<br />
|<br />
|-<br />
|2015-12-07<br />
|Daniela Raciti<br />
|<br />
|-<br />
|2015-12-21<br />
|Yuling Li<br />
|<br />
|}<br />
<br />
=== 2014 Officers ===<br />
<br />
Date refers to first day on duty. <br />
<br />
{| border="1" class="wikitable"<br />
|-<br />
| bgcolor="gray" | Date<br />
| bgcolor="gray" | Curator<br />
| bgcolor="gray" | Agenda/Minutes<br />
|-<br />
|<strike>2014-01-06</strike><br />
|James Done<br />
|[[WBConfCall_2014.01.09-Agenda_and_Minutes | Conference call minutes]]<br />
|-<br />
|<strike>2014-01-20</strike><br />
|Paul Davis<br />
|[[WBConfCall_2014.01.23-Agenda_and_Minutes | Conference call minutes]]<br />
|-<br />
|<strike>2014-02-03</strike><br />
|Abigail Cabunoc<br />
|[[WBConfCall_2014.02.06-Agenda_and_Minutes | Conference call minutes]]<br />
|-<br />
|<strike>2014-02-17</strike><br />
|Chris Grove<br />
|[[WBConfCall_2014.02.20-Agenda_and_Minutes | Conference call minutes]]<br />
|-<br />
|<strike>2014-03-03</strike><br />
|Kevin Howe<br />
|[[WBConfCall_2014.03.06-Agenda_and_Minutes | Conference call minutes]]<br />
|-<br />
|<strike>2014-03-17</strike><br />
|Daniela Raciti<br />
|[[WBConfCall_2014.03.20-Agenda_and_Minutes | Conference call minutes]]<br />
|-<br />
|<strike>2014-03-31</strike><br />
|Yuling Li<br />
|[[WBConfCall_2014.04.03-Agenda_and_Minutes | Conference call minutes]]<br />
|-<br />
|<strike>2014-04-14</strike><br />
|Karen Yook <br />
|No call<br />
|-<br />
|<strike>2014-04-28</strike><br />
|Raymond Lee<br />
|[[WBConfCall_2014.05.01-Agenda_and_Minutes | Conference call minutes]]<br />
|-<br />
|<strike>2014-05-12</strike><br />
|Kimberly Van Auken<br />
|no call<br />
|-<br />
|<strike>2014-05-26</strike><br />
|Wen Chen<br />
|[[WBConfCall_2014.06.05-Agenda_and_Minutes | Conference call minutes]]<br />
|-<br />
|<strike>2014-06-09</strike><br />
|Juancarlos Chan<br />
|no call<br />
|-<br />
|<strike>2014-06-23</strike><br />
|Cecilia Nakamura<br />
|[[WBConfCall_2014.07.03-Agenda_and_Minutes | Conference call minutes]]<br />
|-<br />
|<strike>2014-07-07</strike><br />
|Mary Ann Tuli<br />
|[[WBConfCall_2014.07.17-Agenda_and_Minutes | Conference call minutes]]<br />
|-<br />
|<strike>2014-07-21</strike><br />
|Xiaodong Wang<br />
|[[WBConfCall_2014.07.31-Agenda_and_Minutes | Conference call minutes]]<br />
|-<br />
|<strike>2014-08-04</strike><br />
|Michael Paulini<br />
|[[WBConfCall_2014.08.07-Agenda_and_Minutes | Conference call minutes]]<br />
|-<br />
|<strike>2014-08-18</strike><br />
|Gary Williams<br />
|[[WBConfCall_2014.08.21-Agenda_and_Minutes | Conference call minutes]]<br />
|-<br />
|<strike>2014-09-01</strike><br />
|Daniel Wang<br />
|[[WBConfCall_2014.09.03-Agenda_and_Minutes | Conference call minutes]]<br />
|-<br />
|<strike>2014-09-15</strike><br />
|Gary Schindelman<br />
|[[WBConfCall_2014.09.18-Agenda_and_Minutes | Conference call minutes]]<br />
|-<br />
|<strike>2014-09-29</strike><br />
|Todd Harris<br />
|[[WBConfCall_2014.10.02-Agenda_and_Minutes | Conference call minutes]]<br />
|-<br />
|<strike>2014-10-13</strike><br />
|Ranjana Kishore<br />
|[[WBConfCall_2014.10.16-Agenda_and_Minutes | Conference call minutes]]<br />
|-<br />
|<strike>2014-10-27</strike><br />
|James Done<br />
|[[WBConfCall_2014.11.06-Agenda_and_Minutes | Conference call minutes]]<br />
|-<br />
|<strike>2014-11-10</strike><br />
|Paul Davis<br />
|[[WBConfCall_2014.11.13-Agenda_and_Minutes | Conference call minutes]]<br />
|-<br />
|<strike>2014-11-24</strike><br />
|Chris Grove<br />
|[[WBConfCall_2014.12.4-Agenda_and_Minutes | Conference call minutes]]<br />
|-<br />
|<strike>2014-12-08</strike><br />
|Bruce Bolt<br />
|[[WBConfCall_2014.12.18-Agenda_and_Minutes | Conference call minutes]]<br />
|-<br />
|<strike>2014-12-22</strike><br />
|Kevin Howe<br />
| no call<br />
|}<br />
<br />
=== 2013 Officers ===<br />
<br />
Date refers to first day on duty. <br />
<br />
{| border="1" class="wikitable"<br />
|-<br />
| bgcolor="gray" | Date<br />
| bgcolor="gray" | Curator<br />
| bgcolor="gray" | Agenda/Minutes<br />
|-<br />
|<strike>2013-01-07</strike> <br />
|Gary Schindelman<br />
|[[WBConfCall_2013.01.10-Agenda_and_Minutes | Conference call Minutes]]<br />
|-<br />
|<strike>2013-01-21 </strike><br />
|Todd Harris<br />
|[[WBConfCall_2013.01.24-Agenda_and_Minutes | Conference call Minutes]]<br />
|-<br />
|<strike>2013-02-04 </strike><br />
|Ranjana Kishore<br />
|[[WBConfCall_2013.02.07-Agenda_and_Minutes | Conference call Minutes]]<br />
|-<br />
|<strike>2013-02-18 </strike><br />
|Daniela Raciti<br />
|[[WBConfCall_2013.02.21-Agenda_and_Minutes | Conference call Minutes]]<br />
|-<br />
|<strike>2013-03-04 </strike><br />
|Paul Davis<br />
|[[WBConfCall_2013.03.07-Agenda_and_Minutes | Conference call Minutes]]<br />
|-<br />
|<strike>2013-03-18 </strike><br />
|Abigail Cabunoc<br />
|[[WBConfCall_2013.03.21-Agenda_and_Minutes | Conference call Minutes]]<br />
|-<br />
|<strike>2013-04-01</strike><br />
|Chris Grove<br />
|[[WBConfCall_2013.04.04-Agenda_and_Minutes | Conference call Minutes]]<br />
|-<br />
|<strike>2013-04-15 </strike><br />
|Phil Ozersky<br />
|[[WBConfCall_2013.04.18-Agenda_and_Minutes | Conference call Minutes]]<br />
|-<br />
|<strike>2013-04-29</strike><br />
|Kevin Howe<br />
|[[WBConfCall_2013.05.02-Agenda_and_Minutes | Conference call Minutes]]<br />
|-<br />
|<strike>2013-05-13 </strike><br />
|James Done<br />
|[[WBConfCall_2013.05.16-Agenda_and_Minutes | Conference call Minutes]]<br />
|-<br />
|<strike>2013-05-27 </strike><br />
|Raymond Lee<br />
|[[WBConfCall_2013.06.06-Agenda_and_Minutes | Conference call Minutes]]<br />
|-<br />
|<strike>2013-06-10 </strike><br />
|Karen Yook<br />
|[[WBConfCall_2013.06.20-Agenda_and_Minutes | Conference call Minutes]]<br />
|-<br />
|<strike>2013-06-24 </strike><br />
|Yuling Li<br />
|No Call<br />
|-<br />
|<strike>2013-07-08 </strike><br />
|Kimberly Van Auken<br />
|[[WBConfCall_2013.07.18-Agenda_and_Minutes | Conference call Minutes]]<br />
|-<br />
|<strike>2013-07-22</strike> <br />
|Wen Chen<br />
|[[WBConfCall_2013.08.01-Agenda_and_Minutes | Conference call Minutes]]<br />
|-<br />
|<strike>2013-08-05</strike><br />
|Juancarlos Chan<br />
|[[WBConfCall_2013.08.15-Agenda_and_Minutes | Conference call Minutes]]<br />
|-<br />
|<strike>2013-08-19</strike><br />
|Cecilia Nakamura<br />
Juancarlos filling in, no call<br />
|-<br />
|<strike>2013-09-02</strike> <br />
|Mary Ann Tuli<br />
|[[WBConfCall_2013.09.05-Agenda_and_Minutes | Conference call Minutes]]<br />
|-<br />
|<strike>2013-09-16</strike><br />
|JD Wong<br />
|[[WBConfCall_2013.09.19-Agenda_and_Minutes | Conference call Minutes]]<br />
|-<br />
|<strike>2013-09-30 </strike><br />
|Xiaodong Wang<br />
|[[WBConfCall_2013.10.02-Agenda_and_Minutes | Conference call Minutes]]<br />
|-<br />
|<strike>2013-10-14</strike><br />
|Michael Paulini <br />
| [[WBConfCall_2013.10.17-Agenda_and_Minutes | Conference call Minutes]]<br />
|-<br />
|<strike>2013-10-28</strike><br />
|Gary Williams<br />
| [[WBConfCall_2013.11.07-Agenda_and_Minutes | Conference call Minutes]]<br />
|-<br />
|<strike>2013-11-11</strike><br />
|Daniel Wang<br />
| [[WBConfCall_2013.11.21-Agenda_and_Minutes | Conference call Minutes]]<br />
|-<br />
|<strike>2013-11-25</strike><br />
|Gary Schindelman<br />
|[[WBConfCall_2013.12.05-Agenda_and_Minutes | Conference call Minutes]]<br />
|-<br />
|2013-12-09<br />
|Todd Harris<br />
|<br />
|-<br />
|2013-12-23<br />
|Ranjana Kishore<br />
|<br />
|}<br />
<br />
=== 2012 Officers ===<br />
<br />
Date refers to first day on duty.<br />
<br />
{| border="1" class="wikitable"<br />
|-<br />
| bgcolor="gray" | Date<br />
| bgcolor="gray" | Curator<br />
| bgcolor="gray" | Agenda/Minutes<br />
|-<br />
|<strike>2012-01-09</strike><br />
|Tamberlyn Bieri<br />
|<br />
|-<br />
|<strike>2012-01-23 </strike><br />
|Michael Paulini <br />
|[[WBConfCall_2012.02.02-Agenda_and_Minutes | Conference call Minutes]]<br />
|-<br />
|<strike>2012-02-06 </strike><br />
|Gary Williams <br />
|[[WBConfCall_2012.02.16-Agenda_and_Minutes | Conference call Minutes]]<br />
|-<br />
|<strike>2012-02-20 </strike><br />
|Daniel Wang <br />
|[[WBConfCall_2012.03.01-Agenda_and_Minutes | Conference call Minutes]]<br />
|-<br />
|<strike>2012-03-05 </strike><br />
|Gary Schindelman <br />
|[[WBConfCall_2012.03.15-Agenda_and_Minutes | Conference call Minutes]]<br />
|-<br />
|<strike>2012-03-19 </strike><br />
|Todd Harris <br />
|[[WBConfCall_2012.04.05-Agenda_and_Minutes | Conference call Minutes]]<br />
|-<br />
|<strike>2012-04-02 </strike><br />
|Ranjana Kishore<br />
|<br />
|-<br />
|<strike>2012-04-16 </strike> <br />
|Paul Davis <br />
|[[WBConfCall_2012.04.19-Agenda_and_Minutes | Conference call Minutes]]<br />
|-<br />
|<strike>2012-04-30 </strike><br />
|Abigail Cabunoc <br />
|[[WBConfCall_2012.05.03-Agenda_and_Minutes | Conference call Minutes]]<br />
|-<br />
|<strike>2012-05-14 </strike> <br />
|Chris Grove <br />
|[[WBConfCall_2012.05.17-Agenda_and_Minutes | Conference call Minutes]]<br />
|-<br />
|<strike>2012-05-28 </strike><br />
|Phil Ozersky <br />
|[[WBConfCall_2012.06.07-Agenda_and_Minutes | Conference call Minutes]]<br />
|-<br />
|<strike>2012-06-11</strike><br />
|Daniela Raciti <br />
|[[WBConfCall_2012.06.21-Agenda_and_Minutes | Conference call Minutes]]<br />
|-<br />
|<strike>2012-06-25</strike><br />
|Kevin Howe<br />
|[[WBConfCall_2012.07.05-Agenda_and_Minutes | Conference call Minutes]]<br />
|-<br />
|<strike>2012-07-09</strike><br />
|Yuling Li<br />
|[[WBConfCall_2012.07.019-Agenda_and_Minutes | Conference call Minutes]]<br />
|-<br />
|<strike>2012-07-23</strike><br />
|Karen Yook<br />
|[[WBConfCall_2012.08.02-Agenda_and_Minutes | Conference call Minutes]]<br />
|-<br />
|<strike>2012-08-06</strike><br />
|Raymond Lee<br />
|[[WBConfCall_2012.08.16-Agenda_and_Minutes | Conference call Minutes]]<br />
|-<br />
|<strike>2012-08-20</strike><br />
|Kimberly Van Auken<br />
|<br />
|-<br />
|<strike>2012-09-03</strike><br />
|Wen Chen <br />
|[[WBConfCall_2012.09.06-Agenda_and_Minutes | Conference call Minutes]]<br />
|-<br />
|<strike>2012-09-17 </strike><br />
|Juancarlos Chan<br />
|[[WBConfCall_2012.09.20-Agenda_and_Minutes | Conference call Minutes]]<br />
|-<br />
|<strike>2012-10-01 </strike><br />
|Cecilia Nakamura<br />
|[[WBConfCall_2012.10.03-Agenda_and_Minutes | Conference call Minutes]]<br />
|-<br />
|<strike>2012-10-15 </strike><br />
|Mary Ann Tuli<br />
|[[WBConfCall_2012.10.18-Agenda_and_Minutes | Conference call Minutes]]<br />
|-<br />
|<strike>2012-10-29 </strike><br />
|Xiaodong Wang<br />
|[[WBConfCall_2012.11.01-Agenda_and_Minutes | Conference call Minutes]]<br />
|-<br />
|2012-11-12 <br />
|Tamberlyn Bieri<br />
|<br />
|-<br />
|2012-11-26 <br />
|Michael Paulini<br />
|[[WBConfCall_2012.12.06-Agenda_and_Minutes | Conference call Minutes]]<br />
|-<br />
|2012-12-10 <br />
|Gary Williams<br />
|[[WBConfCall_2012.12.20-Agenda_and_Minutes | Conference call Minutes]]<br />
|-<br />
|2012-12-24 <br />
|Daniel Wang<br />
|<br />
|}<br />
<br />
=== 2011 Officers ===<br />
<br />
{| border="1" class="wikitable"<br />
|-<br />
| bgcolor="gray" | Date<br />
| bgcolor="gray" | Curator<br />
| bgcolor="gray" | Agenda/Minutes<br />
|-<br />
|<strike>2011-01-10 </strike><br />
|Michael Paulini <br />
|<br />
|-<br />
|<strike>2011-01-24 </strike> <br />
|Gary Williams<br />
|<br />
|-<br />
|<strike>2011-02-07 </strike><br />
|Daniel Wang <br />
|<br />
|-<br />
|<strike>2011-02-21 </strike> <br />
|Norie de la Cruz<br />
|<br />
|-<br />
|<strike>2011-03-07 </strike><br />
|Gary Schindelman <br />
|<br />
|-<br />
|<strike>2011-03-21 </strike><br />
|Todd Harris <br />
|<br />
|-<br />
|<strike>2011-04-04 </strike> <br />
|Ranjana Kishore<br />
|<br />
|-<br />
|<strike>2011-04-18 </strike><br />
|Bill Nash <br />
|<br />
|-<br />
|<strike>2011-05-02 </strike><br />
|Abigail Cabunoc <br />
|<br />
|-<br />
|<strike> 2011-05-16 </strike> <br />
|Paul Davis <br />
|[[WBConfCall_2011.05.19-Agenda_and_Minutes | Conference call Minutes]] <br />
|-<br />
|<strike> 2011-05-30 </strike><br />
|Ruihua Fang <br />
|<br />
|-<br />
|<strike>2011-06-13 </strike> <br />
|Chris Grove<br />
|<br />
|-<br />
|<strike>2011-06-27 </strike> <br />
|Phil Ozersky<br />
|<br />
|-<br />
|<strike>2011-07-11 </strike><br />
|Kevin Howe <br />
|[[WBConfCall_2011.07.21-Agenda_and_Minutes | Conference call Minutes]] <br />
|-<br />
|<strike>2011-07-25 </strike><br />
|Daniela Raciti <br />
|<br />
|-<br />
|<strike>2011-08-08 </strike> <br />
|Yuling Li<br />
|<br />
|-<br />
|<strike>2011-08-22 </strike> <br />
|Karen Yook<br />
|<br />
|-<br />
|<strike>2011-09-05 </strike><br />
|Raymond Lee <br />
|<br />
|-<br />
|<strike>2011-09-19 </strike><br />
|Xiaoqi Shi <br />
|<br />
|-<br />
|<strike>2011-09-19 </strike><br />
|Kimberly Van Auken <br />
|<br />
|-<br />
|<strike>2011-10-03 </strike><br />
|Wen Chen <br />
|<br />
|-<br />
|<strike>2011-10-17 </strike><br />
|Juancarlos Chan <br />
|<br />
|-<br />
|<strike>2011-10-31 </strike><br />
|Cecilia Nakamura <br />
|[[WBConfCall_2011.11.03-Agenda_and_Minutes | Conference call Minutes]] <br />
|-<br />
|<strike>2011-11-14 </strike> <br />
|Mary Ann Tuli<br />
|[[WBConfCall_2011.11.17-Agenda_and_Minutes | Conference call Minutes]] <br />
|-<br />
|<strike>2011-11-28 </strike><br />
|Xiaodong Wang <br />
|<br />
|-<br />
|<strike>2011-12-12 </strike><br />
|Arun Rangarajan <br />
|[[WBConfCall_2011.12.15-Agenda_and_Minutes | Conference call Minutes]] <br />
|-<br />
|<strike>2011-12-26 </strike><br />
|Xiaoqi Shi <br />
|[[WBConfCall_2012.02.05-Agenda_and_Minutes | Conference call Minutes]]<br />
|-<br />
|}<br />
<br />
=== 2010 Officers ===<br />
<br />
<br />
<strike>2010-01-04 Xiaodong Wang<br />
<br />
2010-01-18 Tamberlyn Bieri<br />
<br />
2010-02-01 Michael Paulini<br />
<br />
2010-02-15 Gary Williams<br />
<br />
2010-03-01 Daniel Wang<br />
<br />
2010-03-15 Norie de la Cruz<br />
<br />
2010-03-29 Gary Schindelman<br />
<br />
2010-04-12 Todd Harris<br />
<br />
2010-04-26 Ranjana Kishore<br />
<br />
2010-05-10 Abigail Cabunoc<br />
<br />
2010-05-24 Paul Davis (Split shift)<br />
<br />
2010-05-31 Xiaoqi Shi<br />
<br />
2010-06-14 Paul Davis (Split shift)<br />
<br />
2010-06-21 Bill Nash<br />
<br />
2010-07-05 Ruihua Fang<br />
<br />
2010-07-19 Chris Grove<br />
<br />
2010-08-02 Phil Ozersky<br />
<br />
2010-08-16 Karen Yook<br />
<br />
2010-08-30 Raymond Lee<br />
<br />
2010-09-13 Kimberly Van Auken<br />
<br />
2010-09-27 Wen Chen<br />
<br />
2010-10-11 Juancarlos Chan<br />
<br />
2010-10-25 Cecilia Nakamura<br />
<br />
2010-11-08 Mary Ann Tuli<br />
<br />
2010-11-22 Xiaodong Wang<br />
<br />
2010-12-13 Arun Rangarajan<br />
<br />
2010-12-27 Tamberlyn Bieri</strike><br />
<br />
=== 2009 Officers ===<br />
<br />
<strike>2009-01-05 Juancarlos Chan</strike><br />
<br />
<strike>2009-01-19 Cecilia Nakamura</strike><br />
<br />
<strike>2009-02-02 Karen Yook</strike><br />
<br />
<strike>2009-02-16 Jolene Fernandes</strike><br />
<br />
<strike>2009-03-02 Hans-Michael Mueller</strike><br />
<br />
<strike>2009-03-16 Xiaodong Wang</strike><br />
<br />
<strike>2009-03-30 Tamberlyn Bieri</strike><br />
<br />
<strike>2009-04-13 Michael Han</strike><br />
<br />
<strike>2009-04-27 Gary Williams</strike><br />
<br />
<strike>2009-05-11 Daniel Wang</strike><br />
<br />
<strike>2009-05-25 Norie de la Cruz</strike><br />
<br />
<strike>2009-06-08 Gary Schindelman</strike><br />
<br />
<strike>2009-06-22 Todd Harris</strike><br />
<br />
<strike>2009-07-06 Ranjana Kishore</strike><br />
<br />
<strike>2009-07-20 Anthony Rogers</strike><br />
<br />
<strike>2009-08-03 Paul Davis</strike><br />
<br />
<strike>2009-08-17 Phil Ozersky</strike><br />
<br />
<strike>2009-08-31 Karen Yook</strike><br />
<br />
<strike>2009-09-14 Erich Schwarz</strike><br />
<br />
<strike>2009-09-28 Raymond Lee<br />
<br />
2009-10-12 Kimberly Van Auken<br />
<br />
2009-10-26 Wen Chen<br />
<br />
2009-11-09 Juancarlos Chan<br />
<br />
2009-11-23 Cecilia Nakamura<br />
<br />
2009-12-07 Mary Ann Tuli<br />
<br />
2009-12-21 Jolene Fernandes</strike><br />
<br />
== [[2008 Officers]] ==<br />
<br />
As of 2008-01-31, conference calls will be monthly, but Help Desk duties will be split into two-week cycles. <br />
<br />
<strike>2008-01-31 Cecilia Nakamura</strike> <br />
<br />
<strike>2008-02-14 Hans-Michael Mueller</strike> <br />
<br />
<strike>2008-02-?? Xiaodong Wang</strike> <br />
<br />
<strike>2008-03-?? Tamberlyn Bieri</strike> <br />
<br />
<strike>2008-03-31 Michael Han</strike> <br />
<br />
<strike>2008-04-14 Gary Williams</strike> <br />
<br />
<strike>2008-04-28 Sheldon</strike> <br />
<br />
<strike>2008-05-12 Daniel Wang</strike> <br />
<br />
<strike>2008-05-26 Norie de la Cruz</strike> <br />
<br />
<strike>2008-06-09 Gary Schindelman</strike> <br />
<br />
<strike>2008-06-23 Will Spooner</strike> <br />
<br />
<strike>2008-07-07 Darin Blasiar </strike> <br />
<br />
<strike>2008-07-21 Mary Ann Tuli</strike> <br />
<br />
<strike>2008-08-04 Todd Harris</strike> <br />
<br />
<strike>2008-08-18 Ranjana Kishore</strike> <br />
<br />
<strike>2008-09-01 Anthony Rogers </strike> <br />
<br />
<strike>2008-09-15 Paul Davis</strike> <br />
<br />
<strike>2008-09-29 Phil Ozersky</strike> <br />
<br />
<strike>2008-10-13 Erich Schwarz</strike> <br />
<br />
<strike>2008-10-27 Raymond Lee </strike> <br />
<br />
<strike>2008-11-10 Kimberly Van Auken</strike> <br />
<br />
<strike>2008-11-24 Andrei Petcherski</strike> <br />
<br />
<strike>2008-12-08 Igor Antoshechkin</strike> <br />
<br />
<strike>2008-12-22 Wen Chen</strike><br><br />
<br />
<br />
[[Category:Help Desk]]<br />
[[Category:WormBase Documentation]]<br />
[[Category:Communication (Web Dev)]]</div>Xdwanghttps://wiki.wormbase.org/index.php?title=Antibody&diff=26669Antibody2015-04-30T13:11:46Z<p>Xdwang: /* Antibody dumper */</p>
<hr />
<div>back to [[Caltech documentation]]<br />
<br />
== Antibody curation SOPs ==<br />
<br />
=== Antibody curation===<br />
1. '''Antibody paper first pass:'''<br />
*Antibody papers are identified via a script written by Juancarlos. Here are the first-pass results for antibody curation: <br />
http://textpresso-dev.caltech.edu/azurebrd/wen/anti_protein_wen<br />
*The results are an HTML page listing paper names and the antibodies associated with each paper. <br />
<br />
2. '''A few steps''' need to be done before starting curation:<br />
<br />
*Save the file to the desktop as 'anti_protein_20110113.txt' (include the date in the file name).<br />
*Open a terminal:<br />
**cd into curation/my_acefiles/antibody_curation<br />
**cp /Users/xiaodongwang/Desktop/anti_protein_20110113.txt . (copy the file into the antibody_curation directory)<br />
**[OBSOLETE STEP: scp anti_protein_20110113.txt anti_protein.txt (copy the contents into anti_protein.txt, which was input file 1 when running TextpressoAbFinder.pl)]<br />
**Run Yuling's new script (written on 3/12/2012) to get the new antibody papers:<br />
***./find_new.pl anti_protein_20120320.txt WBAbPaperList.ace AbCurationLog.txt > aaa<br />
****The script removes papers already listed in 'WBAbPaperList.ace' or 'AbCurationLog.txt' and writes the remaining (new) papers to the file 'aaa'.<br />
*Copy and paste the new papers into 'Ab_curation_spreadsheet.xlsx' (my own curation log) in the antibody curation folder.<br />
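The filtering done by find_new.pl amounts to a set difference: the papers in the first-pass file minus the papers already in WBAbPaperList.ace or AbCurationLog.txt. Below is a minimal Python sketch of that idea, not the script itself. It assumes that every paper appears in each file as a WBPaper ID (WBPaper followed by eight digits); the function names are hypothetical, and the real script may parse its inputs differently.

```python
import re

# Assumption: papers appear as "WBPaper" followed by eight digits in all three files.
PAPER_ID = re.compile(r"WBPaper\d{8}")


def paper_ids(text):
    """Return the paper IDs in `text` in order of first appearance, without repeats."""
    return list(dict.fromkeys(PAPER_ID.findall(text)))


def find_new(first_pass, curated_before, curation_log):
    """Papers flagged by the first pass that are in neither curated list.

    Arguments are the *contents* of anti_protein_YYYYMMDD.txt,
    WBAbPaperList.ace and AbCurationLog.txt, in that order.
    """
    done = set(paper_ids(curated_before)) | set(paper_ids(curation_log))
    return [pid for pid in paper_ids(first_pass) if pid not in done]
```

To use it, read the three files in the same order as the find_new.pl arguments and pass their contents to find_new; the returned list corresponds to what the script writes to 'aaa'.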
<br />
3. When curation is finished, copy the papers from the spreadsheet to the 'AbCurationLog.txt' file: <br />
<br />
Curators need to record the status of every paper in 'Ab_curation_spreadsheet.xlsx' in the curation log file '''AbCurationLog.txt''', so that the same papers do not appear again in the next run.<br />
<br />
'''AbCurationLog.txt''' is located at: /Users/xiaodongwang/curation/my_acefiles/antibody_curation<br />
<br />
<br />
4. '''A few important documents''':<br />
*'''AbCurationLog.txt''' -- The curator maintains this log of all antibody papers that have already been curated. It is located at: /Users/xiaodongwang/curation/my_acefiles/antibody_curation<br />
**This file must be updated after each round of curation by appending the newly curated papers (the paper list can be copied from the Excel file and pasted into it).<br />
*'''WBAbPaperList.ace''' -- Antibody papers curated before the Textpresso first pass was introduced<br />
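Appending newly curated papers to AbCurationLog.txt can also be scripted. The sketch below is hypothetical (no such helper is part of the existing workflow) and assumes the log holds one WBPaper ID per line; it skips papers already in the log so that repeated pastes do not create duplicate entries.

```python
from pathlib import Path


def append_to_log(log_path, new_papers):
    """Append papers that are not yet in the log, one per line.

    Assumes the log holds one WBPaper ID per line. Returns the papers added.
    """
    log = Path(log_path)
    text = log.read_text() if log.exists() else ""
    existing = set(text.split())
    added = []
    for paper in new_papers:
        paper = paper.strip()
        if paper and paper not in existing:
            added.append(paper)
            existing.add(paper)
    if added:
        # Start on a fresh line if the last entry has no trailing newline.
        prefix = "" if not text or text.endswith("\n") else "\n"
        with log.open("a") as fh:
            fh.write(prefix + "\n".join(added) + "\n")
    return added
```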
<br />
<br />
<br />
OBSOLETE STEP (3/20/2012): 3. The curation log file listed above only documents papers that were curated after the Textpresso first pass was applied. Antibodies curated before that are kept in this file: WBAbPaperList.ace<br />
<br />
OBSOLETE STEP (3/20/2012): 4. A script written by Wen screens file 1, filters out the papers in files 2 and 3 (which were already curated), and then outputs the new paper list. The script is called: TextpressoAbFinder.pl<br />
<br />
OBSOLETE STEP (3/20/2012):[<br />
----<br />
Here is how to use TextpressoAbFinder.pl:<br />
<br />
(wen@athena:~/TextPresso/TextPressoAb$ ./TextpressoAbFinder.pl)<br />
<br />
/Users/xiaodongwang/curation/my_acefiles/antibody_curation/ ./TextpressoAbFinder.pl<br />
<br />
<br />
This script checks the Textpresso results, compares them with the antibody paper list dumped from citace, and looks for antibody papers that have not been curated.<br />
<br />
Input file 1: anti_protein.txt -- all antibody papers found by Textpresso<br />
<br />
Input file 2: WBAbPaperList.ace -- Antibody papers curated before the Textpresso first pass was introduced<br />
<br />
Input file 3: CurationLog/AbCurationLog.txt -- Antibody curation log.<br />
<br />
Output file 1: NewAbPaper.txt -- New antibody papers <br />
**The name of output file 1 can be changed in the script, e.g. to NewAbPaper_20110113.<br />
**Two places in the script need to be changed each time.<br />
<br />
Output file 2: TPAbFalsePositive.txt -- All false positive antibody papers.<br />
<br />
1789 papers flagged by Textpresso, 1734 curated, 55 need to be checked.<br />
Among the papers not yet curated, 30 have an anti-XXX pattern and 25 do not.<br />
1626 papers curated in citace, 1347 found by Textpresso, 279 not found by Textpresso. Recall is 0.828413284132841.<br />
518 papers identified by Textpresso are false positives. Precision is 0.710452766908888. <br />
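The two ratios above can be reproduced from the counts. The sketch assumes, from the numbers alone rather than from the script's source, that precision is (flagged - false positives) / flagged and recall is (curated papers found by Textpresso) / (papers curated in citace); both definitions reproduce the printed values.

```python
# Counts copied from the TextpressoAbFinder.pl summary above.
flagged = 1789              # papers flagged by Textpresso
false_positives = 518       # flagged papers judged not to be antibody papers
curated_in_citace = 1626    # antibody papers already curated in citace
found_by_textpresso = 1347  # of those, papers that Textpresso also flagged

# Assumed definitions (inferred from the figures, not from the script source).
precision = (flagged - false_positives) / flagged   # 1271 / 1789
recall = found_by_textpresso / curated_in_citace    # 1347 / 1626

print(f"precision = {precision:.4f}, recall = {recall:.4f}")
# prints: precision = 0.7105, recall = 0.8284
```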
<br />
5. The result of the script is NewAbPaper.txt, the list of antibody papers that need to be curated. <br />
I copy this file onto my desktop, paste it into my own Excel file at desktop/curation forms/antibody curation/AB_curation_spreadsheet.xlsx, and name each sheet by date.<br />
<br />
6. Add the curated papers to AbCurationLog.txt under /Users/xiaodongwang/curation/my_acefiles/antibody_curation, so that these papers are subtracted from NewAbPaper.txt next time.<br />
]<br />
<br />
===Antibody curation controlled vocabulary===<br />
<br />
Antibody controlled vocabulary<br />
<br />
Remark "Commercial Antibody."<br />
Remark "Tissue Specific Antibody Marker."<br />
Summary "Rabbit polyclonal antibody against XXX recombinant protein."<br />
Summary "Rabbit polyclonal peptide antibody against XXX."<br />
Summary "Mouse monoclonal peptide antibody against XXX."<br />
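The Summary patterns above differ only in animal, clonality, antigen type and target. A hypothetical helper that fills them in is shown below only to make the patterns explicit; it is not part of any WormBase tool, and it extrapolates the listed patterns to animal/clonality combinations that are not spelled out above.

```python
def summary(animal, clonality, antigen, target):
    """Fill in one of the controlled-vocabulary Summary patterns."""
    animal = animal.capitalize()   # "rabbit" -> "Rabbit"
    clonality = clonality.lower()  # "Polyclonal" -> "polyclonal"
    if antigen == "protein":
        return f"{animal} {clonality} antibody against {target} recombinant protein."
    if antigen == "peptide":
        return f"{animal} {clonality} peptide antibody against {target}."
    raise ValueError(f"no controlled-vocabulary pattern for antigen type {antigen!r}")


# The two Remark values are fixed strings.
COMMERCIAL = "Commercial Antibody."
TISSUE_MARKER = "Tissue Specific Antibody Marker."
```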
<br />
=== Antibody curation guideline===<br />
<br />
WormBase requires the following information for each Antibody object: <br />
<br />
1. Antibody name: for consistency, use [WBPaperID]:anti-GENENAME, appending _1, _2, etc. if several antibodies are made against the same gene. The gene name is in CAPITALS, e.g. '''[WBPaper00036348]:anti-EPG-2'''.<br />
<br />
2. Original reference where the antibody was first reported. For antibodies that are published for the first time, list the original publication and mark the antibody as an "Original_publication" antibody (these are valid antibody objects in WormBase).<br />
<br />
3. Target gene (abc-1, xyz-1, ...), clonality (polyclonal or monoclonal), and animal (rabbit, mouse, ...)<br />
<br />
4. Antigen used to generate the antibody (peptide or protein sequence) <br />
<br />
5. If the antibody is from another paper, find the original antibody object and add the reference to it. <br />
<br />
6. If the antibody has no original reference, create a new antibody object and mark it as "No_original_reference". If you suspect the antibody is the same as one that was previously published, fill in the "Possible_pseudonym" field.<br />
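The naming rule in item 1 can be expressed as a small function. This is an illustrative sketch, not a WormBase tool; it assumes WBPaper IDs have eight digits (as in WBPaper00036348) and that, when several antibodies against the same gene come from one paper, every name gets a numeric suffix starting at _1.

```python
import re


def antibody_names(paper_id, gene, count=1):
    """Name(s) for `count` antibodies against `gene` first reported in `paper_id`."""
    if not re.fullmatch(r"WBPaper\d{8}", paper_id):
        raise ValueError(f"not a WBPaper ID: {paper_id!r}")
    base = f"[{paper_id}]:anti-{gene.upper()}"  # gene name in CAPITALS
    if count == 1:
        return [base]
    # Assumption: with several antibodies, every name gets a suffix _1, _2, ...
    return [f"{base}_{i}" for i in range(1, count + 1)]
```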
<br />
=='''Antibody dumper'''==<br />
<br />
* The dumper is located on tazendra:<br />
**The module and use_package.pl are on tazendra at:<br />
***/home/postgres/work/citace_upload/antibody/get_antibody_ace.pm<br />
***/home/postgres/work/citace_upload/antibody/use_package.pl<br />
**use_package.pl is symlinked on tazendra at:<br />
***/home/acedb/xiaodong/antibody/<br />
<br />
*The dumper checks for dead genes in the 'Gene' field and for invalid papers in the 'Original_publication' and 'Reference' fields, and writes the results to the err.out file.<br />
*The cronjob for dumping antibodies for upload was cancelled.<br />
<br />
<br />
<strike>[was located on tazendra: /home/acedb/wen/phenote-antibody/dump_antibody_ace.pl]</strike><br />
<br />
<strike>/home/acedb/xiaodong/oa_antibody_dumper/dump_antibody_ace.pl<br />
<br />
<br />
'''To dump the file:''' ./dump_antibody_ace.pl<br />
<br />
Output file name: antibody.ace<br />
<br />
'''I usually change the file name:''' cp antibody.ace antibody.ace.date_of_dump<br />
<br />
'''then copy file to spica:''' scp antibody.ace.20110503 citace@spica.caltech.edu:/home/citace/Data_for_citace/Data_from_Xiaodong/.</strike><br />
<br />
=== Handling Dead Genes During Dump Process ===<br />
As of May 2013, the dumper script runs an automatic check for dead genes in any gene field. Any dead genes referenced in an Interaction object in the OA are handled as follows:<br />
<br />
1) If there is a replacement for the gene (i.e. the gene has been merged into another gene), the dead gene is dumped into a "Historical_gene" field in the .ace file and the replacement gene fills the original gene field. A comment is added to the Historical_gene field via a 'Text' tag (updated as of 3-18-2015). The original gene field (now with the updated gene reference) is printed with an "Inferred_automatically" tag after the gene. For example, if WBGene00001234 is now a dead gene that has been merged into WBGene00002345:<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
</pre><br />
<br />
becomes<br />
<br />
<pre><br />
Gene "WBGene00002345" Inferred_automatically<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to WBGene00001234.<br />
WBGene00001234 is now considered dead and has been merged into WBGene00002345. WBGene00002345 has <br />
replaced WBGene00001234 accordingly."<br />
</pre><br />
<br />
Also, since antibodies, transgenes, expression patterns and variations are mapped to an interactor where possible (otherwise they are dumped as "Unaffiliated"), this mapping now uses only the newest genes that the interactor refers to.<br />
<br />
2) If there is no replacement for the gene (Dead or Suppressed), the dumper writes one of the following:<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to a gene<br />
(WBGene00001234) that is now considered dead. Please interpret with discretion."<br />
</pre><br />
<br />
OR<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to a gene<br />
(WBGene00001234) that has been suppressed. Please interpret with discretion."<br />
</pre><br />
<br />
and lastly,<br />
<br />
3) If the gene has been split, it is dumped as:<br />
<br />
<pre><br />
Gene "WBGene00001234"<br />
Historical_gene "WBGene00001234" "Note: This object originally referred to a gene <br />
(WBGene00001234) that is now considered split. Please interpret with discretion."<br />
</pre><br />
<br />
Split genes are also printed to the error output file of the dumping script, so that a curator can go back and change them manually according to best judgement.<br />
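The three cases above, together with the rules in the 5/22/13 chat log under Notes, map each gene status to a fixed set of .ace lines. The following Python sketch restates that mapping; it is not the dumper code (which is the Perl module get_antibody_ace.pm), the status labels are hypothetical, and the wording of the error message for split genes is made up for illustration.

```python
def dump_gene(gene, status, replacement=None):
    """Return (ace_lines, error_message) for one Gene value.

    `status` is one of the hypothetical labels "live", "merged", "dead",
    "suppressed" or "split"; `replacement` is required for "merged".
    """
    if status == "live":
        return [f'Gene "{gene}"'], None  # normal ones: just tag + value
    if status == "merged":
        if replacement is None:
            raise ValueError("a merged gene needs its replacement gene")
        note = (f"Note: This object originally referred to {gene}. {gene} is now "
                f"considered dead and has been merged into {replacement}. "
                f"{replacement} has replaced {gene} accordingly.")
        return [f'Gene "{replacement}" Inferred_automatically',
                f'Historical_gene "{gene}" "{note}"'], None
    reasons = {"dead": "is now considered dead",
               "suppressed": "has been suppressed",
               "split": "is now considered split"}
    if status not in reasons:
        raise ValueError(f"unknown gene status: {status!r}")
    note = (f"Note: This object originally referred to a gene ({gene}) that "
            f"{reasons[status]}. Please interpret with discretion.")
    lines = [f'Gene "{gene}"', f'Historical_gene "{gene}" "{note}"']
    # Split genes also go to the error output for manual review.
    error = f"{gene} has been split; check manually" if status == "split" else None
    return lines, error
```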
<br />
<br />
Gene Examples:<br><br />
A split gene: WBGene00012507<br><br />
A merged gene: WBGene00007524<br><br />
A dead gene: WBGene00007814<br><br />
A suppressed gene: WBGene00015490<br><br />
<br />
=='''Notes'''==<br />
<br />
'''Changed the postgres tables --- 05/22/2011''' <br />
<br />
reference -> paper<br />
<br />
location -> laboratory<br />
<br />
'''The cronjob was cancelled. The dump is now done manually, after checking the err.out file for each dump. - 06/04/2012'''<br />
<br />
<strike>'''Email from Juancarlos related to the cronjob --- 06/06/2011'''<br />
<br />
Set the cronjob to run every Thursday at 2:00 AM:<br />
0 2 * * thu /home/acedb/xiaodong/oa_antibody_dumper/dump_antibody_ace.pl<br />
It puts the file at :<br />
/home/postgres/public_html/cgi-bin/data/antibody.ace<br />
So you can see it at :<br />
http://tazendra.caltech.edu/~postgres/cgi-bin/data/antibody.ace<br />
If you need to run it manually, just paste into the shell :<br />
/home/acedb/xiaodong/oa_antibody_dumper/dump_antibody_ace.pl<br />
Then log onto spica, cd into the directory where you want it, remove<br />
the existing antibody.ace file, and do : <br />
wget "http://tazendra.caltech.edu/~postgres/cgi-bin/data/antibody.ace"</strike><br />
<br />
'''Dumper change for Historical_gene tag --- 05/22/2013'''<br />
<br />
-Model change: refer to Chris's document:<br />
https://docs.google.com/a/wormbase.org/document/d/1nnuQY9OfV2VsBORj01ocoC985-5kOgxYCGOAZDeeKLQ/edit<br />
<br />
-dumper change via Skype with J:<br />
[5/22/13 1:57:44 PM] j chan: use_package.pl -> /home/postgres/work/citace_upload/antibody/use_package.pl<br />
<br />
[5/22/13 1:59:04 PM] j chan: http://mangolassi.caltech.edu/~postgres/cgi-bin/oa/ontology_annotator.cgi<br />
<br />
[5/22/13 2:08:06 PM] j chan: gin_dead<br />
<br />
[5/22/13 2:08:19 PM] j chan: Dead -> dead<br />
<br />
[5/22/13 2:08:27 PM] j chan: merged_into WBGene -> merged<br />
<br />
[5/22/13 2:08:31 PM] j chan: split_into -> split<br />
<br />
[CG added 10-21-2013] cgrove: Suppressed -> suppressed<br />
<br />
[5/22/13 2:09:06 PM] j chan: looping through the genes where something happened to make sure they don't also point at something else<br />
<br />
[5/22/13 2:09:16 PM] j chan: abp_gene<br />
<br />
[5/22/13 2:09:47 PM] j chan: merged -> historical_gene + remark AND gene <gene> Inferred_automatically<br />
<br />
[5/22/13 2:09:56 PM] j chan: dead -> historical_gene + remark<br />
<br />
[5/22/13 2:10:05 PM] j chan: split -> historical_gene + remark AND error message<br />
<br />
[CG added 10-21-2013] cgrove: Suppressed -> historical_gene + remark<br />
<br />
[5/22/13 2:10:12 PM] j chan: normal ones -> just tag + value<br />
<br />
<br />
-tested with genes:<br><br />
A split gene: WBGene00012507<br><br />
A merged gene: WBGene00007524<br><br />
A dead gene: WBGene00007814<br><br />
A suppressed gene: WBGene00015490<br><br />
<br />
-migrated to tazendra on the same day</div>Xdwanghttps://wiki.wormbase.org/index.php?title=New_2012_Curation_Status&diff=25187New 2012 Curation Status2014-10-24T15:48:06Z<p>Xdwang: /* Datatypes for Textpresso String Searches */</p>
<hr />
<div>Curation Status & Statistics Form (2012)<br />
<br />
The live form (on Tazendra) can be found [http://tazendra.caltech.edu/~postgres/cgi-bin/curation_status.cgi here]<br />
<br />
The sandbox/testing form can be found [http://mangolassi.caltech.edu/~postgres/cgi-bin/curation_status.cgi here]<br />
<br />
<br />
The CGI code is located on Tazendra/Mangolassi here:<br />
<br />
/home/postgres/public_html/cgi-bin/curation_status.cgi<br />
<br />
<br />
<br />
<br />
<br />
= User Guide =<br />
<br />
<br />
== Main Page ==<br />
<br />
<br />
[[File:Curation_Status_Form_Main_Page_11-8-2013.png]]<br />
<br />
<br />
Above is a screenshot of the main page of the Curation Status Form. The user/curator is asked to identify who they wish to log in as, and to select one of four options to continue:<br />
<br />
1) [[New_2012_Curation_Status#Specific_Paper_Page|Specific Paper Page]] - This is where the curator can specify one or more specific papers they wish to view curation status results for (see below). This page includes a Topic paper filter to search for papers related to a WormBase Biological Topic.<br />
<br />
2) [[New_2012_Curation_Status#Add_Results_Page|Add Results Page]] - This is where the curator can add curation status results for one or more specific papers (see below).<br />
<br />
3) [[New_2012_Curation_Status#Main_Curation_Statistics_Page|Curation Statistics Page]] - This is where the curator can view all curation statistics for '''ALL''' datatypes and '''ALL''' flagging methods (see below).<br />
<br />
4) [[New_2012_Curation_Status#Curation_Statistics_Options_Page|Curation Statistics Options Page]] - As an alternative to viewing the curation statistics for '''ALL''' datatypes and '''ALL''' flagging methods (as with option #3 above), this is where the curator can specify for which datatypes and flagging methods they would like to see curation statistics (see below).<br />
<br />
<br><br />
<br><br />
<br><br />
<br />
== Specific Paper Page ==<br />
<br />
<br />
[[File:Curation_Status_Form_Specific_Paper_Page_11-8-2013.png]]<br />
<br />
<br />
Above is a screenshot of the Specific Paper Page, where a curator can specify which paper(s) they would like to view curation status results for. After typing or pasting one or more WBPaper IDs into the paper entry field, the curator can specify which datatypes and flagging methods they would like to see results for. Note that selecting "all datatypes" overrides any single datatype selections below. The curator can also select which curation data sources to see results for (i.e. Ontology Annotator and/or cur_curdata), which flagging methods (SVM, AFP, CFP), how many papers to load at one time (default 10), and whether to show info (and links) for the PubMed ID (PMID), the PDF, and the paper's journal.<br />
<br><br />
<br><br />
Once a curator clicks on "Get Results", they will be directed to the [[New_2012_Curation_Status#Detailed_Results_of_Papers_Page|Detailed Results of Papers Page]], where they can view the results of their query.<br />
<br />
Papers can be listed as WBPaper### or simply as numbers, separated by spaces, commas, pipes, new lines, or any other non-digit characters. Any number that is entered will be considered a valid paper ID.<br />
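A minimal sketch of this parsing rule: every run of digits is taken as one paper ID, so the WBPaper prefix and any separators are ignored. Zero-padding to eight digits and dropping repeated IDs are assumptions for illustration; the actual CGI (curation_status.cgi) may normalize IDs differently.

```python
import re


def parse_paper_ids(text):
    """Treat every run of digits in `text` as one paper ID."""
    ids = ("WBPaper" + digits.zfill(8) for digits in re.findall(r"\d+", text))
    # Assumption: IDs are zero-padded to eight digits and repeats are dropped.
    return list(dict.fromkeys(ids))
```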
<br />
'''Recent addition as of November 2013''' CG 11-6-2013<br />
A Topic dropdown menu has been added to the "Specific Paper Page" to allow curators to view all papers related to a particular curation topic for their datatype. Note that selecting a Topic looks for overlap with any WBPaper IDs entered in the main paper entry field, populating the form only with papers that are both associated with the Topic AND listed in the paper entry field. Topic papers are pulled from the Topic Curation OA. If no papers are entered in the paper entry field AND no Topic is selected, the form returns ALL papers (that have undergone at least one flagging pipeline).<br />
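The paper selection described above can be sketched as follows. The function and argument names are hypothetical, and the behavior when a Topic is selected but no papers are entered (return all of the Topic's papers) is an assumption; the text only specifies the overlap case and the case where neither is given.

```python
def select_papers(entered, topic_papers, all_flagged):
    """Papers to show on the Specific Paper Page.

    entered:      paper IDs typed into the paper entry field (may be empty)
    topic_papers: papers linked to the selected Topic, or None if no Topic
    all_flagged:  papers that have been through at least one flagging pipeline
    """
    if topic_papers is None:
        return sorted(set(entered)) if entered else sorted(all_flagged)
    if entered:
        # Topic AND paper entry field: only the overlap is shown.
        return sorted(set(entered) & set(topic_papers))
    # Assumption: a Topic with no entered papers shows all of the Topic's papers.
    return sorted(topic_papers)
```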
<br />
<br><br />
<br><br />
<br><br />
<br />
== Add Results Page ==<br />
<br />
<br />
[[File:Curation_Status_Form_Add_Results_Page2.png]]<br />
<br />
Above is a screenshot of the Add Results Page of the form, where a curator can add new curation status results for one or more papers that they specify. A curator '''must''' specify the datatype they wish to submit paper results for and '''must''' specify the status of the paper(s): curated (and hence positive), validated positive (but not yet curated), validated negative, or not validated (to revert to the not validated, or blank, status). The curator '''must''' then also enter at least one paper to which this curation status applies in the paper entry field. Multiple papers '''must''' be entered in WBPaper### format, each on a separate line.<br />
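The input rules above (datatype and status required, at least one paper, WBPaper### IDs one per line) can be checked as in this sketch. The status labels and error messages are hypothetical; only the rules themselves come from the description above.

```python
import re

# Hypothetical labels for the four statuses offered on the form.
STATUSES = {"curated", "validated positive", "validated negative", "not validated"}


def validate_submission(datatype, status, paper_field):
    """Return (papers, errors); papers is empty whenever there are errors."""
    errors = []
    if not datatype:
        errors.append("a datatype must be selected")
    if status not in STATUSES:
        errors.append(f"unknown curation status: {status!r}")
    lines = [line.strip() for line in paper_field.splitlines() if line.strip()]
    if not lines:
        errors.append("at least one paper must be entered")
    for line in lines:
        if not re.fullmatch(r"WBPaper\d+", line):
            errors.append(f"expected one WBPaper### ID per line, got {line!r}")
    return ([] if errors else lines), errors
```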
<br><br />
<br><br />
Optionally, a curator can select a pre-made comment from a drop down menu and/or enter a free-text comment. Once the curator clicks "Add Results", they will be directed to a '''New Results Summary Page''':<br />
<br />
<br />
[[File:Curation_Status_Form_Submission_Summary2.png]]<br />
<br />
<br />
If the new results would overwrite existing results, the curator is directed to an '''Overwrite Confirmation Page''':<br />
<br />
<br />
[[File:Curation_Status_Form_Confirm_Overwrite.png]]<br />
<br />
<br />
at which point the curator can confirm the overwrite of the previous results for the indicated paper and datatype, or simply go to the main page (or go back a page to make corrections/edits). Note that the fields whose data have changed are highlighted in yellow for easy viewing. If the curator confirms the overwrite by checking the confirmation check box and clicking on "Overwrite Selected Results", they will be directed to the '''Overwrite Confirmation Summary Page''':<br />
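The overwrite logic can be sketched like this: a submission for a (paper, datatype) pair with no stored result is a plain insert, while one with a stored result goes to confirmation, where the fields whose values differ are the ones highlighted. The data layout (a dict keyed by paper and datatype) and the function name are assumptions for illustration, not the form's actual storage.

```python
def overwrite_check(existing_results, paper, datatype, new):
    """Return None if (paper, datatype) has no stored result (a plain insert);
    otherwise return the fields whose values would change, i.e. the fields
    the confirmation page would highlight."""
    old = existing_results.get((paper, datatype))
    if old is None:
        return None
    fields = sorted(set(old) | set(new))
    return [f for f in fields if old.get(f) != new.get(f)]
```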
<br />
[[File:Curation_Status_Form_Overwrite_Conf_Summary.png]]<br />
<br />
A link is provided to go back to the main page of the form.<br />
<br />
<br><br />
<br><br />
<br><br />
<br />
== Main Curation Statistics Page ==<br />
<br />
<br />
[[File:Curation_Status_Form_Curation_Statistics_Page.png]]<br />
<br />
<br />
Above is a screenshot of a portion of the entire Curation Statistics table that a curator would be directed to from the main page of the form if they had clicked on the Curation Statistics Page button. Displayed at the top of the table are general paper statistics for a given datatype (datatypes indicated at the top of each column). Below that are statistics for papers that have been flagged (positive or negative) for the indicated datatype by ANY (at least one) flagging method. Below the "Any" statistics are the "Intersection" statistics, indicating papers flagged by ALL flagging methods for the indicated datatype. It should be emphasized here that '''"flagged"''' means processed by the flagging method, not necessarily flagged positive. Although not visible in the above screenshot, statistics for SVM results, AFP results, and CFP results are also included in this table.<br />
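In set terms, "Any" is the union and "Intersection" is the intersection of the paper sets processed by the selected flagging methods. A small illustrative sketch follows; the function name is hypothetical and the paper sets in any usage are made up.

```python
def any_and_intersection(flagged_by_method, selected):
    """Union ("Any") and intersection ("Intersection") of the paper sets
    processed by the selected flagging methods."""
    sets = [set(flagged_by_method[m]) for m in selected]
    if not sets:
        return set(), set()
    return set().union(*sets), set.intersection(*sets)
```

With a single selected method the two results are the same set, which matches the behavior described for the Curation Statistics Options Page.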
<br><br />
<br><br />
The "Any", "Intersection", and individual flagging method sections of the table each follow a general template:<br />
<br />
<pre><br />
Flagged<br />
Flagged Positive<br />
Flagged Positive and Validated<br />
Flagged Positive, Validated False Positive<br />
Flagged Positive, Validated True Positive<br />
Flagged Positive, Validated True Positive, Curated<br />
Flagged Positive, Validated True Positive, Not Curated<br />
Flagged Positive, Not Validated<br />
Flagged Positive, Not Curated<br />
</pre><br />
<br />
and the individual flagging method sections additionally have a section for flagged negatives:<br />
<br />
<pre><br />
Flagged Negative<br />
Flagged Negative and Validated<br />
Flagged Negative, Validated True Negative<br />
Flagged Negative, Validated False Negative<br />
Flagged Negative, Validated False Negative, Curated<br />
Flagged Negative, Validated False Negative, Not Curated<br />
Flagged Negative, Not Validated<br />
Flagged Negative, Not Curated<br />
</pre><br />
<br />
Each row title/header can be clicked on to bring up a small pop-up window with a brief description of what that title means. Each cell of the table shows the number of papers that fit the criteria for that datatype and flag status, and the percentage (to two significant digits) that this number represents of a larger set. Each percentage is calculated, generally, as follows:<br />
<br />
<pre><br />
Flagged (% of curatable papers)<br />
Flagged Positive (% flagged)<br />
Flagged Positive and Validated (% flagged positive)<br />
Flagged Positive, Validated False Positive (% flagged positive and validated)<br />
Flagged Positive, Validated True Positive (% flagged positive and validated)<br />
Flagged Positive, Validated True Positive, Curated (% flagged positive and validated true positive)<br />
Flagged Positive, Validated True Positive, Not Curated (% flagged positive and validated true positive)<br />
Flagged Positive, Not Validated (% flagged positive)<br />
Flagged Positive, Not Curated (% flagged positive)<br />
</pre><br />
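To make the denominators concrete, here is a minimal Perl sketch that computes each row's percentage relative to its parent row, following the template above. The counts are made-up illustrative numbers, the row keys are shortened labels, and %parentOf and percentOf are hypothetical names rather than code from curation_status.cgi. The one-decimal formatting in the print statement is for display only and does not reproduce the form's rounding.

<pre>
use strict;
use warnings;

# Hypothetical row counts for one datatype (illustrative numbers only).
my %count = (
    curatable         => 1000,
    flagged           => 400,
    pos               => 200,
    pos_val           => 100,
    pos_val_fp        => 20,
    pos_val_tp        => 80,
    pos_val_tp_cur    => 60,
    pos_val_tp_notcur => 20,
    pos_notval        => 100,
    pos_notcur        => 140,
);

# Each row's percentage is taken relative to the row named here.
my %parentOf = (
    flagged           => 'curatable',
    pos               => 'flagged',
    pos_val           => 'pos',
    pos_val_fp        => 'pos_val',
    pos_val_tp        => 'pos_val',
    pos_val_tp_cur    => 'pos_val_tp',
    pos_val_tp_notcur => 'pos_val_tp',
    pos_notval        => 'pos',
    pos_notcur        => 'pos',
);

sub percentOf {
    my ($row) = @_;
    my $denominator = $count{ $parentOf{$row} };
    return 0 unless $denominator;    # avoid division by zero
    return 100 * $count{$row} / $denominator;
}

printf "%-18s %5d (%.1f%%)\n", $_, $count{$_}, percentOf($_) for sort keys %parentOf;
</pre>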
<br />
Each cell number (aside from those in the top three rows) is also a hyperlink to the [[New_2012_Curation_Status#Prepopulated_Specific_Papers_Page|Prepopulated Specific Papers Page]]. That page lists the IDs of the papers counted in the cell and provides options for viewing those papers in the [[New_2012_Curation_Status#Detailed_Results_of_Papers_Page|Detailed Results of Papers Page]]. <br />
<br />
<br><br />
<br><br />
<br />
=== Curation Statistics Page Display Info ===<br />
<br />
The title/headers for each row are displayed at the left AND right sides of the table, to enable easier viewing when there are several datatypes being viewed at once. If the number of datatypes is restricted via the [[New_2012_Curation_Status#Curation_Statistics_Options_Page|Curation Statistics Options Page]], the titles/headers for each row will only display on both left and right sides of the table if more than six datatypes are selected for viewing (this was done to avoid overcrowding of the page when six or fewer datatypes/columns were visible).<br />
<br />
<br />
The row-title columns (the leftmost column and, when more than six datatypes are visible, the rightmost column) are set to a fixed width of 600 pixels so that every title fits on a single line. All other (datatype) columns are set to a fixed width of 120 pixels.<br />
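The layout rule described above can be sketched in a few lines of Perl. The row-title cell is 600 pixels wide and is repeated on the right only when more than six datatype columns are shown, while each datatype cell is 120 pixels wide. The sub name tableRow and the inline style attributes are assumptions made for illustration. The form's actual markup may differ.

<pre>
use strict;
use warnings;

# Builds one HTML table row: a 600px title cell on the left, one 120px
# cell per datatype, and a repeated title cell on the right only when
# more than six datatypes are visible.
sub tableRow {
    my ($title, @values) = @_;
    my $titleCell = qq(<td style="width:600px">$title</td>);
    my $row = $titleCell;
    $row .= qq(<td style="width:120px">$_</td>) for @values;
    $row .= $titleCell if @values > 6;    # @values in numeric context = column count
    return "<tr>$row</tr>";
}

print tableRow('Flagged Positive', 12, 7, 3), "\n";
</pre>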
<br />
<br><br />
<br><br />
<br><br />
<br />
== Curation Statistics Options Page ==<br />
<br />
<br />
[[File:Curation_Status_Form_Curation_Statistics_Options_Page.png]]<br />
<br />
<br />
Above is a screenshot of the Curation Statistics Options Page, where a curator can specify which flagging methods and datatypes they would like to see curation statistics for. ('''Note 1''') Loading a table with fewer datatypes or flagging methods is often much faster than loading the entire table. The entire table (ALL datatypes and ALL flagging methods) takes roughly 23 seconds to load, whereas a table with ALL flagging methods but only ONE datatype usually loads in 1-3 seconds. ('''Note 2''') If six or fewer datatypes are requested, the row titles/headers appear only on the left side of the table; if more than six datatypes are viewed at once, they appear on both the left AND the right sides of the table. ('''Note 3''') The "Any" and "Intersection" rows of the resulting table only reflect the flagging methods you have selected to view. If only one flagging method is selected, the "Any" and "Intersection" rows will be identical to each other and to the "flagged positive" rows for that single flagging method.<br />
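One way to read the "Any" and "Intersection" rows is as a union and an intersection of the per-method paper sets, restricted to the selected methods. Here is a minimal Perl sketch under that reading. The method keys and paper IDs are placeholders, and anyAndIntersection is a hypothetical helper, not a sub from curation_status.cgi. With a single selected method, both lists come out identical, as Note 3 describes.

<pre>
use strict;
use warnings;

# Given a hash of method => [paper IDs] and a list of selected methods,
# return array refs of papers processed by ANY selected method (union)
# and by ALL selected methods (intersection).
sub anyAndIntersection {
    my ($papersByMethod, @selected) = @_;
    my %seenCount;
    for my $method (@selected) {
        # De-duplicate within one method before counting across methods.
        my %unique = map { ($_ => 1) } @{ $papersByMethod->{$method} || [] };
        $seenCount{$_}++ for keys %unique;
    }
    my @any          = sort keys %seenCount;
    my @intersection = grep { $seenCount{$_} == @selected } @any;
    return (\@any, \@intersection);
}

my %flagged = (
    svm => [qw(00001 00002 00003)],
    afp => [qw(00002 00003 00004)],
    cfp => [qw(00003)],
);
my ($any, $both) = anyAndIntersection(\%flagged, qw(svm afp));
print "Any: @$any\nIntersection: @$both\n";
</pre>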
<br />
<br><br />
<br><br />
<br><br />
<br />
== Prepopulated Specific Papers Page ==<br />
<br />
<br />
[[File:Curation_Status_Form_Prepopulated_Specific_Papers_Page_11-8-2013.png]]<br />
<br />
<br />
Above is a screenshot of the Prepopulated Specific Papers Page. Curators are directed here from any [[New_2012_Curation_Status#Main_Curation_Statistics_Page|Curation Statistics table]] via the hyperlinked numbers in the statistics table. The entire list of paper IDs (that fit the criteria indicated in the table row/column of the statistics table) is listed as hyperlinks to the individual paper results (on the [[New_2012_Curation_Status#Detailed_Results_of_Papers_Page|Detailed Results of Papers Page]]), as well as in the search box. Note that this page is identical to the [[New_2012_Curation_Status#Specific_Paper_Page|Specific Paper Page]], except that paper IDs are already pre-populated into the form from the statistics table.<br />
<br />
An addition as of November 2013 is the Topic paper filtering drop down menu. The drop down menu provides a list of WormBase Biological Topics as read from the Topic Curation OA. If a topic is selected (e.g. 'Aging' in the example screenshot), the form will look for any papers that exist in BOTH the Topic paper list AND the list of papers entered into the paper entry field (in this case prepopulated from some prior filtering step). If there are no overlapping papers from both lists, the form will not return any papers. If a Topic is not selected, the form will simply return the papers as listed in the paper entry field. If no topic is selected AND there are no papers in the paper entry field, the form will return ALL papers (having undergone at least one flagging pipeline).<br />
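The filtering rules in the paragraph above can be summarized as a small Perl sketch. The sub name papersToShow and its arguments are hypothetical, and the topic's paper set is assumed to be already loaded from the Topic Curation OA. The paragraph does not cover the case where a topic is selected and the paper field is empty, so this sketch makes an assumption: that case returns nothing (the intersection with an empty list). The real form may behave differently.

<pre>
use strict;
use warnings;

# $entered    - array ref of paper IDs from the paper entry field
# $topicSet   - hash ref of papers affiliated with the selected topic,
#               or undef when no topic is selected
# $allFlagged - array ref of every paper that went through a flagging pipeline
sub papersToShow {
    my ($entered, $topicSet, $allFlagged) = @_;
    if ($topicSet) {
        # Topic selected: keep only papers present in BOTH lists.
        return [ grep { $topicSet->{$_} } @$entered ];
    }
    # No topic: use the entered papers, or ALL flagged papers if none were entered.
    return @$entered ? [@$entered] : [@$allFlagged];
}

my $shown = papersToShow([qw(1 2 3)], { 2 => 1, 4 => 1 }, [qw(1 2 3 4 5)]);
print "@$shown\n";
</pre>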
<br />
<br />
<br><br />
<br><br />
<br><br />
<br />
== Detailed Results of Papers Page ==<br />
<br />
<br />
[[File:Curation_Status_Form_Detailed_Results_of_Papers_Page.png]]<br />
<br />
<br />
Above is a screenshot of the Detailed Results of Papers Page, where curators can view and edit the curation status of individual papers as well as view the flagging results of each flagging method for individual papers. <br />
<br />
'''When ALL columns are visible''': <br />
<br />
the '''first column''' displays the WormBase paper ID (WBPaperID#); <br />
<br />
the '''second column''' displays the name of the journal for the publication; <br />
<br />
the '''third column''' lists the PubMed ID (PMID) and links out to the PubMed webpage for this article; <br />
<br />
the '''fourth column''' displays the name of the PDF file stored locally on Tazendra with a hyperlink to the PDF file of the article in our local PDF archives; <br />
<br />
the '''fifth column''' displays the datatype for that row (note that multiple datatypes may be displayed per paper, on separate rows); <br />
<br />
the '''sixth, seventh, and eighth columns''' display the results of the flagging methods SVM, CFP, and AFP, respectively; <br />
<br />
the '''ninth column''' indicates the status of the paper in the Ontology Annotator (OA) for that datatype: "oa_blank" if the paper does not exist in the respective OA, or "curated" if it does, meaning that it has been curated (or at least partially curated); <br />

the '''tenth column''' provides a drop-down menu to select a curator (selecting a curator is only necessary when overwriting/changing the existing curator; the form recognizes which curator is logged in and automatically fills in that curator if this field is left blank); <br />

the '''eleventh column''' provides a drop-down menu to select the "new result" for the paper, indicating whether it is "curated and positive", "validated positive", "validated negative", or "not validated" ("not validated" only needs to be selected when reverting "curated and positive", "validated positive", or "validated negative" entries that may have been entered accidentally; selecting this option results in a blank field once the change has been submitted through the form); <br />

the '''twelfth column''' provides a drop-down menu of standard, premade comments; <br />

the '''thirteenth (and final) column''' is a free-text area where a curator can write any pertinent notes about the curation status of this paper-datatype pair.<br />
<br />
If any new results for a paper are entered (in columns 10-13), the curator must click the "Submit New Results" button at the bottom of the screen. They will then be directed either to the New Results Summary Page or to the Overwrite Confirmation Page, as shown above in the Wiki section describing the [[New_2012_Curation_Status#Add_Results_Page|Add Results Page]]. Note that for new paper result submissions to take effect, there '''must''' be a value in the "new result" column; otherwise, any comments (premade or free-text) will not be registered with the paper.<br />
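The rule that a row is only submitted when it has a "new result" value can be shown as a tiny filter. This is a minimal Perl sketch in which each hash stands for one paper-datatype row. The field names, the placeholder datatype 'dt', and the sub name rowsToSubmit are hypothetical. The sketch does not reproduce how curation_status.cgi actually reads the form.

<pre>
use strict;
use warnings;

# Keep only rows that carry a "new result" value; a comment on a row
# without one is dropped, matching the note above.
sub rowsToSubmit {
    my @rows = @_;
    return grep { defined $_->{new_result} && $_->{new_result} ne '' } @rows;
}

my @rows = (
    { paper => '00000001', datatype => 'dt', new_result => 'validated positive', comment => 'checked' },
    { paper => '00000002', datatype => 'dt', new_result => '',                   comment => 'not registered' },
);
print "submitting WBPaper$_->{paper}\n" for rowsToSubmit(@rows);
</pre>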
<br />
<br />
== Conflicts ==<br />
<br />
A paper found to have mutually exclusive flags is flagged as a "conflict". Such papers will not appear in any other lists (e.g. "curated", "validated", etc.) and must be resolved before they re-enter a normal list. A conflict can be triggered if:<br />
<br />
* A paper is found to be curated by the OA, but has been flagged, via the form, as "validated negative". This raises a conflict, which requires a curator to check whether the curation is bogus or the "validated negative" flag was a mistake. It can be resolved either by changing the "validated negative" status to another status ("not validated" (blank), "validated positive", or "curated and positive") if the paper was mistakenly flagged as "validated negative", OR by fixing the curation in the OA (deleting the bogus curation or fixing the paper reference).<br />
<br />
A paper will not be considered in conflict if its OA status is "oa_blank" and it is flagged as "curated and positive". This will be the case, for example, for all of the large-scale papers whose annotations do not reside in the OA/Postgres.<br />
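The conflict rule above reduces to a simple predicate. This minimal Perl sketch models only the single trigger listed. The sub name isConflict is hypothetical, and the status strings are the ones used on this page ("curated", "oa_blank", "validated negative", "curated and positive").

<pre>
use strict;
use warnings;

# Only the single trigger listed above is modelled: the OA reports the
# paper as "curated" while the form result is "validated negative".
# "oa_blank" combined with "curated and positive" is not a conflict.
sub isConflict {
    my ($oaStatus, $formResult) = @_;
    return 0 unless defined $oaStatus && defined $formResult;
    return ($oaStatus eq 'curated' && $formResult eq 'validated negative') ? 1 : 0;
}

for my $case (['curated', 'validated negative'], ['oa_blank', 'curated and positive']) {
    printf "%-8s + %-22s => %s\n", @$case, isConflict(@$case) ? 'conflict' : 'ok';
}
</pre>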
<br />
<br />
== Topic-Paper Filter ==<br />
<br />
(In progress... CG 11-5-2013)<br />
<br />
The Curation Status Form will now provide an option to filter a list of papers based on a WormBase Biological Topic. Official topics and affiliated papers are recognized from the "Topic" OA.<br />
<br />
<br><br />
<br><br />
<br><br />
<br />
= Code Documentation =<br />
<br />
Below is the documentation for the form's code, located on Tazendra (when live) or Mangolassi (sandbox):<br />
<br />
/home/postgres/public_html/cgi-bin/curation_status.cgi<br />
<br />
<br />
== Specific Paper Page/ Prepopulated Specific Paper Page ==<br />
<br />
The following code prints the "[[New_2012_Curation_Status#Specific_Paper_Page|Specific Paper Page]]":<br />
<br />
<pre><br />
sub printSpecificPaperPage {<br />
&printFormOpen();<br />
&printHiddenCurator();<br />
&printTextareaSpecificPapers('');<br />
&printSelectTopics();<br />
&printCheckboxesDatatype('off');<br />
&printCheckboxesCurationSources('all');<br />
&printPaperOptions();<br />
&printSubmitGetResults();<br />
&printFormClose();<br />
} # sub printSpecificPaperPage<br />
</pre><br />
<br />
<br />
The following code prints the "[[New_2012_Curation_Status#Prepopulated_Specific_Papers_Page|Prepopulated Specific Papers Page]]":<br />
<br />
<pre><br />
sub listCurationStatisticsPapersPage {<br />
&printFormOpen();<br />
&printHiddenCurator();<br />
my ($papers) = &printListCurationStatisticsPapers();<br />
&printTextareaSpecificPapers($papers);<br />
&printSelectTopics();<br />
&printSubmitGetResults();<br />
($oop, my $listDatatype) = &getHtmlVar($query, "listDatatype");<br />
&printCheckboxesDatatype($listDatatype);<br />
&printCheckboxesCurationSources('all');<br />
&printPaperOptions();<br />
&printSubmitGetResults();<br />
&printFormClose();<br />
} # sub listCurationStatisticsPapersPage<br />
</pre><br />
<br />
<br />
=== Paper-Topic Filter ===<br />
<br />
On both the [[New_2012_Curation_Status#Specific_Paper_Page|Specific Paper Page]] and the [[New_2012_Curation_Status#Prepopulated_Specific_Papers_Page|Prepopulated Specific Papers Page]], we have the option to filter the papers listed in the WBPaper ID field based on their affiliation to a particular WormBase Biological Topic, as informed by the Topic Curation OA.<br />
<br />
<br />
The following code is responsible for displaying the drop down menu of Topics:<br />
<br />
<pre><br />
sub printSelectTopics {<br />
print qq(Filter papers from list through a topic :<br/>);<br />
print qq(<select name="select_topic">);<br />
print qq(<option value="none">no topic, use all papers from textarea above</option>\n);<br />
my %topicIDs; my %topicIdToName;<br />
$result = $dbh->prepare( "SELECT DISTINCT(pro_process.pro_process) FROM pro_process, pro_paper, pro_topicpaperstatus WHERE pro_process.joinkey = pro_paper.joinkey AND pro_process.joinkey = pro_topicpaperstatus.joinkey AND (pro_topicpaperstatus.pro_topicpaperstatus = 'relevant') ORDER BY pro_process.pro_process" );<br />
$result->execute() or die "Cannot prepare statement: $DBI::errstr\n";<br />
while (my @row = $result->fetchrow) { $topicIDs{$row[0]}++; }<br />
my $topicIDs = join"','", sort keys %topicIDs; # for all the topicIDs, get the name from the prt_processname<br />
$result = $dbh->prepare( "SELECT prt_processid.prt_processid, prt_processname.prt_processname FROM prt_processid, prt_processname WHERE prt_processid.joinkey = prt_processname.joinkey AND prt_processid.prt_processid IN ('$topicIDs')" );<br />
$result->execute() or die "Cannot prepare statement: $DBI::errstr\n";<br />
while (my @row = $result->fetchrow) { $topicIdToName{$row[0]} = $row[1]; } # map wbprocess ids to their names for dropdown display<br />
foreach my $topic (sort keys %topicIdToName) { print qq(<option value="$topic $topicIdToName{$topic}">$topic $topicIdToName{$topic}</option>); }<br />
print qq(</select><br/>);<br />
} # sub printSelectTopics<br />
</pre><br />
<br />
The above code incorporates two Postgres queries:<br />
<br />
The first query retrieves the distinct WBProcess IDs in the Topic Curation OA that have an associated paper and a topic-paper status of 'relevant':<br />
<br />
<pre><br />
SELECT DISTINCT(pro_process.pro_process) FROM pro_process, pro_paper, pro_topicpaperstatus WHERE pro_process.joinkey = pro_paper.joinkey AND pro_process.joinkey = pro_topicpaperstatus.joinkey AND (pro_topicpaperstatus.pro_topicpaperstatus = 'relevant') ORDER BY pro_process.pro_process<br />
</pre><br />
<br />
The second query takes those WBProcess IDs (collected in %topicIDs and joined into the string $topicIDs) and retrieves the corresponding name for each one from the Process Term OA:<br />
<br />
<pre><br />
SELECT prt_processid.prt_processid, prt_processname.prt_processname FROM prt_processid, prt_processname WHERE prt_processid.joinkey = prt_processname.joinkey AND prt_processid.prt_processid IN ('$topicIDs')<br />
</pre><br />
<br />
The result is a drop-down menu of process IDs, sorted by ID, with each ID followed by its human-readable process name.<br />
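Both queries in ''printSelectTopics'' build SQL by interpolating values directly into the statement string. With DBI, a common alternative is to use bind placeholders, which avoids quoting problems. The sketch below is a suggested variant, not what curation_status.cgi currently does, and the sub name processNameQuery is hypothetical. It only builds the second statement and its bind list, so it runs without a database. Passing the results to $dbh->prepare($sql) and then $sth->execute(@bind) is standard DBI usage. A caller should skip the query when the ID list is empty, because "IN ()" is not valid SQL.

<pre>
use strict;
use warnings;

# Build the process-name lookup with one "?" placeholder per WBProcess ID.
sub processNameQuery {
    my @topicIDs = @_;
    my $placeholders = join ', ', ('?') x @topicIDs;    # e.g. "?, ?" for two IDs
    my $sql = "SELECT prt_processid.prt_processid, prt_processname.prt_processname "
            . "FROM prt_processid, prt_processname "
            . "WHERE prt_processid.joinkey = prt_processname.joinkey "
            . "AND prt_processid.prt_processid IN ($placeholders)";
    return ($sql, @topicIDs);
}

# Placeholder IDs for illustration only.
my ($sql, @bind) = processNameQuery(sort qw(WBbiopr:00000002 WBbiopr:00000001));
print "$sql\n@bind\n";
# With a live handle: my $sth = $dbh->prepare($sql); $sth->execute(@bind);
</pre>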
<br />
<br><br />
<br />
<br><br />
<br />
<br><br />
<br />
== Add Results Page: Loading Page and Processing Input ==<br />
<br />
<br />
The following code is responsible for printing the [[New_2012_Curation_Status#Add_Results_Page|Add Results Page]]: <br />
<br />
<pre><br />
sub printAddResultsPage {<br />
&printAddSection('', '', '', '', '', '', '');<br />
} # sub printAddResultsPage<br />
</pre><br />
<br />
This code defines the ''printAddResultsPage'' subroutine. It calls the ''printAddSection'' subroutine (see below) with empty strings, so that the form fields for curator ($twonumForm), datatype ($datatypeForm), validation result ($donposnegForm), paper list ($paperResultsForm), premade comment ($selcommentForm), and free-text comment ($txtcommentForm) start out blank. (The call passes seven empty strings, while ''printAddSection'' only reads six; the extra argument is ignored.) If a submission contains errors, ''addResults'' calls ''printAddSection'' again with the submitted values, so the form is redisplayed with the curator's entries filled in.<br />
<br />
<br />
=== ''printAddSection'' Subroutine ===<br />
<br />
The following code defines the ''printAddSection'' subroutine mentioned above, which adds the form components for entering datatype, validation status, paper IDs, premade comments, and free-text comments:<br />
<br />
<pre><br />
sub printAddSection {<br />
my ($twonumForm, $datatypeForm, $donposnegForm, $paperResultsForm, $selcommentForm, $txtcommentForm) = @_;<br />
my $selected = '';<br />
&printFormOpen();<br />
&printHiddenCurator();<br />
print qq(Select your datatype :<br/>);<br />
print qq(<select name="select_datatype">);<br />
print qq(<option value="" ></option>\n);<br />
foreach my $datatype (keys %datatypes) {<br />
if ($datatype eq $datatypeForm) { $selected = qq(selected="selected"); } else { $selected = ''; }<br />
print qq(<option value="$datatype" $selected>$datatype</option>\n); }<br />
print qq(</select><br/>);<br />
print qq(Select if the data is positive or negative :<br/>);<br />
my $select_size = scalar keys %donPosNegOptions;<br />
print qq(<select name="select_donposneg" size="$select_size">);<br />
foreach my $donposnegValue (keys %donPosNegOptions) {<br />
if ($donposnegForm eq $donposnegValue) { $selected = qq(selected="selected"); } else { $selected = ''; }<br />
print qq(<option value="$donposnegValue" $selected>$donPosNegOptions{$donposnegValue}</option>\n); }<br />
print qq(</select><br/>);<br />
print qq(Enter paper data here in the format "WBPaper00001234" (paper as a whole) with separate papers in separate lines.<br/>);<br />
print qq(<textarea name="textarea_paper_results" rows="6" cols="80">$paperResultsForm</textarea><br/>\n);<br />
print qq(Select your comment (optional) :<br/>);<br />
print qq(<select name="select_comment">);<br />
print qq(<option value="" ></option>\n);<br />
foreach my $comment (keys %premadeComments) {<br />
if ($comment eq $selcommentForm) { $selected = qq(selected="selected"); } else { $selected = ''; }<br />
print qq(<option value="$comment" $selected>$premadeComments{$comment}</option>\n); }<br />
print qq(</select><br/>);<br />
print qq(Enter a free text comment to associate with all papers above (optional) :<br/>);<br />
print qq(<textarea rows="4" cols="80" name="textarea_comment">$txtcommentForm</textarea><br/>);<br />
print qq(<input type="submit" name="action" value="Add Results"><br/>\n);<br />
&printFormClose();<br />
} # sub printAddSection<br />
</pre><br />
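''printAddSection'' repeatedly uses one pattern: it loops over the options and marks the one that matches the previously submitted value as selected="selected". This lets the form redisplay the curator's choice after an error. The stand-alone Perl sketch below shows that pattern. The sub name selectOptions and the sample option keys 'pos'/'neg' are hypothetical. Unlike the original loops over keys %hash, this sketch sorts the keys so that the option order is stable between page loads.

<pre>
use strict;
use warnings;

# Return <option> tags for %$options (value => label), marking the one
# equal to $current as selected so a redisplayed form keeps the choice.
sub selectOptions {
    my ($options, $current) = @_;
    $current = '' unless defined $current;
    my $html = '';
    for my $value (sort keys %$options) {
        my $selected = ($value eq $current) ? ' selected="selected"' : '';
        $html .= qq(<option value="$value"$selected>$options->{$value}</option>\n);
    }
    return $html;
}

my %choices = (pos => 'validated positive', neg => 'validated negative');
print selectOptions(\%choices, 'neg');
</pre>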
<br />
<br />
<br />
=== ''addResults'' Subroutine ===<br />
<br />
When a curator clicks on "Add Results" on the [[New_2012_Curation_Status#Add_Results_Page|Add Results Page]], the following code will process the curator's input, catching errors when they arise:<br />
<br />
<pre><br />
sub addResults {<br />
&printFormOpen();<br />
&printHiddenCurator();<br />
my $errorData = '';<br />
my %papersToAdd;<br />
my $twonum = $curator;<br />
($oop, my $datatype) = &getHtmlVar($query, "select_datatype");<br />
unless ($datatype) { $errorData .= "Error : Need to select a datatype.<br/>\n"; }<br />
($oop, my $donposneg) = &getHtmlVar($query, "select_donposneg");<br />
unless ($donposneg) { $errorData .= "Error : Need to select whether result is curated, validated positive, or validated negative.<br/>\n"; }<br />
($oop, my $paperResults) = &getHtmlVar($query, "textarea_paper_results");<br />
if ($paperResults) {<br />
my @lines = split/\r\n/, $paperResults;<br />
foreach my $line (@lines) {<br />
if ($line =~ m/^WBPaper(\S+)$/) { $papersToAdd{$1}++; }<br />
else { $errorData .= qq(Error bad line : ${line}<br/>\n); }<br />
} } # foreach my $line (@lines)<br />
else { $errorData .= "Error : Need to enter at least one paper.<br/>\n"; }<br />
($oop, my $selcomment) = &getHtmlVar($query, "select_comment");<br />
($oop, my $txtcomment) = &getHtmlVar($query, "textarea_comment");<br />
if ($errorData) { # problem with data, do not allow creation of any data, show form again<br />
print "$errorData<br />\n";<br />
printAddSection($twonum, $datatype, $donposneg, $paperResults, $selcomment, $txtcomment); }<br />
else { # all data is okay, enter data.<br />
my $joinkeys = join"','", sort keys %papersToAdd;<br />
my ($pgDataRef) = &getPgDataForJoinkeys($joinkeys, $datatype);<br />
my %pgData = %$pgDataRef;<br />
<br />
my @data; my @duplicateData;<br />
foreach my $joinkey (sort keys %papersToAdd) {<br />
my @line;<br />
push @line, $joinkey;<br />
push @line, $datatype;<br />
push @line, $twonum;<br />
push @line, $donposneg;<br />
push @line, $selcomment;<br />
push @line, $txtcomment;<br />
if ($pgData{$joinkey}{$datatype}) { push @duplicateData, \@line; }<br />
else { push @data, \@line; }<br />
} # foreach my $joinkey (sort keys %papersToAdd)<br />
&processResultDataDuplicateData(\@data, \@duplicateData, \%pgData);<br />
} # else # if ($errorData)<br />
&printFormClose();<br />
} # sub addResults<br />
</pre><br />
<br />
<br />
This code (above) will check to ensure the following:<br />
<br />
1) A datatype has been selected<br />
<br />
2) A validation status has been entered<br />
<br />
3) At least one paper ID has been submitted<br />
<br />
4) There is only one paper ID per line<br />
<br />
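5) Each line consists of 'WBPaper' followed by the paper number (e.g. 'WBPaper00001234'). Note that the pattern accepts any non-whitespace characters after the 'WBPaper' prefix, so it does not strictly enforce an eight-digit ID.<br />

Any violation of these checks results in an error message printed to the screen, and the form is reprinted with the submitted values in their respective fields.<br />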
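To exercise these checks without a web server or Postgres, they can be lifted into a plain sub. This minimal Perl sketch mirrors the validation in ''addResults'' and uses the same line pattern as the form. The sub name validateAddResults is hypothetical, and the error messages are shortened.

<pre>
use strict;
use warnings;

# Returns an error string (empty if all is well) and a hash ref of the
# paper numbers that passed, keyed without the "WBPaper" prefix.
sub validateAddResults {
    my ($datatype, $donposneg, $paperResults) = @_;
    my $errorData = '';
    my %papersToAdd;
    $errorData .= "Error : Need to select a datatype.\n" unless $datatype;
    $errorData .= "Error : Need to select a result.\n"  unless $donposneg;
    if ($paperResults) {
        for my $line (split /\r\n/, $paperResults) {    # browsers submit CRLF line breaks
            # Same pattern as the form: "WBPaper" then non-whitespace to end of line.
            if ($line =~ m/^WBPaper(\S+)$/) { $papersToAdd{$1}++; }
            else { $errorData .= "Error bad line : $line\n"; }
        }
    }
    else { $errorData .= "Error : Need to enter at least one paper.\n"; }
    return ($errorData, \%papersToAdd);
}

my ($err, $papers) = validateAddResults('dt', 'pos', "WBPaper00001234\r\nWBPaper00005678");
print $err ? $err : "ok: " . join(', ', sort keys %$papers) . "\n";
</pre>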
<br />
If no errors are found, the script calls the ''getPgDataForJoinkeys'' subroutine (see below) to query Postgres for any existing cur_curdata entries for each paper-datatype pair. It then sorts the submitted papers into new entries and entries that already have data, and passes both sets on so the new input values can be written to the appropriate paper-datatype pairs. <br />
<br />
<br />
<pre><br />
sub getPgDataForJoinkeys {<br />
my ($joinkeys, $datatype) = @_;<br />
my %pgData;<br />
$result = $dbh->prepare( "SELECT * FROM cur_curdata WHERE cur_datatype = '$datatype' AND cur_paper IN ('$joinkeys')" );<br />
$result->execute() or die "Cannot prepare statement: $DBI::errstr\n";<br />
while (my @row = $result->fetchrow) {<br />
$pgData{$row[0]}{$row[1]}{curator} = $row[2];<br />
$pgData{$row[0]}{$row[1]}{donposneg} = $row[3];<br />
$pgData{$row[0]}{$row[1]}{selcomment} = $row[4];<br />
$pgData{$row[0]}{$row[1]}{txtcomment} = $row[5];<br />
$pgData{$row[0]}{$row[1]}{timestamp} = $row[6]; }<br />
return \%pgData;<br />
} # sub getPgDataForJoinkeys<br />
</pre><br />
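The loop in ''addResults'' that separates new papers from papers that already have data can be sketched on its own. The input is a hash shaped like the one ''getPgDataForJoinkeys'' returns ($pgData{paper}{datatype}{field}). The sub name partitionPapers and the sample data are hypothetical. One difference from the original: this sketch checks exists first so that looking up an unknown paper does not autovivify an empty entry in the hash.

<pre>
use strict;
use warnings;

# Split submitted papers into new rows and rows that already have data
# for this datatype in the pgData-style hash.
sub partitionPapers {
    my ($papers, $datatype, $pgData) = @_;
    my (@new, @duplicate);
    for my $joinkey (sort @$papers) {
        if (exists $pgData->{$joinkey} && $pgData->{$joinkey}{$datatype}) {
            push @duplicate, $joinkey;
        }
        else { push @new, $joinkey; }
    }
    return (\@new, \@duplicate);
}

my %pgData = ('00000002' => { dt => { curator => 'two1', donposneg => 'pos' } });
my ($new, $dup) = partitionPapers([qw(00000001 00000002)], 'dt', \%pgData);
print "new: @$new\nduplicate: @$dup\n";
</pre>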
<br />
<br />
=== ''processResultDataDuplicateData'' Subroutine ===<br />
<br />
The code will then run the ''processResultDataDuplicateData'' subroutine (see below) to print the results to the screen as the '''New Results Summary Page''' (for new data) and/or handle the overwrite confirmation when data needs to be overwritten, generating the '''Overwrite Confirmation Page''' (see [[New_2012_Curation_Status#Add_Results_Page|Add Results Page]] section above).<br />
<br />
<br />
<pre><br />
sub processResultDataDuplicateData {<br />
my ($dataRef, $duplicateDataRef, $pgDataRef) = @_;<br />
my @data = @$dataRef;<br />
my @duplicateData = @$duplicateDataRef;<br />
my %pgData = %$pgDataRef;<br />
print qq(<table border="1">\n);<br />
print qq(<tr>${thDot}paperId</td>${thDot}datatype</td>${thDot}curator</td>${thDot}value</td>${thDot}selcomment</td>${thDot}textcomment</td></tr>\n);<br />
foreach my $lineRef (@data) {<br />
my @line = @$lineRef;<br />
foreach (@line) { unless ($_) { $_ = ''; } } # initialize values if none are there<br />
my $pgvalues = join"','", @line;<br />
my @pgcommands = ();<br />
my $pgcommand = "INSERT INTO cur_curdata VALUES ('$pgvalues');";<br />
push @pgcommands, $pgcommand;<br />
$pgcommand = "INSERT INTO cur_curdata_hst VALUES ('$pgvalues');";<br />
push @pgcommands, $pgcommand;<br />
foreach my $pgcommand (@pgcommands) {<br />
print qq($pgcommand<br/>\n);<br />
# UNCOMMENT TO POPULATE<br />
$dbh->do( $pgcommand );<br />
}<br />
my $trData = join"</td>$tdDot", @line;<br />
print qq(<tr>${tdDot}$trData</td></tr>\n);<br />
} # foreach my $lineRef (@data)<br />
print qq(</table>\n);<br />
if (scalar @data > 0) { print "results added<br />\n"; }<br />
</pre><br />
<br />
<br />
The first section of the subroutine (above) processes all data submitted through the [[New_2012_Curation_Status#Add_Results_Page|Add Results Page]] that is not overwriting any existing data. The new data is submitted to Postgres and a table is printed to the screen as the '''New Results Summary Page'''.<br />
<br />
The next sections of the subroutine handle new results that will overwrite existing Postgres values for the paper-datatype pairs. First, the code determines what data already exists in Postgres for a given paper-datatype pair and stores these in $xxxxxPg variables. Then, the code determines what values for curator, validation status, premade comment, and free-text comment were submitted through the form and stores these in $xxxxxFm variables. Next, the code generates and displays a table for each paper-datatype pair that has values being overwritten (if the corresponding $xxxxxPg and $xxxxxFm variables are not equal) with each set of data (old and new) highlighted in yellow to draw attention to these for overwrite confirmation. A confirmation checkbox is displayed for each paper-datatype pair undergoing an overwrite. <br />
<br />
<pre><br />
my $overwriteCount = 0;<br />
foreach my $lineRef (@duplicateData) { # for data already in postgres, add option to overwrite<br />
my @line = @$lineRef;<br />
foreach (@line) { unless ($_) { $_ = ''; } } # initialize values if none are there<br />
my ( $joinkey, $datatype, $twonum, $donposneg, $selcomment, $txtcomment ) = @line;<br />
my ( $curatorPg, $curatorPgName, $donposnegPg, $selcommentPg, $selcommentPgText, $txtcommentPg, $timestampPg ) = ( '&nbsp;', '&nbsp;', '&nbsp;', '&nbsp;', '&nbsp;', '&nbsp;', '&nbsp;' );<br />
my ( $curatorFm, $curatorFmName, $donposnegFm, $selcommentFm, $selcommentFmText, $txtcommentFm, $timestampFm ) = ( '&nbsp;', '&nbsp;', '&nbsp;', '&nbsp;', '&nbsp;', '&nbsp;', '<td>&nbsp;</td>' );<br />
if ( $pgData{$joinkey}{$datatype}{curator} ) { $curatorPg = $pgData{$joinkey}{$datatype}{curator}; $curatorPgName = $curators{$curatorPg}; }<br />
if ( $pgData{$joinkey}{$datatype}{donposneg} ) { $donposnegPg = $pgData{$joinkey}{$datatype}{donposneg}; }<br />
if ( $pgData{$joinkey}{$datatype}{selcomment} ) { $selcommentPg = $pgData{$joinkey}{$datatype}{selcomment}; $selcommentPgText = $premadeComments{$selcommentPg}; }<br />
if ( $pgData{$joinkey}{$datatype}{txtcomment} ) { $txtcommentPg = $pgData{$joinkey}{$datatype}{txtcomment}; }<br />
if ( $pgData{$joinkey}{$datatype}{timestamp} ) { $timestampPg = "<td>$pgData{$joinkey}{$datatype}{timestamp}</td>"; }<br />
if ( $twonum ) { $curatorFm = $twonum;<br />
if ( $curators{$curatorFm} ) { $curatorFmName = $curators{$curatorFm}; } }<br />
if ( $donposneg ) { $donposnegFm = $donposneg; }<br />
if ( $selcomment ) { $selcommentFm = $selcomment;<br />
if ( $premadeComments{$selcommentFm} ) { $selcommentFmText = $premadeComments{$selcommentFm}; } }<br />
if ( $txtcomment ) { $txtcommentFm = $txtcomment; }<br />
my $isDifferent = 0; # if any of the non-key values has changed, show option to overwrite<br />
if ($curatorFmName ne $curatorPgName) {<br />
$isDifferent++;<br />
$curatorFmName = '<td style="background-color:yellow">' . $curatorFmName . '</td>';<br />
$curatorPgName = '<td style="background-color:yellow">' . $curatorPgName . '</td>'; }<br />
else {<br />
$curatorFmName = '<td>' . $curatorFmName . '</td>';<br />
$curatorPgName = '<td>' . $curatorPgName . '</td>'; }<br />
if ($donposnegFm ne $donposnegPg) {<br />
$isDifferent++;<br />
$donposnegFm = '<td style="background-color:yellow">' . $donposnegFm . '</td>';<br />
$donposnegPg = '<td style="background-color:yellow">' . $donposnegPg . '</td>'; }<br />
else {<br />
$donposnegFm = '<td>' . $donposnegFm . '</td>';<br />
$donposnegPg = '<td>' . $donposnegPg . '</td>'; }<br />
if ($selcommentFmText ne $selcommentPgText) {<br />
$isDifferent++;<br />
$selcommentFmText = '<td style="background-color:yellow">' . $selcommentFmText . '</td>';<br />
$selcommentPgText = '<td style="background-color:yellow">' . $selcommentPgText . '</td>'; }<br />
else {<br />
$selcommentFmText = '<td>' . $selcommentFmText . '</td>';<br />
$selcommentPgText = '<td>' . $selcommentPgText . '</td>'; }<br />
if ($txtcommentFm ne $txtcommentPg) {<br />
$isDifferent++;<br />
$txtcommentFm = '<td style="background-color:yellow">' . $txtcommentFm . '</td>';<br />
$txtcommentPg = '<td style="background-color:yellow">' . $txtcommentPg . '</td>'; }<br />
else {<br />
$txtcommentFm = '<td>' . $txtcommentFm . '</td>';<br />
$txtcommentPg = '<td>' . $txtcommentPg . '</td>'; }<br />
next unless ($isDifferent > 0);<br />
$overwriteCount++;<br />
print qq(<input type="hidden" name="joinkey_$overwriteCount" value="$joinkey" >);<br />
print qq(<input type="hidden" name="datatype_$overwriteCount" value="$datatype" >);<br />
print qq(<input type="hidden" name="twonum_$overwriteCount" value="$twonum" >);<br />
print qq(<input type="hidden" name="donposneg_$overwriteCount" value="$donposneg" >);<br />
print qq(<input type="hidden" name="selcomment_$overwriteCount" value="$selcomment" >);<br />
print qq(<input type="hidden" name="txtcomment_$overwriteCount" value="$txtcomment" >);<br />
print qq(WBPaper$joinkey $datatype : <br/>\n);<br />
print qq(<table border="1">\n);<br />
print qq(<tr><th>&nbsp;</th><th>curator</th><th>value</th><th>selcomment</th><th>txtcomment</th><th>timestamp</th></tr>);<br />
print qq(<tr><td>old</td>${curatorPgName}${donposnegPg}${selcommentPgText}${txtcommentPg}${timestampPg}</tr>\n);<br />
print qq(<tr><td>new</td>${curatorFmName}${donposnegFm}${selcommentFmText}${txtcommentFm}${timestampFm}</tr>\n);<br />
print qq(</table>\n);<br />
print qq(Confirm change <input type="checkbox" name="checkbox_$overwriteCount" value="overwrite"><br/><br/>\n);<br />
} # foreach my $lineRef (@duplicateData)<br />
if ($overwriteCount > 0) {<br />
print qq(<input type="hidden" name="overwrite_count" value="$overwriteCount">);<br />
print qq(<input type="submit" name="action" value="Overwrite Selected Results"><br/>\n); }<br />
} # sub processResultDataDuplicateData<br />
</pre><br />
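The core decision in the overwrite section is whether any non-key field differs between the stored values and the submitted ones. If none differs, no checkbox is offered. This minimal Perl sketch makes that comparison explicit. The sub name changedFields is hypothetical. It also simplifies the original: it compares raw values, whereas the original compares curator names and premade comment texts after looking them up.

<pre>
use strict;
use warnings;

# Compare stored (old) and submitted (new) values field by field and
# return the names of fields that differ; missing values count as ''.
sub changedFields {
    my ($old, $new) = @_;
    my @changed;
    for my $field (qw(curator donposneg selcomment txtcomment)) {
        my $oldValue = defined $old->{$field} ? $old->{$field} : '';
        my $newValue = defined $new->{$field} ? $new->{$field} : '';
        push @changed, $field if $oldValue ne $newValue;
    }
    return @changed;
}

my @diff = changedFields({ curator => 'two1', donposneg => 'pos' },
                         { curator => 'two1', donposneg => 'neg' });
print @diff ? "offer overwrite for: @diff\n" : "nothing to overwrite\n";
</pre>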
<br />
<br />
Once the curator has confirmed the overwrite of the relevant results and clicked the "Overwrite Selected Results" button, the ''overwriteSelectedResults'' subroutine (below) runs to overwrite the data in the Postgres cur_curdata table and to record the new values in cur_curdata_hst.<br />
<br />
<pre><br />
sub overwriteSelectedResults {<br />
($oop, my $overwriteCount) = &getHtmlVar($query, "overwrite_count");<br />
my @pgcommands;<br />
for my $i (1 .. $overwriteCount) {<br />
($oop, my $overwrite) = &getHtmlVar($query, "checkbox_$i");<br />
next unless ($overwrite eq 'overwrite');<br />
($oop, my $joinkey ) = &getHtmlVar($query, "joinkey_$i" );<br />
($oop, my $datatype ) = &getHtmlVar($query, "datatype_$i" );<br />
($oop, my $twonum ) = &getHtmlVar($query, "twonum_$i" );<br />
($oop, my $donposneg ) = &getHtmlVar($query, "donposneg_$i" );<br />
($oop, my $selcomment ) = &getHtmlVar($query, "selcomment_$i" );<br />
($oop, my $txtcomment ) = &getHtmlVar($query, "txtcomment_$i" );<br />
unless ($donposneg) { $donposneg = ''; } unless ($selcomment) { $selcomment = ''; } unless ($txtcomment) { $txtcomment = ''; }<br />
push @pgcommands, qq(DELETE FROM cur_curdata WHERE cur_paper = '$joinkey' AND cur_datatype = '$datatype' AND cur_curator = '$twonum');<br />
push @pgcommands, qq(INSERT INTO cur_curdata VALUES ('$joinkey', '$datatype', '$twonum', '$donposneg', '$selcomment', '$txtcomment'));<br />
push @pgcommands, qq(INSERT INTO cur_curdata_hst VALUES ('$joinkey', '$datatype', '$twonum', '$donposneg', '$selcomment', '$txtcomment'));<br />
} # for my $i (1 .. $overwriteCount)<br />
foreach my $pgcommand (@pgcommands) {<br />
print "$pgcommand<br />\n";<br />
# UNCOMMENT TO POPULATE<br />
$dbh->do( $pgcommand );<br />
} # foreach my $pgcommand (@pgcommands)<br />
} # sub overwriteSelectedResults<br />
<br />
</pre><br />
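''overwriteSelectedResults'' relies on numbered form fields (checkbox_1, joinkey_1, and so on, up to overwrite_count). The Perl sketch below shows how those fields can be walked to collect only the confirmed entries. Here a plain hash stands in for the CGI parameters that the original reads through getHtmlVar. The sub name confirmedOverwrites and the sample values are hypothetical, and any missing value becomes an empty string.

<pre>
use strict;
use warnings;

# Walk the numbered fields and keep only entries whose confirmation
# checkbox was ticked; %$params stands in for the CGI parameters.
sub confirmedOverwrites {
    my ($params) = @_;
    my @confirmed;
    my $count = $params->{overwrite_count} || 0;
    for my $i (1 .. $count) {
        my $checkbox = $params->{"checkbox_$i"};
        next unless defined $checkbox && $checkbox eq 'overwrite';
        my %entry;
        for my $field (qw(joinkey datatype twonum donposneg selcomment txtcomment)) {
            my $value = $params->{"${field}_$i"};
            $entry{$field} = defined $value ? $value : '';    # blank out missing values
        }
        push @confirmed, \%entry;
    }
    return @confirmed;
}

my %params = (
    overwrite_count => 2,
    joinkey_1 => '00000001', datatype_1 => 'dt', twonum_1 => 'two1', donposneg_1 => 'pos',
    checkbox_2 => 'overwrite',
    joinkey_2 => '00000002', datatype_2 => 'dt', twonum_2 => 'two1', donposneg_2 => 'neg',
);
print "overwrite WBPaper$_->{joinkey} $_->{datatype}\n" for confirmedOverwrites(\%params);
</pre>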
<br />
<br />
<br />
== Detailed Results of Papers Page: Loading Data (the ''getResults'' Subroutine) ==<br />
<br />
The following code is for the ''getResults'' subroutine which displays the [[New_2012_Curation_Status#Detailed_Results_of_Papers_Page|Detailed Results of Papers Page]] after receiving input from the curator about what to display (from a [[New_2012_Curation_Status#Specific_Paper_Page|Specific Paper Page]] or [[New_2012_Curation_Status#Prepopulated_Specific_Papers_Page|Prepopulated Specific Papers Page]] submission).<br />
<br />
The first section of code collects all curatable papers, processes the checkbox input for the various datatypes, and processes the paper IDs that were submitted in the paper field.<br />
<br />
<pre><br />
sub getResults {<br />
&printFormOpen();<br />
&printHiddenCurator();<br />
&populateCuratablePapers(); # assume for now that we only care about curatable papers<br />
<br />
($oop, my $all_datatypes_checkbox) = &getHtmlVar($query, "checkbox_all_datatypes");<br />
unless ($all_datatypes_checkbox) { $all_datatypes_checkbox = ''; }<br />
foreach my $datatype (keys %datatypes) {<br />
($oop, my $chosen) = &getHtmlVar($query, "checkbox_$datatype");<br />
unless ($chosen) { $chosen = ''; }<br />
if ($all_datatypes_checkbox eq 'all') { $chosen = $datatype; } # if all datatypes checkbox was selected, set that datatype's chosen to that datatype<br />
print qq(<input type="hidden" name="checkbox_$datatype" value="$chosen">\n);<br />
if ($chosen) { $chosenDatatypes{$chosen}++; }<br />
} # foreach my $datatype (keys %datatypes)<br />
<br />
($oop, my $specificPapers) = &getHtmlVar($query, "specific_papers");<br />
my %filterPapers; my %specificPapers; my %topicPapers;<br />
if ($specificPapers) { my (@joinkeys) = $specificPapers =~ m/(\d+)/g; foreach (@joinkeys) { $specificPapers{$_}++; } }<br />
($oop, my $topic) = &getHtmlVar($query, "select_topic"); # if there's a selected topic replace specific papers with those from topic<br />
unless ($topic) { $topic = 'none'; }<br />
if ($topic ne 'none') {<br />
print "using topic $topic<br/>\n";<br />
my ($topicID) = $topic =~ m/(WBbiopr:\d+)/; # get the WBProcessID from the topic which includes the name<br />
print qq(<input type="hidden" name="select_topic" value="$topic">\n);<br />
$result = $dbh->prepare( "SELECT DISTINCT(pro_paper.pro_paper) FROM pro_process, pro_paper, pro_topicpaperstatus WHERE pro_process.joinkey = pro_paper.joinkey AND pro_process.joinkey = pro_topicpaperstatus.joinkey AND (pro_topicpaperstatus.pro_topicpaperstatus = 'relevant') AND pro_process.pro_process = '$topicID'" );<br />
$result->execute() or die "Cannot execute statement: $DBI::errstr\n";<br />
while (my @row = $result->fetchrow) { $row[0] =~ s/WBPaper//; $topicPapers{$row[0]}++; }<br />
} # if ($topic ne 'none')<br />
if ($specificPapers && ($topic ne 'none')) {<br />
foreach (sort keys %specificPapers) { if ($topicPapers{$_}) { $chosenPapers{$_}++; } } }<br />
elsif ($specificPapers) {<br />
foreach (sort keys %specificPapers) { $chosenPapers{$_}++; } }<br />
elsif ($topic ne 'none') {<br />
foreach (sort keys %topicPapers) { $chosenPapers{$_}++; } }<br />
else { $chosenPapers{'all'}++; } <br />
print qq(<input type="hidden" name="specific_papers" value="$specificPapers">\n);<br />
</pre><br />
<br />
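When a topic is selected, the code above queries the Topic Curation OA for rows that match the selected topic and have the status 'relevant', and collects the associated WBPaper IDs (with the "WBPaper" prefix removed so that they match the other paper joinkeys).<br />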
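The topic lookup can be sketched as follows. This version binds the topic ID through a DBI placeholder (`?`) at execute time instead of interpolating ''$topicID'' into the SQL string. The names ''topic_id_from_label'' and ''topic_papers'' are hypothetical, the label format ("WBbiopr:NNNNNNNN followed by the topic name") is inferred from the regex in ''getResults'', and ''topic_papers'' assumes an already connected DBI handle, so only the label parsing runs standalone.

```perl
use strict;
use warnings;

# The getResults query, with a placeholder for the topic ID.
my $TOPIC_PAPERS_SQL = <<'SQL';
SELECT DISTINCT(pro_paper.pro_paper)
  FROM pro_process, pro_paper, pro_topicpaperstatus
 WHERE pro_process.joinkey = pro_paper.joinkey
   AND pro_process.joinkey = pro_topicpaperstatus.joinkey
   AND pro_topicpaperstatus.pro_topicpaperstatus = 'relevant'
   AND pro_process.pro_process = ?
SQL

# Pull the WBbiopr ID out of a select-box label (label format is an assumption).
sub topic_id_from_label {
    my ($label) = @_;
    my ($id) = ($label // '') =~ m/(WBbiopr:\d+)/;
    return $id;    # undef when the label carries no ID
}

# Hypothetical wrapper: returns a hashref of paper joinkeys (WBPaper prefix
# stripped) marked 'relevant' for the topic. $dbh is a connected DBI handle.
sub topic_papers {
    my ($dbh, $topic_id) = @_;
    my %papers;
    return \%papers unless defined $topic_id;
    my $sth = $dbh->prepare($TOPIC_PAPERS_SQL);
    $sth->execute($topic_id) or die "Cannot execute statement: " . $sth->errstr . "\n";
    while (my ($paper) = $sth->fetchrow_array) {
        $paper =~ s/^WBPaper//;
        $papers{$paper}++;
    }
    return \%papers;
}

print topic_id_from_label('WBbiopr:00000012 apoptosis'), "\n";   # WBbiopr:00000012
```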
<br />
How filtering works: <br />
There are two possible sources of papers: (1) paper IDs entered in the text area box, and (2) papers associated with the selected topic (from the Topic Curation OA).<br />
If papers are entered and a topic is selected, only papers that appear in both lists are used. If papers are entered but no topic is selected, the entered papers are used. If a topic is selected but no papers are entered, all papers associated with the topic are used. If neither is given, results are generated for all papers. In every case, only curatable papers are displayed. Note that entering papers while selecting a topic that has no associated papers produces an empty result, because the intersection of the two lists is empty.<br />
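The four filtering cases can be sketched as a single function. ''choose_papers'' is a hypothetical name; as a small simplification, the text area is treated as empty when it contains no paper IDs (the original tests the raw text). As in ''getResults'', the key 'all' means "no restriction".

```perl
use strict;
use warnings;

# $specific: hashref of papers typed into the text area (may be empty)
# $topic_selected: true when a topic other than 'none' was chosen
# $topic: hashref of papers marked 'relevant' for that topic
sub choose_papers {
    my ($specific, $topic_selected, $topic) = @_;
    my %chosen;
    if (%$specific && $topic_selected) {        # both lists: intersection
        $chosen{$_}++ for grep { $topic->{$_} } keys %$specific;
    }
    elsif (%$specific) {                        # text area only
        $chosen{$_}++ for keys %$specific;
    }
    elsif ($topic_selected) {                   # topic only
        $chosen{$_}++ for keys %$topic;
    }
    else {                                      # neither: no restriction
        $chosen{all}++;
    }
    return \%chosen;
}

my $chosen = choose_papers({ '00000042' => 1, '00012345' => 1 }, 1, { '00012345' => 1 });
print join(',', sort keys %$chosen), "\n";   # 00012345
```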
<br />
<br />
The next section of code loads the existing curator data from cur_curdata for the paper-datatype pairs (always, since the curator fields must be editable), and loads OA, CFP, AFP, and SVM data only for the display options that were checked. Each option is also written back into the form as a hidden field so that it is preserved when another page of results is requested. The code then reads how many papers to display per page (default 10) and which page to show (default 0, which is treated as the first page), and whether the curator wants to see the journal, PMID, and/or PDF link for each paper; each of these adds a column to the header row.<br />
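The read-default-and-carry-forward pattern used for each option can be sketched as one helper. ''carry_forward'' is a hypothetical name and the form is modelled as a plain hash; like the original ''unless ($x) { $x = ...; }'' lines, any false value (missing, '' or 0) falls back to the default. Values are not HTML-escaped here, matching the original code.

```perl
use strict;
use warnings;

# Read one form value, fall back to a default when it is missing or false,
# and build the hidden <input> that carries it into the next submission.
sub carry_forward {
    my ($form, $name, $default) = @_;
    my $value = $form->{$name};
    $value = $default unless $value;
    my $html = qq(<input type="hidden" name="$name" value="$value">\n);
    return ($value, $html);
}

my %form = (checkbox_oa => 'on');
my ($show_oa, $oa_html)        = carry_forward(\%form, 'checkbox_oa', '');
my ($per_page, $per_page_html) = carry_forward(\%form, 'papers_per_page', 10);
print $oa_html;          # <input type="hidden" name="checkbox_oa" value="on">
print $per_page_html;    # <input type="hidden" name="papers_per_page" value="10">
```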
<br />
<pre><br />
&populateCurCurData(); # always show curator values since they have to be editable<br />
<br />
($oop, my $displayOa) = &getHtmlVar($query, "checkbox_oa"); unless ($displayOa) { $displayOa = ''; }<br />
($oop, my $displayCfp) = &getHtmlVar($query, "checkbox_cfp"); unless ($displayCfp) { $displayCfp = ''; }<br />
($oop, my $displayAfp) = &getHtmlVar($query, "checkbox_afp"); unless ($displayAfp) { $displayAfp = ''; }<br />
($oop, my $displaySvm) = &getHtmlVar($query, "checkbox_svm"); unless ($displaySvm) { $displaySvm = ''; }<br />
print qq(<input type="hidden" name="checkbox_oa" value="$displayOa" >\n);<br />
print qq(<input type="hidden" name="checkbox_cfp" value="$displayCfp">\n);<br />
print qq(<input type="hidden" name="checkbox_afp" value="$displayAfp">\n);<br />
print qq(<input type="hidden" name="checkbox_svm" value="$displaySvm">\n);<br />
if ($displayOa) { &populateOaData(); }<br />
if ($displayCfp) { &populateCfpData(); }<br />
if ($displayAfp) { &populateAfpData(); }<br />
if ($displaySvm) { &populateSvmData(); }<br />
<br />
($oop, my $showJournal) = &getHtmlVar($query, "checkbox_journal"); unless ($showJournal) { $showJournal = ''; }<br />
($oop, my $showPmid) = &getHtmlVar($query, "checkbox_pmid"); unless ($showPmid) { $showPmid = ''; }<br />
($oop, my $showPdf) = &getHtmlVar($query, "checkbox_pdf"); unless ($showPdf) { $showPdf = ''; }<br />
print qq(<input type="hidden" name="checkbox_journal" value="$showJournal">\n);<br />
print qq(<input type="hidden" name="checkbox_pmid" value="$showPmid">\n);<br />
print qq(<input type="hidden" name="checkbox_pdf" value="$showPdf">\n);<br />
<br />
($oop, my $papersPerPage) = &getHtmlVar($query, "papers_per_page");<br />
($oop, my $pageSelected) = &getHtmlVar($query, "select_page");<br />
unless ($papersPerPage) { $papersPerPage = 10; }<br />
unless ($pageSelected) { $pageSelected = 0; }<br />
print qq(<input type="hidden" name="papers_per_page" value="$papersPerPage">\n);<br />
<br />
my @headerRow = qw( paperID );<br />
if ($showJournal) { push @headerRow, "journal"; &populateJournal(); }<br />
if ($showPmid) { push @headerRow, "pmid"; &populatePmid(); }<br />
if ($showPdf) { push @headerRow, "pdf"; &populatePdf(); }<br />
</pre><br />
<br />
<br />
The code now builds a hash (%trs) that holds the table rows for each paper; papers are processed in order of paper ID rather than in the order submitted. Before building the rows, the code gathers the flagging results, cur_curdata, and OA (curation) data into the ''%allPaperData'' hash, so that every paper-datatype pair with data from at least one source gets a row (for the selected datatypes only). The first column for each paper displays the WBPaper ID. If the journal, PMID, and/or PDF link were requested, they are displayed in the next columns. Next is the datatype column. If SVM data was requested, the next column shows the SVM prediction for each paper-datatype pair, highlighting "high", "medium", and "low" results in decreasing intensities of red. The following columns show CFP, AFP, and OA data (if requested). An empty CFP or AFP result is displayed as a blank field (""), whereas an empty OA result is displayed as "oa_blank". Otherwise, the CFP and AFP fields show the free-text entries from the CFP and AFP results, and the OA field indicates "curated" if data was found in the OA for the paper-datatype pair.<br />
<br />
The last columns hold the curator drop-down menu, the validation status ("new result") drop-down menu, the premade comment drop-down menu, and the free-text comment field. The curator drop-down is preselected with the curator already stored in cur_curdata, if there is one. Any new result submitted from this page for a paper-datatype pair is attributed to the curator shown in the drop-down (either the stored curator or one the curator explicitly selects), or to the logged-in curator if the drop-down is empty.<br />
<br />
In the free-text comment field, comments of 20 or more characters are shown as their first 20 characters followed by an ellipsis ("..."). Clicking the comment opens a text area with the full text, which stays open while editing. Clicking outside the text area closes it again; note that the current code then displays the full edited text rather than re-truncating it (the commented-out alternative line would truncate it to 20 characters). Each cell in the table is outlined with a dotted border.<br />
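The SVM highlighting and the collapsed comment view can be sketched as two small functions. ''svm_cell'' and ''short_comment'' are hypothetical names; the colours and the 20-character cut-off are taken from the ''getResults'' code. As with the original regex, a newline before the 20th character prevents truncation, and this sketch shows '&nbsp;' only for an empty comment.

```perl
use strict;
use warnings;

# Background colour for each SVM prediction: strongest red for 'high',
# paler for 'medium' and 'low', white for anything else (including no result).
my %SVM_BGCOLOR = (high => '#FFA0A0', medium => '#FFC8C8', low => '#FFE0E0');

sub svm_cell {
    my ($result) = @_;
    $result //= '';
    my $bgcolor = $SVM_BGCOLOR{$result} // 'white';
    return qq(<span style="background-color: $bgcolor">$result</span>);
}

# Collapsed view of a free-text comment: the first 20 characters plus '...'
# when at least 20 characters precede any newline; '&nbsp;' when empty so the
# clickable div keeps some height.
sub short_comment {
    my ($text) = @_;
    $text //= '';
    return '&nbsp;' if $text eq '';
    return "$1..." if $text =~ m/^(.{20})/;
    return $text;
}

print svm_cell('medium'), "\n";   # <span style="background-color: #FFC8C8">medium</span>
print short_comment('needs a second look at figure 3'), "\n";
```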
<br />
<br />
<pre><br />
my %trs; # td data for each table row<br />
my %paperPosNegOkay; # papers that have positive-negative data okay, so show all svm results for that paper even if a given row isn't positive-negative okay<br />
my %paperInfo; # for a joinkey, all the paper information about it to show in a big rowspan for that table row<br />
<br />
my %allPaperData; # hash of joinkey - datatype for all possible queried data structures, to key off from this when there are no svm results for a data structure with data.<br />
foreach my $datatype (keys %svmData) { foreach my $joinkey (keys %{ $svmData{$datatype} }) { $allPaperData{$joinkey}{$datatype}++; } }<br />
foreach my $datatype (keys %curData) { foreach my $joinkey (keys %{ $curData{$datatype} }) { $allPaperData{$joinkey}{$datatype}++; } }<br />
foreach my $datatype (keys %oaData) { foreach my $joinkey (keys %{ $oaData{$datatype} }) { $allPaperData{$joinkey}{$datatype}++; } }<br />
foreach my $datatype (keys %cfpData) { foreach my $joinkey (keys %{ $cfpData{$datatype} }) { $allPaperData{$joinkey}{$datatype}++; } }<br />
foreach my $datatype (keys %afpData) { foreach my $joinkey (keys %{ $afpData{$datatype} }) { $allPaperData{$joinkey}{$datatype}++; } }<br />
<br />
my $trCounter = 0;<br />
foreach my $joinkey (sort keys %curatablePapers) { # TODO curatablePapers or allPaperData that have some flag ?<br />
next unless ($chosenPapers{$joinkey} || $chosenPapers{all});<br />
<br />
push @{ $paperInfo{$joinkey} }, $joinkey;<br />
my $journal = ''; my $pmid = ''; my $pdf = ''; my $primaryData = '';<br />
if ($showJournal) {<br />
if ($journal{$joinkey}) { $journal = $journal{$joinkey}; }<br />
push @{ $paperInfo{$joinkey} }, $journal; }<br />
if ($showPmid) {<br />
if ($pmid{$joinkey}) { $pmid = $pmid{$joinkey}; }<br />
push @{ $paperInfo{$joinkey} }, $pmid; }<br />
if ($showPdf) {<br />
if ($pdf{$joinkey}) { $pdf = $pdf{$joinkey}; }<br />
push @{ $paperInfo{$joinkey} }, $pdf; }<br />
<br />
foreach my $datatype (sort keys %{ $allPaperData{$joinkey} }) {<br />
next unless ($chosenDatatypes{$datatype}); # show only results for selected datatype<br />
my @dataRow = ( "$datatype" );<br />
$trCounter++;<br />
if ($displaySvm) {<br />
my $svmResult = '';<br />
if ($svmData{$datatype}{$joinkey}) { $svmResult = $svmData{$datatype}{$joinkey}; }<br />
my $bgcolor = 'white';<br />
if ($svmResult eq 'high') { $bgcolor = '#FFA0A0'; }<br />
elsif ($svmResult eq 'medium') { $bgcolor = '#FFC8C8'; }<br />
elsif ($svmResult eq 'low') { $bgcolor = '#FFE0E0'; }<br />
$svmResult = qq(<span style="background-color: $bgcolor">$svmResult</span>);<br />
push @dataRow, $svmResult;<br />
} # if ($displaySvm)<br />
<br />
if ($displayCfp) {<br />
my $cfpResult = '';<br />
if ($cfpData{$datatype}{$joinkey}) { $cfpResult = $cfpData{$datatype}{$joinkey}; }<br />
push @dataRow, $cfpResult;<br />
}<br />
<br />
if ($displayAfp) {<br />
my $afpResult = '';<br />
if ($afpData{$datatype}{$joinkey}) { $afpResult = $afpData{$datatype}{$joinkey}; }<br />
push @dataRow, $afpResult;<br />
}<br />
<br />
if ($displayOa) {<br />
my $oaResult = 'oa_blank';<br />
if ($oaData{$datatype}{$joinkey}) { $oaResult = $oaData{$datatype}{$joinkey}; }<br />
push @dataRow, $oaResult;<br />
}<br />
<br />
my $thisCurator = ''; # curator in cur_curdata for this paper-datatype if it has a value<br />
if ( $curData{$datatype}{$joinkey}{curator} ) { $thisCurator = $curData{$datatype}{$joinkey}{curator}; }<br />
my $curatorSelectCurator = qq(<select name="select_curator_curator_$trCounter" size="1">\n<option value=""></option>\n);<br />
foreach my $curator_two (keys %curators) { # display curators in alphabetical (tied hash) order, preselect the curator stored in cur_curdata for this paper-datatype<br />
if ($thisCurator eq $curator_two) { $curatorSelectCurator .= qq(<option value="$curator_two" selected="selected">$curators{$curator_two}</option>\n); }<br />
else { $curatorSelectCurator .= qq(<option value="$curator_two">$curators{$curator_two}</option>\n); } }<br />
$curatorSelectCurator .= qq(</select>);<br />
<br />
$curatorSelectCurator .= qq(<input type="hidden" name="joinkey_$trCounter" value="$joinkey" >); # these are required, arbitrarily added here<br />
$curatorSelectCurator .= qq(<input type="hidden" name="datatype_$trCounter" value="$datatype">); # these are required, arbitrarily added here<br />
push @dataRow, $curatorSelectCurator;<br />
<br />
my $thisDonPosNeg = ''; if ( $curData{$datatype}{$joinkey}{donposneg} ) { $thisDonPosNeg = $curData{$datatype}{$joinkey}{donposneg}; }<br />
my $curatorSelectDonposneg = qq(<select name="select_curator_donposneg_$trCounter">);<br />
foreach my $donposneg (keys %donPosNegOptions) { # display validation status options, preselect the value stored in cur_curdata for this paper-datatype<br />
if ($thisDonPosNeg eq $donposneg) { $curatorSelectDonposneg .= qq(<option value="$donposneg" selected="selected">$donPosNegOptions{$donposneg}</option>\n); }<br />
else { $curatorSelectDonposneg .= qq(<option value="$donposneg" >$donPosNegOptions{$donposneg}</option>\n); } }<br />
$curatorSelectDonposneg .= qq(</select>);<br />
push @dataRow, $curatorSelectDonposneg;<br />
<br />
my $thisSelComment = ''; if ( $curData{$datatype}{$joinkey}{selcomment} ) { $thisSelComment = $curData{$datatype}{$joinkey}{selcomment}; }<br />
my $curatorSelectComment = qq(<select name="select_curator_comment_$trCounter">);<br />
$curatorSelectComment .= qq(<option value="" ></option>\n);<br />
foreach my $comment (keys %premadeComments) {<br />
if ($thisSelComment eq $comment) { $curatorSelectComment .= qq(<option value="$comment" selected="selected">$premadeComments{$comment}</option>\n); }<br />
else { $curatorSelectComment .= qq(<option value="$comment" >$premadeComments{$comment}</option>\n); } }<br />
$curatorSelectComment .= qq(</select>);<br />
push @dataRow, $curatorSelectComment;<br />
<br />
my $txtcomment = ''; if ( $curData{$datatype}{$joinkey}{txtcomment} ) { $txtcomment = $curData{$datatype}{$joinkey}{txtcomment}; }<br />
my $shortTxtComment = $txtcomment; unless ($shortTxtComment) { $shortTxtComment = '&nbsp;'; }<br />
if ($txtcomment =~ m/^(.{20})/) { $shortTxtComment = $1; $shortTxtComment .= '...'; }<br />
my $curatorTextareaComment = qq(<div id="div_curator_comment_$trCounter" onclick="document.getElementById('div_curator_comment_$trCounter').style.display = 'none'; document.getElementById('textarea_curator_comment_$trCounter').style.display = ''; document.getElementById('textarea_curator_comment_$trCounter').focus();" >$shortTxtComment</div>\n);<br />
$curatorTextareaComment .= qq(<textarea rows="4" cols="80" id="textarea_curator_comment_$trCounter" name="textarea_curator_comment_$trCounter" style="display:none" onblur="document.getElementById('div_curator_comment_$trCounter').style.display = ''; document.getElementById('textarea_curator_comment_$trCounter').style.display = 'none'; var divValue = document.getElementById('textarea_curator_comment_$trCounter').value; if (divValue === '') { divValue = '&nbsp;'; } document.getElementById('div_curator_comment_$trCounter').innerHTML = divValue; ">$txtcomment</textarea>\n);<br />
# $curatorTextareaComment .= qq(<textarea rows="4" cols="80" id="textarea_curator_comment_$trCounter" name="textarea_curator_comment_$trCounter" style="display:none" onblur="document.getElementById('div_curator_comment_$trCounter').style.display = ''; document.getElementById('textarea_curator_comment_$trCounter').style.display = 'none'; document.getElementById('div_curator_comment_$trCounter').innerHTML = document.getElementById('textarea_curator_comment_$trCounter').value.substring(0,20)">$txtcomment</textarea>\n); # to get the first 20 characters without adding ...<br />
push @dataRow, $curatorTextareaComment;<br />
<br />
$paperPosNegOkay{$joinkey}++; # all papers always okay for pos/neg since we no longer have pos/neg filtering 2012 11 08<br />
<br />
my $trData = join"</td>$tdDot", @dataRow;<br />
push @{ $trs{$joinkey} }, qq(${tdDot}$trData</td></tr>\n);<br />
} # foreach my $datatype (sort keys %{ $allPaperData{$joinkey} })<br />
} # foreach my $joinkey (sort keys %curatablePapers)<br />
</pre><br />
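The construction of ''%allPaperData'' above is a union re-keyed by paper: each source hash is keyed datatype then joinkey, and the result is keyed joinkey then datatype, so a paper-datatype pair gets a row if any source has data for it. A minimal sketch, with ''union_by_paper'' as a hypothetical name and made-up example data:

```perl
use strict;
use warnings;

# Each source: datatype => { joinkey => value }. Result: joinkey => { datatype => count },
# where the count is the number of sources that had data for that pair.
sub union_by_paper {
    my (@sources) = @_;
    my %all;
    foreach my $source (@sources) {
        foreach my $datatype (keys %$source) {
            $all{$_}{$datatype}++ for keys %{ $source->{$datatype} };
        }
    }
    return \%all;
}

my %svm = (antibody => { '00012345' => 'high' });
my %cfp = (rnai => { '00012345' => 'yes' }, antibody => { '00000042' => 'yes' });
my $all = union_by_paper(\%svm, \%cfp);
print join(',', sort keys %{ $all->{'00012345'} }), "\n";   # antibody,rnai
```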
<br />
<br />
The following code prints the table rows, displaying the requested information for each paper and paper-datatype pair. It counts the papers that have at least one row to display and calculates the number of pages the results will be spread across: the total number of papers divided by the number of papers per page, rounded up to the next integer (for example, 23 papers at 10 papers per page gives 3 pages). A page number drop-down menu is then displayed, giving the curator access to results not shown on the current page, together with a "Get Results" button that loads the page selected in that menu. The form also displays the total number of papers in the query result ("amount of papers"). The HTML table is then printed with column headers. The paper ID, datatype, curator, validation status ("new result"), premade comment ("select comment"), and free-text comment ("textarea comment") columns are always displayed; all other columns are optional. Cell borders are displayed with dotted lines.<br />
<br />
To determine which papers to display on a page, the code takes the selected page number (n) and skips the first (n-1)*(papers per page) papers; the following papers, up to the number requested per page, are then displayed. For each paper, the paper ID, journal, PMID, and PDF link are printed only once, in cells that span all of that paper's datatype rows (using the HTML rowspan attribute), regardless of how many datatypes are being viewed. A "Submit New Results" button is then printed at the bottom of the table and the form is closed.<br />
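The page arithmetic can be sketched as a standalone function. ''page_of'' is a hypothetical name, and this sketch assumes the ''ceil'' in the original comes from the POSIX module. Page 0 (the default before a page is chosen) gives a negative skip count and so behaves like page 1, which the sketch makes explicit by clamping the skip count at zero.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# $papers: sorted arrayref of joinkeys; $page is 1-based (0 acts as page 1).
# Returns the number of pages and an arrayref of the joinkeys on that page.
sub page_of {
    my ($papers, $per_page, $page) = @_;
    my $pages = ceil(@$papers / $per_page);
    my $skip  = ($page - 1) * $per_page;
    $skip = 0 if $skip < 0;
    my @shown = grep { defined } @$papers[$skip .. $skip + $per_page - 1];
    return ($pages, \@shown);
}

my @papers = map { sprintf '%08d', $_ } 1 .. 23;
my ($pages, $shown) = page_of(\@papers, 10, 3);
print "$pages pages; page 3 shows ", scalar(@$shown), " papers\n";   # 3 pages; page 3 shows 3 papers
```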
<br />
<br />
<pre><br />
print qq(<input type="hidden" name="trCounter" value="$trCounter">);<br />
<br />
my $joinkeysAmount = scalar(keys %paperPosNegOkay);<br />
my $pagesAmount = ceil($joinkeysAmount / $papersPerPage);<br />
print qq(Page number <select name="select_page">);<br />
for my $i (1 .. $pagesAmount) {<br />
if ($i == $pageSelected) { print qq(<option selected="selected">$i</option>\n); }<br />
else { print qq(<option>$i</option>\n); }<br />
} # for my $i (1 .. $pagesAmount)<br />
print qq(</select>);<br />
print qq(<input type="submit" name="action" value="Get Results">\n);<br />
print qq(amount of papers $joinkeysAmount<br/>\n);<br />
print qq(<br />\n);<br />
<br />
print qq(<table border="1">\n);<br />
push @headerRow, "datatype";<br />
if ($displaySvm) { push @headerRow, "SVM Prediction"; }<br />
if ($displayCfp) { push @headerRow, "cfp value"; }<br />
if ($displayAfp) { push @headerRow, "afp value"; }<br />
if ($displayOa) { push @headerRow, "oa value"; }<br />
push @headerRow, "curator"; push @headerRow, "new result"; push @headerRow, "select comment"; push @headerRow, "textarea comment";<br />
my $headerRow = join"</th>$thDot", @headerRow;<br />
$headerRow = qq(<tr>$thDot) . $headerRow . qq(</th></tr>);<br />
print qq($headerRow\n);<br />
<br />
my $papCount = 0;<br />
my $papCountToSkip = 0; my $papToSkip = ($pageSelected - 1 ) * $papersPerPage;<br />
foreach my $joinkey (sort keys %paperPosNegOkay) { # from all papers that have good positive-negative values, show all TRs<br />
$papCountToSkip++; next if ($papCountToSkip <= $papToSkip); # skip entries until at the proper page<br />
$papCount++;<br />
last if ($papCount > $papersPerPage);<br />
my $trsInPaperAmount = scalar @{ $trs{$joinkey} }; # amount of rows for a joinkey, make that the rowspan<br />
my $firstTr = shift @{ $trs{$joinkey} }; # the first table row needs the paper info and rowspan<br />
my $tdMultiRow = $tdDot; $tdMultiRow =~ s/>$/ rowspan="$trsInPaperAmount">/; # add the rowspan to the td style<br />
my $paperInfoTds = join"</td>$tdMultiRow", @{ $paperInfo{$joinkey} }; # make paper info tds from %paperInfo<br />
print qq(<tr>${tdMultiRow}$paperInfoTds</td>$firstTr\n); # print the first row which has paper info<br />
foreach my $tr (@{ $trs{$joinkey} }) { print qq(<tr>$tr\n); } } # print other table rows without paper info<br />
print qq(</table>\n);<br />
<br />
print qq(<input type="submit" name="action" value="Submit New Results"><br/>\n);<br />
<br />
&printFormClose();<br />
} # sub getResults<br />
<br />
</pre><br />
<br />
<br><br />
<br><br />
<br />
== Detailed Results of Papers Page: Processing Input (the ''submitNewResults'' Subroutine) ==<br />
<br />
When a curator clicks the "Submit New Results" button at the bottom of the [[New_2012_Curation_Status#Detailed_Results_of_Papers_Page|Detailed Results of Papers Page]], the ''submitNewResults'' subroutine (below) is run. First, the subroutine opens the form and prints the hidden curator field, capturing the curator currently logged in. The code then reads every paper-datatype row generated by the ''getResults'' subroutine for the [[New_2012_Curation_Status#Detailed_Results_of_Papers_Page|Detailed Results of Papers Page]], skipping rows that do not have a "new result" (validation status) entry. For each remaining row, the curator shown in the curator drop-down menu is used; this is either the curator already stored for that paper-datatype pair or a curator selected manually from the drop-down. If the drop-down is empty, the logged-in curator is used instead.<br />
<br />
The existing Postgres data for the submitted paper-datatype pairs is then retrieved (with ''getPgDataForJoinkeys''). Rows for pairs that have no existing data are collected as new data, and rows for pairs that already have data are collected as potential overwrites. Both lists, together with the existing Postgres data, are passed to ''processResultDataDuplicateData'', which prepares the '''New Results Summary Page''' or the '''Overwrite Confirmation Page''' as described in the [[New_2012_Curation_Status#Add_Results_Page|Add Results Page]] and the [[New_2012_Curation_Status#Add_Results_Page:_Loading_Page_and_Processing_Input|Add Results Page: Loading Page and Processing Input]] sections. Once complete, the form is closed.<br />
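The row handling in ''submitNewResults'' can be sketched as one partitioning function. ''partition_rows'' is a hypothetical name, each submitted row is modelled as a hashref instead of numbered form fields, and 'two1' and 'two2' are made-up curator IDs. The output lines follow the column order used in the subroutine (joinkey, datatype, curator, validation status, premade comment, free-text comment).

```perl
use strict;
use warnings;

# $rows: arrayref of submitted rows; $logged_in: logged-in curator;
# $existing: joinkey => { datatype => stored data } already in Postgres.
sub partition_rows {
    my ($rows, $logged_in, $existing) = @_;
    my (@new, @overwrite);
    foreach my $row (@$rows) {
        next unless $row->{donposneg};                  # no "new result": skip row
        my $curator = $row->{curator} || $logged_in;    # drop-down value, else logged-in curator
        my ($joinkey, $datatype) = @{$row}{qw(joinkey datatype)};
        my @line = ($joinkey, $datatype, $curator, $row->{donposneg},
                    $row->{selcomment} // '', $row->{txtcomment} // '');
        if ($existing->{$joinkey} && $existing->{$joinkey}{$datatype}) { push @overwrite, \@line; }
        else                                                          { push @new, \@line; }
    }
    return (\@new, \@overwrite);
}

my @rows = (
    { joinkey => '00000042', datatype => 'rnai', donposneg => 'pos', curator => '' },
    { joinkey => '00012345', datatype => 'rnai', donposneg => '' },
);
my ($new, $overwrite) = partition_rows(\@rows, 'two1', {});
print scalar(@$new), " new, ", scalar(@$overwrite), " to confirm\n";   # 1 new, 0 to confirm
```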
<br />
<pre><br />
sub submitNewResults {<br />
&printFormOpen();<br />
&printHiddenCurator();<br />
($oop, my $trAmount) = &getHtmlVar($query, "trCounter");<br />
my %papersToAdd;<br />
my %curatorData;<br />
for my $i (1 .. $trAmount) {<br />
($oop, my $curatorDonposneg) = &getHtmlVar($query, "select_curator_donposneg_$i");<br />
next unless $curatorDonposneg; # skip entries without a curator result for done / positive / negative<br />
($oop, my $dropdownCurator) = &getHtmlVar($query, "select_curator_curator_$i");<br />
my $activeCurator = $curator; if ($dropdownCurator) { $activeCurator = $dropdownCurator; } # if a curator was chosen use that, otherwise use logged in curator<br />
($oop, my $curatorSelComment) = &getHtmlVar($query, "select_curator_comment_$i");<br />
($oop, my $curatorTxtComment) = &getHtmlVar($query, "textarea_curator_comment_$i");<br />
($oop, my $joinkey) = &getHtmlVar($query, "joinkey_$i");<br />
($oop, my $datatype) = &getHtmlVar($query, "datatype_$i");<br />
<br />
$papersToAdd{$datatype}{$joinkey}++;<br />
$curatorData{$joinkey}{$datatype}{curator} = $activeCurator;<br />
$curatorData{$joinkey}{$datatype}{donposneg} = $curatorDonposneg;<br />
$curatorData{$joinkey}{$datatype}{selcomment} = $curatorSelComment;<br />
$curatorData{$joinkey}{$datatype}{txtcomment} = $curatorTxtComment;<br />
} # for my $i (1 .. $trAmount)<br />
my %pgData;<br />
foreach my $datatype (sort keys %papersToAdd) {<br />
my $joinkeys = join"','", sort keys %{ $papersToAdd{$datatype} };<br />
my ($pgDatatypeDataRef) = &getPgDataForJoinkeys($joinkeys, $datatype);<br />
my %pgDatatypeData = %$pgDatatypeDataRef;<br />
foreach my $joinkey (keys %pgDatatypeData) {<br />
foreach my $datatype (keys %{ $pgDatatypeData{$joinkey} }) {<br />
foreach my $valuetype (keys %{ $pgDatatypeData{$joinkey}{$datatype} }) {<br />
$pgData{$joinkey}{$datatype}{$valuetype} = $pgDatatypeData{$joinkey}{$datatype}{$valuetype}; } } } }<br />
<br />
my @data; my @duplicateData;<br />
foreach my $joinkey (sort keys %curatorData) {<br />
foreach my $datatype (keys %{ $curatorData{$joinkey} }) {<br />
my $thisCurator = $curatorData{$joinkey}{$datatype}{curator};<br />
my $donposneg = $curatorData{$joinkey}{$datatype}{donposneg};<br />
my $selcomment = $curatorData{$joinkey}{$datatype}{selcomment};<br />
my $txtcomment = $curatorData{$joinkey}{$datatype}{txtcomment};<br />
my @line;<br />
push @line, $joinkey;<br />
push @line, $datatype;<br />
push @line, $thisCurator;<br />
push @line, $donposneg;<br />
push @line, $selcomment;<br />
push @line, $txtcomment;<br />
if ($pgData{$joinkey}{$datatype}) { push @duplicateData, \@line; }<br />
else { push @data, \@line; }<br />
} }<br />
&processResultDataDuplicateData(\@data, \@duplicateData, \%pgData);<br />
&printFormClose();<br />
} # sub submitNewResults<br />
<br />
</pre><br />
<br />
<br><br />
<br><br />
<br />
== Printing Curation Statistics Table ==<br />
<br />
<br />
The beginning lines (and some later lines) of the ''printCurationStatisticsTable'' subroutine contain code that prints the loading time of the Curation Statistics table as a whole and the loading/processing time of each portion of the code. These timing lines only run when the variable ''$showTimes'' is nonzero; it is set to 0 in the code, so the timings are not shown unless a developer changes that value. By default, all papers and datatypes are loaded for calculations and display. If only some flagging methods or datatypes are selected from the [[New_2012_Curation_Status#Curation_Statistics_Options_Page|Curation Statistics Options Page]], only the flagging data for the selected methods is loaded and only numbers for the selected datatypes are displayed.<br />
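The repeated ''$showTimes'' lines follow a lap-timer pattern that could be wrapped in a closure. This is a hypothetical refactoring sketch, not the existing code: ''make_lap_timer'' is an invented name, and it uses Time::HiRes for sub-second resolution, which is an assumption (the script may or may not already import it).

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Returns a closure that reports the seconds elapsed since its previous call
# (or since creation), printing "label seconds<br>" only when enabled.
sub make_lap_timer {
    my ($enabled) = @_;
    my $start = time;
    return sub {
        my ($label) = @_;
        my $now  = time;
        my $diff = $now - $start;
        $start = $now;
        print "$label $diff<br>\n" if $enabled;
        return $diff;
    };
}

my $lap = make_lap_timer(0);    # 0 mirrors the hard-coded $showTimes value
# ... &populateCuratablePapers(); ...
$lap->('populateCuratablePapers');
```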
<br />
The row header (label) column is 600 pixels wide and each datatype column is 120 pixels wide. If more than 6 datatypes are selected for viewing (via the [[New_2012_Curation_Status#Curation_Statistics_Options_Page|Curation Statistics Options Page]]; ALL datatypes are selected by default), the row header column is repeated on the right-hand side of the table in addition to the left-hand side (default). The overall table width is therefore 600 pixels plus 120 pixels per datatype, or 1200 pixels plus 120 pixels per datatype when the row header is repeated.<br />
<br />
Curatable papers and curated papers are then loaded, followed by the flagging methods. ALL flagging methods (SVM, AFP, CFP) are loaded by default, but specific flagging methods may be selected via the [[New_2012_Curation_Status#Curation_Statistics_Options_Page|Curation Statistics Options Page]]. Column headers are then printed for each datatype (the datatypes requested from the [[New_2012_Curation_Status#Curation_Statistics_Options_Page|Curation Statistics Options Page]], or ALL datatypes on the main Curation Statistics Page). The total number of curatable papers is printed once for the entire table, followed by the number of objects curated (per datatype) and the average number of objects curated per paper (per datatype).<br />
<br />
Some counts include data that is not in the OAs and has been added to the code manually: for the "rnai" datatype, curation statistics from 8 large-scale papers, as that data does not exist in the RNAi OA; for "otherexpr", 2084 objects coming from Chronograms, as that data exists only in Citace Minus and not in the expression OA; and for "picture", 19052 objects coming from expression images from Itai Yanai's study, as that data exists only in Citace Minus and not in the picture OA.<br />
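The table width calculation in ''printCurationStatisticsTable'' can be written as a small function to make the arithmetic explicit. ''stats_table_width'' is a hypothetical name; the 600 and 120 pixel defaults and the "more than 6 datatypes" rule come from the code below.

```perl
use strict;
use warnings;

# One label column (two when more than 6 datatypes are shown) plus one
# fixed-width column per datatype.
sub stats_table_width {
    my ($datatype_count, $label_width, $datatype_width) = @_;
    $label_width    //= 600;
    $datatype_width //= 120;
    my $label_columns = $datatype_count > 6 ? 2 : 1;
    return $label_columns * $label_width + $datatype_count * $datatype_width;
}

print stats_table_width(5), "\n";     # 1200  (600 + 5*120)
print stats_table_width(20), "\n";    # 3600  (2*600 + 20*120)
```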
<br />
Next come the statistics for all curated papers and all validated papers (including the positive and negative validation breakdowns), then the statistics for each selected flagging method (SVM, AFP, CFP), and finally the "Any" flagged papers and "Intersection" flagged papers. Each of these sections is calculated and then displayed according to the options selected (if any).<br />
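The subroutines that compute the "Any" and "Intersection" sections are not shown in full here, so the following is only a sketch under an assumption: that "Any" means papers flagged by at least one selected flagging method and "Intersection" means papers flagged by every selected method, for a single datatype. ''any_and_intersection'' is a hypothetical name and the example data is made up.

```perl
use strict;
use warnings;

# $flags: method => { joinkey => 1 } for one datatype; @methods: selected methods.
# Returns sorted arrayrefs of papers flagged by any / by all selected methods.
sub any_and_intersection {
    my ($flags, @methods) = @_;
    my %count;
    foreach my $method (@methods) {
        $count{$_}++ for keys %{ $flags->{$method} || {} };
    }
    my @any = sort keys %count;
    my @int = grep { $count{$_} == @methods } @any;
    return (\@any, \@int);
}

my %flags = (svm => { p1 => 1, p2 => 1 }, afp => { p2 => 1 }, cfp => { p2 => 1, p3 => 1 });
my ($any, $int) = any_and_intersection(\%flags, qw(svm afp cfp));
print "any: @$any; intersection: @$int\n";   # any: p1 p2 p3; intersection: p2
```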
<br />
<br />
<pre><br />
sub printCurationStatisticsTable {<br />
my ($showTimes, $startprintCurationStatisticsTable, $start, $end, $diff) = (0, '', '', '', '');<br />
if ($showTimes) { $startprintCurationStatisticsTable = time; $start = $startprintCurationStatisticsTable; }<br />
<br />
$chosenPapers{all}++;<br />
my @datatypesToShow;<br />
<br />
&populateStatisticsHashToLabel();<br />
<br />
($oop, my $all_datatypes_checkbox) = &getHtmlVar($query, "checkbox_all_datatypes");<br />
unless ($all_datatypes_checkbox) { $all_datatypes_checkbox = ''; }<br />
foreach my $datatype (sort keys %datatypes) { # don't tie %datatypes<br />
($oop, my $chosen) = &getHtmlVar($query, "checkbox_$datatype");<br />
if ($all_datatypes_checkbox eq 'all') { $chosen = $datatype; } # if all datatypes checkbox was selected, set that datatype's chosen to that datatype<br />
if ($chosen) { $chosenDatatypes{$chosen}++; push @datatypesToShow, $datatype; }<br />
} # foreach my $datatype (sort keys %datatypes)<br />
<br />
my $datatypesToShowAmount = scalar @datatypesToShow;<br />
my $rowNameTdWidth = '600'; my $datatypeTdWidth = '120';<br />
my $labelRightFlag = 0; if ($datatypesToShowAmount > 6) { $labelRightFlag++; }<br />
my $tableWidth = $rowNameTdWidth + $datatypesToShowAmount * $datatypeTdWidth;<br />
if ($labelRightFlag) { $tableWidth = 2*$rowNameTdWidth + $datatypesToShowAmount * $datatypeTdWidth; }<br />
print qq(<table width="$tableWidth" class="bordered" border="1">\n);<br />
<br />
if ($showTimes) { $end = time; $diff = $end - $start; $start = time; print "beforePopulateCuratablePapers $diff<br>"; }<br />
&populateCuratablePapers();<br />
if ($showTimes) { $end = time; $diff = $end - $start; $start = time; print "populateCuratablePapers $diff<br>"; }<br />
&populateCuratedPapers();<br />
if ($showTimes) { $end = time; $diff = $end - $start; $start = time; print "populateCuratedPapers $diff<br>"; }<br />
<br />
my @flaggingMethods; # when calculating any or all methods, only want papers for these flagging methods<br />
($oop, my $displayAll) = &getHtmlVar($query, "checkbox_all_flagging_methods");<br />
($oop, my $displayCfp) = &getHtmlVar($query, "checkbox_cfp");<br />
if ( ($displayAll) || ($displayCfp) ) { &populateCfpData(); push @flaggingMethods, 'checkbox_cfp'; }<br />
if ($showTimes) { $end = time; $diff = $end - $start; $start = time; print "populateCfpData $diff<br>"; }<br />
($oop, my $displayAfp) = &getHtmlVar($query, "checkbox_afp");<br />
if ( ($displayAll) || ($displayAfp) ) { &populateAfpData(); push @flaggingMethods, 'checkbox_afp'; }<br />
if ($showTimes) { $end = time; $diff = $end - $start; $start = time; print "populateAfpData $diff<br>"; }<br />
($oop, my $displaySvm) = &getHtmlVar($query, "checkbox_svm");<br />
if ( ($displayAll) || ($displaySvm) ) { &populateSvmData(); push @flaggingMethods, 'checkbox_svm'; }<br />
if ($showTimes) { $end = time; $diff = $end - $start; $start = time; print "populateSvmData $diff<br>"; }<br />
my $flaggingMethods = join"=on&", @flaggingMethods; $flaggingMethods .= '=on';<br />
<br />
&printCurationStatisticsDatatypes( \@datatypesToShow, $rowNameTdWidth, $datatypeTdWidth, $labelRightFlag);<br />
&printCurationStatisticsPapersCuratable( \@datatypesToShow, $labelRightFlag);<br />
&printCurationStatisticsObjectsCurated( \@datatypesToShow, $labelRightFlag);<br />
&printCurationStatisticsObjectsPerPaperCurated( \@datatypesToShow, $labelRightFlag);<br />
<br />
$curStats{'dividerallval'}{'allSame'}{'countPap'} = 'blank';<br />
tie %{ $curStats{'allcur'} }, "Tie::IxHash"; # make all section appear in this order by tying it<br />
tie %{ $curStats{'allval'} }, "Tie::IxHash"; # make all section appear in this order by tying it<br />
$curStats{'dividerany'}{'allSame'}{'countPap'} = 'blank';<br />
# tie %{ $curStats{'any'} }, "Tie::IxHash"; # not needed, 'any' only has 'pos'<br />
tie %{ $curStats{'any'}{'pos'} }, "Tie::IxHash"; # make any section appear in this order by tying it, though it will get populated last<br />
$curStats{'dividerint'}{'allSame'}{'countPap'} = 'blank';<br />
tie %{ $curStats{'int'}{'pos'} }, "Tie::IxHash"; # make int section appear in this order by tying it, though it will get populated last<br />
<br />
if ($showTimes) { $end = time; $diff = $end - $start; $start = time; print "printCurationStatistics Datatypes / Objects $diff<br>"; }<br />
<br />
&getCurationStatisticsAllCurated( \@datatypesToShow );<br />
&getCurationStatisticsAllVal( \@datatypesToShow );<br />
&getCurationStatisticsAllValPos( \@datatypesToShow );<br />
&getCurationStatisticsAllValNeg( \@datatypesToShow );<br />
&getCurationStatisticsAllValConf( \@datatypesToShow );<br />
<br />
if ($showTimes) { $end = time; $diff = $end - $start; $start = time; print "getCurationStatisticsAll $diff<br>"; }<br />
if ( ($displayAll) || ($displaySvm) ) {<br />
&getCurationStatisticsSvmNd( \@datatypesToShow );<br />
&getCurationStatisticsSvm( \@datatypesToShow ); }<br />
if ($showTimes) { $end = time; $diff = $end - $start; $start = time; print "getStatsSvm $diff<br>"; }<br />
<br />
if ( ($displayAll) || ($displayAfp) ) {<br />
&getCurationStatisticsAfpEmailed( \@datatypesToShow );<br />
&getCurationStatisticsAfpFlagged( \@datatypesToShow );<br />
&getCurationStatisticsAfpPos( \@datatypesToShow );<br />
&getCurationStatisticsAfpNeg( \@datatypesToShow ); }<br />
if ($showTimes) { $end = time; $diff = $end - $start; $start = time; print "getStatsAfp $diff<br>"; }<br />
<br />
if ( ($displayAll) || ($displayCfp) ) {<br />
&getCurationStatisticsCfpFlagged( \@datatypesToShow );<br />
&getCurationStatisticsCfpPos( \@datatypesToShow );<br />
&getCurationStatisticsCfpNeg( \@datatypesToShow ); }<br />
if ($showTimes) { $end = time; $diff = $end - $start; $start = time; print "getStatsCfp $diff<br>"; }<br />
<br />
&getCurationStatisticsAny( \@datatypesToShow, \@flaggingMethods );<br />
if ($showTimes) { $end = time; $diff = $end - $start; $start = time; print "getStatsAny $diff<br>"; }<br />
<br />
my @labelKeys; # labelKeys will be created here<br />
my $depth = 0; # recursion depth into hash<br />
&recurseCurStats(\%curStats, \@labelKeys, $depth, \@datatypesToShow, $flaggingMethods, $labelRightFlag );<br />
if ($showTimes) { $end = time; $diff = $end - $start; $start = time; print "recurseCurStats $diff<br>"; }<br />
<br />
&printCurationStatisticsDatatypes( \@datatypesToShow, $rowNameTdWidth, $datatypeTdWidth, $labelRightFlag);<br />
print "</table>\n";<br />
if ($showTimes) { $end = time; $diff = $end - $startprintCurationStatisticsTable; print "printCurationStatisticsTable $diff<br>"; }<br />
} # sub printCurationStatisticsTable<br />
<br />
</pre><br />
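<br />
The $flaggingMethods string passed to &recurseCurStats above is a URL query-string fragment built from the checkbox names of the selected flagging methods. The standalone sketch below reproduces just that join so the resulting string is easy to see; the wrapper name buildFlaggingMethods is hypothetical and not part of the form code.<br />
```perl
use strict;
use warnings;

# Hypothetical wrapper around the same two statements used in the form:
#   my $flaggingMethods = join"=on&", @flaggingMethods; $flaggingMethods .= '=on';
# Each selected checkbox name becomes "name=on", and the pairs are joined by "&".
sub buildFlaggingMethods {
    my @flaggingMethods = @_;
    my $flaggingMethods = join "=on&", @flaggingMethods;
    $flaggingMethods .= '=on';
    return $flaggingMethods;
}

# All three methods selected, in the order the form pushes them.
print buildFlaggingMethods('checkbox_cfp', 'checkbox_afp', 'checkbox_svm'), "\n";
# prints: checkbox_cfp=on&checkbox_afp=on&checkbox_svm=on
```
As written, an empty @flaggingMethods list yields the bare string "=on", because join returns an empty string and '=on' is then appended.<br />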
<br />
<br><br />
<br><br />
<br />
== Premade Comments ==<br />
<br />
In the [[New_2012_Curation_Status#Detailed_Results_of_Papers_Page|Detailed Results of Papers]] page, curators can select a comment from a drop-down list of premade comments and apply it to a paper for the relevant datatype.<br />
<br />
In the code, the comments are stored in a hash called ''%premadeComments''. Only the numeric keys of these comments are stored in postgres, so a comment's description can be reworded or updated and the change applies retroactively to every paper that already uses that key.<br />
<br />
Code:<br />
<br />
<pre><br />
sub populatePremadeComments {<br />
$premadeComments{"1"} = "SVM Positive, Curation Negative";<br />
$premadeComments{"2"} = "C. elegans as heterologous expression system";<br />
$premadeComments{"3"} = "pre-made comment #3";}<br />
</pre><br />
<br />
So, as of now:<br />
<br />
<pre><br />
<br />
| Key | Comment |<br />
| 1 | "SVM Positive, Curation Negative" |<br />
| 2 | "C. elegans as heterologous expression system" |<br />
| 3 | "pre-made comment #3" |<br />
<br />
</pre><br />
<br />
<br />
Hence, if a completely new comment is desired, a new key must be created and associated with that new comment. Old keys should never be recycled, and documentation describing what each key refers to should be maintained in this Wiki.<br />
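<br />
A minimal sketch of adding a comment without recycling a key, assuming retired keys are kept in %premadeComments (for example with their old text) rather than deleted. The helper nextPremadeCommentKey and the new comment text are hypothetical and not part of the form code.<br />
```perl
use strict;
use warnings;

# Numeric keys, as in populatePremadeComments; only the key is stored in postgres.
my %premadeComments = (
    "1" => "SVM Positive, Curation Negative",
    "2" => "C. elegans as heterologous expression system",
    "3" => "pre-made comment #3",
);

# Hypothetical helper: one more than the largest key in the hash. This only
# avoids recycling if retired keys stay in the hash instead of being deleted.
sub nextPremadeCommentKey {
    my ($comments_ref) = @_;
    my $max = 0;
    foreach my $key (keys %$comments_ref) { $max = $key if $key > $max; }
    return $max + 1;
}

my $newKey = nextPremadeCommentKey(\%premadeComments);
$premadeComments{$newKey} = "example new comment";   # hypothetical comment text
print "$newKey => $premadeComments{$newKey}\n";      # prints: 4 => example new comment
```
Rewording the text stored under an existing key, rather than adding a key, is what makes a change apply retroactively.<br />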
<br />
<br><br />
<br><br />
<br />
== New Result ==<br />
<br />
Each paper-datatype pair can be assigned a "New Result" indicating its status as curated (or not) or validated (or not), and if validated, positive or negative for the particular paper-datatype pair. These results can be entered via the [[New_2012_Curation_Status#Add_Results_Page|Add Results Page]] or directly in the [[New_2012_Curation_Status#Detailed_Results_of_Papers_Page|Detailed Results of Papers]] page via the "New Results" column. The code is below:<br />
<br />
Code:<br />
<br />
<pre><br />
sub populateDonPosNegOptions {<br />
$donPosNegOptions{""} = "";<br />
$donPosNegOptions{"curated"} = "curated and positive";<br />
$donPosNegOptions{"positive"} = "validated positive";<br />
$donPosNegOptions{"negative"} = "validated negative";<br />
$donPosNegOptions{"notvalidated"} = "not validated";}<br />
</pre><br />
<br />
where "curated", "positive", "negative", and "notvalidated" are the keys (for the %donPosNegOptions hash table in the form code) that will be stored in postgres and the corresponding values (e.g. "curated and positive") are what will be displayed on the form.<br />
<br />
Note that "" and "not validated" represent no data for that paper-datatype pair, but "not validated" is present as an option to overwrite accidental validations (it is impossible to go back to a blank "" field via the form).<br />
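<br />
The split between stored keys and displayed labels can be illustrated by rendering the options as HTML. The helper makeDonPosNegSelect and its fixed option order are hypothetical; the form's actual drop-down code is not shown on this page.<br />
```perl
use strict;
use warnings;

# Keys are what get stored in postgres; values are what the curator sees.
my %donPosNegOptions = (
    ""             => "",
    "curated"      => "curated and positive",
    "positive"     => "validated positive",
    "negative"     => "validated negative",
    "notvalidated" => "not validated",
);

# Hypothetical helper: build <option> tags in a fixed order and mark the
# currently stored key as selected.
sub makeDonPosNegSelect {
    my ($current) = @_;
    my @order = ("", "curated", "positive", "negative", "notvalidated");
    my $html = '';
    foreach my $key (@order) {
        my $selected = ($key eq $current) ? ' selected="selected"' : '';
        $html .= qq(<option value="$key"$selected>$donPosNegOptions{$key}</option>\n);
    }
    return $html;
}

print makeDonPosNegSelect("positive");
```
A fixed order list is used because Perl hash keys come back in no particular order.<br />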
<br />
<br><br />
<br><br />
<br />
== Datatypes ==<br />
<br />
The form determines which datatypes exist via the 'populateDatatypes' subroutine in the form code. As of 12-5-2012, the form first collects all datatypes used in SVM from the 'cur_svmdata' postgres table (as of 12-5-2012, all of these have identical names in the Author First Pass (AFP) and Curator First Pass (CFP) tables). It then adds the datatypes that are in AFP and CFP but not in SVM (as of 12-5-2012, all anatomy-curation datatypes), plus one additional datatype ("geneticablation") that is in none of SVM, AFP, or CFP. The code below also adds 'chemicals' (added 2013 10 02, per the code comment), which has the same name in the form and in the AFP/CFP tables.<br />
<br />
Here is the code:<br />
<br />
<pre><br />
sub populateDatatypes {<br />
$result = $dbh->prepare( "SELECT DISTINCT(cur_datatype) FROM cur_svmdata " );<br />
$result->execute() or die "Cannot prepare statement: $DBI::errstr\n";<br />
while (my @row = $result->fetchrow) { $datatypesAfpCfp{$row[0]} = $row[0]; }<br />
$datatypesAfpCfp{'chemicals'} = 'chemicals'; # added for Karen 2013 10 02<br />
$datatypesAfpCfp{'blastomere'} = 'cellfunc';<br />
$datatypesAfpCfp{'exprmosaic'} = 'siteaction';<br />
$datatypesAfpCfp{'geneticmosaic'} = 'mosaic';<br />
$datatypesAfpCfp{'laserablation'} = 'ablationdata';<br />
foreach my $datatype (keys %datatypesAfpCfp) { $datatypes{$datatype}++; }<br />
$datatypes{'geneticablation'}++;<br />
} # sub populateDatatypes<br />
</pre><br />
<br />
<br />
For the datatypes that, as of 12-5-2012, are NOT in SVM but ARE in AFP and CFP, the datatype names differ between the Curation Status form and the AFP and CFP forms: the datatypes named "cellfunc", "siteaction", "mosaic", and "ablationdata" in the AFP and CFP tables are named "blastomere", "exprmosaic", "geneticmosaic", and "laserablation", respectively, in the Curation Status form.<br />
<br />
The IMPORTANT thing here is: if the datatypes are ever changed (added, renamed, etc.) and the code is not updated accordingly, the form will likely break. Curators should ask Juancarlos/Chris/Daniela to update the code.<br />
<br />
New datatypes must be accounted for in this code as follows:<br><br />
* no SVM, no AFP/CFP: add to the %datatypes hash, like 'geneticablation'.<br />
* no SVM, yes AFP/CFP: add to both the %datatypesAfpCfp and %datatypes hashes, like 'blastomere'.<br />
* yes SVM, yes AFP/CFP: add to the code that populates cur_svmdata; the datatype will then be picked up by the SELECT query.<br />
* yes SVM, no AFP/CFP: add to the code that populates cur_svmdata (so the SELECT query picks it up), but also subsequently delete it from %datatypesAfpCfp, to prevent a postgres query against a non-existent afp_/cfp_ table, which would crash the form.<br />
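<br />
The four cases above can be sketched with in-memory hashes. The SVM datatype list stands in for the SELECT query, 'svmonlytype' is a hypothetical SVM-only datatype, and afpTableFor is a hypothetical helper returning the afp_ table a datatype would be queried from. The sketch assumes the deletion for the SVM-only case happens after %datatypes has been filled, so that datatype still appears in the form.<br />
```perl
use strict;
use warnings;

# Stand-in for "SELECT DISTINCT(cur_datatype) FROM cur_svmdata" (illustrative list).
my @svmDatatypes = ('geneint', 'rnai', 'svmonlytype');

my (%datatypesAfpCfp, %datatypes);
# yes SVM: same name in the form and in the afp_/cfp_ tables.
foreach my $dt (@svmDatatypes) { $datatypesAfpCfp{$dt} = $dt; }
# no SVM, yes AFP/CFP: form name => afp_/cfp_ table suffix.
$datatypesAfpCfp{'blastomere'}    = 'cellfunc';
$datatypesAfpCfp{'laserablation'} = 'ablationdata';
foreach my $dt (keys %datatypesAfpCfp) { $datatypes{$dt}++; }
# yes SVM, no AFP/CFP: keep it in %datatypes, but stop afp_/cfp_ queries.
delete $datatypesAfpCfp{'svmonlytype'};
# no SVM, no AFP/CFP: only in %datatypes.
$datatypes{'geneticablation'}++;

# Hypothetical helper: the afp_ table for a datatype, or undef if there is none
# (the AFP/CFP loaders skip datatypes not in %datatypesAfpCfp).
sub afpTableFor {
    my ($datatype) = @_;
    return undef unless $datatypesAfpCfp{$datatype};
    return 'afp_' . $datatypesAfpCfp{$datatype};
}

foreach my $dt (sort keys %datatypes) {
    my $table = afpTableFor($dt);
    print "$dt => ", (defined $table ? $table : '(no AFP/CFP table)'), "\n";
}
```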
<br />
<br><br />
<br><br />
<br />
== Creating PDF links to papers ==<br />
<br />
In the [[New_2012_Curation_Status#Detailed_Results_of_Papers_Page|Detailed Results of Papers]] page, each paper ID is linked to its corresponding PDF document using the code below:<br />
<br />
Code:<br />
<br />
<pre><br />
sub populatePdf {<br />
$result = $dbh->prepare( "SELECT * FROM pap_electronic_path WHERE pap_electronic_path IS NOT NULL");<br />
$result->execute() or die "Cannot prepare statement: $DBI::errstr\n";<br />
my %temp;<br />
while (my @row = $result->fetchrow) {<br />
my ($data, $isPdf) = &makePdfLinkFromPath($row[1]);<br />
$temp{$row[0]}{$isPdf}{$data}++; }<br />
foreach my $joinkey (sort keys %temp) {<br />
my @pdfs;<br />
foreach my $isPdf (reverse sort keys %{ $temp{$joinkey} }) {<br />
foreach my $pdfLink (sort keys %{ $temp{$joinkey}{$isPdf} }) {<br />
push @pdfs, $pdfLink; } }<br />
my ($pdfs) = join"<br/>", @pdfs;<br />
$pdf{$joinkey} = $pdfs;<br />
} # foreach my $joinkey (sort keys %temp)<br />
} # sub populatePdf<br />
<br />
sub makePdfLinkFromPath {<br />
my ($path) = shift;<br />
my ($pdf) = $path =~ m/\/([^\/]*)$/;<br />
my $isPdf = 0; if ($pdf =~ m/\.pdf$/) { $isPdf++; } # kimberly wants .pdf files on top, so need to flag to sort<br />
my $link = 'http://tazendra.caltech.edu/~acedb/daniel/' . $pdf;<br />
my $data = "<a href=\"$link\" target=\"new\">$pdf</a>"; return ($data, $isPdf); }<br />
</pre><br />
<br />
<br />
Note the table name ("pap_electronic_path"), the URL path ("http://tazendra.caltech.edu/~acedb/daniel/"), and that, because of the code 'target=\"new\"', the link opens in a new window or tab. Because "new" is a window name rather than the reserved keyword "_blank", every link with this target reuses that same window/tab: opening another link from the original page (e.g. the [[New_2012_Curation_Status#Detailed_Results_of_Papers_Page|Detailed Results of Papers]] page) loads it into that window/tab, replacing what was shown there before.<br />
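<br />
The grouping and ordering done by populatePdf can be tried without a database. In this sketch the rows and file paths are hypothetical stand-ins for pap_electronic_path; the link logic mirrors makePdfLinkFromPath.<br />
```perl
use strict;
use warnings;

# Mirrors makePdfLinkFromPath: take the file name after the last "/",
# flag .pdf files, and link the name under the base URL.
sub makePdfLinkFromPath {
    my ($path) = @_;
    my ($pdf) = $path =~ m/\/([^\/]*)$/;
    my $isPdf = ($pdf =~ m/\.pdf$/) ? 1 : 0;
    my $link = 'http://tazendra.caltech.edu/~acedb/daniel/' . $pdf;
    my $data = "<a href=\"$link\" target=\"new\">$pdf</a>";
    return ($data, $isPdf);
}

# Hypothetical rows: [joinkey, path].
my @rows = (
    ['00012345', '/data/pdfs/12345_Smith.html'],
    ['00012345', '/data/pdfs/12345_Smith.pdf'],
);

# Same grouping as populatePdf: joinkey => isPdf flag => link => count.
my (%temp, %pdf);
foreach my $row (@rows) {
    my ($data, $isPdf) = makePdfLinkFromPath($row->[1]);
    $temp{ $row->[0] }{$isPdf}{$data}++;
}
foreach my $joinkey (sort keys %temp) {
    my @pdfs;
    # reverse sort on the flag puts 1 (.pdf) ahead of 0 (other files)
    foreach my $isPdf (reverse sort keys %{ $temp{$joinkey} }) {
        push @pdfs, sort keys %{ $temp{$joinkey}{$isPdf} };
    }
    $pdf{$joinkey} = join "<br/>", @pdfs;
}
print "$pdf{'00012345'}\n";
```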
<br />
<br />
== Creating hyperlinks to PubMed paper pages ==<br />
<br />
In the [[New_2012_Curation_Status#Detailed_Results_of_Papers_Page|Detailed Results of Papers]] page each PubMed ID is linked to its corresponding PubMed webpage using the code below:<br />
<br />
Code:<br />
<br />
<pre><br />
sub populatePmid {<br />
$result = $dbh->prepare( "SELECT * FROM pap_identifier WHERE pap_identifier ~ 'pmid'" );<br />
$result->execute() or die "Cannot prepare statement: $DBI::errstr\n";<br />
my %temp;<br />
while (my @row = $result->fetchrow) { if ($row[0]) {<br />
my ($data) = &makeNcbiLinkFromPmid($row[1]);<br />
$temp{$row[0]}{$data}++; } }<br />
foreach my $joinkey (sort keys %temp) {<br />
my ($pmids) = join"<br/>", keys %{ $temp{$joinkey} };<br />
$pmid{$joinkey} = $pmids;<br />
} # foreach my $joinkey (sort keys %temp)<br />
} # sub populatePmid<br />
</pre><br />
<br />
<pre><br />
sub makeNcbiLinkFromPmid {<br />
my $pmid = shift;<br />
my ($id) = $pmid =~ m/(\d+)/;<br />
my $link = 'http://www.ncbi.nlm.nih.gov/pubmed/' . $id;<br />
my $data = "<a href=\"$link\" target=\"new\">$pmid</a>"; return $data; }<br />
</pre><br />
<br />
Note the table name ("pap_identifier"), the table specifier ("WHERE pap_identifier ~ 'pmid'"), the URL path ("http://www.ncbi.nlm.nih.gov/pubmed/"), and that, because of the code 'target=\"new\"', the link opens in a new window or tab. As with the PDF links, "new" is a window name rather than "_blank", so opening another link from the original page (e.g. the [[New_2012_Curation_Status#Detailed_Results_of_Papers_Page|Detailed Results of Papers]] page) loads it into that same window/tab, replacing what was shown there before.<br />
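<br />
Likewise, populatePmid can be sketched with hypothetical pap_identifier rows: a regex stands in for the WHERE pap_identifier ~ 'pmid' filter, and the digits of each identifier become the PubMed URL. Unlike the form code, the sketch sorts the links for each paper so the output is deterministic.<br />
```perl
use strict;
use warnings;

# Mirrors makeNcbiLinkFromPmid: the digits go into the URL, the full
# identifier is the link text.
sub makeNcbiLinkFromPmid {
    my $pmid = shift;
    my ($id) = $pmid =~ m/(\d+)/;
    my $link = 'http://www.ncbi.nlm.nih.gov/pubmed/' . $id;
    return "<a href=\"$link\" target=\"new\">$pmid</a>";
}

# Hypothetical rows: [joinkey, identifier].
my @rows = (
    ['00012345', 'pmid12345678'],
    ['00012345', 'doi10.1000/example'],
);
my %temp;
foreach my $row (@rows) {
    next unless $row->[1] =~ m/pmid/;     # stands in for WHERE pap_identifier ~ 'pmid'
    $temp{ $row->[0] }{ makeNcbiLinkFromPmid($row->[1]) }++;   # hash dedupes links
}
my %pmid;
foreach my $joinkey (sort keys %temp) {
    $pmid{$joinkey} = join "<br/>", sort keys %{ $temp{$joinkey} };
}
print "$pmid{'00012345'}\n";
```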
<br />
<br />
== Populating the Journal Names ==<br />
<br />
Journal names for each paper are populated via the following code:<br />
<br />
<pre><br />
sub populateJournal {<br />
$result = $dbh->prepare( "SELECT * FROM pap_journal WHERE pap_journal IS NOT NULL" );<br />
$result->execute() or die "Cannot prepare statement: $DBI::errstr\n";<br />
while (my @row = $result->fetchrow) { if ($row[0]) { $journal{$row[0]} = $row[1]; } }<br />
} # sub populateJournal<br />
</pre><br />
<br />
<br />
Note the table name ("pap_journal"); the %journal hash maps each paper ID to its journal name.<br />
<br />
<br />
== Loading Data into the Form ==<br />
<br />
On the [[New_2012_Curation_Status#Curation_Statistics_Options_Page|Curation Statistics Options Page]], the [[New_2012_Curation_Status#Specific_Paper_Page|Specific Paper Page]], or the [[New_2012_Curation_Status#Prepopulated_Specific_Papers_Page|Prepopulated Specific Papers Page]], curators have the option to specify what flagging methods (SVM, AFP, and/or CFP), curation sources (Ontology Annotator or cur_curdata [which is the data generated from this form]), and/or datatypes (e.g. geneint, rnai) they would like to view.<br />
<br />
Each type of data is stored in its own hash. Most of these hashes are keyed by datatype, then by paper ID, with further sub-keys that depend on the hash (see the individual subsections below).<br />
<br />
Curators can choose specific datatypes, in which case only the data for those datatypes are loaded. Similarly, if only some paper IDs have been selected, only those papers are loaded.<br />
<br />
<br />
=== Loading curatable papers ===<br />
<br />
Only papers that have a 'valid' pap_status value and a 'primary' pap_primary_data value are considered curatable. These are stored in the %curatablePapers hash. ( paperID => status )<br />
<br />
<pre><br />
sub populateCuratablePapers {<br />
my $query = "SELECT * FROM pap_status WHERE pap_status = 'valid' AND joinkey IN (SELECT joinkey FROM pap_primary_data WHERE pap_primary_data = 'primary')";<br />
$result = $dbh->prepare( $query );<br />
$result->execute() or die "Cannot prepare statement: $DBI::errstr\n";<br />
while (my @row = $result->fetchrow) { $curatablePapers{$row[0]} = $row[1]; }<br />
} # sub populateCuratablePapers<br />
</pre><br />
<br />
<br />
=== Loading afp_ data ===<br />
<br />
Populate %afpEmailed, %afpData, %afpFlagged, %afpPos, %afpNeg.<br />
<br />
For each of the chosen datatypes that has an entry in %datatypesAfpCfp, query the corresponding afp_ postgres table and, if the paper is curatable, store the value in the %afpData hash ( datatype, paperID => AFP result ).<br />
<br />
Query afp_email and, if the paper is curatable, store it in the %afpEmailed hash ( paperID => 1 ) for the AFP emailed statistics.<br />
<br />
Query afp_lasttouched to see whether a paper has been flagged for AFP, skipping papers that are not curatable. For every datatype in %chosenDatatypes, store the paper in %afpFlagged ( datatype, paperID => 1 ).<br />
<br />
For each chosen datatype (%chosenDatatypes) and each of its %afpFlagged papers, if there is a value in %afpData, store the paper in the %afpPos hash (positive flag for AFP); otherwise store it in the %afpNeg hash (negative flag for AFP) ( datatype, paperID => 1 ).<br />
<br />
<pre><br />
sub populateAfpData {<br />
foreach my $datatype (sort keys %chosenDatatypes) {<br />
next unless $datatypesAfpCfp{$datatype};<br />
my $pgtable_datatype = $datatypesAfpCfp{$datatype};<br />
$result = $dbh->prepare( "SELECT * FROM afp_$pgtable_datatype" );<br />
$result->execute() or die "Cannot prepare statement: $DBI::errstr\n";<br />
while (my @row = $result->fetchrow) {<br />
next unless ($curatablePapers{$row[0]});<br />
$afpData{$datatype}{$row[0]} = $row[1]; }<br />
} # foreach my $datatype (sort keys %chosenDatatypes)<br />
<br />
$result = $dbh->prepare( "SELECT * FROM afp_email" );<br />
$result->execute() or die "Cannot prepare statement: $DBI::errstr\n";<br />
while (my @row = $result->fetchrow) {<br />
next unless ($curatablePapers{$row[0]});<br />
$afpEmailed{$row[0]}++; }<br />
$result = $dbh->prepare( "SELECT * FROM afp_lasttouched" );<br />
$result->execute() or die "Cannot prepare statement: $DBI::errstr\n";<br />
while (my @row = $result->fetchrow) {<br />
next unless ($curatablePapers{$row[0]});<br />
foreach my $datatype (sort keys %chosenDatatypes) {<br />
$afpFlagged{$datatype}{$row[0]}++; } }<br />
foreach my $datatype (sort keys %chosenDatatypes) {<br />
foreach my $joinkey (sort keys %{ $afpFlagged{$datatype} }) {<br />
if ($afpData{$datatype}{$joinkey}) { $afpPos{$datatype}{$joinkey}++; }<br />
else { $afpNeg{$datatype}{$joinkey}++; } } }<br />
} # sub populateAfpData<br />
</pre><br />
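<br />
The flagged/positive/negative logic of populateAfpData can be sketched with in-memory data. The paper IDs and AFP values below are hypothetical and assumed to be already restricted to curatable papers.<br />
```perl
use strict;
use warnings;

my %chosenDatatypes = (geneint => 1, rnai => 1);
my %afpData = (geneint => { '00000001' => 'author text' });   # from afp_<table>
my @afpLastTouched = ('00000001', '00000002');                 # papers flagged for AFP

my (%afpFlagged, %afpPos, %afpNeg);
# A paper in afp_lasttouched counts as flagged for every chosen datatype.
foreach my $joinkey (@afpLastTouched) {
    foreach my $datatype (keys %chosenDatatypes) { $afpFlagged{$datatype}{$joinkey}++; }
}
# Flagged with an %afpData value => positive; flagged without one => negative.
foreach my $datatype (keys %chosenDatatypes) {
    foreach my $joinkey (keys %{ $afpFlagged{$datatype} }) {
        if ($afpData{$datatype}{$joinkey}) { $afpPos{$datatype}{$joinkey}++; }
        else                               { $afpNeg{$datatype}{$joinkey}++; }
    }
}
foreach my $datatype (sort keys %chosenDatatypes) {
    printf "%s: %d positive, %d negative\n", $datatype,
        scalar(keys %{ $afpPos{$datatype} || {} }),
        scalar(keys %{ $afpNeg{$datatype} || {} });
}
```
populateCfpData follows the same pattern, with cfp_curator in place of afp_lasttouched and %cfpData, %cfpPos, %cfpNeg in place of the AFP hashes.<br />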
<br />
<br />
=== Loading cfp_ data ===<br />
<br />
Populate %cfpData, %cfpFlagged, %cfpPos, %cfpNeg.<br />
<br />
For each of the chosen datatypes that has an entry in %datatypesAfpCfp, query the corresponding cfp_ postgres table and, if the paper is curatable, store the value in the %cfpData hash ( datatype, paperID => CFP result ).<br />
<br />
Query cfp_curator to see whether a paper has been flagged for CFP, skipping papers that are not curatable. For every datatype in %chosenDatatypes, store the paper in %cfpFlagged ( datatype, paperID => 1 ).<br />
<br />
For each chosen datatype (%chosenDatatypes) and each of its %cfpFlagged papers, if there is a value in %cfpData, store the paper in the %cfpPos hash (positive flag for CFP); otherwise store it in the %cfpNeg hash (negative flag for CFP) ( datatype, paperID => 1 ).<br />
<br />
<pre><br />
sub populateCfpData {<br />
foreach my $datatype (sort keys %chosenDatatypes) {<br />
next unless $datatypesAfpCfp{$datatype};<br />
my $pgtable_datatype = $datatypesAfpCfp{$datatype};<br />
$result = $dbh->prepare( "SELECT * FROM cfp_$pgtable_datatype" );<br />
$result->execute() or die "Cannot prepare statement: $DBI::errstr\n";<br />
while (my @row = $result->fetchrow) {<br />
next unless ($curatablePapers{$row[0]});<br />
$cfpData{$datatype}{$row[0]} = $row[1]; }<br />
} # foreach my $datatype (sort keys %chosenDatatypes)<br />
<br />
$result = $dbh->prepare( "SELECT * FROM cfp_curator" );<br />
$result->execute() or die "Cannot prepare statement: $DBI::errstr\n";<br />
while (my @row = $result->fetchrow) {<br />
next unless ($curatablePapers{$row[0]});<br />
foreach my $datatype (sort keys %chosenDatatypes) {<br />
$cfpFlagged{$datatype}{$row[0]}++; } }<br />
foreach my $datatype (sort keys %chosenDatatypes) {<br />
foreach my $joinkey (sort keys %{ $cfpFlagged{$datatype} }) {<br />
if ($cfpData{$datatype}{$joinkey}) { $cfpPos{$datatype}{$joinkey}++; }<br />
else { $cfpNeg{$datatype}{$joinkey}++; } } }<br />
} # sub populateCfpData<br />
</pre><br />
<br />
<br />
=== Loading svm data ===<br />
<br />
Populate %svmData hash.<br />
<br />
For each of the chosen datatypes, query the cur_svmdata table for rows whose cur_datatype is that datatype, ordered by cur_date. The paper ID is the first column and the SVM result is the fourth column. Papers that are not in %curatablePapers are skipped, and results are stored in %svmData ( datatype, paperID => SVM result ). cur_svmdata can hold multiple results for a given paper-datatype pair; because the rows are read in cur_date order and each row overwrites the previous value, only the most recent result (by the directory name/date on Yuling's machine) is kept.<br />
<br />
<pre><br />
sub populateSvmData {<br />
# $result = $dbh->prepare( "SELECT * FROM cur_svmdata ORDER BY cur_datatype, cur_date" ); # always doing for all datatypes vs looping for chosen takes 4.66vs 2.74 secs<br />
foreach my $datatype (sort keys %chosenDatatypes) {<br />
$result = $dbh->prepare( "SELECT * FROM cur_svmdata WHERE cur_datatype = '$datatype' ORDER BY cur_date" );<br />
# table stores multiple dates for same paper-datatype in case we want to see multiple results later. if it didn't and we didn't order it would take 2.05 vs 2.74 secs, so not worth changing the way we're storing data<br />
$result->execute() or die "Cannot prepare statement: $DBI::errstr\n";<br />
while (my @row = $result->fetchrow) {<br />
my $joinkey = $row[0]; my $svmdata = $row[3];<br />
next unless ($curatablePapers{$row[0]});<br />
$svmData{$datatype}{$joinkey} = $svmdata; } }<br />
} # sub populateSvmData<br />
</pre><br />
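<br />
The "latest result wins" behavior comes from reading rows in date order and overwriting. The sketch below uses hypothetical rows with YYYYMMDD date strings (which sort chronologically as text) and placeholder SVM result values; the real column layout of cur_svmdata is not reproduced.<br />
```perl
use strict;
use warnings;

my %curatablePapers = ('00000001' => 'valid');   # hypothetical
# Hypothetical rows for the 'geneint' datatype; field names are illustrative.
my @rows = (
    { joinkey => '00000001', date => '20121001', svm => 'high' },
    { joinkey => '00000001', date => '20120101', svm => 'low' },
    { joinkey => '00000002', date => '20120501', svm => 'low' },
);
my %svmData;
# Equivalent of ORDER BY cur_date: oldest first, so later rows overwrite earlier ones.
foreach my $row (sort { $a->{date} cmp $b->{date} } @rows) {
    next unless $curatablePapers{ $row->{joinkey} };   # skip non-curatable papers
    $svmData{'geneint'}{ $row->{joinkey} } = $row->{svm};
}
print "$svmData{'geneint'}{'00000001'}\n";   # prints: high
```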
<br />
<br />
=== Loading OA data ===<br />
<br />
Populate %objsCurated and %oaData hashes.<br />
<br />
Each datatype is stored in its own tables and has to be queried separately, but the queries follow the same pattern.<br />
<br />
<pre><br />
if ($chosenDatatypes{'newmutant'}) {<br />
$result = $dbh->prepare( "SELECT * FROM app_variation" );<br />
$result->execute() or die "Cannot prepare statement: $DBI::errstr\n";<br />
while (my @row = $result->fetchrow) { $objsCurated{'newmutant'}{$row[1]}++; }<br />
$result = $dbh->prepare( "SELECT * FROM app_paper" );<br />
$result->execute() or die "Cannot prepare statement: $DBI::errstr\n";<br />
while (my @row = $result->fetchrow) {<br />
my (@papers) = $row[1] =~ m/WBPaper(\d+)/g;<br />
foreach my $paper (@papers) {<br />
$oaData{'newmutant'}{$paper} = 'curated'; } } }<br />
</pre><br />
and similarly for other datatypes.<br />
<br />
The example above is for the datatype 'newmutant'. If that datatype is in %chosenDatatypes, query app_variation and store the variations in %objsCurated ( datatype, object => count ), then query app_paper, match the WBPaper IDs, and mark each paper in %oaData ( datatype, paperID => 'curated' ), where the stored paper ID is the digits captured after 'WBPaper'.<br />
<br />
For other datatypes:<br />
* overexpr : objects from app_transgene ; %oaData from app_paper WHERE joinkey IN (SELECT joinkey FROM app_transgene WHERE app_transgene IS NOT NULL AND app_transgene != ''), meaning papers whose postgres ID (joinkey) has a non-empty transgene in app_transgene.<br />
* antibody : objects from abp_name ; %oaData from abp_paper<br />
* otherexpr : objects from exp_name ; %oaData from exp_paper<br />
* genereg : objects from grg_name; %oaData from grg_paper<br />
* geneint : objects from int_name; %oaData from int_paper<br />
* rnai : objects from rna_name; %oaData from rna_paper<br />
* blastomere : objects from wbb_wbbtf WHERE joinkey IN (SELECT joinkey FROM wbb_assay WHERE wbb_assay = 'Blastomere_isolation') ; %oaData from wbb_reference WHERE joinkey IN (SELECT joinkey FROM wbb_assay WHERE wbb_assay = 'Blastomere_isolation')<br />
* exprmosaic : objects from wbb_wbbtf WHERE joinkey IN (SELECT joinkey FROM wbb_assay WHERE wbb_assay = 'Expression_mosaic') ; %oaData from wbb_reference WHERE joinkey IN (SELECT joinkey FROM wbb_assay WHERE wbb_assay = 'Expression_mosaic')<br />
* geneticablation : objects from wbb_wbbtf WHERE joinkey IN (SELECT joinkey FROM wbb_assay WHERE wbb_assay = 'Genetic_ablation') ; %oaData from wbb_reference WHERE joinkey IN (SELECT joinkey FROM wbb_assay WHERE wbb_assay = 'Genetic_ablation')<br />
* geneticmosaic : objects from wbb_wbbtf WHERE joinkey IN (SELECT joinkey FROM wbb_assay WHERE wbb_assay = 'Genetic_mosaic') ; %oaData from wbb_reference WHERE joinkey IN (SELECT joinkey FROM wbb_assay WHERE wbb_assay = 'Genetic_mosaic')<br />
* laserablation : objects from wbb_wbbtf WHERE joinkey IN (SELECT joinkey FROM wbb_assay WHERE wbb_assay = 'Laser_ablation') ; %oaData from wbb_reference WHERE joinkey IN (SELECT joinkey FROM wbb_assay WHERE wbb_assay = 'Laser_ablation')<br />
* chemicals : <br />
** do 5 postgres queries to find unique curated objects into %objsCurated{chemicals}{<chemical>}<br />
<pre><br />
- SELECT * FROM mop_name WHERE joinkey IN (SELECT joinkey FROM mop_paper WHERE mop_paper IS NOT NULL AND mop_paper != '')<br />
- SELECT * FROM grg_moleculeregulator match for WBMol:\d+<br />
- SELECT * FROM app_molecule match for WBMol:\d+<br />
- SELECT * FROM pro_molecule match for WBMol:\d+<br />
- SELECT * FROM rna_molecule match for WBMol:\d+<br />
</pre><br />
** do 7 postgres queries to find unique curated papers and match for WBPaper\d+ into $oaData{'chemicals'}{<paper>} = 'curated' :<br />
<pre><br />
- SELECT * FROM mop_paper<br />
- SELECT * FROM app_paper WHERE joinkey IN (SELECT joinkey FROM app_molecule WHERE app_molecule IS NOT NULL AND app_molecule != '')<br />
- SELECT * FROM grg_paper WHERE joinkey IN (SELECT joinkey FROM grg_moleculeregulator WHERE grg_moleculeregulator IS NOT NULL AND grg_moleculeregulator != '')<br />
- SELECT * FROM pro_paper WHERE joinkey IN (SELECT joinkey FROM pro_molecule WHERE pro_molecule IS NOT NULL AND pro_molecule != '')<br />
- SELECT * FROM rna_paper WHERE joinkey IN (SELECT joinkey FROM rna_molecule WHERE rna_molecule IS NOT NULL AND rna_molecule != '')<br />
- SELECT * FROM int_paper WHERE joinkey IN (SELECT joinkey FROM int_otheronetype WHERE int_otheronetype = 'Chemical')<br />
- SELECT * FROM int_paper WHERE joinkey IN (SELECT joinkey FROM int_othertwotype WHERE int_othertwotype = 'Chemical')<br />
</pre><br />
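<br />
Paper and molecule IDs are pulled out of the postgres values with global regex matches. A sketch with hypothetical field values (the exact quoting used in app_paper and the molecule tables is an assumption) shows that one field can contribute several IDs, and that %oaData is keyed by the digits after WBPaper.<br />
```perl
use strict;
use warnings;

# Hypothetical app_paper values keyed by postgres joinkey.
my %appPaper = (
    '1' => '"WBPaper00012345","WBPaper00067890"',
    '2' => 'WBPaper00012345',
);
my %oaData;
foreach my $joinkey (sort keys %appPaper) {
    # m//g in list context returns every captured group: the digits only
    my @papers = $appPaper{$joinkey} =~ m/WBPaper(\d+)/g;
    foreach my $paper (@papers) { $oaData{'newmutant'}{$paper} = 'curated'; }
}

# Same idea for molecule objects, keeping the whole WBMol:<digits> match.
my @moleculeValues = ('"WBMol:00001234","WBMol:00005678"');   # hypothetical
my %objsCurated;
foreach my $value (@moleculeValues) {
    foreach my $mol ($value =~ m/(WBMol:\d+)/g) { $objsCurated{'chemicals'}{$mol}++; }
}

print join(", ", sort keys %{ $oaData{'newmutant'} }), "\n";   # prints: 00012345, 00067890
```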
<br />
=== Loading cur_curdata ===<br />
<br />
cur_curdata: this table captures all data entered through this form: paper ID, datatype, curator ID, result (stored as a key such as "curated", displayed as "curated and positive"), premade comment, and/or free-text comment, plus a timestamp. Note: this table stores only data (and the associated paper-datatype pairs) that have been entered manually through this form.<br />
<br />
Code:<br />
<br />
<pre><br />
sub populateCurCurData {<br />
$result = $dbh->prepare( "SELECT * FROM cur_curdata ORDER BY cur_timestamp" ); # in case multiple values get in for a paper-datatype (shouldn't happen), keep the latest<br />
$result->execute() or die "Cannot prepare statement: $DBI::errstr\n";<br />
while (my @row = $result->fetchrow) {<br />
next unless ($chosenPapers{$row[0]} || $chosenPapers{all});<br />
next unless ($chosenDatatypes{$row[1]});<br />
$curData{$row[1]}{$row[0]}{curator} = $row[2];<br />
$curData{$row[1]}{$row[0]}{donposneg} = $row[3];<br />
$curData{$row[1]}{$row[0]}{selcomment} = $row[4];<br />
$curData{$row[1]}{$row[0]}{txtcomment} = $row[5];<br />
$curData{$row[1]}{$row[0]}{timestamp} = $row[6]; }<br />
} # sub populateCurCurData<br />
</pre><br />
<br />
<br />
When populating curator data for the curation status form, read the cur_curdata postgres table, skipping datatypes and papers that were not chosen.<br />
Store the data in the %curData hash: the key is the datatype, the subkey is the paper ID, and the value keys are curator, donposneg (the curator result: curated, positive, negative, or notvalidated), selcomment (premade comment), txtcomment (free-text comment), and timestamp.<br />
<br />
cur_curdata can hold only one result for a specific paper-datatype pair; if a new result is entered, it overwrites the previous result.<br />
<br />
<br />
==== Loading cur_curdata --- Code changes for June 12, 2013 ====<br />
<br />
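The code above was changed to accommodate a change in how we handle "not validated" flags: entries whose result is "notvalidated" or blank are now skipped, so they count as no data for that paper-datatype pair. The code has one extra line: <br />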
<br />
<pre><br />
next if ( ($row[3] eq 'notvalidated') || ($row[3] eq '') ); # skip entries marked as notvalidated<br />
</pre><br />
<br />
The new code, in total, is:<br />
<br />
<pre><br />
sub populateCurCurData {<br />
$result = $dbh->prepare( "SELECT * FROM cur_curdata ORDER BY cur_timestamp" ); # in case multiple values get in for a paper-datatype (shouldn't happen), keep the latest<br />
$result->execute() or die "Cannot prepare statement: $DBI::errstr\n";<br />
while (my @row = $result->fetchrow) {<br />
next unless ($chosenPapers{$row[0]} || $chosenPapers{all});<br />
next unless ($chosenDatatypes{$row[1]});<br />
next if ( ($row[3] eq 'notvalidated') || ($row[3] eq '') ); # skip entries marked as notvalidated<br />
$curData{$row[1]}{$row[0]}{curator} = $row[2];<br />
$curData{$row[1]}{$row[0]}{donposneg} = $row[3];<br />
$curData{$row[1]}{$row[0]}{selcomment} = $row[4];<br />
$curData{$row[1]}{$row[0]}{txtcomment} = $row[5];<br />
$curData{$row[1]}{$row[0]}{timestamp} = $row[6]; }<br />
} # sub populateCurCurData<br />
</pre><br />
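<br />
A database-free sketch of the filtering in the June 12, 2013 version, using hypothetical rows in the column order implied by the code (joinkey, datatype, curator, donposneg, selcomment, txtcomment, timestamp). The curator IDs and timestamps are made up for illustration.<br />
```perl
use strict;
use warnings;

my %chosenPapers    = (all => 1);
my %chosenDatatypes = (geneint => 1);
# Hypothetical cur_curdata rows.
my @rows = (
    ['00000001', 'geneint', 'WBPerson1', 'positive',     '',  '',     '2013-06-01'],
    ['00000002', 'geneint', 'WBPerson1', 'notvalidated', '',  '',     '2013-06-02'],
    ['00000003', 'rnai',    'WBPerson1', 'negative',     '1', 'note', '2013-06-03'],
);
my %curData;
foreach my $r (sort { $a->[6] cmp $b->[6] } @rows) {   # ORDER BY cur_timestamp
    my @row = @$r;
    next unless ($chosenPapers{$row[0]} || $chosenPapers{all});
    next unless ($chosenDatatypes{$row[1]});
    next if ( ($row[3] eq 'notvalidated') || ($row[3] eq '') );   # no data
    my %entry;
    @entry{qw(curator donposneg selcomment txtcomment timestamp)} = @row[2 .. 6];
    $curData{$row[1]}{$row[0]} = \%entry;   # a later row replaces an earlier one
}
```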
<br />
=== Processing curated data ===<br />
<br />
The following subroutine processes the cur_curdata values (%curData) and the OA values (%oaData) into %valCur, %valPos, and %valNeg, which hold the paper-datatype pairs that are curated, validated positive, and validated negative, respectively. Pairs that have more than one distinct value are put into %conflict instead.<br />
<br />
If a paper has been curated for a datatype, the paper enters into the %valCur '''AND''' the %valPos hashes; if it has been validated positive but NOT curated it goes into %valPos ONLY; and if it has been validated negative it will go into %valNeg.<br />
<br />
<pre><br />
sub populateCuratedPapers {<br />
my ($showTimes, $start, $end, $diff) = (0, '', '', '');<br />
if ($showTimes) { $start = time; }<br />
&populateCurCurData();<br />
if ($showTimes) { $end = time; $diff = $end - $start; $start = time; print "IN populateCuratedPapers populateCurCurData $diff<br>"; }<br />
&populateOa(); # $oaData{datatype}{joinkey} = 'curated';<br />
if ($showTimes) { $end = time; $diff = $end - $start; $start = time; print "IN populateCuratedPapers populateOa $diff<br>"; }<br />
my %allCuratorValues; # $allCuratorValues{joinkey}{datatype}{value} = count<br />
foreach my $datatype (sort keys %oaData) {<br />
foreach my $joinkey (sort keys %{ $oaData{$datatype} }) {<br />
$allCuratorValues{$joinkey}{$datatype}{curated}++; } } # validated positive and curated<br />
foreach my $datatype (sort keys %curData) {<br />
foreach my $joinkey (sort keys %{ $curData{$datatype} }) {<br />
$allCuratorValues{$joinkey}{$datatype}{ $curData{$datatype}{$joinkey}{donposneg} }++; } }<br />
foreach my $joinkey (sort keys %allCuratorValues) {<br />
next unless ($curatablePapers{$joinkey}); # skips non-primary papers<br />
foreach my $datatype (sort keys %{ $allCuratorValues{$joinkey} }) {<br />
my @values = keys %{ $allCuratorValues{$joinkey}{$datatype} };<br />
if (scalar @values > 1) { $conflict{$datatype}{$joinkey}++; }<br />
else {<br />
my $value = shift @values;<br />
$validated{$datatype}{$joinkey} = $value;<br />
if ($value eq 'curated') { $valPos{$datatype}{$joinkey} = $value; $valCur{$datatype}{$joinkey} = $value; }<br />
elsif ($value eq 'positive') { $valPos{$datatype}{$joinkey} = $value; }<br />
elsif ($value eq 'negative') { $valNeg{$datatype}{$joinkey} = $value; }<br />
} } }<br />
if ($showTimes) { $end = time; $diff = $end - $start; $start = time; print "IN populateCuratedPapers categorizing hash $diff<br>"; }<br />
} # sub populateCuratedPapers<br />
</pre><br />
<br />
<br />
==== Processing curated data --- Code changes for June 12, 2013 ====<br />
<br />
The code in the section above was changed to accommodate a change in the way we recognize and handle paper conflicts. The following code:<br />
<br />
<pre><br />
my @values = keys %{ $allCuratorValues{$joinkey}{$datatype} };<br />
if (scalar @values > 1) { $conflict{$datatype}{$joinkey}++; }<br />
else {<br />
my $value = shift @values;<br />
$validated{$datatype}{$joinkey} = $value;<br />
if ($value eq 'curated') { $valPos{$datatype}{$joinkey} = $value; $valCur{$datatype}{$joinkey} = $value; }<br />
elsif ($value eq 'positive') { $valPos{$datatype}{$joinkey} = $value; }<br />
elsif ($value eq 'negative') { $valNeg{$datatype}{$joinkey} = $value; }<br />
</pre><br />
<br />
was changed to:<br />
<br />
<pre><br />
my @values = keys %{ $allCuratorValues{$joinkey}{$datatype} };<br />
if (scalar @values < 2) { # only one value, categorize it<br />
my $value = shift @values;<br />
$validated{$datatype}{$joinkey} = $value;<br />
if ($value eq 'curated') { $valPos{$datatype}{$joinkey} = $value; $valCur{$datatype}{$joinkey} = $value; }<br />
elsif ($value eq 'positive') { $valPos{$datatype}{$joinkey} = $value; }<br />
elsif ($value eq 'negative') { $valNeg{$datatype}{$joinkey} = $value; } }<br />
elsif (scalar @values == 2) { # only two values, either ok or conflict<br />
if ( ($allCuratorValues{$joinkey}{$datatype}{'curated'}) && ($allCuratorValues{$joinkey}{$datatype}{'positive'}) ) { # positive + curated not a conflict, for Chris 2013 06 12<br />
$valPos{$datatype}{$joinkey} = 'positive'; $valCur{$datatype}{$joinkey} = 'curated'; }<br />
else { $conflict{$datatype}{$joinkey}++; } }<br />
else { $conflict{$datatype}{$joinkey}++; }<br />
<br />
</pre><br />
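<br />
The June 12, 2013 categorization can be sketched as a standalone subroutine. Here categorize is a hypothetical wrapper, and each call receives the values collected for one paper-datatype pair ('curated' from OA data plus the donposneg key from cur_curdata).<br />
```perl
use strict;
use warnings;

my (%validated, %valCur, %valPos, %valNeg, %conflict);

# Hypothetical wrapper around the June 12, 2013 logic for one pair.
sub categorize {
    my ($joinkey, $datatype, @input) = @_;
    my %has;
    $has{$_}++ foreach @input;          # distinct values, as in %allCuratorValues
    my @values = keys %has;
    if (scalar @values < 2) {           # only one value, categorize it
        my $value = shift @values;
        $validated{$datatype}{$joinkey} = $value;
        if    ($value eq 'curated')  { $valPos{$datatype}{$joinkey} = $value; $valCur{$datatype}{$joinkey} = $value; }
        elsif ($value eq 'positive') { $valPos{$datatype}{$joinkey} = $value; }
        elsif ($value eq 'negative') { $valNeg{$datatype}{$joinkey} = $value; }
    }
    elsif (scalar @values == 2 && $has{'curated'} && $has{'positive'}) {
        # positive + curated is not a conflict
        $valPos{$datatype}{$joinkey} = 'positive'; $valCur{$datatype}{$joinkey} = 'curated';
    }
    else { $conflict{$datatype}{$joinkey}++; }   # any other combination
}

categorize('00000001', 'geneint', 'curated', 'positive');
categorize('00000002', 'geneint', 'positive', 'negative');
categorize('00000003', 'geneint', 'negative');
```
Folding the two-value check into a single elsif behaves the same as the nested branches in the form code.<br />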
<br />
=== Curation Statistics Calculations ===<br />
<br />
Each value in the Curation Statistics table is calculated from which papers (more precisely, which paper IDs) appear in each of a number of hashes. The following hashes capture validation status:<br />
<br />
<pre><br />
%valCur - All papers that have been curated for a given datatype<br />
<br />
%valPos - All papers that have been validated positive for a given datatype, including papers that have been curated<br />
<br />
%valNeg - All papers that have been validated negative for a given datatype<br />
</pre><br />
<br />
<br />
To compute the validation and curation statistics for a particular flagging method, these tables are compared with that method's table of flagging results. For example, for AFP positives, the following logic determines each indicated value (a list of papers), per datatype:<br />
<br />
<pre><br />
AFP positive (%afpPos)<br />
AFP positive validated (%afpPosVal) : %afpPos AND (%valNeg OR %valPos)<br />
AFP positive validated false positive (%afpPosFP) : %afpPos AND %valNeg<br />
AFP positive validated true positive (%afpPosTP) : %afpPos AND %valPos<br />
AFP positive validated true positive curated (%afpPosTpCur) : %afpPos AND %valPos AND %valCur (Note: %valPos is redundant, since every curated paper is also validated positive)<br />
AFP positive validated true positive not curated (%afpPosTpNC) : %afpPos AND (%valPos NOT %valCur)<br />
AFP positive not validated (%afpPosNV) : %afpPos NOT (%valNeg OR %valPos)<br />
AFP positive not curated (%afpPosNC) : (%afpPos AND (%valPos NOT %valCur)) OR (%afpPos NOT (%valNeg OR %valPos))<br />
</pre><br />
<br />
<br />
which are determined by the following section of code:<br />
<br />
<pre><br />
sub getCurationStatisticsAfpPos {<br />
my ($datatypesToShow_ref) = @_;<br />
my @datatypesToShow = @$datatypesToShow_ref;<br />
my %afpPosNV; my %afpPosVal; my %afpPosFP; my %afpPosTP; my %afpPosTpCur; my %afpPosTpNC; my %afpPosNC;<br />
# positive and : not validated, validated, false positive, true positive, TP curated, TP not curated, not curated minus validated negative OR not validated + TP not curated<br />
foreach my $datatype (@datatypesToShow) {<br />
foreach my $joinkey (sort keys %{ $afpPos{$datatype} }) {<br />
if ($valPos{$datatype}{$joinkey}) { $afpPosTP{$datatype}{$joinkey}++; $afpPosVal{$datatype}{$joinkey}++;<br />
if ($valCur{$datatype}{$joinkey}) { $afpPosTpCur{$datatype}{$joinkey}++; }<br />
else { $afpPosTpNC{$datatype}{$joinkey}++; $afpPosNC{$datatype}{$joinkey}++; } }<br />
elsif ($valNeg{$datatype}{$joinkey}) { $afpPosFP{$datatype}{$joinkey}++; $afpPosVal{$datatype}{$joinkey}++; }<br />
else { $afpPosNV{$datatype}{$joinkey}++; $afpPosNC{$datatype}{$joinkey}++; } } }<br />
tie %{ $curStats{'afp'}{'pos'} }, "Tie::IxHash";<br />
foreach my $datatype (@datatypesToShow) {<br />
my $countAfpFlagged = scalar keys %{ $afpFlagged{$datatype} };<br />
my $countAfpPos = scalar keys %{ $afpPos{$datatype} };<br />
my $ratio = 0;<br />
if ($countAfpFlagged > 0) { $ratio = $countAfpPos / $countAfpFlagged * 100; $ratio = FormatSigFigs($ratio, 2); }<br />
foreach my $joinkey (keys %{ $afpPos{$datatype} }) { $curStats{'afp'}{'pos'}{$datatype}{papers}{$joinkey}++;<br />
$curStats{'any'}{'pos'}{$datatype}{papers}{$joinkey}++; }<br />
$curStats{'afp'}{'pos'}{$datatype}{'countPap'} = scalar keys %{ $afpPos{$datatype} };<br />
$curStats{'afp'}{'pos'}{$datatype}{'ratio'} = $ratio;<br />
<br />
my $countAfpPosVal = scalar keys %{ $afpPosVal{$datatype} };<br />
$ratio = 0;<br />
if ($countAfpPos > 0) { $ratio = $countAfpPosVal / $countAfpPos * 100; $ratio = FormatSigFigs($ratio, 2); }<br />
foreach my $joinkey (keys %{ $afpPosVal{$datatype} }) { $curStats{'afp'}{'pos'}{'val'}{$datatype}{papers}{$joinkey}++;<br />
$curStats{'any'}{'pos'}{'val'}{$datatype}{papers}{$joinkey}++; }<br />
$curStats{'afp'}{'pos'}{'val'}{$datatype}{'countPap'} = scalar keys %{ $afpPosVal{$datatype} };<br />
$curStats{'afp'}{'pos'}{'val'}{$datatype}{'ratio'} = $ratio;<br />
<br />
my $countAfpPosTP = scalar keys %{ $afpPosTP{$datatype} };<br />
$ratio = 0;<br />
if ($countAfpPosVal > 0) { $ratio = $countAfpPosTP / $countAfpPosVal * 100; $ratio = FormatSigFigs($ratio, 2); }<br />
foreach my $joinkey (keys %{ $afpPosTP{$datatype} }) { $curStats{'afp'}{'pos'}{'val'}{'tp'}{$datatype}{papers}{$joinkey}++;<br />
$curStats{'any'}{'pos'}{'val'}{'tp'}{$datatype}{papers}{$joinkey}++; }<br />
$curStats{'afp'}{'pos'}{'val'}{'tp'}{$datatype}{'countPap'} = scalar keys %{ $afpPosTP{$datatype} };<br />
$curStats{'afp'}{'pos'}{'val'}{'tp'}{$datatype}{'ratio'} = $ratio;<br />
<br />
my $countAfpPosTpCur = scalar keys %{ $afpPosTpCur{$datatype} };<br />
$ratio = 0;<br />
if ($countAfpPosTP > 0) { $ratio = $countAfpPosTpCur / $countAfpPosTP * 100; $ratio = FormatSigFigs($ratio, 2); }<br />
foreach my $joinkey (keys %{ $afpPosTpCur{$datatype} }) { $curStats{'afp'}{'pos'}{'val'}{'tp'}{'cur'}{$datatype}{papers}{$joinkey}++;<br />
$curStats{'any'}{'pos'}{'val'}{'tp'}{'cur'}{$datatype}{papers}{$joinkey}++; }<br />
$curStats{'afp'}{'pos'}{'val'}{'tp'}{'cur'}{$datatype}{'countPap'} = scalar keys %{ $afpPosTpCur{$datatype} };<br />
$curStats{'afp'}{'pos'}{'val'}{'tp'}{'cur'}{$datatype}{'ratio'} = $ratio;<br />
<br />
my $countAfpPosTpNC = scalar keys %{ $afpPosTpNC{$datatype} };<br />
$ratio = 0;<br />
if ($countAfpPosTP > 0) { $ratio = $countAfpPosTpNC / $countAfpPosTP * 100; $ratio = FormatSigFigs($ratio, 2); }<br />
foreach my $joinkey (keys %{ $afpPosTpNC{$datatype} }) { $curStats{'afp'}{'pos'}{'val'}{'tp'}{'ncur'}{$datatype}{papers}{$joinkey}++;<br />
$curStats{'any'}{'pos'}{'val'}{'tp'}{'ncur'}{$datatype}{papers}{$joinkey}++; }<br />
$curStats{'afp'}{'pos'}{'val'}{'tp'}{'ncur'}{$datatype}{'countPap'} = scalar keys %{ $afpPosTpNC{$datatype} };<br />
$curStats{'afp'}{'pos'}{'val'}{'tp'}{'ncur'}{$datatype}{'ratio'} = $ratio;<br />
<br />
my $countAfpPosFP = scalar keys %{ $afpPosFP{$datatype} };<br />
$ratio = 0;<br />
if ($countAfpPosVal > 0) { $ratio = $countAfpPosFP / $countAfpPosVal * 100; $ratio = FormatSigFigs($ratio, 2); }<br />
foreach my $joinkey (keys %{ $afpPosFP{$datatype} }) { $curStats{'afp'}{'pos'}{'val'}{'fp'}{$datatype}{papers}{$joinkey}++;<br />
$curStats{'any'}{'pos'}{'val'}{'fp'}{$datatype}{papers}{$joinkey}++; }<br />
$curStats{'afp'}{'pos'}{'val'}{'fp'}{$datatype}{'countPap'} = scalar keys %{ $afpPosFP{$datatype} };<br />
$curStats{'afp'}{'pos'}{'val'}{'fp'}{$datatype}{'ratio'} = $ratio;<br />
<br />
my $countAfpPosNV = scalar keys %{ $afpPosNV{$datatype} };<br />
$ratio = 0;<br />
if ($countAfpPos > 0) { $ratio = $countAfpPosNV / $countAfpPos * 100; $ratio = FormatSigFigs($ratio, 2); }<br />
foreach my $joinkey (keys %{ $afpPosNV{$datatype} }) { $curStats{'afp'}{'pos'}{'nval'}{$datatype}{papers}{$joinkey}++;<br />
$curStats{'any'}{'pos'}{'nval'}{$datatype}{papers}{$joinkey}++; }<br />
$curStats{'afp'}{'pos'}{'nval'}{$datatype}{'countPap'} = scalar keys %{ $afpPosNV{$datatype} };<br />
$curStats{'afp'}{'pos'}{'nval'}{$datatype}{'ratio'} = $ratio;<br />
<br />
my $countAfpPosNC = scalar keys %{ $afpPosNC{$datatype} };<br />
$ratio = 0;<br />
if ($countAfpPos > 0) { $ratio = $countAfpPosNC / $countAfpPos * 100; $ratio = FormatSigFigs($ratio, 2); }<br />
foreach my $joinkey (keys %{ $afpPosNC{$datatype} }) { $curStats{'afp'}{'pos'}{'ncur'}{$datatype}{papers}{$joinkey}++;<br />
$curStats{'any'}{'pos'}{'ncur'}{$datatype}{papers}{$joinkey}++; }<br />
$curStats{'afp'}{'pos'}{'ncur'}{$datatype}{'countPap'} = scalar keys %{ $afpPosNC{$datatype} };<br />
$curStats{'afp'}{'pos'}{'ncur'}{$datatype}{'ratio'} = $ratio;<br />
} # foreach my $datatype (@datatypesToShow)<br />
} # sub getCurationStatisticsAfpPos<br />
</pre><br />
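<br />
Every bucket in the subroutine above follows the same two steps. First, each AFP-positive paper is classified against ''%valPos'', ''%valNeg'' and ''%valCur''. Second, the size of each bucket is expressed as a percentage of its parent bucket. The sketch below factors those steps into two hypothetical helpers, ''afp_pos_buckets'' and ''ratio_pct''. It is not the form's code. It also rounds with ''sprintf'' instead of ''FormatSigFigs''; that substitution is an assumption, so some values may be formatted differently.<br />
<br />
```perl
use strict;
use warnings;

# Percentage of $part in $whole, rounded to two significant figures.
# Assumption: sprintf('%.2g') stands in for FormatSigFigs($ratio, 2);
# adding 0 turns a result such as '1e+02' back into the number 100.
sub ratio_pct {
    my ($part, $whole) = @_;
    return 0 unless $whole > 0;                      # same zero-denominator guard as the form
    return 0 + sprintf('%.2g', $part / $whole * 100);
}

# Split one datatype's AFP-positive papers into the buckets defined above.
# All arguments are hashrefs keyed by paper ID.
sub afp_pos_buckets {
    my ($afpPos, $valPos, $valNeg, $valCur) = @_;
    my %bucket = map { $_ => {} } qw(Val TP TpCur TpNC FP NV NC);
    foreach my $paper (keys %$afpPos) {
        if ($valPos->{$paper}) {                     # validated positive: true positive
            $bucket{TP}{$paper}++; $bucket{Val}{$paper}++;
            if ($valCur->{$paper}) { $bucket{TpCur}{$paper}++; }
            else { $bucket{TpNC}{$paper}++; $bucket{NC}{$paper}++; }
        }
        elsif ($valNeg->{$paper}) {                  # validated negative: false positive
            $bucket{FP}{$paper}++; $bucket{Val}{$paper}++;
        }
        else {                                       # not validated, hence not curated
            $bucket{NV}{$paper}++; $bucket{NC}{$paper}++;
        }
    }
    return \%bucket;
}

my %afpPos = map { $_ => 1 } qw(p1 p2 p3 p4);
my %valPos = (p1 => 1, p2 => 1);
my %valNeg = (p3 => 1);
my %valCur = (p1 => 1);
my $buckets = afp_pos_buckets(\%afpPos, \%valPos, \%valNeg, \%valCur);
my %count = map { $_ => scalar keys %{ $buckets->{$_} } } keys %$buckets;
print "TP=$count{TP} FP=$count{FP} NV=$count{NV} NC=$count{NC}\n";   # TP=2 FP=1 NV=1 NC=2
print 'TP ratio = ', ratio_pct($count{TP}, $count{Val}), "%\n";      # TP ratio = 67%
```
<br />
''afpPosNC'' is the union of two disjoint buckets, TpNC and NV. Its count should therefore always equal the sum of those two counts, which gives a quick sanity check on the table.<br />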
<br />
<br />
For AFP Negatives, the following logic is performed to determine the indicated values (list of papers), per datatype:<br />
<br />
<pre><br />
AFP negative (%afpNeg)<br />
AFP negative validated (%afpNegVal) : %afpNeg AND (%valNeg OR %valPos)<br />
AFP negative validated true negative (%afpNegTN) : %afpNeg AND %valNeg<br />
AFP negative validated false negative (%afpNegFN) : %afpNeg AND %valPos<br />
AFP negative validated false negative curated (%afpNegFnCur) : %afpNeg AND %valPos AND %valCur (Note: %valPos is redundant, since every curated paper is also validated positive)<br />
AFP negative validated false negative not curated (%afpNegFnNC) : %afpNeg AND (%valPos NOT %valCur)<br />
AFP negative not validated (%afpNegNV) : %afpNeg NOT (%valNeg OR %valPos)<br />
AFP negative not curated (%afpNegNC) : (%afpNeg AND (%valPos NOT %valCur)) OR (%afpNeg NOT (%valPos OR %valNeg))<br />
</pre><br />
<br />
which are determined by the following section of code:<br />
<br />
<pre><br />
sub getCurationStatisticsAfpNeg {<br />
my ($datatypesToShow_ref) = @_;<br />
my @datatypesToShow = @$datatypesToShow_ref;<br />
my %afpNegNV; my %afpNegVal; my %afpNegTN; my %afpNegFN; my %afpNegFnCur; my %afpNegFnNC; my %afpNegNC;<br />
# negative and : not validated, validated, true negative, false negative, FN curated, FN not curated, not curated minus validated negative OR not validated + FN not curated<br />
foreach my $datatype (@datatypesToShow) {<br />
foreach my $joinkey (sort keys %{ $afpNeg{$datatype} }) {<br />
if ($valNeg{$datatype}{$joinkey}) { $afpNegTN{$datatype}{$joinkey}++; $afpNegVal{$datatype}{$joinkey}++; }<br />
elsif ($valPos{$datatype}{$joinkey}) { $afpNegFN{$datatype}{$joinkey}++; $afpNegVal{$datatype}{$joinkey}++;<br />
if ($valCur{$datatype}{$joinkey}) { $afpNegFnCur{$datatype}{$joinkey}++; }<br />
else { $afpNegFnNC{$datatype}{$joinkey}++; $afpNegNC{$datatype}{$joinkey}++; } }<br />
else { $afpNegNV{$datatype}{$joinkey}++; $afpNegNC{$datatype}{$joinkey}++; } } }<br />
tie %{ $curStats{'afp'}{'neg'} }, "Tie::IxHash";<br />
foreach my $datatype (@datatypesToShow) {<br />
my $countAfpFlagged = scalar keys %{ $afpFlagged{$datatype} };<br />
my $countAfpNeg = scalar keys %{ $afpNeg{$datatype} };<br />
my $ratio = 0;<br />
if ($countAfpFlagged > 0) { $ratio = $countAfpNeg / $countAfpFlagged * 100; $ratio = FormatSigFigs($ratio, 2); } # ($ratio) = &roundToPlaces($ratio, 2); # $ratio = sprintf "%.2f", $ratio;<br />
foreach my $joinkey (keys %{ $afpNeg{$datatype} }) { $curStats{'afp'}{'neg'}{$datatype}{papers}{$joinkey}++; }<br />
$curStats{'afp'}{'neg'}{$datatype}{'countPap'} = scalar keys %{ $afpNeg{$datatype} };<br />
$curStats{'afp'}{'neg'}{$datatype}{'ratio'} = $ratio;<br />
<br />
my $countAfpNegVal = scalar keys %{ $afpNegVal{$datatype} };<br />
$ratio = 0;<br />
if ($countAfpNeg > 0) { $ratio = $countAfpNegVal / $countAfpNeg * 100; $ratio = FormatSigFigs($ratio, 2); }<br />
foreach my $joinkey (keys %{ $afpNegVal{$datatype} }) { $curStats{'afp'}{'neg'}{'val'}{$datatype}{papers}{$joinkey}++; }<br />
$curStats{'afp'}{'neg'}{'val'}{$datatype}{'countPap'} = scalar keys %{ $afpNegVal{$datatype} };<br />
$curStats{'afp'}{'neg'}{'val'}{$datatype}{'ratio'} = $ratio;<br />
<br />
my $countAfpNegTN = scalar keys %{ $afpNegTN{$datatype} };<br />
$ratio = 0;<br />
if ($countAfpNegVal > 0) { $ratio = $countAfpNegTN / $countAfpNegVal * 100; $ratio = FormatSigFigs($ratio, 2); }<br />
foreach my $joinkey (keys %{ $afpNegTN{$datatype} }) { $curStats{'afp'}{'neg'}{'val'}{'tn'}{$datatype}{papers}{$joinkey}++; }<br />
$curStats{'afp'}{'neg'}{'val'}{'tn'}{$datatype}{'countPap'} = scalar keys %{ $afpNegTN{$datatype} };<br />
$curStats{'afp'}{'neg'}{'val'}{'tn'}{$datatype}{'ratio'} = $ratio;<br />
<br />
my $countAfpNegFN = scalar keys %{ $afpNegFN{$datatype} };<br />
$ratio = 0;<br />
if ($countAfpNegVal > 0) { $ratio = $countAfpNegFN / $countAfpNegVal * 100; $ratio = FormatSigFigs($ratio, 2); }<br />
foreach my $joinkey (keys %{ $afpNegFN{$datatype} }) { $curStats{'afp'}{'neg'}{'val'}{'fn'}{$datatype}{papers}{$joinkey}++; }<br />
$curStats{'afp'}{'neg'}{'val'}{'fn'}{$datatype}{'countPap'} = scalar keys %{ $afpNegFN{$datatype} };<br />
$curStats{'afp'}{'neg'}{'val'}{'fn'}{$datatype}{'ratio'} = $ratio;<br />
<br />
my $countAfpNegFnCur = scalar keys %{ $afpNegFnCur{$datatype} };<br />
$ratio = 0;<br />
if ($countAfpNegFN > 0) { $ratio = $countAfpNegFnCur / $countAfpNegFN * 100; $ratio = FormatSigFigs($ratio, 2); }<br />
foreach my $joinkey (keys %{ $afpNegFnCur{$datatype} }) { $curStats{'afp'}{'neg'}{'val'}{'fn'}{'cur'}{$datatype}{papers}{$joinkey}++; }<br />
$curStats{'afp'}{'neg'}{'val'}{'fn'}{'cur'}{$datatype}{'countPap'} = scalar keys %{ $afpNegFnCur{$datatype} };<br />
$curStats{'afp'}{'neg'}{'val'}{'fn'}{'cur'}{$datatype}{'ratio'} = $ratio;<br />
<br />
my $countAfpNegFnNC = scalar keys %{ $afpNegFnNC{$datatype} };<br />
$ratio = 0;<br />
if ($countAfpNegFN > 0) { $ratio = $countAfpNegFnNC / $countAfpNegFN * 100; $ratio = FormatSigFigs($ratio, 2); }<br />
foreach my $joinkey (keys %{ $afpNegFnNC{$datatype} }) { $curStats{'afp'}{'neg'}{'val'}{'fn'}{'ncur'}{$datatype}{papers}{$joinkey}++; }<br />
$curStats{'afp'}{'neg'}{'val'}{'fn'}{'ncur'}{$datatype}{'countPap'} = scalar keys %{ $afpNegFnNC{$datatype} };<br />
$curStats{'afp'}{'neg'}{'val'}{'fn'}{'ncur'}{$datatype}{'ratio'} = $ratio;<br />
<br />
my $countAfpNegNV = scalar keys %{ $afpNegNV{$datatype} };<br />
$ratio = 0;<br />
if ($countAfpNeg > 0) { $ratio = $countAfpNegNV / $countAfpNeg * 100; $ratio = FormatSigFigs($ratio, 2); }<br />
foreach my $joinkey (keys %{ $afpNegNV{$datatype} }) { $curStats{'afp'}{'neg'}{'nval'}{$datatype}{papers}{$joinkey}++; }<br />
$curStats{'afp'}{'neg'}{'nval'}{$datatype}{'countPap'} = scalar keys %{ $afpNegNV{$datatype} };<br />
$curStats{'afp'}{'neg'}{'nval'}{$datatype}{'ratio'} = $ratio;<br />
<br />
my $countAfpNegNC = scalar keys %{ $afpNegNC{$datatype} };<br />
$ratio = 0;<br />
if ($countAfpNeg > 0) { $ratio = $countAfpNegNC / $countAfpNeg * 100; $ratio = FormatSigFigs($ratio, 2); }<br />
foreach my $joinkey (keys %{ $afpNegNC{$datatype} }) { $curStats{'afp'}{'neg'}{'ncur'}{$datatype}{papers}{$joinkey}++; }<br />
$curStats{'afp'}{'neg'}{'ncur'}{$datatype}{'countPap'} = scalar keys %{ $afpNegNC{$datatype} };<br />
$curStats{'afp'}{'neg'}{'ncur'}{$datatype}{'ratio'} = $ratio;<br />
} # foreach my $datatype (@datatypesToShow)<br />
} # sub getCurationStatisticsAfpNeg<br />
<br />
</pre><br />
<br />
<br />
'''"Any" and "Intersection" rows of the Curation Statistics table'''<br />
<br />
To determine the "Any" and "Intersection" results, all flagging methods currently visible in the Curation Statistics table are considered. For the main Curation Statistics table (with no options selected), that means all flagging methods (SVM, AFP, and CFP as of 12-10-2012). The calculations in this case are:<br />
<br />
<pre><br />
Any flagged : %svmData OR %afpFlagged OR %cfpFlagged<br />
Any positive : %svmPos OR %afpPos OR %cfpPos<br />
Any positive validated : %svmPosVal OR %afpPosVal OR %cfpPosVal<br />
Any positive validated false positive : %svmPosFP OR %afpPosFP OR %cfpPosFP<br />
Any positive validated true positive : %svmPosTP OR %afpPosTP OR %cfpPosTP<br />
Any positive validated true positive curated : %svmPosTpCur OR %afpPosTpCur OR %cfpPosTpCur<br />
Any positive validated true positive not curated : %svmPosTpNC OR %afpPosTpNC OR %cfpPosTpNC<br />
Any positive not validated : %svmPosNV OR %afpPosNV OR %cfpPosNV<br />
Any positive not curated : %svmPosNC OR %afpPosNC OR %cfpPosNC<br />
<br />
Intersection flagged : %svmData AND %afpFlagged AND %cfpFlagged<br />
Intersection positive : %svmPos AND %afpPos AND %cfpPos<br />
Intersection positive validated : %svmPosVal AND %afpPosVal AND %cfpPosVal<br />
Intersection positive validated false positive : %svmPosFP AND %afpPosFP AND %cfpPosFP<br />
Intersection positive validated true positive : %svmPosTP AND %afpPosTP AND %cfpPosTP<br />
Intersection positive validated true positive curated : %svmPosTpCur AND %afpPosTpCur AND %cfpPosTpCur<br />
Intersection positive validated true positive not curated : %svmPosTpNC AND %afpPosTpNC AND %cfpPosTpNC<br />
Intersection positive not validated : %svmPosNV AND %afpPosNV AND %cfpPosNV<br />
Intersection positive not curated : %svmPosNC AND %afpPosNC AND %cfpPosNC<br />
</pre><br />
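<br />
The "Any" and "Intersection" rows are a plain set union and set intersection over whichever flagging methods are shown. Below is a minimal sketch with two hypothetical helpers, ''any_of'' and ''intersection_of''. They operate on hashes keyed by paper ID, the way the form's hashes are keyed within each datatype. The paper sets in the example are made up.<br />
<br />
```perl
use strict;
use warnings;

# "Any": papers in at least one of the given sets (hashrefs keyed by paper ID).
sub any_of {
    my @sets = @_;
    my %union;
    $union{$_}++ for map { keys %$_ } @sets;
    return \%union;
}

# "Intersection": papers present in every one of the given sets.
sub intersection_of {
    my ($first, @rest) = @_;
    return {} unless $first;
    my %common;
    foreach my $paper (keys %$first) {
        $common{$paper}++ unless grep { !$_->{$paper} } @rest;
    }
    return \%common;
}

my %svmPos = map { $_ => 1 } qw(p1 p2 p3);
my %afpPos = map { $_ => 1 } qw(p2 p3);
my %cfpPos = map { $_ => 1 } qw(p3 p4);
my @shown = (\%svmPos, \%afpPos, \%cfpPos);     # flagging methods currently selected
print join(',', sort keys %{ any_of(@shown) }), "\n";           # p1,p2,p3,p4
print join(',', sort keys %{ intersection_of(@shown) }), "\n";  # p3
```
<br />
When only one method is passed in, both helpers return that method's own set. This is why the "Any", "Intersection" and "Flagged Positive" sections coincide when a single flagging method is shown.<br />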
<br />
<br />
Note that if a curator opens the Curation Statistics table after deselecting any of the flagging methods on the [[New_2012_Curation_Status#Curation_Statistics_Options_Page|Curation Statistics Options Page]], the "Any" and "Intersection" sections of the table will reflect only the flagging methods chosen by the curator. Thus, if a curator chooses to view only one flagging method, the "Any", "Intersection", and "Flagged Positive" sections of the table will show identical results.<br />
<br />
<br />
The following are the correspondences between rows in the Curation Statistics table and the hash tables in the form's code:<br />
<br />
'''General paper stats'''<br />
<br />
<pre><br />
%curatablePapers curatable papers<br />
%objsCurated objects curated<br />
%objsCurated/%valCur objects curated per paper<br />
%valCur Papers curated<br />
%validated Papers validated<br />
%valPos Papers validated positive<br />
%valCur Papers validated positive curated<br />
%valPos NOT %valCur Papers validated positive not curated<br />
%valNeg Papers validated negative<br />
%conflict Papers validated conflict<br />
</pre><br />
<br />
<br />
'''Support Vector Machine paper stats'''<br />
<br />
<pre><br />
%noSvm SVM no svm processed<br />
%svmData SVM has svm<br />
%svmPos SVM positive any<br />
%svmPosVal SVM positive any validated<br />
%svmPosFP SVM positive any validated false positive<br />
%svmPosTP SVM positive any validated true positive<br />
%svmPosTpCur SVM positive any validated true positive curated<br />
%svmPosTpNC SVM positive any validated true positive not curated<br />
%svmPosNV SVM positive any not validated<br />
%svmPosNC SVM positive any not curated<br />
%svmHig SVM positive high<br />
%svmHigVal SVM positive high validated<br />
%svmHigFP SVM positive high validated false positive<br />
%svmHigTP SVM positive high validated true positive<br />
%svmHigTpCur SVM positive high validated true positive curated<br />
%svmHigTpNC SVM positive high validated true positive not curated<br />
%svmHigNV SVM positive high not validated<br />
%svmHigNC SVM positive high not curated<br />
%svmMed SVM positive medium<br />
%svmMedVal SVM positive medium validated<br />
%svmMedFP SVM positive medium validated false positive<br />
%svmMedTP SVM positive medium validated true positive<br />
%svmMedTpCur SVM positive medium validated true positive curated<br />
%svmMedTpNC SVM positive medium validated true positive not curated<br />
%svmMedNV SVM positive medium not validated<br />
%svmMedNC SVM positive medium not curated<br />
%svmLow SVM positive low<br />
%svmLowVal SVM positive low validated<br />
%svmLowFP SVM positive low validated false positive<br />
%svmLowTP SVM positive low validated true positive<br />
%svmLowTpCur SVM positive low validated true positive curated<br />
%svmLowTpNC SVM positive low validated true positive not curated<br />
%svmLowNV SVM positive low not validated<br />
%svmLowNC SVM positive low not curated<br />
%svmNeg SVM negative<br />
%svmNegVal SVM negative validated<br />
%svmNegTN SVM negative validated true negative<br />
%svmNegFN SVM negative validated false negative<br />
%svmNegFnCur SVM negative validated false negative curated<br />
%svmNegFnNC SVM negative validated false negative not curated<br />
%svmNegNV SVM negative not validated<br />
%svmNegNC SVM negative not curated<br />
</pre><br />
<br />
<br />
'''Author First Pass paper stats'''<br />
<br />
<pre><br />
%afpEmailed AFP emailed<br />
%afpFlagged AFP flagged<br />
%afpPos AFP positive<br />
%afpPosVal AFP positive validated<br />
%afpPosFP AFP positive validated false positive<br />
%afpPosTP AFP positive validated true positive<br />
%afpPosTpCur AFP positive validated true positive curated<br />
%afpPosTpNC AFP positive validated true positive not curated<br />
%afpPosNV AFP positive not validated<br />
%afpPosNC AFP positive not curated<br />
%afpNeg AFP negative<br />
%afpNegVal AFP negative validated<br />
%afpNegTN AFP negative validated true negative<br />
%afpNegFN AFP negative validated false negative<br />
%afpNegFnCur AFP negative validated false negative curated<br />
%afpNegFnNC AFP negative validated false negative not curated<br />
%afpNegNV AFP negative not validated<br />
%afpNegNC AFP negative not curated<br />
</pre><br />
<br />
<br />
'''Curator First Pass paper stats'''<br />
<br />
<pre><br />
%cfpFlagged CFP flagged<br />
%cfpPos CFP positive<br />
%cfpPosVal CFP positive validated<br />
%cfpPosFP CFP positive validated false positive<br />
%cfpPosTP CFP positive validated true positive<br />
%cfpPosTpCur CFP positive validated true positive curated<br />
%cfpPosTpNC CFP positive validated true positive not curated<br />
%cfpPosNV CFP positive not validated<br />
%cfpPosNC CFP positive not curated<br />
%cfpNeg CFP negative<br />
%cfpNegVal CFP negative validated<br />
%cfpNegTN CFP negative validated true negative<br />
%cfpNegFN CFP negative validated false negative<br />
%cfpNegFnCur CFP negative validated false negative curated<br />
%cfpNegFnNC CFP negative validated false negative not curated<br />
%cfpNegNV CFP negative not validated<br />
%cfpNegNC CFP negative not curated<br />
</pre><br />
<br />
<br />
<br />
== Postgres Table Structures ==<br />
<br />
<br />
=== cur_curdata table ===<br />
<br />
<br />
The cur_curdata table in Postgres has the following structure:<br />
<br />
<br />
<pre><br />
cur_paper | text | <br />
cur_datatype | text | <br />
cur_curator | text | <br />
cur_curdata | text | <br />
cur_selcomment | text | <br />
cur_txtcomment | text | <br />
cur_timestamp | timestamp with time zone |<br />
</pre><br />
<br />
<br />
The following is an example of a Postgres query to return cur_curdata for the paper WBPaper00031688:<br />
<br />
<pre><br />
testdb=> SELECT * FROM cur_curdata WHERE cur_paper = '00031688';<br />
cur_paper | cur_datatype | cur_curator | cur_curdata | cur_selcomment | cur_txtcomment | cur_timestamp <br />
-----------+--------------+-------------+--------------+----------------+---------------------------+-------------------------------<br />
00031688 | otherexpr | two1823 | positive | | qwer | 2012-11-20 15:48:18.113978-08<br />
00031688 | geneint | two2987 | notvalidated | 2 | test test test | 2012-11-26 17:06:30.550952-08<br />
00031688 | rnai | two2987 | notvalidated | | testing for documentation | 2012-12-13 14:25:13.360082-08<br />
(3 rows)<br />
</pre><br />
<br />
<br />
For a given paper-datatype pair:<br />
<br />
'''cur_paper''': Stores the numeric part of the WBPaper ID (e.g. 00031688 for WBPaper00031688)<br />
<br />
'''cur_datatype''': Stores the datatype<br />
<br />
'''cur_curator''': Stores the curator ID in the format "two" followed by the number from the curator's WBPerson ID (e.g. two1823 for WBPerson1823)<br />
<br />
'''cur_curdata''': Stores validation status: "positive", "negative", "curated", "notvalidated"<br />
<br />
'''cur_selcomment''': Stores the premade comment key (the form allows only one value, although Postgres can store more)<br />
<br />
'''cur_txtcomment''': Stores the free-text comment<br />
<br />
'''cur_timestamp''': Stores the timestamp for the data submission through the Curation Statistics Form<br />
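<br />
''cur_paper'' and ''cur_curator'' store shortened IDs. Any report that joins this table with other WormBase data therefore has to rebuild the full IDs. Below is a minimal sketch with hypothetical helper names, assuming the formats described above (e.g. two1823 for WBPerson1823). It also includes a check that ''cur_curdata'' holds one of the four listed statuses.<br />
<br />
```perl
use strict;
use warnings;

# cur_paper holds only the digits of the WBPaper ID.
sub paper_id_from_joinkey {
    my ($joinkey) = @_;                         # e.g. '00031688'
    return "WBPaper$joinkey";
}

# cur_curator holds "two" followed by the digits of the WBPerson ID.
sub person_id_from_curator {
    my ($curator) = @_;                         # e.g. 'two1823'
    my ($number) = $curator =~ m/^two(\d+)$/ or return;
    return "WBPerson$number";
}

# cur_curdata holds one of the four validation statuses listed above.
my %valid_status = map { $_ => 1 } qw(positive negative curated notvalidated);
sub is_valid_curdata { return exists $valid_status{ $_[0] } }

print paper_id_from_joinkey('00031688'), "\n";   # WBPaper00031688
print person_id_from_curator('two1823'), "\n";   # WBPerson1823
```
<br />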
<br />
<br><br />
<br><br />
<br />
<br />
=== cur_svmdata table ===<br />
<br />
<br />
The cur_svmdata table in Postgres has the following structure:<br />
<br />
<br />
<pre><br />
cur_paper | text | <br />
cur_datatype | text | <br />
cur_date | text | <br />
cur_svmdata | text | <br />
cur_version | text | <br />
cur_timestamp | timestamp with time zone<br />
</pre><br />
<br />
<br />
The following is an example of a Postgres query to return cur_svmdata for the paper WBPaper00031688:<br />
<br />
<br />
<pre><br />
testdb=> SELECT * FROM cur_svmdata WHERE cur_paper = '00031688';<br />
cur_paper | cur_datatype | cur_date | cur_svmdata | cur_version | cur_timestamp <br />
-----------+--------------+----------+-------------+-------------+-------------------------------<br />
00031688 | structcorr | 20090101 | high | 0 | 2012-12-02 23:22:07.607586-08<br />
00031688 | seqchange | 20090101 | high | 0 | 2012-12-02 23:22:07.599254-08<br />
00031688 | rnai | 20090101 | high | 0 | 2012-12-02 23:22:07.590835-08<br />
00031688 | overexpr | 20090101 | high | 0 | 2012-12-02 23:22:07.582514-08<br />
00031688 | otherexpr | 20090101 | high | 0 | 2012-12-02 23:22:07.574163-08<br />
00031688 | newmutant | 20090101 | high | 0 | 2012-12-02 23:22:07.565834-08<br />
00031688 | genereg | 20090101 | high | 0 | 2012-12-02 23:22:07.557501-08<br />
00031688 | geneprod | 20090101 | NEG | 0 | 2012-12-02 23:22:07.549171-08<br />
00031688 | geneint | 20090101 | high | 0 | 2012-12-02 23:22:07.540835-08<br />
00031688 | antibody | 20090101 | NEG | 0 | 2012-12-02 23:22:07.532509-08<br />
(10 rows)<br />
</pre><br />
<br />
<br />
For a given paper-datatype pair:<br />
<br />
'''cur_paper''': Stores the numeric part of the WBPaper ID (e.g. 00031688 for WBPaper00031688)<br />
<br />
'''cur_datatype''': Stores the datatype<br />
<br />
'''cur_date''': Stores the date of the SVM directory on Yuling's computer<br />
<br />
'''cur_svmdata''': Stores the SVM result: "high", "medium", "low", "NEG"<br />
<br />
'''cur_version''': Stores the SVM version ('''Notify Juancarlos when a new SVM version is used''')<br />
<br />
'''cur_timestamp''': Stores the timestamp for when the data was read (by a cronjob) into this cur_svmdata table<br />
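<br />
The SVM rows feed the hashes listed under "Support Vector Machine paper stats". There, ''%svmPos'' ("SVM positive any") covers the high, medium and low results. The sketch below shows one way to file cur_svmdata rows into those hashes. The form's own code for this step is not shown on this page. The helper name ''bucket_svm_rows'' and the row layout are assumptions; the layout is taken from the example query above, without the timestamp.<br />
<br />
```perl
use strict;
use warnings;

# Which level-specific hash each positive SVM result also goes into.
my %level_hash = (high => 'svmHig', medium => 'svmMed', low => 'svmLow');

# Each row: [ cur_paper, cur_datatype, cur_date, cur_svmdata, cur_version ].
# Returns $svm->{hashName}{datatype}{paper}.
sub bucket_svm_rows {
    my @rows = @_;
    my %svm;
    foreach my $row (@rows) {
        my ($paper, $datatype, undef, $result) = @$row;
        $svm{svmData}{$datatype}{$paper}++;              # paper has an SVM result
        if ($result eq 'NEG') {
            $svm{svmNeg}{$datatype}{$paper}++;
        }
        elsif (my $name = $level_hash{$result}) {
            $svm{svmPos}{$datatype}{$paper}++;           # positive at any level
            $svm{$name}{$datatype}{$paper}++;            # plus the specific level
        }
        else {
            warn "Unexpected SVM result '$result' for $paper\n";
        }
    }
    return \%svm;
}

my $svm = bucket_svm_rows(
    [ '00031688', 'rnai',     '20090101', 'high', '0' ],
    [ '00031688', 'geneprod', '20090101', 'NEG',  '0' ],
);
print join(',', sort keys %{ $svm->{svmPos} }), "\n";   # rnai
print join(',', sort keys %{ $svm->{svmNeg} }), "\n";   # geneprod
```
<br />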
<br />
<br />
<br><br />
<br><br />
<br />
<br />
<br />
== Script for Populating SVM Data into Postgres (cur_svmdata) ==<br />
<br />
The script is located on Tazendra here:<br />
<br />
/home/postgres/work/pgpopulation/cur_curation/cur_svmdata/populate_svm_result.pl<br />
<br />
<br />
=== Allowable Datatypes ===<br />
<br />
Currently (12-13-2012), the only allowable datatypes for SVM data are:<br />
<br />
*antibody <br />
*geneint <br />
*geneprod_GO<br />
*genereg <br />
*newmutant <br />
*otherexpr <br />
*overexpr <br />
*rnai <br />
*seqchange <br />
*structcorr<br />
<br />
=== Populating Papers for which there is only a main paper (no supplements) ===<br />
<br />
<br />
Yuling provided a file of main-only paper IDs. These IDs are loaded into the ''%mainOnly'' hash with the "WBPaper" prefix removed.<br />
<br />
<br />
<pre><br />
my %mainOnly;<br />
my $mainOnly_file = '/home/postgres/work/pgpopulation/cur_curation/cur_svmdata/main_only';<br />
open (IN, "<$mainOnly_file") or die "Cannot open $mainOnly_file : $!";<br />
while (my $paper = <IN>) { chomp $paper; $paper =~ s/^WBPaper//; $mainOnly{$paper}++; }<br />
close (IN) or die "Cannot close $mainOnly_file : $!";<br />
</pre><br />
<br />
<br />
<br />
=== Populating SVM dates already in Postgres to avoid re-processing/duplication ===<br />
<br />
<br />
The following code reads the SVM data already in Postgres so that duplicate SVM data is not loaded into the cur_svmdata table. The ''%datesDone'' hash records the directory name (a date) of each batch of SVM results already processed. The ''%pg'' hash maps each paper ID, datatype, and date (joined by tabs) to its SVM result.<br />
<br />
<br />
<pre><br />
sub populateFromPg {<br />
$result = $dbh->prepare( "SELECT * FROM cur_svmdata" );<br />
$result->execute() or die "Cannot execute statement: $DBI::errstr\n";<br />
while (my @row = $result->fetchrow_array) {<br />
my ( $joinkey, $type, $date, $flag, $version, $timestamp ) = @row;<br />
$datesDone{$date}++;<br />
$pg{"$joinkey\t$type\t$date"} = $flag;<br />
}<br />
} # sub populateFromPg<br />
</pre><br />
<br />
<br />
<br />
=== Reading from Yuling's computer ===<br />
<br />
<br />
The SVM results are read from the following URL:<br />
<br />
http://131.215.52.209/celegans/svm_results/<br />
<br />
<br />
The code below extracts 'href' links made up only of digits followed by a slash ('/'), i.e. the dated batch directories. Any link that contains non-digits is ignored.<br />
<br />
<br />
<pre><br />
my (@dates) = $root_page =~ m/<a href=\"(\d+)\/\">/g;<br />
</pre><br />
<br />
For each of those links, the code skips any date that is already in the ''%datesDone'' hash, since that batch is already in Postgres. Consequently, any new results added later to an already-processed directory will be missed.<br />
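<br />
To see what the date-directory pattern accepts, the snippet below runs it over a made-up directory listing and then drops dates already recorded in ''%datesDone''. The HTML is illustrative only; the real page at the URL above may be formatted differently. ''date_dirs'' is a hypothetical wrapper around the regular expression.<br />
<br />
```perl
use strict;
use warnings;

# Return the digit-only directory names linked from the root page.
sub date_dirs {
    my ($root_page) = @_;
    my (@dates) = $root_page =~ m/<a href=\"(\d+)\/\">/g;
    return @dates;
}

my %datesDone = (20090101 => 1);            # pretend this batch is already in Postgres
my $root_page = join "\n",
    '<a href="20090101/">20090101/</a>',
    '<a href="20121130/">20121130/</a>',
    '<a href="old_runs/">old_runs/</a>',    # not digits only: ignored
    '<a href="README.txt">README.txt</a>';  # no trailing slash: ignored
my @new_dates = grep { !$datesDone{$_} } date_dirs($root_page);
print "@new_dates\n";                       # 20121130
```
<br />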
<br />
The code then looks for 'href' links to files that contain SVM results for a given datatype. It matches only links made up entirely of 'word' characters (letters, digits, and underscores):<br />
<br />
<pre><br />
my (@date_types) = $date_page =~ m/<a href=\"(\w+)\"/g;<br />
</pre><br />
<br />
For each of those files, the code extracts the datatype from the file name. SVM file names begin with digits and underscores, followed by an underscore, an optional 'and_missedPaper_', and then the datatype. A file whose name does not follow this structure yields no datatype and is skipped.<br />
<br />
<pre><br />
my ($type) = $date_type =~ m/^[\d_]+_(?:and_missedPaper_)?(\w+)$/;<br />
</pre><br />
<br />
SVM files label 'geneprod' results as 'geneprod_GO'; the code maps these to 'geneprod'.<br />
Any datatype that is not in the allowable datatypes list is skipped and added to the error output.<br />
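<br />
The file-name steps above can be combined into one function. This is a sketch under stated assumptions, not the script's code. ''svm_files_by_type'' is a hypothetical name. The allowable list contains 'geneprod_GO' rather than 'geneprod', so the sketch assumes the allowable check runs on the name as it appears in the file and the rename to 'geneprod' happens afterwards.<br />
<br />
```perl
use strict;
use warnings;

# Allowable SVM datatypes, as listed above (12-13-2012).
my %allowed = map { $_ => 1 } qw(antibody geneint geneprod_GO genereg newmutant
                                 otherexpr overexpr rnai seqchange structcorr);

# Return { datatype => file name } for one batch directory page; invalid
# datatypes are pushed onto @$errors.
sub svm_files_by_type {
    my ($date_page, $errors) = @_;
    my %files;
    my (@date_types) = $date_page =~ m/<a href=\"(\w+)\"/g;   # word characters only
    foreach my $date_type (@date_types) {
        my ($type) = $date_type =~ m/^[\d_]+_(?:and_missedPaper_)?(\w+)$/;
        next unless $type;                                     # name does not fit the pattern
        unless ($allowed{$type}) {
            push @$errors, "invalid datatype '$type' in $date_type";
            next;
        }
        $type = 'geneprod' if $type eq 'geneprod_GO';          # stored as 'geneprod'
        $files{$type} = $date_type;
    }
    return \%files;
}

my @errors;
my $date_page = '<a href="20121130_rnai">a</a> <a href="20121130_and_missedPaper_geneprod_GO">b</a> '
              . '<a href="20121130_foo">c</a> <a href="notes.txt">d</a>';
my $files = svm_files_by_type($date_page, \@errors);
print join(',', sort keys %$files), "\n";   # geneprod,rnai
print "$_\n" for @errors;                   # invalid datatype 'foo' in 20121130_foo
```
<br />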
<br />
The code then fetches the results file and splits it into lines. For each line, it:<br />
* removes double quotes<br />
* splits the line on the tab into the paper and the result<br />
* skips the line if the paper does not begin with "WBPaper" followed by at least one non-space character<br />
* separates the paper into the number corresponding to the paper ID and the modifier (e.g. 'concat' or 'sup.1'); if there is no modifier, the modifier is 'main'<br />
* renames the modifier to 'mainonly' if the paper ID is in the ''%mainOnly'' hash<br />
* skips the line unless the modifier is either 'concat' or 'mainonly'<br />
* skips the line if the paper-datatype-date is already in Postgres (redundant with skipping by dates above)<br />
* puts the result into the ''%hash'' hash<br />
<br />
<pre><br />
my $date_type_url = $date_url . $date_type;<br />
my $date_type_results_page = get $date_type_url;<br />
my (@results) = split/\n/, $date_type_results_page;<br />
foreach my $result (@results) {<br />
if ($result =~ m/\"/) { $result =~ s/\"//g; }<br />
my ($paper, $flag) = split/\t/, $result;<br />
next unless ($paper =~ m/^WBPaper[\S]+/);<br />
my ($joinkey, $modifier) = &getPaperAndModifier($paper);<br />
if ($mainOnly{$joinkey}) { $modifier = 'mainonly'; }<br />
next unless ($modifierWholePaper{$modifier});<br />
my ($tabKey) = &makeTabKey( $joinkey, $modifier, $type, $date );<br />
next if ($pg{$tabKey}); # skip if already in postgres<br />
$hash{$tabKey} = $flag;<br />
}<br />
</pre><br />
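<br />
The per-line steps can also be tried outside the script. ''&getPaperAndModifier'' is not shown on this page, so ''get_paper_and_modifier'' below is a hypothetical stand-in. It assumes IDs of the form WBPaper00031688 or WBPaper00031688.sup.1. The sketch also leaves out the Postgres check and keys results by paper and modifier only, rather than by ''&makeTabKey''. The input lines are made up.<br />
<br />
```perl
use strict;
use warnings;

# Hypothetical stand-in for &getPaperAndModifier: 'WBPaper00031688.sup.1'
# gives ('00031688', 'sup.1'); a bare 'WBPaper00031688' gives modifier 'main'.
sub get_paper_and_modifier {
    my ($paper) = @_;
    my ($joinkey, $modifier) = $paper =~ m/^WBPaper(\d+)(?:\.(\S+))?$/ or return;
    return ($joinkey, $modifier // 'main');
}

# Apply the listed steps to one results file; $mainOnly and $wholePaper are
# hashrefs like %mainOnly and %modifierWholePaper.
sub parse_results {
    my ($text, $mainOnly, $wholePaper) = @_;
    my %hash;
    foreach my $line (split /\n/, $text) {
        $line =~ s/"//g;                                   # get rid of double quotes
        my ($paper, $flag) = split /\t/, $line;            # paper <TAB> result
        next unless defined $paper && $paper =~ m/^WBPaper\S+/;
        my ($joinkey, $modifier) = get_paper_and_modifier($paper) or next;
        $modifier = 'mainonly' if $mainOnly->{$joinkey};
        next unless $wholePaper->{$modifier};              # keep 'concat' / 'mainonly' only
        $hash{"$joinkey\t$modifier"} = $flag;
    }
    return \%hash;
}

my $file = join "\n",
    qq("WBPaper00000001.concat"\thigh),
    qq("WBPaper00000002.sup.1"\tlow),
    qq("WBPaper00000003"\tmedium),
    qq(notAPaper\thigh);
my $hash = parse_results($file, { '00000003' => 1 }, { concat => 1, mainonly => 1 });
foreach my $key (sort keys %$hash) {
    my ($joinkey, $modifier) = split /\t/, $key;
    print "$joinkey ($modifier): $hash->{$key}\n";
}
# 00000001 (concat): high
# 00000003 (mainonly): medium
```
<br />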
<br />
For certain date directories, the code also looks in the corresponding <date>/checkFalseNegatives/ directory for files with SVM results that are negative (NEG).<br />
<br />
The code is the same, except for the regular expression that extracts the datatype. These file names begin with digits and underscores, followed by an underscore, an optional 'and_missedPaper_', 'checkFN_', and then the datatype. If a file name does not follow this structure, the code will not find the file.<br />
<br />
<pre><br />
my ($type) = $fn_date_type =~ m/^[\d_]+_(?:and_missedPaper_)?checkFN_(\w+)$/;<br />
</pre><br />
<br />
The SVM result for all of these files is set to NEG (negative).<br />
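<br />
For the checkFalseNegatives files only the datatype pattern changes, and every paper read from them is recorded as NEG. The sketch below shows which file names that pattern accepts, using a hypothetical helper ''check_fn_type''. The file names are invented for illustration.<br />
<br />
```perl
use strict;
use warnings;

# Extract the datatype from a file name in <date>/checkFalseNegatives/.
sub check_fn_type {
    my ($fn_date_type) = @_;
    my ($type) = $fn_date_type =~ m/^[\d_]+_(?:and_missedPaper_)?checkFN_(\w+)$/;
    return $type;                                   # undef if the name does not fit
}

foreach my $name ('20120628_checkFN_rnai',
                  '20120628_and_missedPaper_checkFN_geneint',
                  '20120628_rnai') {
    my $type = check_fn_type($name);
    if (defined $type) { print "$name -> $type (all results recorded as NEG)\n"; }
    else               { print "$name -> not found\n"; }
}
```
<br />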
<br />
For each of the values that were entered into ''%hash'', the version is '0' for dates before 2012-06-28, and the version is '1' otherwise.<br />
<br />
<pre><br />
my $version = ($date < 20120628) ? '0' : '1';   # '0' before 2012-06-28, '1' from that date on<br />
</pre><br />
<br />
The paper ID, datatype, SVM date, SVM result, and version are then inserted into the Postgres cur_svmdata table:<br />
<br />
<pre><br />
push @pgcommands, qq(INSERT INTO cur_svmdata VALUES('$joinkey', '$type', '$date', '$flag', '$version'));<br />
</pre><br />
<br />
Any errors caused by invalid datatypes, or by SVM results that match neither a WBPaper ID nor a paper.something pattern, are printed at the end of the output file (TODO: change this to email Daniela and Chris).<br />
<br />
(TODO: set a cronjob to run every day at 4am?)<br />
<br />
<br><br />
<br><br />
<br />
= Abbreviations =<br />
<br />
'''AFP''' - Author First Pass (flagging method)<br />
<br><br />
<br><br />
'''CFP''' - Curator First Pass (flagging method)<br />
<br><br />
<br><br />
'''OA''' - Ontology Annotator (curation tool)<br />
<br><br />
<br><br />
'''SVM''' - Support Vector Machine (flagging method)<br />
<br><br />
<br><br />
<br />
= Definitions =<br />
<br />
<br />
'''curated''' - Any paper that has been curated, as determined by its presence in the OA or in cur_curdata. Note that a paper considered 'curated' is automatically considered 'validated positive'.<br />
<br><br />
<br><br />
'''cur_curdata''' - The Postgres table that captures, for a given paper-datatype pair, the information captured by this form, including validation status, premade comments, and free-text comments<br />
<br><br />
<br><br />
'''datatype''' - A type of data that WormBase curates<br />
<br><br />
<br><br />
'''flagged''' - Processed by a flagging method (flagged positive OR negative)<br />
<br><br />
<br><br />
'''flagging method''' - Manual or automated method for identifying research articles that contain a particular datatype<br />
<br><br />
<br><br />
'''validated''' - Definitively confirmed by a curator to have (or not have) the relevant datatype<br />
<br><br />
<br><br />
<br />
=Datatypes for Textpresso String Searches=<br />
<br />
Enter your datatype here, along with the URL where the data can be found.<br />
<br />
Sequence feature -> http://textpresso-dev.caltech.edu/regulatory_region/fullcorpus_result20141023 (I will talk to Yuling to see whether the URL can be made date-agnostic)<br />
<br />
Transgene -> http://textpresso-dev.caltech.edu/transgene/transgenes_in_regular_papers.out<br />
<br />
Variation -> http://textpresso-dev.caltech.edu/gsa/worm/alleles/ [index of results] [newfile docs are svm results]<br />
<br />
Human disease -> http://textpresso-dev.caltech.edu/disease/script_runs/20140908/Disease/ [Papers are organized by year; at the top of each year's directory is the list of non-review papers]<br />
<br />
Antibody -> http://textpresso-dev.caltech.edu/azurebrd/wen/anti_protein_wen</div>Xdwanghttps://wiki.wormbase.org/index.php?title=Sequence_Feature&diff=24906Sequence Feature2014-09-19T18:10:12Z<p>Xdwang: </p>
<hr />
<div>== Flagging papers ==<br />
<br />
Send them to worm-bug@sanger.ac.uk<br />
This is where papers identified by SVMs/pattern matching are sent. We will be moving away from this ticketing system,<br />
but in the meantime they will all be in the same place.<br />
<br />
<br />
== Rules for marking up regions (from GW)==<br />
<br />
*If a region is necessary and sufficient to drive a reporter gene, then mark it as an 'enhancer' or 'silencer'.<br />
(I don't think these are the classic definitions for enhancer/silencer, RL)<br />
<br />
*If a region is both an enhancer and a silencer, then it should have the SO_term tags for both of these.<br />
<br />
*If mobility shift experiments or similar experimental evidence are available to assert that a short region is a TF binding site, then mark it as a TF_binding_site.<br />
<br />
*Similarity to a known binding motif is not evidence of being a TF_binding_site.<br />
<br />
*If there is no evidence for a TF binding site and it has an effect on expression when mutated or deleted, but is not sufficient to drive a reporter gene, then we cannot assert that it is an enhancer or a TF binding site. Mark it as an anonymous 'regulatory_region'.<br />
<br />
*If a region has the properties of being both a TF binding site and an enhancer then mark it up as two Features, one a TF_binding_site and one an enhancer.<br />
<br />
*If a region is asserted to be a promoter region in the paper, lies within 200bp (or thereabouts?) of the 5' end of the target gene, and is necessary and sufficient to drive a reporter gene, mark it as a promoter. If in doubt, consider marking it as an enhancer.<br />
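The rules above can be read as a small decision procedure. The Python sketch below encodes one reading of them and is not an official curation tool. The evidence fields are hypothetical names, the 200 bp promoter window is approximate as the rule itself says, and falling back to 'enhancer' when a sufficient region has no recorded activating or repressing effect is an assumption based on "If in doubt, consider marking it as an enhancer".

```python
from dataclasses import dataclass
from typing import List, Optional

PROMOTER_WINDOW_BP = 200  # "within 200bp (or thereabouts?)": approximate cutoff


@dataclass
class RegionEvidence:
    """Hypothetical summary of what a paper shows about one region."""
    sufficient_for_reporter: bool = False   # necessary and sufficient in a reporter assay
    activates: bool = False                 # reporter result shows activation
    represses: bool = False                 # reporter result shows repression
    binding_evidence: bool = False          # e.g. mobility shift; motif similarity does NOT count
    affects_expression_when_mutated: bool = False
    asserted_promoter: bool = False
    distance_to_5prime_bp: Optional[int] = None


def features_to_create(ev: RegionEvidence) -> List[List[str]]:
    """Return one list of SO term names per Feature object to create."""
    features: List[List[str]] = []
    if ev.binding_evidence:
        # Direct binding evidence gets its own TF_binding_site Feature.
        features.append(["TF_binding_site"])
    if ev.sufficient_for_reporter:
        near_5prime = (ev.distance_to_5prime_bp is not None
                       and ev.distance_to_5prime_bp <= PROMOTER_WINDOW_BP)
        if ev.asserted_promoter and near_5prime:
            features.append(["promoter"])
        else:
            # A region that is both enhancer and silencer gets one Feature with both
            # SO terms; default to 'enhancer' when in doubt (assumption).
            terms = [term for term, seen in (("enhancer", ev.activates),
                                             ("silencer", ev.represses)) if seen]
            features.append(terms or ["enhancer"])
    elif ev.affects_expression_when_mutated and not ev.binding_evidence:
        features.append(["regulatory_region"])
    return features


# The egl-1/TRA-1 case: binding evidence plus a repressing, sufficient region.
print(features_to_create(RegionEvidence(binding_evidence=True,
                                        sufficient_for_reporter=True,
                                        represses=True)))
# [['TF_binding_site'], ['silencer']]
```

The egl-1 example below follows this pattern: two Features at the same location, one TF_binding_site and one silencer.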
<br />
<br />
=== Example for sequence feature curation ===<br />
<br />
The example is from WBPaper00003631.<br />
<br />
<pre><br />
<br />
Feature : "egl-1_temp_1.1"<br />
Sequence VF23B12L<br />
Mapping_target VF23B12L<br />
Flanking_sequences cagctcaattattaaattttattgggtattgttta cataaaattctattgtcccagatttaggatacatcg<br />
DNA_text CTCCTAACCGGGTGGTC<br />
Description "This is a TRA-1 binding site that represses egl-1."<br />
Remark "This is the TF_binding_site for TRA-1 which silences egl-1. <br />
N.B. a 'silencer' Feature has also been made at this location to aid expression and interaction curation<br />
[2013-07-23 gw3]"<br />
Associated_with_gene WBGene00001170 // egl-1<br />
Bound_by_product_of WBGene00006604 // tra-1<br />
Transcription_factor WBTranscriptionFactor000029 // tra-1<br />
Method TF_binding_site<br />
SO_term "SO:0000235" // TF_binding_site<br />
Defined_by_paper WBPaper00003631<br />
Public_name "TRA-1 binding site"<br />
<br />
Feature : "egl-1_temp_1.2"<br />
Sequence VF23B12L<br />
Mapping_target VF23B12L<br />
Flanking_sequences cagctcaattattaaattttattgggtattgttta cataaaattctattgtcccagatttaggatacatcg<br />
DNA_text CTCCTAACCGGGTGGTC<br />
Description "This is the silencer of egl-1, containing a single TF_binding_site bound by TRA-1."<br />
Remark "Made this 'silencer' feature in addition to the TRA-1 TF_binding_site Feature to aid expression <br />
and interaction curation [2013-07-23 gw3]"<br />
Associated_with_gene WBGene00001170 // egl-1<br />
Method silencer<br />
SO_term "SO:0000625" // silencer<br />
Defined_by_paper WBPaper00003631<br />
Public_name "TRA-1 binding site silencer"<br />
<br />
</pre><br />
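The two objects above share their location tags and differ only in Method, SO_term and the binding-specific tags. A hypothetical Python helper for writing such a pair in .ace form might look like the sketch below. It quotes every value, as the Expr_pattern example on this page already does for gene IDs. It assumes backslash escaping of quotes inside text values and a blank line between objects. The function names are made up for illustration. Description, Remark, Public_name and Transcription_factor are omitted for brevity.

```python
from typing import List, Sequence, Tuple

Tag = Tuple[str, ...]  # (tag name, value, value, ...)


def ace_quote(value: str) -> str:
    """Wrap a value in double quotes, backslash-escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_ace(cls: str, obj_id: str, tags: Sequence[Tag]) -> str:
    """Render one object: a 'Class : "id"' header, then one line per tag."""
    lines = [f"{cls} : {ace_quote(obj_id)}"]
    for tag, *values in tags:
        lines.append(" ".join([tag] + [ace_quote(v) for v in values]))
    return "\n".join(lines) + "\n"


def site_and_silencer(prefix: str, location: List[Tag], paper: str,
                      gene: str, tf_gene: str) -> str:
    """Write a TF_binding_site Feature and a silencer Feature sharing one location."""
    shared = location + [("Associated_with_gene", gene), ("Defined_by_paper", paper)]
    site = to_ace("Feature", f"{prefix}.1", shared + [
        ("Bound_by_product_of", tf_gene),
        ("Method", "TF_binding_site"),
        ("SO_term", "SO:0000235"),
    ])
    silencer = to_ace("Feature", f"{prefix}.2", shared + [
        ("Method", "silencer"),
        ("SO_term", "SO:0000625"),
    ])
    return site + "\n" + silencer  # blank line between objects


location = [
    ("Sequence", "VF23B12L"),
    ("Mapping_target", "VF23B12L"),
    ("Flanking_sequences", "cagctcaattattaaattttattgggtattgttta",
     "cataaaattctattgtcccagatttaggatacatcg"),
    ("DNA_text", "CTCCTAACCGGGTGGTC"),
]
print(site_and_silencer("egl-1_temp_1", location, paper="WBPaper00003631",
                        gene="WBGene00001170", tf_gene="WBGene00006604"))
```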
<br />
Most Expr_pattern and Interaction objects will be attached to the 'enhancer/silencer' Features rather than the TF_binding_site Features<br />
<br />
== Link to Gene Regulation/Regulatory interactions ==<br />
<br />
<br />
Two types of gene_regulation can be linked to a feature:<br />
<br />
* '''trans-regulation:''' TF A regulates target B through element C <br />
<br />
In this situation, our current interaction model already accommodates these data and links the feature object via:<br />
<br />
?Interaction<br />
<br />
Interaction_associated_feature  ?Feature  XREF Associated_with_Interaction //trans-regulation<br />
<br />
* '''cis-regulation:''' enhancer element C (cis-regulator) cis-regulates gene B<br />
<br />
The current interaction model needs to be modified to accommodate this type of data by adding a new tag:<br />
<br />
?Interaction<br />
<br />
Feature_interactor  ?Feature  XREF  Interacting_feature  #Interactor_info //cis-regulation<br />
<br />
<br />
We will propose a corresponding ?Feature model change so that there is a one-to-one XREF between the models. The intention<br />
is that interactions that explicitly state a sequence feature object as an<br />
interactor in a physical or regulatory interaction can refer to a ?Feature<br />
object as a "Feature_interactor". Alternatively, when there is less direct<br />
evidence or the association is more vague, we would make use of the<br />
"Interaction_associated_feature" tag. The XREFs will then link to the<br />
appropriate tags in the corresponding objects.<br />
<br />
*proposed model change:<br />
<br />
?Interaction<br />
<br />
   Feature_interactor  ?Feature  XREF  Interacting_feature  #Interactor_info //cis-regulation<br />
<br />
   Interaction_associated_feature  ?Feature  XREF Associated_with_Interaction //trans-regulation<br />
<br />
?Feature<br />
<br />
   Interacting_feature  ?Interaction  XREF  Feature_interactor //cis-regulation<br />
<br />
   Associated_with_Interaction  ?Interaction  XREF Interaction_associated_feature //trans-regulation<br />
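Because the proposal depends on each XREF having a matching tag on the other side, a small consistency check can help when drafting such model changes. The Python sketch below is a hypothetical helper, not part of any WormBase or AceDB tooling. It only understands one-line `Tag [UNIQUE] ?Class XREF Other_tag` entries under a standalone `?Class` header line. Any XREF whose target class is not included in the text is reported as unmatched.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (class, tag)


def parse_xrefs(model: str) -> Dict[Key, Key]:
    """Map (class, tag) -> (target class, target tag) for each XREF line."""
    xrefs: Dict[Key, Key] = {}
    current = ""
    for raw in model.splitlines():
        tokens = raw.split("//", 1)[0].split()  # drop '//' comments, then tokenize
        if not tokens:
            continue
        if len(tokens) == 1 and tokens[0].startswith("?"):
            current = tokens[0][1:]  # class header such as "?Feature"
        elif "XREF" in tokens:
            i = tokens.index("XREF")
            # The tag is the last name before '?Class', ignoring UNIQUE.
            names = [t for t in tokens[:max(i - 1, 0)] if t != "UNIQUE"]
            if names and i + 1 < len(tokens):
                xrefs[(current, names[-1])] = (tokens[i - 1].lstrip("?"), tokens[i + 1])
    return xrefs


def unmatched_xrefs(model: str) -> List[str]:
    """Report XREFs whose target tag does not XREF back to the source tag."""
    xrefs = parse_xrefs(model)
    return [f"{cls}.{tag} -> {tcls}.{ttag} has no matching XREF back"
            for (cls, tag), (tcls, ttag) in sorted(xrefs.items())
            if xrefs.get((tcls, ttag)) != (cls, tag)]


PROPOSED = """
?Interaction
   Feature_interactor  ?Feature  XREF  Interacting_feature  #Interactor_info //cis-regulation
   Interaction_associated_feature  ?Feature  XREF Associated_with_Interaction //trans-regulation
?Feature
   Interacting_feature  ?Interaction  XREF  Feature_interactor //cis-regulation
   Associated_with_Interaction  ?Interaction  XREF Interaction_associated_feature //trans-regulation
"""
print(unmatched_xrefs(PROPOSED))  # empty list: every XREF in the proposal is reciprocal
```

For instance, the Sequence Feature Model further down this page lists `Associated_with_Interaction ?Interaction XREF Feature_interactor`. Paired with the proposed ?Interaction tags, that line would be reported by this check.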
<br />
== Link to Expression pattern ==<br />
<br />
'''When''' and '''how''' do we link sequence features to Expression Pattern objects?<br />
<br />
<br />
'''Example 1 (from WBPaper00003631):'''<br />
<br />
"The egl-1 gene appears to be expressed in the HSNs in males." The construct used is [Pegl-1::gfp] transcriptional fusion.<br />
<br />
* The curator creates an Expression object for egl-1 in male HSNs and links it to the Pegl-1::gfp transgene.<br />
<br />
<pre><br />
Expr_pattern : "Expr11092"<br />
Anatomy_term "WBbt:0004757" Certain //HSNR<br />
Anatomy_term "WBbt:0004758" Certain //HSNL<br />
Anatomy_term "WBbt:0007850" Certain //male<br />
Gene "WBGene00001170"//egl-1<br />
Pattern "The egl-1 gene appears to be expressed in the HSNs in males, in which the HSNs normally undergo <br />
programmed cell death, but not in hermaphrodites, in which the HSNs normally survive."<br />
Reference "WBPaper00003631"<br />
Reporter_gene "[Pegl-1::gfp] transcriptional fusion. To construct Pegl-1::gfp, bases +174 to +5820 (5'-3') <br />
downstream of the stop codon of the egl-1 gene and bases -1914 to -837 (5'-3') upstream of the stop codon were<br />
amplified with appropriate primers and cloned into the SpeI-ApaI (5'-3') and PstI-BamHI (5'-3') sites of <br />
vector pPD95.69, respectively (A. Fire et al., personal communication). --precise ends."<br />
<br />
</pre><br />
<br />
* The sequence curator creates a sequence feature for that object (we are not there yet, but we should aim for it).<br />
* In the sequence feature object there will be a link to the expression.<br />
<br />
Note that this expression object uses the following tag from the Expression_pattern model:<br />
<br />
<pre><br />
Expr_pattern Expression_of Gene ?Gene XREF Expr_pattern<br />
<br />
</pre><br />
<br />
* In this case the Expression pattern object is linked to the gene because the authors hypothesize that the transcriptional fusion expression reflects endogenous egl-1 expression.<br />
<br />
<br />
<br />
'''Example 2 from WBPaper00003631 (a hypothetical example: this paper contains no such evidence, but the scenario could occur):'''<br />
<br />
"This specific sequence of 80bp is expressed in the HSNL." The construct used is [80bp-egl-1::gfp].<br />
<br />
1) One way to go is to link the expression to the sequence rather than to the gene. From the Expr_pattern model:<br />
<br />
<pre><br />
Expr_pattern Expression_of Gene ?Gene XREF Expr_pattern<br />
Sequence ?Sequence XREF Expr_pattern <br />
</pre><br />
<br />
<pre><br />
<br />
Expr_pattern : "Expr11093"<br />
Anatomy_term "WBbt:0004758" Certain //HSNL<br />
Sequence "???"<br />
Pattern "This particular sequence::GFP was expressed in HSNL"<br />
Reference "WBPaper00003631"<br />
Reporter_gene "[80bp-egl-1::gfp]. To construct 80bp-egl-1::gfp..."<br />
<br />
</pre><br />
<br />
* Sequence curator creates a sequence feature for that object. <br />
* In the sequence feature object there will be a link to the expression.<br />
* In this case the Expression pattern object is linked to the sequence, because expression from the artificial construct might not resemble endogenous egl-1 expression.<br />
* It will generally be hard to determine where the boundary between artificial and endogenous expression lies if no other experimental evidence (IHC, ISH) is available.<br />
* '''If we curate the objects this way, we should determine how to display them on the site. Separate from other expression objects?'''<br />
<br />
2) Another option would be to include those objects in Gene regulation rather than in expression: that specific sequence is responsible for expression in...<br />
<br />
'''How were these kinds of objects curated in the past? Was it via gene_regulation Cis_regulated_seq?'''<br />
<br />
Although 'Cis_regulated_seq' existed in the old gene_regulation model, neither Wen nor Xiaodong ever used it for any objects. In the new Interaction model, this tag is gone. --XW<br />
<br />
3) A third possibility is to add a Drives_expression_in tag to the feature object:<br />
<br />
<pre><br />
Drives_expression_in<br />
<br />
Life_stage ?Life_stage <br />
Anatomy_term ?Anatomy_term <br />
GO_term ?GO_term <br />
</pre><br />
<br />
This approach is favorable because it will not "contaminate" the expression pattern class, while still capturing the expression driven by the enhancer. In REDfly (Regulatory Element Database for Drosophila, http://redfly.ccr.buffalo.edu/) the enhancer region is annotated to anatomy terms, but that expression is not listed under the classic expression patterns. See, for example, the decapentaplegic (dpp) gene construct dpp_303lacZ.<br />
<br />
In the example of Hwang and Sternberg, 2004 (WBPaper00006370), the feature object will be<br />
<br />
<pre><br />
Feature : <br />
Public_name "lin-3 enhancer"<br />
Sequence F36H1<br />
Description "lin-3 enhancer region, driving anchor cell (AC) specific expression"<br />
Flanking_sequences "ctagaacttcccgtctctccctattcaatg" "cttaccaatgtctcaggcatttttggaaaa" <br />
Mapping_target F36H1<br />
Associated_with_gene WBGene00002992 // lin-3<br />
Species "Caenorhabditis elegans"<br />
Defined_by_paper WBPaper00006370<br />
SO_term SO:0000165 // enhancer<br />
Method enhancer <br />
Associated_with_Interaction WBInteraction000501966// hlh-2 binds to lin-3<br />
Associated_with_Interaction WBInteraction000520204// nhr-25 binds to lin-3<br />
Anatomy_term "WBbt:0004522"//Anchor cell<br />
<br />
</pre><br />
<br />
4) We could simply generate an Expr_pattern object and add an Associated_feature ?Feature tag. For display purposes, the site could show objects that have Associated_feature in a separate section.<br />
<br />
'''Example 3 from WBPaper00003631:'''<br />
<br />
"The egl-1 gene appears to be expressed in the HSNs in males (Pegl-1::GFP reporter)...if tra-1 is bound to egl-1 the expression in HSNs is repressed"<br />
<br />
*The region where TRA-1 binds egl-1 is known, and two sequence features are created for it: one TF_binding_site and one silencer.<br />
*A gene regulation object is created -> egl-1 downregulation in HSN.<br />
*The gene regulation object is added to the silencer sequence feature object.<br />
<br />
Should we create an expression object for the tra-1 binding site? In that case we would have to create a negative expression object: egl-1 is NOT expressed in HSNs if bound by tra-1. To me this falls under gene regulation. -DR<br />
<br />
Should we link to the existing expression pattern Expr_pattern : "Expr11092" (see above)? This might not be appropriate, as Expr11092 depicts expression in male HSNs. If we want to pull out that information, we could do it through the gene regulation object anyway. -DR<br />
<br />
Should we just leave the gene regulation association? <br />
<br />
As of now, few Expression Patterns are linked to the Genome Browser (the Vancouver set is the only such data set). The ultimate goal is to map expression constructs to the genome browser whenever we can.<br />
<br />
== Top down approach ==<br />
<br />
We are brainstorming in order to develop a model that will be suitable for accommodating curation of all the above.<br />
<br />
<br />
The potential model should contain the following info<br />
<br />
for Expression<br />
<br />
* sequence - the sequence could be any stretch of DNA, from a few bp to several kb<br />
(?Feature, 1 or more)<br />
<br />
* reporter - GFP, RFP, YFP, mCherry, Venus,...<br />
(+ Other: text, including cases where the endogenous gene is used as (part of) the reporter, e.g. with GFP fused in)<br />
<br />
* gene (the gene immediately downstream of the sequence); non-unique, because the sequence could be associated with more than one gene<br />
(do NOT annotate gene, because 1. the base model is about describing the pattern of expression, 2. location information intrinsically informs possible cis-targets, 3. if the author asserts relevant genes, that should go in some ?Regulation)<br />
<br />
* Reflects_endogenous_expression_of ?Gene #if the authors assume that the expression reflects the endogenous expression, we populate it; otherwise not<br />
* anatomy term<br />
<br />
* life stage<br />
<br />
* (sex will be encoded in life stage and anatomy)<br />
<br />
* WBPaper<br />
<br />
* experimental info?<br />
<br />
* other info will be textual<br />
<br />
After brainstorming (people involved: Xiaodong, Raymond, Wen, Daniela), we agreed that the current Expr_model can accommodate most of the changes proposed above. The only modification needed is to add the<br />
*Reflects_endogenous_expression_of ?Gene #if the authors assume that the expression reflects the endogenous one, we populate it; otherwise not<br />
<br />
For all *artificial* constructs we will not populate the tag. Daniela will start curation and see whether everything fits the proposal. If so, we will request a model change.<br />
<br />
<br />
for Regulation<br />
<br />
Next topics: capture regulation, post-transcriptional regulation<br />
Agreement has been reached for gene regulation objects and is summarized in a chapter above.<br />
<br />
==Sequence Feature Model==<br />
<pre><br />
?Feature SMap S_parent UNIQUE Sequence UNIQUE ?Sequence XREF Feature_object<br />
Name Public_name UNIQUE ?Text<br />
Other_name ?Text<br />
Sequence_details Flanking_sequences UNIQUE Text UNIQUE Text<br />
Mapping_target UNIQUE ?Sequence <br />
Source_location UNIQUE Int UNIQUE ?Sequence UNIQUE Int UNIQUE Int UNIQUE #Evidence //source data, <WSversion> ?Sequence pos1 pos2 Evidence(Paper/person etc. remarks)<br />
DNA_text UNIQUE ?Text // for storing the sequence of the feature...can use IUPAC codes to be able<br />
// to store consensus sequences, e.g. binding site consensus sequence<br />
Origin Species UNIQUE ?Species //added by pad, as we are moving towards multi-species readiness.<br />
Strain UNIQUE ?Strain//added by pad, as we are moving towards multi-strain readiness.<br />
History Merged_into UNIQUE ?Feature XREF Acquires_merge #Evidence<br />
Acquires_merge ?Feature XREF Merged_into #Evidence<br />
Deprecated Text #Evidence <br />
Visible Description ?Text<br />
SO_term ?SO_term<br />
Defined_by Defined_by_sequence ?Sequence XREF Defines_feature #Evidence<br />
Defined_by_paper ?Paper XREF Feature #Evidence<br />
Defined_by_person ?Person<br />
Defined_by_author ?Author<br />
Defined_by_analysis ?Analysis Int<br />
Score Float Text #Evidence // this would be a log score as indicated by the analysis used in gff dump<br />
Associations Associated_with_gene ?Gene XREF Associated_feature #Evidence // richard<br />
Associated_with_CDS ?CDS XREF Associated_feature #Evidence // richard<br />
Associated_with_transcript ?Transcript XREF Associated_feature #Evidence // richard<br />
Associated_with_pseudogene ?Pseudogene XREF Associated_feature #Evidence // richard<br />
Associated_with_transposon ?Transposon XREF Associated_feature #Evidence //richard<br />
Associated_with_variation ?Variation XREF Feature #Evidence<br />
Associated_with_Position_Matrix ?Position_Matrix XREF Associated_feature #Evidence<br />
Associated_with_operon ?Operon XREF Associated_feature #Evidence<br />
Associated_with_Interaction ?Interaction XREF Feature_interactor<br />
Associated_with_expression_pattern ?Expr_pattern XREF Associated_feature #Evidence <br />
Associated_with_Feature ?Feature XREF Associated_with_Feature #Evidence<br />
Associated_with_construct ?Construct XREF Sequence_feature<br />
Bound_by_product_of ?Gene XREF Gene_product_binds #Evidence //pad added this to show what gene it binds<br />
Transcription_factor UNIQUE ?Transcription_factor XREF Binding_site <br />
Annotation UNIQUE ?LongText // added for data attribution [030220 dl]<br />
Confidential_remark ?Text //pad<br />
Remark ?Text #Evidence<br />
Method UNIQUE ?Method<br />
</pre><br />
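Because DNA_text may hold an IUPAC consensus, a script may need to locate matches of that consensus in a sequence. The Python sketch below uses the standard IUPAC nucleotide codes. The function names are hypothetical, and it scans only the forward strand; a real search would also scan the reverse complement. Per the rules above, a consensus match alone is not evidence for a TF_binding_site.

```python
import re
from typing import List

# Standard IUPAC nucleotide codes; input is upper-cased before lookup.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def iupac_to_regex(consensus: str) -> str:
    """Turn a consensus such as 'RYN' into a regex such as '[AG][CT][ACGT]'."""
    parts = []
    for code in consensus.upper():
        bases = IUPAC[code]  # raises KeyError for non-IUPAC characters
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    return "".join(parts)


def find_sites(sequence: str, consensus: str) -> List[int]:
    """0-based start positions of (possibly overlapping) forward-strand matches."""
    # A lookahead lets matches overlap, e.g. 'AA' in 'AAAA' at 0, 1 and 2.
    pattern = re.compile(f"(?=({iupac_to_regex(consensus)}))")
    return [m.start() for m in pattern.finditer(sequence.upper())]


print(find_sites("ACGTACGT", "RCG"))  # [0, 4]
```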
== OA interface==<br />
===Tab1===<br />
*PGID, dumps as N/A<br />
*Feature ID, text, dumps as <br />
*Public Name, text<br />
*Other Name, text<br />
*Description, text, Dumps as Description<br />
*Curator - (Dropdown) sf_curator Dumps as: N/A<br />
*Paper - (Multiontology) sf_paper Dumps as: Defined_by_paper <Paper><br />
*Species<br />
*Strain<br />
*Merged_into<br />
*Acquires_merge<br />
*Deprecated, text<br />
*Author sf_author, Dumps as : Defined_by_author<br />
*Not sure about the rest of 'Defined_by' tags (person, analysis, sequence)<br />
<br />
===Tab2===<br />
*S-parent<br />
*Flanking sequences<br />
*Mapping target<br />
*Source location<br />
*SO terms<br />
*Methods, text, (Dropdown), sf_method, Dumps as Method<br />
*Sequence, Dumps as Defined_by_sequence?<br />
<br />
===Tab3===<br />
*Gene - ?Gene (multiontology), sf_gene, Dumps as Associated_with_gene<br />
*CDS - ?CDS (multiontology), sf_CDS, Dumps as Associated_with_CDS<br />
*Transcript -?Transcript (multiontology), sf_transcript, Dumps as Associated_with_transcript<br />
*Pseudogene -?Pseudogene (multiontology), sf_pseudogene, Dumps as Associated_with_pseudogene <br />
*Transposon -?Transposon (multiontology), sf_transposon, Dumps as Associated_with_transposon <br />
*Variation -?Variation (multiontology), sf_variation, Dumps as Associated_with_variation <br />
*Position_Matrix -?Position_Matrix (multiontology), sf_pwm, Dumps as Associated_with_Position_Matrix <br />
*Operon -?Operon (multiontology), sf_operon, Dumps as Associated_with_operon <br />
*Interaction -?Interaction (multiontology), sf_interaction, Dumps as Associated_with_Interaction<br />
*Expression -?Expr_pattern (multiontology), sf_expr, Dumps as Associated_with_expression_pattern <br />
*Construct - ?Construct (multiontology), sf_construct, Dumps as Associated_with_construct<br />
*Bound By Product Of -?Gene (multiontology), sf_bound_by_product, Dumps as Bound_by_product_of<br />
*Transcription Factor -?Transcription_factor (multiontology), Dumps as Transcription_factor<br />
*Remark, text, sf_remark, Dumps as Remark</div>Xdwanghttps://wiki.wormbase.org/index.php?title=Sequence_Feature&diff=24905Sequence Feature2014-09-19T18:04:07Z<p>Xdwang: </p>
<hr />
<div>== Flagging papers ==<br />
<br />
Send them to worm-bug@sanger.ac.uk<br />
This is where papers identified by SVMs/pattern matching are sent. We will be moving away from this ticketing system,<br />
but in the meantime they will all be in the same place.<br />
<br />
<br />
== Rules for marking up regions (from GW)==<br />
<br />
*If a region is necessary and sufficient to drive a reporter gene, then mark it as an 'enhancer' or 'silencer'.<br />
(I don't think these are the classic definitions for enhancer/silencer, RL)<br />
<br />
*If a region is both an enhancer and a silencer, then it should have the SO_term tags for both of these.<br />
<br />
*If mobility shift experiments or similar experimental evidence are available to assert that a short region is a TF binding site, then mark it as a TF_binding_site.<br />
<br />
*Similarity to a known binding motif is not evidence of being a TF_binding_site.<br />
<br />
*If there is no evidence for a TF binding site and it has an effect on expression when mutated or deleted, but is not sufficient to drive a reporter gene, then we cannot assert that it is an enhancer or a TF binding site. Mark it as an anonymous 'regulatory_region'.<br />
<br />
*If a region has the properties of being both a TF binding site and an enhancer then mark it up as two Features, one a TF_binding_site and one an enhancer.<br />
<br />
*If a region is asserted to be a promoter region in the paper, lies within 200bp (or thereabouts?) of the 5' end of the target gene, and is necessary and sufficient to drive a reporter gene, mark it as a promoter. If in doubt, consider marking it as an enhancer.<br />
<br />
<br />
=== Example for sequence feature curation ===<br />
<br />
The example is from WBPaper00003631.<br />
<br />
<pre><br />
<br />
Feature : "egl-1_temp_1.1"<br />
Sequence VF23B12L<br />
Mapping_target VF23B12L<br />
Flanking_sequences cagctcaattattaaattttattgggtattgttta cataaaattctattgtcccagatttaggatacatcg<br />
DNA_text CTCCTAACCGGGTGGTC<br />
Description "This is a TRA-1 binding site that represses egl-1."<br />
Remark "This is the TF_binding_site for TRA-1 which silences egl-1. <br />
N.B. a 'silencer' Feature has also been made at this location to aid expression and interaction curation<br />
[2013-07-23 gw3]"<br />
Associated_with_gene WBGene00001170 // egl-1<br />
Bound_by_product_of WBGene00006604 // tra-1<br />
Transcription_factor WBTranscriptionFactor000029 // tra-1<br />
Method TF_binding_site<br />
SO_term "SO:0000235" // TF_binding_site<br />
Defined_by_paper WBPaper00003631<br />
Public_name "TRA-1 binding site"<br />
<br />
Feature : "egl-1_temp_1.2"<br />
Sequence VF23B12L<br />
Mapping_target VF23B12L<br />
Flanking_sequences cagctcaattattaaattttattgggtattgttta cataaaattctattgtcccagatttaggatacatcg<br />
DNA_text CTCCTAACCGGGTGGTC<br />
Description "This is the silencer of egl-1, containing a single TF_binding_site bound by TRA-1."<br />
Remark "Made this 'silencer' feature in addition to the TRA-1 TF_binding_site Feature to aid expression <br />
and interaction curation [2013-07-23 gw3]"<br />
Associated_with_gene WBGene00001170 // egl-1<br />
Method silencer<br />
SO_term "SO:0000625" // silencer<br />
Defined_by_paper WBPaper00003631<br />
Public_name "TRA-1 binding site silencer"<br />
<br />
</pre><br />
<br />
Most Expr_pattern and Interaction objects will be attached to the 'enhancer/silencer' Features rather than the TF_binding_site Features<br />
<br />
== Link to Gene Regulation/Regulatory interactions ==<br />
<br />
<br />
Two types of gene_regulation can be linked to a feature:<br />
<br />
* '''trans-regulation:''' TF A regulates target B through element C <br />
<br />
In this situation, our current interaction model already accommodates these data and links the feature object via:<br />
<br />
?Interaction<br />
<br />
Interaction_associated_feature  ?Feature  XREF Associated_with_Interaction //trans-regulation<br />
<br />
* '''cis-regulation:''' enhancer element C (cis-regulator) cis-regulates gene B<br />
<br />
The current interaction model needs to be modified to accommodate this type of data by adding a new tag:<br />
<br />
?Interaction<br />
<br />
Feature_interactor  ?Feature  XREF  Interacting_feature  #Interactor_info //cis-regulation<br />
<br />
<br />
We will propose a corresponding ?Feature model change so that there is a one-to-one XREF between the models. The intention<br />
is that interactions that explicitly state a sequence feature object as an<br />
interactor in a physical or regulatory interaction can refer to a ?Feature<br />
object as a "Feature_interactor". Alternatively, when there is less direct<br />
evidence or the association is more vague, we would make use of the<br />
"Interaction_associated_feature" tag. The XREFs will then link to the<br />
appropriate tags in the corresponding objects.<br />
<br />
*proposed model change:<br />
<br />
?Interaction<br />
<br />
   Feature_interactor  ?Feature  XREF  Interacting_feature  #Interactor_info //cis-regulation<br />
<br />
   Interaction_associated_feature  ?Feature  XREF Associated_with_Interaction //trans-regulation<br />
<br />
?Feature<br />
<br />
   Interacting_feature  ?Interaction  XREF  Feature_interactor //cis-regulation<br />
<br />
   Associated_with_Interaction  ?Interaction  XREF Interaction_associated_feature //trans-regulation<br />
<br />
== Link to Expression pattern ==<br />
<br />
'''When''' and '''how''' do we link sequence features to Expression Pattern objects?<br />
<br />
<br />
'''Example 1 (from WBPaper00003631):'''<br />
<br />
"The egl-1 gene appears to be expressed in the HSNs in males." The construct used is [Pegl-1::gfp] transcriptional fusion.<br />
<br />
* The curator creates an Expression object for egl-1 in male HSNs and links it to the Pegl-1::gfp transgene.<br />
<br />
<pre><br />
Expr_pattern : "Expr11092"<br />
Anatomy_term "WBbt:0004757" Certain //HSNR<br />
Anatomy_term "WBbt:0004758" Certain //HSNL<br />
Anatomy_term "WBbt:0007850" Certain //male<br />
Gene "WBGene00001170"//egl-1<br />
Pattern "The egl-1 gene appears to be expressed in the HSNs in males, in which the HSNs normally undergo <br />
programmed cell death, but not in hermaphrodites, in which the HSNs normally survive."<br />
Reference "WBPaper00003631"<br />
Reporter_gene "[Pegl-1::gfp] transcriptional fusion. To construct Pegl-1::gfp, bases +174 to +5820 (5'-3') <br />
downstream of the stop codon of the egl-1 gene and bases -1914 to -837 (5'-3') upstream of the stop codon were<br />
amplified with appropriate primers and cloned into the SpeI-ApaI (5'-3') and PstI-BamHI (5'-3') sites of <br />
vector pPD95.69, respectively (A. Fire et al., personal communication). --precise ends."<br />
<br />
</pre><br />
<br />
* The sequence curator creates a sequence feature for that object (we are not there yet, but we should aim for it).<br />
* In the sequence feature object there will be a link to the expression.<br />
<br />
Note that this expression object uses the following tag from the Expression_pattern model:<br />
<br />
<pre><br />
Expr_pattern Expression_of Gene ?Gene XREF Expr_pattern<br />
<br />
</pre><br />
<br />
* In this case the Expression pattern object is linked to the gene because the authors hypothesize that the transcriptional fusion expression reflects endogenous egl-1 expression.<br />
<br />
<br />
<br />
'''Example 2 from WBPaper00003631 (a hypothetical example: this paper contains no such evidence, but the scenario could occur):'''<br />
<br />
"This specific sequence of 80bp is expressed in the HSNL." The construct used is [80bp-egl-1::gfp].<br />
<br />
1) One way to go is to link the expression to the sequence rather than to the gene. From the Expr_pattern model:<br />
<br />
<pre><br />
Expr_pattern Expression_of Gene ?Gene XREF Expr_pattern<br />
Sequence ?Sequence XREF Expr_pattern <br />
</pre><br />
<br />
<pre><br />
<br />
Expr_pattern : "Expr11093"<br />
Anatomy_term "WBbt:0004758" Certain //HSNL<br />
Sequence "???"<br />
Pattern "This particular sequence::GFP was expressed in HSNL"<br />
Reference "WBPaper00003631"<br />
Reporter_gene "[80bp-egl-1::gfp]. To construct 80bp-egl-1::gfp..."<br />
<br />
</pre><br />
<br />
* Sequence curator creates a sequence feature for that object. <br />
* In the sequence feature object there will be a link to the expression.<br />
* In this case the Expression pattern object is linked to the sequence, because expression from the artificial construct might not resemble endogenous egl-1 expression.<br />
* It will generally be hard to determine where the boundary between artificial and endogenous expression lies if no other experimental evidence (IHC, ISH) is available.<br />
* '''If we curate the objects this way, we should determine how to display them on the site. Separate from other expression objects?'''<br />
<br />
2) Another option would be to include those objects in Gene regulation rather than in expression: that specific sequence is responsible for expression in...<br />
<br />
'''How were these kinds of objects curated in the past? Was it via gene_regulation Cis_regulated_seq?'''<br />
<br />
Although 'Cis_regulated_seq' existed in the old gene_regulation model, neither Wen nor Xiaodong ever used it for any objects. In the new Interaction model, this tag is gone. --XW<br />
<br />
3) A third possibility is to add a Drives_expression_in tag to the feature object:<br />
<br />
<pre><br />
Drives_expression_in<br />
<br />
Life_stage ?Life_stage <br />
Anatomy_term ?Anatomy_term <br />
GO_term ?GO_term <br />
</pre><br />
<br />
This approach is favorable because it will not "contaminate" the expression pattern class, while still capturing the expression driven by the enhancer. In REDfly (Regulatory Element Database for Drosophila, http://redfly.ccr.buffalo.edu/) the enhancer region is annotated to anatomy terms, but that expression is not listed under the classic expression patterns. See, for example, the decapentaplegic (dpp) gene construct dpp_303lacZ.<br />
<br />
In the example of Hwang and Sternberg, 2004 (WBPaper00006370), the feature object will be<br />
<br />
<pre><br />
Feature : <br />
Public_name "lin-3 enhancer"<br />
Sequence F36H1<br />
Description "lin-3 enhancer region, driving anchor cell (AC) specific expression"<br />
Flanking_sequences "ctagaacttcccgtctctccctattcaatg" "cttaccaatgtctcaggcatttttggaaaa" <br />
Mapping_target F36H1<br />
Associated_with_gene WBGene00002992 // lin-3<br />
Species "Caenorhabditis elegans"<br />
Defined_by_paper WBPaper00006370<br />
SO_term SO:0000165 // enhancer<br />
Method enhancer <br />
Associated_with_Interaction WBInteraction000501966// hlh-2 binds to lin-3<br />
Associated_with_Interaction WBInteraction000520204// nhr-25 binds to lin-3<br />
Anatomy_term "WBbt:0004522"//Anchor cell<br />
<br />
</pre><br />
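<br />
A minimal Python sketch of how the proposed Drives_expression_in tag group could constrain its values, assuming its three child tags hold ontology IDs with the prefixes WBls: (life stage), WBbt: (anatomy) and GO: (GO term). The class and method names are hypothetical; only the tag names and the lin-3 anchor cell term come from the proposal above.<br />

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Assumed ID prefix for each child tag of the proposed Drives_expression_in tag.
EXPECTED_PREFIX: Dict[str, str] = {
    "Life_stage": "WBls:",
    "Anatomy_term": "WBbt:",
    "GO_term": "GO:",
}


@dataclass
class EnhancerFeature:
    """Hypothetical in-memory view of a Feature carrying Drives_expression_in."""
    public_name: str
    drives_expression_in: Dict[str, List[str]] = field(default_factory=dict)

    def problems(self) -> List[str]:
        """List tags or values that do not fit the Drives_expression_in group."""
        issues = []
        for tag, values in self.drives_expression_in.items():
            prefix = EXPECTED_PREFIX.get(tag)
            if prefix is None:
                issues.append(f"unknown tag {tag}")
                continue
            for value in values:
                if not value.startswith(prefix):
                    issues.append(f"{tag} {value} lacks prefix {prefix}")
        return issues


lin3 = EnhancerFeature(
    public_name="lin-3 enhancer",
    drives_expression_in={"Anatomy_term": ["WBbt:0004522"]},  # anchor cell
)
print(lin3.problems())  # []
```

Keeping these annotations on the Feature, rather than in Expr_pattern, is what keeps the enhancer data out of the classic expression pattern class.<br />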
<br />
4) We could simply generate an Expr_pattern object and add an Associated_feature ?Feature tag. For display purposes, the site could show objects that have Associated_feature in a separate section.<br />
<br />
'''Example 3 from WBPaper00003631:'''<br />
<br />
"The egl-1 gene appears to be expressed in the HSNs in males (Pegl-1::GFP reporter)...if tra-1 is bound to egl-1 the expression in HSNs is repressed"<br />
<br />
*The region of egl-1 bound by tra-1 is known, and two sequence features are created for it: one TF_binding_site and one silencer.<br />
*A gene regulation object is created -> egl-1 downregulation in HSN.<br />
*The gene regulation object is added to the silencer sequence feature object.<br />
<br />
Should we create an expression object for the tra-1 binding site? In that case we would have to create a negative expression pattern: egl-1 is NOT expressed in HSNs if bound by tra-1. To me this falls under gene regulation. -DR<br />
<br />
Should we link to the existing expression pattern Expr_pattern : "Expr11092" (see above)? This might not be appropriate, as Expr11092 depicts expression in male HSNs. If we want to pull out that information, we could do it through the gene regulation object anyway. -DR<br />
<br />
Should we just leave the gene regulation association? <br />
<br />
As of now, few Expression Patterns are linked to the Genome Browser (the Vancouver set is the only such data set). The ultimate goal is to map expression constructs to the Genome Browser whenever we can.<br />
<br />
== Top down approach ==<br />
<br />
We are brainstorming to develop a model suitable for curating all of the above.<br />
<br />
<br />
The potential model should contain the following information:<br />
<br />
For Expression:<br />
<br />
* sequence: any stretch of DNA, from a few bp to several kb (?Feature, one or more)<br />
<br />
* reporter: GFP, RFP, YFP, mCherry, Venus, ... (plus Other: free text, including when the endogenous gene is used as (part of) the reporter, e.g. with GFP fused in)<br />
<br />
* gene: the gene immediately downstream of the sequence; not UNIQUE, because the sequence could be associated with more than one gene (do NOT annotate the gene, because 1. the base model is about describing the pattern of expression, 2. location information intrinsically informs possible cis-targets, and 3. if the author asserts relevant genes, that should go in some ?Regulation)<br />
<br />
* Reflects_endogenous_expression_of ?Gene #populated only if the authors assume that the expression reflects the endogenous expression<br />
* anatomy term<br />
<br />
* life stage<br />
<br />
* (sex will be encoded in life stage and anatomy)<br />
<br />
* WBPaper<br />
<br />
* experimental info?<br />
<br />
* other info will be textual<br />
<br />
After brainstorming (participants: Xiaodong, Raymond, Wen, Daniela), we agreed that the current Expr_pattern model can accommodate most of the changes proposed above. The only modification needed is to add:<br />
*Reflects_endogenous_expression_of ?Gene #populated only if the authors assume that the expression reflects the endogenous expression<br />
<br />
For all ''artificial'' constructs, we will not populate the tag. Daniela will start curation and check whether everything fits the proposal; if so, she will request a model change.<br />
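<br />
The agreed rule for Reflects_endogenous_expression_of can be sketched in Python as below. The dataclass fields mirror the brainstorming list but are illustrative names, not model tags, and the feature names pegl-1_region and egl-1_80bp are hypothetical placeholders (no such Feature objects exist yet).<br />

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ExprPatternDraft:
    """Fields from the brainstorming list; names are illustrative, not model tags."""
    features: List[str]                 # ?Feature, one or more
    reporter: str                       # GFP, RFP, YFP, mCherry, Venus, or free text
    anatomy_terms: List[str] = field(default_factory=list)
    life_stages: List[str] = field(default_factory=list)
    paper: Optional[str] = None
    # Proposed tag: set only when the authors assume endogenous expression.
    reflects_endogenous_expression_of: Optional[str] = None


def make_draft(features, reporter, gene, authors_assume_endogenous, **kwargs):
    """Populate Reflects_endogenous_expression_of only for non-artificial constructs."""
    return ExprPatternDraft(
        features=features,
        reporter=reporter,
        reflects_endogenous_expression_of=gene if authors_assume_endogenous else None,
        **kwargs,
    )


# Example 1 (Pegl-1::gfp): authors treat the reporter as endogenous egl-1 expression.
ex1 = make_draft(["pegl-1_region"], "GFP", "WBGene00001170", True,
                 anatomy_terms=["WBbt:0004757", "WBbt:0004758", "WBbt:0007850"],
                 paper="WBPaper00003631")
# Example 2 (80bp-egl-1::gfp, hypothetical): artificial construct, so the tag stays empty.
ex2 = make_draft(["egl-1_80bp"], "GFP", "WBGene00001170", False,
                 anatomy_terms=["WBbt:0004758"], paper="WBPaper00003631")
print(ex1.reflects_endogenous_expression_of, ex2.reflects_endogenous_expression_of)
# WBGene00001170 None
```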
<br />
<br />
For Regulation:<br />
<br />
Next topics: capture regulation, post-transcriptional regulation<br />
Agreement has been reached for gene regulation objects and is summarized in a chapter above.<br />
<br />
==Sequence Feature Model==<br />
<pre><br />
?Feature SMap S_parent UNIQUE Sequence UNIQUE ?Sequence XREF Feature_object<br />
Name Public_name UNIQUE ?Text<br />
Other_name ?Text<br />
Sequence_details Flanking_sequences UNIQUE Text UNIQUE Text<br />
Mapping_target UNIQUE ?Sequence <br />
Source_location UNIQUE Int UNIQUE ?Sequence UNIQUE Int UNIQUE Int UNIQUE #Evidence //source data, <WSversion> ?Sequence pos1 pos2 Evidence(Paper/person etc. remarks)<br />
DNA_text UNIQUE ?Text // for storing the sequence of the feature...can use IUPAC codes to be able to<br />
// store consensus sequences, e.g. binding site consensus sequence<br />
Origin Species UNIQUE ?Species //added by pad, as we are moving towards multi species readiness.<br />
Strain UNIQUE ?Strain//added by pad, as we are moving towards multi strain readiness.<br />
History Merged_into UNIQUE ?Feature XREF Acquires_merge #Evidence<br />
Acquires_merge ?Feature XREF Merged_into #Evidence<br />
Deprecated Text #Evidence <br />
Visible Description ?Text<br />
SO_term ?SO_term<br />
Defined_by Defined_by_sequence ?Sequence XREF Defines_feature #Evidence<br />
Defined_by_paper ?Paper XREF Feature #Evidence<br />
Defined_by_person ?Person<br />
Defined_by_author ?Author<br />
Defined_by_analysis ?Analysis Int<br />
Score Float Text #Evidence // this would be a log score as indicated by the analysis used in gff dump<br />
Associations Associated_with_gene ?Gene XREF Associated_feature #Evidence // richard<br />
Associated_with_CDS ?CDS XREF Associated_feature #Evidence // richard<br />
Associated_with_transcript ?Transcript XREF Associated_feature #Evidence // richard<br />
Associated_with_pseudogene ?Pseudogene XREF Associated_feature #Evidence // richard<br />
Associated_with_transposon ?Transposon XREF Associated_feature #Evidence //richard<br />
Associated_with_variation ?Variation XREF Feature #Evidence<br />
Associated_with_Position_Matrix ?Position_Matrix XREF Associated_feature #Evidence<br />
Associated_with_operon ?Operon XREF Associated_feature #Evidence<br />
Associated_with_Interaction ?Interaction XREF Feature_interactor<br />
Associated_with_expression_pattern ?Expr_pattern XREF Associated_feature #Evidence <br />
Associated_with_Feature ?Feature XREF Associated_with_Feature #Evidence<br />
Associated_with_construct ?Construct XREF Sequence_feature<br />
Bound_by_product_of ?Gene XREF Gene_product_binds #Evidence //pad added this to show what gene it binds<br />
Transcription_factor UNIQUE ?Transcription_factor XREF Binding_site <br />
Annotation UNIQUE ?LongText // added for data attribution [030220 dl]<br />
Confidential_remark ?Text //pad<br />
Remark ?Text #Evidence<br />
Method UNIQUE ?Method<br />
</pre><br />
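The Source_location comment in the model describes its values as <WSversion> ?Sequence pos1 pos2 followed by evidence. A minimal Python parser for one such .ace line, assuming whitespace-separated tokens with quoted object names; the WS release, sequence name and positions in the example are made up, and the length calculation assumes 1-based inclusive coordinates.<br />

```python
import shlex
from typing import List, NamedTuple


class SourceLocation(NamedTuple):
    ws_version: int       # <WSversion> the coordinates refer to
    sequence: str         # ?Sequence the positions are on
    pos1: int
    pos2: int
    evidence: List[str]   # remaining tokens, e.g. an evidence tag and its value


def parse_source_location(line: str) -> SourceLocation:
    """Parse: Source_location <WS> "<Sequence>" <pos1> <pos2> [evidence...]."""
    tokens = shlex.split(line)  # handles the quoted sequence name
    if len(tokens) < 5 or tokens[0] != "Source_location":
        raise ValueError(f"not a Source_location line: {line!r}")
    return SourceLocation(int(tokens[1]), tokens[2], int(tokens[3]),
                          int(tokens[4]), tokens[5:])


loc = parse_source_location(
    'Source_location 240 "CHROMOSOME_V" 1001 1017 Paper_evidence "WBPaper00003631"')
print(loc.sequence, loc.pos2 - loc.pos1 + 1)  # CHROMOSOME_V 17
```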
== OA interface==<br />
===Tab1===<br />
*PGID, dumps as N/A<br />
*Feature ID, text, dumps as <br />
*Public Name, text<br />
*Other Name, text<br />
*Description, text, Dumps as Description<br />
*Curator - (Dropdown) sf_curator Dumps as: N/A<br />
*Paper - (Multiontology) sf_paper Dumps as: Defined_by_paper <Paper><br />
*Species<br />
*Strain<br />
*Merged_into<br />
*Acquires_merge<br />
*Deprecated, text<br />
*Author sf_author, Dumps as : Defined_by_author<br />
*Not sure about the rest of 'Defined_by' tags (person, analysis, sequence)<br />
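<br />
The "Dumps as" column above amounts to a column-to-tag mapping. A minimal Python sketch of dumping one OA row to a Feature .ace object, assuming values arrive as lists of strings and that .ace values are double-quoted with backslash escaping; sf_description is a hypothetical column name (the page gives only the Description tag), and the real dumper may differ.<br />

```python
from typing import Dict, List, Optional

# "Dumps as" column of Tab1; None marks columns that are not dumped (N/A).
TAB1_DUMP_TAGS: Dict[str, Optional[str]] = {
    "sf_curator": None,
    "sf_paper": "Defined_by_paper",
    "sf_author": "Defined_by_author",
    "sf_description": "Description",  # hypothetical column name
}


def ace_quote(value: str) -> str:
    """Quote a value for .ace output, escaping backslashes and double quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def dump_feature(feature_id: str, row: Dict[str, List[str]]) -> str:
    """Write one OA row as a Feature .ace object, skipping non-dumped columns."""
    lines = [f"Feature : {ace_quote(feature_id)}"]
    for column, values in row.items():
        tag = TAB1_DUMP_TAGS.get(column)
        if tag is None:
            continue  # N/A or unmapped column
        lines.extend(f"{tag} {ace_quote(v)}" for v in values)
    return "\n".join(lines) + "\n"


print(dump_feature("egl-1_temp_1.1", {
    "sf_curator": ["curator placeholder"],
    "sf_paper": ["WBPaper00003631"],
    "sf_description": ["This is a TRA-1 binding site that represses egl-1."],
}))
```

Objects in an .ace file are separated by blank lines, so a full dump would join the output of dump_feature with an extra newline between objects.<br />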
<br />
===Tab2===<br />
*S-parent<br />
*Flanking sequences<br />
*Mapping target<br />
*Source location<br />
*SO terms<br />
*Sequence, Dumps as Defined_by_sequence?<br />
<br />
===Tab3===<br />
*Gene - ?Gene (multiontology), sf_gene, Dumps as Associated_with_gene<br />
*CDS - ?CDS (multiontology), sf_CDS, Dumps as Associated_with_CDS<br />
*Transcript -?Transcript (multiontology), sf_transcript, Dumps as Associated_with_transcript<br />
*Pseudogene -?Pseudogene (multiontology), sf_pseudogene, Dumps as Associated_with_pseudogene <br />
*Transposon -?Transposon (multiontology), sf_transposon, Dumps as Associated_with_transposon <br />
*Variation -?Variation (multiontology), sf_variation, Dumps as Associated_with_variation <br />
*Position_Matrix -?Position_Matrix (multiontology), sf_pwm, Dumps as Associated_with_Position_Matrix <br />
*Operon -?Operon (multiontology), sf_operon, Dumps as Associated_with_operon <br />
*Interaction -?Interaction (multiontology), sf_interaction, Dumps as Associated_with_Interaction<br />
*Expression -?Expr_pattern (multiontology), sf_expr, Dumps as Associated_with_expression_pattern <br />
*Construct - ?Construct (multiontology), sf_construct, Dumps as Associated_with_construct</div>Xdwanghttps://wiki.wormbase.org/index.php?title=Sequence_Feature&diff=24904Sequence Feature2014-09-19T17:44:19Z<p>Xdwang: /* Tab2 */</p>
<hr />
<div>== Flagging papers ==<br />
<br />
Send them to worm-bug@sanger.ac.uk. This is where papers identified by SVMs/pattern matching are sent. We will be moving away from this ticketing system, but in the meantime they will all be in the same place.<br />
<br />
<br />
== Rules for marking up regions (from GW)==<br />
<br />
*If a region is necessary and sufficient to drive a reporter gene, then mark it as an 'enhancer' or 'silencer'.<br />
(I don't think these are the classic definitions for enhancer/silencer, RL)<br />
<br />
*If a region is both an enhancer and a silencer, then it should have the SO_term tags for both of these.<br />
<br />
*If mobility shift experiments or similar experimental evidence is available to assert that a short region is a TF binding site, then mark it as a TF_binding_site.<br />
<br />
*Similarity to a known binding motif is not evidence of being a TF_binding_site.<br />
<br />
*If there is no evidence for a TF binding site and it has an effect on expression when mutated or deleted, but is not sufficient to drive a reporter gene, then we cannot assert that it is an enhancer or a TF binding site. Mark it as an anonymous 'regulatory_region'.<br />
<br />
*If a region has the properties of being both a TF binding site and an enhancer then mark it up as two Features, one a TF_binding_site and one an enhancer.<br />
<br />
*If a region is asserted to be a promoter region in the paper, it is within 200bp (or thereabouts?) of the 5' end of the target gene, and it is necessary and sufficient to drive a reporter gene, mark it as a promoter. If in doubt, consider marking it as an enhancer.<br />
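<br />
The marking-up rules above read as a small decision procedure. A minimal Python sketch, assuming the curator has already reduced the paper's evidence to yes/no flags; the flag names are hypothetical, the 200bp window is the rough figure given above, and treating a sufficient region with no stated direction as an enhancer extends the "if in doubt" advice. Each inner list is one Feature, so enhancer and silencer terms share a Feature while a TF_binding_site is always separate.<br />

```python
from dataclasses import dataclass
from typing import List, Optional

PROMOTER_WINDOW_BP = 200  # "within 200bp (or thereabouts?)" of the 5' end


@dataclass
class RegionEvidence:
    binding_assay: bool = False             # mobility shift or similar; motif similarity does NOT count
    necessary_and_sufficient: bool = False  # drives a reporter gene on its own
    activates: bool = False
    represses: bool = False
    affects_expression_when_mutated: bool = False
    asserted_promoter: bool = False
    distance_to_5prime_bp: Optional[int] = None


def features_for_region(ev: RegionEvidence) -> List[List[str]]:
    """Return the SO term names for each Feature to create (one inner list per Feature)."""
    features: List[List[str]] = []
    if ev.binding_assay:
        features.append(["TF_binding_site"])  # always its own Feature
    if ev.necessary_and_sufficient:
        near_5prime = (ev.distance_to_5prime_bp is not None
                       and ev.distance_to_5prime_bp <= PROMOTER_WINDOW_BP)
        if ev.asserted_promoter and near_5prime:
            features.append(["promoter"])
        else:
            # Enhancer and silencer terms go on one Feature; default to enhancer.
            terms = [name for name, flag in (("enhancer", ev.activates),
                                             ("silencer", ev.represses)) if flag]
            features.append(terms or ["enhancer"])
    elif not ev.binding_assay and ev.affects_expression_when_mutated:
        features.append(["regulatory_region"])
    return features


# Hypothetical evidence flags for a region like the TRA-1 site below.
print(features_for_region(RegionEvidence(binding_assay=True,
                                         necessary_and_sufficient=True,
                                         represses=True)))
```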
<br />
<br />
=== Example for sequence feature curation ===<br />
<br />
The example is from WBPaper00003631:<br />
<br />
<pre><br />
<br />
Feature : "egl-1_temp_1.1"<br />
Sequence VF23B12L<br />
Mapping_target VF23B12L<br />
Flanking_sequences cagctcaattattaaattttattgggtattgttta cataaaattctattgtcccagatttaggatacatcg<br />
DNA_text CTCCTAACCGGGTGGTC<br />
Description "This is a TRA-1 binding site that represses egl-1."<br />
Remark "This is the TF_binding_site for TRA-1 which silences egl-1. <br />
N.B. a 'silencer' Feature has also been made at this location to aid expression and interaction curation<br />
[2013-07-23 gw3]"<br />
Associated_with_gene WBGene00001170 // egl-1<br />
Bound_by_product_of WBGene00006604 // tra-1<br />
Transcription_factor WBTranscriptionFactor000029 // tra-1<br />
Method TF_binding_site<br />
SO_term "SO:0000235" // TF_binding_site<br />
Defined_by_paper WBPaper00003631<br />
Public_name "TRA-1 binding site"<br />
<br />
Feature : "egl-1_temp_1.2"<br />
Sequence VF23B12L<br />
Mapping_target VF23B12L<br />
Flanking_sequences cagctcaattattaaattttattgggtattgttta cataaaattctattgtcccagatttaggatacatcg<br />
DNA_text CTCCTAACCGGGTGGTC<br />
Description "This is the silencer of egl-1, containing a single TF_binding_site bound by TRA-1."<br />
Remark "Made this 'silencer' feature in addition to the TRA-1 TF_binding_site Feature to aid expression <br />
and interaction curation [2013-07-23 gw3]"<br />
Associated_with_gene WBGene00001170 // egl-1<br />
Method silencer<br />
SO_term "SO:0000625" // silencer<br />
Defined_by_paper WBPaper00003631<br />
Public_name "TRA-1 binding site silencer"<br />
<br />
</pre><br />
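<br />
The Flanking_sequences in these records pin down the feature by the sequence on either side of it. A minimal Python sketch of recovering the feature from a parent sequence, assuming the flanks directly abut the feature on the forward strand (reverse-strand hits are not handled); the parent string below is a synthetic stand-in, not the real VF23B12L sequence.<br />

```python
from typing import Optional, Tuple


def locate_by_flanks(genomic: str, left: str, right: str) -> Optional[Tuple[int, int]]:
    """Return 0-based half-open (start, end) of the region between the two flanks.

    Assumes both flanks lie on the forward strand of `genomic`, the left flank
    occurs exactly once, and the right flank follows it. Case is ignored.
    """
    g, l, r = genomic.lower(), left.lower(), right.lower()
    i = g.find(l)
    if i == -1:
        return None
    if g.find(l, i + 1) != -1:
        raise ValueError("left flank is not unique; extend it")
    start = i + len(l)
    end = g.find(r, start)
    if end == -1:
        return None
    return start, end


LEFT = "cagctcaattattaaattttattgggtattgttta"
RIGHT = "cataaaattctattgtcccagatttaggatacatcg"
DNA_TEXT = "CTCCTAACCGGGTGGTC"

# Synthetic parent: flanks with the binding site between them.
genomic = "ttgc" + LEFT + DNA_TEXT.lower() + RIGHT + "aacg"
start, end = locate_by_flanks(genomic, LEFT, RIGHT)
print(genomic[start:end].upper() == DNA_TEXT)  # True
```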
<br />
Most Expr_pattern and Interaction objects will be attached to the 'enhancer/silencer' Features rather than to the TF_binding_site Features.<br />
<br />
== Link to Gene Regulation/Regulatory interactions ==<br />
<br />
<br />
Two types of gene regulation can be linked to a feature:<br />
<br />
* '''trans-regulation:''' TF A regulates target B through element C <br />
<br />
In this situation, our current interaction model already accommodates this data and links the feature object via:<br />
<br />
?Interaction<br />
<br />
Interaction_associated_feature  ?Feature  XREF Associated_with_Interaction //trans-regulation<br />
<br />
* '''cis-regulation:''' enhancer element C (cis-regulator) cis-regulates gene B<br />
<br />
The current interaction model needs to be modified to accommodate this type of data by adding a new tag:<br />
<br />
?Interaction<br />
<br />
Feature_interactor  ?Feature  XREF  Interacting_feature  #Interactor_info //cis-regulation<br />
<br />
<br />
We will propose a corresponding feature model change so that there is a one-to-one XREF between the models. The intention<br />
is that interactions that explicitly state a sequence feature object as an<br />
interactor in a physical or regulatory interaction can refer to a ?Feature<br />
object as a "Feature_interactor". Alternatively, when there is less direct<br />
evidence or the association is more vague, we would make use of the<br />
"Interaction_associated_feature" tag. The XREFs will then link to the<br />
appropriate tags in the corresponding objects.<br />
<br />
*Proposed model change:<br />
<br />
?Interaction<br />
<br />
   Feature_interactor  ?Feature  XREF  Interacting_feature  #Interactor_info //cis-regulation<br />
<br />
   Interaction_associated_feature  ?Feature  XREF Associated_with_Interaction //trans-regulation<br />
<br />
?Feature<br />
<br />
   Interacting_feature  ?Interaction  XREF  Feature_interactor //cis-regulation<br />
<br />
   Associated_with_Interaction  ?Interaction  XREF Interaction_associated_feature //trans-regulation<br />
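<br />
The proposed one-to-one XREFs mean that writing a tag on one side must create its partner on the other. A minimal in-memory Python sketch of that bookkeeping; the store layout and function names are hypothetical (ACeDB maintains XREFs itself), and "lin-3_enhancer" stands in for the unnamed lin-3 enhancer Feature.<br />

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Set

# Proposed one-to-one XREF pairs: ?Interaction tag -> ?Feature tag.
XREF_PAIRS: Dict[str, str] = {
    "Feature_interactor": "Interacting_feature",                     # cis-regulation
    "Interaction_associated_feature": "Associated_with_Interaction",  # trans-regulation
}

# object name -> tag -> set of linked object names
Objects = DefaultDict[str, DefaultDict[str, Set[str]]]


def new_store() -> Objects:
    return defaultdict(lambda: defaultdict(set))


def link(interactions: Objects, features: Objects,
         interaction_id: str, tag: str, feature_id: str) -> None:
    """Add the tag on the Interaction and its XREF partner on the Feature."""
    partner = XREF_PAIRS[tag]  # KeyError for tags outside the proposal
    interactions[interaction_id][tag].add(feature_id)
    features[feature_id][partner].add(interaction_id)


interactions, features = new_store(), new_store()
# Trans-regulation: "hlh-2 binds to lin-3" through the lin-3 enhancer.
link(interactions, features, "WBInteraction000501966",
     "Interaction_associated_feature", "lin-3_enhancer")
print(sorted(features["lin-3_enhancer"]["Associated_with_Interaction"]))
```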
<br />
== Link to Expression pattern ==<br />
<br />
'''When''' and '''how''' do we link sequence features to Expression Pattern objects?<br />
<br />
<br />
'''Example 1 from WBPaper00003631:'''<br />
<br />
"The egl-1 gene appears to be expressed in the HSNs in males." The construct used is [Pegl-1::gfp] transcriptional fusion.<br />
<br />
* The curator creates an Expression object for egl-1 in the male HSNs and links it to the Pegl-1::gfp transgene.<br />
<br />
<pre><br />
Expr_pattern : "Expr11092"<br />
Anatomy_term "WBbt:0004757" Certain //HSNR<br />
Anatomy_term "WBbt:0004758" Certain //HSNL<br />
Anatomy_term "WBbt:0007850" Certain //male<br />
Gene "WBGene00001170"//egl-1<br />
Pattern "The egl-1 gene appears to be expressed in the HSNs in males, in which the HSNs normally undergo <br />
programmed cell death, but not in hermaphrodites, in which the HSNs normally survive."<br />
Reference "WBPaper00003631"<br />
Reporter_gene "[Pegl-1::gfp] transcriptional fusion. To construct Pegl-1::gfp, bases +174 to +5820 (5'-3') <br />
downstream of the stop codon of the egl-1 gene and bases -1914 to -837 (5'-3') upstream of the stop codon were<br />
amplified with appropriate primers and cloned into the SpeI-ApaI (5'-3') and PstI-BamHI (5'-3') sites of <br />
vector pPD95.69, respectively (A. Fire et al., personal communication). --precise ends."<br />
<br />
</pre><br />
<br />
* The sequence curator creates a sequence feature for that object (we are not there yet, but we should aim for it).<br />
* In the sequence feature object there will be a link to the expression.<br />
<br />
Note that in this expression object we have, as per the Expr_pattern model:<br />
<br />
<pre><br />
Expr_pattern Expression_of Gene ?Gene XREF Expr_pattern<br />
<br />
</pre><br />
<br />
* The Expression pattern object in this case is linked to the gene because the authors hypothesize that the expression of the transcriptional fusion reflects endogenous egl-1 expression.<br />
<br />
<br />
<br />
'''Example 2 from WBPaper00003631 (hypothetical example: this specific paper contains no such evidence, but the scenario is plausible):'''<br />
<br />
"This specific 80bp sequence drives expression in HSNL." The construct used is [80bp-egl-1::gfp].<br />
<br />
1) One way to go is to link the expression to the sequence rather than to the gene. From the Expr_pattern model:<br />
<br />
<pre><br />
Expr_pattern Expression_of Gene ?Gene XREF Expr_pattern<br />
Sequence ?Sequence XREF Expr_pattern <br />
</pre><br />
<br />
<pre><br />
<br />
Expr_pattern : "Expr11093"<br />
Anatomy_term "WBbt:0004758" Certain //HSNL<br />
Sequence "???"<br />
Pattern "This particular sequence::GFP was expressed in HSNL"<br />
Reference "WBPaper00003631"<br />
Reporter_gene "[80bp-egl-1::gfp]. To construct 80bp-egl-1::gfp..."<br />
<br />
</pre><br />
<br />
* The sequence curator creates a sequence feature for that object.<br />
* In the sequence feature object there will be a link to the expression.<br />
* The Expression pattern object in this case is linked to the sequence because expression of the artificial construct might not reflect endogenous egl-1 expression.<br />
* It will generally be hard to determine where the boundary lies between artificial and endogenous expression if no other experimental evidence (IHC, ISH) is available.<br />
* '''If we curate the objects this way, we should determine how to display them on the site. Separate from other expression objects?'''<br />
<br />
2) Another option would be to include those objects in Gene regulation rather than in expression: that specific sequence is responsible for expression in...<br />
<br />
'''How were these kinds of objects curated in the past? Was it via gene_regulation Cis_regulated_seq?'''<br />
<br />
Although 'Cis_regulated_seq' existed in the old gene_regulation model, neither Wen nor Xiaodong ever used it for any object. In the new Interaction model, this tag is gone. --XW<br />
<br />
3) A third possibility is to add a Drives_expression_in tag to the feature object:<br />
<br />
<pre><br />
Drives_expression_in<br />
<br />
Life_stage ?Life_stage <br />
Anatomy_term ?Anatomy_term <br />
GO_term ?GO_term <br />
</pre><br />
<br />
This is a favorable approach, as it will not "contaminate" the expression pattern class while still capturing the expression information for the enhancer. In REDfly (Regulatory Element Database for Drosophila, http://redfly.ccr.buffalo.edu/), enhancer regions are annotated to anatomy terms, but that expression is not listed under the classic expression patterns; see, for example, the decapentaplegic (dpp) construct dpp_303lacZ.<br />
<br />
In the example of Hwang and Sternberg, 2004 (WBPaper00006370), the feature object would be:<br />
<br />
<pre><br />
Feature : <br />
Public_name "lin-3 enhancer"<br />
Sequence F36H1<br />
Description "lin-3 enhancer region, driving anchor cell (AC) specific expression"<br />
Flanking_sequences "ctagaacttcccgtctctccctattcaatg" "cttaccaatgtctcaggcatttttggaaaa" <br />
Mapping_target F36H1<br />
Associated_with_gene WBGene00002992 // lin-3<br />
Species "Caenorhabditis elegans"<br />
Defined_by_paper WBPaper00006370<br />
SO_term SO:0000165 // enhancer<br />
Method enhancer <br />
Associated_with_Interaction WBInteraction000501966// hlh-2 binds to lin-3<br />
Associated_with_Interaction WBInteraction000520204// nhr-25 binds to lin-3<br />
Anatomy_term "WBbt:0004522"//Anchor cell<br />
<br />
</pre><br />
<br />
4) We could simply generate an Expr_pattern object and add an Associated_feature ?Feature tag. For display purposes, the site could show objects that have Associated_feature in a separate section.<br />
<br />
'''Example 3 from WBPaper00003631:'''<br />
<br />
"The egl-1 gene appears to be expressed in the HSNs in males (Pegl-1::GFP reporter)...if tra-1 is bound to egl-1 the expression in HSNs is repressed"<br />
<br />
*The region of egl-1 bound by tra-1 is known, and two sequence features are created for it: one TF_binding_site and one silencer.<br />
*A gene regulation object is created -> egl-1 downregulation in HSN.<br />
*The gene regulation object is added to the silencer sequence feature object.<br />
<br />
Should we create an expression object for the tra-1 binding site? In that case we would have to create a negative expression pattern: egl-1 is NOT expressed in HSNs if bound by tra-1. To me this falls under gene regulation. -DR<br />
<br />
Should we link to the existing expression pattern Expr_pattern : "Expr11092" (see above)? This might not be appropriate, as Expr11092 depicts expression in male HSNs. If we want to pull out that information, we could do it through the gene regulation object anyway. -DR<br />
<br />
Should we just leave the gene regulation association? <br />
<br />
As of now, few Expression Patterns are linked to the Genome Browser (the Vancouver set is the only such data set). The ultimate goal is to map expression constructs to the Genome Browser whenever we can.<br />
<br />
== Top down approach ==<br />
<br />
We are brainstorming to develop a model suitable for curating all of the above.<br />
<br />
<br />
The potential model should contain the following information:<br />
<br />
For Expression:<br />
<br />
* sequence: any stretch of DNA, from a few bp to several kb (?Feature, one or more)<br />
<br />
* reporter: GFP, RFP, YFP, mCherry, Venus, ... (plus Other: free text, including when the endogenous gene is used as (part of) the reporter, e.g. with GFP fused in)<br />
<br />
* gene: the gene immediately downstream of the sequence; not UNIQUE, because the sequence could be associated with more than one gene (do NOT annotate the gene, because 1. the base model is about describing the pattern of expression, 2. location information intrinsically informs possible cis-targets, and 3. if the author asserts relevant genes, that should go in some ?Regulation)<br />
<br />
* Reflects_endogenous_expression_of ?Gene #populated only if the authors assume that the expression reflects the endogenous expression<br />
* anatomy term<br />
<br />
* life stage<br />
<br />
* (sex will be encoded in life stage and anatomy)<br />
<br />
* WBPaper<br />
<br />
* experimental info?<br />
<br />
* other info will be textual<br />
<br />
After brainstorming (participants: Xiaodong, Raymond, Wen, Daniela), we agreed that the current Expr_pattern model can accommodate most of the changes proposed above. The only modification needed is to add:<br />
*Reflects_endogenous_expression_of ?Gene #populated only if the authors assume that the expression reflects the endogenous expression<br />
<br />
For all ''artificial'' constructs, we will not populate the tag. Daniela will start curation and check whether everything fits the proposal; if so, she will request a model change.<br />
<br />
<br />
For Regulation:<br />
<br />
Next topics: capture regulation, post-transcriptional regulation<br />
Agreement has been reached for gene regulation objects and is summarized in a chapter above.<br />
<br />
==Sequence Feature Model==<br />
<pre><br />
?Feature SMap S_parent UNIQUE Sequence UNIQUE ?Sequence XREF Feature_object<br />
Name Public_name UNIQUE ?Text<br />
Other_name ?Text<br />
Sequence_details Flanking_sequences UNIQUE Text UNIQUE Text<br />
Mapping_target UNIQUE ?Sequence <br />
Source_location UNIQUE Int UNIQUE ?Sequence UNIQUE Int UNIQUE Int UNIQUE #Evidence //source data, <WSversion> ?Sequence pos1 pos2 Evidence(Paper/person etc. remarks)<br />
DNA_text UNIQUE ?Text // for storing the sequence of the feature...can use IUPAC codes to be able to<br />
// store consensus sequences, e.g. binding site consensus sequence<br />
Origin Species UNIQUE ?Species //added by pad, as we are moving towards multi species readiness.<br />
Strain UNIQUE ?Strain//added by pad, as we are moving towards multi strain readiness.<br />
History Merged_into UNIQUE ?Feature XREF Acquires_merge #Evidence<br />
Acquires_merge ?Feature XREF Merged_into #Evidence<br />
Deprecated Text #Evidence <br />
Visible Description ?Text<br />
SO_term ?SO_term<br />
Defined_by Defined_by_sequence ?Sequence XREF Defines_feature #Evidence<br />
Defined_by_paper ?Paper XREF Feature #Evidence<br />
Defined_by_person ?Person<br />
Defined_by_author ?Author<br />
Defined_by_analysis ?Analysis Int<br />
Score Float Text #Evidence // this would be a log score as indicated by the analysis used in gff dump<br />
Associations Associated_with_gene ?Gene XREF Associated_feature #Evidence // richard<br />
Associated_with_CDS ?CDS XREF Associated_feature #Evidence // richard<br />
Associated_with_transcript ?Transcript XREF Associated_feature #Evidence // richard<br />
Associated_with_pseudogene ?Pseudogene XREF Associated_feature #Evidence // richard<br />
Associated_with_transposon ?Transposon XREF Associated_feature #Evidence //richard<br />
Associated_with_variation ?Variation XREF Feature #Evidence<br />
Associated_with_Position_Matrix ?Position_Matrix XREF Associated_feature #Evidence<br />
Associated_with_operon ?Operon XREF Associated_feature #Evidence<br />
Associated_with_Interaction ?Interaction XREF Feature_interactor<br />
Associated_with_expression_pattern ?Expr_pattern XREF Associated_feature #Evidence <br />
Associated_with_Feature ?Feature XREF Associated_with_Feature #Evidence<br />
Associated_with_construct ?Construct XREF Sequence_feature<br />
Bound_by_product_of ?Gene XREF Gene_product_binds #Evidence //pad added this to show what gene it binds<br />
Transcription_factor UNIQUE ?Transcription_factor XREF Binding_site <br />
Annotation UNIQUE ?LongText // added for data attribution [030220 dl]<br />
Confidential_remark ?Text //pad<br />
Remark ?Text #Evidence<br />
Method UNIQUE ?Method<br />
</pre><br />
== OA interface==<br />
===Tab1===<br />
*PGID, dumps as N/A<br />
*Feature ID, text, dumps as <br />
*Public Name, text<br />
*Other Name, text<br />
*Curator - (Dropdown) sf_curator Dumps as: N/A<br />
*Paper - (Multiontology) sf_paper Dumps as: Defined_by_paper <Paper><br />
*Species<br />
*Strain<br />
*Merged_into<br />
*Acquires_merge<br />
*Deprecated, text<br />
<br />
===Tab2===<br />
*S-parent<br />
*Flanking sequences<br />
*Mapping target<br />
*Source location<br />
*SO terms<br />
*<br />
<br />
===Tab3===</div>Xdwanghttps://wiki.wormbase.org/index.php?title=Sequence_Feature&diff=24902Sequence Feature2014-09-19T15:28:37Z<p>Xdwang: </p>
<hr />
<div>== Flagging papers ==<br />
<br />
Send them to worm-bug@sanger.ac.uk. This is where papers identified by SVMs/pattern matching are sent. We will be moving away from this ticketing system, but in the meantime they will all be in the same place.<br />
<br />
<br />
== Rules for marking up regions (from GW)==<br />
<br />
*If a region is necessary and sufficient to drive a reporter gene, then mark it as an 'enhancer' or 'silencer'.<br />
(I don't think these are the classic definitions for enhancer/silencer, RL)<br />
<br />
*If a region is both an enhancer and a silencer, then it should have the SO_term tags for both of these.<br />
<br />
*If mobility shift experiments or similar experimental evidence is available to assert that a short region is a TF binding site, then mark it as a TF_binding_site.<br />
<br />
*Similarity to a known binding motif is not evidence of being a TF_binding_site.<br />
<br />
*If there is no evidence for a TF binding site and it has an effect on expression when mutated or deleted, but is not sufficient to drive a reporter gene, then we cannot assert that it is an enhancer or a TF binding site. Mark it as an anonymous 'regulatory_region'.<br />
<br />
*If a region has the properties of being both a TF binding site and an enhancer then mark it up as two Features, one a TF_binding_site and one an enhancer.<br />
<br />
*If a region is asserted to be a promoter region in the paper, it is within 200bp (or thereabouts?) of the 5' end of the target gene, and it is necessary and sufficient to drive a reporter gene, mark it as a promoter. If in doubt, consider marking it as an enhancer.<br />
<br />
<br />
=== Example for sequence feature curation ===<br />
<br />
The example is from WBPaper00003631:<br />
<br />
<pre><br />
<br />
Feature : "egl-1_temp_1.1"<br />
Sequence VF23B12L<br />
Mapping_target VF23B12L<br />
Flanking_sequences cagctcaattattaaattttattgggtattgttta cataaaattctattgtcccagatttaggatacatcg<br />
DNA_text CTCCTAACCGGGTGGTC<br />
Description "This is a TRA-1 binding site that represses egl-1."<br />
Remark "This is the TF_binding_site for TRA-1 which silences egl-1. <br />
N.B. a 'silencer' Feature has also been made at this location to aid expression and interaction curation<br />
[2013-07-23 gw3]"<br />
Associated_with_gene WBGene00001170 // egl-1<br />
Bound_by_product_of WBGene00006604 // tra-1<br />
Transcription_factor WBTranscriptionFactor000029 // tra-1<br />
Method TF_binding_site<br />
SO_term "SO:0000235" // TF_binding_site<br />
Defined_by_paper WBPaper00003631<br />
Public_name "TRA-1 binding site"<br />
<br />
Feature : "egl-1_temp_1.2"<br />
Sequence VF23B12L<br />
Mapping_target VF23B12L<br />
Flanking_sequences cagctcaattattaaattttattgggtattgttta cataaaattctattgtcccagatttaggatacatcg<br />
DNA_text CTCCTAACCGGGTGGTC<br />
Description "This is the silencer of egl-1, containing a single TF_binding_site bound by TRA-1."<br />
Remark "Made this 'silencer' feature in addition to the TRA-1 TF_binding_site Feature to aid expression <br />
and interaction curation [2013-07-23 gw3]"<br />
Associated_with_gene WBGene00001170 // egl-1<br />
Method silencer<br />
SO_term "SO:0000625" // silencer<br />
Defined_by_paper WBPaper00003631<br />
Public_name "TRA-1 binding site silencer"<br />
<br />
</pre><br />
<br />
Most Expr_pattern and Interaction objects will be attached to the 'enhancer/silencer' Features rather than to the TF_binding_site Features.<br />
<br />
== Link to Gene Regulation/Regulatory interactions ==<br />
<br />
<br />
Two types of gene regulation can be linked to a feature:<br />
<br />
* '''trans-regulation:''' TF A regulates target B through element C <br />
<br />
In this situation, our current interaction model already accommodates this data and links the feature object via:<br />
<br />
?Interaction<br />
<br />
Interaction_associated_feature  ?Feature  XREF Associated_with_Interaction //trans-regulation<br />
<br />
* '''cis-regulation:''' enhancer element C (cis-regulator) cis-regulates gene B<br />
<br />
The current interaction model needs to be modified to accommodate this type of data by adding a new tag:<br />
<br />
?Interaction<br />
<br />
Feature_interactor  ?Feature  XREF  Interacting_feature  #Interactor_info //cis-regulation<br />
<br />
<br />
We will propose a corresponding feature model change so that there is a one-to-one XREF between the models. The intention<br />
is that interactions that explicitly state a sequence feature object as an<br />
interactor in a physical or regulatory interaction can refer to a ?Feature<br />
object as a "Feature_interactor". Alternatively, when there is less direct<br />
evidence or the association is more vague, we would make use of the<br />
"Interaction_associated_feature" tag. The XREFs will then link to the<br />
appropriate tags in the corresponding objects.<br />
<br />
*Proposed model change:<br />
<br />
?Interaction<br />
<br />
   Feature_interactor  ?Feature  XREF  Interacting_feature  #Interactor_info //cis-regulation<br />
<br />
   Interaction_associated_feature  ?Feature  XREF Associated_with_Interaction //trans-regulation<br />
<br />
?Feature<br />
<br />
   Interacting_feature  ?Interaction  XREF  Feature_interactor //cis-regulation<br />
<br />
   Associated_with_Interaction  ?Interaction  XREF Interaction_associated_feature //trans-regulation<br />
<br />
== Link to Expression pattern ==<br />
<br />
'''When''' and '''how''' do we link sequence features to Expression Pattern objects?<br />
<br />
<br />
'''Example 1 from WBPaper00003631:'''<br />
<br />
"The egl-1 gene appears to be expressed in the HSNs in males." The construct used is [Pegl-1::gfp] transcriptional fusion.<br />
<br />
* The curator creates an Expression object for egl-1 in the male HSNs and links it to the Pegl-1::gfp transgene.<br />
<br />
<pre><br />
Expr_pattern : "Expr11092"<br />
Anatomy_term "WBbt:0004757" Certain //HSNR<br />
Anatomy_term "WBbt:0004758" Certain //HSNL<br />
Anatomy_term "WBbt:0007850" Certain //male<br />
Gene "WBGene00001170"//egl-1<br />
Pattern "The egl-1 gene appears to be expressed in the HSNs in males, in which the HSNs normally undergo <br />
programmed cell death, but not in hermaphrodites, in which the HSNs normally survive."<br />
Reference "WBPaper00003631"<br />
Reporter_gene "[Pegl-1::gfp] transcriptional fusion. To construct Pegl-1::gfp, bases +174 to +5820 (5'-3') <br />
downstream of the stop codon of the egl-1 gene and bases -1914 to -837 (5'-3') upstream of the stop codon were<br />
amplified with appropriate primers and cloned into the SpeI-ApaI (5'-3') and PstI-BamHI (5'-3') sites of <br />
vector pPD95.69, respectively (A. Fire et al., personal communication). --precise ends."<br />
<br />
</pre><br />
<br />
* The sequence curator creates a sequence feature for that object (we are not there yet, but we should aim for it). <br />
* The sequence feature object will contain a link to the Expression object.<br />
<br />
Note that this expression object uses the following tag from the Expr_pattern model:<br />
<br />
<pre><br />
Expr_pattern Expression_of Gene ?Gene XREF Expr_pattern<br />
<br />
</pre><br />
<br />
* The Expression pattern object in this case is linked to the gene because the authors hypothesize that expression of the transcriptional fusion reflects endogenous egl-1 expression.<br />
<br />
<br />
<br />
'''Example 2 from WBPaper00003631 (a hypothetical example: this paper contains no such evidence, but the scenario could arise):'''<br />
<br />
"This specific sequence of 80bp is expressed in the HSNL." The construct used is [80bp-egl-1::gfp].<br />
<br />
1) One option is to link the expression to the sequence rather than to the gene. From the Expr_pattern model:<br />
<br />
<pre><br />
Expr_pattern Expression_of Gene ?Gene XREF Expr_pattern<br />
Sequence ?Sequence XREF Expr_pattern <br />
</pre><br />
<br />
<pre><br />
<br />
Expr_pattern : "Expr11093"<br />
Anatomy_term "WBbt:0004758" Certain //HSNL<br />
Sequence "???"<br />
Pattern "This particular sequence::GFP was expressed in HSNL"<br />
Reference "WBPaper00003631"<br />
Reporter_gene "[80bp-egl-1::gfp]. To construct 80bp-egl-1::gfp..."<br />
<br />
</pre><br />
<br />
* The sequence curator creates a sequence feature for that object. <br />
* The sequence feature object will contain a link to the Expression object.<br />
* The Expression pattern object in this case is linked to the sequence because expression from the artificial construct might not reflect endogenous egl-1 expression.<br />
* It will generally be hard to determine where the boundary lies between artificial and endogenous expression if no other experimental evidence (IHC, in situ hybridization) is available.<br />
* '''If we curate the objects this way, we need to decide how to display them on the site. Separately from other expression objects?'''<br />
<br />
2) Another option would be to curate these objects as gene regulation rather than expression: that specific sequence is responsible for expression in ...<br />
<br />
'''How were these kinds of objects curated in the past? Was it via gene_regulation Cis_regulated_seq?'''<br />
<br />
Although 'Cis_regulated_seq' existed in the old Gene_regulation model, neither Wen nor Xiaodong ever used it for any object. In the new Interaction model, this tag is gone. --XW<br />
<br />
3) A third possibility is to add a Drives_expression_in tag to the Feature object:<br />
<br />
<pre><br />
Drives_expression_in<br />
<br />
Life_stage ?Life_stage <br />
Anatomy_term ?Anatomy_term <br />
GO_term ?GO_term <br />
</pre><br />
<br />
This is a favorable approach because it does not "contaminate" the Expr_pattern class while still capturing where the enhancer drives expression. In REDfly (Regulatory Element Database for Drosophila, http://redfly.ccr.buffalo.edu/), the enhancer region is annotated to anatomy terms, but that expression is not listed among the classic expression patterns. See, for example, the decapentaplegic (dpp) construct dpp_303lacZ.<br />
<br />
In the example of Hwang and Sternberg, 2004 (WBPaper00006370), the Feature object would be:<br />
<br />
<pre><br />
Feature : <br />
Public_name "lin-3 enhancer"<br />
Sequence F36H1<br />
Description "lin-3 enhancer region, driving anchor cell (AC) specific expression"<br />
Flanking_sequences "ctagaacttcccgtctctccctattcaatg" "cttaccaatgtctcaggcatttttggaaaa" <br />
Mapping_target F36H1<br />
Associated_with_gene WBGene00002992 // lin-3<br />
Species "Caenorhabditis elegans"<br />
Defined_by_paper WBPaper00006370<br />
SO_term SO:0000165 // enhancer<br />
Method enhancer <br />
Associated_with_Interaction WBInteraction000501966// hlh-2 binds to lin-3<br />
Associated_with_Interaction WBInteraction000520204// nhr-25 binds to lin-3<br />
Anatomy_term "WBbt:0004522"//Anchor cell<br />
<br />
</pre><br />
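The lin-3 enhancer above is placed on F36H1 by its two Flanking_sequences. A minimal Python sketch of how such a pair could be resolved to coordinates, assuming the Feature is exactly the stretch between the end of the left flank and the start of the right flank, on the same strand as the parent sequence. `locate_by_flanks` is a hypothetical helper for illustration, not WormBase's remapping code.

```python
def locate_by_flanks(parent, left_flank, right_flank):
    """Return 1-based inclusive (start, end) of the region between the flanks, or None.

    Assumes both flanks are on the same strand as `parent`; reverse-strand
    matching is not handled in this sketch.
    """
    parent, left, right = parent.lower(), left_flank.lower(), right_flank.lower()
    i = parent.find(left)
    # Require the left flank to occur exactly once so the placement is unambiguous.
    if i == -1 or parent.find(left, i + 1) != -1:
        return None
    start0 = i + len(left)  # 0-based index of the first base after the left flank
    j = parent.find(right, start0)  # first right flank at or after that point
    if j == -1 or j == start0:
        return None  # right flank missing, or no bases between the flanks
    return (start0 + 1, j)  # parent[start0:j] expressed in 1-based inclusive coordinates
```

Matching is case-insensitive because the flanks in the example are lower case while sequence files are often upper case.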
<br />
4) We could simply generate an Expr_pattern object and add an Associated_feature ?Feature tag. On the site, objects that have Associated_feature could then be displayed in a separate section.<br />
<br />
'''Example 3 from WBPaper00003631:'''<br />
<br />
"The egl-1 gene appears to be expressed in the HSNs in males (Pegl-1::GFP reporter)...if tra-1 is bound to egl-1 the expression in HSNs is repressed"<br />
<br />
* The region where tra-1 binds egl-1 is known, and two sequence features are created for it: one TF_binding_site and one silencer. <br />
* A gene regulation object is created: egl-1 downregulation in the HSNs.<br />
* The gene regulation object is added to the silencer sequence feature object. <br />
<br />
Should we create an expression object for the tra-1 binding site? In that case we would have to create a negative expression pattern: egl-1 is NOT expressed in the HSNs if bound by tra-1. To me this falls under gene regulation. -DR<br />
<br />
Should we link to the existing expression pattern Expr_pattern : "Expr11092" (see above)? This might not be appropriate, as Expr11092 describes expression in male HSNs. If we want to pull out that information, we could do so through the gene regulation object anyway. -DR<br />
<br />
Should we just leave the gene regulation association? <br />
<br />
At present, few Expression Patterns are linked to the Genome Browser (the Vancouver set is the only such data set). The ultimate goal is to map expression constructs to the genome browser whenever we can.<br />
<br />
== Top down approach ==<br />
<br />
We are brainstorming to develop a model suitable for curating all of the above.<br />
<br />
<br />
The potential model should contain the following information:<br />
<br />
for Expression<br />
<br />
* sequence - any stretch of DNA, from a few bp to several kb (?Feature, one or more)<br />
<br />
* reporter - GFP, RFP, YFP, mCherry, Venus, ... (plus Other: text, including cases where the endogenous gene is used as the reporter or as part of it, e.g. with GFP fused in)<br />
<br />
* gene (the gene immediately downstream of the sequence); not UNIQUE, because the sequence could be associated with more than one gene<br />
(do NOT annotate the gene, because 1. the base model is about describing the pattern of expression, 2. location information intrinsically informs possible cis-targets, 3. if the author asserts relevant genes, that should go in some ?Regulation)<br />
<br />
* Reflects_endogenous_expression_of ?Gene #populated only if the author assumes that the expression reflects endogenous expression<br />
* anatomy term<br />
<br />
* life stage<br />
<br />
* (sex will be encoded in life stage and anatomy)<br />
<br />
* WBPaper<br />
<br />
* experimental info?<br />
<br />
* other info will be textual<br />
<br />
After brainstorming (participants: Xiaodong, Raymond, Wen, Daniela), we agreed that the current Expr_pattern model can accommodate most of the changes proposed above. The only modification needed is to add the tag<br />
*Reflects_endogenous_expression_of ?Gene #populated only if the authors assume that the expression reflects endogenous expression<br />
<br />
For all *artificial* constructs we will not populate the tag. Daniela will start curation and see whether everything fits the proposal; if so, she will request a model change. <br />
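To make the agreed rule concrete, here is a minimal Python sketch of an expression record holding some of the fields listed above. The class, its field names and the `artificial_construct` flag are hypothetical; only the rule that artificial constructs leave Reflects_endogenous_expression_of empty comes from the agreement above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ExpressionRecord:
    """Hypothetical container for some of the fields listed above; not the real Expr_pattern model."""
    features: List[str]  # ?Feature IDs, one or more
    reporter: str  # GFP, RFP, YFP, mCherry, Venus, ... or free text
    paper: str  # WBPaper ID
    anatomy_terms: List[str] = field(default_factory=list)
    life_stages: List[str] = field(default_factory=list)
    artificial_construct: bool = False  # hypothetical flag, not a model tag
    reflects_endogenous_expression_of: Optional[str] = None  # ?Gene

    def __post_init__(self):
        if not self.features:
            raise ValueError("at least one ?Feature is required")
        # Agreed rule: the tag is never populated for artificial constructs.
        if self.artificial_construct and self.reflects_endogenous_expression_of is not None:
            raise ValueError("Reflects_endogenous_expression_of must be empty for artificial constructs")
```

Validating in `__post_init__` means a record that breaks the rule cannot be created at all, rather than being caught by a later check.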
<br />
<br />
for Regulation<br />
<br />
Next topics: capture regulation, post-transcriptional regulation<br />
Agreement has been reached for gene regulation objects and is summarized in a chapter above.<br />
<br />
==Sequence Feature Model==<br />
<pre><br />
?Feature SMap S_parent UNIQUE Sequence UNIQUE ?Sequence XREF Feature_object<br />
Name Public_name UNIQUE ?Text<br />
Other_name ?Text<br />
Sequence_details Flanking_sequences UNIQUE Text UNIQUE Text<br />
Mapping_target UNIQUE ?Sequence <br />
Source_location UNIQUE Int UNIQUE ?Sequence UNIQUE Int UNIQUE Int UNIQUE #Evidence //source data, <WSversion> ?Sequence pos1 pos2 Evidence(Paper/person etc. remarks)<br />
DNA_text UNIQUE ?Text // for storing the sequence of the feature...can use IUPAC codes to be able<br />
// store consensus sequences, e.g. binding site consensus sequence<br />
Origin Species UNIQUE ?Species //added by pad, as we are moving towards multi species readyness.<br />
Strain UNIQUE ?Strain//added by pad, as we are moving towards multi strain readyness.<br />
History Merged_into UNIQUE ?Feature XREF Acquires_merge #Evidence<br />
Acquires_merge ?Feature XREF Merged_into #Evidence<br />
Deprecated Text #Evidence <br />
Visible Description ?Text<br />
SO_term ?SO_term<br />
Defined_by Defined_by_sequence ?Sequence XREF Defines_feature #Evidence<br />
Defined_by_paper ?Paper XREF Feature #Evidence<br />
Defined_by_person ?Person<br />
Defined_by_author ?Author<br />
Defined_by_analysis ?Analysis Int<br />
Score Float Text #Evidence // this would be a log score as indicated by the analysis used in gff dump<br />
Associations Associated_with_gene ?Gene XREF Associated_feature #Evidence // richard<br />
Associated_with_CDS ?CDS XREF Associated_feature #Evidence // richard<br />
Associated_with_transcript ?Transcript XREF Associated_feature #Evidence // richard<br />
Associated_with_pseudogene ?Pseudogene XREF Associated_feature #Evidence // richard<br />
Associated_with_transposon ?Transposon XREF Associated_feature #Evidence //richard<br />
Associated_with_variation ?Variation XREF Feature #Evidence<br />
Associated_with_Position_Matrix ?Position_Matrix XREF Associated_feature #Evidence<br />
Associated_with_operon ?Operon XREF Associated_feature #Evidence<br />
Associated_with_Interaction ?Interaction XREF Feature_interactor<br />
Associated_with_expression_pattern ?Expr_pattern XREF Associated_feature #Evidence <br />
Associated_with_Feature ?Feature XREF Associated_with_Feature #Evidence<br />
Associated_with_construct ?Construct XREF Sequence_feature<br />
Bound_by_product_of ?Gene XREF Gene_product_binds #Evidence //pad added this to show what gene it binds<br />
Transcription_factor UNIQUE ?Transcription_factor XREF Binding_site <br />
Annotation UNIQUE ?LongText // added for data attribution [030220 dl]<br />
Confidential_remark ?Text //pad<br />
Remark ?Text #Evidence<br />
Method UNIQUE ?Method<br />
</pre><br />
== OA interface==<br />
===Tab1===<br />
*PGID, dumps as N/A<br />
*Feature ID, text, dumps as <br />
*Public Name, text<br />
*Other Name, text<br />
*Curator - (Dropdown) sf_curator Dumps as: N/A<br />
*Paper - (Multiontology) sf_paper Dumps as: Defined_by_paper <Paper><br />
*Species<br />
*Strain<br />
*Merged_into<br />
*Acquires_merge<br />
*Deprecated, text<br />
<br />
===Tab2===<br />
*S-parent<br />
*Flanking sequences<br />
*Mapping target<br />
*Source location<br />
*SO terms<br />
*<br />
===Tab3===</div>Xdwanghttps://wiki.wormbase.org/index.php?title=Working_Group:Sequence_Features&diff=24794Working Group:Sequence Features2014-09-17T14:23:23Z<p>Xdwang: </p>
<hr />
<div><br />
== Headline text ==<br />
'''Topics'''<br />
<br />
* Display of Sequence Features on the website<br />
** See GitHub ticket 2867 https://github.com/WormBase/website/issues/2867#issuecomment-48041365<br />
** Also Transcription factors and Gene_product_binds.<br />
** In the expression section on the gene page, highlight the objects related to Sequence Features. Add a type called cis-regulatory-element to the table?<br />
** Add a sequence feature table to the sequence widget to display all features related to the gene.<br />
** In the Cytoscape display, show sequence features as entity nodes.<br />
<br />
* Streamline the workflow<br />
** How can we automatically identify Sequence Feature papers?<br />
** How do papers come in now? SVM? Textpresso string matches?<br />
<br />
* Improving data flow<br />
** How can we make the data immediately available to all curators?<br />
*** Add Paper and Public_name fields to the Features in the Nameserver?<br />
*** geneace is available for download and is updated every day; Caltech takes a copy.<br />
**** We still need WBsf object details in order to relate/curate regulation/expression/construct objects.<br />
<br />
<br />
* Sample paper curation <br />
** Prepare a paper list for each person<br />
* Assign meaningful names in the public_name field, e.g. 'distal enhancer 1' instead of 'DE1'<br />
** gw3 - I disagree with this. We should use the name the paper uses to describe the region rather than making up our own names. I have had to go back and re-annotate regions after Xiaodong found problems with my annotations several months after I had originally made them, and I really needed to be able to identify unambiguously which region the paper was talking about by matching the Public_name field to the name of the region in the paper. Users may well need to do the same when they read the paper and look at the Features marking up the regions.<br />
** dr - OK, sounds good to me, Gary.<br />
<br />
'''Practical Issues'''<br />
<br />
* Gary and I are staying at the Ramada - 800 Morrissey Boulevard, Freeport St, Boston, MA 02122 US<br />
* We arrive in Boston on Saturday evening and fly home on Wednesday night. <br />
* What time shall we meet on Monday morning and where do we go? <br />
**I (Xiaodong) can pick you up at the Harvard Square T station (Red Line, info center on the square) at, say, 9:30 am. We can walk to my office from there.<br />
* Can you confirm that we have office space with internet access? I will bring my Macbook. Hopefully I can tunnel into my Sanger working space.<br />
**There is definitely enough office space. Internet access is through the Harvard guest network; other ways of connecting are not allowed.<br />
* Thanks Xiaodong, I'm bringing my chromebook - Gary<br />
<br />
'''Pre-Jamboree prep'''<br />
<br />
* Suggestions for work<br />
** Curate some (5? 10?) papers from our lists and collate data as follows:<br />
*** How long did each paper take to curate?<br />
*** How many new objects did you add to WormBase?<br />
*** Where in the paper was the information (e.g. supplementary data, figure legends, within text)?<br />
*** Did you need to contact the author for more information?<br />
*** Did you come across data for other curators? <br />
**** gw3 - I don't know in detail what data other curators require. It would be useful to know this so I can summarise data for others.<br />
<br />
'''The Jamboree 15-17th Sept 2014'''<br />
<br />
* Suggestions for work<br />
** Discuss the results from above<br />
** Work through some papers which have already been curated and have different data types (e.g. promoter and gene regulation) to identify best curation practices.<br />
** Browsing capabilities through WormMine?<br />
** Should we put the list of papers in the Caltech curation status form?<br />
<br />
<br />
'''Matters Arising'''<br />
<br />
* things noted - not necessarily to do with the topic in hand<br />
** There are many duplicates of Interaction objects in geneace, with or without two leading digits. Sometimes there is only the shorter form.<br />
** The Features WBsf919641 and WBsf919607 are at the same position. Both are DAF-3 binding sites. WBsf919641 has Method "binding_site", while WBsf919607 has Method "TF_binding_site", which I think is more correct as DAF-3 is a TF. WBsf919641 cites WBPaper00004526; WBsf919607 cites WBPaper00003384. WBsf919641 is associated with WBGene00202278; WBsf919607 is associated with WBGene00003514 (myo-2). The paper WBPaper00004526 appears to be about a PEB-1 binding site, not a DAF-3 site. I made the Feature WBsf919609 for the PEB-1 site based on WBPaper00004526 in the last round of curation. Is something wrong here?<br />
*** Follow-up from Mary Ann. I agree that the Method for WBsf919641 should be TF_binding_site. Associated_with_gene should be updated to myo-2 (this is an error). The SO term is not right either. These then become duplicate entries. <br />
** The Features WBsf019227 and WBsf038813 are in the same position, but on opposite strands. Both are LAG-1 binding sites associated with the gene lin-11. One is from WBPaper00005357, the other from WBPaper00032298. WBsf019227 needs to be retired and merged into WBsf038813.<br />
*** Follow-up from Mary Ann. We have no Merged_into tag structure in Feature. We have a History tag and could add a Comment there. <br />
** WBsf038819 has a Method of 'regulatory_region', but the Public_name and Description describes it as an Enhancer. Which is it?<br />
*** Fixed by Mary Ann. Changed Method to enhancer. <br />
** WBsf919669 is an Enhancer, but it has a Transcription_factor tag (WBTranscriptionFactor000101). It is far too large to be a binding site for a transcription factor. I think this tag should be removed.<br />
*** Follow-up from Mary Ann. lag-1 does bind somewhere in this region, but the paper does not allow me to specify exactly where. Q. Should we curate these then? I think we should, but should maybe add suitable Remarks. <br />
** 393 out of the 500 TF_binding_site Features have an incorrect SO_term tag. It should be set to: SO:0000235<br />
*** Follow-up by Mary Ann. This seems really high. I wonder whether TF_binding_site was not a recognised SO term when some of these were curated, especially the older ones. This (and other consistency checks) should be added to a script we can run.<br />
** 10 Features with Method = "TF_binding_site" are lacking a Transcription_factor tag.<br />
*** Follow-up by Mary Ann. Maybe it's not clear in the paper what the TF is? Something to add to consistency check. <br />
* How many Methods are there for Feature: enhancer, regulatory_region, TF_binding_site, promoter…?<br />
**How should these Methods be defined?<br />
** When displaying features on JBrowse/GBrowse, do we want to show the WBsf IDs (WBsfxxxxx) or the Methods?<br />
* Gary does a quick literature search to see what other work has been done on the sites described.<br />
**Along these lines, do we want to add WBPaper00002925 to the WBsf objects from WBPaper00002011? That is, should we associate features with both papers when both mention them?<br />
*** The Feature objects are used to describe 'real', functional regions of the genome in as much detail as possible. Each paper that provides evidence for that region being 'real' should be added as evidence for the Feature. The Primary object we are trying to describe here is the region of the genome, not the papers. (I may have missed the point you were making - Gary).<br />
*** agree with Gary (unless I have also missed the point!). All relevant papers should be added. There is a loose convention in Allele curation that the paper listed under Evidence (which can only have one value) is the "primary" paper and the ones listed under Reference (including the primary) are all papers which describe or discuss the object as well. Also individual Evidence can be added to tags. Maybe we should look to extend this in the Feature model. <br />
<pre><br />
?Feature SMap S_parent UNIQUE Sequence UNIQUE ?Sequence XREF Feature_object<br />
Name Public_name UNIQUE ?Text<br />
Other_name ?Text<br />
Sequence_details Flanking_sequences UNIQUE Text UNIQUE Text<br />
Mapping_target UNIQUE ?Sequence <br />
Source_location UNIQUE Int UNIQUE ?Sequence UNIQUE Int UNIQUE Int UNIQUE #Evidence //source data, <WSversion> ?Sequence pos1 pos2 Evidence(Paper/person etc. remarks)<br />
DNA_text UNIQUE ?Text // for storing the sequence of the feature...can use IUPAC codes to be able<br />
// store consensus sequences, e.g. binding site consensus sequence<br />
Origin Species UNIQUE ?Species //added by pad, as we are moving towards multi species readyness.<br />
Strain UNIQUE ?Strain//added by pad, as we are moving towards multi strain readyness.<br />
History Merged_into UNIQUE ?Feature XREF Acquires_merge #Evidence<br />
Acquires_merge ?Feature XREF Merged_into #Evidence<br />
Deprecated Text #Evidence <br />
Visible Description ?Text<br />
SO_term ?SO_term<br />
Defined_by Defined_by_sequence ?Sequence XREF Defines_feature #Evidence<br />
Defined_by_paper ?Paper XREF Feature #Evidence<br />
Defined_by_person ?Person<br />
Defined_by_author ?Author<br />
Defined_by_analysis ?Analysis Int<br />
Score Float Text #Evidence // this would be a log score as indicated by the analysis used in gff dump<br />
Associations Associated_with_gene ?Gene XREF Associated_feature #Evidence // richard<br />
Associated_with_CDS ?CDS XREF Associated_feature #Evidence // richard<br />
Associated_with_transcript ?Transcript XREF Associated_feature #Evidence // richard<br />
Associated_with_pseudogene ?Pseudogene XREF Associated_feature #Evidence // richard<br />
Associated_with_transposon ?Transposon XREF Associated_feature #Evidence //richard<br />
Associated_with_variation ?Variation XREF Feature #Evidence<br />
Associated_with_Position_Matrix ?Position_Matrix XREF Associated_feature #Evidence<br />
Associated_with_operon ?Operon XREF Associated_feature #Evidence<br />
Associated_with_Interaction ?Interaction XREF Feature_interactor<br />
Associated_with_expression_pattern ?Expr_pattern XREF Associated_feature #Evidence <br />
Associated_with_Feature ?Feature XREF Associated_with_Feature #Evidence<br />
Associated_with_construct ?Construct XREF Sequence_feature<br />
Bound_by_product_of ?Gene XREF Gene_product_binds #Evidence //pad added this to show what gene it binds<br />
Transcription_factor UNIQUE ?Transcription_factor XREF Binding_site <br />
Annotation UNIQUE ?LongText // added for data attribution [030220 dl]<br />
Confidential_remark ?Text //pad<br />
Remark ?Text #Evidence<br />
Method UNIQUE ?Method<br />
</pre><br />
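Several of the problems in Matters Arising (wrong SO_term on TF_binding_site Features, TF_binding_site Features lacking a Transcription_factor) were suggested for a consistency-check script. A minimal Python sketch, assuming Features have been exported to dictionaries keyed by tag name; the export format, the `EXPECTED_SO` table and the function names are hypothetical.

```python
# Expected SO term per Method. SO:0000235 (TF_binding_site) is from the discussion above;
# SO:0000165 (enhancer) is from the lin-3 enhancer example. Extend as further Methods are agreed.
EXPECTED_SO = {"TF_binding_site": "SO:0000235", "enhancer": "SO:0000165"}


def check_feature(feature_id, feature):
    """Return problems for one Feature, given as a dict of tag -> value (hypothetical export format)."""
    problems = []
    method = feature.get("Method")
    expected = EXPECTED_SO.get(method)
    # SO_term is multi-valued in the model, so it is held as a list here.
    if expected is not None and expected not in feature.get("SO_term", []):
        problems.append(f"{feature_id}: Method {method} but SO_term lacks {expected}")
    if method == "TF_binding_site" and not feature.get("Transcription_factor"):
        problems.append(f"{feature_id}: TF_binding_site without Transcription_factor")
    return problems


def check_all(features):
    """Run check_feature over {feature_id: feature}, in ID order."""
    problems = []
    for feature_id in sorted(features):
        problems.extend(check_feature(feature_id, features[feature_id]))
    return problems
```

Keeping the Method-to-SO mapping in one table means new checks for other Methods (promoter, regulatory_region, ...) only need a new entry once the team agrees on the terms.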
<br />
<br />
'''Duplicated Features'''<br />
The following are a set of duplicated regulatory Features. I (Gary) am probably the main culprit in not checking to see if there is a pre-existing Feature.<br />
How can we avoid this in future?<br />
* The list of duplicated Features has been moved to the Discussion page.<br />
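One way to catch duplicates like those noted in Matters Arising before a new Feature is requested is to compare coordinates against existing Features. A minimal Python sketch, assuming Feature positions are available as (ID, parent sequence, start, end) tuples, e.g. from an SMap dump, and that reverse-strand Features are stored with start greater than end; the input format and the function name are hypothetical.

```python
from collections import defaultdict


def find_positional_duplicates(features):
    """Group Feature IDs that cover the same span of the same parent sequence.

    `features` is an iterable of (feature_id, parent, start, end). Sorting the two
    coordinates lets opposite-strand copies of one site fall into the same group.
    """
    groups = defaultdict(list)
    for feature_id, parent, start, end in features:
        key = (parent, min(start, end), max(start, end))
        groups[key].append(feature_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]
```

Method is deliberately left out of the grouping key, since duplicates such as WBsf919641 and WBsf919607 described the same site under different Methods.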
<br />
'''Papers to curate'''<br />
* I have taken the Sequence Feature Papers from RT and assigned the four of us 10 papers each.<br />
* Just in case there is any paper here that has already been curated for Interaction, I've added the number of Interaction objects linked to by the paper. If this is not helpful, feel free to remove it.<br />
<br />
<pre><br />
WBPaper00002925 Daniela Interaction: 8<br />
Time: 10 mins<br />
New objects: none. the paper was referring to a couple of sub-element enhancers already described in Okkema and Fire 1994- Development - B and C sub-elements -WBPaper00002011<br />
Location of information: - <br />
Comments: <br />
WBPaper00004568 Daniela Interaction: 2<br />
Time: 25 mins<br />
New objects: 3 promoters- pA, pB, and pC<br />
Location of information: body text and fig.3 for promoters. Figures 5, 6 and 7 and body text for TF binding sites.<br />
Comments: Expression objects were already generated by Wen in the past. Daniela will add only the WBsf once ready. Also add WBsf object to the construct objects. See RT 418146 as well.<br />
potential cis-regulation objects for Xiaodong<br />
WBPaper00005842 Daniela<br />
Time: 3 hours <br />
New objects:<br />
mk125-132 (4331-4474) is sufficient to drive strong expression in vulC and vulD. Expr11838 add feature and generate construct<br />
<br />
mk84-148 contains two regions that together confer strong expression in VulE and VulF. Expr11839 add feature and generate construct<br />
<br />
mk50-51 (1052-1438) is sufficient to confer AC, vulE, and vulA expression, as well as uterine cell expression. Expr11840 add feature and generate construct.<br />
<br />
mk96-144 (2290-2522) expresses in AC in all animals observed. Expr11841 add feature and generate construct.<br />
<br />
mk66-67 (4434-4997) is sufficient to confer vulval cell expression. Expr11842 add feature and generate construct.<br />
<br />
mk135-134 (2412-3419) is sufficient to confer vulval cell expression. N.B.: the text says mk135-134, but Fig. 4C appears to show mk135-143. Expr11843 add feature and generate construct.<br />
<br />
Location of information: The nucleotide sequences for pertinent regions are shown in Figs. 5, 6, and 7 of the Supplemental Material.<br />
Comment: It would be good for a sequence curator to read the paper unbiased, without checking what I have curated for expression, and see whether we identify the same regions<br />
Comment: potential TF binding sites<br />
Comment: Should we curate Briggsae? -Paragraph 'Analysis of C Briggsae upstream regions'<br />
<br />
WBPaper00024328 Daniela Interaction: 4<br />
Time: 2.5 hours<br />
New objects: Added 4 Expression objects for 5 different constructs (pPHA2::GFP-A was already present):<br />
<br />
pPHA2::GFP-A body text and Figs. 5 and 6 Expr3093 Add Sf to expr object and to construct <br />
pPHA2::GFP-B body text and Figs. 5 and 6 Expr11835 Add Sf to expr object and to construct<br />
pPHA2::GFP-C body text and Figs. 5 and 6 Expr11835 Add Sf to expr object and to construct<br />
pPHA2::GFP-D body text and Figs. 5 and 6 Expr11837 Add Sf to expr object and to construct<br />
pPHA2::GFP-E body text and Figs. 5 and 6 Expr11836 Add Sf to expr object and to construct<br />
pPHA2::GFP-F body text and Figs. 5 and 6 Expr11834 Add Sf to expr object and to construct<br />
Detailed construction of plasmids in Materials and Methods<br />
Location of information: body text and Figs. 5 and 6 <br />
Comments: not sure whether this paper is curatable for sequence features per se, given the nature of the constructs. Would be good to discuss <br />
WBPaper00028802 Daniela<br />
Time: 5 min<br />
Location: table 2 and body text<br />
New objects: <br />
Comment: potential paper on cis-regulation (no evidence found), WBPmat00000004 is associated with this paper - XW<br />
Comment: Table 2 gives a list of experimentally verified GATA sites in other papers. These should be added to the curation status form - GW<br />
Comment: no Features can be made from this paper.<br />
Comment: Table 2 features refer to: <br />
WBPaper00001523 -> this paper has already been curated for features -> not entered in RT<br />
WBPaper00001864 -> this paper has already been curated for features -> not entered in RT<br />
WBPaper00002234 -> this paper has already been curated for features -> not entered in RT<br />
WBPaper00003700 -> this paper has already been curated for features -> not entered in RT<br />
WBPaper00006024 -> this paper has already been curated for features -> not entered in RT<br />
WBPaper00024977 -> this paper has already been curated for features -> not entered in RT<br />
Features sent to RT:<br />
please check WBPaper00003232, Britton et al. 1998. Found through WBPaper00028802 -table 2<br />
please check WBPaper00024333 and compare the feature associated with it to the one listed in table 2 of WBPaper00028802. Maybe a new GATA binding site feature should be generated?<br />
please check WBPaper00024976, Fukushige et al. 2005. Found through WBPaper00028802 -table 2<br />
<br />
WBPaper00028915 Daniela<br />
Time: 15 mins<br />
Comments: No features in this paper<br />
WBPaper00029140 Daniela<br />
Time: 10 minutes<br />
Comment: conserved intergenic inverted repeat structures in C. briggsae and C. remanei. It would be good to have a sequence curator check this one<br />
<br />
WBPaper00029255 Daniela Interaction: 55 (21 in postgres only)<br />
Time: 15 min<br />
New objects: potential feature of NRE, LCS, and LCE<br />
Location: figure 6, body text, page 560<br />
Comment: potential cis-regulation objects for Xiaodong<br />
<br />
WBPaper00030829 Daniela Interaction: 5<br />
WBPaper00030933 Daniela Interaction: 1<br />
WBPaper00003929 Gary Interaction: 25<br />
Time: 4 hours<br />
New objects: none - 6 existing Features were corrected and updated. (WBsf019182, WBsf019181, WBsf019179, WBsf019177, WBsf019178, WBsf019180)<br />
Location of information: body text and in a figure in the supplemental table of the paper WBPaper00006376<br />
Other curator data: "lin-41 and lin-42 are negatively regulated by let-7"<br />
Comments: This is Gary Ruvkun's (Nature 2000) let-7 paper that started the miRNA field!<br />
Comments: This is being marked as a pair of 'binding_site' Features because this is miRNA binding, not TF binding.<br />
See also: let-7, lin-41 binding WBPaper00006376 PubMed: 14729570 (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC324419/) - This is the paper that defined the binding sites.<br />
Comments (XW): curated one new cis-regulation object with Gary's WBsfs<br />
<br />
WBPaper00005044 Gary Interaction: 20<br />
Time: 3 hours<br />
New objects: made ire-1 binding site Feature (WBsf977530)<br />
Location of information: body of this paper and WBPaper00005036<br />
Other curator data: IRE-1 is a stress-activated endonuclease resident in the ER that is conserved in all known eukaryotes. IRE-1 mediated unconventional splicing of an intron from xbp-1 mRNA controls expression of the encoded transcription factor and is required for upregulation of most UPR target genes.<br />
See also: WBPaper00005036<br />
WBPaper00005971 Gary Interaction: 16 This has already been done by Gary: Feature WBsf718850<br />
WBPaper00024440 Gary Interaction: 1<br />
Time: 3 hours<br />
New objects: Made WBsf977532, WBsf977533 M-2 motif/daf-12 binding sites for ceh-22<br />
New objects: Made WBsf977534, WBsf977535 M-2 motif/daf-12 binding sites for myo-2<br />
Location of information: body of this paper and Figure Supplemental 3<br />
Other curator data: "We conclude that regulation of myo-2 and ceh-22 during dauer development depends critically on the M-2 motif."<br />
Comments: lots of hypothetical binding sites for daf-12 in 90-odd genes, but these have not been experimentally confirmed.<br />
WBPaper00028816 Gary<br />
Time: 30 mins<br />
New objects: none<br />
Location of information: body of this paper<br />
Other curator data: "recruitment sites are widely distributed along X to bind the DCC and nucleate DCC spreading to X regions lacking recruitment sites"<br />
Comments: This paper determines the A and B motifs of the Dosage Compensation Complex (DCC). There are hundreds of thousands of sites of varying strength.<br />
WBPaper00028986 Gary Interaction: 16<br />
WBPaper00029181 Gary<br />
WBPaper00029327 Gary Interaction: 33<br />
WBPaper00030849 Gary<br />
WBPaper00031355 Gary Interaction: 5<br />
WBPaper00004181 Mary Ann Interaction: 8 <br />
No Features. Took 5 mins to read. <br />
WBPaper00005056 Mary Ann <br />
Time: 35 mins<br />
New objects: 37 - not yet curated.<br />
Location of information: Mainly in fig. 1, but in body text. <br />
Comments: TRTTKRY element in promoter region of T05E11.3, D2096.6, C44H4.1, ZK816.4, ceh-22, <br />
tph-1, M05B5.2, myo-2. Bound by pha-4 <br />
WBPaper00006429 Mary Ann Interaction: 3<br />
Time: 1/2hr<br />
New objects: 1<br />
Location of information: In body of text and fig. 3B<br />
Comments: Alludes to GATA binding sites, but no experimental data. <br />
Comments (dr): Added Expr11833<br />
WBPaper00024505 Mary Ann<br />
Time: <br />
New objects: <br />
Location of information: <br />
Comments: <br />
WBPaper00028849 Mary Ann<br />
Time: 45 mins<br />
New objects: 1<br />
Location of information: In body of text and fig. 3B<br />
Comments (dr): Added Expr11832<br />
WBPaper00029058 Mary Ann Interaction: 50<br />
WBPaper00029190 Mary Ann<br />
WBPaper00029406 Mary Ann Interaction: 145<br />
WBPaper00030877 Mary Ann Interaction: 8<br />
WBPaper00031471 Mary Ann<br />
Time: 1hr<br />
New objects: 1<br />
Location of information: In body of text and fig. 5f<br />
Comments: Check that alleles of asd-2 in fig 3 are curated. yb1422, yb1415, yb1470, yb1419, yb1423<br />
WBPaper00004482 Xiaodong Interaction: 2<br />
Time: two hours<br />
New objects: request new features for more interaction objects<br />
Location of information: Figure 5A<br />
Comments: HOX/CEH-20 binding sites in the hlh-8 promoter. The Features will be used in cis-regulation and physical interaction objects.<br />
Duplication: yes. Only one Interaction object (WBInteraction000001291) is currently associated with the paper.<br />
WBPaper00005609 Xiaodong Interaction: 20 This has already been done by Margaret: Feature WBsf019097, WBsf038788, WBsf038789, WBsf038790, WBsf038791, WBsf038793, WBsf038794, WBsf019098, WBsf038792<br />
Time: 50 mins<br />
New objects: three interaction objects<br />
Location of information: In body of text and fig. 5<br />
Comments: Only WBsf038790 and WBsf038791 are useful. Not sure why the other Features exist; they seem to be redundant.<br />
WBPaper00024189 Xiaodong<br />
Time: 10 mins<br />
Comments: no features in this paper<br />
WBPaper00024981 Xiaodong<br />
Time: 40 mins<br />
New objects: request through RT<br />
Location of information: figure 2<br />
Comments: MED-1 binding sites in the end-1 and end-3 promoters<br />
Comments (dr): Expr11831 generated for the end-1 and end-3 promoter regions. Add the WBsf ID and the related construct once available. Paper checked as it came through RT.<br />
WBPaper00028910 Xiaodong Interaction: 7<br />
Time: 30 mins<br />
New objects: request through RT<br />
Location of information: in body text and figure 4<br />
Comments: MEF-2 direct binding site in the str-1 promoter, plus a few minimal regulatory regions mentioned in the body text<br />
<br />
WBPaper00029109 Xiaodong<br />
<br />
WBPaper00029229 Xiaodong Interaction: 3<br />
WBPaper00030809 Xiaodong<br />
WBPaper00030931 Xiaodong Interaction: 18<br />
WBPaper00031565 Xiaodong Interaction: 1<br />
Time: 30 mins<br />
New objects: request through RT<br />
Location of information: in body text and figure 2 and figure 3<br />
Comments: PBC-HOX binding sites S1 and S2; cis-regulatory elements E1 and E2<br />
<br />
</pre><br />
* I suggest we curate 5 of our papers before the jamboree, taking notes as described above and any other things that arise during your curation.<br />
<br />
'''Useful documents:'''<br />
*meeting notes: https://docs.google.com/document/d/1gkxZjGyyxvPF6qwg6bBzntYCAPaCjLHCKoN0fqGQ48Y/edit<br />
*github tickets<br />
**create feature summary page: https://github.com/WormBase/website/issues/3161</div>Xdwanghttps://wiki.wormbase.org/index.php?title=Working_Group:Sequence_Features&diff=24778Working Group:Sequence Features2014-09-16T17:37:48Z<p>Xdwang: </p>
<hr />
<div><br />
== Headline text ==<br />
'''Topics'''<br />
<br />
* Display of Sequence Features on the website<br />
** See github ticket 2867 https://github.com/WormBase/website/issues/2867#issuecomment-48041365<br />
** Also display Transcription_factor and Gene_product_binds data.<br />
** In the Expression section of the Gene page, highlight the objects that are related to Sequence Features. Add a type called cis-regulatory-element to the table?<br />
** Add a Sequence Feature table to the Sequence widget to display all Features related to the gene.<br />
** In the Cytoscape display, show each Sequence Feature as an entity node.<br />
<br />
* Streamline the workflow<br />
** How can we automatically identify Sequence Feature papers?<br />
** How do papers come in now? SVM? Textpresso string matches?<br />
<br />
* Improving data flow<br />
** How can we make the data immediately available to all curators?<br />
*** Add Paper and Public_name fields to the Features in the Nameserver?<br />
*** geneace is available for download and is updated every day; Caltech takes a copy.<br />
**** still need WBsf object details in order to relate/curate regulation/expression/construct objects<br />
<br />
<br />
* Sample paper curation<br />
** Prepare a paper list from each person<br />
* Assign meaningful names to the Public_name field, e.g. 'distal enhancer 1' instead of 'DE1'<br />
** gw3 - I disagree with this. We should use the name that is used to describe the region in the paper, rather than making up our own names. I have had to go back and re-annotate regions after Xiaodong found problems with my stuff several months after I had originally done it and I really needed to be able to unambiguously identify which region the paper was talking about by using the Public_name field and matching this up to the name of the region in the paper. Users may well need to do this as well if they are reading the paper and looking at the Features marking up the regions.<br />
** dr - ok sounds good to me Gary<br />
<br />
'''Practical Issues'''<br />
<br />
* Gary and I are staying at the Ramada - 800 Morrissey Boulevard, Freeport St, Boston, MA 02122 US<br />
* We arrive in Boston on Saturday evening and fly home on Wednesday night. <br />
* What time shall we meet on Monday morning and where do we go? <br />
**I (Xiaodong) can pick you guys up at the Harvard Square T station (Red Line, info center on the square) at, say, 9:30 am. We can walk to my office.<br />
* Can you confirm that we have office space with internet access? I will bring my MacBook. Hopefully I can tunnel into my Sanger working space.<br />
**There is enough office space for sure. Internet access is through the Harvard guest network; other ways of connecting are not allowed.<br />
* Thanks Xiaodong, I'm bringing my Chromebook - Gary<br />
<br />
'''Pre-Jamboree prep'''<br />
<br />
* Suggestions for work<br />
** Curate some (5? 10?) papers from our lists and collate data as follows<br />
*** How long did each paper take to curate?<br />
*** How many new objects did you add to WormBase?<br />
*** Where in the paper was the information (e.g. supplementary data, figure legends, within text)?<br />
*** Did you need to contact the author for more information?<br />
*** Did you come across data for other curators?<br />
**** gw3 - I don't know in detail what data other curators require. It would be useful to know this so I can summarise data for others.<br />
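The Time: entries in the curation notes below are free text ('30 mins', '1/2hr', 'two hours', '2.5 hours'), so collating them needs some normalisation. A minimal Python sketch, assuming the Time: values have been copied out by hand as (curator, text) pairs; `to_minutes`, `total_minutes` and the small word-number table are hypothetical helpers, not part of any WormBase tool:

```python
import re
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical lookup for spelled-out quantities seen in the notes (extend as needed).
WORD_NUMBERS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}

# A quantity (fraction, decimal, or word) followed by an hour or minute unit.
TIME_RE = re.compile(r"^(\d+/\d+|\d+(?:\.\d+)?|[a-z]+)\s*(hours?|hrs?|minutes?|mins?)\b")


def to_minutes(text: str) -> Optional[float]:
    """Convert a free-text 'Time:' entry to minutes, or None if it cannot be parsed."""
    match = TIME_RE.match(text.strip().lower())
    if not match:
        return None
    qty, unit = match.groups()
    if "/" in qty:
        num, den = (int(part) for part in qty.split("/"))
        if den == 0:
            return None
        value = num / den
    elif qty in WORD_NUMBERS:
        value = float(WORD_NUMBERS[qty])
    else:
        try:
            value = float(qty)
        except ValueError:  # e.g. "few hours"
            return None
    return value * 60 if unit.startswith("h") else value


def total_minutes(entries: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Sum the parseable times per curator; entries are (curator, time_text) pairs."""
    totals: Dict[str, float] = {}
    for curator, text in entries:
        minutes = to_minutes(text)
        if minutes is not None:
            totals[curator] = totals.get(curator, 0.0) + minutes
    return totals


if __name__ == "__main__":
    notes = [("Gary", "4 hours"), ("Gary", "30 mins"), ("Mary Ann", "1/2hr"), ("Xiaodong", "two hours")]
    print(total_minutes(notes))
```

Entries it cannot parse (blank times, or phrasing such as 'few hours') are skipped rather than guessed, so the totals are lower bounds.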
<br />
'''The Jamboree 15-17th Sept 2014'''<br />
<br />
* Suggestions for work<br />
** Discuss the results from above<br />
** Work through some papers which have already been curated and have different data types (e.g. promoter and gene regulation) to identify best curation practices.<br />
** browsing capabilities through WormMine?<br />
** Should we put the list of papers in the Caltech curation status form?<br />
<br />
<br />
'''Matters Arising'''<br />
<br />
* things noted - not necessarily to do with the topic in hand<br />
** There are many duplicates of Interaction objects in geneace, with or without two leading digits. Sometimes there is only the shorter form.<br />
** The Features WBsf919641 and WBsf919607 are at the same position. Both are DAF-3 binding sites. WBsf919641 has Method "binding_site"; WBsf919607 has Method "TF_binding_site", which I think is more correct as DAF-3 is a TF. WBsf919641 is based on the paper WBPaper00004526; WBsf919607 is based on WBPaper00003384. WBsf919641 is associated with WBGene00202278; WBsf919607 is associated with WBGene00003514 (myo-2). The paper WBPaper00004526 appears to be about a PEB-1 binding site, not a DAF-3 site. I made the Feature WBsf919609 for the PEB-1 site based on WBPaper00004526 in the last round of curation. Something is not right here.<br />
*** Follow-up from Mary Ann. I agree the Method for WBsf919641 should be TF_binding_site. Associated_with_gene should be updated to myo-2 (this is an error). The SO term is not right either. These then become duplicate entries. <br />
** The Features WBsf019227 and WBsf038813 are in the same position, but on opposite strands. Both are for a LAG-1 binding site associated with the gene lin-11. One is from WBPaper00005357, the other from WBPaper00032298. WBsf019227 needs to be retired and merged into WBsf038813.<br />
*** Follow-up from Mary Ann. We have no merged_into tag structure in Feature. We have History tag and could add a Comment there. <br />
** WBsf038819 has a Method of 'regulatory_region', but the Public_name and Description describe it as an Enhancer. Which is it?<br />
*** Fixed by Mary Ann. Changed Method to enhancer. <br />
** WBsf919669 is an Enhancer, but it has a tag for a Transcription_factor WBTranscriptionFactor000101. It is far too large to be a binding site for a transcription factor. I think this tag should be removed.<br />
*** Follow-up from Mary Ann. lag-1 does bind somewhere in this region, but the paper does not allow me to specify exactly where. Q. Should we curate these then? I think we should, but should maybe add suitable Remarks. <br />
** 393 out of the 500 TF_binding_site Features have an incorrect SO_term tag. It should be set to: SO:0000235<br />
*** Follow-up by Mary Ann. This seems really high. I wonder whether TF_binding_site was not a recognised SO term when some of these were curated, especially the older ones. This (and other consistency checks) should be added to a script we can run.<br />
** 10 Features with Method = "TF_binding_site" are lacking a Transcription_factor tag.<br />
*** Follow-up by Mary Ann. Maybe it's not clear in the paper what the TF is? Something to add to consistency check. <br />
* How many Methods are there for Feature? enhancer, regulatory_region, TF_binding_site, promoter…?<br />
**How should these Methods be defined?<br />
** When displaying on JBrowse/GBrowse, do we want to display the WBsfxxxxx IDs or the Methods?<br />
* Gary does a quick literature search to see what other work has been done in the sites described.<br />
**Along these lines, should we add WBPaper00002925 to the WBsf Features from WBPaper00002011? I mean, should Features be associated with both papers, since both mention them?<br />
*** The Feature objects are used to describe 'real', functional regions of the genome in as much detail as possible. Each paper that provides evidence for that region being 'real' should be added as evidence for the Feature. The Primary object we are trying to describe here is the region of the genome, not the papers. (I may have missed the point you were making - Gary).<br />
*** agree with Gary (unless I have also missed the point!). All relevant papers should be added. There is a loose convention in Allele curation that the paper listed under Evidence (which can only have one value) is the "primary" paper and the ones listed under Reference (including the primary) are all papers which describe or discuss the object as well. Also individual Evidence can be added to tags. Maybe we should look to extend this in the Feature model. <br />
<pre><br />
?Feature SMap S_parent UNIQUE Sequence UNIQUE ?Sequence XREF Feature_object<br />
Name Public_name UNIQUE ?Text<br />
Other_name ?Text<br />
Sequence_details Flanking_sequences UNIQUE Text UNIQUE Text<br />
Mapping_target UNIQUE ?Sequence <br />
Source_location UNIQUE Int UNIQUE ?Sequence UNIQUE Int UNIQUE Int UNIQUE #Evidence //source data, <WSversion> ?Sequence pos1 pos2 Evidence(Paper/person etc. remarks)<br />
DNA_text UNIQUE ?Text // for storing the sequence of the feature; can use IUPAC codes to be able to<br />
// store consensus sequences, e.g. a binding site consensus sequence<br />
Origin Species UNIQUE ?Species //added by pad, as we are moving towards multi-species readiness.<br />
Strain UNIQUE ?Strain //added by pad, as we are moving towards multi-strain readiness.<br />
History Merged_into UNIQUE ?Feature XREF Acquires_merge #Evidence<br />
Acquires_merge ?Feature XREF Merged_into #Evidence<br />
Deprecated Text #Evidence <br />
Visible Description ?Text<br />
SO_term ?SO_term<br />
Defined_by Defined_by_sequence ?Sequence XREF Defines_feature #Evidence<br />
Defined_by_paper ?Paper XREF Feature #Evidence<br />
Defined_by_person ?Person<br />
Defined_by_author ?Author<br />
Defined_by_analysis ?Analysis Int<br />
Score Float Text #Evidence // this would be a log score as indicated by the analysis used in gff dump<br />
Associations Associated_with_gene ?Gene XREF Associated_feature #Evidence // richard<br />
Associated_with_CDS ?CDS XREF Associated_feature #Evidence // richard<br />
Associated_with_transcript ?Transcript XREF Associated_feature #Evidence // richard<br />
Associated_with_pseudogene ?Pseudogene XREF Associated_feature #Evidence // richard<br />
Associated_with_transposon ?Transposon XREF Associated_feature #Evidence //richard<br />
Associated_with_variation ?Variation XREF Feature #Evidence<br />
Associated_with_Position_Matrix ?Position_Matrix XREF Associated_feature #Evidence<br />
Associated_with_operon ?Operon XREF Associated_feature #Evidence<br />
Associated_with_Interaction ?Interaction XREF Feature_interactor<br />
Associated_with_expression_pattern ?Expr_pattern XREF Associated_feature #Evidence <br />
Associated_with_Feature ?Feature XREF Associated_with_Feature #Evidence<br />
Associated_with_construct ?Construct XREF Sequence_feature<br />
Bound_by_product_of ?Gene XREF Gene_product_binds #Evidence //pad added this to show what gene it binds<br />
Transcription_factor UNIQUE ?Transcription_factor XREF Binding_site <br />
Annotation UNIQUE ?LongText // added for data attribution [030220 dl]<br />
Confidential_remark ?Text //pad<br />
Remark ?Text #Evidence<br />
Method UNIQUE ?Method<br />
</pre><br />
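The consistency checks suggested under Matters Arising (TF_binding_site Features whose SO_term is not SO:0000235, and TF_binding_site Features without a Transcription_factor tag) could be scripted along these lines. This is a minimal Python sketch, assuming the relevant ?Feature tags (Method, SO_term, Transcription_factor) have already been exported into simple records, for example from a geneace dump; the `FeatureRecord` type and the check functions are hypothetical and do not read ACeDB directly:

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional

# Value given in the notes for TF_binding_site Features.
TF_BINDING_SITE_SO = "SO:0000235"


@dataclass
class FeatureRecord:
    """Hypothetical flattened copy of a few ?Feature tags (not an ACeDB API)."""

    feature_id: str
    method: Optional[str] = None
    so_terms: List[str] = field(default_factory=list)  # SO_term is not UNIQUE in the model
    transcription_factor: Optional[str] = None


def check_feature(f: FeatureRecord) -> List[str]:
    """Return a list of problems for one Feature (empty if it passes)."""
    problems: List[str] = []
    if f.method == "TF_binding_site":
        if f.so_terms != [TF_BINDING_SITE_SO]:
            found = ", ".join(f.so_terms) or "none"
            problems.append(f"{f.feature_id}: SO_term should be {TF_BINDING_SITE_SO}, found {found}")
        if not f.transcription_factor:
            problems.append(f"{f.feature_id}: TF_binding_site lacks a Transcription_factor tag")
    return problems


def check_all(features: Iterable[FeatureRecord]) -> List[str]:
    """Run check_feature over every record and collect the problems in order."""
    return [problem for f in features for problem in check_feature(f)]


if __name__ == "__main__":
    records = [
        FeatureRecord("example_ok", "TF_binding_site", ["SO:0000235"], "example_TF"),
        FeatureRecord("example_bad", "TF_binding_site", ["SO:0000001"]),
        FeatureRecord("example_enhancer", "enhancer"),
    ]
    for line in check_all(records):
        print(line)
```

Further rules (for example, a Transcription_factor tag on an enhancer-sized Feature such as WBsf919669) could be added as extra branches in `check_feature`, although the notes above show that case is still under discussion.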
<br />
<br />
'''Duplicated Features'''<br />
The following are a set of duplicated regulatory Features. I (Gary) am probably the main culprit in not checking to see if there is a pre-existing Feature.<br />
How can we avoid this in future?<br />
* The list of duplicated Features has been moved to the Discussion page.<br />
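One way to catch duplicates before they are made is to compare coordinates: both pairs noted under Matters Arising (WBsf919641/WBsf919607, and WBsf019227/WBsf038813 on opposite strands) share the same position. A minimal Python sketch, assuming Feature coordinates can be exported as (ID, parent sequence, start, end) and that, as in ACeDB-style mappings, a reverse-strand Feature is written with start > end; `FeatureSpan`, `find_colocated` and `existing_at` are hypothetical names:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class FeatureSpan(NamedTuple):
    """Hypothetical (ID, parent sequence, start, end) export of one Feature."""

    feature_id: str
    sequence: str
    start: int
    end: int


def _span(start: int, end: int) -> Tuple[int, int]:
    """Strand-independent span; reverse-strand Features are assumed to have start > end."""
    return (min(start, end), max(start, end))


def find_colocated(spans: Iterable[FeatureSpan]) -> Dict[Tuple[str, int, int], List[str]]:
    """Group Features that cover exactly the same span on the same sequence."""
    groups: Dict[Tuple[str, int, int], List[str]] = defaultdict(list)
    for s in spans:
        groups[(s.sequence, *_span(s.start, s.end))].append(s.feature_id)
    return {key: sorted(ids) for key, ids in groups.items() if len(ids) > 1}


def existing_at(spans: Iterable[FeatureSpan], sequence: str, start: int, end: int) -> List[str]:
    """IDs of Features already at this exact span (check before making a new one)."""
    target = _span(start, end)
    return sorted(
        s.feature_id
        for s in spans
        if s.sequence == sequence and _span(s.start, s.end) == target
    )
```

This only catches identical spans; overlapping but non-identical regions (for example, a binding site nested inside an enhancer) would need an interval-overlap test instead.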
<br />
'''Papers to curate'''<br />
* I have taken the Sequence Feature Papers from RT and assigned the four of us 10 papers each.<br />
* Just in case there is any paper here that has already been curated for Interaction, I've added the number of Interaction objects linked to by the paper. If this is not helpful, feel free to remove it.<br />
<br />
<pre><br />
WBPaper00002925 Daniela Interaction: 8<br />
Time: 10 mins<br />
New objects: none. The paper refers to two sub-element enhancers (the B and C sub-elements) already described in Okkema and Fire 1994, Development (WBPaper00002011).<br />
Location of information: - <br />
Comments: <br />
WBPaper00004568 Daniela Interaction: 2<br />
Time: 25 mins<br />
New objects: 3 promoters: pA, pB, and pC<br />
Location of information: body text and fig.3 for promoters. Figures 5, 6 and 7 and body text for TF binding sites.<br />
Comments: Expression objects were already generated by Wen in the past. Daniela will add only the WBsf once ready. Also add WBsf object to the construct objects. See RT 418146 as well.<br />
potential cis-regulation objects for Xiaodong<br />
WBPaper00005842 Daniela<br />
Time: 3 hours <br />
New objects:<br />
mk125-132 (4331-4474) is sufficient to drive strong expression in vulC and vulD. Expr11838 add feature and generate construct<br />
<br />
mk84-148 contains two regions that together confer strong expression in VulE and VulF. Expr11839 add feature and generate construct<br />
<br />
mk50-51 (1052-1438) is sufficient to confer AC, vulE, and vulA expression, as well as uterine cell expression. Expr11840 add feature and generate construct.<br />
<br />
mk96-144 (2290-2522) expresses in AC in all animals observed. Expr11841 add feature and generate construct.<br />
<br />
mk66-67 (4434-4997) is sufficient to confer vulval cell expression. Expr11842 add feature and generate construct.<br />
<br />
mk135-134 (2412-3419) is sufficient to confer vulval cell expression. N.B.: the text says mk135-134, but Fig. 4C appears to show mk135-143. Expr11843 add feature and generate construct.<br />
<br />
Location of information: The nucleotide sequences for pertinent regions are shown in Figs. 5, 6, and 7 of the Supplemental Material.<br />
Comment: It would be good for a sequence curator to read the paper without bias (i.e. without checking what I have curated for expression) to see whether we identify the same regions.<br />
Comment: potential TF binding sites<br />
Comment: Should we curate C. briggsae? See the paragraph 'Analysis of C. briggsae upstream regions'.<br />
<br />
WBPaper00024328 Daniela Interaction: 4<br />
Time: 2.5 hours<br />
New objects: Added 4 Expression objects for 5 different constructs (pPHA2::GFP-A was already present):<br />
<br />
pPHA2::GFP-A body text and Figs. 5 and 6 Expr3093 Add Sf to expr object and to construct <br />
pPHA2::GFP-B body text and Figs. 5 and 6 Expr11835 Add Sf to expr object and to construct<br />
pPHA2::GFP-C body text and Figs. 5 and 6 Expr11835 Add Sf to expr object and to construct<br />
pPHA2::GFP-D body text and Figs. 5 and 6 Expr11837 Add Sf to expr object and to construct<br />
pPHA2::GFP-E body text and Figs. 5 and 6 Expr11836 Add Sf to expr object and to construct<br />
pPHA2::GFP-F body text and Figs. 5 and 6 Expr11834 Add Sf to expr object and to construct<br />
Detailed construction of plasmids in Materials and Methods<br />
Location of information: body text and Figs. 5 and 6 <br />
Comments: Not sure whether this paper is curatable for Sequence Features per se, given the nature of the constructs. Would be good to discuss.<br />
WBPaper00028802 Daniela<br />
Time: 5 min<br />
Location: table 2 and body text<br />
New objects: <br />
Comment: potential paper on cis-regulation (no evidence found), WBPmat00000004 is associated with this paper - XW<br />
Comment: Table 2 gives a list of experimentally verified GATA sites in other papers. These should be added to the curation status form - GW<br />
Comment: no Features can be made from this paper.<br />
Comment: Features in Table 2 refer to: <br />
WBPaper00001523 -> this paper has already been curated for Features -> not entered in RT<br />
WBPaper00001864 -> this paper has already been curated for Features -> not entered in RT<br />
WBPaper00002234 -> this paper has already been curated for Features -> not entered in RT<br />
WBPaper00003700 -> this paper has already been curated for Features -> not entered in RT<br />
WBPaper00006024 -> this paper has already been curated for Features -> not entered in RT<br />
WBPaper00024977 -> this paper has already been curated for Features -> not entered in RT<br />
Features sent to RT:<br />
please check WBPaper00003232, Britton et al. 1998. Found through WBPaper00028802, Table 2.<br />
please check WBPaper00024333 and compare the feature associated with it to the one listed in Table 2 of WBPaper00028802. Maybe a new GATA binding site feature should be generated?<br />
please check WBPaper00024976, Fukushige et al. 2005. Found through WBPaper00028802, Table 2.<br />
<br />
WBPaper00028915 Daniela<br />
Time: 15 mins<br />
Comments: No features in this paper<br />
WBPaper00029140 Daniela<br />
Time: 10 minutes<br />
Comment: conserved intergenic inverted-repeat structures in C. briggsae and C. remanei. Would be good to have a sequence curator check this one.<br />
<br />
WBPaper00029255 Daniela Interaction: 55 (21 in postgres only)<br />
Time: 15 min<br />
New objects: potential feature of NRE, LCS, and LCE<br />
Location: figure 6, body text, page 560<br />
Comment: potential cis-regulation objects for Xiaodong<br />
<br />
WBPaper00030829 Daniela Interaction: 5<br />
WBPaper00030933 Daniela Interaction: 1<br />
WBPaper00003929 Gary Interaction: 25<br />
Time: 4 hours<br />
New objects: none - 6 existing Features were corrected and updated. (WBsf019182, WBsf019181, WBsf019179, WBsf019177, WBsf019178, WBsf019180)<br />
Location of information: body text and in a figure in the supplemental table of the paper WBPaper00006376<br />
Other curator data: "lin-41 and lin-42 are negatively regulated by let-7"<br />
Comments: This is Gary Ruvkun's (Nature 2000) let-7 paper that started the miRNA field!<br />
Comments: This is being marked as a pair of 'binding_site' Features because this is miRNA binding, not TF binding.<br />
See also: let-7, lin-41 binding WBPaper00006376 PubMed: 14729570 (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC324419/) - This is the paper that defined the binding sites.<br />
Comments (XW): curated one new cis-regulation object with Gary's WBsfs<br />
<br />
WBPaper00005044 Gary Interaction: 20<br />
Time: 3 hours<br />
New objects: made ire-1 binding site Feature (WBsf977530)<br />
Location of information: body of this paper and WBPaper00005036<br />
Other curator data: IRE-1 is a stress-activated endonuclease resident in the ER that is conserved in all known eukaryotes. IRE-1 mediated unconventional splicing of an intron from xbp-1 mRNA controls expression of the encoded transcription factor and is required for upregulation of most UPR target genes.<br />
See also: WBPaper00005036<br />
WBPaper00005971 Gary Interaction: 16 This has already been done by Gary: Feature WBsf718850<br />
WBPaper00024440 Gary Interaction: 1<br />
Time: 3 hours<br />
New objects: Made WBsf977532, WBsf977533 M-2 motif/daf-12 binding sites for ceh-22<br />
New objects: Made WBsf977534, WBsf977535 M-2 motif/daf-12 binding sites for myo-2<br />
Location of information: body of this paper and Figure Supplemental 3<br />
Other curator data: "We conclude that regulation of myo-2 and ceh-22 during dauer development depends critically on the M-2 motif."<br />
Comments: lots of hypothetical binding sites for daf-12 in 90-odd genes, but these have not been experimentally confirmed.<br />
WBPaper00028816 Gary<br />
Time: 30 mins<br />
New objects: none<br />
Location of information: body of this paper<br />
Other curator data: "recruitment sites are widely distributed along X to bind the DCC and nucleate DCC spreading to X regions lacking recruitment sites"<br />
Comments: This paper determines the A and B motifs of the Dosage Compensation Complex (DCC). There are hundreds of thousands of sites of varying strength.<br />
WBPaper00028986 Gary Interaction: 16<br />
WBPaper00029181 Gary<br />
WBPaper00029327 Gary Interaction: 33<br />
WBPaper00030849 Gary<br />
WBPaper00031355 Gary Interaction: 5<br />
WBPaper00004181 Mary Ann Interaction: 8 <br />
No Features. Took 5 mins to read. <br />
WBPaper00005056 Mary Ann <br />
Time: 35 mins<br />
New objects: 37 - not yet curated.<br />
Location of information: Mainly in fig. 1, but in body text. <br />
Comments: TRTTKRY element in the promoter regions of T05E11.3, D2096.6, C44H4.1, ZK816.4, ceh-22, <br />
tph-1, M05B5.2 and myo-2; bound by PHA-4.<br />
WBPaper00006429 Mary Ann Interaction: 3<br />
Time: 1/2hr<br />
New objects: 1<br />
Location of information: In body of text and fig. 3B<br />
Comments: Alludes to GATA binding sites, but no experimental data. <br />
Comments (dr): Added Expr11833<br />
WBPaper00024505 Mary Ann<br />
Time: <br />
New objects: <br />
Location of information: <br />
Comments: <br />
WBPaper00028849 Mary Ann<br />
Time: 45 mins<br />
New objects: 1<br />
Location of information: In body of text and fig. 3B<br />
Comments (dr): Added Expr11832<br />
WBPaper00029058 Mary Ann Interaction: 50<br />
WBPaper00029190 Mary Ann<br />
WBPaper00029406 Mary Ann Interaction: 145<br />
WBPaper00030877 Mary Ann Interaction: 8<br />
WBPaper00031471 Mary Ann<br />
Time: 1hr<br />
New objects: 1<br />
Location of information: In body of text and fig. 5f<br />
Comments: Check that alleles of asd-2 in fig 3 are curated. yb1422, yb1415, yb1470, yb1419, yb1423<br />
WBPaper00004482 Xiaodong Interaction: 2<br />
Time: two hours<br />
New objects: request new features for more interaction objects<br />
Location of information: Figure 5A<br />
Comments: HOX/CEH-20 binding sites in the hlh-8 promoter. The Features will be used in cis-regulation and physical interaction objects.<br />
Duplication: yes. Only one Interaction object (WBInteraction000001291) is currently associated with the paper.<br />
WBPaper00005609 Xiaodong Interaction: 20 This has already been done by Margaret: Feature WBsf019097, WBsf038788, WBsf038789, WBsf038790, WBsf038791, WBsf038793, WBsf038794, WBsf019098, WBsf038792<br />
Time: 50 mins<br />
New objects: three interaction objects<br />
Location of information: In body of text and fig. 5<br />
Comments: Only WBsf038790 and WBsf038791 are useful. Not sure why the other Features exist; they seem to be redundant.<br />
WBPaper00024189 Xiaodong<br />
Time: 10 mins<br />
Comments: no features in this paper<br />
WBPaper00024981 Xiaodong<br />
Time: 40 mins<br />
New objects: request through RT<br />
Location of information: figure 2<br />
Comments: MED-1 binding sites in the end-1 and end-3 promoters<br />
Comments (dr): Expr11831 generated for the end-1 and end-3 promoter regions. Add the WBsf ID and the related construct once available. Paper checked as it came through RT.<br />
WBPaper00028910 Xiaodong Interaction: 7<br />
Time: 30 mins<br />
New objects: request through RT<br />
Location of information: in body text and figure 4<br />
Comments: MEF-2 direct binding site in the str-1 promoter, plus a few minimal regulatory regions mentioned in the body text<br />
<br />
WBPaper00029109 Xiaodong<br />
<br />
WBPaper00029229 Xiaodong Interaction: 3<br />
WBPaper00030809 Xiaodong<br />
WBPaper00030931 Xiaodong Interaction: 18<br />
WBPaper00031565 Xiaodong Interaction: 1<br />
Time: 30 mins<br />
New objects: request through RT<br />
Location of information: in body text and figure 2 and figure 3<br />
Comments: PBC-HOX binding sites S1 and S2; cis-regulatory elements E1 and E2<br />
<br />
</pre><br />
* I suggest we curate 5 of our papers before the jamboree, taking notes as described above and any other things that arise during your curation.</div>Xdwanghttps://wiki.wormbase.org/index.php?title=Working_Group:Sequence_Features&diff=24776Working Group:Sequence Features2014-09-16T17:28:28Z<p>Xdwang: </p>
<hr />
<div><br />
== Headline text ==<br />
'''Topics'''<br />
<br />
* Display of Sequence Features on the website<br />
** See github ticket 2867 https://github.com/WormBase/website/issues/2867#issuecomment-48041365<br />
** Also display Transcription_factor and Gene_product_binds data.<br />
** In the Expression section of the Gene page, highlight the objects that are related to Sequence Features. Add a type called cis-regulatory-element to the table?<br />
** Add a Sequence Feature table to the Sequence widget to display all Features related to the gene.<br />
** In the Cytoscape display, show each Sequence Feature as an entity node.<br />
<br />
* Streamline the workflow<br />
** How can we automatically identify Sequence Feature papers?<br />
** How do papers come in now? SVM? Textpresso string matches?<br />
<br />
* Improving data flow<br />
** How can we make the data immediately available to all curators?<br />
*** Add Paper and Public_name fields to the Features in the Nameserver?<br />
*** geneace is available for download and is updated every day; Caltech takes a copy.<br />
**** still need WBsf object details in order to relate/curate regulation/expression/construct objects<br />
<br />
<br />
* Sample paper curation<br />
** Prepare a paper list from each person<br />
* Assign meaningful names to the Public_name field, e.g. 'distal enhancer 1' instead of 'DE1'<br />
** gw3 - I disagree with this. We should use the name that is used to describe the region in the paper, rather than making up our own names. I have had to go back and re-annotate regions after Xiaodong found problems with my stuff several months after I had originally done it and I really needed to be able to unambiguously identify which region the paper was talking about by using the Public_name field and matching this up to the name of the region in the paper. Users may well need to do this as well if they are reading the paper and looking at the Features marking up the regions.<br />
** dr - ok sounds good to me Gary<br />
<br />
'''Practical Issues'''<br />
<br />
* Gary and I are staying at the Ramada - 800 Morrissey Boulevard, Freeport St, Boston, MA 02122 US<br />
* We arrive in Boston on Saturday evening and fly home on Wednesday night. <br />
* What time shall we meet on Monday morning and where do we go? <br />
**I (Xiaodong) can pick you guys up at the Harvard Square T station (Red Line, info center on the square) at, say, 9:30 am. We can walk to my office.<br />
* Can you confirm that we have office space with internet access? I will bring my MacBook. Hopefully I can tunnel into my Sanger working space.<br />
**There is enough office space for sure. Internet access is through the Harvard guest network; other ways of connecting are not allowed.<br />
* Thanks Xiaodong, I'm bringing my Chromebook - Gary<br />
<br />
'''Pre-Jamboree prep'''<br />
<br />
* Suggestions for work<br />
** Curate some (5? 10?) papers from our lists and collate data as follows<br />
*** How long did each paper take to curate?<br />
*** How many new objects did you add to WormBase?<br />
*** Where in the paper was the information (e.g. supplementary data, figure legends, within text)?<br />
*** Did you need to contact the author for more information?<br />
*** Did you come across data for other curators?<br />
**** gw3 - I don't know in detail what data other curators require. It would be useful to know this so I can summarise data for others.<br />
<br />
'''The Jamboree 15-17th Sept 2014'''<br />
<br />
* Suggestions for work<br />
** Discuss the results from above<br />
** Work through some papers which have already been curated and have different data types (e.g. promoter and gene regulation) to identify best curation practices.<br />
** browsing capabilities through WormMine?<br />
** Should we put the list of papers in the Caltech curation status form?<br />
<br />
<br />
'''Matters Arising'''<br />
<br />
* things noted - not necessarily to do with the topic in hand<br />
** There are many duplicates of Interaction objects in geneace, with or without two leading digits. Sometimes there is only the shorter form.<br />
** The Features WBsf919641 and WBsf919607 are at the same position. Both are DAF-3 binding sites. WBsf919641 has Method "binding_site"; WBsf919607 has Method "TF_binding_site", which I think is more correct as DAF-3 is a TF. WBsf919641 is based on the paper WBPaper00004526; WBsf919607 is based on WBPaper00003384. WBsf919641 is associated with WBGene00202278; WBsf919607 is associated with WBGene00003514 (myo-2). The paper WBPaper00004526 appears to be about a PEB-1 binding site, not a DAF-3 site. I made the Feature WBsf919609 for the PEB-1 site based on WBPaper00004526 in the last round of curation. Something is not right here.<br />
*** Follow-up from Mary Ann. I agree the Method for WBsf919641 should be TF_binding_site. Associated_with_gene should be updated to myo-2 (this is an error). The SO term is not right either. These then become duplicate entries. <br />
** The Features WBsf019227 and WBsf038813 are in the same position, but on opposite strands. Both are for a LAG-1 binding site associated with the gene lin-11. One is from WBPaper00005357, the other from WBPaper00032298. WBsf019227 needs to be retired and merged into WBsf038813.<br />
*** Follow-up from Mary Ann. We have no merged_into tag structure in Feature. We have History tag and could add a Comment there. <br />
** WBsf038819 has a Method of 'regulatory_region', but the Public_name and Description describe it as an Enhancer. Which is it?<br />
*** Fixed by Mary Ann. Changed Method to enhancer. <br />
** WBsf919669 is an Enhancer, but it has a tag for a Transcription_factor WBTranscriptionFactor000101. It is far too large to be a binding site for a transcription factor. I think this tag should be removed.<br />
*** Follow-up from Mary Ann. lag-1 does bind somewhere in this region, but the paper does not allow me to specify exactly where. Q. Should we curate these then? I think we should, but should maybe add suitable Remarks. <br />
** 393 out of the 500 TF_binding_site Features have an incorrect SO_term tag. It should be set to: SO:0000235<br />
*** Follow-up by Mary Ann. This seems really high. I wonder whether TF_binding_site was not a recognised SO term when some of these were curated, especially the older ones. This (and other consistency checks) should be added to a script we can run.<br />
** 10 Features with Method = "TF_binding_site" are lacking a Transcription_factor tag.<br />
*** Follow-up by Mary Ann. Maybe it's not clear in the paper what the TF is? Something to add to consistency check. <br />
* How many Methods are there for Features? enhancer, regulatory_region, TF_binding_site, promoter…?<br />
** How should these Methods be defined?<br />
** When displaying on JBrowse/GBrowse, do we want to show the WBsfxxxxx IDs or the Methods?<br />
* Gary does a quick literature search to see what other work has been done on the sites described.<br />
** Along these lines, should we add WBPaper00002925 to the WBsf objects made from WBPaper00002011? That is, should a Feature be associated with two papers when both mention it?<br />
*** The Feature objects are used to describe 'real', functional regions of the genome in as much detail as possible. Each paper that provides evidence for that region being 'real' should be added as evidence for the Feature. The primary object we are trying to describe here is the region of the genome, not the papers. (I may have missed the point you were making - Gary).<br />
*** Agree with Gary (unless I have also missed the point!). All relevant papers should be added. There is a loose convention in Allele curation that the paper listed under Evidence (which can only have one value) is the "primary" paper, and the ones listed under Reference (including the primary) are all papers that describe or discuss the object. Individual Evidence can also be added to tags. Maybe we should look to extend this in the Feature model.<br />
<pre><br />
?Feature SMap S_parent UNIQUE Sequence UNIQUE ?Sequence XREF Feature_object<br />
Name Public_name UNIQUE ?Text<br />
Other_name ?Text<br />
Sequence_details Flanking_sequences UNIQUE Text UNIQUE Text<br />
Mapping_target UNIQUE ?Sequence <br />
Source_location UNIQUE Int UNIQUE ?Sequence UNIQUE Int UNIQUE Int UNIQUE #Evidence //source data, <WSversion> ?Sequence pos1 pos2 Evidence(Paper/person etc. remarks)<br />
DNA_text UNIQUE ?Text // for storing the sequence of the feature...can use IUPAC codes to be able<br />
// to store consensus sequences, e.g. binding site consensus sequence<br />
Origin Species UNIQUE ?Species //added by pad, as we are moving towards multi-species readiness.<br />
Strain UNIQUE ?Strain //added by pad, as we are moving towards multi-strain readiness.<br />
History Merged_into UNIQUE ?Feature XREF Acquires_merge #Evidence<br />
Acquires_merge ?Feature XREF Merged_into #Evidence<br />
Deprecated Text #Evidence <br />
Visible Description ?Text<br />
SO_term ?SO_term<br />
Defined_by Defined_by_sequence ?Sequence XREF Defines_feature #Evidence<br />
Defined_by_paper ?Paper XREF Feature #Evidence<br />
Defined_by_person ?Person<br />
Defined_by_author ?Author<br />
Defined_by_analysis ?Analysis Int<br />
Score Float Text #Evidence // this would be a log score as indicated by the analysis used in gff dump<br />
Associations Associated_with_gene ?Gene XREF Associated_feature #Evidence // richard<br />
Associated_with_CDS ?CDS XREF Associated_feature #Evidence // richard<br />
Associated_with_transcript ?Transcript XREF Associated_feature #Evidence // richard<br />
Associated_with_pseudogene ?Pseudogene XREF Associated_feature #Evidence // richard<br />
Associated_with_transposon ?Transposon XREF Associated_feature #Evidence //richard<br />
Associated_with_variation ?Variation XREF Feature #Evidence<br />
Associated_with_Position_Matrix ?Position_Matrix XREF Associated_feature #Evidence<br />
Associated_with_operon ?Operon XREF Associated_feature #Evidence<br />
Associated_with_Interaction ?Interaction XREF Feature_interactor<br />
Associated_with_expression_pattern ?Expr_pattern XREF Associated_feature #Evidence <br />
Associated_with_Feature ?Feature XREF Associated_with_Feature #Evidence<br />
Associated_with_construct ?Construct XREF Sequence_feature<br />
Bound_by_product_of ?Gene XREF Gene_product_binds #Evidence //pad added this to show what gene it binds<br />
Transcription_factor UNIQUE ?Transcription_factor XREF Binding_site <br />
Annotation UNIQUE ?LongText // added for data attribution [030220 dl]<br />
Confidential_remark ?Text //pad<br />
Remark ?Text #Evidence<br />
Method UNIQUE ?Method<br />
</pre><br />
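A rough Python sketch of the consistency-check script suggested above. It assumes Feature data has already been pulled out of geneace into simple records; the Feature class and the 50 bp cut-off are hypothetical and not part of the model. It flags TF_binding_site Features that lack SO:0000235 or a Transcription_factor tag, and any Feature whose Transcription_factor tag sits on a span too long to be a single binding site (the WBsf919669 case).<br />

```python
from dataclasses import dataclass, field
from typing import List, Optional

TF_SITE_SO_TERM = "SO:0000235"  # SO term for TF_binding_site, as noted above
MAX_SITE_LENGTH = 50  # hypothetical cut-off (bp) for a single TF binding site


@dataclass
class Feature:
    """Hypothetical flattened view of a ?Feature object."""
    name: str
    method: str
    start: int
    end: int
    so_terms: List[str] = field(default_factory=list)  # SO_term is not UNIQUE
    transcription_factor: Optional[str] = None


def check_feature(f: Feature) -> List[str]:
    """Return human-readable problems found for one Feature."""
    problems = []
    if f.method == "TF_binding_site":
        if TF_SITE_SO_TERM not in f.so_terms:
            problems.append(f"{f.name}: SO_term should include {TF_SITE_SO_TERM}")
        if f.transcription_factor is None:
            problems.append(f"{f.name}: no Transcription_factor tag")
    if f.transcription_factor is not None:
        length = abs(f.end - f.start) + 1  # works for either coordinate order
        if length > MAX_SITE_LENGTH:
            problems.append(f"{f.name}: {length} bp is long for a single TF site")
    return problems


def check_all(features: List[Feature]) -> List[str]:
    """Run check_feature over every Feature and collect all problems."""
    return [p for f in features for p in check_feature(f)]


if __name__ == "__main__":
    # Hypothetical example records, not real WormBase data.
    examples = [
        Feature("site_a", "TF_binding_site", 100, 110, ["SO:0000235"], "TF_x"),
        Feature("site_b", "TF_binding_site", 200, 212),
        Feature("enh_c", "enhancer", 1000, 1800, [], "TF_y"),
    ]
    for problem in check_all(examples):
        print(problem)
```

Keeping each check as a separate if-block makes it easy to add further checks as they come up.<br />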
<br />
<br />
'''Duplicated Features'''<br />
The following are a set of duplicated regulatory Features. I (Gary) am probably the main culprit in not checking to see if there is a pre-existing Feature.<br />
How can we avoid this in future?<br />
* The list of duplicated Features has been moved to the Discussion page.<br />
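One possible way to catch duplicates before making a new Feature is to compare coordinates first. Below is a minimal Python sketch. It assumes Features can be exported as (ID, parent sequence, start, end) records and that a reverse-strand Feature is stored with start > end. Under that assumption, sorting each coordinate pair makes two sites on opposite strands at the same position (like the LAG-1 pair) share one key. Only exact matches are reported; overlapping but non-identical sites would need an interval-overlap test.<br />

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (feature_id, parent_sequence, start, end): a hypothetical export format.
Record = Tuple[str, str, int, int]


def find_positional_duplicates(
    records: Iterable[Record],
) -> Dict[Tuple[str, int, int], List[str]]:
    """Group Features that cover exactly the same span, ignoring strand."""
    groups: Dict[Tuple[str, int, int], List[str]] = defaultdict(list)
    for feature_id, sequence, start, end in records:
        # Assumed: reverse-strand Features have start > end, so sorting
        # the pair gives both strands the same key.
        low, high = min(start, end), max(start, end)
        groups[(sequence, low, high)].append(feature_id)
    return {key: sorted(ids) for key, ids in groups.items() if len(ids) > 1}


if __name__ == "__main__":
    # Hypothetical records, not real coordinates.
    demo = [
        ("site_1", "SEQ_A", 100, 120),
        ("site_2", "SEQ_A", 120, 100),
        ("site_3", "SEQ_A", 300, 310),
    ]
    print(find_positional_duplicates(demo))
```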
<br />
'''Papers to curate'''<br />
* I have taken the Sequence Feature Papers from RT and assigned the four of us 10 papers each.<br />
* Just in case any paper here has already been curated for Interaction, I've added the number of Interaction objects linked to each paper. If this is not helpful, feel free to remove it.<br />
<br />
<pre><br />
WBPaper00002925 Daniela Interaction: 8<br />
Time: 10 mins<br />
New objects: none. The paper refers to a couple of sub-element enhancers (the B and C sub-elements) already described in Okkema and Fire 1994, Development (WBPaper00002011).<br />
Location of information: - <br />
Comments: <br />
WBPaper00004568 Daniela Interaction: 2<br />
Time: 25 mins<br />
New objects: 3 promoters- pA, pB, and pC<br />
Location of information: body text and fig.3 for promoters. Figures 5, 6 and 7 and body text for TF binding sites.<br />
Comments: Expression objects were already generated by Wen in the past. Daniela will add only the WBsf once ready. Also add WBsf object to the construct objects. See RT 418146 as well.<br />
potential cis-regulation objects for Xiaodong<br />
WBPaper00005842 Daniela<br />
Time: 3 hours <br />
New objects:<br />
mk125-132 (4331-4474) is sufficient to drive strong expression in vulC and vulD. Expr11838 add feature and generate construct<br />
<br />
mk84-148 contains two regions that together confer strong expression in vulE and vulF. Expr11839 add feature and generate construct<br />
<br />
mk50-51 (1052-1438) is sufficient to confer AC, vulE, and vulA expression, as well as uterine cell expression. Expr11840 add feature and generate construct.<br />
<br />
mk96-144 (2290-2522) expresses in AC in all animals observed. Expr11841 add feature and generate construct.<br />
<br />
mk66-67 (4434-4997) is sufficient to confer vulval cell expression. Expr11842 add feature and generate construct.<br />
<br />
mk135-134 (2412-3419) is sufficient to confer vulval cell expression. N.B.: the text says mk135-134, but Fig. 4C appears to show mk135-143. Expr11843 add feature and generate construct.<br />
<br />
Location of information: The nucleotide sequences for pertinent regions are shown in Figs. 5, 6, and 7 of the Supplemental Material.<br />
Comment: It would be good for a sequence curator to read the paper without first checking what I have curated for expression, to see whether we identify the same regions.<br />
Comment: potential TF binding sites<br />
Comment: Should we curate the C. briggsae data? See the paragraph 'Analysis of C Briggsae upstream regions'.<br />
<br />
WBPaper00024328 Daniela Interaction: 4<br />
Time: 2.5 hours<br />
New objects: Added 4 Expression objects for 5 different constructs (pPHA2::GFP-A was already present):<br />
<br />
pPHA2::GFP-A body text and Figs. 5 and 6 Expr3093 Add Sf to expr object and to construct <br />
pPHA2::GFP-B body text and Figs. 5 and 6 Expr11835 Add Sf to expr object and to construct<br />
pPHA2::GFP-C body text and Figs. 5 and 6 Expr11835 Add Sf to expr object and to construct<br />
pPHA2::GFP-D body text and Figs. 5 and 6 Expr11837 Add Sf to expr object and to construct<br />
pPHA2::GFP-E body text and Figs. 5 and 6 Expr11836 Add Sf to expr object and to construct<br />
pPHA2::GFP-F body text and Figs. 5 and 6 Expr11834 Add Sf to expr object and to construct<br />
Detailed construction of plasmids in Materials and Methods<br />
Location of information: body text and Figs. 5 and 6 <br />
Comments: Not sure whether this paper is curatable for sequence features per se, given the nature of the constructs. Would be good to discuss.<br />
WBPaper00028802 Daniela<br />
Time: 5 min<br />
Location: table 2 and body text<br />
New objects: <br />
Comment: potential paper on cis-regulation (no evidence found), WBPmat00000004 is associated with this paper - XW<br />
Comment: Table 2 gives a list of experimentally verified GATA sites in other papers. These should be added to the curation status form - GW<br />
Comment: no Features can be made from this paper.<br />
Comment: In table 2, features refer to: <br />
WBPaper00001523 -> this paper has already been curated for features -> not entered in RT<br />
WBPaper00001864 -> this paper has already been curated for features -> not entered in RT<br />
WBPaper00002234 -> this paper has already been curated for features -> not entered in RT<br />
WBPaper00003700 -> this paper has already been curated for features -> not entered in RT<br />
WBPaper00006024 -> this paper has already been curated for features -> not entered in RT<br />
WBPaper00024977 -> this paper has already been curated for features -> not entered in RT<br />
Features sent to RT:<br />
please check WBPaper00003232, Britton et al. 1998. Found through WBPaper00028802, table 2<br />
please check WBPaper00024333 and compare the feature associated with it to the one listed in table 2 of WBPaper00028802. Maybe a new GATA binding site feature should be generated?<br />
please check WBPaper00024976, Fukushige et al. 2005. Found through WBPaper00028802, table 2<br />
<br />
WBPaper00028915 Daniela<br />
Time: 15 mins<br />
Comments: No features in this paper<br />
WBPaper00029140 Daniela<br />
Time: 10 minutes<br />
Comment: conserved intergenic inverted-repeat structures in C. briggsae and C. remanei. Would be good to have a sequence curator check this one.<br />
<br />
WBPaper00029255 Daniela Interaction: 55 (21 in postgres only)<br />
<br />
<br />
WBPaper00030829 Daniela Interaction: 5<br />
WBPaper00030933 Daniela Interaction: 1<br />
WBPaper00003929 Gary Interaction: 25<br />
Time: 4 hours<br />
New objects: none - 6 existing Features were corrected and updated. (WBsf019182, WBsf019181, WBsf019179, WBsf019177, WBsf019178, WBsf019180)<br />
Location of information: body text and in a figure in the supplemental table of the paper WBPaper00006376<br />
Other curator data: "lin-41 and lin-42 are negatively regulated by let-7"<br />
Comments: This is Gary Ruvkun's (Nature 2000) let-7 paper that started the miRNA field!<br />
Comments: This is being marked as a pair of 'binding_site' Features because this is miRNA binding, not TF binding.<br />
See also: let-7, lin-41 binding WBPaper00006376 PubMed: 14729570 (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC324419/) - This is the paper that defined the binding sites.<br />
Comments (XW): curated one new cis-regulation object with Gary's WBsfs<br />
<br />
WBPaper00005044 Gary Interaction: 20<br />
Time: 3 hours<br />
New objects: made ire-1 binding site Feature (WBsf977530)<br />
Location of information: body of this paper and WBPaper00005036<br />
Other curator data: IRE-1 is a stress-activated endonuclease resident in the ER that is conserved in all known eukaryotes. IRE-1 mediated unconventional splicing of an intron from xbp-1 mRNA controls expression of the encoded transcription factor and is required for upregulation of most UPR target genes.<br />
See also: WBPaper00005036<br />
WBPaper00005971 Gary Interaction: 16 This has already been done by Gary: Feature WBsf718850<br />
WBPaper00024440 Gary Interaction: 1<br />
Time: 3 hours<br />
New objects: Made WBsf977532, WBsf977533 M-2 motif/daf-12 binding sites for ceh-22<br />
New objects: Made WBsf977534, WBsf977535 M-2 motif/daf-12 binding sites for myo-2<br />
Location of information: body of this paper and Figure Supplemental 3<br />
Other curator data: "We conclude that regulation of myo-2 and ceh-22 during dauer development depends critically on the M-2 motif."<br />
Comments: lots of hypothetical binding sites for daf-12 in 90-odd genes, but these have not been experimentally confirmed.<br />
WBPaper00028816 Gary<br />
Time: 30 mins<br />
New objects: none<br />
Location of information: body of this paper<br />
Other curator data: "recruitment sites are widely distributed along X to bind the DCC and nucleate DCC spreading to X regions lacking recruitment sites"<br />
Comments: This paper determines the A and B motifs of the Dosage Compensation Complex (DCC). There are hundreds of thousands of sites of varying strength.<br />
WBPaper00028986 Gary Interaction: 16<br />
WBPaper00029181 Gary<br />
WBPaper00029327 Gary Interaction: 33<br />
WBPaper00030849 Gary<br />
WBPaper00031355 Gary Interaction: 5<br />
WBPaper00004181 Mary Ann Interaction: 8 <br />
No Features. Took 5 mins to read. <br />
WBPaper00005056 Mary Ann <br />
Time: 35 mins<br />
New objects: 37 - not yet curated.<br />
Location of information: Mainly in Fig. 1, but also in the body text. <br />
Comments: TRTTKRY element in promoter region of T05E11.3, D2096.6, C44H4.1, ZK816.4, ceh-22, <br />
tph-1, M05B5.2, myo-2. Bound by pha-4 <br />
WBPaper00006429 Mary Ann Interaction: 3<br />
Time: 1/2hr<br />
New objects: 1<br />
Location of information: In body of text and fig. 3B<br />
Comments: Alludes to GATA binding sites, but no experimental data. <br />
Comments (dr): Added Expr11833<br />
WBPaper00024505 Mary Ann<br />
Time: <br />
New objects: <br />
Location of information: <br />
Comments: <br />
WBPaper00028849 Mary Ann<br />
Time: 45 mins<br />
New objects: 1<br />
Location of information: In body of text and fig. 3B<br />
Comments(dr): Added Expr11832 <br />
WBPaper00029058 Mary Ann Interaction: 50<br />
WBPaper00029190 Mary Ann<br />
WBPaper00029406 Mary Ann Interaction: 145<br />
WBPaper00030877 Mary Ann Interaction: 8<br />
WBPaper00031471 Mary Ann<br />
Time: 1hr<br />
New objects: 1<br />
Location of information: In body of text and fig. 5f<br />
Comments: Check that alleles of asd-2 in fig 3 are curated. yb1422, yb1415, yb1470, yb1419, yb1423<br />
WBPaper00004482 Xiaodong Interaction: 2<br />
Time: two hours<br />
New objects: request new features for more interaction objects<br />
Location of information: figure 5A<br />
Comments: HOX/CEH-20 binding sites in the hlh-8 promoter. The Features will be used in cis-regulation and physical interaction objects.<br />
Duplication: yes. Only one interaction object (WBInteraction000001291) is currently associated with the paper.<br />
WBPaper00005609 Xiaodong Interaction: 20 This has already been done by Margaret: Feature WBsf019097, WBsf038788, WBsf038789, WBsf038790, WBsf038791, WBsf038793, WBsf038794, WBsf019098, WBsf038792<br />
Time: 50 mins<br />
New objects: three interaction objects<br />
Location of information: In body of text and fig. 5<br />
Comments: only WBsf038790 and WBsf038791 are useful. Not sure why the other features exist; they seem to be redundant.<br />
WBPaper00024189 Xiaodong<br />
Time: 10 mins<br />
Comments: no features in this paper<br />
WBPaper00024981 Xiaodong<br />
Time: 40 mins<br />
New objects: request through RT<br />
Location of information: figure 2<br />
Comments: MED-1 binding sites in the end-1 and end-3 promoters<br />
Comments (dr): Expr11831 generated for end-1 and end-3 promoter regions. Add the WBsf ID once available, along with the relevant construct. Paper checked as it came through RT.<br />
WBPaper00028910 Xiaodong Interaction: 7<br />
Time: 30 mins<br />
New objects: request through RT<br />
Location of information: in body text and figure 4<br />
Comments: MEF-2 direct binding site in the str-1 promoter, and a few minimal regulatory regions mentioned in the body text<br />
<br />
WBPaper00029109 Xiaodong<br />
<br />
WBPaper00029229 Xiaodong Interaction: 3<br />
WBPaper00030809 Xiaodong<br />
WBPaper00030931 Xiaodong Interaction: 18<br />
WBPaper00031565 Xiaodong Interaction: 1<br />
Time: 30 mins<br />
New objects: request through RT<br />
Location of information: in body text and figure 2 and figure 3<br />
Comments: PBC-HOX binding sites S1 and S2; cis-regulatory elements E1 and E2<br />
<br />
</pre><br />
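The notes above record curation time as free text, which the jamboree prep wants collated. Below is a hypothetical Python helper that totals the Time: lines per curator. It assumes each paper entry starts with the WBPaper ID followed by the curator's name. Empty or unparseable Time: values are skipped rather than guessed.<br />

```python
import re
from collections import defaultdict
from typing import Dict, Optional

# Assumed entry header: "WBPaper<8 digits> <Curator name> [Interaction: N ...]".
HEADER = re.compile(r"^(WBPaper\d{8})\s+([A-Z][a-z]+(?: (?!Interaction\b)[A-Z][a-z]+)?)")
# An amount followed by a unit, e.g. "10 mins", "2.5 hours", "1/2hr", "two hours".
DURATION = re.compile(
    r"(\d+/\d+|\d+(?:\.\d+)?|[a-z]+)\s*(hours?|hrs?|minutes?|mins?)\b", re.IGNORECASE
)
WORD_NUMBERS = {"one": 1.0, "two": 2.0, "three": 3.0, "four": 4.0}


def to_number(token: str) -> Optional[float]:
    """Convert '1/2', '2.5' or a spelled-out number to a float."""
    if "/" in token:
        numerator, denominator = token.split("/")
        return float(numerator) / float(denominator)
    if token[0].isdigit():
        return float(token)
    return WORD_NUMBERS.get(token.lower())


def parse_minutes(text: str) -> Optional[float]:
    """Return the first duration found in text, in minutes, or None."""
    match = DURATION.search(text)
    if match is None:
        return None
    amount = to_number(match.group(1))
    if amount is None:
        return None
    return amount * 60 if match.group(2).lower().startswith("h") else amount


def tally_time(log_text: str) -> Dict[str, float]:
    """Sum the minutes on each curator's Time: lines."""
    totals: Dict[str, float] = defaultdict(float)
    curator = None
    for raw in log_text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            curator = header.group(2)
        elif curator and line.startswith("Time:"):
            minutes = parse_minutes(line[len("Time:"):])
            if minutes is not None:
                totals[curator] += minutes
    return dict(totals)
```

Pasting the text between the pre tags above into tally_time should give a rough per-curator total. Entries with no Time: line simply do not contribute.<br />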
* I suggest we curate 5 of our papers before the jamboree, taking notes as described above and on anything else that arises during curation.</div>Xdwanghttps://wiki.wormbase.org/index.php?title=Working_Group:Sequence_Features&diff=24773Working Group:Sequence Features2014-09-16T15:43:47Z<p>Xdwang: </p>
<hr />
<div><br />
'''Topics'''<br />
<br />
* Display of Sequence Features on the website<br />
** See GitHub ticket 2867 https://github.com/WormBase/website/issues/2867#issuecomment-48041365<br />
** Also Transcription factors and Gene_product_binds.<br />
** In the expression section of the gene page, highlight the objects that are related to Sequence Features. Add a type called cis-regulatory-element to the table?<br />
** Add a Sequence Feature table to the sequence widget to display all Features related to the gene.<br />
** In the Cytoscape display, show Sequence Features as entity nodes.<br />
<br />
* Streamline the workflow<br />
** How can we automatically identify Sequence Feature papers?<br />
** How do papers come in now? SVM? Textpresso string matches?<br />
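A crude version of the string matching mentioned above, as a Python sketch. The term list and the two-hit threshold are assumptions that would need tuning against papers already known to contain Features. This is not the Textpresso API or the existing SVM pipeline.<br />

```python
import re
from typing import Dict, List

# Hypothetical term list; tune against papers already curated for Features.
FEATURE_TERMS = [
    r"binding sites?",
    r"enhancers?",
    r"promoter deletions?",
    r"cis-regulatory (?:elements?|regions?)",
    r"regulatory regions?",
]
PATTERNS = [re.compile(r"\b" + term + r"\b", re.IGNORECASE) for term in FEATURE_TERMS]


def score_text(text: str) -> Dict[str, int]:
    """Count matches of each term in a piece of text (e.g. an abstract)."""
    hits = {}
    for term, pattern in zip(FEATURE_TERMS, PATTERNS):
        count = len(pattern.findall(text))
        if count:
            hits[term] = count
    return hits


def flag_papers(abstracts: Dict[str, str], min_hits: int = 2) -> List[str]:
    """Return IDs of papers whose text reaches the (assumed) hit threshold."""
    flagged = [
        paper_id
        for paper_id, text in abstracts.items()
        if sum(score_text(text).values()) >= min_hits
    ]
    return sorted(flagged)
```

Flagged papers would still need a curator's eye; the point is only to rank candidates.<br />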
<br />
* Improving data flow<br />
** How can we make the data immediately available to all curators?<br />
*** Add Paper and Public_name fields to the Features in the Nameserver?<br />
*** geneace is available for download and is updated every day; Caltech takes a copy.<br />
**** We still need the WBsf object details in order to relate/curate regulation/expression/construct objects.<br />
<br />
<br />
* Sample papers curation <br />
** Prepare a paper list from each person<br />
* assign meaningful names to the public_name field, e.g. 'distal enhancer 1' instead of 'DE1'<br />
** gw3 - I disagree with this. We should use the name that is used to describe the region in the paper, rather than making up our own names. I have had to go back and re-annotate regions after Xiaodong found problems with my stuff several months after I had originally done it and I really needed to be able to unambiguously identify which region the paper was talking about by using the Public_name field and matching this up to the name of the region in the paper. Users may well need to do this as well if they are reading the paper and looking at the Features marking up the regions.<br />
** dr - ok sounds good to me Gary<br />
<br />
'''Practical Issues'''<br />
<br />
* Gary and I are staying at the Ramada - 800 Morrissey Boulevard, Freeport St, Boston, MA 02122 US<br />
* We arrive in Boston on Saturday evening and fly home on Wednesday night. <br />
* What time shall we meet on Monday morning and where do we go? <br />
**I (Xiaodong) can pick you guys up at the Harvard Square T station (Red Line, info center on the square) at, say, 9:30 am. We can walk to my office.<br />
* Can you confirm that we have office space with internet access? I will bring my Macbook. Hopefully I can tunnel into my Sanger working space.<br />
**There is definitely enough office space. Internet access is through the Harvard guest network; other ways of connecting are not allowed.<br />
* Thanks Xiaodong, I'm bringing my chromebook - Gary<br />
<br />
'''Pre-Jamboree prep'''<br />
<br />
* Suggestions for work<br />
** Curate some (5? 10?) papers from our lists and collate data as follows<br />
*** How long did each paper take to curate<br />
*** How many new objects did you add to WormBase<br />
*** Where in the paper was the information (e.g. supplementary data, figure legends, within text)<br />
*** Did you need to contact the author for more information<br />
*** Did you come across data for other curators <br />
**** gw3 - I don't know in detail what data other curators require. It would be useful to know this so I can summarise data for others.<br />
<br />
'''The Jamboree 15-17th Sept 2014'''<br />
<br />
* Suggestions for work<br />
** Discuss the results from above<br />
** Work through some papers which have already been curated and have different data types (e.g. promoter and gene regulation) to identify best curation practices.<br />
** browsing capabilities through WormMine?<br />
** Should we put the list of papers in the Caltech curation status form?<br />
<br />
<br />
'''Matters Arising'''<br />
<br />
* things noted - not necessarily to do with the topic in hand<br />
** There are many duplicated Interaction objects in geneace, whose IDs appear both with and without two extra leading digits. Sometimes only the shorter form exists.<br />
** The Features WBsf919641 and WBsf919607 are at the same position. Both are DAF-3 binding sites. WBsf919641 has Method "binding_site"; WBsf919607 has Method "TF_binding_site", which I think is more correct as DAF-3 is a TF. WBsf919641 used the paper WBPaper00004526; WBsf919607 used the paper WBPaper00003384. WBsf919641 is associated with WBGene00202278; WBsf919607 is associated with WBGene00003514 (myo-2). The paper WBPaper00004526 appears to be about a PEB-1 binding site, not a DAF-3 site. I made the Feature WBsf919609 for the PEB-1 site based on WBPaper00004526 in the last round of curation. Something is not right here.<br />
*** Follow-up from Mary Ann. I agree the Method for WBsf919641 should be TF_binding_site. Associated_with_gene should be updated to myo-2 (the current value is an error). The SO_term is not right either. These then become duplicate entries.<br />
** The Features WBsf019227 and WBsf038813 are at the same position, but on opposite strands. Both are LAG-1 binding sites associated with the gene lin-11; one is from WBPaper00005357, the other from WBPaper00032298. WBsf019227 needs to be retired and merged into WBsf038813.<br />
*** Follow-up from Mary Ann. We have no merged_into tag structure in Feature. We have a History tag and could add a comment there.<br />
** WBsf038819 has a Method of 'regulatory_region', but the Public_name and Description describe it as an Enhancer. Which is it?<br />
*** Fixed by Mary Ann. Changed Method to enhancer. <br />
** WBsf919669 is an Enhancer, but it has a tag for a Transcription_factor WBTranscriptionFactor000101. It is far too large to be a binding site for a transcription factor. I think this tag should be removed.<br />
*** Follow-up from Mary Ann. lag-1 does bind somewhere in this region, but the paper does not allow me to specify exactly where. Q: Should we curate these, then? I think we should, but perhaps with suitable Remarks.<br />
** 393 out of the 500 TF_binding_site Features have an incorrect SO_term tag. It should be set to: SO:0000235<br />
*** Follow-up by Mary Ann. This seems really high. I wonder whether TF_binding_site was not a recognised SO term when some of these were curated, especially the older ones. This (and other consistency checks) should be added to a script we can run.<br />
** 10 Features with Method = "TF_binding_site" are lacking a Transcription_factor tag.<br />
*** Follow-up by Mary Ann. Maybe it's not clear in the paper what the TF is? Something to add to consistency check. <br />
* How many Methods are there for Features? enhancer, regulatory_region, TF_binding_site, promoter…?<br />
** How should these Methods be defined?<br />
** When displaying on JBrowse/GBrowse, do we want to show the WBsfxxxxx IDs or the Methods?<br />
* Gary does a quick literature search to see what other work has been done on the sites described.<br />
** Along these lines, should we add WBPaper00002925 to the WBsf objects made from WBPaper00002011? That is, should a Feature be associated with two papers when both mention it?<br />
*** The Feature objects are used to describe 'real', functional regions of the genome in as much detail as possible. Each paper that provides evidence for that region being 'real' should be added as evidence for the Feature. The primary object we are trying to describe here is the region of the genome, not the papers. (I may have missed the point you were making - Gary).<br />
*** Agree with Gary (unless I have also missed the point!). All relevant papers should be added. There is a loose convention in Allele curation that the paper listed under Evidence (which can only have one value) is the "primary" paper, and the ones listed under Reference (including the primary) are all papers that describe or discuss the object. Individual Evidence can also be added to tags. Maybe we should look to extend this in the Feature model.<br />
<pre><br />
?Feature SMap S_parent UNIQUE Sequence UNIQUE ?Sequence XREF Feature_object<br />
Name Public_name UNIQUE ?Text<br />
Other_name ?Text<br />
Sequence_details Flanking_sequences UNIQUE Text UNIQUE Text<br />
Mapping_target UNIQUE ?Sequence <br />
Source_location UNIQUE Int UNIQUE ?Sequence UNIQUE Int UNIQUE Int UNIQUE #Evidence //source data, <WSversion> ?Sequence pos1 pos2 Evidence(Paper/person etc. remarks)<br />
DNA_text UNIQUE ?Text // for storing the sequence of the feature...can use IUPAC codes to be able<br />
// to store consensus sequences, e.g. binding site consensus sequence<br />
Origin Species UNIQUE ?Species //added by pad, as we are moving towards multi-species readiness.<br />
Strain UNIQUE ?Strain //added by pad, as we are moving towards multi-strain readiness.<br />
History Merged_into UNIQUE ?Feature XREF Acquires_merge #Evidence<br />
Acquires_merge ?Feature XREF Merged_into #Evidence<br />
Deprecated Text #Evidence <br />
Visible Description ?Text<br />
SO_term ?SO_term<br />
Defined_by Defined_by_sequence ?Sequence XREF Defines_feature #Evidence<br />
Defined_by_paper ?Paper XREF Feature #Evidence<br />
Defined_by_person ?Person<br />
Defined_by_author ?Author<br />
Defined_by_analysis ?Analysis Int<br />
Score Float Text #Evidence // this would be a log score as indicated by the analysis used in gff dump<br />
Associations Associated_with_gene ?Gene XREF Associated_feature #Evidence // richard<br />
Associated_with_CDS ?CDS XREF Associated_feature #Evidence // richard<br />
Associated_with_transcript ?Transcript XREF Associated_feature #Evidence // richard<br />
Associated_with_pseudogene ?Pseudogene XREF Associated_feature #Evidence // richard<br />
Associated_with_transposon ?Transposon XREF Associated_feature #Evidence //richard<br />
Associated_with_variation ?Variation XREF Feature #Evidence<br />
Associated_with_Position_Matrix ?Position_Matrix XREF Associated_feature #Evidence<br />
Associated_with_operon ?Operon XREF Associated_feature #Evidence<br />
Associated_with_Interaction ?Interaction XREF Feature_interactor<br />
Associated_with_expression_pattern ?Expr_pattern XREF Associated_feature #Evidence <br />
Associated_with_Feature ?Feature XREF Associated_with_Feature #Evidence<br />
Associated_with_construct ?Construct XREF Sequence_feature<br />
Bound_by_product_of ?Gene XREF Gene_product_binds #Evidence //pad added this to show what gene it binds<br />
Transcription_factor UNIQUE ?Transcription_factor XREF Binding_site <br />
Annotation UNIQUE ?LongText // added for data attribution [030220 dl]<br />
Confidential_remark ?Text //pad<br />
Remark ?Text #Evidence<br />
Method UNIQUE ?Method<br />
</pre><br />
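The first Matters Arising item (Interaction IDs with and without two leading digits) could be checked with a small Python sketch. It assumes the extra digits are leading zeros, so IDs that differ only in zero-padding share the same integer value and get grouped together. The short form WBInteraction0001291 in the test is hypothetical.<br />

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed ID shape: "WBInteraction" followed by a zero-padded number.
ID_PATTERN = re.compile(r"^(WBInteraction)(\d+)$")


def canonical_key(interaction_id: str) -> Tuple[str, int]:
    """Reduce an ID to (prefix, integer value) so zero-padding is ignored."""
    match = ID_PATTERN.match(interaction_id)
    if match is None:
        return (interaction_id, -1)  # unexpected IDs only group with themselves
    return (match.group(1), int(match.group(2)))


def padding_variants(ids: Iterable[str]) -> Dict[Tuple[str, int], List[str]]:
    """Return groups of IDs that differ only in their leading zeros."""
    groups: Dict[Tuple[str, int], List[str]] = defaultdict(list)
    for interaction_id in ids:
        groups[canonical_key(interaction_id)].append(interaction_id)
    return {key: sorted(v) for key, v in groups.items() if len(v) > 1}
```

Each group in the result is a candidate for merging into a single object.<br />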
<br />
<br />
'''Duplicated Features'''<br />
The following are a set of duplicated regulatory Features. I (Gary) am probably the main culprit in not checking to see if there is a pre-existing Feature.<br />
How can we avoid this in future?<br />
* The list of duplicated Features has been moved to the Discussion page.<br />
<br />
'''Papers to curate'''<br />
* I have taken the Sequence Feature Papers from RT and assigned the four of us 10 papers each.<br />
* Just in case any paper here has already been curated for Interaction, I've added the number of Interaction objects linked to each paper. If this is not helpful, feel free to remove it.<br />
<br />
<pre><br />
WBPaper00002925 Daniela Interaction: 8<br />
Time: 10 mins<br />
New objects: none. The paper refers to a couple of sub-element enhancers (the B and C sub-elements) already described in Okkema and Fire 1994, Development (WBPaper00002011).<br />
Location of information: - <br />
Comments: <br />
WBPaper00004568 Daniela Interaction: 2<br />
Time: 25 mins<br />
New objects: 3 promoters- pA, pB, and pC<br />
Location of information: body text and fig.3 for promoters. Figures 5, 6 and 7 and body text for TF binding sites.<br />
Comments: Expression objects were already generated by Wen in the past. Daniela will add only the WBsf once ready. Also add WBsf object to the construct objects. See RT 418146 as well.<br />
potential cis-regulation objects for Xiaodong<br />
WBPaper00005842 Daniela<br />
Time: 3 hours <br />
New objects:<br />
mk125-132 (4331-4474) is sufficient to drive strong expression in vulC and vulD. Expr11838 add feature and generate construct<br />
<br />
mk84-148 contains two regions that together confer strong expression in vulE and vulF. Expr11839 add feature and generate construct<br />
<br />
mk50-51 (1052-1438) is sufficient to confer AC, vulE, and vulA expression, as well as uterine cell expression. Expr11840 add feature and generate construct.<br />
<br />
mk96-144 (2290-2522) expresses in AC in all animals observed. Expr11841 add feature and generate construct.<br />
<br />
mk66-67 (4434-4997) is sufficient to confer vulval cell expression. Expr11842 add feature and generate construct.<br />
<br />
mk135-134 (2412-3419) is sufficient to confer vulval cell expression. N.B.: the text says mk135-134, but Fig. 4C appears to show mk135-143. Expr11843 add feature and generate construct.<br />
<br />
Location of information: The nucleotide sequences for pertinent regions are shown in Figs. 5, 6, and 7 of the Supplemental Material.<br />
Comment: It would be good for a sequence curator to read the paper without first checking what I have curated for expression, to see whether we identify the same regions.<br />
Comment: potential TF binding sites<br />
Comment: Should we curate the C. briggsae data? See the paragraph 'Analysis of C Briggsae upstream regions'.<br />
<br />
WBPaper00024328 Daniela Interaction: 4<br />
Time: 2.5 hours<br />
New objects: Added 4 Expression objects for 5 different constructs (pPHA2::GFP-A was already present):<br />
<br />
pPHA2::GFP-A body text and Figs. 5 and 6 Expr3093 Add Sf to expr object and to construct <br />
pPHA2::GFP-B body text and Figs. 5 and 6 Expr11835 Add Sf to expr object and to construct<br />
pPHA2::GFP-C body text and Figs. 5 and 6 Expr11835 Add Sf to expr object and to construct<br />
pPHA2::GFP-D body text and Figs. 5 and 6 Expr11837 Add Sf to expr object and to construct<br />
pPHA2::GFP-E body text and Figs. 5 and 6 Expr11836 Add Sf to expr object and to construct<br />
pPHA2::GFP-F body text and Figs. 5 and 6 Expr11834 Add Sf to expr object and to construct<br />
Detailed construction of plasmids in Materials and Methods<br />
Location of information: body text and Figs. 5 and 6 <br />
Comments: Not sure whether this paper is curatable for sequence features per se, given the nature of the constructs. Would be good to discuss.<br />
WBPaper00028802 Daniela<br />
Time: 5 min<br />
Location: table 2 and body text<br />
New objects: <br />
Comment: potential paper on cis-regulation (no evidence found), WBPmat00000004 is associated with this paper - XW<br />
Comment: Table 2 gives a list of experimentally verified GATA sites in other papers. These should be added to the curation status form - GW<br />
WBPaper00028915 Daniela<br />
Time: 15 mins<br />
Comments: No features in this paper<br />
WBPaper00029140 Daniela<br />
Time: 10 minutes<br />
Comment: conserved intergenic inverted-repeat structures in C. briggsae and C. remanei. Would be good to have a sequence curator check this one.<br />
<br />
WBPaper00029255 Daniela Interaction: 55 (21 in postgres only)<br />
<br />
WBPaper00030829 Daniela Interaction: 5<br />
WBPaper00030933 Daniela Interaction: 1<br />
WBPaper00003929 Gary Interaction: 25<br />
Time: 4 hours<br />
New objects: none - 6 existing Features were corrected and updated. (WBsf019182, WBsf019181, WBsf019179, WBsf019177, WBsf019178, WBsf019180)<br />
Location of information: body text and in a figure in the supplemental table of the paper WBPaper00006376<br />
Other curator data: "lin-41 and lin-42 are negatively regulated by let-7"<br />
Comments: This is Gary Ruvkun's (Nature 2000) let-7 paper that started the miRNA field!<br />
Comments: These are marked as a pair of 'binding_site' Features because this is miRNA binding, not TF binding.<br />
See also: let-7, lin-41 binding WBPaper00006376 PubMed: 14729570 (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC324419/) - This is the paper that defined the binding sites.<br />
Comments (XW): curated one new cis-regulation object with Gary's WBsfs<br />
<br />
WBPaper00005044 Gary Interaction: 20<br />
Time: 3 hours<br />
New objects: made ire-1 binding site Feature (WBsf977530)<br />
Location of information: body of this paper and WBPaper00005036<br />
Other curator data: IRE-1 is a stress-activated endonuclease resident in the ER that is conserved in all known eukaryotes. IRE-1-mediated unconventional splicing of an intron from xbp-1 mRNA controls expression of the encoded transcription factor and is required for upregulation of most UPR target genes.<br />
See also: WBPaper00005036<br />
WBPaper00005971 Gary Interaction: 16 This has already been done by Gary: Feature WBsf718850<br />
WBPaper00024440 Gary Interaction: 1<br />
Time: 3 hours<br />
New objects: Made WBsf977532, WBsf977533 M-2 motif/daf-12 binding sites for ceh-22<br />
New objects: Made WBsf977534, WBsf977535 M-2 motif/daf-12 binding sites for myo-2<br />
Location of information: body of this paper and Supplemental Figure 3<br />
Other curator data: "We conclude that regulation of myo-2 and ceh-22 during dauer development depends critically on the M-2 motif."<br />
Comments: lots of hypothetical binding sites for daf-12 in 90-odd genes, but these have not been experimentally confirmed.<br />
WBPaper00028816 Gary<br />
Time: 30 mins<br />
New objects: none<br />
Location of information: body of this paper<br />
Other curator data: "recruitment sites are widely distributed along X to bind the DCC and nucleate DCC spreading to X regions lacking recruitment sites"<br />
Comments: This paper determines the A and B motifs of the Dosage Compensation Complex (DCC). There are hundreds of thousands of sites of varying strength.<br />
WBPaper00028986 Gary Interaction: 16<br />
WBPaper00029181 Gary<br />
WBPaper00029327 Gary Interaction: 33<br />
WBPaper00030849 Gary<br />
WBPaper00031355 Gary Interaction: 5<br />
WBPaper00004181 Mary Ann Interaction: 8 <br />
No Features. Took 5 mins to read. <br />
WBPaper00005056 Mary Ann <br />
Time: 35 mins<br />
New objects: 37 - not yet curated.<br />
Location of information: Mainly in fig. 1, but also in body text. <br />
Comments: TRTTKRY element in promoter region of T05E11.3, D2096.6, C44H4.1, ZK816.4, ceh-22, <br />
tph-1, M05B5.2, myo-2. Bound by pha-4 <br />
WBPaper00006429 Mary Ann Interaction: 3<br />
Time: 1/2hr<br />
New objects: 1<br />
Location of information: In body of text and fig. 3B<br />
Comments: Alludes to GATA binding sites, but no experimental data. <br />
Comments (dr): Added Expr11833<br />
WBPaper00024505 Mary Ann<br />
Time: <br />
New objects: <br />
Location of information: <br />
Comments: <br />
WBPaper00028849 Mary Ann<br />
Time: 45 mins<br />
New objects: 1<br />
Location of information: In body of text and fig. 3B<br />
Comments(dr): Added Expr11832 <br />
WBPaper00029058 Mary Ann Interaction: 50<br />
WBPaper00029190 Mary Ann<br />
WBPaper00029406 Mary Ann Interaction: 145<br />
WBPaper00030877 Mary Ann Interaction: 8<br />
WBPaper00031471 Mary Ann<br />
Time: 1hr<br />
New objects: 1<br />
Location of information: In body of text and fig. 5f<br />
Comments: Check that alleles of asd-2 in fig 3 are curated. yb1422, yb1415, yb1470, yb1419, yb1423<br />
WBPaper00004482 Xiaodong Interaction: 2<br />
Time: two hours<br />
New objects: request new features for more interaction objects<br />
Location of information: figure 5A<br />
Comments: HOX/CEH-20 binding sites in hlh-8 promoter. The Features will be used in cis-regulation and physical interaction objects<br />
Duplication: yes. Currently only one Interaction object (WBInteraction000001291) is associated with the paper<br />
WBPaper00005609 Xiaodong Interaction: 20 This has already been done by Margaret: Feature WBsf019097, WBsf038788, WBsf038789, WBsf038790, WBsf038791, WBsf038793, WBsf038794, WBsf019098, WBsf038792<br />
Time: 50 mins<br />
New objects: three interaction objects<br />
Location of information: In body of text and fig. 5<br />
Comments: Only WBsf038790 and WBsf038791 are useful. Not sure why the other Features exist; are they redundant?<br />
WBPaper00024189 Xiaodong<br />
Time: 10 mins<br />
Comments: no features in this paper<br />
WBPaper00024981 Xiaodong<br />
Time: 40 mins<br />
New objects: request through RT<br />
Location of information: figure 2<br />
Comments: MED-1 binding sites in end-1 and end-3 promoters<br />
Comments (dr): Expr11831 generated for end-1 and end-3 promoter regions. Add the WBsf ID and the related construct once available. Paper checked as it came through RT <br />
WBPaper00028910 Xiaodong Interaction: 7<br />
Time: 30 mins<br />
New objects: request through RT<br />
Location of information: in body text and figure 4<br />
Comments: MEF-2 direct binding site in str-1 promoter and a few minimal regulatory regions mentioned in body text<br />
<br />
WBPaper00029109 Xiaodong<br />
<br />
WBPaper00029229 Xiaodong Interaction: 3<br />
WBPaper00030809 Xiaodong<br />
WBPaper00030931 Xiaodong Interaction: 18<br />
WBPaper00031565 Xiaodong Interaction: 1<br />
Time: 30 mins<br />
New objects: request through RT<br />
Location of information: in body text and figure 2 and figure 3<br />
Comments: PBC-HOX binding sites S1 and S2; cis-regulatory elements E1 and E2<br />
<br />
</pre><br />
* I suggest we curate 5 of our papers before the jamboree, taking notes as described above and any other things that arise during your curation.</div>Xdwanghttps://wiki.wormbase.org/index.php?title=Working_Group:Sequence_Features&diff=24772Working Group:Sequence Features2014-09-16T15:34:34Z<p>Xdwang: </p>
<hr />
<div><br />
== Headline text ==<br />
'''Topics'''<br />
<br />
* Display of Sequence Features on the website<br />
** See github ticket 2867 https://github.com/WormBase/website/issues/2867#issuecomment-48041365<br />
** Also display Transcription factors and Gene_product_binds.<br />
** In the expression section on the gene page, highlight the objects that are Sequence Feature related. Add a type called cis-regulatory-element to the table? <br />
** Add a sequence feature table to the sequence widget to display all features related to the gene<br />
** In the Cytoscape display, show each sequence feature as an entity node<br />
<br />
* Streamline the workflow<br />
** How can we automatically identify Sequence Feature papers?<br />
** How do papers come in now? SVM? Textpresso string matches?<br />
<br />
* Improving data flow<br />
** How can we make the data immediately available to all curators?<br />
*** Add Paper and Public_name fields to the Features in the Nameserver?<br />
*** geneace is available for download and is updated every day; a copy is taken by Caltech.<br />
**** still need WBsf object details in order to relate/curate regulation/expression/construct objects<br />
<br />
<br />
* Sample papers curation <br />
** Prepare a paper list from each person<br />
* assign meaningful names to the public_name field, e.g. 'distal enhancer 1' instead of 'DE1'<br />
** gw3 - I disagree with this. We should use the name that is used to describe the region in the paper, rather than making up our own names. I have had to go back and re-annotate regions after Xiaodong found problems with my stuff several months after I had originally done it and I really needed to be able to unambiguously identify which region the paper was talking about by using the Public_name field and matching this up to the name of the region in the paper. Users may well need to do this as well if they are reading the paper and looking at the Features marking up the regions.<br />
** dr - ok sounds good to me Gary<br />
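One possible starting point for the question above of how to automatically identify Sequence Feature papers is simple string matching over paper text. The Python sketch below is hypothetical: the keyword list, the `keyword_hits` and `rank_papers` functions and the `min_hits` threshold are illustrative assumptions, not a description of how Textpresso or any existing WormBase pipeline works.<br />

```python
import re
from typing import Dict, Iterable, List, Tuple

# Hypothetical keyword list, chosen for illustration only.
KEYWORDS: Tuple[str, ...] = (
    "binding site",
    "enhancer",
    "promoter",
    "cis-regulatory",
    "regulatory element",
    "gel shift",
)


def keyword_hits(text: str, keywords: Iterable[str] = KEYWORDS) -> int:
    """Count case-insensitive, non-overlapping substring matches of all keywords."""
    lowered = text.lower()
    return sum(len(re.findall(re.escape(kw.lower()), lowered)) for kw in keywords)


def rank_papers(papers: Dict[str, str], min_hits: int = 1) -> List[Tuple[str, int]]:
    """Return (paper_id, hits) for papers with at least min_hits matches, best first."""
    scored = [(paper_id, keyword_hits(text)) for paper_id, text in papers.items()]
    kept = [item for item in scored if item[1] >= min_hits]
    # Sort by descending hit count, then by paper ID for a reproducible order.
    return sorted(kept, key=lambda item: (-item[1], item[0]))
```

Because this is substring matching, 'enhancers' also counts as a hit for 'enhancer'; the ranked list would still need curator review before anything is flagged as a Sequence Feature paper.<br />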
<br />
'''Practical Issues'''<br />
<br />
* Gary and I are staying at the Ramada - 800 Morrissey Boulevard, Freeport St, Boston, MA 02122 US<br />
* We arrive in Boston on Saturday evening and fly home on Wednesday night. <br />
* What time shall we meet on Monday morning and where do we go? <br />
**I (Xiaodong) can pick you guys up at the Harvard Square T station (Red Line, info center on the square) at, say, 9:30 am. We can walk to my office.<br />
* Can you confirm that we have office space with internet access? I will bring my MacBook. Hopefully I can tunnel into my Sanger working space.<br />
**Enough office space for sure. Internet access is through the Harvard guest network; other means of access are not allowed.<br />
* Thanks Xiaodong, I'm bringing my Chromebook - Gary<br />
<br />
'''Pre-Jamboree prep'''<br />
<br />
* Suggestions for work<br />
** Curate some (5? 10?) papers from our lists and collate data as follows<br />
*** How long did each paper take to curate?<br />
*** How many new objects did you add to WormBase?<br />
*** Where in the paper was the information (e.g. supplementary data, figure legends, within text)?<br />
*** Did you need to contact the author for more information?<br />
*** Did you come across data for other curators? <br />
**** gw3 - I don't know in detail what data other curators require. It would be useful to know this so I can summarise data for others.<br />
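When collating how long each paper took to curate, note that the Time entries in the 'Papers to curate' notes are free text in mixed formats ('5 min', '10 minutes', '1/2hr', 'two hours', '2.5 hours'). A minimal Python sketch for normalising them to minutes follows. The `parse_minutes` helper and the small word-number table are hypothetical, and entries it cannot parse (such as blank Time fields) come back as None for manual checking.<br />

```python
import re
from fractions import Fraction
from typing import Optional

# Hypothetical lookup for spelled-out quantities seen in the notes (extend as needed).
WORD_NUMBERS = {"one": Fraction(1), "two": Fraction(2), "three": Fraction(3), "half": Fraction(1, 2)}

# A quantity (digits, '.', '/' or a word) followed by a minute or hour unit.
TIME_PATTERN = re.compile(
    r"^\s*([0-9./]+|[a-z]+)\s*(min(?:ute)?s?|h(?:ou)?rs?)\s*$", re.IGNORECASE
)


def parse_minutes(value: str) -> Optional[float]:
    """Convert a free-text duration such as '1/2hr' or 'two hours' to minutes."""
    match = TIME_PATTERN.match(value)
    if match is None:
        return None
    quantity, unit = match.group(1).lower(), match.group(2).lower()
    if quantity in WORD_NUMBERS:
        amount = WORD_NUMBERS[quantity]
    else:
        try:
            amount = Fraction(quantity)  # accepts '30', '2.5' and '1/2'
        except (ValueError, ZeroDivisionError):
            return None
    if unit.startswith("h"):
        amount *= 60  # hours to minutes
    return float(amount)
```

Summing the parsed values per curator then gives the per-person totals the list above asks for.<br />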
<br />
'''The Jamboree 15-17th Sept 2014'''<br />
<br />
* Suggestions for work<br />
** Discuss results from above<br />
** Work through some papers which have already been curated and have different data types (e.g. promoter and gene regulation) to identify best curation practices.<br />
** browsing capabilities through WormMine?<br />
** Should we put the list of papers in the Caltech curation status form?<br />
<br />
<br />
'''Matters Arising'''<br />
<br />
* things noted - not necessarily to do with the topic in hand<br />
** There are many duplicates of Interaction objects in geneace, with or without two leading digits. Sometimes there is only the shorter form.<br />
** The Features WBsf919641 and WBsf919607 are at the same position. Both are DAF-3 binding sites. WBsf919641 has Method "binding_site", while WBsf919607 has Method "TF_binding_site", which I think is more correct as DAF-3 is a TF. WBsf919641 used the paper WBPaper00004526; WBsf919607 used the paper WBPaper00003384. WBsf919641 is associated with WBGene00202278; WBsf919607 is associated with WBGene00003514 (myo-2). The paper WBPaper00004526 appears to be about a PEB-1 binding site, not a DAF-3 site. I made the Feature WBsf919609 for the PEB-1 site based on WBPaper00004526 in the last round of curation. Something is not right here.<br />
*** Follow-up from Mary Ann. I agree the Method for WBsf919641 should be TF_binding_site. Associated_with_gene should be updated to myo-2 (this is an error). The SO term is not right either. These then become duplicate entries. <br />
** The Features WBsf019227 and WBsf038813 are in the same position, but on opposite strands. Both are LAG-1 binding sites associated with the gene lin-11, one from WBPaper00005357 and the other from WBPaper00032298. WBsf019227 needs to be retired and merged into WBsf038813.<br />
*** Follow-up from Mary Ann. We have no merged_into tag structure in Feature. We have a History tag and could add a Comment there. <br />
** WBsf038819 has a Method of 'regulatory_region', but its Public_name and Description describe it as an Enhancer. Which is it?<br />
*** Fixed by Mary Ann. Changed Method to enhancer. <br />
** WBsf919669 is an Enhancer, but it has a tag for a Transcription_factor WBTranscriptionFactor000101. It is far too large to be a binding site for a transcription factor. I think this tag should be removed.<br />
*** Follow-up from Mary Ann. lag-1 does bind somewhere in this region, but the paper does not allow me to specify exactly where. Q. Should we curate these then? I think we should, but should maybe add suitable Remarks. <br />
** 393 out of the 500 TF_binding_site Features have an incorrect SO_term tag. It should be set to: SO:0000235<br />
*** Follow-up by Mary Ann. This seems really high. I wonder whether TF_binding_site was not a recognised SO term when some of these were curated, especially the older ones. This (and other consistency checks) should be added to a script we can run.<br />
** 10 Features with Method = "TF_binding_site" are lacking a Transcription_factor tag.<br />
*** Follow-up by Mary Ann. Maybe it's not clear in the paper what the TF is? Something to add to consistency check. <br />
* How many 'Method's are there for Feature? enhancer, regulatory_region, TF_binding_site, promoter…?<br />
**How do we define these Methods?<br />
** When displayed on JBrowse/GBrowse, do we want to show WBsfxxxxx or the Methods?<br />
* Gary does a quick literature search to see what other work has been done on the sites described.<br />
**Along these lines, do we want to add WBPaper00002925 to the WBsf objects from WBPaper00002011? I mean, should we associate Features with both papers when both mention these Features?<br />
*** The Feature objects are used to describe 'real', functional regions of the genome in as much detail as possible. Each paper that provides evidence for that region being 'real' should be added as evidence for the Feature. The Primary object we are trying to describe here is the region of the genome, not the papers. (I may have missed the point you were making - Gary).<br />
*** agree with Gary (unless I have also missed the point!). All relevant papers should be added. There is a loose convention in Allele curation that the paper listed under Evidence (which can only have one value) is the "primary" paper and the ones listed under Reference (including the primary) are all papers which describe or discuss the object as well. Also individual Evidence can be added to tags. Maybe we should look to extend this in the Feature model. <br />
<pre><br />
?Feature SMap S_parent UNIQUE Sequence UNIQUE ?Sequence XREF Feature_object<br />
Name Public_name UNIQUE ?Text<br />
Other_name ?Text<br />
Sequence_details Flanking_sequences UNIQUE Text UNIQUE Text<br />
Mapping_target UNIQUE ?Sequence <br />
Source_location UNIQUE Int UNIQUE ?Sequence UNIQUE Int UNIQUE Int UNIQUE #Evidence //source data, <WSversion> ?Sequence pos1 pos2 Evidence(Paper/person etc. remarks)<br />
DNA_text UNIQUE ?Text // for storing the sequence of the feature...can use IUPAC codes to be able<br />
// store consensus sequences, e.g. binding site consensus sequence<br />
Origin Species UNIQUE ?Species //added by pad, as we are moving towards multi species readyness.<br />
Strain UNIQUE ?Strain//added by pad, as we are moving towards multi strain readyness.<br />
History Merged_into UNIQUE ?Feature XREF Acquires_merge #Evidence<br />
Acquires_merge ?Feature XREF Merged_into #Evidence<br />
Deprecated Text #Evidence <br />
Visible Description ?Text<br />
SO_term ?SO_term<br />
Defined_by Defined_by_sequence ?Sequence XREF Defines_feature #Evidence<br />
Defined_by_paper ?Paper XREF Feature #Evidence<br />
Defined_by_person ?Person<br />
Defined_by_author ?Author<br />
Defined_by_analysis ?Analysis Int<br />
Score Float Text #Evidence // this would be a log score as indicated by the analysis used in gff dump<br />
Associations Associated_with_gene ?Gene XREF Associated_feature #Evidence // richard<br />
Associated_with_CDS ?CDS XREF Associated_feature #Evidence // richard<br />
Associated_with_transcript ?Transcript XREF Associated_feature #Evidence // richard<br />
Associated_with_pseudogene ?Pseudogene XREF Associated_feature #Evidence // richard<br />
Associated_with_transposon ?Transposon XREF Associated_feature #Evidence //richard<br />
Associated_with_variation ?Variation XREF Feature #Evidence<br />
Associated_with_Position_Matrix ?Position_Matrix XREF Associated_feature #Evidence<br />
Associated_with_operon ?Operon XREF Associated_feature #Evidence<br />
Associated_with_Interaction ?Interaction XREF Feature_interactor<br />
Associated_with_expression_pattern ?Expr_pattern XREF Associated_feature #Evidence <br />
Associated_with_Feature ?Feature XREF Associated_with_Feature #Evidence<br />
Associated_with_construct ?Construct XREF Sequence_feature<br />
Bound_by_product_of ?Gene XREF Gene_product_binds #Evidence //pad added this to show what gene it binds<br />
Transcription_factor UNIQUE ?Transcription_factor XREF Binding_site <br />
Annotation UNIQUE ?LongText // added for data attribution [030220 dl]<br />
Confidential_remark ?Text //pad<br />
Remark ?Text #Evidence<br />
Method UNIQUE ?Method<br />
</pre><br />
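Several of the Matters Arising items (TF_binding_site Features whose SO_term is not SO:0000235, and TF_binding_site Features lacking a Transcription_factor tag) are the kind of consistency check Mary Ann suggests adding to a script we can run. A minimal Python sketch follows. It assumes Features have already been extracted (e.g. from a geneace dump; the parsing step is not shown) into a hypothetical `Feature` record whose fields mirror the Method, SO_term and Transcription_factor tags of the ?Feature model above.<br />

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple

# SO accession for TF_binding_site, as stated in the Matters Arising notes.
TF_BINDING_SITE_SO = "SO:0000235"


@dataclass(frozen=True)
class Feature:
    """Hypothetical minimal view of a ?Feature object; fields mirror model tags."""
    feature_id: str
    method: Optional[str] = None
    so_terms: Tuple[str, ...] = ()  # SO_term is not UNIQUE in the model
    transcription_factor: Optional[str] = None


def check_feature(feature: Feature) -> List[str]:
    """Return a list of problems found for one Feature (empty if none)."""
    problems: List[str] = []
    if feature.method == "TF_binding_site":
        if TF_BINDING_SITE_SO not in feature.so_terms:
            problems.append(f"{feature.feature_id}: SO_term does not include {TF_BINDING_SITE_SO}")
        if feature.transcription_factor is None:
            problems.append(f"{feature.feature_id}: no Transcription_factor tag")
    return problems


def run_checks(features: Iterable[Feature]) -> Dict[str, List[str]]:
    """Map each Feature ID that has problems to its list of problems."""
    report: Dict[str, List[str]] = {}
    for feature in features:
        problems = check_feature(feature)
        if problems:
            report[feature.feature_id] = problems
    return report
```

Other checks raised above, such as a Method of regulatory_region on a Feature described as an enhancer (as with WBsf038819), could be added as extra conditions in `check_feature`.<br />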
<br />
<br />
'''Duplicated Features'''<br />
The following are a set of duplicated regulatory Features. I (Gary) am probably the main culprit in not checking to see if there is a pre-existing Feature.<br />
How can we avoid this in future?<br />
* The list of duplicated Features has been moved to the Discussion page.<br />
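On the question of how to avoid duplicates in future, one cheap safeguard is to look for existing Features at the same coordinates before making a new one. The Python sketch below flags candidate duplicates by grouping Features on parent sequence and interval. The `(feature_id, parent, start, end)` record layout is a hypothetical simplification. The sketch also assumes that reverse-strand Features are stored with start greater than end, so normalising the interval also catches opposite-strand pairs such as WBsf019227 and WBsf038813. Flagged groups are candidates for a curator to review, not automatic merges.<br />

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (feature_id, parent_sequence, start, end).
# Assumption: a reverse-strand Feature is stored with start > end.
FeatureSpan = Tuple[str, str, int, int]
SpanKey = Tuple[str, int, int]


def find_positional_duplicates(spans: Iterable[FeatureSpan]) -> Dict[SpanKey, List[str]]:
    """Group Feature IDs that share a parent sequence and interval, ignoring strand."""
    groups: Dict[SpanKey, List[str]] = defaultdict(list)
    for feature_id, parent, start, end in spans:
        # Normalise so that both strands map onto the same (low, high) key.
        key = (parent, min(start, end), max(start, end))
        groups[key].append(feature_id)
    # Keep only intervals claimed by more than one Feature.
    return {key: sorted(ids) for key, ids in groups.items() if len(ids) > 1}
```

The same normalised key could also be looked up against existing Features before a new one is requested.<br />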
<br />
'''Papers to curate'''<br />
* I have taken the Sequence Feature Papers from RT and assigned the four of us 10 papers each.<br />
* Just in case there is any paper here that has already been curated for Interaction, I've added the number of Interaction objects linked to by the paper. If this is not helpful, feel free to remove it.<br />
<br />
<pre><br />
WBPaper00002925 Daniela Interaction: 8<br />
Time: 10 mins<br />
New objects: none. The paper refers to a couple of enhancer sub-elements (the B and C sub-elements) already described in Okkema and Fire 1994, Development (WBPaper00002011)<br />
Location of information: - <br />
Comments: <br />
WBPaper00004568 Daniela Interaction: 2<br />
Time: 25 mins<br />
New objects: 3 promoters: pA, pB, and pC<br />
Location of information: body text and fig.3 for promoters. Figures 5, 6 and 7 and body text for TF binding sites.<br />
Comments: Expression objects were already generated by Wen in the past. Daniela will add only the WBsf once ready. Also add WBsf object to the construct objects.<br />
See RT 418146 as well.<br />
WBPaper00005842 Daniela<br />
Time: 3 hours <br />
New objects:<br />
mk125-132 (4331-4474) is sufficient to drive strong expression in vulC and vulD. Expr11838 add feature and generate construct<br />
<br />
mk84-148 contains two regions that together confer strong expression in VulE and VulF. Expr11839 add feature and generate construct<br />
<br />
mk50-51 (1052-1438) is sufficient to confer AC, vulE, and vulA expression, as well as uterine cell expression. Expr11840 add feature and generate construct.<br />
<br />
mk96-144 (2290-2522) expresses in AC in all animals observed. Expr11841 add feature and generate construct.<br />
<br />
mk66-67 (4434-4997) is sufficient to confer vulval cell expression. Expr11842 add feature and generate construct.<br />
<br />
mk135-134 (2412-3419) is sufficient to confer vulval cell expression. N.B.: the text says mk135-134, but fig 4C seems to say mk135-143. Expr11843 add feature and generate construct.<br />
<br />
Location of information: The nucleotide sequences for pertinent regions are shown in Figs. 5, 6, and 7 of the Supplemental Material.<br />
</pre><br />
* I suggest we curate 5 of our papers before the jamboree, taking notes as described above and any other things that arise during your curation.</div>Xdwanghttps://wiki.wormbase.org/index.php?title=Working_Group:Sequence_Features&diff=24770Working Group:Sequence Features2014-09-16T15:19:06Z<p>Xdwang: </p>
<hr />
<div><br />
== Headline text ==<br />
'''Topics'''<br />
<br />
* Display of Sequence Features on the website<br />
** See github ticket 2867 https://github.com/WormBase/website/issues/2867#issuecomment-48041365<br />
** And Transcription factors and Gene_product binds.<br />
** in the expression section on the gene page highlight the objects that are Sequence feature related. Add in the table a type called cys-regulatory-element? <br />
** add a sequence feature table in sequence widget to display all features related to the gene<br />
** in cytoscape display, sequence feature as an entity node<br />
<br />
* Stream out working flow<br />
** How can we automatically identify Sequence Feature papers.<br />
** How do papers come in now? SVM? Textpresso string matches?<br />
<br />
* Improving data flow<br />
** How can we make the data immediately available to all curators?<br />
*** Add Paper and Public_name fields to the Features in the Nameserver?<br />
*** geneace is available for download updated every day - a copy is taken by Caltech.<br />
**** still need WBsf object details in order to relate/curate regulation/expression/construct objects<br />
<br />
<br />
* Sample papers curation <br />
** Prepare a paper list from each person<br />
* assign meaningful names to the public_name field, e.g. 'distal enhancer 1' instead of 'DE1'<br />
** gw3 - I disagree with this. We should use the name that is used to describe the region in the paper, rather than making up our own names. I have had to go back and re-annotate regions after Xiaodong found problems with my stuff several months after I had originally done it and I really needed to be able to unambiguously identify which region the paper was talking about by using the Public_name field and matching this up to the name of the region in the paper. Users may well need to do this as well if they are reading the paper and looking at the Features marking up the regions.<br />
** dr - ok sounds good to me Gary<br />
<br />
'''Practical Issues'''<br />
<br />
* Gary and I are staying at the Ramada - 800 Morrissey Boulevard, Freeport St, Boston, MA 02122 US<br />
* We arrive in Boston on Saturday evening and fly home on Wednesday night. <br />
* What time shall we meet on Monday morning and where do we go? <br />
**I (Xiaodong) can pick up you guys at Harvard square T station (red-line, info center on the square) at say 9:30 am. We can walk to my office.<br />
* Can you confirm that we have office space with internet access? I will bring my Macbook. Hopefully I can tunnel into my Sanger working space.<br />
**Enough office space for sure. Internet access through Harvard guest. They won't allow you to access other ways.<br />
* Thanks Xiaodong, I'm bringing my chromebook - Gary<br />
<br />
'''Pre-Jamboree prep'''<br />
<br />
* Suggestions for work<br />
** Curate some (5? 10?) papers from our lists and collate data as follows<br />
*** How long did each paper take to curate<br />
*** How many new objects did you add to WormBase<br />
*** Where in the paper was the information (e.g. supplementary data, figure legends, within text)<br />
*** Did you need to contact the author for more information<br />
*** Did you come across data for other curators <br />
**** gw3 - I don't know in detail what data other curators require. It would be useful to know this so I can summarise data for others.<br />
<br />
'''The Jamboree 15-17th Sept 2014'''<br />
<br />
* Suggestions for work<br />
** Discuss the results from above<br />
** Work through some papers which have already been curated and have different data types (e.g. promoter and gene regulation) to identify best curation practices.<br />
** Browsing capabilities through WormMine?<br />
** Should we put the list of papers in the Caltech curation status form?<br />
<br />
<br />
'''Matters Arising'''<br />
<br />
* things noted - not necessarily to do with the topic in hand<br />
** There are many duplicates of Interaction objects in geneace, with or without two leading digits. Sometimes there is only the shorter form.<br />
** The Features WBsf919641 and WBsf919607 are at the same position. Both are DAF-3 binding sites. WBsf919641 has Method "binding_site"; WBsf919607 has Method "TF_binding_site", which I think is more correct as DAF-3 is a TF. WBsf919641 used the paper WBPaper00004526; WBsf919607 used the paper WBPaper00003384. WBsf919641 is associated with WBGene00202278; WBsf919607 is associated with WBGene00003514 (myo-2). The paper WBPaper00004526 appears to be about a PEB-1 binding site, not a DAF-3 site. I made the Feature WBsf919609 for the PEB-1 site based on WBPaper00004526 in the last round of curation. Something is not right here.<br />
*** Follow-up from Mary Ann. I agree the Method for WBsf919641 should be TF_binding_site. Associated_with_gene should be updated to myo-2 (this is an error). So the SO_term is not right either. These then become duplicate entries.<br />
** The Features WBsf019227 and WBsf038813 are in the same position, but on opposite strands. Both are for a LAG-1 binding site associated with the gene lin-11. One is from WBPaper00005357, the other from WBPaper00032298. WBsf019227 needs to be retired and merged into WBsf038813.<br />
*** Follow-up from Mary Ann. We have no merged_into tag structure in Feature. We have History tag and could add a Comment there. <br />
** WBsf038819 has a Method of 'regulatory_region', but the Public_name and Description describe it as an Enhancer. Which is it?<br />
*** Fixed by Mary Ann. Changed Method to enhancer. <br />
** WBsf919669 is an Enhancer, but it has a Transcription_factor tag (WBTranscriptionFactor000101). It is far too large to be a binding site for a transcription factor. I think this tag should be removed.<br />
*** Follow-up from Mary Ann. lag-1 does bind somewhere in this region, but the paper does not allow me to specify exactly where. Q. Should we curate these then? I think we should, but should maybe add suitable Remarks. <br />
** 393 out of the 500 TF_binding_site Features have an incorrect SO_term tag. It should be set to: SO:0000235<br />
*** Follow-up by Mary Ann. This seems really high. I wonder whether TF_binding_site was not a recognised SO term when some of these were curated, especially the older ones. This (and other consistency checks) should be added to a script we can run.<br />
** 10 Features with Method = "TF_binding_site" are lacking a Transcription_factor tag.<br />
*** Follow-up by Mary Ann. Maybe it's not clear in the paper what the TF is? Something to add to consistency check. <br />
* How many Methods are there for Features? enhancer, regulatory_region, TF_binding_site, promoter…?<br />
** How should these Methods be defined?<br />
** When displaying on JBrowse/GBrowse, do we want to display the WBsfxxxxx IDs or the Methods?<br />
* Gary does a quick literature search to see what other work has been done in the sites described.<br />
** Along these lines, do we want to add WBPaper00002925 to the WBsf Features from WBPaper00002011? That is, should we associate Features with both papers when both mention them?<br />
*** The Feature objects are used to describe 'real', functional regions of the genome in as much detail as possible. Each paper that provides evidence for that region being 'real' should be added as evidence for the Feature. The Primary object we are trying to describe here is the region of the genome, not the papers. (I may have missed the point you were making - Gary).<br />
*** agree with Gary (unless I have also missed the point!). All relevant papers should be added. There is a loose convention in Allele curation that the paper listed under Evidence (which can only have one value) is the "primary" paper and the ones listed under Reference (including the primary) are all papers which describe or discuss the object as well. Also individual Evidence can be added to tags. Maybe we should look to extend this in the Feature model. <br />
<pre><br />
?Feature SMap S_parent UNIQUE Sequence UNIQUE ?Sequence XREF Feature_object<br />
Name Public_name UNIQUE ?Text<br />
Other_name ?Text<br />
Sequence_details Flanking_sequences UNIQUE Text UNIQUE Text<br />
Mapping_target UNIQUE ?Sequence <br />
Source_location UNIQUE Int UNIQUE ?Sequence UNIQUE Int UNIQUE Int UNIQUE #Evidence //source data, <WSversion> ?Sequence pos1 pos2 Evidence(Paper/person etc. remarks)<br />
DNA_text UNIQUE ?Text // for storing the sequence of the feature...can use IUPAC codes to be able<br />
// store consensus sequences, e.g. binding site consensus sequence<br />
Origin Species UNIQUE ?Species //added by pad, as we are moving towards multi species readiness.<br />
Strain UNIQUE ?Strain //added by pad, as we are moving towards multi strain readiness.<br />
History Merged_into UNIQUE ?Feature XREF Acquires_merge #Evidence<br />
Acquires_merge ?Feature XREF Merged_into #Evidence<br />
Deprecated Text #Evidence <br />
Visible Description ?Text<br />
SO_term ?SO_term<br />
Defined_by Defined_by_sequence ?Sequence XREF Defines_feature #Evidence<br />
Defined_by_paper ?Paper XREF Feature #Evidence<br />
Defined_by_person ?Person<br />
Defined_by_author ?Author<br />
Defined_by_analysis ?Analysis Int<br />
Score Float Text #Evidence // this would be a log score as indicated by the analysis used in gff dump<br />
Associations Associated_with_gene ?Gene XREF Associated_feature #Evidence // richard<br />
Associated_with_CDS ?CDS XREF Associated_feature #Evidence // richard<br />
Associated_with_transcript ?Transcript XREF Associated_feature #Evidence // richard<br />
Associated_with_pseudogene ?Pseudogene XREF Associated_feature #Evidence // richard<br />
Associated_with_transposon ?Transposon XREF Associated_feature #Evidence //richard<br />
Associated_with_variation ?Variation XREF Feature #Evidence<br />
Associated_with_Position_Matrix ?Position_Matrix XREF Associated_feature #Evidence<br />
Associated_with_operon ?Operon XREF Associated_feature #Evidence<br />
Associated_with_Interaction ?Interaction XREF Feature_interactor<br />
Associated_with_expression_pattern ?Expr_pattern XREF Associated_feature #Evidence <br />
Associated_with_Feature ?Feature XREF Associated_with_Feature #Evidence<br />
Associated_with_construct ?Construct XREF Sequence_feature<br />
Bound_by_product_of ?Gene XREF Gene_product_binds #Evidence //pad added this to show what gene it binds<br />
Transcription_factor UNIQUE ?Transcription_factor XREF Binding_site <br />
Annotation UNIQUE ?LongText // added for data attribution [030220 dl]<br />
Confidential_remark ?Text //pad<br />
Remark ?Text #Evidence<br />
Method UNIQUE ?Method<br />
</pre><br />
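Mary Ann's suggestion that the SO_term and Transcription_factor checks noted above be added to a script we can run could start from something like the following Python sketch. It assumes the Feature objects have already been exported from the database into simple records; the export step, the `Feature` dataclass and the `check_tf_binding_sites` function are hypothetical and not part of any existing WormBase tool. It only flags TF_binding_site Features that lack SO:0000235 or a Transcription_factor tag.<br />

```python
from dataclasses import dataclass, field
from typing import List, Optional

# SO term that the notes above say every TF_binding_site Feature should carry.
TF_BINDING_SITE_SO = "SO:0000235"


@dataclass
class Feature:
    """Hypothetical minimal record; field names mirror tags in the ?Feature model."""
    feature_id: str
    method: Optional[str] = None
    so_terms: List[str] = field(default_factory=list)  # SO_term is not UNIQUE in the model
    transcription_factor: Optional[str] = None


def check_tf_binding_sites(features):
    """Return (feature_id, problem) pairs for TF_binding_site Features that fail a check."""
    problems = []
    for f in features:
        if f.method != "TF_binding_site":
            continue  # only TF_binding_site Features are checked here
        if TF_BINDING_SITE_SO not in f.so_terms:
            problems.append((f.feature_id, f"SO_term lacks {TF_BINDING_SITE_SO}"))
        if f.transcription_factor is None:
            problems.append((f.feature_id, "missing Transcription_factor"))
    return problems


if __name__ == "__main__":
    # Made-up IDs for illustration only.
    sample = [
        Feature("WBsfEXAMPLE1", "TF_binding_site", ["SO:0000235"], "WBTranscriptionFactorEXAMPLE"),
        Feature("WBsfEXAMPLE2", "TF_binding_site"),
        Feature("WBsfEXAMPLE3", "enhancer"),
    ]
    for feature_id, problem in check_tf_binding_sites(sample):
        print(feature_id, problem)
```

Run as a script, the sample reports WBsfEXAMPLE2 twice, once for each missing tag; other consistency checks could be added as further conditions in the same loop.<br />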
<br />
<br />
'''Duplicated Features'''<br />
The following are a set of duplicated regulatory Features. I (Gary) am probably the main culprit in not checking to see if there is a pre-existing Feature.<br />
How can we avoid this in future?<br />
* The list of duplicated Features has been moved to the Discussion page.<br />
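One way to avoid new duplicates is to check for an existing Feature at the same position before making a new one. The Python sketch below groups Features that share a sequence and span regardless of strand, as with WBsf019227 and WBsf038813. The `Location` record and `find_duplicates` function are hypothetical, the coordinates are assumed to have been exported already (in the spirit of the Source_location ?Sequence pos1 pos2 fields), and only identical spans are matched, so overlapping but non-identical regions are not caught.<br />

```python
from collections import defaultdict
from typing import NamedTuple


class Location(NamedTuple):
    """Hypothetical record built from a Feature's exported coordinates."""
    feature_id: str
    sequence: str
    pos1: int
    pos2: int  # assumption: pos1 > pos2 marks a reverse-strand Feature
    method: str


def find_duplicates(locations, match_method=True):
    """Group Features with the same sequence and span, ignoring strand.

    With match_method=False, Features that differ only in Method
    (e.g. binding_site vs TF_binding_site) are grouped as well.
    """
    groups = defaultdict(list)
    for loc in locations:
        start, end = sorted((loc.pos1, loc.pos2))  # strand-independent span
        key = (loc.sequence, start, end, loc.method if match_method else None)
        groups[key].append(loc.feature_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


if __name__ == "__main__":
    # Made-up IDs and coordinates for illustration only.
    locations = [
        Location("WBsfA", "X", 100, 120, "TF_binding_site"),
        Location("WBsfB", "X", 120, 100, "TF_binding_site"),  # same span, opposite strand
        Location("WBsfC", "X", 100, 120, "binding_site"),
    ]
    print(find_duplicates(locations))                      # [['WBsfA', 'WBsfB']]
    print(find_duplicates(locations, match_method=False))  # [['WBsfA', 'WBsfB', 'WBsfC']]
```

Passing `match_method=False` would also catch pairs like WBsf919641 and WBsf919607, which sit at the same position but carry different Methods.<br />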
<br />
'''Papers to curate'''<br />
* I have taken the Sequence Feature Papers from RT and assigned the four of us 10 papers each.<br />
* Just in case there is any paper here that has already been curated for Interaction, I've added the number of Interaction objects linked to by the paper. If this is not helpful, feel free to remove it.<br />
<br />
<pre><br />
WBPaper00002925 Daniela Interaction: 8<br />
Time: 10 mins<br />
New objects: none. the paper was referring to a couple of sub-element enhancers already described in Okkema and Fire 1994- Development - B and C sub-elements -WBPaper00002011<br />
Location of information: - <br />
Comments: <br />
WBPaper00004568 Daniela Interaction: 2<br />
Time: 25 mins<br />
New objects: 3 promoters- pA, pB, and pC<br />
Location of information: body text and fig.3 for promoters. Figures 5, 6 and 7 and body text for TF binding sites.<br />
Comments: Expression objects were already generated by Wen in the past. Daniela will add only the WBsf once ready. Also add WBsf object to the construct objects.<br />
See RT 418146 as well.<br />
WBPaper00005842 Daniela<br />
Time: 3 hours <br />
New objects:<br />
mk125-132 (4331-4474) is sufficient to drive strong expression in vulC and vulD. Expr11838 add feature and generate construct<br />
<br />
mk84-148 contains two regions that together confer strong expression in vulE and vulF. Expr11839 add feature and generate construct<br />
<br />
mk50-51 (1052-1438) is sufficient to confer AC, vulE, and vulA expression, as well as uterine cell expression. Expr11840 add feature and generate construct.<br />
<br />
mk96-144 (2290-2522) expresses in AC in all animals observed. Expr11841 add feature and generate construct.<br />
<br />
mk66-67 (4434-4997) is sufficient to confer vulval cell expression. Expr11842 add feature and generate construct.<br />
<br />
mk135-134 (2412-3419) is sufficient to confer vulval cell expression. N.B.: the text says mk135-134, but Fig. 4C appears to show mk135-143. Expr11843 add feature and generate construct.<br />
<br />
Location of information: The nucleotide sequences for pertinent regions are shown in Figs. 5, 6, and 7 of the Supplemental Material.<br />
Comment: It would be good for a sequence curator to read the paper without bias, i.e. without checking what I have curated for expression, to see whether we identify the same regions<br />
Comment: potential TF binding sites<br />
Comment: Should we curate the C. briggsae data? See the paragraph 'Analysis of C Briggsae upstream regions'.<br />
<br />
WBPaper00024328 Daniela Interaction: 4<br />
Time: 2.5 hours<br />
New objects: Added 4 Expression objects for 5 different constructs (pPHA2::GFP-A was already present):<br />
<br />
pPHA2::GFP-A body text and Figs. 5 and 6 Expr3093 Add Sf to expr object and to construct <br />
pPHA2::GFP-B body text and Figs. 5 and 6 Expr11835 Add Sf to expr object and to construct<br />
pPHA2::GFP-C body text and Figs. 5 and 6 Expr11835 Add Sf to expr object and to construct<br />
pPHA2::GFP-D body text and Figs. 5 and 6 Expr11837 Add Sf to expr object and to construct<br />
pPHA2::GFP-E body text and Figs. 5 and 6 Expr11836 Add Sf to expr object and to construct<br />
pPHA2::GFP-F body text and Figs. 5 and 6 Expr11834 Add Sf to expr object and to construct<br />
Detailed construction of plasmids in Materials and Methods<br />
Location of information: body text and Figs. 5 and 6 <br />
Comments: Not sure whether this paper is curatable for sequence features per se, given the nature of the constructs. Would be good to discuss.<br />
WBPaper00028802 Daniela<br />
Time: 5 min<br />
Location: table 2 and body text<br />
New objects: <br />
Comment: potential paper on cis-regulation (no evidence found), WBPmat00000004 is associated with this paper - XW<br />
<br />
WBPaper00028915 Daniela<br />
Time: 15 mins<br />
Comments: No features in this paper<br />
WBPaper00029140 Daniela<br />
WBPaper00029255 Daniela Interaction: 55<br />
WBPaper00030829 Daniela Interaction: 5<br />
WBPaper00030933 Daniela Interaction: 1<br />
WBPaper00003929 Gary Interaction: 25<br />
Time: 4 hours<br />
New objects: none - 6 existing Features were corrected and updated. (WBsf019182, WBsf019181, WBsf019179, WBsf019177, WBsf019178, WBsf019180)<br />
Location of information: body text and in a figure in the supplemental table of the paper WBPaper00006376<br />
Other curator data: "lin-41 and lin-42 are negatively regulated by let-7"<br />
Comments: This is Gary Ruvkun's (Nature 2000) let-7 paper that started the miRNA field!<br />
Comments: This is being marked as a pair of 'binding_site' Features because this is miRNA not TF binding.<br />
See also: let-7, lin-41 binding WBPaper00006376 PubMed: 14729570 (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC324419/) - This is the paper that defined the binding sites.<br />
Comments (XW): curated one new cis-regulation object with Gary's WBsfs<br />
<br />
WBPaper00005044 Gary Interaction: 20<br />
Time: 3 hours<br />
New objects: made ire-1 binding site Feature (WBsf977530)<br />
Location of information: body of this paper and WBPaper00005036<br />
Other curator data: IRE-1 is a stress-activated endonuclease resident in the ER that is conserved in all known eukaryotes. IRE-1 mediated unconventional splicing of an intron from xbp-1 mRNA controls expression of the encoded transcription factor and is required for upregulation of most UPR target genes.<br />
See also: WBPaper00005036<br />
WBPaper00005971 Gary Interaction: 16 This has already been done by Gary: Feature WBsf718850<br />
WBPaper00024440 Gary Interaction: 1<br />
Time: 3 hours<br />
New objects: Made WBsf977532, WBsf977533 M-2 motif/daf-12 binding sites for ceh-22<br />
New objects: Made WBsf977534, WBsf977535 M-2 motif/daf-12 binding sites for myo-2<br />
Location of information: body of this paper and Figure Supplemental 3<br />
Other curator data: "We conclude that regulation of myo-2 and ceh-22 during dauer development depends critically on the M-2 motif."<br />
Comments: lots of hypothetical binding sites for daf-12 in 90-odd genes, but these have not been experimentally confirmed.<br />
WBPaper00028816 Gary<br />
Time: 30 mins<br />
New objects: none<br />
Location of information: body of this paper<br />
Other curator data: "recruitment sites are widely distributed along X to bind the DCC and nucleate DCC spreading to X regions lacking recruitment sites"<br />
Comments: This paper determines the A and B motifs of the Dosage Compensation Complex (DCC). There are hundreds of thousands of sites of varying strength.<br />
WBPaper00028986 Gary Interaction: 16<br />
WBPaper00029181 Gary<br />
WBPaper00029327 Gary Interaction: 33<br />
WBPaper00030849 Gary<br />
WBPaper00031355 Gary Interaction: 5<br />
WBPaper00004181 Mary Ann Interaction: 8 <br />
No Features. Took 5 mins to read. <br />
WBPaper00005056 Mary Ann <br />
Time: 35 mins<br />
New objects: 37 - not yet curated.<br />
Location of information: Mainly in fig. 1, but also in the body text.<br />
Comments: TRTTKRY element in promoter region of T05E11.3, D2096.6, C44H4.1, ZK816.4, ceh-22, <br />
tph-1, M05B5.2, myo-2. Bound by pha-4 <br />
WBPaper00006429 Mary Ann Interaction: 3<br />
Time: 1/2hr<br />
New objects: 1<br />
Location of information: In body of text and fig. 3B<br />
Comments: Alludes to GATA binding sites, but no experimental data. <br />
Comments (dr): Added Expr11833<br />
WBPaper00024505 Mary Ann<br />
Time: <br />
New objects: <br />
Location of information: <br />
Comments: <br />
WBPaper00028849 Mary Ann<br />
Time: 45 mins<br />
New objects: 1<br />
Location of information: In body of text and fig. 3B<br />
Comments(dr): Added Expr11832 <br />
WBPaper00029058 Mary Ann Interaction: 50<br />
WBPaper00029190 Mary Ann<br />
WBPaper00029406 Mary Ann Interaction: 145<br />
WBPaper00030877 Mary Ann Interaction: 8<br />
WBPaper00031471 Mary Ann<br />
Time: 1hr<br />
New objects: 1<br />
Location of information: In body of text and fig. 5f<br />
Comments: Check that alleles of asd-2 in fig 3 are curated. yb1422, yb1415, yb1470, yb1419, yb1423<br />
WBPaper00004482 Xiaodong Interaction: 2<br />
Time: two hours<br />
New objects: request new features for more interaction objects<br />
Location of information: figure 5A<br />
Comments: HOX/CEH-20 binding sites in hlh-8 promoter. feature will be used in cis-regulation and physical interaction objects<br />
Duplication: yes. Currently only one interaction object (WBInteraction000001291) is associated with the paper<br />
WBPaper00005609 Xiaodong Interaction: 20 This has already been done by Margaret: Feature WBsf019097, WBsf038788, WBsf038789, WBsf038790, WBsf038791, WBsf038793, WBsf038794, WBsf019098, WBsf038792<br />
Time: 50 mins<br />
New objects: three interaction objects<br />
Location of information: In body of text and fig. 5<br />
Comments: Only WBsf038790 and WBsf038791 are useful. Not sure why the other Features exist; they seem to be redundant.<br />
WBPaper00024189 Xiaodong<br />
Time: 10 mins<br />
Comments: no features in this paper<br />
WBPaper00024981 Xiaodong<br />
Time: 40 mins<br />
New objects: request through RT<br />
Location of information: figure 2<br />
Comments: MED-1 binding sites in end-1 and end-3 promoters<br />
Comments (dr): Expr11831 generated for end-1 and end-3 promoter regions. Add the WBsf ID once available, and the corresponding construct. Paper checked as it came through RT<br />
WBPaper00028910 Xiaodong Interaction: 7<br />
Time: 30 mins<br />
New objects: request through RT<br />
Location of information: in body text and figure 4<br />
Comments: MEF-2 direct binding site in str-1 promoter and a few minimal regulatory regions mentioned in body text<br />
<br />
WBPaper00029109 Xiaodong<br />
<br />
WBPaper00029229 Xiaodong Interaction: 3<br />
WBPaper00030809 Xiaodong<br />
WBPaper00030931 Xiaodong Interaction: 18<br />
WBPaper00031565 Xiaodong Interaction: 1<br />
Time: 30 mins<br />
New objects: request through RT<br />
Location of information: in body text and figure 2 and figure 3<br />
Comments: PBC-HOX binding sites S1 and S2; cis-regulatory elements E1 and E2<br />
<br />
</pre><br />
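The pre-Jamboree questions ask how long each paper took to curate. Because the Time entries above are free text ("10 mins", "2.5 hours", "1/2hr", "two hours"), they need parsing before they can be summed. The Python sketch below is hypothetical: `parse_minutes` and `total_minutes` are illustrative names, only the formats seen in these notes are handled, and blank or unrecognised entries are skipped rather than guessed.<br />

```python
import re
from fractions import Fraction

# Number words seen in free-text Time entries (e.g. "two hours"); extend as needed.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}

DURATION = re.compile(r"\s*([\d./]+|[a-z]+)\s*(mins?|minutes?|hrs?|hours?)\s*")


def parse_minutes(text):
    """Convert e.g. '10 mins', '2.5 hours', '1/2hr' or 'two hours' to minutes.

    Returns None for blank or unrecognised entries instead of guessing.
    """
    match = DURATION.fullmatch(text.lower())
    if match is None:
        return None
    quantity, unit = match.groups()
    if quantity in NUMBER_WORDS:
        value = Fraction(NUMBER_WORDS[quantity])
    else:
        try:
            value = Fraction(quantity)  # accepts '10', '2.5' and '1/2'
        except (ValueError, ZeroDivisionError):
            return None
    minutes = value * 60 if unit.startswith("h") else value
    return float(minutes)


def total_minutes(notes):
    """Sum every parseable 'Time:' line in a block of curation notes."""
    total = 0.0
    for line in notes.splitlines():
        line = line.strip()
        if line.startswith("Time:"):
            minutes = parse_minutes(line[len("Time:"):])
            if minutes is not None:
                total += minutes
    return total


if __name__ == "__main__":
    notes = "Time: 10 mins\nTime: 2.5 hours\nTime: 1/2hr\nTime: \n"
    print(total_minutes(notes))  # 190.0
```

Empty entries such as "Time: " are ignored, so a total only covers the papers that actually recorded a time.<br />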
* I suggest we curate 5 of our papers before the jamboree, taking notes as described above and any other things that arise during your curation.</div>Xdwanghttps://wiki.wormbase.org/index.php?title=Working_Group:Sequence_Features&diff=24769Working Group:Sequence Features2014-09-16T15:08:00Z<p>Xdwang: </p>
<hr />
<div><br />
'''Topics'''<br />
<br />
* Display of Sequence Features on the website<br />
** See github ticket 2867 https://github.com/WormBase/website/issues/2867#issuecomment-48041365<br />
** Also display Transcription factors and Gene_product_binds.<br />
** In the expression section of the gene page, highlight the objects that are related to Sequence Features. Add a type called cis-regulatory-element to the table?<br />
** Add a Sequence Feature table to the sequence widget to display all Features related to the gene.<br />
** In the Cytoscape display, show each Sequence Feature as an entity node.<br />
<br />
* Streamline the workflow<br />
** How can we automatically identify Sequence Feature papers?<br />
** How do papers come in now? SVM? Textpresso string matches?<br />
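As a rough illustration of flagging candidate Sequence Feature papers by string matching, a paper's text could be scored as below. This is not how the current SVM or Textpresso pipelines work; the keyword list, the `score_paper` and `is_candidate` helpers and the threshold are all hypothetical.<br />

```python
import re

# Hypothetical keyword patterns; the real SVM/Textpresso categories are not described in these notes.
PATTERNS = {
    "enhancer": re.compile(r"\benhancers?\b", re.IGNORECASE),
    "promoter": re.compile(r"\bpromoters?\b", re.IGNORECASE),
    "binding_site": re.compile(r"\bbinding sites?\b", re.IGNORECASE),
    "cis_regulatory": re.compile(r"\bcis-regulatory\b", re.IGNORECASE),
}


def score_paper(text):
    """Count keyword hits per category in a paper's text."""
    return {name: len(pattern.findall(text)) for name, pattern in PATTERNS.items()}


def is_candidate(text, min_hits=3):
    """Flag a paper when its total keyword hits reach min_hits (an arbitrary threshold)."""
    return sum(score_paper(text).values()) >= min_hits


if __name__ == "__main__":
    abstract = "The enhancer contains two PHA-4 binding sites and a cis-regulatory element near the promoter."
    print(score_paper(abstract), is_candidate(abstract))
```

A flagged paper would still need a curator to read it: keyword counts cannot tell a paper that defines a new site from one that only refers to an existing one, as happened with WBPaper00002925.<br />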
<br />
* Improving data flow<br />
** How can we make the data immediately available to all curators?<br />
*** Add Paper and Public_name fields to the Features in the Nameserver?<br />
*** geneace is available for download and is updated every day; Caltech takes a copy.<br />
**** We still need WBsf object details in order to link and curate regulation, expression and construct objects.<br />
<br />
<br />
* Sample paper curation<br />
** Each person prepares a paper list<br />
* Assign meaningful names to the Public_name field, e.g. 'distal enhancer 1' instead of 'DE1'<br />
** gw3 - I disagree with this. We should use the name that is used to describe the region in the paper, rather than making up our own names. I have had to go back and re-annotate regions after Xiaodong found problems with my stuff several months after I had originally done it and I really needed to be able to unambiguously identify which region the paper was talking about by using the Public_name field and matching this up to the name of the region in the paper. Users may well need to do this as well if they are reading the paper and looking at the Features marking up the regions.<br />
** dr - OK, sounds good to me, Gary.<br />
<br />
'''Practical Issues'''<br />
<br />
* Gary and I are staying at the Ramada - 800 Morrissey Boulevard, Freeport St, Boston, MA 02122 US<br />
* We arrive in Boston on Saturday evening and fly home on Wednesday night. <br />
* What time shall we meet on Monday morning and where do we go? <br />
**I (Xiaodong) can pick you up at the Harvard Square T station (Red Line, information center on the square) at, say, 9:30 am. We can walk to my office.<br />
* Can you confirm that we will have office space with internet access? I will bring my MacBook. Hopefully I can tunnel into my Sanger working space.<br />
**There is certainly enough office space. Internet access is through the Harvard guest network; no other way of connecting is allowed.<br />
* Thanks Xiaodong, I'm bringing my Chromebook - Gary<br />
<br />
'''Pre-Jamboree prep'''<br />
<br />
* Suggestions for work<br />
** Curate some (5? 10?) papers from our lists and collate data as follows<br />
*** How long did each paper take to curate?<br />
*** How many new objects did you add to WormBase?<br />
*** Where in the paper was the information (e.g. supplementary data, figure legends, within text)?<br />
*** Did you need to contact the author for more information?<br />
*** Did you come across data for other curators?<br />
**** gw3 - I don't know in detail what data other curators require. It would be useful to know this so I can summarise data for others.<br />
<br />
'''The Jamboree, 15-17 Sept 2014'''<br />
<br />
* Suggestions for work<br />
** Discuss the results from above<br />
** Work through some papers which have already been curated and have different data types (e.g. promoter and gene regulation) to identify best curation practices.<br />
** Browsing capabilities through WormMine?<br />
** Should we put the list of papers in the Caltech curation status form?<br />
<br />
<br />
'''Matters Arising'''<br />
<br />
* things noted - not necessarily to do with the topic in hand<br />
** There are many duplicates of Interaction objects in geneace, with or without two leading digits. Sometimes there is only the shorter form.<br />
** The Features WBsf919641 and WBsf919607 are at the same position. Both are DAF-3 binding sites. WBsf919641 has Method "binding_site"; WBsf919607 has Method "TF_binding_site", which I think is more correct as DAF-3 is a TF. WBsf919641 used the paper WBPaper00004526; WBsf919607 used the paper WBPaper00003384. WBsf919641 is associated with WBGene00202278; WBsf919607 is associated with WBGene00003514 (myo-2). The paper WBPaper00004526 appears to be about a PEB-1 binding site, not a DAF-3 site. I made the Feature WBsf919609 for the PEB-1 site based on WBPaper00004526 in the last round of curation. Something is not right here.<br />
*** Follow-up from Mary Ann. I agree the Method for WBsf919641 should be TF_binding_site. Associated_with_gene should be updated to myo-2 (this is an error). So the SO_term is not right either. These then become duplicate entries.<br />
** The Features WBsf019227 and WBsf038813 are in the same position, but on opposite strands. Both are for a LAG-1 binding site associated with the gene lin-11. One is from WBPaper00005357, the other from WBPaper00032298. WBsf019227 needs to be retired and merged into WBsf038813.<br />
*** Follow-up from Mary Ann. We have no merged_into tag structure in Feature. We have History tag and could add a Comment there. <br />
** WBsf038819 has a Method of 'regulatory_region', but the Public_name and Description describe it as an Enhancer. Which is it?<br />
*** Fixed by Mary Ann. Changed Method to enhancer. <br />
** WBsf919669 is an Enhancer, but it has a Transcription_factor tag (WBTranscriptionFactor000101). It is far too large to be a binding site for a transcription factor. I think this tag should be removed.<br />
*** Follow-up from Mary Ann. lag-1 does bind somewhere in this region, but the paper does not allow me to specify exactly where. Q. Should we curate these then? I think we should, but should maybe add suitable Remarks. <br />
** 393 out of the 500 TF_binding_site Features have an incorrect SO_term tag. It should be set to: SO:0000235<br />
*** Follow-up by Mary Ann. This seems really high. I wonder whether TF_binding_site was not a recognised SO term when some of these were curated, especially the older ones. This (and other consistency checks) should be added to a script we can run.<br />
** 10 Features with Method = "TF_binding_site" are lacking a Transcription_factor tag.<br />
*** Follow-up by Mary Ann. Maybe it's not clear in the paper what the TF is? Something to add to consistency check. <br />
* How many Methods are there for Features? enhancer, regulatory_region, TF_binding_site, promoter…?<br />
** How should these Methods be defined?<br />
** When displaying on JBrowse/GBrowse, do we want to display the WBsfxxxxx IDs or the Methods?<br />
* Gary does a quick literature search to see what other work has been done in the sites described.<br />
** Along these lines, do we want to add WBPaper00002925 to the WBsf Features from WBPaper00002011? That is, should we associate Features with both papers when both mention them?<br />
*** The Feature objects are used to describe 'real', functional regions of the genome in as much detail as possible. Each paper that provides evidence for that region being 'real' should be added as evidence for the Feature. The Primary object we are trying to describe here is the region of the genome, not the papers. (I may have missed the point you were making - Gary).<br />
*** agree with Gary (unless I have also missed the point!). All relevant papers should be added. There is a loose convention in Allele curation that the paper listed under Evidence (which can only have one value) is the "primary" paper and the ones listed under Reference (including the primary) are all papers which describe or discuss the object as well. Also individual Evidence can be added to tags. Maybe we should look to extend this in the Feature model. <br />
<pre><br />
?Feature SMap S_parent UNIQUE Sequence UNIQUE ?Sequence XREF Feature_object<br />
Name Public_name UNIQUE ?Text<br />
Other_name ?Text<br />
Sequence_details Flanking_sequences UNIQUE Text UNIQUE Text<br />
Mapping_target UNIQUE ?Sequence <br />
Source_location UNIQUE Int UNIQUE ?Sequence UNIQUE Int UNIQUE Int UNIQUE #Evidence //source data, <WSversion> ?Sequence pos1 pos2 Evidence(Paper/person etc. remarks)<br />
DNA_text UNIQUE ?Text // for storing the sequence of the feature...can use IUPAC codes to be able<br />
// store consensus sequences, e.g. binding site consensus sequence<br />
Origin Species UNIQUE ?Species //added by pad, as we are moving towards multi species readiness.<br />
Strain UNIQUE ?Strain //added by pad, as we are moving towards multi strain readiness.<br />
History Merged_into UNIQUE ?Feature XREF Acquires_merge #Evidence<br />
Acquires_merge ?Feature XREF Merged_into #Evidence<br />
Deprecated Text #Evidence <br />
Visible Description ?Text<br />
SO_term ?SO_term<br />
Defined_by Defined_by_sequence ?Sequence XREF Defines_feature #Evidence<br />
Defined_by_paper ?Paper XREF Feature #Evidence<br />
Defined_by_person ?Person<br />
Defined_by_author ?Author<br />
Defined_by_analysis ?Analysis Int<br />
Score Float Text #Evidence // this would be a log score as indicated by the analysis used in gff dump<br />
Associations Associated_with_gene ?Gene XREF Associated_feature #Evidence // richard<br />
Associated_with_CDS ?CDS XREF Associated_feature #Evidence // richard<br />
Associated_with_transcript ?Transcript XREF Associated_feature #Evidence // richard<br />
Associated_with_pseudogene ?Pseudogene XREF Associated_feature #Evidence // richard<br />
Associated_with_transposon ?Transposon XREF Associated_feature #Evidence //richard<br />
Associated_with_variation ?Variation XREF Feature #Evidence<br />
Associated_with_Position_Matrix ?Position_Matrix XREF Associated_feature #Evidence<br />
Associated_with_operon ?Operon XREF Associated_feature #Evidence<br />
Associated_with_Interaction ?Interaction XREF Feature_interactor<br />
Associated_with_expression_pattern ?Expr_pattern XREF Associated_feature #Evidence <br />
Associated_with_Feature ?Feature XREF Associated_with_Feature #Evidence<br />
Associated_with_construct ?Construct XREF Sequence_feature<br />
Bound_by_product_of ?Gene XREF Gene_product_binds #Evidence //pad added this to show what gene it binds<br />
Transcription_factor UNIQUE ?Transcription_factor XREF Binding_site <br />
Annotation UNIQUE ?LongText // added for data attribution [030220 dl]<br />
Confidential_remark ?Text //pad<br />
Remark ?Text #Evidence<br />
Method UNIQUE ?Method<br />
</pre><br />
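For the duplicated Interaction objects in geneace ("with or without two leading digits"), a sketch like the one below could list the affected IDs. It assumes, without confirmation in these notes, that the duplicates differ only in the number of leading zeros in the numeric part of the ID; `group_interaction_ids` is a hypothetical helper.<br />

```python
import re
from collections import defaultdict

# Assumed ID shape: the literal prefix followed by a zero-padded number.
INTERACTION_ID = re.compile(r"^WBInteraction(\d+)$")


def group_interaction_ids(ids):
    """Map each interaction number to all IDs that spell it, keeping only duplicates.

    Leading zeros are ignored, so IDs that differ only in padding share a key.
    """
    groups = defaultdict(set)
    for obj_id in ids:
        match = INTERACTION_ID.match(obj_id)
        if match is None:
            continue  # not an Interaction ID; ignored
        groups[int(match.group(1))].add(obj_id)
    return {number: sorted(spellings) for number, spellings in groups.items() if len(spellings) > 1}


if __name__ == "__main__":
    ids = ["WBInteraction000001291", "WBInteraction00000001291", "WBInteraction000000002"]
    print(group_interaction_ids(ids))
```

Which spelling to keep (for example, the one that other objects XREF to) is a curation decision the sketch does not make.<br />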
<br />
<br />
'''Duplicated Features'''<br />
The following are a set of duplicated regulatory Features. I (Gary) am probably the main culprit in not checking to see if there is a pre-existing Feature.<br />
How can we avoid this in future?<br />
* The list of duplicated Features has been moved to the Discussion page.<br />
<br />
'''Papers to curate'''<br />
* I have taken the Sequence Feature Papers from RT and assigned the four of us 10 papers each.<br />
* Just in case there is any paper here that has already been curated for Interaction, I've added the number of Interaction objects linked to by the paper. If this is not helpful, feel free to remove it.<br />
<br />
<pre><br />
WBPaper00002925 Daniela Interaction: 8<br />
Time: 10 mins<br />
New objects: none. the paper was referring to a couple of sub-element enhancers already described in Okkema and Fire 1994- Development - B and C sub-elements -WBPaper00002011<br />
Location of information: - <br />
Comments: <br />
WBPaper00004568 Daniela Interaction: 2<br />
Time: 25 mins<br />
New objects: 3 promoters- pA, pB, and pC<br />
Location of information: body text and fig.3 for promoters. Figures 5, 6 and 7 and body text for TF binding sites.<br />
Comments: Expression objects were already generated by Wen in the past. Daniela will add only the WBsf once ready. Also add WBsf object to the construct objects.<br />
See RT 418146 as well.<br />
WBPaper00005842 Daniela<br />
Time: 3 hours <br />
New objects:<br />
mk125-132 (4331-4474) is sufficient to drive strong expression in vulC and vulD. Expr11838 add feature and generate construct<br />
<br />
mk84-148 contains two regions that together confer strong expression in vulE and vulF. Expr11839 add feature and generate construct<br />
<br />
mk50-51 (1052-1438) is sufficient to confer AC, vulE, and vulA expression, as well as uterine cell expression. Expr11840 add feature and generate construct.<br />
<br />
mk96-144 (2290-2522) expresses in AC in all animals observed. Expr11841 add feature and generate construct.<br />
<br />
mk66-67 (4434-4997) is sufficient to confer vulval cell expression. Expr11842 add feature and generate construct.<br />
<br />
mk135-134 (2412-3419) is sufficient to confer vulval cell expression. N.B.: the text says mk135-134, but Fig. 4C appears to show mk135-143. Expr11843 add feature and generate construct.<br />
<br />
Location of information: The nucleotide sequences for pertinent regions are shown in Figs. 5, 6, and 7 of the Supplemental Material.<br />
Comment: It would be good for a sequence curator to read the paper without bias, i.e. without checking what I have curated for expression, to see whether we identify the same regions<br />
Comment: potential TF binding sites<br />
Comment: Should we curate the C. briggsae data? See the paragraph 'Analysis of C Briggsae upstream regions'.<br />
<br />
WBPaper00024328 Daniela Interaction: 4<br />
Time: 2.5 hours<br />
New objects: Added 4 Expression objects for 5 different constructs (pPHA2::GFP-A was already present):<br />
<br />
pPHA2::GFP-A body text and Figs. 5 and 6 Expr3093 Add Sf to expr object and to construct <br />
pPHA2::GFP-B body text and Figs. 5 and 6 Expr11835 Add Sf to expr object and to construct<br />
pPHA2::GFP-C body text and Figs. 5 and 6 Expr11835 Add Sf to expr object and to construct<br />
pPHA2::GFP-D body text and Figs. 5 and 6 Expr11837 Add Sf to expr object and to construct<br />
pPHA2::GFP-E body text and Figs. 5 and 6 Expr11836 Add Sf to expr object and to construct<br />
pPHA2::GFP-F body text and Figs. 5 and 6 Expr11834 Add Sf to expr object and to construct<br />
Detailed construction of plasmids in Materials and Methods<br />
Location of information: body text and Figs. 5 and 6 <br />
Comments: Not sure whether this paper is curatable for sequence features per se, given the nature of the constructs. Would be good to discuss.<br />
WBPaper00028802 Daniela<br />
Comment: potential paper on cis-regulation.<br />
WBPaper00028915 Daniela<br />
Time: 15 mins<br />
Comments: No features in this paper<br />
WBPaper00029140 Daniela<br />
WBPaper00029255 Daniela Interaction: 55<br />
WBPaper00030829 Daniela Interaction: 5<br />
WBPaper00030933 Daniela Interaction: 1<br />
WBPaper00003929 Gary Interaction: 25<br />
Time: 4 hours<br />
New objects: none - 6 existing Features were corrected and updated. (WBsf019182, WBsf019181, WBsf019179, WBsf019177, WBsf019178, WBsf019180)<br />
Location of information: body text and in a figure in the supplemental table of the paper WBPaper00006376<br />
Other curator data: "lin-41 and lin-42 are negatively regulated by let-7"<br />
Comments: This is Gary Ruvkun's (Nature 2000) let-7 paper that started the miRNA field!<br />
Comments: This is being marked as a pair of 'binding_site' Features because this is miRNA binding, not TF binding.<br />
See also: let-7, lin-41 binding WBPaper00006376 PubMed: 14729570 (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC324419/) - This is the paper that defined the binding sites.<br />
Comments (XW): curated one new cis-regulation object with Gary's WBsfs<br />
<br />
WBPaper00005044 Gary Interaction: 20<br />
Time: 3 hours<br />
New objects: made ire-1 binding site Feature (WBsf977530)<br />
Location of information: body of this paper and WBPaper00005036<br />
Other curator data: IRE-1 is a stress-activated endonuclease resident in the ER that is conserved in all known eukaryotes. IRE-1 mediated unconventional splicing of an intron from xbp-1 mRNA controls expression of the encoded transcription factor and is required for upregulation of most UPR target genes.<br />
See also: WBPaper00005036<br />
WBPaper00005971 Gary Interaction: 16 This has already been done by Gary: Feature WBsf718850<br />
WBPaper00024440 Gary Interaction: 1<br />
Time: 3 hours<br />
New objects: Made WBsf977532, WBsf977533 M-2 motif/daf-12 binding sites for ceh-22<br />
New objects: Made WBsf977534, WBsf977535 M-2 motif/daf-12 binding sites for myo-2<br />
Location of information: body of this paper and Supplemental Figure 3<br />
Other curator data: "We conclude that regulation of myo-2 and ceh-22 during dauer development depends critically on the M-2 motif."<br />
Comments: lots of hypothetical binding sites for daf-12 in 90-odd genes, but these have not been experimentally confirmed.<br />
WBPaper00028816 Gary<br />
Time: 30 mins<br />
New objects: none<br />
Location of information: body of this paper<br />
Other curator data: "recruitment sites are widely distributed along X to bind the DCC and nucleate DCC spreading to X regions lacking recruitment sites"<br />
Comments: This paper determines the A and B motifs of the Dosage Compensation Complex (DCC). There are hundreds of thousands of sites of varying strength.<br />
WBPaper00028986 Gary Interaction: 16<br />
WBPaper00029181 Gary<br />
WBPaper00029327 Gary Interaction: 33<br />
WBPaper00030849 Gary<br />
WBPaper00031355 Gary Interaction: 5<br />
WBPaper00004181 Mary Ann Interaction: 8 <br />
No Features. Took 5 mins to read. <br />
WBPaper00005056 Mary Ann <br />
Time: 35 mins<br />
New objects: 37 - not yet curated.<br />
Location of information: Mainly in Fig. 1, but also in the body text. <br />
Comments: TRTTKRY element in the promoter regions of T05E11.3, D2096.6, C44H4.1, ZK816.4, ceh-22, <br />
tph-1, M05B5.2, myo-2. Bound by PHA-4 <br />
WBPaper00006429 Mary Ann Interaction: 3<br />
Time: 1/2hr<br />
New objects: 1<br />
Location of information: In body of text and fig. 3B<br />
Comments: Alludes to GATA binding sites, but no experimental data. <br />
Comments (dr): Added Expr11833<br />
WBPaper00024505 Mary Ann<br />
Time: <br />
New objects: <br />
Location of information: <br />
Comments: <br />
WBPaper00028849 Mary Ann<br />
Time: 45 mins<br />
New objects: 1<br />
Location of information: In body of text and fig. 3B<br />
Comments (dr): Added Expr11832 <br />
WBPaper00029058 Mary Ann Interaction: 50<br />
WBPaper00029190 Mary Ann<br />
WBPaper00029406 Mary Ann Interaction: 145<br />
WBPaper00030877 Mary Ann Interaction: 8<br />
WBPaper00031471 Mary Ann<br />
Time: 1hr<br />
New objects: 1<br />
Location of information: In body of text and fig. 5f<br />
Comments: Check that alleles of asd-2 in fig 3 are curated. yb1422, yb1415, yb1470, yb1419, yb1423<br />
WBPaper00004482 Xiaodong Interaction: 2<br />
Time: two hours<br />
New objects: request new features for more interaction objects<br />
Location of information: Figure 5A<br />
Comments: HOX/CEH-20 binding sites in the hlh-8 promoter. The features will be used in cis-regulation and physical interaction objects.<br />
Duplication: yes. Currently only one interaction object (WBInteraction000001291) is associated with the paper.<br />
WBPaper00005609 Xiaodong Interaction: 20 This has already been done by Margaret: Feature WBsf019097, WBsf038788, WBsf038789, WBsf038790, WBsf038791, WBsf038793, WBsf038794, WBsf019098, WBsf038792<br />
Time: 50 mins<br />
New objects: three interaction objects<br />
Location of information: In body of text and fig. 5<br />
Comments: Only WBsf038790 and WBsf038791 are useful. Not sure why the other features exist; are they redundant?<br />
WBPaper00024189 Xiaodong<br />
Time: 10 mins<br />
Comments: no features in this paper<br />
WBPaper00024981 Xiaodong<br />
Time: 40 mins<br />
New objects: request through RT<br />
Location of information: figure 2<br />
Comments: MED-1 binding sites in the end-1 and end-3 promoters<br />
Comments (dr): Expr11831 generated for end-1 and end-3 promoter regions. Add the WBsf ID and the related construct once available. Paper checked as it came through RT <br />
WBPaper00028910 Xiaodong Interaction: 7<br />
Time: 30 mins<br />
New objects: request through RT<br />
Location of information: in body text and figure 4<br />
Comments: MEF-2 direct binding site in the str-1 promoter, and a few minimal regulatory regions mentioned in the body text<br />
<br />
WBPaper00029109 Xiaodong<br />
<br />
WBPaper00029229 Xiaodong Interaction: 3<br />
WBPaper00030809 Xiaodong<br />
WBPaper00030931 Xiaodong Interaction: 18<br />
WBPaper00031565 Xiaodong Interaction: 1<br />
Time: 30 mins<br />
New objects: request through RT<br />
Location of information: in body text and figure 2 and figure 3<br />
Comments: PBC-HOX binding sites S1 and S2; cis-regulatory elements E1 and E2<br />
<br />
</pre><br />
* I suggest we curate 5 of our papers before the jamboree, taking notes as described above and any other things that arise during your curation.</div>Xdwanghttps://wiki.wormbase.org/index.php?title=Working_Group:Sequence_Features&diff=24737Working Group:Sequence Features2014-09-12T15:03:29Z<p>Xdwang: /* Headline text */</p>
<hr />
<div><br />
== Headline text ==<br />
'''Topics'''<br />
<br />
* Display of Sequence Features on the website<br />
** See GitHub issue 2867: https://github.com/WormBase/website/issues/2867#issuecomment-48041365<br />
** Also display Transcription factors and Gene_product_binds.<br />
** In the expression section of the gene page, highlight the objects that are related to Sequence Features. Add a type called cis-regulatory-element to the table?<br />
** Add a sequence feature table to the sequence widget to display all features related to the gene.<br />
** In the Cytoscape display, show each sequence feature as an entity node.<br />
<br />
* Streamline the workflow<br />
** How can we automatically identify Sequence Feature papers?<br />
** How do papers come in now? SVM? Textpresso string matches?<br />
<br />
* Improving data flow<br />
** How can we make the data immediately available to all curators?<br />
*** Add Paper and Public_name fields to the Features in the Nameserver?<br />
*** geneace is available for download and is updated every day; a copy is taken by Caltech.<br />
**** still need WBsf object details in order to relate/curate regulation/expression/construct objects<br />
<br />
<br />
* Sample paper curation <br />
** Prepare a paper list from each person<br />
* assign meaningful names to the public_name field, e.g. 'distal enhancer 1' instead of 'DE1'<br />
** gw3 - I disagree with this. We should use the name that the paper uses to describe the region, rather than making up our own names. Several months after I had originally annotated some regions, Xiaodong found problems with them, and when I went back to re-annotate them I really needed to identify unambiguously which region the paper was talking about by matching the Public_name field to the name of the region in the paper. Users may well need to do this too if they are reading the paper and looking at the Features marking up the regions.<br />
** dr - OK, sounds good to me, Gary.<br />
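On the question of how Sequence Feature papers could be identified automatically: a plain string-match pass is the simplest option to try before anything like an SVM. Below is a minimal Python sketch, assuming we have each paper's text keyed by paper ID; the cue phrases, weights, threshold and the functions `score_text` and `flag_papers` are all hypothetical and are not part of any existing Textpresso or WormBase pipeline.<br />

```python
from __future__ import annotations

import re

# Hypothetical cue phrases and weights, for illustration only; a real list
# would need tuning against papers already flagged as Sequence Feature papers.
CUE_PATTERNS = {
    r"\bbinding sites?\b": 2,
    r"\benhancers?\b": 2,
    r"\bpromoter (?:deletion|fragment|region)s?\b": 2,
    r"\bcis-regulatory\b": 3,
    r"\bminimal (?:promoter|regulatory region)s?\b": 3,
    r"\bgel (?:mobility )?shift\b|\bEMSA\b": 2,
}


def score_text(text: str) -> int:
    """Sum the weight of every cue-phrase match in a paper's text."""
    return sum(
        weight * len(re.findall(pattern, text, flags=re.IGNORECASE))
        for pattern, weight in CUE_PATTERNS.items()
    )


def flag_papers(papers: dict[str, str], threshold: int = 5) -> list[str]:
    """Return IDs of papers scoring at least `threshold`, highest score first."""
    scored = sorted(
        ((score_text(text), paper_id) for paper_id, text in papers.items()),
        key=lambda pair: (-pair[0], pair[1]),
    )
    return [paper_id for score, paper_id in scored if score >= threshold]


if __name__ == "__main__":
    papers = {
        "paper_A": "MED-1 binding sites in the end-1 promoter; gel shift assays "
                   "confirmed both binding sites.",
        "paper_B": "An RNAi screen for dauer phenotypes.",
    }
    print(flag_papers(papers))  # → ['paper_A']
```

A ranked list like this would only be a triage aid; anything it flags would still need a curator to read the paper.<br />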
<br />
'''Practical Issues'''<br />
<br />
* Gary and I are staying at the Ramada - 800 Morrissey Boulevard, Freeport St, Boston, MA 02122 US<br />
* We arrive in Boston on Saturday evening and fly home on Wednesday night. <br />
* What time shall we meet on Monday morning and where do we go? <br />
**I (Xiaodong) can pick you guys up at the Harvard Square T station (Red Line, info center on the square) at, say, 9:30 am. We can walk to my office.<br />
* Can you confirm that we have office space with internet access? I will bring my MacBook. Hopefully I can tunnel into my Sanger working space.<br />
**There is definitely enough office space. Internet access is through the Harvard guest network; other ways of connecting are not allowed.<br />
* Thanks Xiaodong, I'm bringing my Chromebook - Gary<br />
<br />
'''Pre-Jamboree prep'''<br />
<br />
* Suggestions for work<br />
** Curate some (5? 10?) papers from our lists and collate data as follows<br />
*** How long did each paper take to curate?<br />
*** How many new objects did you add to WormBase?<br />
*** Where in the paper was the information (e.g. supplementary data, figure legends, within text)?<br />
*** Did you need to contact the author for more information?<br />
*** Did you come across data for other curators? <br />
**** gw3 - I don't know in detail what data other curators require. It would be useful to know this so I can summarise data for others.<br />
<br />
'''The Jamboree 15-17th Sept 2014'''<br />
<br />
* Suggestions for work<br />
** Discuss results from above<br />
** Work through some papers which have already been curated and have different data types (e.g. promoter and gene regulation) to identify best curation practices.<br />
** browsing capabilities through WormMine?<br />
** Should we put the list of papers in the Caltech curation status form?<br />
<br />
<br />
'''Matters Arising'''<br />
<br />
* things noted - not necessarily to do with the topic in hand<br />
** There are many duplicates of Interaction objects in geneace, with or without two leading digits. Sometimes there is only the shorter form.<br />
** The Features WBsf919641 and WBsf919607 are at the same position. Both are DAF-3 binding sites. WBsf919641 has Method "binding_site"; WBsf919607 has Method "TF_binding_site", which I think is more correct as DAF-3 is a TF. WBsf919641 used the paper WBPaper00004526; WBsf919607 used the paper WBPaper00003384. WBsf919641 is associated with WBGene00202278; WBsf919607 is associated with WBGene00003514 (myo-2). The paper WBPaper00004526 appears to be about a PEB-1 binding site, not a DAF-3 site. I made the Feature WBsf919609 for the PEB-1 site based on WBPaper00004526 in the last round of curation. Something is not right here.<br />
*** Follow-up from Mary Ann. I agree that the Method for WBsf919641 should be TF_binding_site. Associated_with_gene should be updated to myo-2 (the current value is an error). The SO term is not right either. These then become duplicate entries. <br />
** The Features WBsf019227 and WBsf038813 are in the same position, but on opposite strands. Both are for a LAG-1 binding site associated with the gene lin-11. One is from WBPaper00005357, the other from WBPaper00032298. WBsf019227 needs to be retired and merged into WBsf038813.<br />
*** Follow-up from Mary Ann. We have no merged_into tag structure in Feature. We have a History tag and could add a comment there. <br />
** WBsf038819 has a Method of 'regulatory_region', but the Public_name and Description describe it as an Enhancer. Which is it?<br />
*** Fixed by Mary Ann. Changed Method to enhancer. <br />
** WBsf919669 is an Enhancer, but it has a Transcription_factor tag (WBTranscriptionFactor000101). It is far too large to be a binding site for a transcription factor. I think this tag should be removed.<br />
*** Follow-up from Mary Ann. lag-1 does bind somewhere in this region, but the paper does not allow me to specify exactly where. Q. Should we curate these then? I think we should, but should maybe add suitable Remarks. <br />
** 393 out of the 500 TF_binding_site Features have an incorrect SO_term tag. It should be set to: SO:0000235<br />
*** Follow-up by Mary Ann. This seems really high. I wonder whether TF_binding_site was not a recognised SO term when some of these were curated, especially the older ones. This (and other consistency checks) should be added to a script we can run.<br />
** 10 Features with Method = "TF_binding_site" are lacking a Transcription_factor tag.<br />
*** Follow-up by Mary Ann. Maybe it's not clear in the paper what the TF is? Something to add to consistency check. <br />
* How many 'Method's should there be for Features? enhancer, regulatory_region, TF_binding_site, promoter…?<br />
**How should these methods be defined?<br />
** When displaying on JBrowse/GBrowse, do we want to show the WBsfxxxxx IDs or the methods?<br />
* Gary does a quick literature search to see what other work has been done on the sites described.<br />
**Along this line, should we add WBPaper00002925 to the WBsf objects from WBPaper00002011, i.e. associate the features with both papers that mention them?<br />
*** The Feature objects are used to describe 'real', functional regions of the genome in as much detail as possible. Each paper that provides evidence for that region being 'real' should be added as evidence for the Feature. The Primary object we are trying to describe here is the region of the genome, not the papers. (I may have missed the point you were making - Gary).<br />
*** agree with Gary (unless I have also missed the point!). All relevant papers should be added. There is a loose convention in Allele curation that the paper listed under Evidence (which can only have one value) is the "primary" paper and the ones listed under Reference (including the primary) are all papers which describe or discuss the object as well. Also individual Evidence can be added to tags. Maybe we should look to extend this in the Feature model. <br />
<pre><br />
?Feature SMap S_parent UNIQUE Sequence UNIQUE ?Sequence XREF Feature_object<br />
Name Public_name UNIQUE ?Text<br />
Other_name ?Text<br />
Sequence_details Flanking_sequences UNIQUE Text UNIQUE Text<br />
Mapping_target UNIQUE ?Sequence <br />
Source_location UNIQUE Int UNIQUE ?Sequence UNIQUE Int UNIQUE Int UNIQUE #Evidence //source data, <WSversion> ?Sequence pos1 pos2 Evidence(Paper/person etc. remarks)<br />
DNA_text UNIQUE ?Text // for storing the sequence of the feature...can use IUPAC codes to be able<br />
// store consensus sequences, e.g. binding site consensus sequence<br />
Origin Species UNIQUE ?Species //added by pad, as we are moving towards multi species readyness.<br />
Strain UNIQUE ?Strain//added by pad, as we are moving towards multi strain readyness.<br />
History Merged_into UNIQUE ?Feature XREF Acquires_merge #Evidence<br />
Acquires_merge ?Feature XREF Merged_into #Evidence<br />
Deprecated Text #Evidence <br />
Visible Description ?Text<br />
SO_term ?SO_term<br />
Defined_by Defined_by_sequence ?Sequence XREF Defines_feature #Evidence<br />
Defined_by_paper ?Paper XREF Feature #Evidence<br />
Defined_by_person ?Person<br />
Defined_by_author ?Author<br />
Defined_by_analysis ?Analysis Int<br />
Score Float Text #Evidence // this would be a log score as indicated by the analysis used in gff dump<br />
Associations Associated_with_gene ?Gene XREF Associated_feature #Evidence // richard<br />
Associated_with_CDS ?CDS XREF Associated_feature #Evidence // richard<br />
Associated_with_transcript ?Transcript XREF Associated_feature #Evidence // richard<br />
Associated_with_pseudogene ?Pseudogene XREF Associated_feature #Evidence // richard<br />
Associated_with_transposon ?Transposon XREF Associated_feature #Evidence //richard<br />
Associated_with_variation ?Variation XREF Feature #Evidence<br />
Associated_with_Position_Matrix ?Position_Matrix XREF Associated_feature #Evidence<br />
Associated_with_operon ?Operon XREF Associated_feature #Evidence<br />
Associated_with_Interaction ?Interaction XREF Feature_interactor<br />
Associated_with_expression_pattern ?Expr_pattern XREF Associated_feature #Evidence <br />
Associated_with_Feature ?Feature XREF Associated_with_Feature #Evidence<br />
Associated_with_construct ?Construct XREF Sequence_feature<br />
Bound_by_product_of ?Gene XREF Gene_product_binds #Evidence //pad added this to show what gene it binds<br />
Transcription_factor UNIQUE ?Transcription_factor XREF Binding_site <br />
Annotation UNIQUE ?LongText // added for data attribution [030220 dl]<br />
Confidential_remark ?Text //pad<br />
Remark ?Text #Evidence<br />
Method UNIQUE ?Method<br />
</pre><br />
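The consistency checks raised under Matters Arising (TF_binding_site Features with the wrong SO_term, and ones lacking a Transcription_factor tag) could go into a script along these lines. This is a minimal Python sketch, assuming a simplified .ace dump with one tag and value per line and a blank line between objects; `parse_ace` and `check_tf_binding_sites` are hypothetical helpers, not existing WormBase tools.<br />

```python
from __future__ import annotations

import shlex
from collections import defaultdict

TF_SO_TERM = "SO:0000235"  # SO_term that TF_binding_site Features should carry


def parse_ace(text: str) -> dict[str, dict[str, list[str]]]:
    """Parse a simplified .ace dump of Feature objects into {id: {tag: [values]}}.

    Assumes one tag per line (only the first value is kept) and a blank line
    between objects; real dumps may also carry #Evidence and multi-value tags.
    """
    features: dict[str, dict[str, list[str]]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            current = None  # a blank line ends the current object
            continue
        tokens = shlex.split(line)  # handles the double-quoted values
        if len(tokens) >= 3 and tokens[0] == "Feature" and tokens[1] == ":":
            current = defaultdict(list)
            features[tokens[2]] = current
        elif current is not None and len(tokens) >= 2:
            current[tokens[0]].append(tokens[1])
    return features


def check_tf_binding_sites(
    features: dict[str, dict[str, list[str]]]
) -> list[tuple[str, str]]:
    """Flag TF_binding_site Features with a wrong SO_term or no Transcription_factor."""
    problems = []
    for feature_id, tags in sorted(features.items()):
        if tags.get("Method") != ["TF_binding_site"]:
            continue  # only TF_binding_site Features are checked here
        if tags.get("SO_term") != [TF_SO_TERM]:
            problems.append((feature_id, f"SO_term should be {TF_SO_TERM}"))
        if not tags.get("Transcription_factor"):
            problems.append((feature_id, "missing Transcription_factor"))
    return problems


if __name__ == "__main__":
    dump = '''
Feature : "WBsfTEST1"
Method "TF_binding_site"
SO_term "SO:0000235"
Transcription_factor "WBTranscriptionFactorTEST1"

Feature : "WBsfTEST2"
Method "TF_binding_site"
SO_term "SO:0000409"
'''
    for feature_id, problem in check_tf_binding_sites(parse_ace(dump)):
        print(feature_id, problem)
```

Further checks (e.g. Method against SO_term for the other Methods) could be added as extra functions over the same parsed dictionary.<br />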
<br />
<br />
'''Duplicated Features'''<br />
The following are a set of duplicated regulatory Features. I (Gary) am probably the main culprit in not checking to see if there is a pre-existing Feature.<br />
How can we avoid this in future?<br />
* The list of duplicated Features has been moved to the Discussion page.<br />
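One way to avoid new duplicates is to check for an existing Feature at the same position, on either strand, before creating one; the LAG-1 pair WBsf019227 and WBsf038813 noted under Matters Arising is exactly that case. Below is a minimal Python sketch, assuming each Feature is available as its parent sequence plus start and end, and assuming the AceDB-style convention that start greater than end marks the reverse strand; `FeatureSpan`, `find_duplicates` and `existing_at` are hypothetical names, and the coordinates in the example are made up.<br />

```python
from __future__ import annotations

from collections import defaultdict
from typing import NamedTuple


class FeatureSpan(NamedTuple):
    feature_id: str
    sequence: str  # parent sequence the Feature is mapped to
    start: int
    end: int       # assumption: end < start marks the reverse strand


def strandless_key(f: FeatureSpan) -> tuple[str, int, int]:
    """Position key that ignores orientation, so opposite-strand copies collide."""
    return (f.sequence, min(f.start, f.end), max(f.start, f.end))


def find_duplicates(features: list[FeatureSpan]) -> list[list[str]]:
    """Group IDs of Features that cover exactly the same span on either strand."""
    groups: dict[tuple[str, int, int], list[str]] = defaultdict(list)
    for f in features:
        groups[strandless_key(f)].append(f.feature_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


def existing_at(features: list[FeatureSpan], candidate: FeatureSpan) -> list[str]:
    """IDs of existing Features at the candidate's span; run before creating one."""
    key = strandless_key(candidate)
    return sorted(
        f.feature_id
        for f in features
        if strandless_key(f) == key and f.feature_id != candidate.feature_id
    )


if __name__ == "__main__":
    spans = [
        FeatureSpan("WBsfTEST1", "X", 1000, 1010),
        FeatureSpan("WBsfTEST2", "X", 1010, 1000),  # same site, other strand
        FeatureSpan("WBsfTEST3", "X", 2000, 2010),
    ]
    print(find_duplicates(spans))  # → [['WBsfTEST1', 'WBsfTEST2']]
    print(existing_at(spans, FeatureSpan("new", "X", 2010, 2000)))  # → ['WBsfTEST3']
```

This only catches identical spans; Features that overlap without matching exactly (e.g. a binding site inside a larger enhancer) would need an interval-overlap test instead.<br />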
<br />
'''Papers to curate'''<br />
* I have taken the Sequence Feature Papers from RT and assigned the four of us 10 papers each.<br />
* Just in case there is any paper here that has already been curated for Interaction, I've added the number of Interaction objects linked to by the paper. If this is not helpful, feel free to remove it.<br />
<br />
<pre><br />
WBPaper00002925 Daniela Interaction: 8<br />
Time: 10 mins<br />
New objects: none. The paper refers to a couple of sub-element enhancers (the B and C sub-elements) already described in Okkema and Fire 1994, Development (WBPaper00002011)<br />
Location of information: - <br />
Comments: <br />
WBPaper00004568 Daniela Interaction: 2<br />
Time: 25 mins<br />
New objects: 3 promoters: pA, pB, and pC<br />
Location of information: body text and Fig. 3 for promoters; Figs. 5, 6 and 7 and body text for TF binding sites.<br />
Comments: Expression objects were already generated by Wen in the past. Daniela will add only the WBsf once ready. Also add WBsf object to the construct objects.<br />
WBPaper00005842 Daniela<br />
Time: 3 hours <br />
New objects:<br />
mk125-132 (4331-4474) is sufficient to drive strong expression in vulC and vulD. Expr11838 add feature and generate construct<br />
<br />
mk84-148 contains two regions that together confer strong expression in VulE and VulF. Expr11839 add feature and generate construct<br />
<br />
mk50-51 (1052-1438) is sufficient to confer AC, vulE, and vulA expression, as well as uterine cell expression. Expr11840 add feature and generate construct.<br />
<br />
mk96-144 (2290-2522) expresses in AC in all animals observed. Expr11841 add feature and generate construct.<br />
<br />
mk66-67 (4434-4997) is sufficient to confer vulval cell expression. Expr11842 add feature and generate construct.<br />
<br />
mk135-134 (2412-3419) is sufficient to confer vulval cell expression. N.B.: the text says mk135-134, but Fig. 4C appears to show mk135-143. Expr11843 add feature and generate construct.<br />
<br />
Location of information: The nucleotide sequences for pertinent regions are shown in Figs. 5, 6, and 7 of the Supplemental Material.<br />
Comment: It would be good if a sequence curator read the paper without bias (i.e. without checking what I have curated for expression) to see whether we identify the same regions.<br />
Comment: potential TF binding sites<br />
Comment: Should we curate C. briggsae? See the paragraph 'Analysis of C Briggsae upstream regions'.<br />
<br />
WBPaper00024328 Daniela Interaction: 4<br />
Time: 2.5 hours<br />
New objects: added 4 Expression objects for 5 different constructs; pPHA2::GFP-A was already present:<br />
<br />
pPHA2::GFP-A body text and Figs. 5 and 6 Expr3093 Add Sf to expr object and to construct <br />
pPHA2::GFP-B body text and Figs. 5 and 6 Expr11835 Add Sf to expr object and to construct<br />
pPHA2::GFP-C body text and Figs. 5 and 6 Expr11835 Add Sf to expr object and to construct<br />
pPHA2::GFP-D body text and Figs. 5 and 6 Expr11837 Add Sf to expr object and to construct<br />
pPHA2::GFP-E body text and Figs. 5 and 6 Expr11836 Add Sf to expr object and to construct<br />
pPHA2::GFP-F body text and Figs. 5 and 6 Expr11834 Add Sf to expr object and to construct<br />
Detailed plasmid construction is described in Materials and Methods.<br />
Location of information: body text and Figs. 5 and 6 <br />
Comments: Not sure whether this paper is curatable for sequence features per se, given the nature of the constructs. Would be good to discuss.<br />
WBPaper00028802 Daniela<br />
WBPaper00028915 Daniela<br />
Time: 15 mins<br />
Comments: No features in this paper<br />
WBPaper00029140 Daniela<br />
WBPaper00029255 Daniela Interaction: 55<br />
WBPaper00030829 Daniela Interaction: 5<br />
WBPaper00030933 Daniela Interaction: 1<br />
WBPaper00003929 Gary Interaction: 25<br />
Time: 4 hours<br />
New objects: none - 6 existing Features were corrected and updated. (WBsf019182, WBsf019181, WBsf019179, WBsf019177, WBsf019178, WBsf019180)<br />
Location of information: body text and in a figure in the supplemental table of the paper WBPaper00006376<br />
Other curator data: "lin-41 and lin-42 are negatively regulated by let-7"<br />
Comments: This is Gary Ruvkun's (Nature 2000) let-7 paper that started the miRNA field!<br />
Comments: This is being marked as a pair of 'binding_site' Features because this is miRNA binding, not TF binding.<br />
See also: let-7, lin-41 binding WBPaper00006376 PubMed: 14729570 (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC324419/) - This is the paper that defined the binding sites.<br />
Comments (XW): curated one new cis-regulation object with Gary's WBsfs<br />
<br />
WBPaper00005044 Gary Interaction: 20<br />
Time: 3 hours<br />
New objects: made ire-1 binding site Feature (WBsf977530)<br />
Location of information: body of this paper and WBPaper00005036<br />
Other curator data: IRE-1 is a stress-activated endonuclease resident in the ER that is conserved in all known eukaryotes. IRE-1 mediated unconventional splicing of an intron from xbp-1 mRNA controls expression of the encoded transcription factor and is required for upregulation of most UPR target genes.<br />
See also: WBPaper00005036<br />
WBPaper00005971 Gary Interaction: 16 This has already been done by Gary: Feature WBsf718850<br />
WBPaper00024440 Gary Interaction: 1<br />
Time: 3 hours<br />
New objects: Made WBsf977532, WBsf977533 M-2 motif/daf-12 binding sites for ceh-22<br />
New objects: Made WBsf977534, WBsf977535 M-2 motif/daf-12 binding sites for myo-2<br />
Location of information: body of this paper and Supplemental Figure 3<br />
Other curator data: "We conclude that regulation of myo-2 and ceh-22 during dauer development depends critically on the M-2 motif."<br />
Comments: lots of hypothetical binding sites for daf-12 in 90-odd genes, but these have not been experimentally confirmed.<br />
WBPaper00028816 Gary<br />
Time: 30 mins<br />
New objects: none<br />
Location of information: body of this paper<br />
Other curator data: "recruitment sites are widely distributed along X to bind the DCC and nucleate DCC spreading to X regions lacking recruitment sites"<br />
Comments: This paper determines the A and B motifs of the Dosage Compensation Complex (DCC). There are hundreds of thousands of sites of varying strength.<br />
WBPaper00028986 Gary Interaction: 16<br />
WBPaper00029181 Gary<br />
WBPaper00029327 Gary Interaction: 33<br />
WBPaper00030849 Gary<br />
WBPaper00031355 Gary Interaction: 5<br />
WBPaper00004181 Mary Ann Interaction: 8 <br />
No Features. Took 5 mins to read. <br />
WBPaper00005056 Mary Ann <br />
Time: 35 mins<br />
New objects: 37 - not yet curated.<br />
Location of information: Mainly in Fig. 1, but also in the body text. <br />
Comments: TRTTKRY element in the promoter regions of T05E11.3, D2096.6, C44H4.1, ZK816.4, ceh-22, <br />
tph-1, M05B5.2, myo-2. Bound by PHA-4 <br />
WBPaper00006429 Mary Ann Interaction: 3<br />
Time: 1/2hr<br />
New objects: 1<br />
Location of information: In body of text and fig. 3B<br />
Comments: Alludes to GATA binding sites, but no experimental data. <br />
Comments (dr): Added Expr11833<br />
WBPaper00024505 Mary Ann<br />
Time: <br />
New objects: <br />
Location of information: <br />
Comments: <br />
WBPaper00028849 Mary Ann<br />
Time: 45 mins<br />
New objects: 1<br />
Location of information: In body of text and fig. 3B<br />
Comments (dr): Added Expr11832 <br />
WBPaper00029058 Mary Ann Interaction: 50<br />
WBPaper00029190 Mary Ann<br />
WBPaper00029406 Mary Ann Interaction: 145<br />
WBPaper00030877 Mary Ann Interaction: 8<br />
WBPaper00031471 Mary Ann<br />
Time: 1hr<br />
New objects: 1<br />
Location of information: In body of text and fig. 5f<br />
Comments: Check that alleles of asd-2 in fig 3 are curated. yb1422, yb1415, yb1470, yb1419, yb1423<br />
WBPaper00004482 Xiaodong Interaction: 2<br />
Time: two hours<br />
New objects: request new features for more interaction objects<br />
Location of information: Figure 5A<br />
Comments: HOX/CEH-20 binding sites in the hlh-8 promoter. The features will be used in cis-regulation and physical interaction objects.<br />
Duplication: yes. Currently only one interaction object (WBInteraction000001291) is associated with the paper.<br />
WBPaper00005609 Xiaodong Interaction: 20 This has already been done by Margaret: Feature WBsf019097, WBsf038788, WBsf038789, WBsf038790, WBsf038791, WBsf038793, WBsf038794, WBsf019098, WBsf038792<br />
Time: 50 mins<br />
New objects: three interaction objects<br />
Location of information: In body of text and fig. 5<br />
Comments: Only WBsf038790 and WBsf038791 are useful. Not sure why the other features exist; are they redundant?<br />
WBPaper00024189 Xiaodong<br />
Time: 10 mins<br />
Comments: no features in this paper<br />
WBPaper00024981 Xiaodong<br />
Time: 40 mins<br />
New objects: request through RT<br />
Location of information: figure 2<br />
Comments: MED-1 binding sites in the end-1 and end-3 promoters<br />
Comments (dr): Expr11831 generated for end-1 and end-3 promoter regions. Add the WBsf ID and the related construct once available. Paper checked as it came through RT <br />
WBPaper00028910 Xiaodong Interaction: 7<br />
Time: 30 mins<br />
New objects: request through RT<br />
Location of information: in body text and figure 4<br />
Comments: MEF-2 direct binding site in the str-1 promoter, and a few minimal regulatory regions mentioned in the body text<br />
<br />
WBPaper00029109 Xiaodong<br />
<br />
WBPaper00029229 Xiaodong Interaction: 3<br />
WBPaper00030809 Xiaodong<br />
WBPaper00030931 Xiaodong Interaction: 18<br />
WBPaper00031565 Xiaodong Interaction: 1<br />
Time: 30 mins<br />
New objects: request through RT<br />
Location of information: in body text and figure 2 and figure 3<br />
Comments: PBC-HOX binding sites S1 and S2; cis-regulatory elements E1 and E2<br />
<br />
</pre><br />
* I suggest we curate 5 of our papers before the jamboree, taking notes as described above and any other things that arise during your curation.</div>Xdwanghttps://wiki.wormbase.org/index.php?title=Working_Group:Sequence_Features&diff=24724Working Group:Sequence Features2014-09-11T20:39:23Z<p>Xdwang: </p>
<hr />
<div>'''Topics'''<br />
<br />
* Display of Sequence Features on the website<br />
** Also display Transcription factors and Gene_product_binds.<br />
** In the expression section of the gene page, highlight the objects that are related to Sequence Features. Add a type called cis-regulatory-element to the table?<br />
** Add a sequence feature table to the sequence widget to display all features related to the gene.<br />
** In the Cytoscape display, show each sequence feature as an entity node.<br />
<br />
* Streamline the workflow<br />
** How can we automatically identify Sequence Feature papers?<br />
** How do papers come in now? SVM? Textpresso string matches?<br />
<br />
* Improving data flow<br />
** How can we make the data immediately available to all curators?<br />
*** Add Paper and Public_name fields to the Features in the Nameserver?<br />
*** geneace is available for download and is updated every day; a copy is taken by Caltech.<br />
**** still need WBsf object details in order to relate/curate regulation/expression/construct objects<br />
<br />
<br />
* Sample paper curation <br />
** Prepare a paper list from each person<br />
* assign meaningful names to the public_name field, e.g. 'distal enhancer 1' instead of 'DE1'<br />
** gw3 - I disagree with this. We should use the name that the paper uses to describe the region, rather than making up our own names. Several months after I had originally annotated some regions, Xiaodong found problems with them, and when I went back to re-annotate them I really needed to identify unambiguously which region the paper was talking about by matching the Public_name field to the name of the region in the paper. Users may well need to do this too if they are reading the paper and looking at the Features marking up the regions.<br />
** dr - OK, sounds good to me, Gary.<br />
<br />
'''Practical Issues'''<br />
<br />
* Gary and I are staying at the Ramada - 800 Morrissey Boulevard, Freeport St, Boston, MA 02122 US<br />
* We arrive in Boston on Saturday evening and fly home on Wednesday night. <br />
* What time shall we meet on Monday morning and where do we go? <br />
**I (Xiaodong) can pick you guys up at the Harvard Square T station (Red Line, info center on the square) at, say, 9:30 am. We can walk to my office.<br />
* Can you confirm that we have office space with internet access? I will bring my MacBook. Hopefully I can tunnel into my Sanger working space.<br />
**There is definitely enough office space. Internet access is through the Harvard guest network; other ways of connecting are not allowed.<br />
<br />
'''Pre-Jamboree prep'''<br />
<br />
* Suggestions for work<br />
** Curate some (5? 10?) papers from our lists and collate data as follows<br />
*** How long did each paper take to curate?<br />
*** How many new objects did you add to WormBase?<br />
*** Where in the paper was the information (e.g. supplementary data, figure legends, within text)?<br />
*** Did you need to contact the author for more information?<br />
*** Did you come across data for other curators? <br />
**** gw3 - I don't know in detail what data other curators require. It would be useful to know this so I can summarise data for others.<br />
<br />
'''The Jamboree 15-17th Sept 2014'''<br />
<br />
* Suggestions for work<br />
** Discuss results from above<br />
** Work through some papers which have already been curated and have different data types (e.g. promoter and gene regulation) to identify best curation practices.<br />
** browsing capabilities through WormMine?<br />
** Should we put the list of papers in the Caltech curation status form?<br />
<br />
<br />
'''Matters Arising'''<br />
<br />
* things noted - not necessarily to do with the topic in hand<br />
** There are many duplicates of Interaction objects in geneace, with or without two leading digits. Sometimes there is only the shorter form.<br />
** The Features WBsf919641 and WBsf919607 are at the same position. Both are DAF-3 binding sites. WBsf919641 has Method "binding_site"; WBsf919607 has Method "TF_binding_site", which I think is more correct as DAF-3 is a TF. WBsf919641 used the paper WBPaper00004526; WBsf919607 used the paper WBPaper00003384. WBsf919641 is associated with WBGene00202278; WBsf919607 is associated with WBGene00003514 (myo-2). The paper WBPaper00004526 appears to be about a PEB-1 binding site, not a DAF-3 site. I made the Feature WBsf919609 for the PEB-1 site based on WBPaper00004526 in the last round of curation. Something is not right here.<br />
** The Features WBsf019227 and WBsf038813 are in the same position, but on opposite strands. Both are for a LAG-1 binding site associated with the gene lin-11. One is from WBPaper00005357, the other from WBPaper00032298. WBsf019227 needs to be retired and merged into WBsf038813.<br />
** WBsf038819 has a Method of 'regulatory_region', but the Public_name and Description describe it as an Enhancer. Which is it?<br />
** WBsf919669 is an Enhancer, but it has a Transcription_factor tag (WBTranscriptionFactor000101). It is far too large to be a binding site for a transcription factor. I think this tag should be removed.<br />
** 393 out of the 500 TF_binding_site Features have an incorrect SO_term tag. It should be set to: SO:0000235<br />
** 10 Features with Method = "TF_binding_site" are lacking a Transcription_factor tag.<br />
* How many 'Method's should Features have? enhancer, regulatory_region, TF_binding_site, promoter…?<br />
** How should these Methods be defined?<br />
** When displaying on JBrowse/GBrowse, do we want to show the WBsfxxxxx IDs or the Methods?<br />
* Gary does a quick literature search to see what other work has been done in the sites described.<br />
** Along this line, should we add WBPaper00002925 to the WBsf Features from WBPaper00002011, i.e. associate Features with both papers that mention them?<br />
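The SO_term and Transcription_factor problems noted above could be re-checked after each round of curation. A minimal sketch in Python, assuming the Features have already been exported into simple records; the Feature class and the audit_tf_binding_sites function are hypothetical, not part of any WormBase tool, and only the expected SO term (SO:0000235) comes from these notes.<br />

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# SO term that, per these notes, every TF_binding_site Feature should carry
TF_BINDING_SITE_SO = "SO:0000235"


@dataclass
class Feature:
    """Hypothetical in-memory record; field names mirror the ACeDB tags."""
    wbsf_id: str
    method: str
    so_term: Optional[str] = None
    transcription_factor: Optional[str] = None


def audit_tf_binding_sites(features: List[Feature]) -> Tuple[List[str], List[str]]:
    """Return (IDs with a wrong or missing SO_term, IDs lacking a Transcription_factor)."""
    wrong_so, missing_tf = [], []
    for feature in features:
        if feature.method != "TF_binding_site":
            continue  # only TF_binding_site Features are checked here
        if feature.so_term != TF_BINDING_SITE_SO:
            wrong_so.append(feature.wbsf_id)
        if not feature.transcription_factor:
            missing_tf.append(feature.wbsf_id)
    return wrong_so, missing_tf
```

Both lists are collected in one pass so that a single report covers the two issues raised above.<br />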
<br />
<br />
'''Duplicated Features'''<br />
The following are a set of duplicated regulatory Features. I (Gary) am probably the main culprit in not checking to see if there is a pre-existing Feature.<br />
How can we avoid this in future?<br />
* The list of duplicated Features has been moved to the Discussion page.<br />
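One way to catch duplicates before a new Feature is requested is to group the existing Features by chromosome and coordinates. A minimal sketch in Python, assuming the Features are available as GFF2-style lines such as: CHROMOSOME_I TF_binding_site TF_binding_site 10245805 10245811 . - . Feature "WBsf019227" ; TF_name "LAG-1". The function name find_duplicate_features is hypothetical. Method and strand are left out of the grouping key by default, so cross-Method pairs (like WBsf919641/WBsf919607) and opposite-strand pairs (like WBsf019227/WBsf038813) are both reported.<br />

```python
import re
from collections import defaultdict

# Matches the Feature ID in the attribute column, e.g. Feature "WBsf019227"
FEATURE_ID = re.compile(r'Feature "(WBsf\d+)"')


def find_duplicate_features(gff_lines, ignore_strand=True):
    """Group WBsf IDs that share a chromosome and start/end coordinates."""
    groups = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.split(None, 8)  # 8 fixed columns + attribute column
        if len(cols) < 9:
            continue  # not a complete GFF2 line
        seqid, start, end, strand, attrs = cols[0], cols[3], cols[4], cols[6], cols[8]
        match = FEATURE_ID.search(attrs)
        if match is None:
            continue  # line does not carry a WBsf ID
        key = (seqid, int(start), int(end))
        if not ignore_strand:
            key += (strand,)
        groups[key].append(match.group(1))
    # Keep only positions carrying more than one Feature
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


if __name__ == "__main__":
    sample = [
        'CHROMOSOME_I TF_binding_site TF_binding_site 10245805 10245811 . - . Feature "WBsf019227" ; TF_name "LAG-1"',
        'CHROMOSOME_I TF_binding_site TF_binding_site 10245805 10245811 . + . Feature "WBsf038813" ; TF_name "LAG-1"',
    ]
    print(find_duplicate_features(sample))
    # {('CHROMOSOME_I', 10245805, 10245811): ['WBsf019227', 'WBsf038813']}
```

Running such a check against the daily geneace copy before requesting a new WBsf ID would flag a pre-existing Feature at the same position.<br />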
<br />
'''Papers to curate'''<br />
* I have taken the Sequence Feature Papers from RT and assigned the four of us 10 papers each.<br />
* Just in case there is any paper here that has already been curated for Interaction, I've added the number of Interaction objects linked to by the paper. If this is not helpful, feel free to remove it.<br />
<br />
<pre><br />
WBPaper00002925 Daniela Interaction: 8<br />
Time: 10 mins<br />
New objects: none. The paper refers to a couple of sub-element enhancers already described in Okkema and Fire 1994, Development (B and C sub-elements, WBPaper00002011)<br />
Location of information: - <br />
Comments: <br />
WBPaper00004568 Daniela Interaction: 2<br />
Time: 25 mins<br />
New objects: 3 promoters- pA, pB, and pC<br />
Location of information: body text and fig.3 for promoters. Figures 5, 6 and 7 and body text for TF binding sites.<br />
Comments: Expression objects were already generated by Wen in the past. Daniela will add only the WBsf once ready. Also add WBsf object to the construct objects.<br />
WBPaper00005842 Daniela<br />
WBPaper00024328 Daniela Interaction: 4<br />
Time: 2.5 hours<br />
New objects: Added 4 Expression objects for 5 different constructs (pPHA2::GFP-A was already present):<br />
<br />
pPHA2::GFP-A body text and Figs. 5 and 6 Expr3093 Add Sf to expr object and to construct <br />
pPHA2::GFP-B body text and Figs. 5 and 6 Expr11835 Add Sf to expr object and to construct<br />
pPHA2::GFP-C body text and Figs. 5 and 6 Expr11835 Add Sf to expr object and to construct<br />
pPHA2::GFP-D body text and Figs. 5 and 6 Expr11837 Add Sf to expr object and to construct<br />
pPHA2::GFP-E body text and Figs. 5 and 6 Expr11836 Add Sf to expr object and to construct<br />
pPHA2::GFP-F body text and Figs. 5 and 6 Expr11834 Add Sf to expr object and to construct<br />
Detailed construction of plasmids in Materials and Methods<br />
Location of information: body text and Figs. 5 and 6 <br />
Comments: not sure if this paper is curatable for sequence features per se, due to the nature of the constructs. Would be good to discuss.<br />
WBPaper00028802 Daniela<br />
WBPaper00028915 Daniela<br />
Time: 15 mins<br />
Comments: No features in this paper<br />
WBPaper00029140 Daniela<br />
WBPaper00029255 Daniela Interaction: 55<br />
WBPaper00030829 Daniela Interaction: 5<br />
WBPaper00030933 Daniela Interaction: 1<br />
WBPaper00003929 Gary Interaction: 25<br />
Time: 4 hours<br />
New objects: none - 6 existing Features were corrected and updated. (WBsf019182, WBsf019181, WBsf019179, WBsf019177, WBsf019178, WBsf019180)<br />
Location of information: body text and in a figure in the supplemental table of the paper WBPaper00006376<br />
Other curator data: "lin-41 and lin-42 are negatively regulated by let-7"<br />
Comments: This is Gary Ruvkun's (Nature 2000) let-7 paper that started the miRNA field!<br />
Comments: This is being marked as a pair of 'binding_site' Features because this is miRNA not TF binding.<br />
See also: let-7, lin-41 binding WBPaper00006376 PubMed: 14729570 (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC324419/) - This is the paper that defined the binding sites.<br />
Comments (XW): curated one new cis-regulation object with Gary's WBsfs<br />
<br />
WBPaper00005044 Gary Interaction: 20<br />
Time: 3 hours<br />
New objects: made ire-1 binding site Feature (WBsf977530)<br />
Location of information: body of this paper and WBPaper00005036<br />
Other curator data: IRE-1 is a stress-activated endonuclease resident in the ER that is conserved in all known eukaryotes. IRE-1 mediated unconventional splicing of an intron from xbp-1 mRNA controls expression of the encoded transcription factor and is required for upregulation of most UPR target genes.<br />
See also: WBPaper00005036<br />
WBPaper00005971 Gary Interaction: 16 This has already been done by Gary: Feature WBsf718850<br />
WBPaper00024440 Gary Interaction: 1<br />
Time: 3 hours<br />
New objects: Made WBsf977532, WBsf977533 M-2 motif/daf-12 binding sites for ceh-22<br />
New objects: Made WBsf977534, WBsf977535 M-2 motif/daf-12 binding sites for myo-2<br />
Location of information: body of this paper and Figure Supplemental 3<br />
Other curator data: "We conclude that regulation of myo-2 and ceh-22 during dauer development depends critically on the M-2 motif."<br />
Comments: lots of hypothetical binding sites for daf-12 in 90-odd genes, but these have not been experimentally confirmed.<br />
WBPaper00028816 Gary<br />
Time: 30 mins<br />
New objects: none<br />
Location of information: body of this paper<br />
Other curator data: "recruitment sites are widely distributed along X to bind the DCC and nucleate DCC spreading to X regions lacking recruitment sites"<br />
Comments: This paper determines the A and B motifs of the Dosage Compensation Complex (DCC). There are hundreds of thousands of sites of varying strength.<br />
WBPaper00028986 Gary Interaction: 16<br />
WBPaper00029181 Gary<br />
WBPaper00029327 Gary Interaction: 33<br />
WBPaper00030849 Gary<br />
WBPaper00031355 Gary Interaction: 5<br />
WBPaper00004181 Mary Ann Interaction: 8 <br />
No Features. Took 5 mins to read. <br />
WBPaper00005056 Mary Ann <br />
Time: 35 mins<br />
New objects: 37 - not yet curated.<br />
Location of information: Mainly in fig. 1, but also in body text.<br />
Comments: TRTTKRY element in promoter region of T05E11.3, D2096.6, C44H4.1, ZK816.4, ceh-22, <br />
tph-1, M05B5.2, myo-2. Bound by pha-4 <br />
WBPaper00006429 Mary Ann Interaction: 3<br />
Time: 1/2hr<br />
New objects: 1<br />
Location of information: In body of text and fig. 3B<br />
Comments: Alludes to GATA binding sites, but no experimental data. <br />
Comments (dr): Added Expr11833<br />
WBPaper00024505 Mary Ann<br />
Time: <br />
New objects: <br />
Location of information: <br />
Comments: <br />
WBPaper00028849 Mary Ann<br />
Time: 45 mins<br />
New objects: 1<br />
Location of information: In body of text and fig. 3B<br />
Comments(dr): Added Expr11832 <br />
WBPaper00029058 Mary Ann Interaction: 50<br />
WBPaper00029190 Mary Ann<br />
WBPaper00029406 Mary Ann Interaction: 145<br />
WBPaper00030877 Mary Ann Interaction: 8<br />
WBPaper00031471 Mary Ann<br />
Time: 1hr<br />
New objects: 1<br />
Location of information: In body of text and fig. 5f<br />
Comments: Check that alleles of asd-2 in fig 3 are curated. yb1422, yb1415, yb1470, yb1419, yb1423<br />
WBPaper00004482 Xiaodong Interaction: 2<br />
Time: two hours<br />
New objects: request new features for more interaction objects<br />
Location of information: figure 5A, <br />
Comments: HOX/CEH-20 binding sites in hlh-8 promoter. feature will be used in cis-regulation and physical interaction objects<br />
Duplication: yes. Only one interaction object (WBInteraction000001291) is currently associated with the paper.<br />
WBPaper00005609 Xiaodong Interaction: 20 This has already been done by Margaret: Feature WBsf019097, WBsf038788, WBsf038789, WBsf038790, WBsf038791, WBsf038793, WBsf038794, WBsf019098, WBsf038792<br />
Time: 50 mins<br />
New objects: three interaction objects<br />
Location of information: In body of text and fig. 5<br />
Comments: only WBsf038790 and WBsf038791 are useful. Not sure why the other features exist; they seem to be redundant.<br />
WBPaper00024189 Xiaodong<br />
Time: 10 mins<br />
Comments: no features in this paper<br />
WBPaper00024981 Xiaodong<br />
Time: 40 mins<br />
New objects: request through RT<br />
Location of information: figure 2<br />
Comments: MED-1 binding sites in end-1 and end-3 promoters<br />
Comments (dr): Expr11831 generated for end-1 and end-3 promoter regions. Add the WBsf ID once available and the related construct. Paper checked as it came through RT.<br />
WBPaper00028910 Xiaodong Interaction: 7<br />
Time: 30 mins<br />
New objects: request through RT<br />
Location of information: in body text and figure 4<br />
Comments: MEF-2 direct binding site in str-1 promoter and a few minimal regulatory regions mentioned in body text<br />
<br />
WBPaper00029109 Xiaodong<br />
<br />
WBPaper00029229 Xiaodong Interaction: 3<br />
WBPaper00030809 Xiaodong<br />
WBPaper00030931 Xiaodong Interaction: 18<br />
WBPaper00031565 Xiaodong Interaction: 1<br />
Time: 30 mins<br />
New objects: request through RT<br />
Location of information: in body text and figure 2 and figure 3<br />
Comments: PBC-HOX binding sites S1 and S2; cis-regulatory elements E1 and E2<br />
<br />
</pre><br />
* I suggest we curate 5 of our papers before the jamboree, taking notes as described above and any other things that arise during your curation.</div>Xdwanghttps://wiki.wormbase.org/index.php?title=Working_Group:Sequence_Features&diff=24723Working Group:Sequence Features2014-09-11T20:36:16Z<p>Xdwang: </p>
<hr />
<div>'''Topics'''<br />
<br />
* Display of Sequence Features on the website<br />
** And Transcription factors and Gene_product binds.<br />
** In the expression section on the gene page, highlight the objects that are Sequence Feature related. Add a type called cis-regulatory-element to the table?<br />
** Add a sequence feature table in the sequence widget to display all features related to the gene<br />
** In the Cytoscape display, show sequence features as entity nodes<br />
<br />
* Streamline the workflow<br />
** How can we automatically identify Sequence Feature papers?<br />
** How do papers come in now? SVM? Textpresso string matches?<br />
* Improving data flow<br />
** How can we make the data immediately available to all curators?<br />
*** Add Paper and Public_name fields to the Features in the Nameserver?<br />
*** geneace is available for download and is updated every day; a copy is taken by Caltech.<br />
* Sample papers curation <br />
** Prepare a paper list from each person<br />
* assign meaningful names to the public_name field, e.g. 'distal enhancer 1' instead of 'DE1'<br />
** gw3 - I disagree with this. We should use the name that is used to describe the region in the paper, rather than making up our own names. I have had to go back and re-annotate regions after Xiaodong found problems with my stuff several months after I had originally done it and I really needed to be able to unambiguously identify which region the paper was talking about by using the Public_name field and matching this up to the name of the region in the paper. Users may well need to do this as well if they are reading the paper and looking at the Features marking up the regions.<br />
** dr - ok sounds good to me Gary<br />
<br />
'''Practical Issues'''<br />
<br />
* Gary and I are staying at the Ramada - 800 Morrissey Boulevard, Freeport St, Boston, MA 02122 US<br />
* We arrive in Boston on Saturday evening and fly home on Wednesday night. <br />
* What time shall we meet on Monday morning and where do we go? <br />
**I (Xiaodong) can pick you guys up at the Harvard Square T station (red line, info center on the square) at, say, 9:30 am. We can walk to my office from there.<br />
* Can you confirm that we have office space with internet access? I will bring my Macbook. Hopefully I can tunnel into my Sanger working space.<br />
**There is enough office space for sure. Internet access is through the Harvard guest network; other ways of connecting are not allowed.<br />
<br />
'''Pre-Jamboree prep'''<br />
<br />
* Suggestions for work<br />
** Curate some (5? 10?) papers from our lists and collate data as follows<br />
*** How long did each paper take to curate<br />
*** How many new objects did you add to WormBase<br />
*** Where in the paper was the information (e.g. supplementary data, figure legends, within text)<br />
*** Did you need to contact the author for more information<br />
*** Did you come across data for other curators <br />
**** gw3 - I don't know in detail what data other curators require. It would be useful to know this so I can summarise data for others.<br />
<br />
'''The Jamboree 15-17th Sept 2014'''<br />
<br />
* Suggestions for work<br />
** Discuss results from above<br />
** Work through some papers which have already been curated and have different data types (e.g. promoter and gene regulation) to identify best curation practices.<br />
** Browsing capabilities through WormMine?<br />
** Should we put the list of papers in the Caltech curation status form?<br />
<br />
<br />
'''Matters Arising'''<br />
<br />
* things noted - not necessarily to do with the topic in hand<br />
** There are many duplicates of Interaction objects in geneace, with or without two leading digits. Sometimes there is only the shorter form.<br />
** The Features WBsf919641 and WBsf919607 are at the same position. Both are DAF-3 binding sites. WBsf919641 has Method "binding_site", and WBsf919607 has Method "TF_binding_site", which I think is more correct as DAF-3 is a TF. WBsf919641 used the paper WBPaper00004526, and WBsf919607 used the paper WBPaper00003384. WBsf919641 is associated with WBGene00202278, and WBsf919607 is associated with WBGene00003514 (myo-2). The paper WBPaper00004526 appears to be about a PEB-1 binding site, not a DAF-3 site. I made the Feature WBsf919609 for the PEB-1 site based on WBPaper00004526 in the last round of curation. Something is not right here.<br />
** The Features WBsf019227 and WBsf038813 are at the same position, but on opposite strands. Both are for a LAG-1 binding site associated with the gene lin-11. One is from WBPaper00005357, the other from WBPaper00032298. WBsf019227 needs to be retired and merged into WBsf038813.<br />
** WBsf038819 has a Method of 'regulatory_region', but the Public_name and Description describe it as an Enhancer. Which is it?<br />
** WBsf919669 is an Enhancer, but it has a Transcription_factor tag (WBTranscriptionFactor000101). It is far too large to be a binding site for a transcription factor. I think this tag should be removed.<br />
** 393 of the 500 TF_binding_site Features have an incorrect SO_term tag; it should be set to SO:0000235.<br />
** 10 Features with Method = "TF_binding_site" are lacking a Transcription_factor tag.<br />
* How many 'Method's should Features have? enhancer, regulatory_region, TF_binding_site, promoter…?<br />
** How should these Methods be defined?<br />
** When displaying on JBrowse/GBrowse, do we want to show the WBsfxxxxx IDs or the Methods?<br />
* Gary does a quick literature search to see what other work has been done in the sites described.<br />
** Along this line, should we add WBPaper00002925 to the WBsf Features from WBPaper00002011, i.e. associate Features with both papers that mention them?<br />
<br />
<br />
'''Duplicated Features'''<br />
The following are a set of duplicated regulatory Features. I (Gary) am probably the main culprit in not checking to see if there is a pre-existing Feature.<br />
How can we avoid this in future?<br />
* The list of duplicated Features has been moved to the Discussion page.<br />
<br />
'''Papers to curate'''<br />
* I have taken the Sequence Feature Papers from RT and assigned the four of us 10 papers each.<br />
* Just in case there is any paper here that has already been curated for Interaction, I've added the number of Interaction objects linked to by the paper. If this is not helpful, feel free to remove it.<br />
<br />
<pre><br />
WBPaper00002925 Daniela Interaction: 8<br />
Time: 10 mins<br />
New objects: none. The paper refers to a couple of sub-element enhancers already described in Okkema and Fire 1994, Development (B and C sub-elements, WBPaper00002011)<br />
Location of information: - <br />
Comments: <br />
WBPaper00004568 Daniela Interaction: 2<br />
Time: 25 mins<br />
New objects: 3 promoters- pA, pB, and pC<br />
Location of information: body text and fig.3 for promoters. Figures 5, 6 and 7 and body text for TF binding sites.<br />
Comments: Expression objects were already generated by Wen in the past. Daniela will add only the WBsf once ready. Also add WBsf object to the construct objects.<br />
WBPaper00005842 Daniela<br />
WBPaper00024328 Daniela Interaction: 4<br />
Time: 2.5 hours<br />
New objects: Added 4 Expression objects for 5 different constructs (pPHA2::GFP-A was already present):<br />
<br />
pPHA2::GFP-A body text and Figs. 5 and 6 Expr3093 Add Sf to expr object and to construct <br />
pPHA2::GFP-B body text and Figs. 5 and 6 Expr11835 Add Sf to expr object and to construct<br />
pPHA2::GFP-C body text and Figs. 5 and 6 Expr11835 Add Sf to expr object and to construct<br />
pPHA2::GFP-D body text and Figs. 5 and 6 Expr11837 Add Sf to expr object and to construct<br />
pPHA2::GFP-E body text and Figs. 5 and 6 Expr11836 Add Sf to expr object and to construct<br />
pPHA2::GFP-F body text and Figs. 5 and 6 Expr11834 Add Sf to expr object and to construct<br />
Detailed construction of plasmids in Materials and Methods<br />
Location of information: body text and Figs. 5 and 6 <br />
Comments: not sure if this paper is curatable for sequence features per se, due to the nature of the constructs. Would be good to discuss.<br />
WBPaper00028802 Daniela<br />
WBPaper00028915 Daniela<br />
Time: 15 mins<br />
Comments: No features in this paper<br />
WBPaper00029140 Daniela<br />
WBPaper00029255 Daniela Interaction: 55<br />
WBPaper00030829 Daniela Interaction: 5<br />
WBPaper00030933 Daniela Interaction: 1<br />
WBPaper00003929 Gary Interaction: 25<br />
Time: 4 hours<br />
New objects: none - 6 existing Features were corrected and updated. (WBsf019182, WBsf019181, WBsf019179, WBsf019177, WBsf019178, WBsf019180)<br />
Location of information: body text and in a figure in the supplemental table of the paper WBPaper00006376<br />
Other curator data: "lin-41 and lin-42 are negatively regulated by let-7"<br />
Comments: This is Gary Ruvkun's (Nature 2000) let-7 paper that started the miRNA field!<br />
Comments: This is being marked as a pair of 'binding_site' Features because this is miRNA not TF binding.<br />
See also: let-7, lin-41 binding WBPaper00006376 PubMed: 14729570 (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC324419/) - This is the paper that defined the binding sites.<br />
Comments (XW): curated one new cis-regulation object with Gary's WBsfs<br />
<br />
WBPaper00005044 Gary Interaction: 20<br />
Time: 3 hours<br />
New objects: made ire-1 binding site Feature (WBsf977530)<br />
Location of information: body of this paper and WBPaper00005036<br />
Other curator data: IRE-1 is a stress-activated endonuclease resident in the ER that is conserved in all known eukaryotes. IRE-1 mediated unconventional splicing of an intron from xbp-1 mRNA controls expression of the encoded transcription factor and is required for upregulation of most UPR target genes.<br />
See also: WBPaper00005036<br />
WBPaper00005971 Gary Interaction: 16 This has already been done by Gary: Feature WBsf718850<br />
WBPaper00024440 Gary Interaction: 1<br />
Time: 3 hours<br />
New objects: Made WBsf977532, WBsf977533 M-2 motif/daf-12 binding sites for ceh-22<br />
New objects: Made WBsf977534, WBsf977535 M-2 motif/daf-12 binding sites for myo-2<br />
Location of information: body of this paper and Figure Supplemental 3<br />
Other curator data: "We conclude that regulation of myo-2 and ceh-22 during dauer development depends critically on the M-2 motif."<br />
Comments: lots of hypothetical binding sites for daf-12 in 90-odd genes, but these have not been experimentally confirmed.<br />
WBPaper00028816 Gary<br />
Time: 30 mins<br />
New objects: none<br />
Location of information: body of this paper<br />
Other curator data: "recruitment sites are widely distributed along X to bind the DCC and nucleate DCC spreading to X regions lacking recruitment sites"<br />
Comments: This paper determines the A and B motifs of the Dosage Compensation Complex (DCC). There are hundreds of thousands of sites of varying strength.<br />
WBPaper00028986 Gary Interaction: 16<br />
WBPaper00029181 Gary<br />
WBPaper00029327 Gary Interaction: 33<br />
WBPaper00030849 Gary<br />
WBPaper00031355 Gary Interaction: 5<br />
WBPaper00004181 Mary Ann Interaction: 8 <br />
No Features. Took 5 mins to read. <br />
WBPaper00005056 Mary Ann <br />
Time: 35 mins<br />
New objects: 37 - not yet curated.<br />
Location of information: Mainly in fig. 1, but also in body text.<br />
Comments: TRTTKRY element in promoter region of T05E11.3, D2096.6, C44H4.1, ZK816.4, ceh-22, <br />
tph-1, M05B5.2, myo-2. Bound by pha-4 <br />
WBPaper00006429 Mary Ann Interaction: 3<br />
Time: 1/2hr<br />
New objects: 1<br />
Location of information: In body of text and fig. 3B<br />
Comments: Alludes to GATA binding sites, but no experimental data. <br />
Comments (dr): Added Expr11833<br />
WBPaper00024505 Mary Ann<br />
Time: <br />
New objects: <br />
Location of information: <br />
Comments: <br />
WBPaper00028849 Mary Ann<br />
Time: 45 mins<br />
New objects: 1<br />
Location of information: In body of text and fig. 3B<br />
Comments(dr): Added Expr11832 <br />
WBPaper00029058 Mary Ann Interaction: 50<br />
WBPaper00029190 Mary Ann<br />
WBPaper00029406 Mary Ann Interaction: 145<br />
WBPaper00030877 Mary Ann Interaction: 8<br />
WBPaper00031471 Mary Ann<br />
Time: 1hr<br />
New objects: 1<br />
Location of information: In body of text and fig. 5f<br />
Comments: Check that alleles of asd-2 in fig 3 are curated. yb1422, yb1415, yb1470, yb1419, yb1423<br />
WBPaper00004482 Xiaodong Interaction: 2<br />
Time: two hours<br />
New objects: request new features for more interaction objects<br />
Location of information: figure 5A, <br />
Comments: HOX/CEH-20 binding sites in hlh-8 promoter. feature will be used in cis-regulation and physical interaction objects<br />
Duplication: yes. Only one interaction object (WBInteraction000001291) is currently associated with the paper.<br />
WBPaper00005609 Xiaodong Interaction: 20 This has already been done by Margaret: Feature WBsf019097, WBsf038788, WBsf038789, WBsf038790, WBsf038791, WBsf038793, WBsf038794, WBsf019098, WBsf038792<br />
Time: 50 mins<br />
New objects: three interaction objects<br />
Location of information: In body of text and fig. 5<br />
Comments: only WBsf038790 and WBsf038791 are useful. Not sure why the other features exist; they seem to be redundant.<br />
WBPaper00024189 Xiaodong<br />
Time: 10 mins<br />
Comments: no features in this paper<br />
WBPaper00024981 Xiaodong<br />
Time: 40 mins<br />
New objects: request through RT<br />
Location of information: figure 2<br />
Comments: MED-1 binding sites in end-1 and end-3 promoters<br />
Comments (dr): Expr11831 generated for end-1 and end-3 promoter regions. Add the WBsf ID once available and the related construct. Paper checked as it came through RT.<br />
WBPaper00028910 Xiaodong Interaction: 7<br />
Time: 30 mins<br />
New objects: request through RT<br />
Location of information: in body text and figure 4<br />
Comments: MEF-2 direct binding site in str-1 promoter and a few minimal regulatory regions mentioned in body text<br />
<br />
WBPaper00029109 Xiaodong<br />
<br />
WBPaper00029229 Xiaodong Interaction: 3<br />
WBPaper00030809 Xiaodong<br />
WBPaper00030931 Xiaodong Interaction: 18<br />
WBPaper00031565 Xiaodong Interaction: 1<br />
Time: 30 mins<br />
New objects: request through RT<br />
Location of information: in body text and figure 2 and figure 3<br />
Comments: PBC-HOX binding sites S1 and S2; cis-regulatory elements E1 and E2<br />
<br />
</pre><br />
* I suggest we curate 5 of our papers before the jamboree, taking notes as described above and any other things that arise during your curation.</div>Xdwanghttps://wiki.wormbase.org/index.php?title=Working_Group:Sequence_Features&diff=24687Working Group:Sequence Features2014-09-09T20:21:08Z<p>Xdwang: </p>
<hr />
<div>'''Topics'''<br />
<br />
* Display of Sequence Features on the website<br />
** And Transcription factors and Gene_product binds.<br />
* Streamline the workflow<br />
** How can we automatically identify Sequence Feature papers?<br />
** How do papers come in now? SVM? Textpresso string matches?<br />
* Improving data flow<br />
** How can we make the data immediately available to all curators?<br />
*** Add Paper and Public_name fields to the Features in the Nameserver?<br />
*** geneace is available for download and is updated every day; a copy is taken by Caltech.<br />
* Sample papers curation <br />
** Prepare a paper list from each person<br />
* assign meaningful names to the public_name field, e.g. 'distal enhancer 1' instead of 'DE1'<br />
** gw3 - I disagree with this. We should use the name that is used to describe the region in the paper, rather than making up our own names. I have had to go back and re-annotate regions after Xiaodong found problems with my stuff several months after I had originally done it and I really needed to be able to unambiguously identify which region the paper was talking about by using the Public_name field and matching this up to the name of the region in the paper. Users may well need to do this as well if they are reading the paper and looking at the Features marking up the regions.<br />
** dr - ok sounds good to me Gary<br />
<br />
'''Practical Issues'''<br />
<br />
* Gary and I are staying at the Ramada - 800 Morrissey Boulevard, Freeport St, Boston, MA 02122 US<br />
* We arrive in Boston on Saturday evening and fly home on Wednesday night. <br />
* What time shall we meet on Monday morning and where do we go? <br />
**I (Xiaodong) can pick you guys up at the Harvard Square T station (red line, info center on the square) at, say, 9:30 am. We can walk to my office from there.<br />
* Can you confirm that we have office space with internet access? I will bring my Macbook. Hopefully I can tunnel into my Sanger working space.<br />
**There is enough office space for sure. Internet access is through the Harvard guest network; other ways of connecting are not allowed.<br />
<br />
'''Pre-Jamboree prep'''<br />
<br />
* Suggestions for work<br />
** Curate some (5? 10?) papers from our lists and collate data as follows<br />
*** How long did each paper take to curate<br />
*** How many new objects did you add to WormBase<br />
*** Where in the paper was the information (e.g. supplementary data, figure legends, within text)<br />
*** Did you need to contact the author for more information<br />
*** Did you come across data for other curators <br />
**** gw3 - I don't know in detail what data other curators require. It would be useful to know this so I can summarise data for others.<br />
<br />
'''The Jamboree 15-17th Sept 2014'''<br />
<br />
* Suggestions for work<br />
** Discuss results from above<br />
** Work through some papers which have already been curated and have different data types (e.g. promoter and gene regulation) to identify best curation practices.<br />
** Browsing capabilities through WormMine?<br />
** Should we put the list of papers in the Caltech curation status form?<br />
<br />
<br />
'''Matters Arising'''<br />
<br />
* things noted - not necessarily to do with the topic in hand<br />
** There are many duplicates of Interaction objects in geneace, with or without two leading digits. Sometimes there is only the shorter form.<br />
** The Features WBsf919641 and WBsf919607 are at the same position. Both are DAF-3 binding sites. WBsf919641 has Method "binding_site", and WBsf919607 has Method "TF_binding_site", which I think is more correct as DAF-3 is a TF. WBsf919641 used the paper WBPaper00004526, and WBsf919607 used the paper WBPaper00003384. WBsf919641 is associated with WBGene00202278, and WBsf919607 is associated with WBGene00003514 (myo-2). The paper WBPaper00004526 appears to be about a PEB-1 binding site, not a DAF-3 site. I made the Feature WBsf919609 for the PEB-1 site based on WBPaper00004526 in the last round of curation. Something is not right here.<br />
** The Features WBsf019227 and WBsf038813 are at the same position, but on opposite strands. Both are for a LAG-1 binding site associated with the gene lin-11. One is from WBPaper00005357, the other from WBPaper00032298. WBsf019227 needs to be retired and merged into WBsf038813.<br />
** WBsf038819 has a Method of 'regulatory_region', but the Public_name and Description describe it as an Enhancer. Which is it?<br />
** WBsf919669 is an Enhancer, but it has a Transcription_factor tag (WBTranscriptionFactor000101). It is far too large to be a binding site for a transcription factor. I think this tag should be removed.<br />
** 393 of the 500 TF_binding_site Features have an incorrect SO_term tag; it should be set to SO:0000235.<br />
** 10 Features with Method = "TF_binding_site" are lacking a Transcription_factor tag.<br />
* How many 'Method's should Features have? enhancer, regulatory_region, TF_binding_site, promoter…?<br />
** How should these Methods be defined?<br />
** When displaying on JBrowse/GBrowse, do we want to show the WBsfxxxxx IDs or the Methods?<br />
* Gary does a quick literature search to see what other work has been done in the sites described.<br />
** Along this line, should we add WBPaper00002925 to the WBsf Features from WBPaper00002011, i.e. associate Features with both papers that mention them?<br />
<br />
<br />
'''Duplicated Features'''<br />
The following are a set of duplicated regulatory Features. I (Gary) am probably the main culprit in not checking to see if there is a pre-existing Feature.<br />
How can we avoid this in future?<br />
*These two need to be merged - I suggest WBsf019227 be retired as it is in the opposite sense to its gene.<br />
**CHROMOSOME_I TF_binding_site TF_binding_site 10245805 10245811 . - . Feature "WBsf019227" ; TF_ID "WBTranscriptionFactor000101" ; TF_name "LAG-1"<br />
**CHROMOSOME_I TF_binding_site TF_binding_site 10245805 10245811 . + . Feature "WBsf038813" ; TF_ID "WBTranscriptionFactor000101" ; TF_name "LAG-1"<br />
*These two need to be merged - I suggest WBsf019221 be retired as it doesn't have the Interaction objects.<br />
**CHROMOSOME_I TF_binding_site TF_binding_site 10245835 10245841 . + . Feature "WBsf019221" ; TF_ID "WBTranscriptionFactor000101" ; TF_name "LAG-1"<br />
**CHROMOSOME_I TF_binding_site TF_binding_site 10245835 10245841 . + . Feature "WBsf038814" ; TF_ID "WBTranscriptionFactor000101" ; TF_name "LAG-1"<br />
*These two need to be merged - no difference between them. The SO_term is wrong - needs to be SO:0000235<br />
**CHROMOSOME_II TF_binding_site TF_binding_site 10950131 10950138 . + . Feature "WBsf019124" ; TF_ID "WBTranscriptionFactor000014" ; TF_name "SKN-1"<br />
**CHROMOSOME_II TF_binding_site TF_binding_site 10950131 10950138 . + . Feature "WBsf019126" ; TF_ID "WBTranscriptionFactor000014" ; TF_name "SKN-1"<br />
*Two papers - one says this is a MEX-1 recognition element and the other says this is a MEX-3 recognition element - are they both right?<br />
**CHROMOSOME_III binding_site binding_site 4804863 4804877 . - . Feature "WBsf899528"<br />
**CHROMOSOME_III binding_site binding_site 4804863 4804877 . - . Feature "WBsf899543"<br />
*Two papers - one says this is a MEX-1 recognition element and the other says this is a MEX-3 recognition element - are they both right?<br />
**CHROMOSOME_III binding_site binding_site 4804911 4804928 . - . Feature "WBsf899526"<br />
**CHROMOSOME_III binding_site binding_site 4804911 4804928 . - . Feature "WBsf899542"<br />
*These are from the same paper - one says it is "PUF-8 recognition element (PRE-1)" and the other says it is "PUF-8 recognition element (PRE-2)". Are they both right?<br />
**CHROMOSOME_III binding_site binding_site 4805138 4805145 . - . Feature "WBsf899537"<br />
**CHROMOSOME_III binding_site binding_site 4805138 4805145 . - . Feature "WBsf899538"<br />
*These are from the same paper - one says it is "TF LIN-1 binding site S11 in pJW5" and the other is "TF LIN-1 binding site S20 in pJW5" - looks like an error to me - Gary to redo this.<br />
**CHROMOSOME_III TF_binding_site TF_binding_site 7540351 7540361 . - . Feature "WBsf919592" ; TF_ID "WBTranscriptionFactor000135" ; TF_name "LIN-1"<br />
**CHROMOSOME_III TF_binding_site TF_binding_site 7540351 7540361 . - . Feature "WBsf919594" ; TF_ID "WBTranscriptionFactor000135" ; TF_name "LIN-1"<br />
*These two need to be merged - I suggest WBsf047654 be retired as it contains less information.<br />
**CHROMOSOME_III regulatory_region misc_feature 7540972 7540992 . - . Feature "WBsf047654"<br />
**CHROMOSOME_III regulatory_region misc_feature 7540972 7540992 . - . Feature "WBsf919589"<br />
*These two need to be merged - they are from different papers. I suggest WBsf047505 be retired as it has an incorrect SO_term - needs to be SO:0000235<br />
**CHROMOSOME_IV TF_binding_site TF_binding_site 2306593 2306606 . - . Feature "WBsf047478" ; TF_ID "WBTranscriptionFactor000052" ; TF_name "DAF-19"<br />
**CHROMOSOME_IV TF_binding_site TF_binding_site 2306593 2306606 . - . Feature "WBsf047505" ; TF_ID "WBTranscriptionFactor000052" ; TF_name "DAF-19"<br />
*These two need to be merged - they are from different papers. I suggest WBsf019088 be retired as it has an incorrect SO_term - needs to be SO:0000235<br />
**CHROMOSOME_V TF_binding_site TF_binding_site 10671938 10671944 . + . Feature "WBsf019088" ; TF_ID "WBTranscriptionFactor000061" ; TF_name "CEH-22"<br />
**CHROMOSOME_V TF_binding_site TF_binding_site 10671938 10671944 . + . Feature "WBsf919536" ; TF_ID "WBTranscriptionFactor000061" ; TF_name "CEH-22"<br />
*These three need to be merged. They are identical. The SO_term is wrong - needs to be SO:0000235<br />
**CHROMOSOME_V TF_binding_site TF_binding_site 11882353 11882360 . + . Feature "WBsf216760" ; TF_ID "WBTranscriptionFactor000126" ; TF_name "CEH-6"<br />
**CHROMOSOME_V TF_binding_site TF_binding_site 11882353 11882360 . + . Feature "WBsf216762" ; TF_ID "WBTranscriptionFactor000126" ; TF_name "CEH-6"<br />
**CHROMOSOME_V TF_binding_site TF_binding_site 11882353 11882360 . + . Feature "WBsf216764" ; TF_ID "WBTranscriptionFactor000126" ; TF_name "CEH-6"<br />
*These three need to be merged. They are identical. The SO_term is wrong - needs to be SO:0000235<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 4100019 4100026 . - . Feature "WBsf216754" ; TF_ID "WBTranscriptionFactor000126" ; TF_name "CEH-6"<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 4100019 4100026 . - . Feature "WBsf216755" ; TF_ID "WBTranscriptionFactor000126" ; TF_name "CEH-6"<br />
*These two need to be merged. WBsf899549 should be retired as it is incorrect - this is not a TF binding site, it is a regulatory region.<br />
**CHROMOSOME_X regulatory_region misc_feature 5100882 5100888 . + . Feature "WBsf919622"<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 5100882 5100888 . + . Feature "WBsf899549"<br />
*These two need to be merged. WBsf899545 should be retired as it is incorrect - this is not a TF binding site, it is a regulatory region.<br />
**CHROMOSOME_X regulatory_region misc_feature 5100919 5100925 . + . Feature "WBsf919623"<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 5100919 5100925 . + . Feature "WBsf899545"<br />
*These two need to be merged. WBsf899547 should be retired as it is incorrect - this is not a TF binding site, it is a regulatory region.<br />
**CHROMOSOME_X regulatory_region misc_feature 5100970 5100976 . + . Feature "WBsf919625"<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 5100970 5100976 . + . Feature "WBsf899547"<br />
*These two need to be merged. WBsf899548 should be retired as it is incorrect - this is not a TF binding site, it is a regulatory region.<br />
**CHROMOSOME_X regulatory_region misc_feature 5100985 5100991 . + . Feature "WBsf919626"<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 5100985 5100991 . + . Feature "WBsf899548"<br />
*WBsf047482 cites paper WBPaper00025203 - this has now been merged with paper WBPaper00026601 and should really be updated to this in all 11 Features which use it. <br />
*WBsf042312 cites paper WBPaper00025051<br />
*These two Features are otherwise nearly identical and should be merged.<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 8940207 8940220 . - . Feature "WBsf042312" ; TF_ID "WBTranscriptionFactor000052" ; TF_name "DAF-19"<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 8940207 8940220 . - . Feature "WBsf047482" ; TF_ID "WBTranscriptionFactor000052" ; TF_name "DAF-19"<br />
*These two need to be merged. WBsf919641 should be retired as it is incorrect - this is a TF binding site, it is not just a binding_site.<br />
**CHROMOSOME_X binding_site binding_site 12467733 12467737 . + . Feature "WBsf919641"<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 12467733 12467737 . + . Feature "WBsf919607" ; TF_ID "WBTranscriptionFactor000472" ; TF_name "DAF-3"<br />
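On the question of how to avoid duplicates in future: every pair listed above shares exactly the same chromosome, start and end. One option is to group the existing Features by position and report every position that carries more than one Feature, before making a new one. This is a minimal sketch, assuming the Features are available as GFF lines like those above (without the wiki list markers). `parse_gff_line`, `find_colocated` and `report` are hypothetical helpers, and only exact coordinate matches are caught, not partial overlaps. In these dumps the second column appears to hold the Method name (e.g. regulatory_region), so it is kept as `method`.<br />

```python
import re
from collections import defaultdict

# Pulls the WBsf ID out of the attribute column, e.g. Feature "WBsf019227".
FEATURE_ID = re.compile(r'Feature "([^"]+)"')


def parse_gff_line(line):
    """Parse one GFF line; the first 8 columns contain no spaces."""
    cols = line.split(None, 8)  # the 9th (attribute) column may contain spaces
    seqid, method, ftype, start, end, _score, strand, _phase = cols[:8]
    attrs = cols[8] if len(cols) > 8 else ""
    match = FEATURE_ID.search(attrs)
    return {
        "seqid": seqid,
        "method": method,  # appears to hold the Method name in these dumps
        "type": ftype,
        "start": int(start),
        "end": int(end),
        "strand": strand,
        "feature": match.group(1) if match else None,
    }


def find_colocated(lines):
    """Map (seqid, start, end) -> Features there, keeping only shared positions.
    Strand and Method are deliberately ignored when grouping."""
    groups = defaultdict(list)
    for line in lines:
        if line.strip():
            rec = parse_gff_line(line)
            groups[(rec["seqid"], rec["start"], rec["end"])].append(rec)
    return {pos: recs for pos, recs in groups.items() if len(recs) > 1}


def report(groups):
    """Summarise each shared position and flag strand or Method disagreements."""
    rows = []
    for pos in sorted(groups):
        recs = groups[pos]
        rows.append({
            "position": pos,
            "features": [r["feature"] for r in recs],
            "mixed_strand": len({r["strand"] for r in recs}) > 1,
            "mixed_method": len({r["method"] for r in recs}) > 1,
        })
    return rows


if __name__ == "__main__":
    lines = [
        'CHROMOSOME_I TF_binding_site TF_binding_site 10245805 10245811 . - . Feature "WBsf019227" ; TF_ID "WBTranscriptionFactor000101" ; TF_name "LAG-1"',
        'CHROMOSOME_I TF_binding_site TF_binding_site 10245805 10245811 . + . Feature "WBsf038813" ; TF_ID "WBTranscriptionFactor000101" ; TF_name "LAG-1"',
        'CHROMOSOME_X regulatory_region misc_feature 5100882 5100888 . + . Feature "WBsf919622"',
        'CHROMOSOME_X TF_binding_site TF_binding_site 5100882 5100888 . + . Feature "WBsf899549"',
    ]
    for row in report(find_colocated(lines)):
        print(row)
```

With these four lines from the list above, the LAG-1 pair is flagged as being on mixed strands, and the CHROMOSOME_X pair is flagged as having mixed Methods. These are the two kinds of disagreement discussed above.<br />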
<br />
<br />
<br />
'''Papers to curate'''<br />
* I have taken the Sequence Feature Papers from RT and assigned the four of us 10 papers each.<br />
* Just in case there is any paper here that has already been curated for Interaction, I've added the number of Interaction objects linked to by the paper. If this is not helpful, feel free to remove it.<br />
<br />
<pre><br />
WBPaper00002925 Daniela Interaction: 8<br />
Time: 10 mins<br />
New objects: none. The paper refers to a couple of sub-element enhancers (the B and C sub-elements) already described in Okkema and Fire 1994, Development (WBPaper00002011).<br />
Location of information: - <br />
Comments: <br />
WBPaper00004568 Daniela Interaction: 2<br />
WBPaper00005842 Daniela<br />
WBPaper00024328 Daniela Interaction: 4<br />
WBPaper00028802 Daniela<br />
WBPaper00028915 Daniela<br />
WBPaper00029140 Daniela<br />
WBPaper00029255 Daniela Interaction: 55<br />
WBPaper00030829 Daniela Interaction: 5<br />
WBPaper00030933 Daniela Interaction: 1<br />
WBPaper00003929 Gary Interaction: 25<br />
Time: 4 hours<br />
New objects: none - 6 existing Features were corrected and updated. (WBsf019182, WBsf019181, WBsf019179, WBsf019177, WBsf019178, WBsf019180)<br />
Location of information: body text and in a figure in the supplemental table of the paper WBPaper00006376<br />
Other curator data: "lin-41 and lin-42 are negatively regulated by let-7"<br />
Comments: This is Gary Ruvkun's (Nature 2000) let-7 paper that started the miRNA field!<br />
Comments: This is being marked as a pair of 'binding_site' Features because this is miRNA not TF binding.<br />
See also: let-7, lin-41 binding WBPaper00006376 PubMed: 14729570 (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC324419/) - This is the paper that defined the binding sites.<br />
Comments (XW): curated one new cis-regulation object with Gary's WBsfs<br />
<br />
WBPaper00005044 Gary Interaction: 20<br />
Time: 3 hours<br />
New objects: made ire-1 binding site Feature (WBsf977530)<br />
Location of information: body of this paper and WBPaper00005036<br />
Other curator data: IRE-1 is a stress-activated endonuclease resident in the ER that is conserved in all known eukaryotes. IRE-1 mediated unconventional splicing of an intron from xbp-1 mRNA controls expression of the encoded transcription factor and is required for upregulation of most UPR target genes.<br />
See also: WBPaper00005036<br />
WBPaper00005971 Gary Interaction: 16 This has already been done by Gary: Feature WBsf718850<br />
WBPaper00024440 Gary Interaction: 1<br />
Time: 3 hours<br />
New objects: Made WBsf977532, WBsf977533 M-2 motif/daf-12 binding sites for ceh-22<br />
New objects: Made WBsf977534, WBsf977535 M-2 motif/daf-12 binding sites for myo-2<br />
Location of information: body of this paper and Figure Supplemental 3<br />
Other curator data: "We conclude that regulation of myo-2 and ceh-22 during dauer development depends critically on the M-2 motif."<br />
Comments: lots of hypothetical binding sites for daf-12 in 90-odd genes, but these have not been experimentally confirmed.<br />
WBPaper00028816 Gary<br />
Time: 30 mins<br />
New objects: none<br />
Location of information: body of this paper<br />
Other curator data: "recruitment sites are widely distributed along X to bind the DCC and nucleate DCC spreading to X regions lacking recruitment sites"<br />
Comments: This paper determines the A and B motifs of the Dosage Compensation Complex (DCC). There are hundreds of thousands of sites of varying strength.<br />
WBPaper00028986 Gary Interaction: 16<br />
WBPaper00029181 Gary<br />
WBPaper00029327 Gary Interaction: 33<br />
WBPaper00030849 Gary<br />
WBPaper00031355 Gary Interaction: 5<br />
WBPaper00004181 Mary Ann Interaction: 8 <br />
No Features. Took 5 mins to read. <br />
WBPaper00005056 Mary Ann <br />
Time: 35 mins<br />
New objects: 37 - not yet curated.<br />
Location of information: Mainly in fig. 1, but also in the body text.<br />
Comments: TRTTKRY element in the promoter regions of T05E11.3, D2096.6, C44H4.1, ZK816.4, ceh-22, tph-1, M05B5.2 and myo-2. Bound by PHA-4.<br />
WBPaper00006429 Mary Ann Interaction: 3<br />
Time: 1/2hr<br />
New objects: 1<br />
Location of information: In body of text and fig. 3B<br />
Comments: Alludes to GATA binding sites, but no experimental data. <br />
WBPaper00024505 Mary Ann<br />
Time: <br />
New objects: <br />
Location of information: <br />
Comments: <br />
WBPaper00028849 Mary Ann<br />
Time: 45 mins<br />
New objects: 1<br />
Location of information: In body of text and fig. 3B<br />
Comments: <br />
WBPaper00029058 Mary Ann Interaction: 50<br />
WBPaper00029190 Mary Ann<br />
WBPaper00029406 Mary Ann Interaction: 145<br />
WBPaper00030877 Mary Ann Interaction: 8<br />
WBPaper00031471 Mary Ann<br />
Time: 1hr<br />
New objects: 1<br />
Location of information: In body of text and fig. 5f<br />
Comments: Check that alleles of asd-2 in fig 3 are curated. yb1422, yb1415, yb1470, yb1419, yb1423<br />
WBPaper00004482 Xiaodong Interaction: 2<br />
Time: two hours<br />
New objects: request new Features for additional Interaction objects<br />
Location of information: figure 5A<br />
Comments: HOX/CEH-20 binding sites in the hlh-8 promoter. The Features will be used in cis-regulation and physical interaction objects.<br />
Duplication: yes. Only one Interaction object (WBInteraction000001291) is currently associated with the paper.<br />
WBPaper00005609 Xiaodong Interaction: 20 This has already been done by Margaret: Feature WBsf019097, WBsf038788, WBsf038789, WBsf038790, WBsf038791, WBsf038793, WBsf038794, WBsf019098, WBsf038792<br />
Time: 50 mins<br />
New objects: three interaction objects<br />
Location of information: In body of text and fig. 5<br />
Comments: Only WBsf038790 and WBsf038791 are useful. Not sure why the other Features exist; they seem to be redundant.<br />
WBPaper00024189 Xiaodong<br />
Time: 10 mins<br />
Comments: no features in this paper<br />
WBPaper00024981 Xiaodong<br />
Time: 40 mins<br />
New objects: request through RT<br />
Location of information: figure 2<br />
Comments: MED-1 binding sites in the end-1 and end-3 promoters<br />
Comments (dr): Expr11831 generated for the end-1 and end-3 promoter regions. Add the WBsf ID once available, and the related construct. Paper checked as it came through RT.<br />
WBPaper00028910 Xiaodong Interaction: 7<br />
Time: 30 mins<br />
New objects: request through RT<br />
Location of information: in body text and figure 4<br />
Comments: MEF-2 direct binding site in the str-1 promoter, and a few minimal regulatory regions mentioned in the body text<br />
<br />
WBPaper00029109 Xiaodong<br />
<br />
WBPaper00029229 Xiaodong Interaction: 3<br />
WBPaper00030809 Xiaodong<br />
WBPaper00030931 Xiaodong Interaction: 18<br />
WBPaper00031565 Xiaodong Interaction: 1<br />
Time: 30 mins<br />
New objects: request through RT<br />
Location of information: in body text and figure 2 and figure 3<br />
Comments: PBC-HOX binding sites S1 and S2; cis-regulatory elements E1 and E2<br />
<br />
</pre><br />
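The 'Time:' entries in the notes above are free text ("10 mins", "1/2hr", "1hr", "two hours"), so collating them by hand for the jamboree is error-prone. This is a minimal sketch that totals them, assuming the notes are pasted in as plain text with one entry per line. The function names and the small word-to-number table are hypothetical. Entries the code cannot parse are returned rather than guessed.<br />

```python
import re
from fractions import Fraction

# Small, hypothetical table for durations written out in words.
WORD_NUMBERS = {"half": Fraction(1, 2), "one": 1, "two": 2, "three": 3, "four": 4}

# A "Time:" line, optionally ending in the wiki's <br /> marker.
TIME_LINE = re.compile(r"^\s*Time:\s*(.*?)\s*(?:<br\s*/>)?\s*$")

# A quantity ("10", "1/2", "1.5", "two") followed by a minute or hour unit.
DURATION = re.compile(r"^([\w/.]+)\s*(mins?|minutes?|hrs?|hours?)$", re.IGNORECASE)


def parse_minutes(value):
    """Convert '10 mins', '1/2hr', 'two hours' etc. to minutes; None if unparseable."""
    m = DURATION.match(value.strip())
    if not m:
        return None
    qty, unit = m.group(1).lower(), m.group(2).lower()
    if qty in WORD_NUMBERS:
        amount = Fraction(WORD_NUMBERS[qty])
    else:
        try:
            amount = Fraction(qty)  # handles "10", "1/2" and "1.5"
        except (ValueError, ZeroDivisionError):
            return None
    return float(amount * (60 if unit.startswith("h") else 1))


def total_minutes(notes):
    """Sum every parseable 'Time:' line; also return non-empty values that failed."""
    total, unparsed = 0.0, []
    for line in notes.splitlines():
        m = TIME_LINE.match(line)
        if not m or not m.group(1):
            continue  # not a Time line, or left blank
        minutes = parse_minutes(m.group(1))
        if minutes is None:
            unparsed.append(m.group(1))
        else:
            total += minutes
    return total, unparsed


if __name__ == "__main__":
    notes = "Time: 10 mins<br />\nTime: 1/2hr<br />\nTime: two hours<br />\nTime: <br />"
    print(total_minutes(notes))
```

Blank 'Time:' entries, such as the one still open for WBPaper00024505, are skipped rather than counted as zero.<br />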
* I suggest we curate 5 of our papers before the jamboree, taking notes as described above and any other things that arise during your curation.</div>Xdwanghttps://wiki.wormbase.org/index.php?title=Working_Group:Sequence_Features&diff=24686Working Group:Sequence Features2014-09-09T20:18:38Z<p>Xdwang: </p>
<hr />
<div>'''Topics'''<br />
<br />
* Display of Sequence Features on the website<br />
** And Transcription factors and Gene_product binds.<br />
* Streamline the workflow<br />
** How can we automatically identify Sequence Feature papers?<br />
** How do papers come in now? SVM? Textpresso string matches?<br />
* Improving data flow<br />
** How can we make the data immediately available to all curators?<br />
*** Add Paper and Public_name fields to the Features in the Nameserver?<br />
*** geneace is available for download and is updated every day; Caltech takes a copy.<br />
* Sample paper curation<br />
** Prepare a paper list from each person<br />
* Assign meaningful names in the Public_name field, e.g. 'distal enhancer 1' instead of 'DE1'<br />
** gw3 - I disagree with this. We should use the name the paper uses to describe the region, rather than making up our own names. Several months after I had originally annotated some regions, Xiaodong found problems with them and I had to go back and re-annotate them. To do that, I needed to identify unambiguously which region the paper was talking about by matching the Public_name field to the name of the region in the paper. Users reading the paper alongside the Features that mark up the regions may well need to do the same.<br />
** dr - ok sounds good to me Gary<br />
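To illustrate the 'Textpresso string matches?' option raised above, a plain keyword filter could pre-sort incoming papers before a curator looks at them. This is a hypothetical sketch, not Textpresso's query interface or the existing SVM. The keyword list is an assumption drawn from the Feature Methods discussed on this page, and a paper is flagged when its text mentions at least two different keywords.<br />

```python
import re

# Hypothetical keyword list, drawn from the Feature Methods discussed on this page.
# This is a plain string-match sketch, not Textpresso or the SVM pipeline.
KEYWORDS = [
    "binding site",
    "enhancer",
    "promoter",
    "regulatory region",
    "recognition element",
    "cis-regulatory",
]

# Match each keyword as a whole word, optionally followed by a plural "s".
PATTERNS = {k: re.compile(r"\b" + re.escape(k) + r"s?\b", re.IGNORECASE) for k in KEYWORDS}


def keyword_hits(text):
    """Return {keyword: count} for every keyword found in text."""
    hits = {}
    for keyword, pattern in PATTERNS.items():
        n = len(pattern.findall(text))
        if n:
            hits[keyword] = n
    return hits


def flag_candidates(papers, min_distinct=2):
    """Return IDs of papers whose text mentions at least min_distinct keywords."""
    return sorted(
        pid for pid, text in papers.items() if len(keyword_hits(text)) >= min_distinct
    )


if __name__ == "__main__":
    # Made-up sample text, for illustration only.
    papers = {
        "paper_A": "MED-1 binding sites in the end-1 and end-3 promoters",
        "paper_B": "A screen for dauer formation mutants",
    }
    print(flag_candidates(papers))
```

A threshold of two distinct keywords is an arbitrary starting point; it would need tuning against papers already known to contain Sequence Features.<br />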
<br />
'''Practical Issues'''<br />
<br />
* Gary and I are staying at the Ramada - 800 Morrissey Boulevard, Freeport St, Boston, MA 02122 US<br />
* We arrive in Boston on Saturday evening and fly home on Wednesday night. <br />
* What time shall we meet on Monday morning and where do we go? <br />
**I (Xiaodong) can pick you up at the Harvard Square T station (Red Line, info center on the square) at, say, 9:30 am. We can walk to my office from there.<br />
* Can you confirm that we have office space with internet access? I will bring my Macbook. Hopefully I can tunnel into my Sanger working space.<br />
<br />
'''Pre-Jamboree prep'''<br />
<br />
* Suggestions for work<br />
** Curate some (5? 10?) papers from our lists and collate data as follows<br />
*** How long did each paper take to curate?<br />
*** How many new objects did you add to WormBase?<br />
*** Where in the paper was the information (e.g. supplementary data, figure legends, within text)?<br />
*** Did you need to contact the author for more information?<br />
*** Did you come across data for other curators?<br />
**** gw3 - I don't know in detail what data other curators require. It would be useful to know this so I can summarise data for others.<br />
<br />
'''The Jamboree 15-17th Sept 2014'''<br />
<br />
* Suggestions for work<br />
** Discuss results from above<br />
** Work through some papers which have already been curated and have different data types (e.g. promoter and gene regulation) to identify best curation practices.<br />
** Browsing capabilities through WormMine?<br />
** Should we put the list of papers in the Caltech curation status form?<br />
<br />
<br />
'''Matters Arising'''<br />
<br />
* things noted - not necessarily to do with the topic in hand<br />
** There are many duplicates of Interaction objects in geneace, with or without two leading digits. Sometimes there is only the shorter form.<br />
** The Features WBsf919641 and WBsf919607 are at the same position. Both are DAF-3 binding sites. WBsf919641 has Method "binding_site" and WBsf919607 has Method "TF_binding_site", which I think is more correct as DAF-3 is a TF. WBsf919641 cites the paper WBPaper00004526 and WBsf919607 cites WBPaper00003384. WBsf919641 is associated with WBGene00202278 and WBsf919607 with WBGene00003514 (myo-2). The paper WBPaper00004526 appears to be about a PEB-1 binding site, not a DAF-3 site. I made the Feature WBsf919609 for the PEB-1 site based on WBPaper00004526 in the last round of curation. Is something not right here?<br />
** The Features WBsf019227 and WBsf038813 are in the same position, but on opposite strands. Both are for a LAG-1 binding site associated with the gene lin-11. One is from paper WBPaper00005357, the other from WBPaper00032298. WBsf019227 needs to be retired and merged into WBsf038813.<br />
** WBsf038819 has a Method of 'regulatory_region', but the Public_name and Description describe it as an Enhancer. Which is it?<br />
** WBsf919669 is an Enhancer, but it has a tag for a Transcription_factor WBTranscriptionFactor000101. It is far too large to be a binding site for a transcription factor. I think this tag should be removed.<br />
** 393 out of the 500 TF_binding_site Features have an incorrect SO_term tag. It should be set to: SO:0000235<br />
** 10 Features with Method = "TF_binding_site" are lacking a Transcription_factor tag.<br />
* how many 'Method's for feature? enhancer, regulatory_region, TF_biding_site, promoter…?<br />
**how to define these methods?<br />
** when display on JBrowse/GBrowse, do we want to display WBsfxxxxx or methods?<br />
* Gary does a quick literature search to see what other work has been done in the sites described.<br />
**along this line, do we wanna add WBPaper00002925 to WBsf from WBPaper00002011? mean, associate features with two paper, both mentioned these features?<br />
<br />
<br />
'''Duplicated Features'''<br />
The following are a set of duplicated regulatory Features. I (Gary) am probably the main culprit in not checking to see if there is a pre-existing Feature.<br />
How can we avoid this in future?<br />
*These two need to be merged - I suggest WBsf019227 be retired as it is in the opposite sense to its gene.<br />
**CHROMOSOME_I TF_binding_site TF_binding_site 10245805 10245811 . - . Feature "WBsf019227" ; TF_ID "WBTranscriptionFactor000101" ; TF_name "LAG-1"<br />
**CHROMOSOME_I TF_binding_site TF_binding_site 10245805 10245811 . + . Feature "WBsf038813" ; TF_ID "WBTranscriptionFactor000101" ; TF_name "LAG-1"<br />
*These two need to be merged - I suggest WBsf019221 be retired as it doesn't have the Interaction objects.<br />
**CHROMOSOME_I TF_binding_site TF_binding_site 10245835 10245841 . + . Feature "WBsf019221" ; TF_ID "WBTranscriptionFactor000101" ; TF_name "LAG-1"<br />
**CHROMOSOME_I TF_binding_site TF_binding_site 10245835 10245841 . + . Feature "WBsf038814" ; TF_ID "WBTranscriptionFactor000101" ; TF_name "LAG-1"<br />
*These two need to be merged - no difference between them. The SO_term is wrong - needs to be SO:0000235<br />
**CHROMOSOME_II TF_binding_site TF_binding_site 10950131 10950138 . + . Feature "WBsf019124" ; TF_ID "WBTranscriptionFactor000014" ; TF_name "SKN-1"<br />
**CHROMOSOME_II TF_binding_site TF_binding_site 10950131 10950138 . + . Feature "WBsf019126" ; TF_ID "WBTranscriptionFactor000014" ; TF_name "SKN-1"<br />
*Two papers - one says this is a MEX-1 recognition elemant and the other says this is a MEX-3 recognition element - are they both right?<br />
**CHROMOSOME_III binding_site binding_site 4804863 4804877 . - . Feature "WBsf899528"<br />
**CHROMOSOME_III binding_site binding_site 4804863 4804877 . - . Feature "WBsf899543"<br />
*Two papers - one says this is a MEX-1 recognition elemant and the other says this is a MEX-3 recognition element - are they both right?<br />
**CHROMOSOME_III binding_site binding_site 4804911 4804928 . - . Feature "WBsf899526"<br />
**CHROMOSOME_III binding_site binding_site 4804911 4804928 . - . Feature "WBsf899542"<br />
*These are from the same paper - one says it is "PUF-8 recognition element (PRE-1)" and the other says it is "PUF-8 recognition element (PRE-2)". Are they both right?<br />
**CHROMOSOME_III binding_site binding_site 4805138 4805145 . - . Feature "WBsf899537"<br />
**CHROMOSOME_III binding_site binding_site 4805138 4805145 . - . Feature "WBsf899538"<br />
*These are from the same paper - one says it is "TF LIN-1 binding site S11 in pJW5" and the other is "TF LIN-1 binding site S20 in pJW5" - looks like an error to me - Gary to redo this.<br />
**CHROMOSOME_III TF_binding_site TF_binding_site 7540351 7540361 . - . Feature "WBsf919592" ; TF_ID "WBTranscriptionFactor000135" ; TF_name "LIN-1"<br />
**CHROMOSOME_III TF_binding_site TF_binding_site 7540351 7540361 . - . Feature "WBsf919594" ; TF_ID "WBTranscriptionFactor000135" ; TF_name "LIN-1"<br />
*These two need to be merged - I suggest WBsf047654 be retired as it contains less information.<br />
**CHROMOSOME_III regulatory_region misc_feature 7540972 7540992 . - . Feature "WBsf047654"<br />
**CHROMOSOME_III regulatory_region misc_feature 7540972 7540992 . - . Feature "WBsf919589"<br />
*These two need to be merged - they are from different papers. I suggest WBsf047505 be retired as it has an incorrect SO_term - needs to be SO:0000235<br />
**CHROMOSOME_IV TF_binding_site TF_binding_site 2306593 2306606 . - . Feature "WBsf047478" ; TF_ID "WBTranscriptionFactor000052" ; TF_name "DAF-19"<br />
**CHROMOSOME_IV TF_binding_site TF_binding_site 2306593 2306606 . - . Feature "WBsf047505" ; TF_ID "WBTranscriptionFactor000052" ; TF_name "DAF-19"<br />
*These two need to be merged - they are from different papers. I suggest WBsf019088 be retired as it has an incorrect SO_term - needs to be SO:0000235<br />
**CHROMOSOME_V TF_binding_site TF_binding_site 10671938 10671944 . + . Feature "WBsf019088" ; TF_ID "WBTranscriptionFactor000061" ; TF_name "CEH-22"<br />
**CHROMOSOME_V TF_binding_site TF_binding_site 10671938 10671944 . + . Feature "WBsf919536" ; TF_ID "WBTranscriptionFactor000061" ; TF_name "CEH-22"<br />
*These three need to be merged. They are identical. The SO_term is wrong - needs to be SO:0000235<br />
**CHROMOSOME_V TF_binding_site TF_binding_site 11882353 11882360 . + . Feature "WBsf216760" ; TF_ID "WBTranscriptionFactor000126" ; TF_name "CEH-6"<br />
**CHROMOSOME_V TF_binding_site TF_binding_site 11882353 11882360 . + . Feature "WBsf216762" ; TF_ID "WBTranscriptionFactor000126" ; TF_name "CEH-6"<br />
**CHROMOSOME_V TF_binding_site TF_binding_site 11882353 11882360 . + . Feature "WBsf216764" ; TF_ID "WBTranscriptionFactor000126" ; TF_name "CEH-6"<br />
*These three need to be merged. They are identical. The SO_term is wrong - needs to be SO:0000235<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 4100019 4100026 . - . Feature "WBsf216754" ; TF_ID "WBTranscriptionFactor000126" ; TF_name "CEH-6"<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 4100019 4100026 . - . Feature "WBsf216755" ; TF_ID "WBTranscriptionFactor000126" ; TF_name "CEH-6"<br />
*These two need to be merged. WBsf899549 should be retired as it is incorrect - this is not a TF binding site, it is a regulatory region.<br />
**CHROMOSOME_X regulatory_region misc_feature 5100882 5100888 . + . Feature "WBsf919622"<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 5100882 5100888 . + . Feature "WBsf899549"<br />
*These two need to be merged. WBsf899545 should be retired as it is incorrect - this is not a TF binding site, it is a regulatory region.<br />
**CHROMOSOME_X regulatory_region misc_feature 5100919 5100925 . + . Feature "WBsf919623"<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 5100919 5100925 . + . Feature "WBsf899545"<br />
*These two need to be merged. WBsf899547 should be retired as it is incorrect - this is not a TF binding site, it is a regulatory region.<br />
**CHROMOSOME_X regulatory_region misc_feature 5100970 5100976 . + . Feature "WBsf919625"<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 5100970 5100976 . + . Feature "WBsf899547"<br />
*These two need to be merged. WBsf899548 should be retired as it is incorrect - this is not a TF binding site, it is a regulatory region.<br />
**CHROMOSOME_X regulatory_region misc_feature 5100985 5100991 . + . Feature "WBsf919626"<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 5100985 5100991 . + . Feature "WBsf899548"<br />
*WBsf047482 cites paper WBPaper00025203 - this has now been merged with paper WBPaper00026601 and should really be updated to this in all 11 Features which use it. <br />
*WBsf042312 cites paper WBPaper00025051<br />
*These two Features are otherwise nearly identical and should be merged.<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 8940207 8940220 . - . Feature "WBsf042312" ; TF_ID "WBTranscriptionFactor000052" ; TF_name "DAF-19"<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 8940207 8940220 . - . Feature "WBsf047482" ; TF_ID "WBTranscriptionFactor000052" ; TF_name "DAF-19"<br />
*These two need to be merged. WBsf919641 should be retired as it is incorrect - this is a TF binding site, it is not just a binding_site.<br />
**CHROMOSOME_X binding_site binding_site 12467733 12467737 . + . Feature "WBsf919641"<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 12467733 12467737 . + . Feature "WBsf919607" ; TF_ID "WBTranscriptionFactor000472" ; TF_name "DAF-3"<br />
<br />
<br />
<br />
'''Papers to curate'''<br />
* I have taken the Sequence Feature Papers from RT and assigned the four of us 10 papers each.<br />
* Just in case there is any paper here that has already been curated for Interaction, I've added the number of Interaction objects linked to by the paper. If this is not helpful, feel free to remove it.<br />
<br />
<pre><br />
WBPaper00002925 Daniela Interaction: 8<br />
Time: 10 mins<br />
New objects: none. the paper was referring to a couple of sub-element enhancers already described in Okkema and Fire 1994- Development - B and C sub-elements -WBPaper00002011<br />
Location of information: - <br />
Comments: <br />
WBPaper00004568 Daniela Interaction: 2<br />
WBPaper00005842 Daniela<br />
WBPaper00024328 Daniela Interaction: 4<br />
WBPaper00028802 Daniela<br />
WBPaper00028915 Daniela<br />
WBPaper00029140 Daniela<br />
WBPaper00029255 Daniela Interaction: 55<br />
WBPaper00030829 Daniela Interaction: 5<br />
WBPaper00030933 Daniela Interaction: 1<br />
WBPaper00003929 Gary Interaction: 25<br />
Time: 4 hours<br />
New objects: none - 6 existing Features were corrected and updated. (WBsf019182, WBsf019181, WBsf019179, WBsf019177, WBsf019178, WBsf019180)<br />
Location of information: body text and in a figure in the supplemental table of the paper WBPaper00006376<br />
Other curator data: "lin-41 and lin-42 are negatively regulated by let-7"<br />
Comments: This is Gary Ruvkun's (Nature 2000) let-7 paper that started the miRNA field!<br />
Comments: This is being marked as a pair of 'binding_site' Features because this is miRNA not TF binding.<br />
See also: let-7, lin-41 binding WBPaper00006376 PubMed: 14729570 (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC324419/) - This is the paper that defined the binding sites.<br />
Comments (XW): curated one new cis-regulation object with Gary's WBsfs<br />
<br />
WBPaper00005044 Gary Interaction: 20<br />
Time: 3 hours<br />
New objects: made ire-1 binding site Feature (WBsf977530)<br />
Location of information: body of this paper and WBPaper00005036<br />
Other curator data: IRE-1 is a stress-activated endonuclease resident in the ER that is conserved in all known eukaryotes. IRE-1 mediated unconventional splicing of an intron from xbp-1 mRNA controls expression of the encoded transcription factor and is required for upregulation of most UPR target genes.<br />
See also: WBPaper00005036<br />
WBPaper00005971 Gary Interaction: 16 This has already been done by Gary: Feature WBsf718850<br />
WBPaper00024440 Gary Interaction: 1<br />
Time: 3 hours<br />
New objects: Made WBsf977532, WBsf977533 M-2 motif/daf-12 binding sites for ceh-22<br />
New objects: Made WBsf977534, WBsf977535 M-2 motif/daf-12 binding sites for myo-2<br />
Location of information: body of this paper and Figure Supplemental 3<br />
Other curator data: "We conclude that regulation of myo-2 and ceh-22 during dauer development depends critically on the M-2 motif."<br />
Comments: lots of hypothetical binding sites for daf-12 in 90-odd genes, but these have not been experimentally confirmed.<br />
WBPaper00028816 Gary<br />
Time: 30 mins<br />
New objects: none<br />
Location of information: body of this paper<br />
Other curator data: "recruitment sites are widely distributed along X to bind the DCC and nucleate DCC spreading to X regions lacking recruitment sites"<br />
Commments: This paper determines the A and B motifs of the Dosage Compensation Complex (DCC). There are hundreds of thousands of sites of varying strength.<br />
WBPaper00028986 Gary Interaction: 16<br />
WBPaper00029181 Gary<br />
WBPaper00029327 GaryInteraction: 33<br />
WBPaper00030849 Gary<br />
WBPaper00031355 Gary Interaction: 5<br />
WBPaper00004181 Mary Ann Interaction: 8 <br />
No Features. Took 5 mins to read. <br />
WBPaper00005056 Mary Ann <br />
Time: 35 mins<br />
New objects: 37 - not yet curated.<br />
Location of information: Mainly in fig. 1, but in body text. <br />
Comments: TRTTKRY element in promoter region of T05E11.3, D2096.6, C44H4.1, ZK816.4, ceh-22, <br />
tph-1, M05B5.2, myo-2. Bound by pha-4 <br />
WBPaper00006429 Mary Ann Interaction: 3<br />
Time: 1/2hr<br />
New objects: 1<br />
Location of information: In body of text and fig. 3B<br />
Comments: Alludes to GATA binding sites, but no experimental data. <br />
WBPaper00024505 Mary Ann<br />
Time: <br />
New objects: <br />
Location of information: <br />
Comments: <br />
WBPaper00028849 Mary Ann<br />
Time: 45 mins<br />
New objects: 1<br />
Location of information: In body of text and fig. 3B<br />
Comments: <br />
WBPaper00029058 Mary Ann Interaction: 50<br />
WBPaper00029190 Mary Ann<br />
WBPaper00029406 Mary Ann Interaction: 145<br />
WBPaper00030877 Mary Ann Interaction: 8<br />
WBPaper00031471 Mary Ann<br />
Time: 1hr<br />
New objects: 1<br />
Location of information: In body of text and fig. 5f<br />
Comments: Check that alleles of asd-2 in fig 3 are curated. yb1422, yb1415, yb1470, yb1419, yb1423<br />
WBPaper00004482 Xiaodong Interaction: 2<br />
Time: two hours<br />
New objects: request new features for more interaction objects<br />
Location of information: figure 5A, <br />
Comments: HOX/CEH-20 binding sites in hlh-8 promoter. feature will be used in cis-regulation and physical interaction objects<br />
Duplication: yes. only one interaction objects WBInteraction000001291 associated with the paper currently<br />
WBPaper00005609 Xiaodong Interaction: 20 This has already been done by Margaret: Feature WBsf019097, WBsf038788, WBsf038789, WBsf038790, WBsf038791, WBsf038793, WBsf038794, WBsf019098, WBsf038792<br />
Time: 50 mins<br />
New objects: three interaction objects<br />
Location of information: In body of text and fig. 5<br />
Comments: Only WBsf038790 and WBsf038791 are useful. Not sure why the other Features exist; they seem to be redundant.<br />
WBPaper00024189 Xiaodong<br />
Time: 10 mins<br />
Comments: No Features in this paper.<br />
WBPaper00024981 Xiaodong<br />
Time: 40 mins<br />
New objects: request through RT<br />
Location of information: figure 2<br />
Comments: MED-1 binding sites in the end-1 and end-3 promoters<br />
Comments (dr): Expr11831 generated for the end-1 and end-3 promoter regions. Add the WBsf IDs and the related construct once available. Paper checked as it came through RT.<br />
WBPaper00028910 Xiaodong Interaction: 7<br />
Time: 30 mins<br />
New objects: request through RT<br />
Location of information: in body text and figure 4<br />
Comments: MEF-2 direct binding site in the str-1 promoter, and a few minimal regulatory regions mentioned in the body text<br />
<br />
WBPaper00029109 Xiaodong<br />
<br />
WBPaper00029229 Xiaodong Interaction: 3<br />
WBPaper00030809 Xiaodong<br />
WBPaper00030931 Xiaodong Interaction: 18<br />
WBPaper00031565 Xiaodong Interaction: 1<br />
Time: 30 mins<br />
New objects: request through RT<br />
Location of information: in body text and figure 2 and figure 3<br />
Comments: PBC-HOX binding sites S1 and S2; cis-regulatory elements E1 and E2<br />
<br />
</pre><br />
* I suggest we curate 5 of our papers before the jamboree, taking notes as described above and any other things that arise during your curation.</div>Xdwanghttps://wiki.wormbase.org/index.php?title=Working_Group:Sequence_Features&diff=24685Working Group:Sequence Features2014-09-09T20:14:19Z<p>Xdwang: </p>
<hr />
<div>'''Topics'''<br />
<br />
* Display of Sequence Features on the website<br />
** Including Transcription factors and Gene_product binds.<br />
* Streamline the workflow<br />
** How can we automatically identify Sequence Feature papers?<br />
** How do papers come in now? SVM? Textpresso string matches?<br />
* Improving data flow<br />
** How can we make the data immediately available to all curators?<br />
*** Add Paper and Public_name fields to the Features in the Nameserver?<br />
*** geneace is available for download and is updated every day; a copy is taken by Caltech.<br />
* Sample paper curation<br />
** Prepare a paper list for each person<br />
* Assign meaningful names in the Public_name field, e.g. 'distal enhancer 1' instead of 'DE1'<br />
** gw3 - I disagree with this. We should use the name the paper uses for the region rather than making up our own. When Xiaodong found problems with my annotations several months after I had made them, I had to go back and re-annotate the regions, and I needed to identify unambiguously which region the paper was describing by matching the Public_name field to the region's name in the paper. Users may well need to do this too when reading the paper alongside the Features that mark up the regions.<br />
** dr - OK, sounds good to me, Gary.<br />
<br />
'''Practical Issues'''<br />
<br />
* Gary and I are staying at the Ramada - 800 Morrissey Boulevard, Freeport St, Boston, MA 02122 US<br />
* We arrive in Boston on Saturday evening and fly home on Wednesday night. <br />
* What time shall we meet on Monday morning and where do we go? <br />
* Can you confirm that we have office space with internet access? I will bring my Macbook. Hopefully I can tunnel into my Sanger working space.<br />
<br />
'''Pre-Jamboree prep'''<br />
<br />
* Suggestions for work<br />
** Curate some (5? 10?) papers from our lists and collate the following data:<br />
*** How long did each paper take to curate?<br />
*** How many new objects did you add to WormBase?<br />
*** Where in the paper was the information (e.g. supplementary data, figure legends, within text)?<br />
*** Did you need to contact the author for more information?<br />
*** Did you come across data for other curators?<br />
**** gw3 - I don't know in detail what data other curators require. It would be useful to know this so I can summarise data for others.<br />
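The "How long did each paper take to curate?" answers in the Papers to curate notes are free text ('30 mins', '1hr', '1/2hr', 'two hours'), so they need normalising before they can be totalled. A minimal Python sketch, assuming those are the only formats in use; `to_minutes` is a hypothetical helper, not part of any WormBase tool:

```python
import re
from fractions import Fraction
from typing import Optional

WORD_NUMBERS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}
# quantity (integer, fraction such as 1/2, or a spelled-out number) followed by a unit
TIME = re.compile(r"^(\d+/\d+|\d+|one|two|three|four|five)\s*(mins?|minutes?|hrs?|hours?)$")


def to_minutes(text: str) -> Optional[int]:
    """Convert a free-text 'Time:' entry to whole minutes; None if blank or unrecognised."""
    t = text.strip().lower()
    if t.startswith("time:"):
        t = t[len("time:"):].strip()
    m = TIME.match(t)
    if m is None:
        return None
    qty, unit = m.groups()
    amount = Fraction(WORD_NUMBERS[qty]) if qty in WORD_NUMBERS else Fraction(qty)
    if unit.startswith("h"):  # hr, hrs, hour, hours
        amount *= 60
    return int(amount)
```

Blank entries (such as an empty "Time:" field) come back as None, so they can be skipped rather than counted as zero when summing per curator.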
<br />
'''The Jamboree 15-17th Sept 2014'''<br />
<br />
* Suggestions for work<br />
** Discuss the results from above<br />
** Work through some papers which have already been curated and have different data types (e.g. promoter and gene regulation) to identify best curation practices.<br />
** Browsing capabilities through WormMine?<br />
** Should we put the list of papers in the Caltech curation status form?<br />
<br />
<br />
'''Matters Arising'''<br />
<br />
* Things noted - not necessarily related to the topic in hand<br />
** There are many duplicates of Interaction objects in geneace, with or without two leading digits. Sometimes there is only the shorter form.<br />
** The Features WBsf919641 and WBsf919607 are at the same position. Both are DAF-3 binding sites. WBsf919641 has Method "binding_site", WBsf919607 has Method "TF_binding_site", which I think is more correct as DAF-3 is a TF. WBsf919641 used the paper WBPaper00004526, WBsf919607 used the paper WBPaper00003384. WBsf919641 is associated with WBGene00202278, WBsf919607 is associated with WBGene00003514 (myo-2). The paper WBPaper00004526 appears to be about a PEB-1 binding site, not a DAF-3 site. I made the Feature WBsf919609 for the PEB-1 site based on WBPaper00004526 in the last round of curation. Something is not right here.<br />
** The Features WBsf019227 and WBsf038813 are in the same position, but on opposite strands. Both are for a LAG-1 binding site associated with the gene lin-11. One is from paper WBPaper00005357, the other from WBPaper00032298. WBsf019227 needs to be retired and merged into WBsf038813.<br />
** WBsf038819 has a Method of 'regulatory_region', but the Public_name and Description describe it as an Enhancer. Which is it?<br />
** WBsf919669 is an Enhancer, but it has a tag for a Transcription_factor WBTranscriptionFactor000101. It is far too large to be a binding site for a transcription factor. I think this tag should be removed.<br />
** 393 out of the 500 TF_binding_site Features have an incorrect SO_term tag. It should be set to: SO:0000235<br />
** 10 Features with Method = "TF_binding_site" are lacking a Transcription_factor tag.<br />
* How many Methods do we need for Features? enhancer, regulatory_region, TF_binding_site, promoter…?<br />
** How should these Methods be defined?<br />
** When displaying on JBrowse/GBrowse, do we want to display the WBsfxxxxx IDs or the Methods?<br />
* Gary to do a quick literature search to see what other work has been done on the sites described.<br />
** Along these lines, should we add WBPaper00002925 to the Features from WBPaper00002011, i.e. associate Features with both papers that mention them?<br />
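The two counts in the notes (TF_binding_site Features with an incorrect SO_term, and TF_binding_site Features lacking a Transcription_factor tag) could be rechecked mechanically after each fix. A minimal Python sketch, assuming the Features have already been exported into simple records; the `Feature` class and `check_features` function are hypothetical, and only the rule that TF_binding_site should map to SO:0000235 comes from the notes:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Method -> expected SO_term. Only the TF_binding_site rule comes from the notes;
# other Methods would need their own entries once agreed.
EXPECTED_SO_TERM = {"TF_binding_site": "SO:0000235"}


@dataclass
class Feature:
    """Hypothetical flat record for one exported Feature."""
    wbsf_id: str
    method: str
    so_term: Optional[str] = None
    transcription_factor: Optional[str] = None


def check_features(features: List[Feature]) -> List[Tuple[str, str]]:
    """Return (Feature ID, problem) pairs for the SO_term and Transcription_factor checks."""
    problems = []
    for f in features:
        expected = EXPECTED_SO_TERM.get(f.method)
        if expected is not None and f.so_term != expected:
            problems.append((f.wbsf_id, f"SO_term {f.so_term} should be {expected}"))
        if f.method == "TF_binding_site" and not f.transcription_factor:
            problems.append((f.wbsf_id, "missing Transcription_factor tag"))
    return problems
```

Keeping the Method-to-SO_term rules in one table also gives a natural place to record the answer to "How many Methods do we need for Features?" once it is settled.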
<br />
<br />
'''Duplicated Features'''<br />
The following are a set of duplicated regulatory Features. I (Gary) am probably the main culprit in not checking to see if there is a pre-existing Feature.<br />
How can we avoid this in future?<br />
*These two need to be merged - I suggest WBsf019227 be retired as it is in the opposite sense to its gene.<br />
**CHROMOSOME_I TF_binding_site TF_binding_site 10245805 10245811 . - . Feature "WBsf019227" ; TF_ID "WBTranscriptionFactor000101" ; TF_name "LAG-1"<br />
**CHROMOSOME_I TF_binding_site TF_binding_site 10245805 10245811 . + . Feature "WBsf038813" ; TF_ID "WBTranscriptionFactor000101" ; TF_name "LAG-1"<br />
*These two need to be merged - I suggest WBsf019221 be retired as it doesn't have the Interaction objects.<br />
**CHROMOSOME_I TF_binding_site TF_binding_site 10245835 10245841 . + . Feature "WBsf019221" ; TF_ID "WBTranscriptionFactor000101" ; TF_name "LAG-1"<br />
**CHROMOSOME_I TF_binding_site TF_binding_site 10245835 10245841 . + . Feature "WBsf038814" ; TF_ID "WBTranscriptionFactor000101" ; TF_name "LAG-1"<br />
*These two need to be merged - no difference between them. The SO_term is wrong - needs to be SO:0000235<br />
**CHROMOSOME_II TF_binding_site TF_binding_site 10950131 10950138 . + . Feature "WBsf019124" ; TF_ID "WBTranscriptionFactor000014" ; TF_name "SKN-1"<br />
**CHROMOSOME_II TF_binding_site TF_binding_site 10950131 10950138 . + . Feature "WBsf019126" ; TF_ID "WBTranscriptionFactor000014" ; TF_name "SKN-1"<br />
*Two papers - one says this is a MEX-1 recognition element and the other says this is a MEX-3 recognition element - are they both right?<br />
**CHROMOSOME_III binding_site binding_site 4804863 4804877 . - . Feature "WBsf899528"<br />
**CHROMOSOME_III binding_site binding_site 4804863 4804877 . - . Feature "WBsf899543"<br />
*Two papers - one says this is a MEX-1 recognition element and the other says this is a MEX-3 recognition element - are they both right?<br />
**CHROMOSOME_III binding_site binding_site 4804911 4804928 . - . Feature "WBsf899526"<br />
**CHROMOSOME_III binding_site binding_site 4804911 4804928 . - . Feature "WBsf899542"<br />
*These are from the same paper - one says it is "PUF-8 recognition element (PRE-1)" and the other says it is "PUF-8 recognition element (PRE-2)". Are they both right?<br />
**CHROMOSOME_III binding_site binding_site 4805138 4805145 . - . Feature "WBsf899537"<br />
**CHROMOSOME_III binding_site binding_site 4805138 4805145 . - . Feature "WBsf899538"<br />
*These are from the same paper - one says it is "TF LIN-1 binding site S11 in pJW5" and the other is "TF LIN-1 binding site S20 in pJW5" - looks like an error to me - Gary to redo this.<br />
**CHROMOSOME_III TF_binding_site TF_binding_site 7540351 7540361 . - . Feature "WBsf919592" ; TF_ID "WBTranscriptionFactor000135" ; TF_name "LIN-1"<br />
**CHROMOSOME_III TF_binding_site TF_binding_site 7540351 7540361 . - . Feature "WBsf919594" ; TF_ID "WBTranscriptionFactor000135" ; TF_name "LIN-1"<br />
*These two need to be merged - I suggest WBsf047654 be retired as it contains less information.<br />
**CHROMOSOME_III regulatory_region misc_feature 7540972 7540992 . - . Feature "WBsf047654"<br />
**CHROMOSOME_III regulatory_region misc_feature 7540972 7540992 . - . Feature "WBsf919589"<br />
*These two need to be merged - they are from different papers. I suggest WBsf047505 be retired as it has an incorrect SO_term - needs to be SO:0000235<br />
**CHROMOSOME_IV TF_binding_site TF_binding_site 2306593 2306606 . - . Feature "WBsf047478" ; TF_ID "WBTranscriptionFactor000052" ; TF_name "DAF-19"<br />
**CHROMOSOME_IV TF_binding_site TF_binding_site 2306593 2306606 . - . Feature "WBsf047505" ; TF_ID "WBTranscriptionFactor000052" ; TF_name "DAF-19"<br />
*These two need to be merged - they are from different papers. I suggest WBsf019088 be retired as it has an incorrect SO_term - needs to be SO:0000235<br />
**CHROMOSOME_V TF_binding_site TF_binding_site 10671938 10671944 . + . Feature "WBsf019088" ; TF_ID "WBTranscriptionFactor000061" ; TF_name "CEH-22"<br />
**CHROMOSOME_V TF_binding_site TF_binding_site 10671938 10671944 . + . Feature "WBsf919536" ; TF_ID "WBTranscriptionFactor000061" ; TF_name "CEH-22"<br />
*These three need to be merged. They are identical. The SO_term is wrong - needs to be SO:0000235<br />
**CHROMOSOME_V TF_binding_site TF_binding_site 11882353 11882360 . + . Feature "WBsf216760" ; TF_ID "WBTranscriptionFactor000126" ; TF_name "CEH-6"<br />
**CHROMOSOME_V TF_binding_site TF_binding_site 11882353 11882360 . + . Feature "WBsf216762" ; TF_ID "WBTranscriptionFactor000126" ; TF_name "CEH-6"<br />
**CHROMOSOME_V TF_binding_site TF_binding_site 11882353 11882360 . + . Feature "WBsf216764" ; TF_ID "WBTranscriptionFactor000126" ; TF_name "CEH-6"<br />
*These two need to be merged. They are identical. The SO_term is wrong - needs to be SO:0000235<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 4100019 4100026 . - . Feature "WBsf216754" ; TF_ID "WBTranscriptionFactor000126" ; TF_name "CEH-6"<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 4100019 4100026 . - . Feature "WBsf216755" ; TF_ID "WBTranscriptionFactor000126" ; TF_name "CEH-6"<br />
*These two need to be merged. WBsf899549 should be retired as it is incorrect - this is not a TF binding site, it is a regulatory region.<br />
**CHROMOSOME_X regulatory_region misc_feature 5100882 5100888 . + . Feature "WBsf919622"<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 5100882 5100888 . + . Feature "WBsf899549"<br />
*These two need to be merged. WBsf899545 should be retired as it is incorrect - this is not a TF binding site, it is a regulatory region.<br />
**CHROMOSOME_X regulatory_region misc_feature 5100919 5100925 . + . Feature "WBsf919623"<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 5100919 5100925 . + . Feature "WBsf899545"<br />
*These two need to be merged. WBsf899547 should be retired as it is incorrect - this is not a TF binding site, it is a regulatory region.<br />
**CHROMOSOME_X regulatory_region misc_feature 5100970 5100976 . + . Feature "WBsf919625"<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 5100970 5100976 . + . Feature "WBsf899547"<br />
*These two need to be merged. WBsf899548 should be retired as it is incorrect - this is not a TF binding site, it is a regulatory region.<br />
**CHROMOSOME_X regulatory_region misc_feature 5100985 5100991 . + . Feature "WBsf919626"<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 5100985 5100991 . + . Feature "WBsf899548"<br />
*WBsf047482 cites paper WBPaper00025203 - this has now been merged with paper WBPaper00026601 and should really be updated to this in all 11 Features which use it. <br />
*WBsf042312 cites paper WBPaper00025051<br />
*These two Features are otherwise nearly identical and should be merged.<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 8940207 8940220 . - . Feature "WBsf042312" ; TF_ID "WBTranscriptionFactor000052" ; TF_name "DAF-19"<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 8940207 8940220 . - . Feature "WBsf047482" ; TF_ID "WBTranscriptionFactor000052" ; TF_name "DAF-19"<br />
*These two need to be merged. WBsf919641 should be retired as it is incorrect - this is a TF binding site, not just a binding_site.<br />
**CHROMOSOME_X binding_site binding_site 12467733 12467737 . + . Feature "WBsf919641"<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 12467733 12467737 . + . Feature "WBsf919607" ; TF_ID "WBTranscriptionFactor000472" ; TF_name "DAF-3"<br />
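On "How can we avoid this in future?": one option is to check a GFF dump of the existing Features for anything at the same coordinates before a new Feature is made. A minimal Python sketch, assuming GFF2-style lines like those listed above; `find_duplicates` is a hypothetical helper. It deliberately ignores strand and Method, so opposite-strand pairs such as WBsf019227/WBsf038813 and regulatory_region vs TF_binding_site pairs are both reported:

```python
import re
from collections import defaultdict

FEATURE_ID = re.compile(r'Feature "([^"]+)"')


def find_duplicates(gff_lines):
    """Group Feature IDs by (sequence, start, end); return only positions with >1 Feature."""
    by_position = defaultdict(list)
    for line in gff_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # 8 fixed columns, then the attribute column (which itself contains spaces);
        # split(None, 8) works whether the columns are tab- or space-separated
        cols = line.split(None, 8)
        if len(cols) < 9:
            continue
        seqname, start, end, attrs = cols[0], int(cols[3]), int(cols[4]), cols[8]
        m = FEATURE_ID.search(attrs)
        if m:
            by_position[(seqname, start, end)].append(m.group(1))
    return {pos: ids for pos, ids in by_position.items() if len(ids) > 1}
```

Exact-coordinate matching will not catch Features that overlap without being identical; those would need an interval-overlap check instead.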
<br />
<br />
<br />
'''Papers to curate'''<br />
* I have taken the Sequence Feature Papers from RT and assigned the four of us 10 papers each.<br />
* Just in case there is any paper here that has already been curated for Interaction, I've added the number of Interaction objects linked to by the paper. If this is not helpful, feel free to remove it.<br />
<br />
<pre><br />
WBPaper00002925 Daniela Interaction: 8<br />
Time: 10 mins<br />
New objects: none. The paper refers to a couple of sub-element enhancers (the B and C sub-elements) already described in Okkema and Fire 1994, Development (WBPaper00002011).<br />
Location of information: - <br />
Comments: <br />
WBPaper00004568 Daniela Interaction: 2<br />
WBPaper00005842 Daniela<br />
WBPaper00024328 Daniela Interaction: 4<br />
WBPaper00028802 Daniela<br />
WBPaper00028915 Daniela<br />
WBPaper00029140 Daniela<br />
WBPaper00029255 Daniela Interaction: 55<br />
WBPaper00030829 Daniela Interaction: 5<br />
WBPaper00030933 Daniela Interaction: 1<br />
WBPaper00003929 Gary Interaction: 25<br />
Time: 4 hours<br />
New objects: none - 6 existing Features were corrected and updated. (WBsf019182, WBsf019181, WBsf019179, WBsf019177, WBsf019178, WBsf019180)<br />
Location of information: body text and in a figure in the supplemental table of the paper WBPaper00006376<br />
Other curator data: "lin-41 and lin-42 are negatively regulated by let-7"<br />
Comments: This is Gary Ruvkun's (Nature 2000) let-7 paper that started the miRNA field!<br />
Comments: This is being marked as a pair of 'binding_site' Features because this is miRNA binding, not TF binding.<br />
See also: let-7, lin-41 binding WBPaper00006376 PubMed: 14729570 (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC324419/) - This is the paper that defined the binding sites.<br />
Comments (XW): curated one new cis-regulation object with Gary's WBsfs<br />
<br />
WBPaper00005044 Gary Interaction: 20<br />
Time: 3 hours<br />
New objects: made ire-1 binding site Feature (WBsf977530)<br />
Location of information: body of this paper and WBPaper00005036<br />
Other curator data: IRE-1 is a stress-activated endonuclease resident in the ER that is conserved in all known eukaryotes. IRE-1 mediated unconventional splicing of an intron from xbp-1 mRNA controls expression of the encoded transcription factor and is required for upregulation of most UPR target genes.<br />
See also: WBPaper00005036<br />
WBPaper00005971 Gary Interaction: 16 This has already been done by Gary: Feature WBsf718850<br />
WBPaper00024440 Gary Interaction: 1<br />
Time: 3 hours<br />
New objects: Made WBsf977532, WBsf977533 M-2 motif/daf-12 binding sites for ceh-22<br />
New objects: Made WBsf977534, WBsf977535 M-2 motif/daf-12 binding sites for myo-2<br />
Location of information: body of this paper and Supplemental Figure 3<br />
Other curator data: "We conclude that regulation of myo-2 and ceh-22 during dauer development depends critically on the M-2 motif."<br />
Comments: Lots of hypothetical DAF-12 binding sites in 90-odd genes, but these have not been experimentally confirmed.<br />
WBPaper00028816 Gary<br />
Time: 30 mins<br />
New objects: none<br />
Location of information: body of this paper<br />
Other curator data: "recruitment sites are widely distributed along X to bind the DCC and nucleate DCC spreading to X regions lacking recruitment sites"<br />
Comments: This paper determines the A and B motifs of the Dosage Compensation Complex (DCC). There are hundreds of thousands of sites of varying strength.<br />
WBPaper00028986 Gary Interaction: 16<br />
WBPaper00029181 Gary<br />
WBPaper00029327 Gary Interaction: 33<br />
WBPaper00030849 Gary<br />
WBPaper00031355 Gary Interaction: 5<br />
WBPaper00004181 Mary Ann Interaction: 8 <br />
No Features. Took 5 mins to read. <br />
WBPaper00005056 Mary Ann <br />
Time: 35 mins<br />
New objects: 37 - not yet curated.<br />
Location of information: Mainly in fig. 1, but also in the body text.<br />
Comments: TRTTKRY element in the promoter regions of T05E11.3, D2096.6, C44H4.1, ZK816.4, ceh-22, tph-1, M05B5.2 and myo-2. Bound by PHA-4.<br />
WBPaper00006429 Mary Ann Interaction: 3<br />
Time: 1/2hr<br />
New objects: 1<br />
Location of information: In body of text and fig. 3B<br />
Comments: Alludes to GATA binding sites, but provides no experimental data.<br />
WBPaper00024505 Mary Ann<br />
Time: <br />
New objects: <br />
Location of information: <br />
Comments: <br />
WBPaper00028849 Mary Ann<br />
Time: 45 mins<br />
New objects: 1<br />
Location of information: In body of text and fig. 3B<br />
Comments: <br />
WBPaper00029058 Mary Ann Interaction: 50<br />
WBPaper00029190 Mary Ann<br />
WBPaper00029406 Mary Ann Interaction: 145<br />
WBPaper00030877 Mary Ann Interaction: 8<br />
WBPaper00031471 Mary Ann<br />
Time: 1hr<br />
New objects: 1<br />
Location of information: In body of text and fig. 5f<br />
Comments: Check that alleles of asd-2 in fig 3 are curated. yb1422, yb1415, yb1470, yb1419, yb1423<br />
WBPaper00004482 Xiaodong Interaction: 2<br />
Time: two hours<br />
New objects: requested new Features for more Interaction objects<br />
Location of information: figure 5A<br />
Comments: HOX/CEH-20 binding sites in the hlh-8 promoter. The Features will be used in cis-regulation and physical Interaction objects.<br />
Duplication: yes. Currently only one Interaction object (WBInteraction000001291) is associated with the paper.<br />
WBPaper00005609 Xiaodong Interaction: 20 This has already been done by Margaret: Features WBsf019097, WBsf038788, WBsf038789, WBsf038790, WBsf038791, WBsf038793, WBsf038794, WBsf019098, WBsf038792<br />
Time: 50 mins<br />
New objects: three interaction objects<br />
Location of information: In body of text and fig. 5<br />
Comments: Only WBsf038790 and WBsf038791 are useful. Not sure why the other Features exist; they seem to be redundant.<br />
WBPaper00024189 Xiaodong<br />
Time: 10 mins<br />
Comments: No Features in this paper.<br />
WBPaper00024981 Xiaodong<br />
Time: 40 mins<br />
New objects: request through RT<br />
Location of information: figure 2<br />
Comments: MED-1 binding sites in the end-1 and end-3 promoters<br />
Comments (dr): Expr11831 generated for the end-1 and end-3 promoter regions. Add the WBsf IDs and the related construct once available. Paper checked as it came through RT.<br />
WBPaper00028910 Xiaodong Interaction: 7<br />
Time: 30 mins<br />
New objects: request through RT<br />
Location of information: in body text and figure 4<br />
Comments: MEF-2 direct binding site in the str-1 promoter, and a few minimal regulatory regions mentioned in the body text<br />
<br />
WBPaper00029109 Xiaodong<br />
<br />
WBPaper00029229 Xiaodong Interaction: 3<br />
WBPaper00030809 Xiaodong<br />
WBPaper00030931 Xiaodong Interaction: 18<br />
WBPaper00031565 Xiaodong Interaction: 1<br />
Time: 30 mins<br />
New objects: request through RT<br />
Location of information: in body text and figure 2 and figure 3<br />
Comments: PBC-HOX binding sites S1 and S2; cis-regulatory elements E1 and E2<br />
<br />
</pre><br />
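The paper header lines in the list above follow a loose "paper, curator, optional Interaction count" pattern. A minimal Python sketch for tallying assignments per curator, assuming that pattern holds; `parse_assignments` and `summarise` are hypothetical helpers, and the regular expression tolerates trailing remarks after the count (as in the WBPaper00005609 line) and a missing space before "Interaction:":

```python
import re
from collections import Counter, defaultdict

# paper ID, then curator name (may contain a space, e.g. "Mary Ann"),
# then an optional "Interaction: N", possibly followed by free-text remarks
HEADER = re.compile(r"^(WBPaper\d{8})\s+(.+?)\s*(?:Interaction:\s*(\d+).*)?$")


def parse_assignments(lines):
    """Return (paper, curator, interaction_count) for each paper header line."""
    rows = []
    for line in lines:
        m = HEADER.match(line.strip())
        if m:  # note lines such as "Time: 35 mins" do not start with WBPaper and are skipped
            paper, curator, count = m.groups()
            rows.append((paper, curator, int(count) if count else 0))
    return rows


def summarise(rows):
    """Papers assigned, and Interaction objects already linked, per curator."""
    papers = Counter(curator for _, curator, _ in rows)
    interactions = defaultdict(int)
    for _, curator, count in rows:
        interactions[curator] += count
    return papers, dict(interactions)
```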
* I suggest we curate 5 of our papers before the jamboree, taking notes as described above and any other things that arise during your curation.</div>Xdwanghttps://wiki.wormbase.org/index.php?title=Working_Group:Sequence_Features&diff=24684Working Group:Sequence Features2014-09-09T17:03:42Z<p>Xdwang: </p>
<hr />
<div>'''Topics'''<br />
<br />
* Display of Sequence Features on the website<br />
** Including Transcription factors and Gene_product binds.<br />
* Streamline the workflow<br />
** How can we automatically identify Sequence Feature papers?<br />
** How do papers come in now? SVM? Textpresso string matches?<br />
* Improving data flow<br />
** How can we make the data immediately available to all curators?<br />
*** Add Paper and Public_name fields to the Features in the Nameserver?<br />
*** geneace is available for download and is updated every day; a copy is taken by Caltech.<br />
* Sample paper curation<br />
** Prepare a paper list for each person<br />
* Assign meaningful names in the Public_name field, e.g. 'distal enhancer 1' instead of 'DE1'<br />
** gw3 - I disagree with this. We should use the name the paper uses for the region rather than making up our own. When Xiaodong found problems with my annotations several months after I had made them, I had to go back and re-annotate the regions, and I needed to identify unambiguously which region the paper was describing by matching the Public_name field to the region's name in the paper. Users may well need to do this too when reading the paper alongside the Features that mark up the regions.<br />
** dr - OK, sounds good to me, Gary.<br />
<br />
'''Practical Issues'''<br />
<br />
* Gary and I are staying at the Ramada - 800 Morrissey Boulevard, Freeport St, Boston, MA 02122 US<br />
* We arrive in Boston on Saturday evening and fly home on Wednesday night. <br />
* What time shall we meet on Monday morning and where do we go? <br />
* Can you confirm that we have office space with internet access? I will bring my Macbook. Hopefully I can tunnel into my Sanger working space.<br />
<br />
'''Pre-Jamboree prep'''<br />
<br />
* Suggestions for work<br />
** Curate some (5? 10?) papers from our lists and collate the following data:<br />
*** How long did each paper take to curate?<br />
*** How many new objects did you add to WormBase?<br />
*** Where in the paper was the information (e.g. supplementary data, figure legends, within text)?<br />
*** Did you need to contact the author for more information?<br />
*** Did you come across data for other curators?<br />
**** gw3 - I don't know in detail what data other curators require. It would be useful to know this so I can summarise data for others.<br />
<br />
'''The Jamboree 15-17th Sept 2014'''<br />
<br />
* Suggestions for work<br />
** Discuss the results from above<br />
** Work through some papers which have already been curated and have different data types (e.g. promoter and gene regulation) to identify best curation practices.<br />
** Browsing capabilities through WormMine?<br />
** Should we put the list of papers in the Caltech curation status form?<br />
<br />
<br />
'''Matters Arising'''<br />
<br />
* Things noted - not necessarily related to the topic in hand<br />
** There are many duplicates of Interaction objects in geneace, with or without two leading digits. Sometimes there is only the shorter form.<br />
** The Features WBsf919641 and WBsf919607 are at the same position. Both are DAF-3 binding sites. WBsf919641 has Method "binding_site", WBsf919607 has Method "TF_binding_site", which I think is more correct as DAF-3 is a TF. WBsf919641 used the paper WBPaper00004526, WBsf919607 used the paper WBPaper00003384. WBsf919641 is associated with WBGene00202278, WBsf919607 is associated with WBGene00003514 (myo-2). The paper WBPaper00004526 appears to be about a PEB-1 binding site, not a DAF-3 site. I made the Feature WBsf919609 for the PEB-1 site based on WBPaper00004526 in the last round of curation. Something is not right here.<br />
** The Features WBsf019227 and WBsf038813 are in the same position, but on opposite strands. Both are for a LAG-1 binding site associated with the gene lin-11. One is from paper WBPaper00005357, the other from WBPaper00032298. WBsf019227 needs to be retired and merged into WBsf038813.<br />
** WBsf038819 has a Method of 'regulatory_region', but the Public_name and Description describe it as an Enhancer. Which is it?<br />
** WBsf919669 is an Enhancer, but it has a tag for a Transcription_factor WBTranscriptionFactor000101. It is far too large to be a binding site for a transcription factor. I think this tag should be removed.<br />
** 393 out of the 500 TF_binding_site Features have an incorrect SO_term tag. It should be set to: SO:0000235<br />
** 10 Features with Method = "TF_binding_site" are lacking a Transcription_factor tag.<br />
* How many Methods do we need for Features? enhancer, regulatory_region, TF_binding_site, promoter…?<br />
** When displaying on JBrowse/GBrowse, do we want to display the WBsfxxxxx IDs or the Methods?<br />
* Gary to do a quick literature search to see what other work has been done on the sites described.<br />
** Along these lines, should we add WBPaper00002925 to the Features from WBPaper00002011, i.e. associate Features with both papers that mention them?<br />
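For the duplicated Interaction objects noted above (IDs with or without two leading digits), one approach is to group IDs by their numeric value. A minimal Python sketch, assuming the extra digits are leading zeros and that the 9-digit form seen in WBInteraction000001291 is the canonical width; both are assumptions, and the helpers are hypothetical:

```python
import re
from collections import defaultdict

# Matches IDs like WBInteraction000001291; the digit count is left open on purpose.
INTERACTION_ID = re.compile(r"^WBInteraction(\d+)$")


def group_interaction_ids(ids):
    """Return {numeric value: [IDs]} for values that appear under more than one spelling."""
    groups = defaultdict(set)
    for obj_id in ids:
        m = INTERACTION_ID.match(obj_id)
        if m:
            # int() drops leading zeros, so both spellings land in the same group
            groups[int(m.group(1))].add(obj_id)
    return {n: sorted(v) for n, v in groups.items() if len(v) > 1}


def canonical(n, width=9):
    """Assumed canonical spelling: zero-padded to `width` digits (hypothetical rule)."""
    return f"WBInteraction{n:0{width}d}"
```

IDs that exist only in the shorter form will not show up as duplicates here; mapping every ID through `canonical` would flag those as well.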
<br />
<br />
'''Duplicated Features'''<br />
The following are a set of duplicated regulatory Features. I (Gary) am probably the main culprit in not checking to see if there is a pre-existing Feature.<br />
How can we avoid this in future?<br />
*These two need to be merged - I suggest WBsf019227 be retired as it is in the opposite sense to its gene.<br />
**CHROMOSOME_I TF_binding_site TF_binding_site 10245805 10245811 . - . Feature "WBsf019227" ; TF_ID "WBTranscriptionFactor000101" ; TF_name "LAG-1"<br />
**CHROMOSOME_I TF_binding_site TF_binding_site 10245805 10245811 . + . Feature "WBsf038813" ; TF_ID "WBTranscriptionFactor000101" ; TF_name "LAG-1"<br />
*These two need to be merged - I suggest WBsf019221 be retired as it doesn't have the Interaction objects.<br />
**CHROMOSOME_I TF_binding_site TF_binding_site 10245835 10245841 . + . Feature "WBsf019221" ; TF_ID "WBTranscriptionFactor000101" ; TF_name "LAG-1"<br />
**CHROMOSOME_I TF_binding_site TF_binding_site 10245835 10245841 . + . Feature "WBsf038814" ; TF_ID "WBTranscriptionFactor000101" ; TF_name "LAG-1"<br />
*These two need to be merged - no difference between them. The SO_term is wrong - needs to be SO:0000235<br />
**CHROMOSOME_II TF_binding_site TF_binding_site 10950131 10950138 . + . Feature "WBsf019124" ; TF_ID "WBTranscriptionFactor000014" ; TF_name "SKN-1"<br />
**CHROMOSOME_II TF_binding_site TF_binding_site 10950131 10950138 . + . Feature "WBsf019126" ; TF_ID "WBTranscriptionFactor000014" ; TF_name "SKN-1"<br />
*Two papers - one says this is a MEX-1 recognition element and the other says this is a MEX-3 recognition element - are they both right?<br />
**CHROMOSOME_III binding_site binding_site 4804863 4804877 . - . Feature "WBsf899528"<br />
**CHROMOSOME_III binding_site binding_site 4804863 4804877 . - . Feature "WBsf899543"<br />
*Two papers - one says this is a MEX-1 recognition element and the other says this is a MEX-3 recognition element - are they both right?<br />
**CHROMOSOME_III binding_site binding_site 4804911 4804928 . - . Feature "WBsf899526"<br />
**CHROMOSOME_III binding_site binding_site 4804911 4804928 . - . Feature "WBsf899542"<br />
*These are from the same paper - one says it is "PUF-8 recognition element (PRE-1)" and the other says it is "PUF-8 recognition element (PRE-2)". Are they both right?<br />
**CHROMOSOME_III binding_site binding_site 4805138 4805145 . - . Feature "WBsf899537"<br />
**CHROMOSOME_III binding_site binding_site 4805138 4805145 . - . Feature "WBsf899538"<br />
*These are from the same paper - one says it is "TF LIN-1 binding site S11 in pJW5" and the other is "TF LIN-1 binding site S20 in pJW5" - looks like an error to me - Gary to redo this.<br />
**CHROMOSOME_III TF_binding_site TF_binding_site 7540351 7540361 . - . Feature "WBsf919592" ; TF_ID "WBTranscriptionFactor000135" ; TF_name "LIN-1"<br />
**CHROMOSOME_III TF_binding_site TF_binding_site 7540351 7540361 . - . Feature "WBsf919594" ; TF_ID "WBTranscriptionFactor000135" ; TF_name "LIN-1"<br />
*These two need to be merged - I suggest WBsf047654 be retired as it contains less information.<br />
**CHROMOSOME_III regulatory_region misc_feature 7540972 7540992 . - . Feature "WBsf047654"<br />
**CHROMOSOME_III regulatory_region misc_feature 7540972 7540992 . - . Feature "WBsf919589"<br />
*These two need to be merged - they are from different papers. I suggest WBsf047505 be retired as it has an incorrect SO_term - needs to be SO:0000235<br />
**CHROMOSOME_IV TF_binding_site TF_binding_site 2306593 2306606 . - . Feature "WBsf047478" ; TF_ID "WBTranscriptionFactor000052" ; TF_name "DAF-19"<br />
**CHROMOSOME_IV TF_binding_site TF_binding_site 2306593 2306606 . - . Feature "WBsf047505" ; TF_ID "WBTranscriptionFactor000052" ; TF_name "DAF-19"<br />
*These two need to be merged - they are from different papers. I suggest WBsf019088 be retired as it has an incorrect SO_term - needs to be SO:0000235<br />
**CHROMOSOME_V TF_binding_site TF_binding_site 10671938 10671944 . + . Feature "WBsf019088" ; TF_ID "WBTranscriptionFactor000061" ; TF_name "CEH-22"<br />
**CHROMOSOME_V TF_binding_site TF_binding_site 10671938 10671944 . + . Feature "WBsf919536" ; TF_ID "WBTranscriptionFactor000061" ; TF_name "CEH-22"<br />
*These three need to be merged. They are identical. The SO_term is wrong - needs to be SO:0000235<br />
**CHROMOSOME_V TF_binding_site TF_binding_site 11882353 11882360 . + . Feature "WBsf216760" ; TF_ID "WBTranscriptionFactor000126" ; TF_name "CEH-6"<br />
**CHROMOSOME_V TF_binding_site TF_binding_site 11882353 11882360 . + . Feature "WBsf216762" ; TF_ID "WBTranscriptionFactor000126" ; TF_name "CEH-6"<br />
**CHROMOSOME_V TF_binding_site TF_binding_site 11882353 11882360 . + . Feature "WBsf216764" ; TF_ID "WBTranscriptionFactor000126" ; TF_name "CEH-6"<br />
*These two need to be merged. They are identical. The SO_term is wrong - needs to be SO:0000235<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 4100019 4100026 . - . Feature "WBsf216754" ; TF_ID "WBTranscriptionFactor000126" ; TF_name "CEH-6"<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 4100019 4100026 . - . Feature "WBsf216755" ; TF_ID "WBTranscriptionFactor000126" ; TF_name "CEH-6"<br />
*These two need to be merged. WBsf899549 should be retired as it is incorrect - this is not a TF binding site, it is a regulatory region.<br />
**CHROMOSOME_X regulatory_region misc_feature 5100882 5100888 . + . Feature "WBsf919622"<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 5100882 5100888 . + . Feature "WBsf899549"<br />
*These two need to be merged. WBsf899545 should be retired as it is incorrect - this is not a TF binding site, it is a regulatory region.<br />
**CHROMOSOME_X regulatory_region misc_feature 5100919 5100925 . + . Feature "WBsf919623"<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 5100919 5100925 . + . Feature "WBsf899545"<br />
*These two need to be merged. WBsf899547 should be retired as it is incorrect - this is not a TF binding site, it is a regulatory region.<br />
**CHROMOSOME_X regulatory_region misc_feature 5100970 5100976 . + . Feature "WBsf919625"<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 5100970 5100976 . + . Feature "WBsf899547"<br />
*These two need to be merged. WBsf899548 should be retired as it is incorrect - this is not a TF binding site, it is a regulatory region.<br />
**CHROMOSOME_X regulatory_region misc_feature 5100985 5100991 . + . Feature "WBsf919626"<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 5100985 5100991 . + . Feature "WBsf899548"<br />
*WBsf047482 cites paper WBPaper00025203, which has since been merged with WBPaper00026601; the citation should be updated in all 11 Features that use it. <br />
*WBsf042312 cites paper WBPaper00025051<br />
*These two Features are otherwise nearly identical and should be merged.<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 8940207 8940220 . - . Feature "WBsf042312" ; TF_ID "WBTranscriptionFactor000052" ; TF_name "DAF-19"<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 8940207 8940220 . - . Feature "WBsf047482" ; TF_ID "WBTranscriptionFactor000052" ; TF_name "DAF-19"<br />
*These two need to be merged. WBsf919641 should be retired as it is incorrect - this is a TF binding site, it is not just a binding_site.<br />
**CHROMOSOME_X binding_site binding_site 12467733 12467737 . + . Feature "WBsf919641"<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 12467733 12467737 . + . Feature "WBsf919607" ; TF_ID "WBTranscriptionFactor000472" ; TF_name "DAF-3"<br />
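The merge candidates above were found by eye. A minimal Python sketch of how such candidates could be listed automatically, assuming the Features are available as plain GFF2 lines like those quoted above (without the wiki `**` prefix). It groups Feature IDs by chromosome, start and end, deliberately ignoring strand so that opposite-strand pairs such as WBsf019227/WBsf038813 are reported too. The function names are illustrative and not part of any WormBase tool.<br />

```python
import re
from collections import defaultdict

# GFF2 attribute pairs look like: Feature "WBsf019227" ; TF_name "LAG-1"
ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')


def parse_gff2_line(line):
    """Split a GFF2 line into its columns plus a dict of attributes."""
    cols = line.split(None, 8)        # the first 8 columns contain no spaces
    cols += [""] * (9 - len(cols))    # tolerate a missing attribute column
    seqid, method, _type, start, end, _score, strand, _phase, attrs = cols
    return {
        "seqid": seqid,
        "method": method,
        "start": int(start),
        "end": int(end),
        "strand": strand,
        "attrs": dict(ATTR_RE.findall(attrs)),
    }


def find_positional_duplicates(lines):
    """Map (seqid, start, end) -> Feature IDs, keeping only shared positions."""
    groups = defaultdict(list)
    for line in lines:
        rec = parse_gff2_line(line)
        key = (rec["seqid"], rec["start"], rec["end"])  # strand ignored on purpose
        groups[key].append(rec["attrs"].get("Feature"))
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


if __name__ == "__main__":
    sample = [
        'CHROMOSOME_I TF_binding_site TF_binding_site 10245805 10245811 . - . '
        'Feature "WBsf019227" ; TF_ID "WBTranscriptionFactor000101" ; TF_name "LAG-1"',
        'CHROMOSOME_I TF_binding_site TF_binding_site 10245805 10245811 . + . '
        'Feature "WBsf038813" ; TF_ID "WBTranscriptionFactor000101" ; TF_name "LAG-1"',
        'CHROMOSOME_X binding_site binding_site 12467733 12467737 . + . Feature "WBsf919641"',
    ]
    for pos, ids in find_positional_duplicates(sample).items():
        print(pos, ids)
```

Grouping on exact coordinates only finds identical positions; deciding which Feature to retire (SO_term, paper, Interaction links) still needs a curator.<br />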
<br />
<br />
<br />
'''Papers to curate'''<br />
* I have taken the Sequence Feature Papers from RT and assigned the four of us 10 papers each.<br />
* Just in case there is any paper here that has already been curated for Interaction, I've added the number of Interaction objects linked to by the paper. If this is not helpful, feel free to remove it.<br />
<br />
<pre><br />
WBPaper00002925 Daniela Interaction: 8<br />
Time: 10 mins<br />
New objects: none. The paper refers to two sub-element enhancers (the B and C sub-elements) already described in Okkema and Fire 1994, Development (WBPaper00002011).<br />
Location of information: - <br />
Comments: <br />
WBPaper00004568 Daniela Interaction: 2<br />
WBPaper00005842 Daniela<br />
WBPaper00024328 Daniela Interaction: 4<br />
WBPaper00028802 Daniela<br />
WBPaper00028915 Daniela<br />
WBPaper00029140 Daniela<br />
WBPaper00029255 Daniela Interaction: 55<br />
WBPaper00030829 Daniela Interaction: 5<br />
WBPaper00030933 Daniela Interaction: 1<br />
WBPaper00003929 Gary Interaction: 25<br />
Time: 4 hours<br />
New objects: none - 6 existing Features were corrected and updated. (WBsf019182, WBsf019181, WBsf019179, WBsf019177, WBsf019178, WBsf019180)<br />
Location of information: body text and in a figure in the supplemental table of the paper WBPaper00006376<br />
Other curator data: "lin-41 and lin-42 are negatively regulated by let-7"<br />
Comments: This is Gary Ruvkun's (Nature 2000) let-7 paper that started the miRNA field!<br />
Comments: These are marked as a pair of 'binding_site' Features because this is miRNA binding, not TF binding.<br />
See also: let-7, lin-41 binding WBPaper00006376 PubMed: 14729570 (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC324419/) - This is the paper that defined the binding sites.<br />
Comments (XW): curated one new cis-regulation object with Gary's WBsfs<br />
<br />
WBPaper00005044 Gary Interaction: 20<br />
Time: 3 hours<br />
New objects: made ire-1 binding site Feature (WBsf977530)<br />
Location of information: body of this paper and WBPaper00005036<br />
Other curator data: IRE-1 is a stress-activated endonuclease resident in the ER that is conserved in all known eukaryotes. IRE-1 mediated unconventional splicing of an intron from xbp-1 mRNA controls expression of the encoded transcription factor and is required for upregulation of most UPR target genes.<br />
See also: WBPaper00005036<br />
WBPaper00005971 Gary Interaction: 16 This has already been done by Gary: Feature WBsf718850<br />
WBPaper00024440 Gary Interaction: 1<br />
Time: 3 hours<br />
New objects: Made WBsf977532, WBsf977533 M-2 motif/daf-12 binding sites for ceh-22<br />
New objects: Made WBsf977534, WBsf977535 M-2 motif/daf-12 binding sites for myo-2<br />
Location of information: body of this paper and Figure Supplemental 3<br />
Other curator data: "We conclude that regulation of myo-2 and ceh-22 during dauer development depends critically on the M-2 motif."<br />
Comments: Lots of hypothetical binding sites for daf-12 in 90-odd genes, but these have not been experimentally confirmed.<br />
WBPaper00028816 Gary<br />
Time: 30 mins<br />
New objects: none<br />
Location of information: body of this paper<br />
Other curator data: "recruitment sites are widely distributed along X to bind the DCC and nucleate DCC spreading to X regions lacking recruitment sites"<br />
Comments: This paper determines the A and B motifs of the Dosage Compensation Complex (DCC). There are hundreds of thousands of sites of varying strength.<br />
WBPaper00028986 Gary Interaction: 16<br />
WBPaper00029181 Gary<br />
WBPaper00029327 Gary Interaction: 33<br />
WBPaper00030849 Gary<br />
WBPaper00031355 Gary Interaction: 5<br />
WBPaper00004181 Mary Ann Interaction: 8 <br />
No Features. Took 5 mins to read. <br />
WBPaper00005056 Mary Ann <br />
Time: 35 mins<br />
New objects: 37 - not yet curated.<br />
Location of information: Mainly in fig. 1, but also in the body text.<br />
Comments: TRTTKRY element in the promoter regions of T05E11.3, D2096.6, C44H4.1, ZK816.4, ceh-22, <br />
tph-1, M05B5.2 and myo-2. Bound by PHA-4.<br />
WBPaper00006429 Mary Ann Interaction: 3<br />
Time: 1/2hr<br />
New objects: 1<br />
Location of information: In body of text and fig. 3B<br />
Comments: Alludes to GATA binding sites, but no experimental data. <br />
WBPaper00024505 Mary Ann<br />
Time: <br />
New objects: <br />
Location of information: <br />
Comments: <br />
WBPaper00028849 Mary Ann<br />
Time: 45 mins<br />
New objects: 1<br />
Location of information: In body of text and fig. 3B<br />
Comments: <br />
WBPaper00029058 Mary Ann Interaction: 50<br />
WBPaper00029190 Mary Ann<br />
WBPaper00029406 Mary Ann Interaction: 145<br />
WBPaper00030877 Mary Ann Interaction: 8<br />
WBPaper00031471 Mary Ann<br />
Time: 1hr<br />
New objects: 1<br />
Location of information: In body of text and fig. 5f<br />
Comments: Check that the asd-2 alleles in fig. 3 are curated: yb1422, yb1415, yb1470, yb1419, yb1423<br />
WBPaper00004482 Xiaodong Interaction: 2<br />
Time: two hours<br />
New objects: request new Features, to be used in more Interaction objects<br />
Location of information: figure 5A<br />
Comments: HOX/CEH-20 binding sites in the hlh-8 promoter. The Features will be used in cis-regulation and physical Interaction objects.<br />
Duplication: yes. Only one Interaction object (WBInteraction000001291) is currently associated with the paper.<br />
WBPaper00005609 Xiaodong Interaction: 20 This has already been done by Margaret: Feature WBsf019097, WBsf038788, WBsf038789, WBsf038790, WBsf038791, WBsf038793, WBsf038794, WBsf019098, WBsf038792<br />
Time: 50 mins<br />
New objects: three interaction objects<br />
Location of information: In body of text and fig. 5<br />
Comments: Only WBsf038790 and WBsf038791 are useful. The other Features seem to be redundant; it is not clear why they exist.<br />
WBPaper00024189 Xiaodong<br />
Time: 10 mins<br />
Comments: no features in this paper<br />
WBPaper00024981 Xiaodong<br />
Time: 40 mins<br />
New objects: request through RT<br />
Location of information: figure 2<br />
Comments: MED-1 binding sites in the end-1 and end-3 promoters<br />
Comments (dr): Expr11831 generated for the end-1 and end-3 promoter regions. Add the WBsf ID once available, along with the relative construct. Paper checked as it came through RT.<br />
WBPaper00028910 Xiaodong Interaction: 7<br />
Time: 30 mins<br />
New objects: request through RT<br />
Location of information: in body text and figure 4<br />
Comments: MEF-2 direct binding site in the str-1 promoter, and a few minimal regulatory regions mentioned in the body text<br />
<br />
WBPaper00029109 Xiaodong<br />
<br />
WBPaper00029229 Xiaodong Interaction: 3<br />
WBPaper00030809 Xiaodong<br />
WBPaper00030931 Xiaodong Interaction: 18<br />
WBPaper00031565 Xiaodong Interaction: 1<br />
Time: 30 mins<br />
New objects: request through RT<br />
Location of information: in body text and figure 2 and figure 3<br />
Comments: PBC-HOX binding sites S1 and S2; cis-regulatory elements E1 and E2<br />
<br />
</pre><br />
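To collate the pre-Jamboree numbers (papers, Interaction counts and time per curator) from notes like those in the block above, a small parser could help. This is a minimal Python sketch assuming the notes keep the layout used here: a header line starting with a WBPaper ID, the curator's name and an optional "Interaction: N", followed by optional "Time:" lines. It only understands simple durations such as "45 mins", "4 hours", "1hr", "1/2hr" and "two hours"; anything else is skipped rather than guessed. The function names are hypothetical.<br />

```python
import re
from fractions import Fraction

# Header line: "WBPaper00003929 Gary Interaction: 25" or "WBPaper00005842 Daniela".
PAPER_RE = re.compile(r"^(WBPaper\d{8})\s+(.+?)\s*(?:Interaction:\s*(\d+).*)?$")
TIME_RE = re.compile(r"^Time:\s*(.+)$")
# Simple durations only: "45 mins", "4 hours", "1hr", "1/2hr", "two hours".
DURATION_RE = re.compile(r"([\d/]+|[a-z]+)\s*(minutes?|mins?|hours?|hrs?)\b")
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


def minutes_from(text):
    """Convert a short duration string to minutes, or None if not understood."""
    m = DURATION_RE.match(text.strip().lower())
    if not m:
        return None
    qty, unit = m.groups()
    try:
        amount = Fraction(NUMBER_WORDS.get(qty, qty))
    except (ValueError, ZeroDivisionError):
        return None
    return float(amount * (60 if unit.startswith("h") else 1))


def summarize_notes(text):
    """Tally papers, Interaction counts and minutes per curator."""
    summary = {}
    curator = None
    for raw in text.splitlines():
        line = raw.replace("<br />", "").strip()  # drop wiki line-break markup
        header = PAPER_RE.match(line)
        if header:
            curator = header.group(2)
            entry = summary.setdefault(
                curator, {"papers": 0, "interactions": 0, "minutes": 0.0})
            entry["papers"] += 1
            entry["interactions"] += int(header.group(3) or 0)
            continue
        time_line = TIME_RE.match(line)
        if time_line and curator is not None:
            minutes = minutes_from(time_line.group(1))
            if minutes is not None:
                summary[curator]["minutes"] += minutes
    return summary


if __name__ == "__main__":
    notes = "\n".join([
        "WBPaper00003929 Gary Interaction: 25<br />",
        "Time: 4 hours<br />",
        "WBPaper00006429 Mary Ann Interaction: 3<br />",
        "Time: 1/2hr<br />",
    ])
    print(summarize_notes(notes))
```

Papers with an empty "Time:" line still count towards "papers" but add no minutes, so averages should be taken over the papers that actually report a time.<br />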
* I suggest we curate 5 of our papers before the jamboree, taking notes as described above and any other things that arise during your curation.</div>Xdwanghttps://wiki.wormbase.org/index.php?title=Working_Group:Sequence_Features&diff=24673Working Group:Sequence Features2014-09-05T20:14:49Z<p>Xdwang: </p>
<hr />
<div>'''Topics'''<br />
<br />
* Display of Sequence Features on the website<br />
** And Transcription factors and Gene_product binds.<br />
* Streamline the workflow<br />
** How can we automatically identify Sequence Feature papers?<br />
* Improving data flow<br />
** How can we make the data immediately available to all curators?<br />
*** Add Paper and Public_name fields to the Features in the Nameserver?<br />
*** geneace is available for download and is updated every day; Caltech takes a copy.<br />
* Sample papers curation <br />
** Prepare a paper list from each person<br />
* assign meaningful names to the public_name field, e.g. 'distal enhancer 1' instead of 'DE1'<br />
** gw3 - I disagree with this. We should use the name the paper uses for the region, rather than making up our own names. Several months after I had originally annotated some regions, Xiaodong found problems with them, and when I went back to re-annotate them I really needed to identify unambiguously which region the paper was talking about, by matching the Public_name field to the region's name in the paper. Users may well need to do this as well when reading the paper and looking at the Features that mark up its regions.<br />
<br />
<br />
'''Pre-Jamboree prep'''<br />
<br />
* Suggestions for work<br />
** Curate some (5? 10?) papers from our lists and collate data as follows<br />
*** How long did each paper take to curate<br />
*** How many new objects did you add to WormBase<br />
*** Where in the paper was the information (e.g. supplementary data, figure legends, within text)<br />
*** Did you need to contact the author for more information<br />
*** Did you come across data for other curators <br />
**** gw3 - I don't know in detail what data other curators require. It would be useful to know this so I can summarise data for others.<br />
<br />
'''The Jamboree 15-17th Sept 2014'''<br />
<br />
* Suggestions for work<br />
** Discuss results from above<br />
** Work through some papers which have already been curated and have different data types (e.g. promoter and gene regulation) to identify best curation practices.<br />
** browsing capabilities through WormMine?<br />
<br />
<br />
'''Matters Arising'''<br />
<br />
* things noted - not necessarily to do with the topic in hand<br />
** There are many duplicates of Interaction objects in geneace, with or without two leading digits. Sometimes there is only the shorter form.<br />
** The Features WBsf919641 and WBsf919607 are at the same position. Both are DAF-3 binding sites. WBsf919641 has Method "binding_site", WBsf919607 has Method "TF_binding_site", which I think is more correct as DAF-3 is a TF. WBsf919641 used the paper WBPaper00004526, WBsf919607 used the paper WBPaper00003384. WBsf919641 is associated with WBGene00202278, WBsf919607 is associated with WBGene00003514 (myo-2). The paper WBPaper00004526 appears to be about a PEB-1 binding site, not a DAF-3 site. I made the Feature WBsf919609 for the PEB-1 site based on WBPaper00004526 in the last round of curation. Something is not right here.<br />
** The Features WBsf019227 and WBsf038813 are at the same position, but on opposite strands. Both are for a LAG-1 binding site associated with the gene lin-11. One is from WBPaper00005357, the other from WBPaper00032298. WBsf019227 needs to be retired and merged into WBsf038813.<br />
** WBsf038819 has a Method of 'regulatory_region', but its Public_name and Description describe it as an Enhancer. Which is it?<br />
** WBsf919669 is an Enhancer, but it has a Transcription_factor tag (WBTranscriptionFactor000101). It is far too large to be a binding site for a transcription factor, so I think this tag should be removed.<br />
** 393 out of the 500 TF_binding_site Features have an incorrect SO_term tag. It should be set to: SO:0000235<br />
** 10 Features with Method = "TF_binding_site" are lacking a Transcription_factor tag.<br />
* How many Methods should we use for Features: enhancer, regulatory_region, TF_binding_site, promoter…?<br />
** When displaying Features on JBrowse/GBrowse, do we want to show the WBsf IDs (WBsfxxxxx) or the Methods?<br />
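The SO_term and Transcription_factor counts above, and the question of how many Methods are in use, could be re-checked after each round of fixes with something like the following minimal Python sketch. It assumes each Feature has already been exported into a dict whose keys are named after the tags mentioned on this page (Method, SO_term, Transcription_factor); the export step and the record layout are assumptions, and the IDs in the example are placeholders, not real Features.<br />

```python
from collections import Counter

# SO term for TF_binding_site, as stated on this page.
TF_BINDING_SITE_SO = "SO:0000235"


def audit_tf_binding_sites(features):
    """Return (wrong_so_term, missing_tf): Feature IDs failing each check.

    Only Features whose Method is TF_binding_site are examined.
    """
    wrong_so_term, missing_tf = [], []
    for feat in features:
        if feat.get("Method") != "TF_binding_site":
            continue
        if feat.get("SO_term") != TF_BINDING_SITE_SO:
            wrong_so_term.append(feat["id"])
        if not feat.get("Transcription_factor"):
            missing_tf.append(feat["id"])
    return wrong_so_term, missing_tf


def count_methods(features):
    """Count Features per Method, to see which Methods are actually in use."""
    return Counter(feat.get("Method") for feat in features)


if __name__ == "__main__":
    # Placeholder records; a real run would load an export of the Feature class.
    demo = [
        {"id": "feature_1", "Method": "TF_binding_site",
         "SO_term": "SO:0000235", "Transcription_factor": "WBTranscriptionFactor000101"},
        {"id": "feature_2", "Method": "TF_binding_site", "SO_term": "SO:0000409"},
        {"id": "feature_3", "Method": "enhancer", "SO_term": "SO:0000165"},
    ]
    print(audit_tf_binding_sites(demo))
    print(count_methods(demo))
```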
<br />
<br />
'''Duplicated Features'''<br />
The following are a set of duplicated regulatory Features. I (Gary) am probably the main culprit in not checking to see if there is a pre-existing Feature.<br />
How can we avoid this in future?<br />
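One possible safeguard, sketched in Python under the assumption that existing Features can be listed as (chromosome, start, end, strand, ID) records, for example from the daily geneace copy: before requesting a new Feature, look for existing ones at the same or overlapping coordinates. Coordinates are treated as 1-based and inclusive, as in GFF, and strand is ignored so that opposite-strand duplicates are caught too. The function and record names are hypothetical.<br />

```python
from typing import List, NamedTuple, Tuple


class Feature(NamedTuple):
    seqid: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str
    feature_id: str


def existing_matches(seqid: str, start: int, end: int,
                     existing: List[Feature]) -> List[Tuple[str, str]]:
    """Return (Feature ID, 'exact' or 'overlap') for existing Features on the
    same chromosome that share or overlap the proposed coordinates.
    Strand is ignored on purpose."""
    hits = []
    for feat in existing:
        if feat.seqid != seqid:
            continue
        if feat.start == start and feat.end == end:
            hits.append((feat.feature_id, "exact"))
        elif feat.start <= end and start <= feat.end:  # closed intervals overlap
            hits.append((feat.feature_id, "overlap"))
    return hits


if __name__ == "__main__":
    known = [
        Feature("CHROMOSOME_V", 11882353, 11882360, "+", "WBsf216760"),
        Feature("CHROMOSOME_X", 4100019, 4100026, "-", "WBsf216754"),
    ]
    # A proposed site at an already-annotated position:
    print(existing_matches("CHROMOSOME_V", 11882353, 11882360, known))
    # A proposed region that partly overlaps an existing Feature:
    print(existing_matches("CHROMOSOME_X", 4100000, 4100020, known))
```

A linear scan is enough for a one-off check of a few hundred regulatory Features; a much larger set would call for an interval index instead.<br />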
*These two need to be merged - I suggest WBsf019227 be retired as it is in the opposite sense to its gene.<br />
**CHROMOSOME_I TF_binding_site TF_binding_site 10245805 10245811 . - . Feature "WBsf019227" ; TF_ID "WBTranscriptionFactor000101" ; TF_name "LAG-1"<br />
**CHROMOSOME_I TF_binding_site TF_binding_site 10245805 10245811 . + . Feature "WBsf038813" ; TF_ID "WBTranscriptionFactor000101" ; TF_name "LAG-1"<br />
*These two need to be merged - I suggest WBsf019221 be retired as it doesn't have the Interaction objects.<br />
**CHROMOSOME_I TF_binding_site TF_binding_site 10245835 10245841 . + . Feature "WBsf019221" ; TF_ID "WBTranscriptionFactor000101" ; TF_name "LAG-1"<br />
**CHROMOSOME_I TF_binding_site TF_binding_site 10245835 10245841 . + . Feature "WBsf038814" ; TF_ID "WBTranscriptionFactor000101" ; TF_name "LAG-1"<br />
*These two need to be merged - no difference between them. The SO_term is wrong - needs to be SO:0000235<br />
**CHROMOSOME_II TF_binding_site TF_binding_site 10950131 10950138 . + . Feature "WBsf019124" ; TF_ID "WBTranscriptionFactor000014" ; TF_name "SKN-1"<br />
**CHROMOSOME_II TF_binding_site TF_binding_site 10950131 10950138 . + . Feature "WBsf019126" ; TF_ID "WBTranscriptionFactor000014" ; TF_name "SKN-1"<br />
*Two papers - one says this is a MEX-1 recognition element and the other says this is a MEX-3 recognition element - are they both right?<br />
**CHROMOSOME_III binding_site binding_site 4804863 4804877 . - . Feature "WBsf899528"<br />
**CHROMOSOME_III binding_site binding_site 4804863 4804877 . - . Feature "WBsf899543"<br />
*Two papers - one says this is a MEX-1 recognition element and the other says this is a MEX-3 recognition element - are they both right?<br />
**CHROMOSOME_III binding_site binding_site 4804911 4804928 . - . Feature "WBsf899526"<br />
**CHROMOSOME_III binding_site binding_site 4804911 4804928 . - . Feature "WBsf899542"<br />
*These are from the same paper - one says it is "PUF-8 recognition element (PRE-1)" and the other says it is "PUF-8 recognition element (PRE-2)". Are they both right?<br />
**CHROMOSOME_III binding_site binding_site 4805138 4805145 . - . Feature "WBsf899537"<br />
**CHROMOSOME_III binding_site binding_site 4805138 4805145 . - . Feature "WBsf899538"<br />
*These are from the same paper - one says it is "TF LIN-1 binding site S11 in pJW5" and the other is "TF LIN-1 binding site S20 in pJW5" - looks like an error to me - Gary to redo this.<br />
**CHROMOSOME_III TF_binding_site TF_binding_site 7540351 7540361 . - . Feature "WBsf919592" ; TF_ID "WBTranscriptionFactor000135" ; TF_name "LIN-1"<br />
**CHROMOSOME_III TF_binding_site TF_binding_site 7540351 7540361 . - . Feature "WBsf919594" ; TF_ID "WBTranscriptionFactor000135" ; TF_name "LIN-1"<br />
*These two need to be merged - I suggest WBsf047654 be retired as it contains less information.<br />
**CHROMOSOME_III regulatory_region misc_feature 7540972 7540992 . - . Feature "WBsf047654"<br />
**CHROMOSOME_III regulatory_region misc_feature 7540972 7540992 . - . Feature "WBsf919589"<br />
*These two need to be merged - they are from different papers. I suggest WBsf047505 be retired as it has an incorrect SO_term - needs to be SO:0000235<br />
**CHROMOSOME_IV TF_binding_site TF_binding_site 2306593 2306606 . - . Feature "WBsf047478" ; TF_ID "WBTranscriptionFactor000052" ; TF_name "DAF-19"<br />
**CHROMOSOME_IV TF_binding_site TF_binding_site 2306593 2306606 . - . Feature "WBsf047505" ; TF_ID "WBTranscriptionFactor000052" ; TF_name "DAF-19"<br />
*These two need to be merged - they are from different papers. I suggest WBsf019088 be retired as it has an incorrect SO_term - needs to be SO:0000235<br />
**CHROMOSOME_V TF_binding_site TF_binding_site 10671938 10671944 . + . Feature "WBsf019088" ; TF_ID "WBTranscriptionFactor000061" ; TF_name "CEH-22"<br />
**CHROMOSOME_V TF_binding_site TF_binding_site 10671938 10671944 . + . Feature "WBsf919536" ; TF_ID "WBTranscriptionFactor000061" ; TF_name "CEH-22"<br />
*These three need to be merged. They are identical. The SO_term is wrong - needs to be SO:0000235<br />
**CHROMOSOME_V TF_binding_site TF_binding_site 11882353 11882360 . + . Feature "WBsf216760" ; TF_ID "WBTranscriptionFactor000126" ; TF_name "CEH-6"<br />
**CHROMOSOME_V TF_binding_site TF_binding_site 11882353 11882360 . + . Feature "WBsf216762" ; TF_ID "WBTranscriptionFactor000126" ; TF_name "CEH-6"<br />
**CHROMOSOME_V TF_binding_site TF_binding_site 11882353 11882360 . + . Feature "WBsf216764" ; TF_ID "WBTranscriptionFactor000126" ; TF_name "CEH-6"<br />
*These three need to be merged. They are identical. The SO_term is wrong - needs to be SO:0000235<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 4100019 4100026 . - . Feature "WBsf216754" ; TF_ID "WBTranscriptionFactor000126" ; TF_name "CEH-6"<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 4100019 4100026 . - . Feature "WBsf216755" ; TF_ID "WBTranscriptionFactor000126" ; TF_name "CEH-6"<br />
*These two need to be merged. WBsf899549 should be retired as it is incorrect - this is not a TF binding site, it is a regulatory region.<br />
**CHROMOSOME_X regulatory_region misc_feature 5100882 5100888 . + . Feature "WBsf919622"<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 5100882 5100888 . + . Feature "WBsf899549"<br />
*These two need to be merged. WBsf899545 should be retired as it is incorrect - this is not a TF binding site, it is a regulatory region.<br />
**CHROMOSOME_X regulatory_region misc_feature 5100919 5100925 . + . Feature "WBsf919623"<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 5100919 5100925 . + . Feature "WBsf899545"<br />
*These two need to be merged. WBsf899547 should be retired as it is incorrect - this is not a TF binding site, it is a regulatory region.<br />
**CHROMOSOME_X regulatory_region misc_feature 5100970 5100976 . + . Feature "WBsf919625"<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 5100970 5100976 . + . Feature "WBsf899547"<br />
*These two need to be merged. WBsf899548 should be retired as it is incorrect - this is not a TF binding site, it is a regulatory region.<br />
**CHROMOSOME_X regulatory_region misc_feature 5100985 5100991 . + . Feature "WBsf919626"<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 5100985 5100991 . + . Feature "WBsf899548"<br />
*WBsf047482 cites paper WBPaper00025203, which has since been merged with WBPaper00026601; the citation should be updated in all 11 Features that use it. <br />
*WBsf042312 cites paper WBPaper00025051<br />
*These two Features are otherwise nearly identical and should be merged.<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 8940207 8940220 . - . Feature "WBsf042312" ; TF_ID "WBTranscriptionFactor000052" ; TF_name "DAF-19"<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 8940207 8940220 . - . Feature "WBsf047482" ; TF_ID "WBTranscriptionFactor000052" ; TF_name "DAF-19"<br />
*These two need to be merged. WBsf919641 should be retired as it is incorrect - this is a TF binding site, it is not just a binding_site.<br />
**CHROMOSOME_X binding_site binding_site 12467733 12467737 . + . Feature "WBsf919641"<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 12467733 12467737 . + . Feature "WBsf919607" ; TF_ID "WBTranscriptionFactor000472" ; TF_name "DAF-3"<br />
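The paper-merge note above (WBPaper00025203 now merged with WBPaper00026601) could be applied to every citing Feature in one pass. A minimal Python sketch, assuming the Feature-to-Paper links and the merged-paper list are available as plain dicts (how they are exported is not covered here): merges are followed through chains, and a Feature that ends up citing the same paper twice keeps only one copy. The function names are hypothetical.<br />

```python
def resolve_paper(paper_id, merges):
    """Follow merge links (old ID -> surviving ID) until a current paper is reached."""
    seen = set()
    while paper_id in merges:
        if paper_id in seen:
            raise ValueError(f"merge cycle involving {paper_id}")
        seen.add(paper_id)
        paper_id = merges[paper_id]
    return paper_id


def remap_feature_papers(feature_papers, merges):
    """Return Feature ID -> current paper IDs, keeping order and dropping repeats."""
    remapped = {}
    for feature_id, papers in feature_papers.items():
        current = []
        for paper in papers:
            resolved = resolve_paper(paper, merges)
            if resolved not in current:
                current.append(resolved)
        remapped[feature_id] = current
    return remapped


if __name__ == "__main__":
    merges = {"WBPaper00025203": "WBPaper00026601"}  # merge noted on this page
    cites = {
        "WBsf047482": ["WBPaper00025203"],
        "WBsf042312": ["WBPaper00025051"],
    }
    for feature_id, papers in remap_feature_papers(cites, merges).items():
        print(feature_id, papers)
```

Only the merge named on this page is used in the example; a real run would need the full list of merged papers.<br />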
<br />
<br />
<br />
'''Papers to curate'''<br />
* I have taken the Sequence Feature Papers from RT and assigned the four of us 10 papers each.<br />
* Just in case there is any paper here that has already been curated for Interaction, I've added the number of Interaction objects linked to by the paper. If this is not helpful, feel free to remove it.<br />
<br />
<pre><br />
WBPaper00002925 Daniela Interaction: 8<br />
WBPaper00004568 Daniela Interaction: 2<br />
WBPaper00005842 Daniela<br />
WBPaper00024328 Daniela Interaction: 4<br />
WBPaper00028802 Daniela<br />
WBPaper00028915 Daniela<br />
WBPaper00029140 Daniela<br />
WBPaper00029255 Daniela Interaction: 55<br />
WBPaper00030829 Daniela Interaction: 5<br />
WBPaper00030933 Daniela Interaction: 1<br />
WBPaper00003929 Gary Interaction: 25<br />
Time: 4 hours<br />
New objects: none - 6 existing Features were corrected and updated. (WBsf019182, WBsf019181, WBsf019179, WBsf019177, WBsf019178, WBsf019180)<br />
Location of information: body text and in a figure in the supplemental table of the paper WBPaper00006376<br />
Other curator data: "lin-41 and lin-42 are negatively regulated by let-7"<br />
Comments: This is Gary Ruvkun's (Nature 2000) let-7 paper that started the miRNA field!<br />
Comments: These are marked as a pair of 'binding_site' Features because this is miRNA binding, not TF binding.<br />
See also: let-7, lin-41 binding WBPaper00006376 PubMed: 14729570 (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC324419/) - This is the paper that defined the binding sites.<br />
Comments (XW): curated one new cis-regulation object with Gary's WBsfs<br />
<br />
WBPaper00005044 Gary Interaction: 20<br />
Time: 3 hours<br />
New objects: made ire-1 binding site Feature (WBsf977530)<br />
Location of information: body of this paper and WBPaper00005036<br />
Other curator data: IRE-1 is a stress-activated endonuclease resident in the ER that is conserved in all known eukaryotes. IRE-1 mediated unconventional splicing of an intron from xbp-1 mRNA controls expression of the encoded transcription factor and is required for upregulation of most UPR target genes.<br />
See also: WBPaper00005036<br />
WBPaper00005971 Gary Interaction: 16 This has already been done by Gary: Feature WBsf718850<br />
WBPaper00024440 Gary Interaction: 1<br />
Time: 3 hours<br />
New objects: Made WBsf977532, WBsf977533 M-2 motif/daf-12 binding sites for ceh-22<br />
New objects: Made WBsf977534, WBsf977535 M-2 motif/daf-12 binding sites for myo-2<br />
Location of information: body of this paper and Figure Supplemental 3<br />
Other curator data: "We conclude that regulation of myo-2 and ceh-22 during dauer development depends critically on the M-2 motif."<br />
Comments: Lots of hypothetical binding sites for daf-12 in 90-odd genes, but these have not been experimentally confirmed.<br />
WBPaper00028816 Gary<br />
Time: 30 mins<br />
New objects: none<br />
Location of information: body of this paper<br />
Other curator data: "recruitment sites are widely distributed along X to bind the DCC and nucleate DCC spreading to X regions lacking recruitment sites"<br />
Comments: This paper determines the A and B motifs of the Dosage Compensation Complex (DCC). There are hundreds of thousands of sites of varying strength.<br />
WBPaper00028986 Gary Interaction: 16<br />
WBPaper00029181 Gary<br />
WBPaper00029327 Gary Interaction: 33<br />
WBPaper00030849 Gary<br />
WBPaper00031355 Gary Interaction: 5<br />
WBPaper00004181 Mary Ann Interaction: 8 <br />
No Features. Took 5 mins to read. <br />
WBPaper00005056 Mary Ann <br />
Time: 35 mins<br />
New objects: 37 - not yet curated.<br />
Location of information: Mainly in fig. 1, but also in the body text.<br />
Comments: TRTTKRY element in the promoter regions of T05E11.3, D2096.6, C44H4.1, ZK816.4, ceh-22, <br />
tph-1, M05B5.2 and myo-2. Bound by PHA-4.<br />
WBPaper00006429 Mary Ann Interaction: 3<br />
Time: 1/2hr<br />
New objects: 1<br />
Location of information: In body of text and fig. 3B<br />
Comments: Alludes to GATA binding sites, but no experimental data. <br />
WBPaper00024505 Mary Ann<br />
Time: <br />
New objects: <br />
Location of information: <br />
Comments: <br />
WBPaper00028849 Mary Ann<br />
Time: 45 mins<br />
New objects: 1<br />
Location of information: In body of text and fig. 3B<br />
Comments: <br />
WBPaper00029058 Mary Ann Interaction: 50<br />
WBPaper00029190 Mary Ann<br />
WBPaper00029406 Mary Ann Interaction: 145<br />
WBPaper00030877 Mary Ann Interaction: 8<br />
WBPaper00031471 Mary Ann<br />
WBPaper00004482 Xiaodong Interaction: 2<br />
Time: two hours<br />
New objects: request new Features, to be used in more Interaction objects<br />
Location of information: figure 5A<br />
Comments: HOX/CEH-20 binding sites in the hlh-8 promoter. The Features will be used in cis-regulation and physical Interaction objects.<br />
Duplication: yes. Only one Interaction object (WBInteraction000001291) is currently associated with the paper.<br />
WBPaper00005609 Xiaodong Interaction: 20 This has already been done by Margaret: Feature WBsf019097, WBsf038788, WBsf038789, WBsf038790, WBsf038791, WBsf038793, WBsf038794, WBsf019098, WBsf038792<br />
Time: 50 mins<br />
New objects: three interaction objects<br />
Location of information: In body of text and fig. 5<br />
Comments: Only WBsf038790 and WBsf038791 are useful. The other Features seem to be redundant; it is not clear why they exist.<br />
WBPaper00024189 Xiaodong<br />
Time: 10 mins<br />
Comments: no features in this paper<br />
WBPaper00024981 Xiaodong<br />
Time: 40 mins<br />
New objects: request through RT<br />
Location of information: figure 2<br />
Comments: MED-1 binding sites in the end-1 and end-3 promoters<br />
WBPaper00028910 Xiaodong Interaction: 7<br />
Time: 30 mins<br />
New objects: request through RT<br />
Location of information: in body text and figure 4<br />
Comments: MEF-2 direct binding site in the str-1 promoter, and a few minimal regulatory regions mentioned in the body text<br />
<br />
WBPaper00029109 Xiaodong<br />
<br />
WBPaper00029229 Xiaodong Interaction: 3<br />
WBPaper00030809 Xiaodong<br />
WBPaper00030931 Xiaodong Interaction: 18<br />
WBPaper00031565 Xiaodong Interaction: 1<br />
Time: 30 mins<br />
New objects: request through RT<br />
Location of information: in body text and figure 2 and figure 3<br />
Comments: PBC-HOX binding sites S1 and S2; cis-regulatory elements E1 and E2<br />
<br />
</pre><br />
* I suggest we curate 5 of our papers before the jamboree, taking notes as described above and any other things that arise during your curation.</div>Xdwanghttps://wiki.wormbase.org/index.php?title=Working_Group:Sequence_Features&diff=24672Working Group:Sequence Features2014-09-05T15:15:12Z<p>Xdwang: </p>
<hr />
<div>'''Topics'''<br />
<br />
* Display of Sequence Features on the website<br />
** And Transcription factors and Gene_product binds.<br />
* Streamline the workflow<br />
** How can we automatically identify Sequence Feature papers?<br />
* Improving data flow<br />
** How can we make the data immediately available to all curators?<br />
*** Add Paper and Public_name fields to the Features in the Nameserver?<br />
*** geneace is available for download and is updated every day; Caltech takes a copy.<br />
* Sample papers curation <br />
** Prepare a paper list from each person<br />
* assign meaningful names to the public_name field, e.g. 'distal enhancer 1' instead of 'DE1'<br />
** gw3 - I disagree with this. We should use the name the paper uses for the region, rather than making up our own names. Several months after I had originally annotated some regions, Xiaodong found problems with them, and when I went back to re-annotate them I really needed to identify unambiguously which region the paper was talking about, by matching the Public_name field to the region's name in the paper. Users may well need to do this as well when reading the paper and looking at the Features that mark up its regions.<br />
<br />
<br />
'''Pre-Jamboree prep'''<br />
<br />
* Suggestions for work<br />
** Curate some (5? 10?) papers from our lists and collate data as follows<br />
*** How long did each paper take to curate<br />
*** How many new objects did you add to WormBase<br />
*** Where in the paper was the information (e.g. supplementary data, figure legends, within text)<br />
*** Did you need to contact the author for more information<br />
*** Did you come across data for other curators <br />
**** gw3 - I don't know in detail what data other curators require. It would be useful to know this so I can summarise data for others.<br />
<br />
'''The Jamboree 15-17th Sept 2014'''<br />
<br />
* Suggestions for work<br />
** Discuss results from above<br />
** Work through some papers which have already been curated and have different data types (e.g. promoter and gene regulation) to identify best curation practices.<br />
** browsing capabilities through WormMine?<br />
<br />
<br />
'''Matters Arising'''<br />
<br />
* things noted - not necessarily to do with the topic in hand<br />
** There are many duplicates of Interaction objects in geneace, with or without two leading digits. Sometimes there is only the shorter form.<br />
** The Features WBsf919641 and WBsf919607 are at the same position. Both are DAF-3 binding sites. WBsf919641 has Method "binding_site", WBsf919607 has Method "TF_binding_site", which I think is more correct as DAF-3 is a TF. WBsf919641 used the paper WBPaper00004526, WBsf919607 used the paper WBPaper00003384. WBsf919641 is associated with WBGene00202278, WBsf919607 is associated with WBGene00003514 (myo-2). The paper WBPaper00004526 appears to be about a PEB-1 binding site, not a DAF-3 site. I made the Feature WBsf919609 for the PEB-1 site based on WBPaper00004526 in the last round of curation. Something is not right here.<br />
** The Features WBsf019227 and WBsf038813 are at the same position, but on opposite strands. Both are for a LAG-1 binding site associated with the gene lin-11. One is from WBPaper00005357, the other from WBPaper00032298. WBsf019227 needs to be retired and merged into WBsf038813.<br />
** WBsf038819 has a Method of 'regulatory_region', but its Public_name and Description describe it as an Enhancer. Which is it?<br />
** WBsf919669 is an Enhancer, but it has a Transcription_factor tag (WBTranscriptionFactor000101). It is far too large to be a binding site for a transcription factor, so I think this tag should be removed.<br />
** 393 out of the 500 TF_binding_site Features have an incorrect SO_term tag. It should be set to: SO:0000235<br />
** 10 Features with Method = "TF_binding_site" are lacking a Transcription_factor tag.<br />
* How many Methods do we use for Features: enhancer, regulatory_region, TF_binding_site, promoter…?<br />
** When displaying on JBrowse/GBrowse, do we want to show the WBsfxxxxx IDs or the Methods?<br />
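The two counts noted above (TF_binding_site Features with a wrong SO_term, and TF_binding_site Features without a Transcription_factor tag) could be re-checked with a small script after each curation round. The sketch below is a hypothetical illustration, not an existing WormBase tool: it assumes Feature objects have already been exported from ACeDB into Python dicts with the keys "id", "Method", "SO_term" and "Transcription_factor", and it takes SO:0000235 as the expected term, as stated in the notes.

```python
# Sketch only: Feature records are modelled as plain dicts with the
# hypothetical keys "id", "Method", "SO_term" and "Transcription_factor".
# Real data would first have to be dumped from ACeDB into this shape.
from typing import Iterable, List, Mapping, Tuple

TF_BINDING_SITE_SO_TERM = "SO:0000235"  # value given in the notes for TF_binding_site


def audit_tf_binding_sites(
    features: Iterable[Mapping[str, str]],
) -> Tuple[List[str], List[str]]:
    """Return (wrong_so_term, missing_tf) lists of Feature IDs.

    Only Features whose Method is "TF_binding_site" are checked.
    """
    wrong_so_term: List[str] = []
    missing_tf: List[str] = []
    for feat in features:
        if feat.get("Method") != "TF_binding_site":
            continue  # enhancers, regulatory_regions etc. are not checked here
        if feat.get("SO_term") != TF_BINDING_SITE_SO_TERM:
            wrong_so_term.append(feat["id"])
        if not feat.get("Transcription_factor"):  # tag absent, None or empty
            missing_tf.append(feat["id"])
    return wrong_so_term, missing_tf


if __name__ == "__main__":
    # Made-up example records, not real WormBase data.
    sample = [
        {"id": "WBsf_example_1", "Method": "TF_binding_site",
         "SO_term": "SO:0000235", "Transcription_factor": "WBTranscriptionFactor_example"},
        {"id": "WBsf_example_2", "Method": "TF_binding_site",
         "SO_term": "SO:XXXXXXX"},  # placeholder for a wrong term, no TF tag
        {"id": "WBsf_example_3", "Method": "enhancer"},
    ]
    print(audit_tf_binding_sites(sample))
```

Comparing the lengths of the two returned lists with the 393 and 10 reported above would show whether the corrections have been applied.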
<br />
<br />
'''Duplicated Features'''<br />
The following is a set of duplicated regulatory Features. I (Gary) am probably the main culprit, as I did not check whether there was a pre-existing Feature.<br />
How can we avoid this in future?<br />
*These two need to be merged - I suggest WBsf019227 be retired as it is in the opposite sense to its gene.<br />
**CHROMOSOME_I TF_binding_site TF_binding_site 10245805 10245811 . - . Feature "WBsf019227" ; TF_ID "WBTranscriptionFactor000101" ; TF_name "LAG-1"<br />
**CHROMOSOME_I TF_binding_site TF_binding_site 10245805 10245811 . + . Feature "WBsf038813" ; TF_ID "WBTranscriptionFactor000101" ; TF_name "LAG-1"<br />
*These two need to be merged - I suggest WBsf019221 be retired as it doesn't have the Interaction objects.<br />
**CHROMOSOME_I TF_binding_site TF_binding_site 10245835 10245841 . + . Feature "WBsf019221" ; TF_ID "WBTranscriptionFactor000101" ; TF_name "LAG-1"<br />
**CHROMOSOME_I TF_binding_site TF_binding_site 10245835 10245841 . + . Feature "WBsf038814" ; TF_ID "WBTranscriptionFactor000101" ; TF_name "LAG-1"<br />
*These two need to be merged - no difference between them. The SO_term is wrong - needs to be SO:0000235<br />
**CHROMOSOME_II TF_binding_site TF_binding_site 10950131 10950138 . + . Feature "WBsf019124" ; TF_ID "WBTranscriptionFactor000014" ; TF_name "SKN-1"<br />
**CHROMOSOME_II TF_binding_site TF_binding_site 10950131 10950138 . + . Feature "WBsf019126" ; TF_ID "WBTranscriptionFactor000014" ; TF_name "SKN-1"<br />
*Two papers - one says this is a MEX-1 recognition element and the other says this is a MEX-3 recognition element - are they both right?<br />
**CHROMOSOME_III binding_site binding_site 4804863 4804877 . - . Feature "WBsf899528"<br />
**CHROMOSOME_III binding_site binding_site 4804863 4804877 . - . Feature "WBsf899543"<br />
*Two papers - one says this is a MEX-1 recognition element and the other says this is a MEX-3 recognition element - are they both right?<br />
**CHROMOSOME_III binding_site binding_site 4804911 4804928 . - . Feature "WBsf899526"<br />
**CHROMOSOME_III binding_site binding_site 4804911 4804928 . - . Feature "WBsf899542"<br />
*These are from the same paper - one says it is "PUF-8 recognition element (PRE-1)" and the other says it is "PUF-8 recognition element (PRE-2)". Are they both right?<br />
**CHROMOSOME_III binding_site binding_site 4805138 4805145 . - . Feature "WBsf899537"<br />
**CHROMOSOME_III binding_site binding_site 4805138 4805145 . - . Feature "WBsf899538"<br />
*These are from the same paper - one says it is "TF LIN-1 binding site S11 in pJW5" and the other is "TF LIN-1 binding site S20 in pJW5" - looks like an error to me - Gary to redo this.<br />
**CHROMOSOME_III TF_binding_site TF_binding_site 7540351 7540361 . - . Feature "WBsf919592" ; TF_ID "WBTranscriptionFactor000135" ; TF_name "LIN-1"<br />
**CHROMOSOME_III TF_binding_site TF_binding_site 7540351 7540361 . - . Feature "WBsf919594" ; TF_ID "WBTranscriptionFactor000135" ; TF_name "LIN-1"<br />
*These two need to be merged - I suggest WBsf047654 be retired as it contains less information.<br />
**CHROMOSOME_III regulatory_region misc_feature 7540972 7540992 . - . Feature "WBsf047654"<br />
**CHROMOSOME_III regulatory_region misc_feature 7540972 7540992 . - . Feature "WBsf919589"<br />
*These two need to be merged - they are from different papers. I suggest WBsf047505 be retired as it has an incorrect SO_term - needs to be SO:0000235<br />
**CHROMOSOME_IV TF_binding_site TF_binding_site 2306593 2306606 . - . Feature "WBsf047478" ; TF_ID "WBTranscriptionFactor000052" ; TF_name "DAF-19"<br />
**CHROMOSOME_IV TF_binding_site TF_binding_site 2306593 2306606 . - . Feature "WBsf047505" ; TF_ID "WBTranscriptionFactor000052" ; TF_name "DAF-19"<br />
*These two need to be merged - they are from different papers. I suggest WBsf019088 be retired as it has an incorrect SO_term - needs to be SO:0000235<br />
**CHROMOSOME_V TF_binding_site TF_binding_site 10671938 10671944 . + . Feature "WBsf019088" ; TF_ID "WBTranscriptionFactor000061" ; TF_name "CEH-22"<br />
**CHROMOSOME_V TF_binding_site TF_binding_site 10671938 10671944 . + . Feature "WBsf919536" ; TF_ID "WBTranscriptionFactor000061" ; TF_name "CEH-22"<br />
*These three need to be merged. They are identical. The SO_term is wrong - needs to be SO:0000235<br />
**CHROMOSOME_V TF_binding_site TF_binding_site 11882353 11882360 . + . Feature "WBsf216760" ; TF_ID "WBTranscriptionFactor000126" ; TF_name "CEH-6"<br />
**CHROMOSOME_V TF_binding_site TF_binding_site 11882353 11882360 . + . Feature "WBsf216762" ; TF_ID "WBTranscriptionFactor000126" ; TF_name "CEH-6"<br />
**CHROMOSOME_V TF_binding_site TF_binding_site 11882353 11882360 . + . Feature "WBsf216764" ; TF_ID "WBTranscriptionFactor000126" ; TF_name "CEH-6"<br />
*These three need to be merged. They are identical. The SO_term is wrong - needs to be SO:0000235<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 4100019 4100026 . - . Feature "WBsf216754" ; TF_ID "WBTranscriptionFactor000126" ; TF_name "CEH-6"<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 4100019 4100026 . - . Feature "WBsf216755" ; TF_ID "WBTranscriptionFactor000126" ; TF_name "CEH-6"<br />
*These two need to be merged. WBsf899549 should be retired as it is incorrect - this is not a TF binding site but a regulatory region.<br />
**CHROMOSOME_X regulatory_region misc_feature 5100882 5100888 . + . Feature "WBsf919622"<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 5100882 5100888 . + . Feature "WBsf899549"<br />
*These two need to be merged. WBsf899545 should be retired as it is incorrect - this is not a TF binding site but a regulatory region.<br />
**CHROMOSOME_X regulatory_region misc_feature 5100919 5100925 . + . Feature "WBsf919623"<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 5100919 5100925 . + . Feature "WBsf899545"<br />
*These two need to be merged. WBsf899547 should be retired as it is incorrect - this is not a TF binding site but a regulatory region.<br />
**CHROMOSOME_X regulatory_region misc_feature 5100970 5100976 . + . Feature "WBsf919625"<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 5100970 5100976 . + . Feature "WBsf899547"<br />
*These two need to be merged. WBsf899548 should be retired as it is incorrect - this is not a TF binding site but a regulatory region.<br />
**CHROMOSOME_X regulatory_region misc_feature 5100985 5100991 . + . Feature "WBsf919626"<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 5100985 5100991 . + . Feature "WBsf899548"<br />
*WBsf047482 cites paper WBPaper00025203 - this paper has since been merged with WBPaper00026601, and the citation should be updated to WBPaper00026601 in all 11 Features that use it.<br />
*WBsf042312 cites paper WBPaper00025051<br />
*These two Features are otherwise nearly identical and should be merged.<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 8940207 8940220 . - . Feature "WBsf042312" ; TF_ID "WBTranscriptionFactor000052" ; TF_name "DAF-19"<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 8940207 8940220 . - . Feature "WBsf047482" ; TF_ID "WBTranscriptionFactor000052" ; TF_name "DAF-19"<br />
*These two need to be merged. WBsf919641 should be retired as it is incorrect - this is a TF binding site, not just a binding_site.<br />
**CHROMOSOME_X binding_site binding_site 12467733 12467737 . + . Feature "WBsf919641"<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 12467733 12467737 . + . Feature "WBsf919607" ; TF_ID "WBTranscriptionFactor000472" ; TF_name "DAF-3"<br />
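On the question of avoiding duplicates in future: one option would be to check a candidate Feature's coordinates against the existing ones before creating it. The sketch below is an illustration under stated assumptions, not an existing WormBase script. It reads GFF2-style lines in the same 9-column layout as the list above, groups Features by chromosome, start and end, and reports any location carrying more than one Feature. Strand and Method are deliberately ignored, because several pairs above differ only in strand (WBsf019227/WBsf038813) or only in Method (WBsf919641/WBsf919607).

```python
# Sketch only: reads GFF2-style lines like those listed above
# (9 whitespace-separated columns, the last being the attribute text)
# and reports locations that carry more than one Feature.
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

FEATURE_ID = re.compile(r'Feature "(WBsf\d+)"')

Location = Tuple[str, int, int]  # (chromosome, start, end)
Hit = Tuple[str, str, str]       # (Feature ID, Method/source column, strand)


def find_duplicate_features(lines: Iterable[str]) -> Dict[Location, List[Hit]]:
    """Group Features by (chromosome, start, end), ignoring strand and Method."""
    by_location: Dict[Location, List[Hit]] = defaultdict(list)
    for line in lines:
        cols = line.split(None, 8)  # keep the attribute column in one piece
        if len(cols) < 9:
            continue  # blank or malformed line
        chrom, method, _type, start, end, _score, strand, _frame, attrs = cols
        match = FEATURE_ID.search(attrs)
        if match:
            by_location[(chrom, int(start), int(end))].append(
                (match.group(1), method, strand))
    # Keep only locations shared by two or more Features.
    return {loc: hits for loc, hits in by_location.items() if len(hits) > 1}


if __name__ == "__main__":
    gff = [
        'CHROMOSOME_I TF_binding_site TF_binding_site 10245805 10245811 . - . '
        'Feature "WBsf019227" ; TF_ID "WBTranscriptionFactor000101" ; TF_name "LAG-1"',
        'CHROMOSOME_I TF_binding_site TF_binding_site 10245805 10245811 . + . '
        'Feature "WBsf038813" ; TF_ID "WBTranscriptionFactor000101" ; TF_name "LAG-1"',
        'CHROMOSOME_X regulatory_region misc_feature 5100882 5100888 . + . Feature "WBsf919622"',
    ]
    for (chrom, start, end), hits in find_duplicate_features(gff).items():
        print(chrom, start, end, [h[0] for h in hits])
```

To vet a new Feature, one could append its GFF line to a dump of the existing regulatory Features and see whether its location appears in the output. Exact-coordinate matching will not catch overlapping but non-identical regions; that would need an interval-overlap check instead.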
<br />
<br />
<br />
'''Papers to curate'''<br />
* I have taken the Sequence Feature Papers from RT and assigned the four of us 10 papers each.<br />
* Just in case there is any paper here that has already been curated for Interaction, I've added the number of Interaction objects linked to by the paper. If this is not helpful, feel free to remove it.<br />
<br />
<pre><br />
WBPaper00002925 Daniela Interaction: 8<br />
WBPaper00004568 Daniela Interaction: 2<br />
WBPaper00005842 Daniela<br />
WBPaper00024328 Daniela Interaction: 4<br />
WBPaper00028802 Daniela<br />
WBPaper00028915 Daniela<br />
WBPaper00029140 Daniela<br />
WBPaper00029255 Daniela Interaction: 55<br />
WBPaper00030829 Daniela Interaction: 5<br />
WBPaper00030933 Daniela Interaction: 1<br />
WBPaper00003929 Gary Interaction: 25<br />
Time: 4 hours<br />
New objects: none - 6 existing Features were corrected and updated. (WBsf019182, WBsf019181, WBsf019179, WBsf019177, WBsf019178, WBsf019180)<br />
Location of information: body text and in a figure in the supplemental table of the paper WBPaper00006376<br />
Other curator data: "lin-41 and lin-42 are negatively regulated by let-7"<br />
Comments: This is Gary Ruvkun's (Nature 2000) let-7 paper that started the miRNA field!<br />
Comments: This is being marked as a pair of 'binding_site' Features because this is miRNA binding, not TF binding.<br />
See also: let-7, lin-41 binding WBPaper00006376 PubMed: 14729570 (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC324419/) - This is the paper that defined the binding sites.<br />
Comments (XW): curated one new cis-regulation object with Gary's WBsfs<br />
<br />
WBPaper00005044 Gary Interaction: 20<br />
Time: 3 hours<br />
New objects: made ire-1 binding site Feature (WBsf977530)<br />
Location of information: body of this paper and WBPaper00005036<br />
Other curator data: IRE-1 is a stress-activated endonuclease resident in the ER that is conserved in all known eukaryotes. IRE-1 mediated unconventional splicing of an intron from xbp-1 mRNA controls expression of the encoded transcription factor and is required for upregulation of most UPR target genes.<br />
See also: WBPaper00005036<br />
WBPaper00005971 Gary Interaction: 16 This has already been done by Gary: Feature WBsf718850<br />
WBPaper00024440 Gary Interaction: 1<br />
Time: 3 hours<br />
New objects: Made WBsf977532, WBsf977533 M-2 motif/daf-12 binding sites for ceh-22<br />
New objects: Made WBsf977534, WBsf977535 M-2 motif/daf-12 binding sites for myo-2<br />
Location of information: body of this paper and Figure Supplemental 3<br />
Other curator data: "We conclude that regulation of myo-2 and ceh-22 during dauer development depends critically on the M-2 motif."<br />
Comments: lots of hypothetical binding sites for daf-12 in 90-odd genes, but these have not been experimentally confirmed.<br />
WBPaper00028816 Gary<br />
Time: 30 mins<br />
New objects: none<br />
Location of information: body of this paper<br />
Other curator data: "recruitment sites are widely distributed along X to bind the DCC and nucleate DCC spreading to X regions lacking recruitment sites"<br />
Comments: This paper determines the A and B motifs of the Dosage Compensation Complex (DCC). There are hundreds of thousands of sites of varying strength.<br />
WBPaper00028986 Gary Interaction: 16<br />
WBPaper00029181 Gary<br />
WBPaper00029327 Gary Interaction: 33<br />
WBPaper00030849 Gary<br />
WBPaper00031355 Gary Interaction: 5<br />
WBPaper00004181 Mary Ann Interaction: 8 <br />
No Features. Took 5 mins to read. <br />
WBPaper00005056 Mary Ann <br />
Time: 35 mins<br />
New objects: 37 - not yet curated.<br />
Location of information: Mainly in fig. 1, but also in body text.<br />
Comments: TRTTKRY element in promoter region of T05E11.3, D2096.6, C44H4.1, ZK816.4, ceh-22, <br />
tph-1, M05B5.2, myo-2. Bound by pha-4 <br />
WBPaper00006429 Mary Ann Interaction: 3<br />
Time: 1/2hr<br />
New objects: 1<br />
Location of information: In body of text and fig. 3B<br />
Comments: Alludes to GATA binding sites, but no experimental data. <br />
WBPaper00024505 Mary Ann<br />
Time: <br />
New objects: <br />
Location of information: <br />
Comments: <br />
WBPaper00028849 Mary Ann<br />
Time: 45 mins<br />
New objects: 1<br />
Location of information: In body of text and fig. 3B<br />
Comments: <br />
WBPaper00029058 Mary Ann Interaction: 50<br />
WBPaper00029190 Mary Ann<br />
WBPaper00029406 Mary Ann Interaction: 145<br />
WBPaper00030877 Mary Ann Interaction: 8<br />
WBPaper00031471 Mary Ann<br />
WBPaper00004482 Xiaodong Interaction: 2<br />
Time: two hours<br />
New objects: request new features for more interaction objects<br />
Location of information: figure 5A, <br />
Comments: HOX/CEH-20 binding sites in the hlh-8 promoter. The Features will be used in cis-regulation and physical interaction objects.<br />
Duplication: yes. Currently only one Interaction object (WBInteraction000001291) is associated with the paper.<br />
WBPaper00005609 Xiaodong Interaction: 20 This has already been done by Margaret: Feature WBsf019097, WBsf038788, WBsf038789, WBsf038790, WBsf038791, WBsf038793, WBsf038794, WBsf019098, WBsf038792<br />
Time: 50 mins<br />
New objects: three interaction objects<br />
Location of information: In body of text and fig. 5<br />
Comments: Only WBsf038790 and WBsf038791 are useful. Not sure why the other Features exist; they seem to be redundant.<br />
WBPaper00024189 Xiaodong<br />
Time: 10 mins<br />
Comments: no features in this paper<br />
WBPaper00024981 Xiaodong<br />
Time: 40 mins<br />
New objects: request through RT<br />
Location of information: figure 2<br />
Comments: MED-1 binding sites in end-1 and end-3 promoters<br />
WBPaper00028910 Xiaodong Interaction: 7<br />
Time: 30 mins<br />
New objects: request through RT<br />
Location of information: in body text and figure 4<br />
Comments: MEF-2 direct binding site in str-1 promoter and a few minimal regulatory regions mentioned in body text<br />
<br />
WBPaper00029109 Xiaodong<br />
<br />
WBPaper00029229 Xiaodong Interaction: 3<br />
WBPaper00030809 Xiaodong<br />
WBPaper00030931 Xiaodong Interaction: 18<br />
WBPaper00031565 Xiaodong Interaction: 1<br />
</pre><br />
* I suggest we curate 5 of our papers before the jamboree, taking notes as described above and any other things that arise during your curation.</div>Xdwanghttps://wiki.wormbase.org/index.php?title=Working_Group:Sequence_Features&diff=24670Working Group:Sequence Features2014-09-05T14:35:20Z<p>Xdwang: </p>
<hr />
<div>'''Topics'''<br />
<br />
* Display of Sequence Features on the website<br />
** And Transcription factors and Gene_product binds.<br />
* Streamline the workflow<br />
** How can we automatically identify Sequence Feature papers?<br />
* Sample paper curation<br />
** Each person prepares a list of papers<br />
* Assign meaningful names to the Public_name field, e.g. 'distal enhancer 1' instead of 'DE1'<br />
** gw3 - I disagree with this. We should use the name that is used to describe the region in the paper, rather than making up our own names. I have had to go back and re-annotate regions after Xiaodong found problems with my stuff several months after I had originally done it and I really needed to be able to unambiguously identify which region the paper was talking about by using the Public_name field and matching this up to the name of the region in the paper. Users may well need to do this as well if they are reading the paper and looking at the Features marking up the regions.<br />
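For the question of automatically identifying Sequence Feature papers, a very simple starting point would be keyword scoring of titles or abstracts. This is a swapped-in, naive technique offered only as a sketch: the keyword list, the threshold and the input format (a dict mapping paper ID to abstract text) are all hypothetical and would need tuning against papers already known to contain Sequence Features.

```python
# Sketch only: a naive keyword score for triaging candidate Sequence
# Feature papers from their title/abstract text. The keyword list and
# threshold are illustrative assumptions, not an agreed WormBase method.
import re
from typing import Dict, List, Tuple

KEYWORDS = (
    "binding site", "enhancer", "promoter", "cis-regulatory",
    "regulatory element", "regulatory region", "recognition element",
)


def keyword_score(text: str) -> int:
    """Count case-insensitive whole-phrase keyword hits (plural 's' allowed).

    Overlapping phrases such as 'cis-regulatory region' are counted once
    per matching keyword.
    """
    lowered = text.lower()
    return sum(
        len(re.findall(r"\b" + re.escape(k) + r"s?\b", lowered)) for k in KEYWORDS
    )


def triage(abstracts: Dict[str, str], threshold: int = 2) -> List[Tuple[str, int]]:
    """Return (paper ID, score) pairs at or above threshold, highest score first."""
    scored = [(pid, keyword_score(text)) for pid, text in abstracts.items()]
    return sorted((p for p in scored if p[1] >= threshold), key=lambda p: -p[1])
```

Scores above the threshold would only flag papers for a curator to read; they are not a classification.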
<br />
<br />
'''Pre-Jamboree prep'''<br />
<br />
* Suggestions for work<br />
** Curate some (5? 10?) papers from our lists and collate data as follows:<br />
*** How long did each paper take to curate?<br />
*** How many new objects did you add to WormBase?<br />
*** Where in the paper was the information (e.g. supplementary data, figure legends, within text)?<br />
*** Did you need to contact the author for more information?<br />
*** Did you come across data for other curators?<br />
**** gw3 - I don't know in detail what data other curators require. It would be useful to know this so I can summarise data for others.<br />
<br />
'''The Jamboree 15-17th Sept 2014'''<br />
<br />
* Suggestions for work<br />
** Discuss results from above<br />
** Work through some papers which have already been curated and have different data types (e.g. promoter and gene regulation) to identify best curation practices.<br />
** Explore browsing capabilities through WormMine?<br />
<br />
<br />
'''Matters Arising'''<br />
<br />
* Things noted, not necessarily related to the topic at hand<br />
** There are many duplicates of Interaction objects in geneace, with or without two leading digits. Sometimes there is only the shorter form.<br />
** The Features WBsf919641 and WBsf919607 are at the same position. Both are DAF-3 binding sites. WBsf919641 has Method "binding_site", while WBsf919607 has Method "TF_binding_site", which I think is more correct as DAF-3 is a TF. WBsf919641 cites paper WBPaper00004526 and WBsf919607 cites WBPaper00003384. WBsf919641 is associated with WBGene00202278, WBsf919607 with WBGene00003514 (myo-2). The paper WBPaper00004526 appears to be about a PEB-1 binding site, not a DAF-3 site; I made the Feature WBsf919609 for the PEB-1 site based on WBPaper00004526 in the last round of curation. Something is not right here.<br />
** The Features WBsf019227 and WBsf038813 are at the same position, but on opposite strands. Both are for a LAG-1 binding site associated with the gene lin-11. One is from paper WBPaper00005357, the other from WBPaper00032298. WBsf019227 needs to be retired and merged into WBsf038813.<br />
** WBsf038819 has a Method of 'regulatory_region', but its Public_name and Description describe it as an Enhancer. Which is it?<br />
** WBsf919669 is an Enhancer, but it has a Transcription_factor tag (WBTranscriptionFactor000101). It is far too large to be a binding site for a transcription factor, so I think this tag should be removed.<br />
** 393 out of the 500 TF_binding_site Features have an incorrect SO_term tag. It should be set to: SO:0000235<br />
** 10 Features with Method = "TF_binding_site" are lacking a Transcription_factor tag.<br />
<br />
'''Duplicated Features'''<br />
The following is a set of duplicated regulatory Features. I (Gary) am probably the main culprit, as I did not check whether there was a pre-existing Feature.<br />
How can we avoid this in future?<br />
*These two need to be merged - I suggest WBsf019227 be retired as it is in the opposite sense to its gene.<br />
**CHROMOSOME_I TF_binding_site TF_binding_site 10245805 10245811 . - . Feature "WBsf019227" ; TF_ID "WBTranscriptionFactor000101" ; TF_name "LAG-1"<br />
**CHROMOSOME_I TF_binding_site TF_binding_site 10245805 10245811 . + . Feature "WBsf038813" ; TF_ID "WBTranscriptionFactor000101" ; TF_name "LAG-1"<br />
*These two need to be merged - I suggest WBsf019221 be retired as it doesn't have the Interaction objects.<br />
**CHROMOSOME_I TF_binding_site TF_binding_site 10245835 10245841 . + . Feature "WBsf019221" ; TF_ID "WBTranscriptionFactor000101" ; TF_name "LAG-1"<br />
**CHROMOSOME_I TF_binding_site TF_binding_site 10245835 10245841 . + . Feature "WBsf038814" ; TF_ID "WBTranscriptionFactor000101" ; TF_name "LAG-1"<br />
*These two need to be merged - no difference between them. The SO_term is wrong - needs to be SO:0000235<br />
**CHROMOSOME_II TF_binding_site TF_binding_site 10950131 10950138 . + . Feature "WBsf019124" ; TF_ID "WBTranscriptionFactor000014" ; TF_name "SKN-1"<br />
**CHROMOSOME_II TF_binding_site TF_binding_site 10950131 10950138 . + . Feature "WBsf019126" ; TF_ID "WBTranscriptionFactor000014" ; TF_name "SKN-1"<br />
*Two papers - one says this is a MEX-1 recognition element and the other says this is a MEX-3 recognition element - are they both right?<br />
**CHROMOSOME_III binding_site binding_site 4804863 4804877 . - . Feature "WBsf899528"<br />
**CHROMOSOME_III binding_site binding_site 4804863 4804877 . - . Feature "WBsf899543"<br />
*Two papers - one says this is a MEX-1 recognition element and the other says this is a MEX-3 recognition element - are they both right?<br />
**CHROMOSOME_III binding_site binding_site 4804911 4804928 . - . Feature "WBsf899526"<br />
**CHROMOSOME_III binding_site binding_site 4804911 4804928 . - . Feature "WBsf899542"<br />
*These are from the same paper - one says it is "PUF-8 recognition element (PRE-1)" and the other says it is "PUF-8 recognition element (PRE-2)". Are they both right?<br />
**CHROMOSOME_III binding_site binding_site 4805138 4805145 . - . Feature "WBsf899537"<br />
**CHROMOSOME_III binding_site binding_site 4805138 4805145 . - . Feature "WBsf899538"<br />
*These are from the same paper - one says it is "TF LIN-1 binding site S11 in pJW5" and the other is "TF LIN-1 binding site S20 in pJW5" - looks like an error to me - Gary to redo this.<br />
**CHROMOSOME_III TF_binding_site TF_binding_site 7540351 7540361 . - . Feature "WBsf919592" ; TF_ID "WBTranscriptionFactor000135" ; TF_name "LIN-1"<br />
**CHROMOSOME_III TF_binding_site TF_binding_site 7540351 7540361 . - . Feature "WBsf919594" ; TF_ID "WBTranscriptionFactor000135" ; TF_name "LIN-1"<br />
*These two need to be merged - I suggest WBsf047654 be retired as it contains less information.<br />
**CHROMOSOME_III regulatory_region misc_feature 7540972 7540992 . - . Feature "WBsf047654"<br />
**CHROMOSOME_III regulatory_region misc_feature 7540972 7540992 . - . Feature "WBsf919589"<br />
*These two need to be merged - they are from different papers. I suggest WBsf047505 be retired as it has an incorrect SO_term - needs to be SO:0000235<br />
**CHROMOSOME_IV TF_binding_site TF_binding_site 2306593 2306606 . - . Feature "WBsf047478" ; TF_ID "WBTranscriptionFactor000052" ; TF_name "DAF-19"<br />
**CHROMOSOME_IV TF_binding_site TF_binding_site 2306593 2306606 . - . Feature "WBsf047505" ; TF_ID "WBTranscriptionFactor000052" ; TF_name "DAF-19"<br />
*These two need to be merged - they are from different papers. I suggest WBsf019088 be retired as it has an incorrect SO_term - needs to be SO:0000235<br />
**CHROMOSOME_V TF_binding_site TF_binding_site 10671938 10671944 . + . Feature "WBsf019088" ; TF_ID "WBTranscriptionFactor000061" ; TF_name "CEH-22"<br />
**CHROMOSOME_V TF_binding_site TF_binding_site 10671938 10671944 . + . Feature "WBsf919536" ; TF_ID "WBTranscriptionFactor000061" ; TF_name "CEH-22"<br />
*These three need to be merged. They are identical. The SO_term is wrong - needs to be SO:0000235<br />
**CHROMOSOME_V TF_binding_site TF_binding_site 11882353 11882360 . + . Feature "WBsf216760" ; TF_ID "WBTranscriptionFactor000126" ; TF_name "CEH-6"<br />
**CHROMOSOME_V TF_binding_site TF_binding_site 11882353 11882360 . + . Feature "WBsf216762" ; TF_ID "WBTranscriptionFactor000126" ; TF_name "CEH-6"<br />
**CHROMOSOME_V TF_binding_site TF_binding_site 11882353 11882360 . + . Feature "WBsf216764" ; TF_ID "WBTranscriptionFactor000126" ; TF_name "CEH-6"<br />
*These three need to be merged. They are identical. The SO_term is wrong - needs to be SO:0000235<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 4100019 4100026 . - . Feature "WBsf216754" ; TF_ID "WBTranscriptionFactor000126" ; TF_name "CEH-6"<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 4100019 4100026 . - . Feature "WBsf216755" ; TF_ID "WBTranscriptionFactor000126" ; TF_name "CEH-6"<br />
*These two need to be merged. WBsf899549 should be retired as it is incorrect - this is not a TF binding site but a regulatory region.<br />
**CHROMOSOME_X regulatory_region misc_feature 5100882 5100888 . + . Feature "WBsf919622"<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 5100882 5100888 . + . Feature "WBsf899549"<br />
*These two need to be merged. WBsf899545 should be retired as it is incorrect - this is not a TF binding site but a regulatory region.<br />
**CHROMOSOME_X regulatory_region misc_feature 5100919 5100925 . + . Feature "WBsf919623"<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 5100919 5100925 . + . Feature "WBsf899545"<br />
*These two need to be merged. WBsf899547 should be retired as it is incorrect - this is not a TF binding site but a regulatory region.<br />
**CHROMOSOME_X regulatory_region misc_feature 5100970 5100976 . + . Feature "WBsf919625"<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 5100970 5100976 . + . Feature "WBsf899547"<br />
*These two need to be merged. WBsf899548 should be retired as it is incorrect - this is not a TF binding site but a regulatory region.<br />
**CHROMOSOME_X regulatory_region misc_feature 5100985 5100991 . + . Feature "WBsf919626"<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 5100985 5100991 . + . Feature "WBsf899548"<br />
*WBsf047482 cites paper WBPaper00025203 - this paper has since been merged with WBPaper00026601, and the citation should be updated to WBPaper00026601 in all 11 Features that use it.<br />
*WBsf042312 cites paper WBPaper00025051<br />
*These two Features are otherwise nearly identical and should be merged.<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 8940207 8940220 . - . Feature "WBsf042312" ; TF_ID "WBTranscriptionFactor000052" ; TF_name "DAF-19"<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 8940207 8940220 . - . Feature "WBsf047482" ; TF_ID "WBTranscriptionFactor000052" ; TF_name "DAF-19"<br />
*These two need to be merged. WBsf919641 should be retired as it is incorrect - this is a TF binding site, not just a binding_site.<br />
**CHROMOSOME_X binding_site binding_site 12467733 12467737 . + . Feature "WBsf919641"<br />
**CHROMOSOME_X TF_binding_site TF_binding_site 12467733 12467737 . + . Feature "WBsf919607" ; TF_ID "WBTranscriptionFactor000472" ; TF_name "DAF-3"<br />
<br />
<br />
<br />
'''Papers to curate'''<br />
* I have taken the Sequence Feature Papers from RT and assigned the four of us 10 papers each.<br />
* Just in case there is any paper here that has already been curated for Interaction, I've added the number of Interaction objects linked to by the paper. If this is not helpful, feel free to remove it.<br />
<br />
<pre><br />
WBPaper00002925 Daniela Interaction: 8<br />
WBPaper00004568 Daniela Interaction: 2<br />
WBPaper00005842 Daniela<br />
WBPaper00024328 Daniela Interaction: 4<br />
WBPaper00028802 Daniela<br />
WBPaper00028915 Daniela<br />
WBPaper00029140 Daniela<br />
WBPaper00029255 Daniela Interaction: 55<br />
WBPaper00030829 Daniela Interaction: 5<br />
WBPaper00030933 Daniela Interaction: 1<br />
WBPaper00003929 Gary Interaction: 25<br />
Time: 4 hours<br />
New objects: none - 6 existing Features were corrected and updated. (WBsf019182, WBsf019181, WBsf019179, WBsf019177, WBsf019178, WBsf019180)<br />
Location of information: body text and in a figure in the supplemental table of the paper WBPaper00006376<br />
Other curator data: "lin-41 and lin-42 are negatively regulated by let-7"<br />
Comments: This is Gary Ruvkun's (Nature 2000) let-7 paper that started the miRNA field!<br />
Comments: This is being marked as a pair of 'binding_site' Features because this is miRNA binding, not TF binding.<br />
See also: let-7, lin-41 binding WBPaper00006376 PubMed: 14729570 (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC324419/) - This is the paper that defined the binding sites.<br />
Comments (XW): curated one new cis-regulation object with Gary's WBsfs<br />
<br />
WBPaper00005044 Gary Interaction: 20<br />
Time: 3 hours<br />
New objects: made ire-1 binding site Feature (WBsf977530)<br />
Location of information: body of this paper and WBPaper00005036<br />
Other curator data: IRE-1 is a stress-activated endonuclease resident in the ER that is conserved in all known eukaryotes. IRE-1 mediated unconventional splicing of an intron from xbp-1 mRNA controls expression of the encoded transcription factor and is required for upregulation of most UPR target genes.<br />
See also: WBPaper00005036<br />
WBPaper00005971 Gary Interaction: 16 This has already been done by Gary: Feature WBsf718850<br />
WBPaper00024440 Gary Interaction: 1<br />
Time: 3 hours<br />
New objects: Made WBsf977532, WBsf977533 M-2 motif/daf-12 binding sites for ceh-22<br />
New objects: Made WBsf977534, WBsf977535 M-2 motif/daf-12 binding sites for myo-2<br />
Location of information: body of this paper and Figure Supplemental 3<br />
Other curator data: "We conclude that regulation of myo-2 and ceh-22 during dauer development depends critically on the M-2 motif."<br />
Comments: lots of hypothetical binding sites for daf-12 in 90-odd genes, but these have not been experimentally confirmed.<br />
WBPaper00028816 Gary<br />
Time: 30 mins<br />
New objects: none<br />
Location of information: body of this paper<br />
Other curator data: "recruitment sites are widely distributed along X to bind the DCC and nucleate DCC spreading to X regions lacking recruitment sites"<br />
Comments: This paper determines the A and B motifs of the Dosage Compensation Complex (DCC). There are hundreds of thousands of sites of varying strength.<br />
WBPaper00028986 Gary Interaction: 16<br />
WBPaper00029181 Gary<br />
WBPaper00029327 Gary Interaction: 33<br />
WBPaper00030849 Gary<br />
WBPaper00031355 Gary Interaction: 5<br />
WBPaper00004181 Mary Ann Interaction: 8 <br />
No Features. Took 5 mins to read. <br />
WBPaper00005056 Mary Ann <br />
Time: 35 mins<br />
New objects: 37 - not yet curated.<br />
Location of information: Mainly in fig. 1, but also in body text.<br />
Comments: TRTTKRY element in promoter region of T05E11.3, D2096.6, C44H4.1, ZK816.4, ceh-22, <br />
tph-1, M05B5.2, myo-2. Bound by pha-4 <br />
WBPaper00006429 Mary Ann Interaction: 3<br />
Time: 1/2hr<br />
New objects: 1<br />
Location of information: In body of text and fig. 3B<br />
Comments: Alludes to GATA binding sites, but no experimental data. <br />
WBPaper00024505 Mary Ann<br />
Time: <br />
New objects: <br />
Location of information: <br />
Comments: <br />
WBPaper00028849 Mary Ann<br />
Time: 45 mins<br />
New objects: 1<br />
Location of information: In body of text and fig. 3B<br />
Comments: <br />
WBPaper00029058 Mary Ann Interaction: 50<br />
WBPaper00029190 Mary Ann<br />
WBPaper00029406 Mary Ann Interaction: 145<br />
WBPaper00030877 Mary Ann Interaction: 8<br />
WBPaper00031471 Mary Ann<br />
WBPaper00004482 Xiaodong Interaction: 2<br />
Time: two hours<br />
New objects: request new features for more interaction objects<br />
Location of information: figure 5A, <br />
Comments: HOX/CEH-20 binding sites in the hlh-8 promoter. The Features will be used in cis-regulation and physical interaction objects.<br />
Duplication: yes. Currently only one Interaction object (WBInteraction000001291) is associated with the paper.<br />
WBPaper00005609 Xiaodong Interaction: 20 This has already been done by Margaret: Feature WBsf019097, WBsf038788, WBsf038789, WBsf038790, WBsf038791, WBsf038793, WBsf038794, WBsf019098, WBsf038792<br />
Time: 50 mins<br />
New objects: three interaction objects<br />
Location of information: In body of text and fig. 5<br />
Comments: Only WBsf038790 and WBsf038791 are useful. Not sure why the other Features exist; they seem to be redundant.<br />
WBPaper00024189 Xiaodong<br />
Time: 10 mins<br />
Comments: no features in this paper<br />
WBPaper00024981 Xiaodong<br />
Time: 40 mins<br />
New objects: request through RT<br />
Location of information: figure 2<br />
Comments: MED-1 binding sites in end-1 and end-3 promoters<br />
WBPaper00028910 Xiaodong Interaction: 7<br />
Time: 30 mins<br />
New objects: request through RT<br />
Location of information: in body text and figure 4<br />
Comments: MEF-2 direct binding site in str-1 promoter and a few minimal regulatory regions mentioned in body text<br />
<br />
WBPaper00029109 Xiaodong<br />
<br />
WBPaper00029229 Xiaodong Interaction: 3<br />
WBPaper00030809 Xiaodong<br />
WBPaper00030931 Xiaodong Interaction: 18<br />
WBPaper00031565 Xiaodong Interaction: 1<br />
</pre><br />
* I suggest we curate 5 of our papers before the jamboree, taking notes as described above and any other things that arise during your curation.</div>Xdwanghttps://wiki.wormbase.org/index.php?title=Working_Group:Sequence_Features&diff=24668Working Group:Sequence Features2014-09-04T20:29:23Z<p>Xdwang: </p>
<hr />
<div>'''Topics'''<br />
<br />
* Display of Sequence Features on the website<br />
** Also Transcription factors and Gene_product binds.<br />
* Streamline the workflow<br />
** How can we automatically identify Sequence Feature papers?<br />
* Sample papers curation <br />
** Prepare a paper list from each person<br />
* Assign meaningful names to the Public_name field, e.g. 'distal enhancer 1' instead of 'DE1'<br />
** gw3 - I disagree with this. We should use the name the paper uses to describe the region, rather than making up our own names. Several months after I had originally annotated some regions, Xiaodong found problems with them and I had to go back and re-annotate them. To do that I needed to identify unambiguously which region the paper was talking about, by matching the Public_name field to the name of the region in the paper. Users may well need to do this too when they read the paper and look at the Features marking up the regions.<br />
<br />
<br />
'''Pre-Jamboree prep'''<br />
<br />
* Suggestions for work<br />
** Curate some (5? 10?) papers from our lists and collate data as follows<br />
*** How long did each paper take to curate?<br />
*** How many new objects did you add to WormBase?<br />
*** Where in the paper was the information (e.g. supplementary data, figure legends, within text)?<br />
*** Did you need to contact the author for more information?<br />
*** Did you come across data for other curators?<br />
**** gw3 - I don't know in detail what data other curators require. It would be useful to know this so I can summarise data for others.<br />
<br />
'''The Jamboree 15-17th Sept 2014'''<br />
<br />
* Suggestions for work<br />
** Discuss results from above<br />
** Work through some papers which have already been curated and have different data types (e.g. promoter and gene regulation) to identify best curation practices.<br />
** Browsing capabilities through WormMine?<br />
<br />
<br />
'''Matters Arising'''<br />
<br />
* Things noted - not necessarily related to the topic at hand<br />
** There are many duplicate Interaction objects in geneace, with or without two leading digits. Sometimes only the shorter form exists.<br />
** The Features WBsf919641 and WBsf919607 are at the same position. Both are DAF-3 binding sites. WBsf919641 has Method "binding_site"; WBsf919607 has Method "TF_binding_site", which I think is more correct, as DAF-3 is a TF. WBsf919641 is based on WBPaper00004526; WBsf919607 is based on WBPaper00003384. WBsf919641 is associated with WBGene00202278; WBsf919607 is associated with WBGene00003514 (myo-2). WBPaper00004526 appears to be about a PEB-1 binding site, not a DAF-3 site. I made the Feature WBsf919609 for the PEB-1 site based on WBPaper00004526 in the last round of curation. Something is not right here.<br />
** The Features WBsf019227 and WBsf038813 are in the same position, but on opposite strands. Both are LAG-1 binding sites associated with the gene lin-11; one is from WBPaper00005357, the other from WBPaper00032298. WBsf019227 needs to be retired and merged into WBsf038813.<br />
** WBsf038819 has a Method of 'regulatory_region', but its Public_name and Description describe it as an Enhancer. Which is it?<br />
** WBsf919669 is an Enhancer, but it has a Transcription_factor tag (WBTranscriptionFactor000101). It is far too large to be a binding site for a transcription factor, so I think this tag should be removed.<br />
** 393 out of the 500 TF_binding_site Features have an incorrect SO_term tag. It should be set to: SO:0000235<br />
** 10 Features with Method = "TF_binding_site" are lacking a Transcription_factor tag.<br />
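The two TF_binding_site problems noted above (wrong SO_term, missing Transcription_factor tag) lend themselves to a scripted check. The following is only a minimal sketch under stated assumptions: it assumes the Features have already been exported into simple Python records, and the `Feature` class and `audit_tf_binding_sites` function are hypothetical names chosen to mirror the Method, SO_term and Transcription_factor tags. It does not use any real WormBase or ACeDB API.

```python
# Hypothetical sketch: flag TF_binding_site Features whose SO_term is not
# SO:0000235, or that have no Transcription_factor tag. The record layout
# below is an assumption, not the real WormBase/ACeDB export format.
from dataclasses import dataclass
from typing import List, Optional, Tuple

EXPECTED_SO_TERM = "SO:0000235"  # value stated in the note above


@dataclass
class Feature:
    name: str
    method: str
    so_term: Optional[str]
    transcription_factor: Optional[str]


def audit_tf_binding_sites(features: List[Feature]) -> Tuple[List[str], List[str]]:
    """Return (names with a wrong SO_term, names missing a Transcription_factor)."""
    wrong_so: List[str] = []
    missing_tf: List[str] = []
    for f in features:
        if f.method != "TF_binding_site":
            continue  # only TF_binding_site Features are checked
        if f.so_term != EXPECTED_SO_TERM:
            wrong_so.append(f.name)
        if not f.transcription_factor:
            missing_tf.append(f.name)
    return wrong_so, missing_tf
```

Counting the lengths of the two returned lists over a full export would give figures comparable to the "393 out of 500" and "10 Features" counts above, provided the export really carries these three tags.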
<br />
'''Papers to curate'''<br />
* I have taken the Sequence Feature Papers from RT and assigned the four of us 10 papers each.<br />
* Just in case there is any paper here that has already been curated for Interaction, I've added the number of Interaction objects linked to by the paper. If this is not helpful, feel free to remove it.<br />
<br />
<pre><br />
WBPaper00002925 Daniela Interaction: 8<br />
WBPaper00004568 Daniela Interaction: 2<br />
WBPaper00005842 Daniela<br />
WBPaper00024328 Daniela Interaction: 4<br />
WBPaper00028802 Daniela<br />
WBPaper00028915 Daniela<br />
WBPaper00029140 Daniela<br />
WBPaper00029255 Daniela Interaction: 55<br />
WBPaper00030829 Daniela Interaction: 5<br />
WBPaper00030933 Daniela Interaction: 1<br />
WBPaper00003929 Gary Interaction: 25<br />
Time: 4 hours<br />
New objects: none - 6 existing Features were corrected and updated. (WBsf019182, WBsf019181, WBsf019179, WBsf019177, WBsf019178, WBsf019180)<br />
Location of information: body text and in a figure in the supplemental table of the paper WBPaper00006376<br />
Other curator data: "lin-41 and lin-42 are negatively regulated by let-7"<br />
Comments: This is Gary Ruvkun's (Nature 2000) let-7 paper that started the miRNA field!<br />
Comments: This is being marked as a pair of 'binding_site' Features because this is miRNA binding, not TF binding.<br />
See also: let-7, lin-41 binding WBPaper00006376 PubMed: 14729570 (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC324419/) - This is the paper that defined the binding sites.<br />
Comments (XW): curated one new cis-regulation object with Gary's WBsfs<br />
<br />
WBPaper00005044 Gary Interaction: 20<br />
Time: 3 hours<br />
New objects: made ire-1 binding site Feature (WBsf977530)<br />
Location of information: body of this paper and WBPaper00005036<br />
Other curator data: IRE-1 is a stress-activated endonuclease resident in the ER that is conserved in all known eukaryotes. IRE-1 mediated unconventional splicing of an intron from xbp-1 mRNA controls expression of the encoded transcription factor and is required for upregulation of most UPR target genes.<br />
See also: WBPaper00005036<br />
WBPaper00005971 Gary Interaction: 16 This has already been done by Gary: Feature WBsf718850<br />
WBPaper00024440 Gary Interaction: 1<br />
Time: 3 hours<br />
New objects: Made WBsf977532, WBsf977533 M-2 motif/daf-12 binding sites for ceh-22<br />
New objects: Made WBsf977534, WBsf977535 M-2 motif/daf-12 binding sites for myo-2<br />
Location of information: body of this paper and Supplemental Figure 3<br />
Other curator data: "We conclude that regulation of myo-2 and ceh-22 during dauer development depends critically on the M-2 motif."<br />
Comments: Lots of hypothetical DAF-12 binding sites in 90-odd genes, but these have not been experimentally confirmed.<br />
WBPaper00028816 Gary<br />
Time: 30 mins<br />
New objects: none<br />
Location of information: body of this paper<br />
Other curator data: "recruitment sites are widely distributed along X to bind the DCC and nucleate DCC spreading to X regions lacking recruitment sites"<br />
Comments: This paper determines the A and B motifs of the Dosage Compensation Complex (DCC). There are hundreds of thousands of sites of varying strength.<br />
WBPaper00028986 Gary Interaction: 16<br />
WBPaper00029181 Gary<br />
WBPaper00029327 Gary Interaction: 33<br />
WBPaper00030849 Gary<br />
WBPaper00031355 Gary Interaction: 5<br />
WBPaper00004181 Mary Ann Interaction: 8 <br />
No Features. Took 5 mins to read. <br />
WBPaper00005056 Mary Ann <br />
Time: 35 mins<br />
New objects: 37 - not yet curated.<br />
Location of information: Mainly in fig. 1, but also in body text.<br />
Comments: TRTTKRY element in the promoter regions of T05E11.3, D2096.6, C44H4.1, ZK816.4, ceh-22, <br />
tph-1, M05B5.2 and myo-2. Bound by PHA-4.<br />
WBPaper00006429 Mary Ann Interaction: 3<br />
Time: 1/2hr<br />
New objects: 1<br />
Location of information: In body of text and fig. 3B<br />
Comments: Alludes to GATA binding sites, but no experimental data. <br />
WBPaper00024505 Mary Ann<br />
Time: <br />
New objects: <br />
Location of information: <br />
Comments: <br />
WBPaper00028849 Mary Ann<br />
Time: 45 mins<br />
New objects: 1<br />
Location of information: In body of text and fig. 3B<br />
Comments: <br />
WBPaper00029058 Mary Ann Interaction: 50<br />
WBPaper00029190 Mary Ann<br />
WBPaper00029406 Mary Ann Interaction: 145<br />
WBPaper00030877 Mary Ann Interaction: 8<br />
WBPaper00031471 Mary Ann<br />
WBPaper00004482 Xiaodong Interaction: 2<br />
Time: two hours<br />
New objects: request new features for more interaction objects<br />
Location of information: figure 5A<br />
Comments: HOX/CEH-20 binding sites in the hlh-8 promoter. These Features will be used in cis-regulation and physical interaction objects.<br />
Duplication: yes. Only one interaction object (WBInteraction000001291) is currently associated with the paper.<br />
WBPaper00005609 Xiaodong Interaction: 20 This has already been done by Margaret: Feature WBsf019097, WBsf038788, WBsf038789, WBsf038790, WBsf038791, WBsf038793, WBsf038794, WBsf019098, WBsf038792<br />
Time: 50 mins<br />
New objects: three interaction objects<br />
Location of information: In body of text and fig. 5<br />
Comments: Only WBsf038790 and WBsf038791 are useful. Not sure why the other Features exist; they seem to be redundant.<br />
WBPaper00024189 Xiaodong<br />
Time: 10 mins<br />
Comments: No Features in this paper.<br />
WBPaper00024981 Xiaodong<br />
WBPaper00028910 Xiaodong Interaction: 7<br />
WBPaper00029109 Xiaodong<br />
WBPaper00029229 Xiaodong Interaction: 3<br />
WBPaper00030809 Xiaodong<br />
WBPaper00030931 Xiaodong Interaction: 18<br />
WBPaper00031565 Xiaodong Interaction: 1<br />
</pre><br />
* I suggest we curate 5 of our papers before the jamboree, taking notes as described above and any other things that arise during your curation.</div>Xdwanghttps://wiki.wormbase.org/index.php?title=Working_Group:Sequence_Features&diff=24667Working Group:Sequence Features2014-09-04T19:51:59Z<p>Xdwang: </p>
<hr />
<div>'''Topics'''<br />
<br />
* Display of Sequence Features on the website<br />
** Also Transcription factors and Gene_product binds.<br />
* Streamline the workflow<br />
** How can we automatically identify Sequence Feature papers?<br />
* Sample papers curation <br />
** Prepare a paper list from each person<br />
* Assign meaningful names to the Public_name field, e.g. 'distal enhancer 1' instead of 'DE1'<br />
** gw3 - I disagree with this. We should use the name the paper uses to describe the region, rather than making up our own names. Several months after I had originally annotated some regions, Xiaodong found problems with them and I had to go back and re-annotate them. To do that I needed to identify unambiguously which region the paper was talking about, by matching the Public_name field to the name of the region in the paper. Users may well need to do this too when they read the paper and look at the Features marking up the regions.<br />
<br />
<br />
'''Pre-Jamboree prep'''<br />
<br />
* Suggestions for work<br />
** Curate some (5? 10?) papers from our lists and collate data as follows<br />
*** How long did each paper take to curate?<br />
*** How many new objects did you add to WormBase?<br />
*** Where in the paper was the information (e.g. supplementary data, figure legends, within text)?<br />
*** Did you need to contact the author for more information?<br />
*** Did you come across data for other curators?<br />
**** gw3 - I don't know in detail what data other curators require. It would be useful to know this so I can summarise data for others.<br />
<br />
'''The Jamboree 15-17th Sept 2014'''<br />
<br />
* Suggestions for work<br />
** Discuss results from above<br />
** Work through some papers which have already been curated and have different data types (e.g. promoter and gene regulation) to identify best curation practices.<br />
** Browsing capabilities through WormMine?<br />
<br />
<br />
'''Matters Arising'''<br />
<br />
* Things noted - not necessarily related to the topic at hand<br />
** There are many duplicate Interaction objects in geneace, with or without two leading digits. Sometimes only the shorter form exists.<br />
** The Features WBsf919641 and WBsf919607 are at the same position. Both are DAF-3 binding sites. WBsf919641 has Method "binding_site"; WBsf919607 has Method "TF_binding_site", which I think is more correct, as DAF-3 is a TF. WBsf919641 is based on WBPaper00004526; WBsf919607 is based on WBPaper00003384. WBsf919641 is associated with WBGene00202278; WBsf919607 is associated with WBGene00003514 (myo-2). WBPaper00004526 appears to be about a PEB-1 binding site, not a DAF-3 site. I made the Feature WBsf919609 for the PEB-1 site based on WBPaper00004526 in the last round of curation. Something is not right here.<br />
** The Features WBsf019227 and WBsf038813 are in the same position, but on opposite strands. Both are LAG-1 binding sites associated with the gene lin-11; one is from WBPaper00005357, the other from WBPaper00032298. WBsf019227 needs to be retired and merged into WBsf038813.<br />
** WBsf038819 has a Method of 'regulatory_region', but its Public_name and Description describe it as an Enhancer. Which is it?<br />
** WBsf919669 is an Enhancer, but it has a Transcription_factor tag (WBTranscriptionFactor000101). It is far too large to be a binding site for a transcription factor, so I think this tag should be removed.<br />
** 393 out of the 500 TF_binding_site Features have an incorrect SO_term tag. It should be set to: SO:0000235<br />
** 10 Features with Method = "TF_binding_site" are lacking a Transcription_factor tag.<br />
<br />
'''Papers to curate'''<br />
* I have taken the Sequence Feature Papers from RT and assigned the four of us 10 papers each.<br />
* Just in case there is any paper here that has already been curated for Interaction, I've added the number of Interaction objects linked to by the paper. If this is not helpful, feel free to remove it.<br />
<br />
<pre><br />
WBPaper00002925 Daniela Interaction: 8<br />
WBPaper00004568 Daniela Interaction: 2<br />
WBPaper00005842 Daniela<br />
WBPaper00024328 Daniela Interaction: 4<br />
WBPaper00028802 Daniela<br />
WBPaper00028915 Daniela<br />
WBPaper00029140 Daniela<br />
WBPaper00029255 Daniela Interaction: 55<br />
WBPaper00030829 Daniela Interaction: 5<br />
WBPaper00030933 Daniela Interaction: 1<br />
WBPaper00003929 Gary Interaction: 25<br />
Time: 4 hours<br />
New objects: none - 6 existing Features were corrected and updated. (WBsf019182, WBsf019181, WBsf019179, WBsf019177, WBsf019178, WBsf019180)<br />
Location of information: body text and in a figure in the supplemental table of the paper WBPaper00006376<br />
Other curator data: "lin-41 and lin-42 are negatively regulated by let-7"<br />
Comments: This is Gary Ruvkun's (Nature 2000) let-7 paper that started the miRNA field!<br />
Comments: This is being marked as a pair of 'binding_site' Features because this is miRNA binding, not TF binding.<br />
See also: let-7, lin-41 binding WBPaper00006376 PubMed: 14729570 (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC324419/) - This is the paper that defined the binding sites.<br />
Comments (XW): curated one new cis-regulation object with Gary's WBsfs<br />
<br />
WBPaper00005044 Gary Interaction: 20<br />
Time: 3 hours<br />
New objects: made ire-1 binding site Feature (WBsf977530)<br />
Location of information: body of this paper and WBPaper00005036<br />
Other curator data: IRE-1 is a stress-activated endonuclease resident in the ER that is conserved in all known eukaryotes. IRE-1 mediated unconventional splicing of an intron from xbp-1 mRNA controls expression of the encoded transcription factor and is required for upregulation of most UPR target genes.<br />
See also: WBPaper00005036<br />
WBPaper00005971 Gary Interaction: 16 This has already been done by Gary: Feature WBsf718850<br />
WBPaper00024440 Gary Interaction: 1<br />
Time: 3 hours<br />
New objects: Made WBsf977532, WBsf977533 M-2 motif/daf-12 binding sites for ceh-22<br />
New objects: Made WBsf977534, WBsf977535 M-2 motif/daf-12 binding sites for myo-2<br />
Location of information: body of this paper and Supplemental Figure 3<br />
Other curator data: "We conclude that regulation of myo-2 and ceh-22 during dauer development depends critically on the M-2 motif."<br />
Comments: Lots of hypothetical DAF-12 binding sites in 90-odd genes, but these have not been experimentally confirmed.<br />
WBPaper00028816 Gary<br />
Time: 30 mins<br />
New objects: none<br />
Location of information: body of this paper<br />
Other curator data: "recruitment sites are widely distributed along X to bind the DCC and nucleate DCC spreading to X regions lacking recruitment sites"<br />
Comments: This paper determines the A and B motifs of the Dosage Compensation Complex (DCC). There are hundreds of thousands of sites of varying strength.<br />
WBPaper00028986 Gary Interaction: 16<br />
WBPaper00029181 Gary<br />
WBPaper00029327 Gary Interaction: 33<br />
WBPaper00030849 Gary<br />
WBPaper00031355 Gary Interaction: 5<br />
WBPaper00004181 Mary Ann Interaction: 8 <br />
No Features. Took 5 mins to read. <br />
WBPaper00005056 Mary Ann <br />
Time: 35 mins<br />
New objects: 37 - not yet curated.<br />
Location of information: Mainly in fig. 1, but also in body text.<br />
Comments: TRTTKRY element in the promoter regions of T05E11.3, D2096.6, C44H4.1, ZK816.4, ceh-22, <br />
tph-1, M05B5.2 and myo-2. Bound by PHA-4.<br />
WBPaper00006429 Mary Ann Interaction: 3<br />
Time: 1/2hr<br />
New objects: 1<br />
Location of information: In body of text and fig. 3B<br />
Comments: Alludes to GATA binding sites, but no experimental data. <br />
WBPaper00024505 Mary Ann<br />
Time: <br />
New objects: <br />
Location of information: <br />
Comments: <br />
WBPaper00028849 Mary Ann<br />
Time: 45 mins<br />
New objects: 1<br />
Location of information: In body of text and fig. 3B<br />
Comments: <br />
WBPaper00029058 Mary Ann Interaction: 50<br />
WBPaper00029190 Mary Ann<br />
WBPaper00029406 Mary Ann Interaction: 145<br />
WBPaper00030877 Mary Ann Interaction: 8<br />
WBPaper00031471 Mary Ann<br />
WBPaper00004482 Xiaodong Interaction: 2<br />
Time: two hours<br />
New objects: request new features for more interaction objects<br />
Location of information: figure 5A<br />
Comments: HOX/CEH-20 binding sites in the hlh-8 promoter. These Features will be used in cis-regulation and physical interaction objects.<br />
Duplication: yes. Only one interaction object (WBInteraction000001291) is currently associated with the paper.<br />
WBPaper00005609 Xiaodong Interaction: 20 This has already been done by Margaret: Feature WBsf019097, WBsf038788, WBsf038789, WBsf038790, WBsf038791, WBsf038793, WBsf038794, WBsf019098, WBsf038792<br />
Time: 50 mins<br />
New objects: three interaction objects<br />
Location of information: In body of text and fig. 5<br />
Comments: Only WBsf038790 and WBsf038791 are useful. Not sure why the other Features exist; they seem to be redundant.<br />
WBPaper00024189 Xiaodong<br />
WBPaper00024981 Xiaodong<br />
WBPaper00028910 Xiaodong Interaction: 7<br />
WBPaper00029109 Xiaodong<br />
WBPaper00029229 Xiaodong Interaction: 3<br />
WBPaper00030809 Xiaodong<br />
WBPaper00030931 Xiaodong Interaction: 18<br />
WBPaper00031565 Xiaodong Interaction: 1<br />
</pre><br />
* I suggest we curate 5 of our papers before the jamboree, taking notes as described above and any other things that arise during your curation.</div>Xdwanghttps://wiki.wormbase.org/index.php?title=Working_Group:Sequence_Features&diff=24642Working Group:Sequence Features2014-09-04T15:13:38Z<p>Xdwang: </p>
<hr />
<div>'''Topics'''<br />
<br />
* Display of Sequence Features on the website<br />
** Also Transcription factors and Gene_product binds.<br />
* Streamline the workflow<br />
** How can we automatically identify Sequence Feature papers?<br />
* Sample papers curation <br />
** Prepare a paper list from each person<br />
* Assign meaningful names to the Public_name field, e.g. 'distal enhancer 1' instead of 'DE1'<br />
** gw3 - I disagree with this. We should use the name the paper uses to describe the region, rather than making up our own names. Several months after I had originally annotated some regions, Xiaodong found problems with them and I had to go back and re-annotate them. To do that I needed to identify unambiguously which region the paper was talking about, by matching the Public_name field to the name of the region in the paper. Users may well need to do this too when they read the paper and look at the Features marking up the regions.<br />
<br />
<br />
'''Pre-Jamboree prep'''<br />
<br />
* Suggestions for work<br />
** Curate some (5? 10?) papers from our lists and collate data as follows<br />
*** How long did each paper take to curate?<br />
*** How many new objects did you add to WormBase?<br />
*** Where in the paper was the information (e.g. supplementary data, figure legends, within text)?<br />
*** Did you need to contact the author for more information?<br />
*** Did you come across data for other curators?<br />
**** gw3 - I don't know in detail what data other curators require. It would be useful to know this so I can summarise data for others.<br />
<br />
'''The Jamboree 15-17th Sept 2014'''<br />
<br />
* Suggestions for work<br />
** Discuss results from above<br />
** Work through some papers which have already been curated and have different data types (e.g. promoter and gene regulation) to identify best curation practices.<br />
** Browsing capabilities through WormMine?<br />
<br />
<br />
'''Matters Arising'''<br />
<br />
* Things noted - not necessarily related to the topic at hand<br />
** There are many duplicate Interaction objects in geneace, with or without two leading digits. Sometimes only the shorter form exists.<br />
** The Features WBsf919641 and WBsf919607 are at the same position. Both are DAF-3 binding sites. WBsf919641 has Method "binding_site"; WBsf919607 has Method "TF_binding_site", which I think is more correct, as DAF-3 is a TF. WBsf919641 is based on WBPaper00004526; WBsf919607 is based on WBPaper00003384. WBsf919641 is associated with WBGene00202278; WBsf919607 is associated with WBGene00003514 (myo-2). WBPaper00004526 appears to be about a PEB-1 binding site, not a DAF-3 site. I made the Feature WBsf919609 for the PEB-1 site based on WBPaper00004526 in the last round of curation. Something is not right here.<br />
** The Features WBsf019227 and WBsf038813 are in the same position, but on opposite strands. Both are LAG-1 binding sites associated with the gene lin-11; one is from WBPaper00005357, the other from WBPaper00032298. WBsf019227 needs to be retired and merged into WBsf038813.<br />
** WBsf038819 has a Method of 'regulatory_region', but its Public_name and Description describe it as an Enhancer. Which is it?<br />
** WBsf919669 is an Enhancer, but it has a Transcription_factor tag (WBTranscriptionFactor000101). It is far too large to be a binding site for a transcription factor, so I think this tag should be removed.<br />
<br />
'''Papers to curate'''<br />
* I have taken the Sequence Feature Papers from RT and assigned the four of us 10 papers each.<br />
* Just in case there is any paper here that has already been curated for Interaction, I've added the number of Interaction objects linked to by the paper. If this is not helpful, feel free to remove it.<br />
<br />
<pre><br />
WBPaper00002925 Daniela Interaction: 8<br />
WBPaper00004568 Daniela Interaction: 2<br />
WBPaper00005842 Daniela<br />
WBPaper00024328 Daniela Interaction: 4<br />
WBPaper00028802 Daniela<br />
WBPaper00028915 Daniela<br />
WBPaper00029140 Daniela<br />
WBPaper00029255 Daniela Interaction: 55<br />
WBPaper00030829 Daniela Interaction: 5<br />
WBPaper00030933 Daniela Interaction: 1<br />
WBPaper00003929 Gary Interaction: 25<br />
Time: 4 hours<br />
New objects: none - 6 existing Features were corrected and updated. (WBsf019182, WBsf019181, WBsf019179, WBsf019177, WBsf019178, WBsf019180)<br />
Location of information: body text and in a figure in the supplemental table of the paper WBPaper00006376<br />
Other curator data: "lin-41 and lin-42 are negatively regulated by let-7"<br />
Comments: This is Gary Ruvkun's (Nature 2000) let-7 paper that started the miRNA field!<br />
Comments: This is being marked as a pair of 'binding_site' Features because this is miRNA binding, not TF binding.<br />
See also: let-7, lin-41 binding WBPaper00006376 PubMed: 14729570 (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC324419/) - This is the paper that defined the binding sites.<br />
Comments (XW): curated one new cis-regulation object with Gary's WBsfs<br />
<br />
WBPaper00005044 Gary Interaction: 20<br />
Time: 3 hours<br />
New objects: made ire-1 binding site Feature (WBsf977530)<br />
Location of information: body of this paper and WBPaper00005036<br />
Other curator data: IRE-1 is a stress-activated endonuclease resident in the ER that is conserved in all known eukaryotes. IRE-1 mediated unconventional splicing of an intron from xbp-1 mRNA controls expression of the encoded transcription factor and is required for upregulation of most UPR target genes.<br />
See also: WBPaper00005036<br />
WBPaper00005971 Gary Interaction: 16 This has already been done by Gary: Feature WBsf718850<br />
WBPaper00024440 Gary Interaction: 1<br />
Time: 3 hours<br />
New objects: Made WBsf977532, WBsf977533 M-2 motif/daf-12 binding sites for ceh-22<br />
New objects: Made WBsf977534, WBsf977535 M-2 motif/daf-12 binding sites for myo-2<br />
Location of information: body of this paper and Supplemental Figure 3<br />
Other curator data: "We conclude that regulation of myo-2 and ceh-22 during dauer development depends critically on the M-2 motif."<br />
Comments: Lots of hypothetical DAF-12 binding sites in 90-odd genes, but these have not been experimentally confirmed.<br />
WBPaper00028816 Gary<br />
Time: 30 mins<br />
New objects: none<br />
Location of information: body of this paper<br />
Other curator data: "recruitment sites are widely distributed along X to bind the DCC and nucleate DCC spreading to X regions lacking recruitment sites"<br />
Comments: This paper determines the A and B motifs of the Dosage Compensation Complex (DCC). There are hundreds of thousands of sites of varying strength.<br />
WBPaper00028986 Gary Interaction: 16<br />
WBPaper00029181 Gary<br />
WBPaper00029327 Gary Interaction: 33<br />
WBPaper00030849 Gary<br />
WBPaper00031355 Gary Interaction: 5<br />
WBPaper00004181 Mary Ann Interaction: 8 <br />
No Features. Took 5 mins to read. <br />
WBPaper00005056 Mary Ann <br />
Time: 35 mins<br />
New objects: 37 - not yet curated.<br />
Location of information: Mainly in fig. 1, but also in body text.<br />
Comments: TRTTKRY element in the promoter regions of T05E11.3, D2096.6, C44H4.1, ZK816.4, ceh-22, <br />
tph-1, M05B5.2 and myo-2. Bound by PHA-4.<br />
WBPaper00006429 Mary Ann Interaction: 3<br />
Time: 1/2hr<br />
New objects: 1<br />
Location of information: In body of text and fig. 3B<br />
Comments: Alludes to GATA binding sites, but no experimental data. <br />
WBPaper00024505 Mary Ann<br />
Time: <br />
New objects: <br />
Location of information: <br />
Comments: <br />
WBPaper00028849 Mary Ann<br />
Time: 45 mins<br />
New objects: 1<br />
Location of information: In body of text and fig. 3B<br />
Comments: <br />
WBPaper00029058 Mary Ann Interaction: 50<br />
WBPaper00029190 Mary Ann<br />
WBPaper00029406 Mary Ann Interaction: 145<br />
WBPaper00030877 Mary Ann Interaction: 8<br />
WBPaper00031471 Mary Ann<br />
WBPaper00004482 Xiaodong Interaction: 2<br />
Time: two hours<br />
New objects: request new features for more interaction objects<br />
Location of information: figure 5A<br />
Comments: HOX/CEH-20 binding sites in the hlh-8 promoter. These Features will be used in cis-regulation and physical interaction objects.<br />
Duplication: yes. Only one interaction object (WBInteraction000001291) is currently associated with the paper.<br />
WBPaper00005609 Xiaodong Interaction: 20 This has already been done by Margaret: Feature WBsf019097, WBsf038788, WBsf038789, WBsf038790, WBsf038791, WBsf038793, WBsf038794, WBsf019098, WBsf038792<br />
WBPaper00024189 Xiaodong<br />
WBPaper00024981 Xiaodong<br />
WBPaper00028910 Xiaodong Interaction: 7<br />
WBPaper00029109 Xiaodong<br />
WBPaper00029229 Xiaodong Interaction: 3<br />
WBPaper00030809 Xiaodong<br />
WBPaper00030931 Xiaodong Interaction: 18<br />
WBPaper00031565 Xiaodong Interaction: 1<br />
</pre><br />
* I suggest we curate 5 of our papers before the jamboree, taking notes as described above and any other things that arise during your curation.</div>Xdwanghttps://wiki.wormbase.org/index.php?title=Working_Group:Sequence_Features&diff=24612Working Group:Sequence Features2014-09-03T15:55:15Z<p>Xdwang: </p>
<hr />
<div>'''Topics'''<br />
<br />
* Display of Sequence Features on the website<br />
** Also Transcription factors and Gene_product binds.<br />
* Streamline the workflow<br />
** How can we automatically identify Sequence Feature papers?<br />
* Sample papers curation <br />
** Prepare a paper list from each person<br />
* Assign meaningful names to the Public_name field, e.g. 'distal enhancer 1' instead of 'DE1'<br />
** gw3 - I disagree with this. We should use the name the paper uses to describe the region, rather than making up our own names. Several months after I had originally annotated some regions, Xiaodong found problems with them and I had to go back and re-annotate them. To do that I needed to identify unambiguously which region the paper was talking about, by matching the Public_name field to the name of the region in the paper. Users may well need to do this too when they read the paper and look at the Features marking up the regions.<br />
<br />
<br />
'''Pre-Jamboree prep'''<br />
<br />
* Suggestions for work<br />
** Curate some (5? 10?) papers from our lists and collate data as follows<br />
*** How long did each paper take to curate?<br />
*** How many new objects did you add to WormBase?<br />
*** Where in the paper was the information (e.g. supplementary data, figure legends, within text)?<br />
*** Did you need to contact the author for more information?<br />
*** Did you come across data for other curators?<br />
**** gw3 - I don't know in detail what data other curators require. It would be useful to know this so I can summarise data for others.<br />
<br />
'''The Jamboree 15-17th Sept 2014'''<br />
<br />
* Suggestions for work<br />
** Discuss results from above<br />
** Work through some papers which have already been curated and have different data types (e.g. promoter and gene regulation) to identify best curation practices.<br />
** Browsing capabilities through WormMine?<br />
<br />
<br />
'''Matters Arising'''<br />
<br />
* Things noted - not necessarily related to the topic at hand<br />
** There are many duplicate Interaction objects in geneace, with or without two leading digits. Sometimes only the shorter form exists.<br />
** The Features WBsf919641 and WBsf919607 are at the same position. Both are DAF-3 binding sites. WBsf919641 has Method "binding_site"; WBsf919607 has Method "TF_binding_site", which I think is more correct, as DAF-3 is a TF. WBsf919641 is based on WBPaper00004526; WBsf919607 is based on WBPaper00003384. WBsf919641 is associated with WBGene00202278; WBsf919607 is associated with WBGene00003514 (myo-2). WBPaper00004526 appears to be about a PEB-1 binding site, not a DAF-3 site. I made the Feature WBsf919609 for the PEB-1 site based on WBPaper00004526 in the last round of curation. Something is not right here.<br />
<br />
<br />
'''Papers to curate'''<br />
* I have taken the Sequence Feature Papers from RT and assigned the four of us 10 papers each.<br />
* Just in case there is any paper here that has already been curated for Interaction, I've added the number of Interaction objects linked to by the paper. If this is not helpful, feel free to remove it.<br />
<br />
<pre><br />
WBPaper00002925 Daniela Interaction: 8<br />
WBPaper00004568 Daniela Interaction: 2<br />
WBPaper00005842 Daniela<br />
WBPaper00024328 Daniela Interaction: 4<br />
WBPaper00028802 Daniela<br />
WBPaper00028915 Daniela<br />
WBPaper00029140 Daniela<br />
WBPaper00029255 Daniela Interaction: 55<br />
WBPaper00030829 Daniela Interaction: 5<br />
WBPaper00030933 Daniela Interaction: 1<br />
WBPaper00003929 Gary Interaction: 25<br />
Time: 4 hours<br />
New objects: none - 6 existing Features were corrected and updated. (WBsf019182, WBsf019181, WBsf019179, WBsf019177, WBsf019178, WBsf019180)<br />
Location of information: body text and in a figure in the supplemental table of the paper WBPaper00006376<br />
Other curator data: "lin-41 and lin-42 are negatively regulated by let-7"<br />
Comments: This is Gary Ruvkun's (Nature 2000) let-7 paper that started the miRNA field!<br />
Comments: These are marked as a pair of 'binding_site' Features because this is miRNA binding, not TF binding.<br />
See also: let-7, lin-41 binding WBPaper00006376 PubMed: 14729570 (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC324419/) - This is the paper that defined the binding sites.<br />
WBPaper00005044 Gary Interaction: 20<br />
Time: 3 hours<br />
New objects: made ire-1 binding site Feature (WBsf977530)<br />
Location of information: body of this paper and WBPaper00005036<br />
Other curator data: IRE-1 is a stress-activated endonuclease resident in the ER that is conserved in all known eukaryotes. IRE-1 mediated unconventional splicing of an intron from xbp-1 mRNA controls expression of the encoded transcription factor and is required for upregulation of most UPR target genes.<br />
See also: WBPaper00005036<br />
WBPaper00005971 Gary Interaction: 16 This has already been done by Gary: Feature WBsf718850<br />
WBPaper00024440 Gary Interaction: 1<br />
Time: 3 hours<br />
New objects: Made WBsf977532, WBsf977533 M-2 motif/daf-12 binding sites for ceh-22<br />
New objects: Made WBsf977534, WBsf977535 M-2 motif/daf-12 binding sites for myo-2<br />
Location of information: body of this paper and Figure Supplemental 3<br />
Other curator data: "We conclude that regulation of myo-2 and ceh-22 during dauer development depends critically on the M-2 motif."<br />
Comments: lots of hypothetical binding sites for daf-12 in 90-odd genes, but these have not been experimentally confirmed.<br />
WBPaper00028816 Gary<br />
WBPaper00028986 Gary Interaction: 16<br />
WBPaper00029181 Gary<br />
WBPaper00029327 Gary Interaction: 33<br />
WBPaper00030849 Gary<br />
WBPaper00031355 Gary Interaction: 5<br />
WBPaper00004181 Mary Ann Interaction: 8 <br />
No Features. Took 5 mins to read. <br />
WBPaper00005056 Mary Ann <br />
Time: 35 mins<br />
New objects: 37 - not yet curated.<br />
Location of information: Mainly in fig. 1, but also in body text.<br />
Comments: TRTTKRY element in promoter regions of T05E11.3, D2096.6, C44H4.1, ZK816.4, ceh-22, tph-1, M05B5.2, myo-2; bound by PHA-4<br />
WBPaper00006429 Mary Ann Interaction: 3<br />
Time: 1/2hr<br />
New objects: 1<br />
Location of information: In body of text and fig. 3B<br />
Comments: Alludes to GATA binding sites, but no experimental data. <br />
WBPaper00024505 Mary Ann<br />
Time: <br />
New objects: <br />
Location of information: <br />
Comments: <br />
WBPaper00028849 Mary Ann<br />
Time: 45 mins<br />
New objects: 1<br />
Location of information: In body of text and fig. 3B<br />
Comments: <br />
WBPaper00029058 Mary Ann Interaction: 50<br />
WBPaper00029190 Mary Ann<br />
WBPaper00029406 Mary Ann Interaction: 145<br />
WBPaper00030877 Mary Ann Interaction: 8<br />
WBPaper00031471 Mary Ann<br />
WBPaper00004482 Xiaodong Interaction: 2<br />
Time: two hours<br />
New objects: request new features for more interaction objects<br />
Location of information: figure 5A<br />
Comments: HOX/CEH-20 binding sites in the hlh-8 promoter. The Features will be used in cis-regulation and physical interaction objects<br />
Duplication: yes. Only one Interaction object (WBInteraction000001291) is currently associated with the paper<br />
WBPaper00005609 Xiaodong Interaction: 20 This has already been done by Margaret: Feature WBsf019097, WBsf038788, WBsf038789, WBsf038790, WBsf038791, WBsf038793, WBsf038794, WBsf019098, WBsf038792<br />
WBPaper00024189 Xiaodong<br />
WBPaper00024981 Xiaodong<br />
WBPaper00028910 Xiaodong Interaction: 7<br />
WBPaper00029109 Xiaodong<br />
WBPaper00029229 Xiaodong Interaction: 3<br />
WBPaper00030809 Xiaodong<br />
WBPaper00030931 Xiaodong Interaction: 18<br />
WBPaper00031565 Xiaodong Interaction: 1<br />
</pre><br />
* I suggest we curate 5 of our papers before the jamboree, taking notes as described above and any other things that arise during your curation.</div>Xdwanghttps://wiki.wormbase.org/index.php?title=WormBase-Caltech_Weekly_Calls&diff=24234WormBase-Caltech Weekly Calls2014-07-24T19:36:27Z<p>Xdwang: /* July 24, 2014 */</p>
<hr />
<div>= Previous Years =<br />
<br />
[[WormBase-Caltech_Weekly_Calls_2009|2009 Meetings]]<br />
<br />
[[WormBase-Caltech_Weekly_Calls_2011|2011 Meetings]]<br />
<br />
[[WormBase-Caltech_Weekly_Calls_2012|2012 Meetings]]<br />
<br />
[[WormBase-Caltech_Weekly_Calls_2013|2013 Meetings]]<br />
<br />
<br />
= 2014 Meetings = <br />
<br />
[[WormBase-Caltech_Weekly_Calls_January_2014|January]]<br />
<br />
[[WormBase-Caltech_Weekly_Calls_February_2014|February]]<br />
<br />
[[WormBase-Caltech_Weekly_Calls_March_2014|March]]<br />
<br />
[[WormBase-Caltech_Weekly_Calls_April_2014|April]]<br />
<br />
[[WormBase-Caltech_Weekly_Calls_May_2014|May]]<br />
<br />
[[WormBase-Caltech_Weekly_Calls_June_2014|June]]<br />
<br />
<br />
== July 3, 2014 ==<br />
<br />
=== WS245 Upload ===<br />
* Tuesday, July 29th local citace upload (to Wen)<br />
* Friday, August 1st citace upload to Hinxton<br />
<br />
=== Data models freeze ===<br />
* Next Friday, July 11th<br />
<br />
=== Automated Concise Descriptions ===<br />
* James and Ranjana are making progress<br />
* Several examples of automated descriptions, proof of concept<br />
* Based on 3 GO categories and homology<br />
* Add tissue-specific expression, and go live<br />
* Should add thousands of (automated) descriptions<br />
* Next, pick terms of higher granularity<br />
* We should ask users what they want, e.g., is the C. elegans ortholog of an Ascaris suum gene essential?<br />
* Manual annotations in the future? How can edits be tracked? Should they get a new tag?<br />
* Automated descriptions for all objects?<br />
* We will likely separate manual from automated descriptions to ease editing and tracking<br />
* GO terms can be added as IDs and be converted (automatically) to the term name with hyperlink on the web site<br />
<br />
=== New Tazendra machine ===<br />
* Up and running at dhcp-52-226.caltech.edu; IP address: 131.215.52.226<br />
* Juancarlos will sync the current Tazendra data to the new machine after meeting<br />
* Curators will test forms and dumpers and compare dump outputs from the new and old machines<br />
* We can make the new machine live later in the month<br />
<br />
<br />
== July 10, 2014 ==<br />
<br />
=== CANTO ===<br />
* CANTO - PomBase's Community Annotation Tool<br />
** http://www.ncbi.nlm.nih.gov/pubmed/24574118<br />
** Link to demo version: http://curation.pombase.org/demo <br />
** Can try demo version with any S. pombe or S. cerevisiae gene names/ids<br />
** GO annotations, mutant phenotypes, etc.<br />
<br />
=== Construct model/annotation ===<br />
* Need to add "Construct" under list of "Detection methods" in the ?Interaction model<br />
* Chris will send corrected models.wrm file to Karen<br />
* Karen will send models.wrm to Paul Davis by tomorrow<br />
* The Interaction OA dumper will need to be updated to accommodate constructs and transgene changes (Chris will work on this with Juancarlos)<br />
<br />
=== BioGRID data ===<br />
* Mike Tyers agreed to release physical interaction data for C. elegans<br />
* We will import data into the Interaction OA and WormBase<br />
* Attribution? Reference to an ?Analysis object? Would we need an additional field for Analysis objects? Could be referenced as a Database accession number<br />
* Chris will discuss with Kimberly<br />
<br />
<br />
<br />
== July 17, 2014 ==<br />
<br />
=== SAB ===<br />
* Travel arrangements for the October SAB in England and the GO meeting in Spain<br />
* Have advisory board members been contacted? Are they all available?<br />
<br />
=== Construct ===<br />
* Model and OA are now live<br />
* Transgenes have been replaced with appropriate construct objects<br />
<br />
=== WOBr ===<br />
* Pushing code to the live web site soon<br />
<br />
== July 24, 2014 ==<br />
<br />
=== SAB ===<br />
*Happening on Oct 7 and 8, 2014, with advisors<br />
**Monday whole day<br />
**Tuesday till 1pm <br />
*Pre-meeting on Oct 6, Sunday afternoon<br />
<br />
=== Nomenclature pages on the website ===<br />
* See email from Mary Ann sent 18th July.<br />
* Will still use a single link under 'Nomenclature' in the black banner at the bottom of the wormbase.org page, but add a link in the widget that redirects to a different page for 'other nematode'<br />
<br />
=== WormBook chapter on WormBase ===<br />
* To replace the NAR article<br />
* To show users what data are available and how to access them<br />
** Gene function view, e.g., gene expression, anatomy<br />
**GBrowse<br />
**WormMine<br />
**Process/Topic page<br />
<br />
=== BioTapestry ===<br />
*Canonical networks can be pre-constructed and displayed on the Topic page<br />
*Issue with dynamic display<br />
<br />
=== Expression data ===<br />
*Data from the Waterston group, 130 diagrams<br />
* Topo maps, mountain view data<br />
<br />
=== Misc issues ===<br />
*The 'curated_by' tag is going to be retired; a -D file has been generated<br />
*The construct OA works well.<br />
<br />
== July 31, 2014 ==<br />
<br />
=== Agenda ===<br />
* Topic 1<br />
* Topic 2</div>Xdwang