Difference between revisions of "First-pass schedule, instructions, automation"

From WormBaseWiki
 
(47 intermediate revisions by 2 users not shown)
Line 1: Line 1:
[[http://www.wormbase.org/wiki/index.php?title=Caltech_documentation&action=submit back]]
+
[[Caltech documentation]]<br>
 
+
[[First-pass to Curation]]<br>
=[[First-pass data types explained]]=
+
[[First-pass_flagging_pipelines]]
[http://www.wormbase.org/wiki/index.php/First-pass_data_types_explained Link to current(old) curator firstpass form and explanations of datatypes.]
 
 
 
=[[Texpresso/Author/Curator interim form]]=
 
[http://www.wormbase.org/wiki/index.php/Texpresso/Author/Curator_interim_form Link to new form that combines data flagging from all three sources.]
 
  
 
=First-Pass Rotation=
 
=First-Pass Rotation=
 
{| border=2 {{table}}
 
{| border=2 {{table}}
 
| align="center" style="background:#f0f0f0;"|'''First-Pass Curator'''
 
| align="center" style="background:#f0f0f0;"|'''First-Pass Curator'''
| align="center" style="background:#f0f0f0;"|'''Two-week period'''
+
| align="center" style="background:#f0f0f0;"|'''Round 1'''
| align="center" style="background:#f0f0f0;"|'''Landmarks'''
+
| align="center" style="background:#f0f0f0;"|'''Round 2'''
|-
 
| Karen  ||2/16-3/1||
 
|-
 
| Ranjana ||3/2-3/15||
 
|-
 
| Raymond ||3/16-3/29||
 
|-
 
| Jolene ||3/30-4/12|| New forms active
 
|-
 
| Gary ||4/13-4/26||
 
|-
 
| Xiaodong ||4/27-5/10||
 
|-
 
| Erich ||5/11-5/24||
 
|-
 
| Wen ||5/25- 6/7||
 
|-
 
| Kimberly ||6/8-6/21||
 
|}
 
 
 
=Assigned Tasks=
 
{| border=2
 
 
 
| align="center" style="background:#f0f0f0;"|'''Who'''
 
| align="center" style="background:#f0f0f0;"|'''When'''
 
| align="center" style="background:#f0f0f0;"|'''Task'''
 
| align="center" style="background:#f0f0f0;"|'''Goal'''
 
|-
 
|-
 
|Gary ||current ||With Ruihua: automate RNAi flags |||
 
|-
 
|Wen ||'''DONE''' ||With Juancarlos: automate antibody flags |||
 
|-
 
|Jolene||current ||with Arun: automate allele extraction|||
 
|-
 
|Ranjana||current ||with Juancarlos: automate human diseases|||
 
 
|-
 
|-
|Karen||current ||with Michael and Juancarlos: automate drug term flagging|||
+
| Karen ||2/16-3/1|| 7/6-7/19
 
|-
 
|-
|Jolene||current ||with Ruihua: create curation textpresso webpage studio for automating newmutant sentence identification/extraction for phenotype curation.||
+
| Xiaodong ||3/2-3/15 ||7/20-8/1
 
|-
 
|-
| Wen  ||'''DONE''' ||make foreign language tag for Journals/Articles ||Clear FP list of non-English language articles, which can't be curated
+
| Raymond ||3/16-3/29 ||8/3-8/16
 
|-
 
|-
| Andrei  ||'''DONE'''||Analyze author fill-out form|| Provide a summary of what works and what does not, and seek improvements to the procedure.
+
| Jolene ||3/30-4/12: New forms active||8/17-8/30
 
|-
 
|-
| Andrei ||'''DONE'''||Work out the correspondence of fields between the author and curator forms.|||
+
| Gary ||4/13-4/26: Author forms sent out||8/31-9/13
 
|-
 
|-
| [[Juancarlos]] ||'''DONE'''|| Waiting for consensus on what to put in fields for author and curator for FP.||  
+
| Ranjana ||4/27-5/10: Author feedback coming in ||9/14-9/27
Raymond suggests: have a first-pass form for curators that shows the author's
submission for the curator's approval (e.g. a tick). Approved information is sent
for data extraction, along with whatever information the curator adds. If the
first-pass curator disapproves of the author's input (by not ticking), that input
is not processed further or sent, but it is still stored as-is in the database.
In the next phase, results from automated first-pass via Textpresso could be
treated the same way as the author's. The ultimate goal is to maximize the number
of fields that need no curator approval. Sources of first-pass curation should be
clearly distinguished by a person ID (Textpresso will be assigned one).
 
 
 
Juancarlos is okay with this. While I'm leaning towards assigning the author
response to a PersonID, if this is not going to be used as evidence anywhere,
the corresponding email is possibly the more accurate evidence, since the
recipient may pass it on to someone who is not the WBPerson that email is
assigned to. In that case Textpresso wouldn't need a person ID. I'm still leaning
toward using IDs, though; I'm just not sure they reflect the right things if we
ever want that in WB or something like that.
 
 
 
Juancarlos also needs to know how curators want to enter data. For any given
Paper-Field, would curators want to be able to enter unlimited entries, and make
them invalid to delete? Would you prefer the current system, where there's only
a single box where everyone mushes in all data? Would you care about the history
of deleted things? |
 
 
|-
 
|-
|Juancarlos || '''DONE''' || Resolve duplicates || There are many papers on the first-pass list that are already first-passed. Most of these papers are duplicates and have two WBPaperID assignments. Is there a way to resolve this?
+
| Erich ||5/11-5/24||9/28-10/11
 
|-
 
|-
|Juancarlos ||'''DONE''' ||Assign Publication Type 'Review' to all papers that are annotated as 'review' in the Comments section of the FP curation form || Remove these papers from the corpus that textpresso searches for data type patterns.
+
| Wen ||5/25-6/7||10/12-10/25
 
|-
 
|-
|Arun||'''DONE''' ||Increase scans for transgene objects to occur daily and change pipeline to run on new papers only|||
+
| Kimberly ||6/8-6/21||10/26-11/8
 
|-
 
|-
|Juancarlos||'''DONE''' ||Sort papers on the first-pass checkout list by whether or not they have been passed through Textpresso||Curators can now focus on papers whose data fields have already been filled in by Textpresso.
+
|WormBase re-evaluation||6/22-7/5: IWM||(?)
 
|}
 
|}
  
==Unassigned tasks and comments that need more discussion==
 
* We do have a record of data type curation for each paper, is there some way of combining the first pass curation with the curation status form?
 
-The curation status form gets flagged data from the first pass form (don't think I understand the question) -- Juancarlos
 
  
* How does the false positive work?
+
----
- the words "false positive" get appended (or prepended, I forget and can't find an example) to the text field -- Juancarlos
 
  
* Curator preference for FP remarks: can people deal with not getting detailed notes about their data type and where it exists in the paper, or should this be a mandatory part of first-pass?
+
=Automation=
 +
[http://goldturtle.caltech.edu/wcat/ Textpresso's Automation Explanation and Schedule  http://goldturtle.caltech.edu/wcat/]
  
* Discrepancy between FP papers and total papers curated.  For some data types, curators get to the paper before the FP curator; it would be good to know that a curator has already touched it.  Is there a way to mark the paper in the FP list as curated for a specific data type but not others?
 
- I thought that in the checkout section they showed as "RNAi only" or something like that. -- Juancarlos
 
  
=First-Pass a paper=
+
----
Access papers on the [http://tazendra.caltech.edu/~postgres/cgi-bin/wbpaper_editor.cgi WBPaperEditor] page
 
  
==Pick a paper and access the curation form==
+
=First-pass forms=
* Go to http://tazendra.caltech.edu/~postgres/cgi-bin/wbpaper_editor.cgi  
+
*[http://www.wormbase.org/wiki/index.php/Texpresso/Author/Curator_interim_form New textpresso/author/curator table]
* Choose your name
+
**[http://tazendra.caltech.edu/~postgres/cgi-bin/curator_first_pass.cgi Example of new curator first-pass curation.cgi]
* Scroll down the page and select '''"Not Curated Plus Textpresso!"''' ''The bodies of these papers have passed through Textpresso, and all data types found by the automated pipeline should be reported in the appropriate fields.''
+
*[http://www.wormbase.org/wiki/index.php/First-pass_data_types_explained  Old curator first-pass table]
* Scroll/Page down to a paper and select "Curate!" 
+
*[[Author first-pass form]]
 +
*[[Genetics Author pass form]]
 +
*[[Data types used on first-pass forms]]
  
+
----
''Alternatively''
 
* Access the curation.cgi from the WBPaperID page itself
 
* Select the WBPaperID from left column to take you to the paper page-- '''ONLY SELECT PAPERS FROM WBPaper0030000 AND LATER'''
 
* Select first-pass curate
 
''Note: the paper pdf can be accessed from the paper page along with supplemental materials.''
 
  
Either action takes you to the curation.cgi (SEE BELOW)
+
=Instructions for curators=
 +
[http://www.wormbase.org/wiki/index.php/Instructions_for_curators Overview for curators (for the new form)]
  
==The firstpass page [http://tazendra.caltech.edu/~postgres/cgi-bin/curation.cgi curation.cgi]==
 
*At the top of the page are links to:
 
**The tazendra site map (other forms)
 
**Documentation for the set up, paths from, and changes to the curation.cgi form
 
**Guidelines for the form, which include some summary information about the fields and features; they were written by Raymond in 2001 and need some updating.
 
  
*For each data type you can check the box or enter text:
+
----
**Check the box = a "yes" is entered into the field
 
**Enter text = the text is recorded
 
  
*E-mail is set to send by default if there is a "yes" (from a check) or text in the data type flag box.
+
=Postgres data for individual data types=
 +
*[http://www.wormbase.org/wiki/index.php/FP_curator_comments_for_St.Louis_and_Sanger_structure_correction_data_type St.Louis and Sanger structure correction data type curator comments]
  
*When done, you can preview the submission by selecting "Preview!"
 
**If acceptable, select "New Entry!" and flags will be sent to data curators.
 
  
'''Saving data''':
+
----
If you want to come back to the paper later, you can choose to 'Save!' the data.
 
If you do this the next time you 'Load!' the data, ''the form will NOT send flags to curators unless you specify it to do so.''
 
You can restore the default 'send all to curators that have data type info' through the buttons at the top right of the form.
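
The flag and save behaviour described above can be summarised in a short sketch. This is not the curation.cgi code, which is not shown on this page. The function names, the `loaded_from_save` and `send_override` parameters, and the choice that typed text takes precedence over a ticked box are all assumptions made for illustration.

```python
def flag_value(checked, text):
    """Value stored for one data type field.

    Assumption: typed text takes precedence over a ticked box;
    the page does not say what happens when both are supplied.
    """
    text = text.strip()
    if text:
        return text
    return "yes" if checked else ""


def will_send_flag(value, loaded_from_save=False, send_override=False):
    """Whether a flag e-mail goes to the data curator for this field.

    Default: send whenever the field holds "yes" or text.
    After 'Save!' then 'Load!': send only if the curator re-enables it
    (send_override is a hypothetical stand-in for the buttons at the
    top right of the form).
    """
    if not value:
        return False
    if loaded_from_save:
        return send_override
    return True
```

For example, `will_send_flag(flag_value(True, ""))` is true on a fresh form, while the same field on a form that was saved and reloaded sends nothing until the default is restored.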
 
  
==Adding new gene paper connections==
+
[[User:Kyook|kjy]] 19:43, 9 February 2009 (EST)
You can add gene paper connections through the WBPaper editor page.  You can search directly for the paper on this page or you can access the paper by hitting "DISPLAY ALL!" and choosing the paper from the left column link.  You will be taken to a summary page for the paper, which at the bottom of the page gives you an area to confirm or add new genes associated with that paper.
 
  
  
=Automating data type flagging=
+
[[Category:Curation]]
 
 
===COMPLEX constructions===
 
* Gene Regulation
 
Arun built training sets from Andrei's first-pass flagged papers: one set of papers flagged positive for gene regulation and one set flagged negative, each containing about a thousand papers. A support vector machine (SVM) then produces index matrices from the two training sets. Arun can use representative words and phrases that appear frequently in gene-regulation papers to fine-tune the SVM. A completely new set of papers will then be tested to see how well this works.
 
 
 
* Expression
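
To make the gene-regulation approach above more concrete, here is a minimal sketch of a linear SVM trained on bag-of-words counts from positive and negative example texts. It is not Arun's implementation, which this page does not describe in detail. The training sentences, the function names, and the hyperparameters (`epochs`, `lr`, `lam`) are invented for illustration, and a real run would use about a thousand full papers per set rather than four sentences.

```python
from collections import Counter


def featurize(text):
    """Bag-of-words counts over lowercase, whitespace-separated tokens."""
    return Counter(text.lower().split())


def score(weights, text):
    """Linear decision value; positive means 'looks like gene regulation'."""
    return sum(weights.get(w, 0.0) * c for w, c in featurize(text).items())


def train_linear_svm(docs, labels, epochs=20, lr=0.1, lam=0.01):
    """Subgradient descent on the L2-regularised hinge loss.

    labels are +1 (flagged positive) or -1 (flagged negative).
    Returns a dict mapping each word to its learned weight.
    """
    weights = {}
    for _ in range(epochs):
        for text, y in zip(docs, labels):
            margin = y * score(weights, text)
            # Regularisation step: shrink every weight slightly.
            for w in weights:
                weights[w] *= 1.0 - lr * lam
            # Hinge-loss step: only examples inside the margin push the weights.
            if margin < 1.0:
                for w, c in featurize(text).items():
                    weights[w] = weights.get(w, 0.0) + lr * y * c
    return weights


# Made-up training sentences standing in for the two training sets.
positive = ["lin-14 represses lin-28 transcription",
            "daf-16 activates sod-3 expression"]
negative = ["worms were grown on plates",
            "strains were maintained at 20C"]

weights = train_linear_svm(positive + negative, [1, 1, -1, -1])
```

A new text is then flagged when `score(weights, text)` is positive. Words that never appeared in training contribute nothing to the score.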
 
 
 
===SIMPLE patterns===
 
 
 
* RNAi via pattern recognition
 
* Antibody via pattern recognition
 
* Transgene
 
 
 
Pipeline
 
* Pattern Design - Work with Juancarlos and/or Michael
 
* Pattern is run on PDFs loaded into postgres. 
 
* Curator gets the list of papers that the pattern flagged but that Andrei did not pick up during first pass, and evaluates the possible outcomes:
 
**are they false positives  - refine pattern
 
**were they missed by Andrei - keep pattern
 
**papers are not true articles and are 'not curatable' - omit from evaluation
 
* Once refined, implement auto population of first pass form for curation
 
* Pattern is run on PDFs loaded into postgres daily. 
 
* Scripts populate first pass form with results from pattern searches.
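
The evaluation loop in the pipeline above can be sketched as follows. This is only an illustration under stated assumptions: the regular expression is a made-up stand-in for the real patterns (which are not listed on this page), the paper IDs and texts are invented, and the function names are hypothetical.

```python
import re

# Hypothetical pattern for illustration only; the real patterns are designed
# with Juancarlos and/or Michael.
RNAI_PATTERN = re.compile(r"\b(RNAi|RNA interference|dsRNA)\b", re.IGNORECASE)


def pattern_hits(papers, pattern):
    """Return the IDs of papers whose text matches the pattern."""
    return {paper_id for paper_id, text in papers.items() if pattern.search(text)}


def papers_to_review(hits, first_pass_flagged, not_curatable=()):
    """Pattern hits that the first-pass curator did not flag.

    Papers marked as not curatable are left out of the evaluation.
    """
    return sorted(set(hits) - set(first_pass_flagged) - set(not_curatable))


def next_step(verdict):
    """Map the curator's verdict on a reviewed paper to the action in the list above."""
    actions = {
        "false positive": "refine pattern",
        "missed in first pass": "keep pattern",
        "not curatable": "omit from evaluation",
    }
    return actions[verdict]


# Toy data with made-up paper IDs and text.
papers = {
    "WBPaper00030001": "We used RNAi to knock down unc-22.",
    "WBPaper00030002": "The antibody recognised a 50 kD band.",
    "WBPaper00030003": "Feeding dsRNA caused a twitching phenotype.",
    "WBPaper00030004": "Meeting abstract: RNAi screens.",
}
hits = pattern_hits(papers, RNAI_PATTERN)
review = papers_to_review(
    hits,
    first_pass_flagged={"WBPaper00030001"},
    not_curatable={"WBPaper00030004"},
)
```

Here `review` holds the single paper that the pattern found but first pass did not flag. The curator's verdict on it decides whether the pattern is refined or kept.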
 
 
 
=[[FP curator comments for St.Louis and Sanger structure correction data type]]=
 
 
 
[[User:Kyook|kjy]] 19:43, 9 February 2009 (EST)
 

Latest revision as of 23:04, 13 August 2010

Caltech documentation
First-pass to Curation
First-pass_flagging_pipelines

First-Pass Rotation

{| border=2
| '''First-Pass Curator''' || '''Round 1''' || '''Round 2'''
|-
| Karen || 2/16-3/1 || 7/6-7/19
|-
| Xiaodong || 3/2-3/15 || 7/20-8/1
|-
| Raymond || 3/16-3/29 || 8/3-8/16
|-
| Jolene || 3/30-4/12: New forms active || 8/17-8/30
|-
| Gary || 4/13-4/26: Author forms sent out || 8/31-9/13
|-
| Ranjana || 4/27-5/10: Author feedback coming in || 9/14-9/27
|-
| Erich || 5/11-5/24 || 9/28-10/11
|-
| Wen || 5/25-6/7 || 10/12-10/25
|-
| Kimberly || 6/8-6/21 || 10/26-11/8
|-
| WormBase re-evaluation || 6/22-7/5: IWM || (?)
|}



Automation

Textpresso's Automation Explanation and Schedule http://goldturtle.caltech.edu/wcat/



First-pass forms


Instructions for curators

Overview for curators (for the new form)



Postgres data for individual data types



kjy 19:43, 9 February 2009 (EST)