First-pass flagging pipelines

From WormBaseWiki
Revision as of 14:42, 31 July 2009

Flagging a paper vs. alerting a data curator

At this moment, papers are flagged for specific data types through two different tables:

  • curator first-pass table (cfp), which is accessed through the cfp_form
  • Textpresso first-pass table (tfp)
  • A third table, SVM first-pass (sfp), will be implemented shortly

Data curators are alerted to a paper containing data relevant to their data type by the presence of data in the different tables, independently of each other.

  • Alerts from tfp are set up by each individual curator with the Textpresso group and Juancarlos.
  • Alerts from the cfp are sent when a first-pass curator flags a paper using the cfp_form and only if the "send" checkbox is on.
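The two alert paths above are independent of each other. The sketch below models them for one paper. It assumes a simple in-memory representation (the class `PaperFirstPass`, the function `pending_alerts` and its parameters are hypothetical, not part of the actual WormBase scripts): Textpresso alerts depend only on which data types a curator has subscribed to, while cfp alerts go out only if the "send" checkbox is on when the paper is flagged.

```python
from dataclasses import dataclass, field


@dataclass
class PaperFirstPass:
    """Hypothetical in-memory view of one paper's first-pass data.

    Each dict maps a data type (e.g. "antibody") to the text stored
    for this paper in that table; a blank string means no data.
    """
    paper_id: str
    cfp: dict = field(default_factory=dict)  # curator first-pass
    tfp: dict = field(default_factory=dict)  # Textpresso first-pass


def pending_alerts(paper, tfp_subscriptions, send_checked):
    """Return the (source, data_type) alerts that would go out for one paper.

    tfp_subscriptions: data types whose curators set up Textpresso alerts
    send_checked: state of the "send" checkbox when the paper was flagged
    """
    alerts = set()
    # Textpresso alerts depend only on each curator's own setup,
    # not on anything done in the cfp_form.
    for dtype, text in paper.tfp.items():
        if text.strip() and dtype in tfp_subscriptions:
            alerts.add(("tfp", dtype))
    # cfp alerts go out only when the paper is flagged with "send" on.
    if send_checked:
        for dtype, text in paper.cfp.items():
            if text.strip():
                alerts.add(("cfp", dtype))
    return alerts
```

For example, a paper with a Textpresso hit for a subscribed data type still produces a tfp alert even when the fp curator flags it with "send" off; only the cfp alerts are suppressed.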


Textpresso FP results are currently not displayed in the cfp_form for data types that curators consider "good" (sufficiently automated). These data fields have been removed from the cfp_form, so they would have to be added back if we want to see them.
These data types are:

  • antibody
  • transgene
  • extvariation (new alleles)

Three ways a paper can be flagged

Curator first-pass (cfp) form

This form is accessed by clicking "curate!" for a paper on the WBPaperEditor first-pass checkout UI.
This form contains a column for each of the first-pass tables: currently the Textpresso first-pass table, the author first-pass table, and the curator first-pass table.

Curator first-pass table (cfp)

Data entered in this column is written directly into Postgres. The curator uses the text boxes to enter data based on their own reading of the paper, or to agree with or modify data entered by authors, Textpresso, or SVM (not implemented yet).

Upon hitting 'flag!', any data entered into the cfp, afp, or tfp is sent to the e-mail address associated with each data field that contains data.

However, for the purposes of the curation status form, only data types with entries in the cfp table are counted as flagged. So for a paper to be counted as flagged on the curation status form, the first-pass curator must merge the author or Textpresso data into the cfp box, or else type something in the box.
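The counting rule can be sketched as follows, assuming each paper's cfp entries are available as a dict from data type to text (the function names and data layout are hypothetical). The functions deliberately never look at afp or tfp, which is the point of the rule: data that exists only in those tables does not raise the count.

```python
from collections import Counter


def flagged_types(cfp):
    """Data types counted as flagged for one paper.

    cfp maps data type -> text in the cfp box. Only non-blank
    entries count; afp and tfp data are never consulted.
    """
    return {dtype for dtype, text in cfp.items() if text.strip()}


def curation_status_counts(cfp_by_paper):
    """Number of flagged papers per data type, across all papers."""
    counts = Counter()
    for cfp in cfp_by_paper.values():
        # Each paper contributes at most one flag per data type.
        counts.update(flagged_types(cfp))
    return counts
```

A paper whose author-entered antibody data was approved but never merged into the cfp box would therefore contribute nothing to the antibody count.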

Author first-pass table (afp)

This form is sent to the first e-mail address that a script recognizes in the paper, as described in Author first pass requests.
The e-mails are sent out weekly, every Thursday, in batches of no more than 50 papers.
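A minimal sketch of the weekly schedule, assuming one batch per Thursday and that papers beyond the 50-paper limit simply wait for the following Thursday (that spill-over rule, and the names `next_thursday` and `plan_batches`, are assumptions for illustration, not the actual Author first pass requests script):

```python
import datetime as dt

MAX_BATCH = 50  # from the text: no more than 50 papers per weekly batch


def next_thursday(today):
    """Return today if it is a Thursday, otherwise the next Thursday."""
    # date.weekday(): Monday == 0 ... Thursday == 3
    days_ahead = (3 - today.weekday()) % 7
    return today + dt.timedelta(days=days_ahead)


def plan_batches(paper_ids, start, max_batch=MAX_BATCH):
    """Assign queued papers to successive Thursdays, max_batch per week.

    Assumption: papers that do not fit in this week's batch roll over
    to the following Thursday, in queue order.
    """
    first = next_thursday(start)
    plan = {}
    for i in range(0, len(paper_ids), max_batch):
        send_day = first + dt.timedelta(weeks=i // max_batch)
        plan[send_day] = paper_ids[i:i + max_batch]
    return plan
```

Under this assumption, 120 queued papers would go out as batches of 50, 50 and 20 on three consecutive Thursdays.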

This form contains all the same data fields as the cfp.

FP curators are expected to approve or reject author-entered data

  • Author-entered data is set by default as 'approved'.
  • To reject the author-entered data, the curator should uncheck the check box.
  • Both approved and rejected author data are stored in Postgres and can be queried in the afp_ table, which records paperID, data, author timestamp, curatorID, approve/reject, and curator timestamp.
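The query below sketches how rejected author data could be pulled out. To keep it self-contained it uses Python's built-in sqlite3 in memory as a stand-in for Postgres. The table name `afp_antibody`, the column names, and the 1/0 encoding of approve/reject are hypothetical; they only mirror the fields listed above, so check the real afp_ schema before reusing the SELECT against Postgres.

```python
import sqlite3

# Hypothetical schema mirroring the listed afp_ fields:
# paperID, data, author timestamp, curatorID, approve/reject, curator timestamp.
SCHEMA = """
CREATE TABLE afp_antibody (
    paper_id          TEXT,
    data              TEXT,
    author_timestamp  TEXT,
    curator_id        TEXT,
    approved          INTEGER,  -- 1 = approved (the default), 0 = rejected
    curator_timestamp TEXT
)
"""


def rejected_author_data(conn):
    """Return (paper_id, data, curator_id) for rows the curator unchecked."""
    cur = conn.execute(
        "SELECT paper_id, data, curator_id FROM afp_antibody "
        "WHERE approved = 0 ORDER BY paper_id"
    )
    return cur.fetchall()


# Usage with made-up rows (paper IDs, curator IDs and dates are placeholders).
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO afp_antibody VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("paper1", "antibody A", "2009-07-01", "curator1", 1, "2009-07-02"),
        ("paper2", "antibody B", "2009-07-03", "curator1", 0, "2009-07-04"),
    ],
)
print(rejected_author_data(conn))  # [('paper2', 'antibody B', 'curator1')]
```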

Approving author-entered data alone DOES NOT flag the paper in a statistical way

When an fp curator hits "flag!", an e-mail is sent to the data curator if there is data in any author-entered data field, so the data curator is alerted to the paper.
However, the paper is not counted as flagged for that data type unless the curator enters data into the fp curator column (i.e. into the cfp table); therefore, approved author data must be entered manually into the curator first-pass table.
Some actions an fp curator can take with author-entered data:

  • If you agree with everything the author says, click "merge" to enter the data into the cfp.
  • If you just think "yes" (the paper contains this data type), type "yes" in the cfp.
  • If you partially agree with author-entered data, merge it and edit it in the cfp.
    The point of the merge link was to save clicks in copy-pasting when a curator partially agreed with an author -- Juancarlos
  • If you rejected the author-entered data and you think the paper should not be flagged for that data type, do not enter anything in the cfp.
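The four actions above can be summarized as a small function that returns what ends up in the cfp box. The `Action` enum, `cfp_entry`, and the `edit` callback are hypothetical names used only to illustrate the list; a non-blank result is what makes the paper count as flagged for that data type.

```python
from enum import Enum
from typing import Callable, Optional


class Action(Enum):
    MERGE = "merge"           # agree with everything the author says
    YES = "yes"               # just agree the paper has this data type
    MERGE_AND_EDIT = "edit"   # partially agree: merge, then edit
    REJECT = "reject"         # paper should not be flagged for this type


def cfp_entry(author_text: str, action: Action,
              edit: Optional[Callable[[str], str]] = None) -> str:
    """Return the text the fp curator leaves in the cfp box."""
    if action is Action.MERGE:
        return author_text            # what the "merge" link copies over
    if action is Action.YES:
        return "yes"
    if action is Action.MERGE_AND_EDIT:
        if edit is None:
            raise ValueError("MERGE_AND_EDIT needs an edit function")
        return edit(author_text)      # start from the merged copy, then modify it
    return ""                         # REJECT: leave the cfp box blank
```

Modelling partial agreement as an edit applied to the merged text reflects the stated purpose of the merge link: the curator starts from the author's text instead of retyping it.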

Textpresso automated (tfp) scripts

Data entered here is fed directly to the data curator.
Data also shows up in the tfp column on the cfp_form.

  • If the fp curator agrees with the Textpresso results, they should merge the data into the cfp entry box.
  • If the fp curator does not agree with the Textpresso results, they should leave the cfp entry box blank or correct the info. If anything is written in the box, the paper will be counted as a positive flag for that data type on the curation status form.

Not implemented yet: the number of papers flagged for a data type by Textpresso will be noted on the curation status pages in its own column and added to the total number of flags.

SVM (sfp): not implemented yet

Data entered here is fed directly to the data curator.
Data also shows up in the sfp column on the cfp_form.

  • If the fp curator agrees with the results, they should merge the data into the cfp entry box or type "yes".
  • If the fp curator does not agree with the results, they should leave the cfp entry box blank or correct the info. If anything is written in the box, the paper will be counted as a positive flag for that data type on the curation status form.

The number of papers flagged for a data type by SVM will be noted on the curation status pages in its own column and added to the total number of flags.
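The planned Textpresso and SVM columns on the curation status pages could be tallied roughly as follows. This is a sketch of a feature the text marks as not yet implemented; `status_row` and its input layout are hypothetical. It follows the text literally (each source's count is added to the total), so a paper flagged by two sources contributes two flags; counting distinct papers instead would be a different design choice.

```python
def status_row(flags):
    """One planned curation-status row for a single data type.

    flags maps a source ("cfp", "tfp", "sfp") to the set of paper IDs
    that source flagged for this data type. Returns one count column
    per source plus a total.
    """
    row = {source: len(papers) for source, papers in flags.items()}
    # Literal reading of the text: every source's count is added to the
    # total, so a paper flagged by two sources contributes two flags.
    row["total"] = sum(row.values())
    return row
```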

Other ways data objects are being collected

Journal first-pass form (jfp): GENETICS papers only

This is not a Postgres table but a form that collects extra data from Genetics authors; the data are then stored in afp. The form URL is generated by Tracey De Pellegin Connelly through the doi ticket form. The purpose of this form is to collect data objects that do not already exist in WB, so that Arun can mark up the Genetics paper and provide links for these objects.
Objects are collected for the following data types:

  • genesymbol
  • extvariation
  • newstrains
  • newbalancers
  • antibody
  • transgene
  • newsnp
  • newcell

None of these fields except genesymbol appears on the normal afp_form. We opted to make a hybrid of the afp_form for the Genetics authors so that they would not be asked to fill out another WB-generated form after their paper was published, and because we needed this extra information from them. It is also my understanding that these authors will be required to fill out the form, so this was an opportunity to get 100% author feedback for paper flagging.

The pipeline for alerting data curators about jfp data still needs to be worked out; right now it is handled manually.

First-pass details

First-pass schedule, instructions, automation