Difference between revisions of "First-pass flagging pipelines"

From WormBaseWiki

Revision as of 01:00, 31 July 2009

Flagging a paper vs. alerting a data curator

Currently, papers are flagged for specific data types through two tables: the curator first-pass (cfp) table and the Textpresso first-pass (tfp) table. A third table, SVM first-pass (sfp), will be implemented shortly.

Data curators are alerted that a paper contains data relevant to their data type by the presence of data in any of three tables: the curator first-pass (cfp), Textpresso first-pass (tfp), and author first-pass (afp) tables.

Three ways a paper can be flagged

The name of each Postgres table is noted in parentheses.

Curator first-pass (cfp) form

This form is accessed by clicking "curate!" for a paper on the WBPaperEditor first-pass checkout UI.
This form contains columns for all the different first-pass tables. Currently these are the Textpresso first-pass table, the author first-pass table, and the curator first-pass table.

Curator first-pass table (cfp)

Data is entered directly into Postgres through this table. The curator uses the text boxes to enter data based on their own reading of the paper, or to agree with or modify data entered by authors, Textpresso, or SVM (not implemented yet).

Upon hitting 'flag!', data entered into the cfp, afp, or tfp columns is sent to the e-mail address that corresponds to each data field that contains data.

However, for the purposes of the curation status form, only those data types with entries in the cfp table are counted as flagged. So for the paper to be considered flagged on the curation status form, the first-pass curator must merge the author data or Textpresso data into the cfp box, or else type something else in the box.
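The difference between alerting and flagging can be sketched as follows. This is only an illustration of the rule described above, not the actual WormBase code: the dictionary layout, the function names `alerted_data_types` and `flagged_data_types`, and the sample values are all assumptions made for the example.

```python
# Hypothetical sketch: each first-pass table is modelled as a dict that maps
# a data type to the text entered for it (empty string = nothing entered).

def has_data(value):
    """True if the field holds anything other than whitespace."""
    return bool(value and value.strip())


def alerted_data_types(cfp, afp, tfp):
    """Data types whose data curator is e-mailed: any of the three tables has data."""
    alerted = set()
    for table in (cfp, afp, tfp):
        alerted.update(dt for dt, value in table.items() if has_data(value))
    return alerted


def flagged_data_types(cfp):
    """Data types counted as flagged on the curation status form: cfp entries only."""
    return {dt for dt, value in cfp.items() if has_data(value)}


# Made-up example values for one paper.
cfp = {"genesymbol": "yes", "antibody": ""}
afp = {"antibody": "illustrative author text"}
tfp = {"transgene": "illustrative Textpresso hit"}

print(sorted(alerted_data_types(cfp, afp, tfp)))
print(sorted(flagged_data_types(cfp)))
```

In this example the antibody and transgene curators would still receive an alert, but only genesymbol would count toward the flagged total, because it is the only data type with a cfp entry.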


Author first-pass table (afp)

This form is sent to the first e-mail address that a script recognizes in a paper, as described in Author first pass requests.
The e-mails are sent out weekly, every Thursday, in batches of no more than 50 papers.
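A minimal sketch of the Thursday batching, under the assumption that at most one batch of up to 50 papers goes out each week and that any remaining papers wait for the following Thursday. The names `BATCH_LIMIT`, `next_thursday`, and `weekly_batches` are hypothetical and do not come from the actual script.

```python
from datetime import date, timedelta

BATCH_LIMIT = 50  # "no more than 50 papers" per batch
THURSDAY = 3      # date.weekday() counts Monday as 0, so Thursday is 3


def next_thursday(today):
    """Return today if it is a Thursday, otherwise the next Thursday."""
    return today + timedelta(days=(THURSDAY - today.weekday()) % 7)


def weekly_batches(paper_ids, start):
    """Yield (send_date, batch) pairs, one batch of up to BATCH_LIMIT per Thursday.

    Assumption: one batch per week; the source text does not say whether
    several batches may be sent on the same Thursday.
    """
    send_date = next_thursday(start)
    for i in range(0, len(paper_ids), BATCH_LIMIT):
        yield send_date, paper_ids[i:i + BATCH_LIMIT]
        send_date += timedelta(days=7)


# 120 pending papers, starting from the date of this revision.
for send_date, batch in weekly_batches(list(range(120)), date(2009, 7, 31)):
    print(send_date.isoformat(), len(batch))
```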

This form contains all the same data fields as the curator first-pass table.

First-pass curators are expected to approve or reject author data using a check box. The check box is checked by default, meaning the author data is 'approved' by default. To reject the author-entered data, the curator should uncheck the check box. Approved and rejected author data is stored in Postgres and can be queried in the afp_ table, which has the columns paperID, data, author timestamp, curatorID, approve/reject, and curator timestamp.
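The columns listed above, together with the approved-by-default behaviour of the check box, can be pictured with a small record type. This is a sketch only: the class name `AfpEntry`, the Python field names, and the `review` method are assumptions for illustration and do not reflect the real Postgres schema or form code.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class AfpEntry:
    """One author first-pass row, mirroring the columns named in the text."""
    paper_id: str
    data: str
    author_timestamp: datetime
    curator_id: Optional[str] = None
    approved: bool = True  # the check box starts checked, i.e. approved
    curator_timestamp: Optional[datetime] = None

    def review(self, curator_id, approved, when):
        """Record the first-pass curator's decision (unchecking the box = reject)."""
        self.curator_id = curator_id
        self.approved = approved
        self.curator_timestamp = when


# Made-up identifiers and text, for illustration only.
entry = AfpEntry("paper-1", "illustrative author text", datetime(2009, 7, 30, 12, 0))
print(entry.approved)   # still approved: no curator has unchecked the box
entry.review("curator-1", approved=False, when=datetime(2009, 7, 31, 9, 0))
print(entry.approved)
```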

Although an e-mail alert for author-entered data is sent to the data curator, the paper is not considered flagged for that data type unless the first-pass curator enters the author data (or any data that reflects what the curator thinks) into the curator column, i.e. into the cfp table. Approved author data therefore needs to be manually entered into the curator first-pass table, by clicking "merge", for the paper to be added to the 'flagged' number on the curation status form. (If you agree with it and just think "yes", type "yes". If you partially agree with it, merge it and edit it. The point of the merge link was to save clicks in copy-pasting when a curator partially agreed with an author -- Juancarlos)

Textpresso automated (tfp) scripts

Data entered here is fed directly to the data curator.
Data also shows up in the tfp column on the curator first-pass form.

  • If the Textpresso results are correct, the FP curator merges the data into the cfp table entry box.
  • If the Textpresso results are incorrect, the FP curator should write "false positive" in the cfp table entry box. (I'm not sure about this. I thought it was up to the data curator to report false positives, and the FP curator would only type in something if they believed there was something in that paper for that data type. That is, if the FP curator left it blank, they believe it's a "no" and are essentially saying false positive, while the words "false positive" are reserved for the data curator to enter through the False Positives section of the Paper Editor. The fact that the cfp_ field is blank is enough of a clue for the data curator that the FP curator doesn't believe something is there -- Juancarlos)

(Not implemented yet.) The number of papers flagged for a data type by Textpresso will be noted on the curation status pages in its own column and added to the total number of flags.

SVM (sfp)...not implemented yet

Data entered here is fed directly to the data curator.
Data also shows up in the sfp column on the curator first-pass form.

  • If the SVM results are correct, the FP curator merges the data into the cfp table entry box. (Sort of. The FP curator doesn't have to merge all of sfp_, tfp_, and afp_; they just need to type "yes", and clicking any of those buttons is a shortcut. Again, this is only if they themselves believe the paper contains that data type from having read the paper -- Juancarlos)
  • If the SVM results are incorrect, the FP curator should write "false positive" in the cfp table entry box. (Again, I'm not sure whether they should, or whether a blank properly denotes that they don't believe there's data -- Juancarlos)

The number of papers flagged for a data type by SVM will be noted on the curation status pages in its own column and added to the total number of flags.

Other ways data objects are being collected

Journal first-pass (jfp), through GENETICS only

This table contains data objects entered directly by Genetics authors from a URL generated by Tracey De Pellegin Connelly through the DOI ticket form. The data objects collected in this table are objects that do not yet exist in WormBase (WB), for the following data types:

  • genesymbol
  • extravariation
  • newstrains
  • newbalancers
  • antibody
  • transgene
  • newsnp
  • newcell

This data is being collected so that Arun can mark up the Genetics paper and provide links for objects that are not in WB yet but will be in the future.

The form that authors use to enter data into the jfp table combines these data fields with the normal author first-pass form. In effect, the author is entering data into two separate tables using one form.
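One way to picture a single form feeding two tables is to route each submitted field by a table prefix. The sketch below assumes form fields are named with an `afp_` or `jfp_` prefix followed by the data type; that naming, the function name `split_submission`, and the sample values are hypothetical and are not taken from the real form.

```python
# Hypothetical sketch: split one submitted form into rows for two tables.

def split_submission(form):
    """Group non-empty form fields by table prefix ("afp" or "jfp")."""
    rows = {"afp": {}, "jfp": {}}
    for field, value in form.items():
        prefix, _, data_type = field.partition("_")
        # Ignore unknown prefixes, malformed names, and blank entries.
        if prefix in rows and data_type and value.strip():
            rows[prefix][data_type] = value.strip()
    return rows


# Made-up submission, for illustration only.
form = {
    "afp_antibody": "illustrative author text",
    "jfp_newstrains": "illustrative strain name",
    "jfp_newsnp": "   ",  # blank fields are skipped
}
rows = split_submission(form)
print(rows["afp"])
print(rows["jfp"])
```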

The pipeline for alerting data curators from this table still needs to be worked out. For now, it is handled manually.

First-pass details

First-pass schedule, instructions, automation