Difference between revisions of "Juancarlos"

From WormBaseWiki
[[Category:Curation]]

Revision as of 23:45, 13 August 2010

Creating new FP form

The purpose of first pass (FP) is to alert curators to the presence of a data type in a published paper. In some cases the FP curator is asked to point to the actual data in the paper, which saves the data curator time when reading it.

The point of this interim form is to allow the FP curator to 1. monitor author comments for usefulness and feed what is learned back into the author submission form to clarify our data requests, and 2. alert curators to data types that aren't picked up by Textpresso.

Therefore, the form needs to have

1. Textpresso data showing, which will be sent to the data curator. The FP curator shouldn't be doing anything with this; ultimately it will be good enough that the FP curator will know not to look for this data type. For now, the FP curator will need to look at it and add comments in the curator box for things that were missed.

2. Author data, which the FP curator will need to be able to accept or not, and which the FP curator will be able to use to alert the data curator (DC). Caveat: if it is obvious that the authors are having a difficult time understanding what we want, the FP curator should work on the author submission form to clarify our data requests (this is a separate pipeline from the FP form).

3. Curator data, which the FP curator will need to use to capture all the data missed by Textpresso and the authors. For now this will be the busiest part of the form, until the other data sources are set up and tested properly.

4. A curator send button to submit the flags to the data curator.

Aside from that, nothing else needs to go on the form for the FP curator.

BUT the big part missing is what the data curator receives and how the data curator can give feedback to Textpresso to hone the automated extraction.

So, in addition to the requirements of the FP curator, we need to ask whether we want to layer functionality for the data curator on top of the FP form.

These functions include: 1. feedback to Textpresso, with the ability to request a rescan; 2. the ability to record false positive flags so papers can be eliminated from their pipeline.

Sending e-mail flags through the FP form should be easy, but I think the data curator should have easy access to all of the FP data. For this, should there be another output of the FP form? And should it be tailored to an individual curator, or should it be one general output with feedback capabilities?


Tasks

Waiting for consensus on what to put in fields for author and curator for FP.

Raymond suggests: have a first pass form for curators that shows the author's 
submission for the curator's approval (e.g. a tick), so that the information is sent 
(along with whatever information the curator puts in) for data extraction. If the 
first-pass curator disapproves the author's input (by not ticking), then the author's 
input will not be further processed or sent, but it will nevertheless be stored as 
is in the database. For the next phase, results from automated first pass via 
Textpresso could be treated similarly to the author's. The ultimate goal 
is to maximize the number of fields that need no curator approval. Sources of 
first-pass curation should be clearly distinguished by a person ID (Textpresso 
will be assigned one).
Juancarlos is okay with this. I'm leaning towards assigning the author 
response to a person ID, but if the ID is not going to be used as evidence anywhere, 
the corresponding email is possibly the more accurate evidence, since the receiver 
may pass it on to someone who is not the WBPerson that email is assigned to.  In 
that case Textpresso wouldn't need a person ID.  I'm still leaning toward using 
IDs, though; I'm just not sure they would reflect the right things if we ever want 
that in WB or something like that.
Juancarlos also needs to know how curators want to enter data.  For any given 
Paper-Field, would curators want to be able to enter unlimited entries, and mark 
them invalid instead of deleting them?  Would you prefer the current system, where 
there's only a single box into which everyone mushes all data?  Would you care 
about the history of deleted things?
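
To make the two proposals above concrete, here is a minimal sketch in Python of one possible data model: several entries per Paper-Field, each tagged with its source and an approval tick, with entries marked invalid rather than deleted so that history is kept. This is only an illustration of one of the options being asked about, not the existing form. All class, field, and ID names (`FieldEntry`, `FirstPassStore`, the paper ID, the source strings) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FieldEntry:
    paper_id: str           # hypothetical paper identifier
    data_type: str          # e.g. "rnai"
    source_id: str          # person ID (or author email) of whoever supplied it
    text: str
    approved: bool = False  # the FP curator's tick
    valid: bool = True      # set to False instead of deleting, to keep history


@dataclass
class FirstPassStore:
    entries: List[FieldEntry] = field(default_factory=list)

    def add(self, entry: FieldEntry) -> FieldEntry:
        """Store every entry as is, whether or not it is ever approved."""
        self.entries.append(entry)
        return entry

    def invalidate(self, entry: FieldEntry) -> None:
        """'Delete' an entry by marking it invalid; it stays in the store."""
        entry.valid = False

    def to_send(self, paper_id: str, data_type: str) -> List[FieldEntry]:
        """Entries that would go to the data curator: approved and still valid."""
        return [
            e for e in self.entries
            if e.paper_id == paper_id
            and e.data_type == data_type
            and e.approved
            and e.valid
        ]


store = FirstPassStore()
# Author input that the FP curator has not ticked: stored, but not sent.
author = store.add(FieldEntry("WBPaper00000001", "rnai",
                              "author@example.org", "RNAi in Fig. 2"))
# Curator input, approved.
curator = store.add(FieldEntry("WBPaper00000001", "rnai",
                               "curator-1", "see Table 1", approved=True))
print([e.source_id for e in store.to_send("WBPaper00000001", "rnai")])
```

Under this model the single-box alternative would correspond to one entry per Paper-Field with no `valid` flag, which is simpler but loses the history of who entered what.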

Implement automating simple data type flagging

Finished

Resolve duplicates in FP Curation checkout pages. Problem: There are many papers on the first pass list that have already been first-passed. Most of these papers are duplicates and have two WBPaperID assignments. Is there a way to resolve this?

Clear review papers from Textpresso data type searches. Problem: Reviews were making up a fair portion of false positives in data type automation trials. Assign Publication Type 'Review' to all papers that are annotated as 'review' in the Comments section of the FP curation form.

Sort papers on the first pass checkout list based on whether or not they have been passed through Textpresso. Curators can now focus on papers whose data fields have already been filled in by Textpresso.
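
The last two finished tasks can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not the code that was actually deployed: the `Paper` class, its field names, and the whole-word match on "review" are all hypothetical stand-ins for whatever the FP curation form stores.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Paper:
    paper_id: str
    comments: str = ""                      # Comments section of the FP form
    publication_type: Optional[str] = None
    textpresso_done: bool = False           # has Textpresso processed it?


def mark_reviews(papers: List[Paper]) -> None:
    """Assign publication type 'Review' where the comments say 'review'.

    Assumption: a case-insensitive whole-word match is enough to detect
    the annotation.
    """
    for p in papers:
        if re.search(r"\breview\b", p.comments, re.IGNORECASE):
            p.publication_type = "Review"


def non_review_papers(papers: List[Paper]) -> List[Paper]:
    """Papers to include in data type searches: everything except reviews."""
    return [p for p in papers if p.publication_type != "Review"]


def checkout_order(papers: List[Paper]) -> List[Paper]:
    """Put papers already processed by Textpresso first.

    sorted() is stable, so the original order is kept within each group.
    """
    return sorted(papers, key=lambda p: not p.textpresso_done)


papers = [
    Paper("p1"),
    Paper("p2", comments="This is a review", textpresso_done=True),
    Paper("p3", textpresso_done=True),
]
mark_reviews(papers)
print([p.paper_id for p in non_review_papers(papers)])
print([p.paper_id for p in checkout_order(papers)])
```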


Unassigned tasks and comments that need more discussion

  • We do have a record of data type curation for each paper; is there some way of combining the first pass curation with the curation status form?

- The curation status form gets flagged data from the first pass form (don't think I understand the question) -- Juancarlos

  • How does the false positive work?

- the words ``false positive'' get appended (or prepended, I forget and can't find an example) into the text field -- Juancarlos

  • Curator preference for FP remarks: can people deal with not getting detailed notes about their data type and where it exists in the paper, or should this be a mandatory part of first pass?
  • Discrepancy between FP papers and total papers curated. For some data types, curators get to the paper before the FP curator; it would be good to know that a curator already touched it. Is there a way to mark the paper in the FP list as curated for a specific data type but not others?

- I thought on the checkout section they showed as ``RNAi only'' or something like that. -- Juancarlos