First-pass to Curation

From WormBaseWiki
Revision as of 16:42, 31 July 2009 by Kyook (talk | contribs)

Caltech documentation First-pass flagging pipelines

First pass check out list

Once a paper has been entered into Postgres and is not labeled as "review" or "for functional analysis only", the following happens:

1. It appears on the first-pass curation checkout page. The paper is shown in grey until a PDF or other document is available for a curator to access. It also remains grey until the paper has been through Textpresso (?).

2. It goes through a Textpresso scan for the presence of predetermined values, patterns, etc. for specific data types. These values are entered into the Textpresso first pass tables (tfp) and show up on the curator first pass form (cfp form) in the first column. If a paper has been through Textpresso, it will have a "T" next to the WBPaper ID in the checkout form.

3. An automated script scans the paper for an e-mail address. If one is found, a URL to an author first pass form (afp form) is sent to that address. URLs are sent in batches of 50 every Thursday. Once a URL is sent, an "A" is added next to the paper in the checkout form. For the first 7 days the "A" is red; after that it is cyan. If an author responds, the "A" turns dark blue, or becomes a red "R" if the author labeled their paper as a review.
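The marker rules in step 3 can be sketched in Python as follows. This is only an illustration, not the actual WormBase script: the function names, argument names, and the reading of "first 7 days" as "fewer than 7 full days since the URL was sent" are all assumptions, as is the order in which pending papers are picked for a Thursday batch.

```python
from datetime import date, timedelta
from typing import List, Optional, Tuple

BATCH_SIZE = 50   # URLs sent per batch, as stated above
THURSDAY = 3      # date.weekday(): Monday is 0, so Thursday is 3


def author_marker(sent_on: Optional[date], today: date,
                  responded: bool = False,
                  is_review: bool = False) -> Optional[Tuple[str, str]]:
    """Return (letter, colour) for the author marker, or None if no URL was sent.

    Hypothetical helper: names and signature are illustrative only.
    """
    if sent_on is None:
        return None
    if responded:
        # A response overrides the age-based colouring.
        return ("R", "red") if is_review else ("A", "dark blue")
    # Assumption: "first 7 days" means fewer than 7 full days since sending.
    if today - sent_on < timedelta(days=7):
        return ("A", "red")
    return ("A", "cyan")


def thursday_batch(pending: List[str], today: date) -> List[str]:
    """Return the papers whose URLs would go out today (empty unless Thursday).

    Assumption: papers are taken in the order they appear in `pending`.
    """
    if today.weekday() != THURSDAY:
        return []
    return pending[:BATCH_SIZE]
```

For example, `author_marker(date(2009, 7, 30), date(2009, 7, 31))` gives a red "A", and the same paper checked a week or more after sending gives a cyan "A" unless the author has responded.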

Curator checks out a paper

To first pass a paper, a first-pass curator will "check out" a paper, that is, select a paper to curate by pressing 'curate' on the checkout form. Their name will be added to the 'check-out' column of the form so that other people know the paper is being worked on.

The curator will be taken to the cfp form, where they can query Postgres for data entered for the paper by Textpresso, by authors, or by other curators.

If there are data from authors, the curator can approve or reject them by clicking the check box in the author column for each data type. If the curator wants to modify what the author entered, they can merge the author data into the curator table and edit the data there.

The curator cannot approve or reject Textpresso results. However, they should feel free to comment on them in the curator table.

Once the curator is happy with the first-pass, they hit "flag!".

Data curator alerted

For those data types that are actively curated, the data are sent to the e-mail address(es) noted in the last column of the cfp form. In some cases the data curator is instead alerted that a paper has been flagged for them through a checkout form of their own.
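The e-mail routing described above can be sketched as a simple lookup from data type to addresses. Everything named here is hypothetical: the data type names, the addresses, and the `notifications` helper are placeholders for illustration, and the real addresses are the ones listed in the last column of the cfp form.

```python
from typing import Dict, List

# Hypothetical mapping from data type to curator addresses.
# An empty list stands for a data type that is not actively curated.
CURATOR_EMAILS: Dict[str, List[str]] = {
    "datatype_a": ["curator1@example.org"],
    "datatype_b": ["curator1@example.org", "curator2@example.org"],
    "datatype_c": [],
}


def notifications(paper_id: str, flagged: Dict[str, str]) -> Dict[str, List[str]]:
    """Group the flagged data for one paper by recipient address.

    `flagged` maps a data type to the text the first-pass curator entered.
    Data types with no address (or unknown data types) produce no message.
    """
    outbox: Dict[str, List[str]] = {}
    for datatype, text in flagged.items():
        for address in CURATOR_EMAILS.get(datatype, []):
            outbox.setdefault(address, []).append(
                f"{paper_id} {datatype}: {text}"
            )
    return outbox
```

Grouping by recipient means a curator responsible for several data types would get one message per paper rather than one per data type; whether the real pipeline does this is not stated here.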

Each curator has their own way of receiving these flags and will need to be asked individually for this information.

The record of which papers have been flagged for a specific data type is stored in Postgres and can be mined from there. n.b. Kimberly and I have been working on a form that displays all this information, but as things are in flux now, it is on a back burner.
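As a rough illustration of mining flags by data type, the sketch below runs a query against a stand-in table. It uses an in-memory SQLite database rather than Postgres so that it is self-contained, and the table name `first_pass_flags`, its columns, the data type names, and the sample rows are all invented for the example; the real Postgres schema is not described on this page.

```python
import sqlite3
from typing import List


def papers_flagged_for(conn: sqlite3.Connection, datatype: str) -> List[str]:
    """Return the IDs of papers flagged for one data type, sorted by ID."""
    rows = conn.execute(
        "SELECT DISTINCT paper_id FROM first_pass_flags "
        "WHERE datatype = ? ORDER BY paper_id",
        (datatype,),
    )
    return [row[0] for row in rows]


# Stand-in database with a hypothetical schema and made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE first_pass_flags (paper_id TEXT, datatype TEXT, value TEXT)"
)
conn.executemany(
    "INSERT INTO first_pass_flags VALUES (?, ?, ?)",
    [
        ("WBPaper00000002", "datatype_a", "example text"),
        ("WBPaper00000001", "datatype_a", "example text"),
        ("WBPaper00000001", "datatype_b", "example text"),
    ],
)

flagged = papers_flagged_for(conn, "datatype_a")
```

The same parameterized `SELECT` would carry over to a Postgres client with minor changes to the placeholder syntax, once the actual table and column names are known.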

Data curator finishes a paper

How the paper is noted as curated once the data curator is done is also dealt with on an individual basis.

Ultimately, this information can be mined by an ace query on the latest release. The downside of this method is that the stats will not be on-the-fly. n.b. Kimberly and I have been working on a form that displays all this information, but as things are in flux now, it is on a back burner.