Detailed curation workflows and search scenarios

Possible Search Scenarios for CCC 2.0

Search entry points:

Searches would return sentences, ideally displayed on the curation form as one page per paper (if possible?).

Searches would return at most 250 sentences (this upper limit is a guess). If a search matches more than that, we could display a message asking the user to contact us for assistance.
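
The result cap could be sketched as follows. This is only an illustration: the function name `cap_results` and the message wording are hypothetical, and returning no sentences at all when the limit is exceeded (rather than the first 250) is an assumption that the text above leaves open.

```python
MAX_SENTENCES = 250  # guessed upper limit; expected to be tuned later


def cap_results(sentences, limit=MAX_SENTENCES):
    """Return (sentences, message).

    If the search matched more than `limit` sentences, return an empty
    list and a message asking the user to get in touch (assumed behavior).
    Otherwise return the sentences unchanged and no message.
    """
    if len(sentences) > limit:
        message = (
            f"Search matched {len(sentences)} sentences "
            f"(limit is {limit}); please contact us for assistance."
        )
        return [], message
    return sentences, None
```

Keeping the limit in a single constant would make it easy to adjust once real search volumes are known.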

  1. Source File Date
    1. This would be the date when the source file was generated by a Textpresso search.
  2. Gene Name or ID
    1. This could be a gene name, synonym (?), or ID as represented in the gpi (gene product information) file.
    2. Search could default to substring matching; curators would need to explicitly select the exact-match and search-synonyms options.
    3. Other options here could be to filter on curated sentences only, or one or more of the sentence classifications (e.g., false positive, already curated, etc.).
  3. Paper
    1. PMIDs and TAIR IDs; could also add WBPaper IDs?
    2. Other options here could be to filter on curated sentences only, or one or more of the sentence classifications (e.g., false positive, already curated, etc.).
  4. GO term
    1. Search could be on GO ID or term string.
    2. Could filter on Annotated, Suggested (Annotated + Unannotated), or Suggested (Unannotated)
  5. Textpresso component term
    1. Term string
    2. Annotated, Unannotated filter
  6. Curator
    1. First name, last name, WBPerson ID (? - it is probably unlikely that curators would use a WBPerson ID)
  7. Annotation Date
    1. On the paper editor we left this free text, as opposed to drop downs
    2. I like drop downs for accuracy, but they are more limiting (at least as I understand their functionality)
    3. I can imagine curators possibly wanting to look at annotations from a range of dates, say the first six months of 2012.
    4. Filters - Only GO annotations
  8. Sentence classifications
    1. Curate
    2. Already curated
    3. Scrambled sentence
    4. Run-on sentence
    5. False positive
    6. Positive for localization, but not for GO
  9. Comments (?)
    1. This would be most useful if we used some kind of controlled vocabulary, e.g., "Updated from obsolete term".
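
As an illustration of the gene search options described in entry point 2 (substring by default, optional exact match, optional synonym search, optional filter on sentence classification), here is a minimal in-memory sketch. The `Sentence` record and the function names are hypothetical; the real form would presumably run these queries against postgres rather than a Python list.

```python
from dataclasses import dataclass, field


@dataclass
class Sentence:
    # Hypothetical record layout, for illustration only.
    text: str
    gene_name: str
    gene_id: str
    synonyms: list = field(default_factory=list)
    classification: str = ""  # e.g. "False positive"; empty if unclassified


def matches_gene(sentence, query, exact=False, search_synonyms=False):
    """Case-insensitive match of query against gene name, ID and (optionally) synonyms."""
    q = query.lower()
    candidates = [sentence.gene_name, sentence.gene_id]
    if search_synonyms:
        candidates += sentence.synonyms
    if exact:
        return any(q == c.lower() for c in candidates)
    return any(q in c.lower() for c in candidates)  # default: substring


def search_by_gene(sentences, query, exact=False, search_synonyms=False,
                   classifications=None):
    """Return matching sentences, optionally restricted to given classifications."""
    hits = [s for s in sentences
            if matches_gene(s, query, exact, search_synonyms)]
    if classifications:
        hits = [s for s in hits if s.classification in classifications]
    return hits
```

With substring as the default, a query for a short gene name would also return longer names that contain it, which is why the exact-match option matters for curators.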

Curation Workflows

Note: each sentence could have an associated comment.

  1. Weekly curation - curator finishes everything in a given source file
    1. Curator selects a source file from the list each week
    2. Curator curates sentences from first paper in the file
    3. Some sentences lead to an annotation
    4. Some sentences are just classified, but don't lead to annotation
    5. Curator has the option to:
      1. Send annotations to Protein2GO
      2. Save annotations in postgres
    6. Curator never needs to go back to look at these papers
  2. Resume curation - curator needs to come back and finish annotating
    1. Curator didn't finish annotating a paper
    2. Work up to that point is saved
    3. Curator can go back to source file and proceed through the file
    4. Curator can search on a particular paper in a source file and proceed through the rest of the file (?)
    5. Some sentences lead to an annotation
    6. Some sentences are just classified, but don't lead to annotation
    7. Curator has the option to:
      1. Send annotations to Protein2GO
      2. Save annotations in postgres
    8. Curator never needs to go back to look at these papers
  3. Curator needs to return to, and modify, an old annotation
    1. This may happen when, say, a GO term is made obsolete and curators need to update their old annotations.
    2. Query for GO term.
    3. Form retrieves all sentences for which an annotation was made using that GO term.
    4. Curator could:
      1. Change the existing annotation to a new GO term and either send the new annotation to Protein2GO or save it in postgres.
      2. Remove the existing annotation and not replace it with a new GO term (in this case, the curator would need to manually remove the annotation from Protein2GO if it was also stored there).
      3. In the relationship index that maps GO terms to Textpresso component terms, add a value for obsolete to any now-obsolete GO terms (we don't currently do this). Perhaps we could periodically run a script over the GO .obo file (or OWL file?) to check for obsolete terms and mark them as such, so that we can display that in column 3 of the curation form.
  4. Curator wants to check consistency of annotations
    1. In this scenario, a curator might want to look at annotations for a given gene, Textpresso component term, or GO term to see if they've been annotating consistently.
    2. Query on a gene name, return all sentences that were used to annotate that gene, or all sentences that mention that gene.
    3. Query on a component term, return all sentences that used that component term for annotation, or all sentences that mention that component.
    4. Query on a GO term, return all sentences that used that GO term for an annotation, or all sentences that suggested that GO term for annotation.
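
The obsolete-term check suggested in workflow 3 might look something like the sketch below. It relies on the OBO flat-file convention that an obsolete term carries an `is_obsolete: true` tag in its `[Term]` stanza. The shape of the relationship index (a dict keyed by GO ID) and both function names are assumptions made for illustration; this is a minimal line scanner, not a full OBO parser.

```python
def find_obsolete_terms(obo_lines):
    """Return the set of term IDs tagged 'is_obsolete: true' in [Term] stanzas."""
    obsolete = set()
    in_term = False
    term_id = None
    for raw in obo_lines:
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are of interest.
            in_term = (line == "[Term]")
            term_id = None
        elif in_term and line.startswith("id:"):
            term_id = line[len("id:"):].strip()
        elif in_term and line.startswith("is_obsolete:"):
            # Drop any trailing '!' comment before comparing the value.
            value = line[len("is_obsolete:"):].split("!")[0].strip()
            if value == "true" and term_id:
                obsolete.add(term_id)
    return obsolete


def mark_obsolete(relationship_index, obsolete_ids):
    """Set an 'obsolete' flag on each entry of a (hypothetical) index dict."""
    for go_id, entry in relationship_index.items():
        entry["obsolete"] = go_id in obsolete_ids
    return relationship_index
```

A script along these lines could be run over each new release of the .obo file, for example with `find_obsolete_terms(open(path))`, and the resulting flags used to populate column 3 of the curation form.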