Difference between revisions of "User Guide for Curators"

From WormBaseWiki
Revision as of 21:55, 12 August 2013

User Guide for Textpresso for CCC Form

Where to Find the Form

  • The form is currently located at:
http://mangolassi.caltech.edu/~azurebrd/cgi-bin/forms/ccc/ccc.cgi
  • Once we've finished initial testing, the form will be transferred over to our production server.

What We Need from Groups Using the Form

  • Instructions on what papers to run through the CCC search and how often to search
  • Any filtering steps, e.g. filter against current GAF, filter after Support Vector Machine data type classification
  • A gpi (gene product information) file as specified by the GOC (see the gpad and gpi file format specifications). This allows us to map all gene product mentions in a paper to a MOD identifier and to a UniProtKB identifier so we can send annotations via web services to UniProt's Protein2GO tool.
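
As a rough illustration of the mapping a gpi file makes possible, the sketch below reads tab-separated gpi lines and builds a lookup from gene product symbol to MOD and UniProtKB identifiers. The column positions assume the GOC gpi 1.x layout (DB, DB_Object_ID, DB_Object_Symbol, and so on, with DB_Xref(s) in column 9). The function name, the sample line, and the pairing of the symbol abc-1 with these identifiers are hypothetical and are not taken from the form's actual code or data.

```python
def build_gpi_lookup(gpi_text):
    """Map lower-cased gene product symbols to their MOD and UniProtKB IDs.

    Assumes the gpi 1.x column order; adjust the indices if your file differs.
    """
    lookup = {}
    for line in gpi_text.splitlines():
        # Skip blank lines and header/comment lines, which start with "!"
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.split("\t")
        cols += [""] * (10 - len(cols))  # pad short rows to 10 columns
        db, object_id, symbol = cols[0], cols[1], cols[2]
        # Column 9 (index 8) holds pipe-separated cross-references
        xrefs = [x for x in cols[8].split("|") if x]
        uniprot_ids = [x for x in xrefs if x.startswith("UniProtKB:")]
        lookup[symbol.lower()] = {
            "mod_id": f"{db}:{object_id}",
            "uniprot_ids": uniprot_ids,
        }
    return lookup


# Hypothetical sample line, for illustration only
SAMPLE_GPI = (
    "!gpi-version: 1.1\n"
    "WB\tWBGene00000000\tabc-1\t\t\tprotein\ttaxon:6239\t\tUniProtKB:P34688\t\n"
)

gpi_lookup = build_gpi_lookup(SAMPLE_GPI)
print(gpi_lookup["abc-1"])
```

Keying the lookup on the lower-cased symbol is one simple way to support case-insensitive gene product matching; a real pipeline would likely also index synonyms.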

Logging In

  • Select your first name from the list of curators and click on login
  • The login page also has a button that displays the list of Textpresso Cellular Component category terms and their mapping, based on previous curation, to a GO term and GO ID.
  • Previously entered GO terms that don't map to any existing GO term are highlighted in red. Note that the Component - GO term index is based on all previous curation data from the old form, which didn't restrict how GO terms were entered. The new form uses an autocomplete to add GO terms, which should avoid future errors in GO term entry.

Loading Files for Curation and Searching Files

  • Once a curator has logged in, they will be taken to the curation page with options for searching and loading files and annotations.
  • Search options may be used singly or in combination.
  • Search options are (in the order they appear on the form):
    • Sentence-curation - this allows you to search sentence files and exclude curated or noncurated sentences as needed
    • Papers to show - this allows you to set a limit on the number of papers that are returned for any given search. The default value is 10.
    • Source - this lists the names of files generated by the Textpresso search for curation. File names follow a standardized format: date of search_MOD_type of search, e.g. 20130426_WB_ccc
      • Search or load one file - click on that file name
      • Search or load consecutive files - click on the first file name, hold down the shift key, click on the last file name
      • Search or load non-consecutive files - click on the first file name, hold down the command key, click on the next file name
    • Gene product - this allows you to search sentences for a gene product. Searches are case-insensitive and you can search using:
      • the gene product name
      • the MOD ID
      • the UniProtKB ID, either the ID in full or just the six-character accession (e.g., UniProtKB:P46549 or P46549)
    • Paper
      • Papers can be searched by the PMID using either the full identifier or just the number (e.g., PMID:23064028 or 23064028)
      • In the future we can add the capability to also search using a doi
    • Annotation Curator
      • You can search for any annotations made by a specific curator using their login name as shown on the login page.
    • Annotation Date
      • You can search using the date an annotation was made.
      • Searches may use only part of the date, e.g. 2013-06 instead of 2013-06-11. Note that all annotations to a given sentence are stored together, so if you also annotated that sentence on a different date (e.g. 2013-08-12), that annotation will be returned as well.
    • Component
      • You can search files using a term from the Textpresso Cellular Component category, e.g. nuclear.
    • GO Term
      • You can search files using the term string of a GO term.
      • This search will return GO terms used for annotation as well as GO terms suggested by previous curation.
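
The input formats described above can be sketched as a few small helpers. This is only an illustration of the conventions in the list (source file naming, optional PMID and UniProtKB prefixes, case-insensitive matching, partial dates); the function names are hypothetical, and how the form handles these inputs internally is not documented here.

```python
import re


def parse_source_name(name):
    """Split a source file name such as '20130426_WB_ccc' into its parts."""
    date, mod, search_type = name.split("_", 2)
    return {"date": date, "mod": mod, "search_type": search_type}


def normalize_pmid(query):
    """Accept 'PMID:23064028' or '23064028'; return the bare number or None."""
    match = re.fullmatch(r"(?:PMID:)?(\d+)", query.strip(), flags=re.IGNORECASE)
    return match.group(1) if match else None


def normalize_uniprot(query):
    """Accept 'UniProtKB:P46549' or 'P46549'; return the upper-cased accession."""
    q = query.strip()
    prefix = "uniprotkb:"
    if q.lower().startswith(prefix):
        q = q[len(prefix):]
    return q.upper()


def date_matches(annotation_date, query):
    """Partial-date search: '2013-06' matches '2013-06-11'."""
    return annotation_date.startswith(query)


print(parse_source_name("20130426_WB_ccc"))
print(normalize_pmid("PMID:23064028"), normalize_pmid("23064028"))
print(normalize_uniprot("uniprotkb:p46549"))
print(date_matches("2013-06-11", "2013-06"))
```

Note that `date_matches` compares a single stored date. As described under Annotation Date, the form returns every annotation stored with a matching sentence, so results can include annotations made on other dates.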

Making Annotations and Classifying Sentences

  • To make annotations, first load one or more source files by selecting them from the source list and clicking on Search.
  • Each matching Textpresso sentence is displayed in the curation form, along with the title and abstract of the corresponding paper.
  • Terms in the sentence that matched the Textpresso categories are color-coded:
    • Blue = Gene Product
    • Red/Brown = Textpresso Cellular Component
    • Orange = Assay Terms
    • Green = Verbs
  • Curators have the option to either make an annotation from the sentence or to otherwise classify that sentence.
  • Note that you don't have to classify sentences, but once you do, these sentences will be considered curated in the same way as sentences from which you've made an annotation.
  • To make an annotation, select one or more gene products from the list of entities in the left-most column.
    • Each entity is mapped to its corresponding MOD and UniProt identifiers using the information in the gpi file. If more than one UniProt identifier exists, then curators have the option to select as many as they need for annotation.
  • The second column contains the Textpresso component terms identified in the sentences, each mapped to suggested GO terms based on previous curation.
    • If there is a suitable GO annotation in this list, you can select it by clicking on it.
    • If there is NOT a suitable suggested GO annotation, then the curator can enter a new one.
    • To enter a completely new GO annotation, either select a Textpresso component term from the drop-down menu, or enter a component term (if not originally recognized by Textpresso) in the free text box above the drop-down menu. Then enter a GO term by typing the text string of the term. The GO term entry is an auto-complete, so you must select a GO term from the GO file (updated nightly).
    • The default evidence code is IDA, but you can also make IPI annotations (e.g. for a protein complex) by selecting the IPI evidence code and adding a UniProtKB identifier (e.g., UniProtKB:P34688) in the 'with' box below the evidence codes.
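
The evidence-code rules above can be summarized in a small sketch. It assumes that IDA and IPI are the only evidence codes offered and that an IPI annotation needs a UniProtKB identifier in the 'with' field. The class, its field names, and the validation messages are hypothetical and do not reflect the form's actual data model.

```python
from dataclasses import dataclass

# Assumption: only the two evidence codes mentioned in this guide are allowed
ALLOWED_EVIDENCE = {"IDA", "IPI"}


@dataclass
class Annotation:
    gene_product: str      # e.g. a UniProtKB identifier
    go_term: str           # GO term string chosen from the autocomplete
    evidence: str = "IDA"  # IDA is the default evidence code
    with_id: str = ""      # only needed for IPI annotations

    def problems(self):
        """Return a list of human-readable issues; empty if the record looks fine."""
        issues = []
        if self.evidence not in ALLOWED_EVIDENCE:
            issues.append(f"unsupported evidence code: {self.evidence}")
        if self.evidence == "IPI" and not self.with_id.startswith("UniProtKB:"):
            issues.append("IPI requires a UniProtKB identifier in the 'with' box")
        return issues


default_annotation = Annotation("UniProtKB:P46549", "nucleus")
ipi_missing_with = Annotation("UniProtKB:P46549", "nucleus", evidence="IPI")
ipi_complete = Annotation(
    "UniProtKB:P46549", "nucleus", evidence="IPI", with_id="UniProtKB:P34688"
)

print(default_annotation.problems())
print(ipi_missing_with.problems())
print(ipi_complete.problems())
```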