User Guide for Curators

From WormBaseWiki

User Guide for Textpresso for CCC Form

Where to Find the Form

  • The testing form is currently located at:
http://mangolassi.caltech.edu/~azurebrd/cgi-bin/forms/ccc/ccc.cgi
  • Once we've finished testing, any updates to the form will be transferred over to our production server.

What We Need from Groups Using the Form

  • Instructions on what papers to run through the CCC search and how often to search
  • Any filtering steps, e.g. filter against current GAF, filter after Support Vector Machine data type classification
  • A gpi (gene product information) file as specified in the GOC's gpad and gpi file format specifications. This allows us to map every gene product mention in a paper to a MOD identifier and to a UniProtKB identifier so we can send annotations via web services to UniProt's Protein2GO tool.
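As a rough illustration of the mapping a gpi file makes possible, the Python sketch below reads tab-separated gpi lines and builds a lookup from gene product symbol to MOD identifier and UniProtKB cross-references. It assumes the GPI 1.x column order (DB, DB_Object_ID, DB_Object_Symbol, with DB_Xref(s) in the ninth column); the function name `load_gpi` and the sample row are hypothetical, and the GOC specification remains the authority on the format. This is not the form's actual code.

```python
from collections import defaultdict

# 0-based column positions assumed from the GPI 1.x layout; verify against the GOC spec.
COL_DB, COL_ID, COL_SYMBOL, COL_XREFS = 0, 1, 2, 8


def load_gpi(lines):
    """Map lower-cased gene product symbol -> list of (MOD ID, [UniProtKB IDs])."""
    lookup = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("!"):  # skip blank and header/comment lines
            continue
        cols = line.split("\t")
        if len(cols) <= COL_XREFS:
            continue  # malformed or truncated row
        mod_id = f"{cols[COL_DB]}:{cols[COL_ID]}"
        uniprot_ids = [x for x in cols[COL_XREFS].split("|") if x.startswith("UniProtKB:")]
        lookup[cols[COL_SYMBOL].lower()].append((mod_id, uniprot_ids))
    return dict(lookup)


# Illustrative row only: the gene ID and symbol are made up for this example.
sample = [
    "!gpi-version: 1.1",
    "WB\tWBGene99999999\txyz-1\t\t\tprotein\ttaxon:6239\t\tUniProtKB:P46549\t",
]
gpi_lookup = load_gpi(sample)
print(gpi_lookup["xyz-1"])
```

Keying the lookup on the lower-cased symbol is one simple way to support case-insensitive matching of gene product names; a symbol that maps to several UniProtKB identifiers simply carries a longer list.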

Logging In

  • Select your first name from the list of curators and click on login
  • The login page also has a button that displays the list of Textpresso Cellular Component category terms and their mapping, based on previous curation, to a GO term and GO ID.
  • Previously entered GO terms that don't map to any existing GO term are highlighted in red. Note that the component-GO term index is based on all previous curation data from the old form, which did not restrict how GO terms were entered. The new form uses an autocomplete to add GO terms, which should prevent future errors in GO term entry.
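The check behind the red highlighting can be pictured with a small Python sketch. It assumes the index is a mapping from component term to the GO term strings entered in past curation, and that the current ontology is available as a mapping from term string to GO ID; both structures, the function name `flag_unmapped`, and the sample data are hypothetical.

```python
def flag_unmapped(component_index, current_go):
    """Return (component, GO term) pairs whose GO term is absent from the current ontology.

    component_index: dict of component term -> list of GO term strings from past curation
    current_go: dict of GO term string -> GO ID
    """
    flagged = []
    for component, go_terms in component_index.items():
        for term in go_terms:
            if term not in current_go:
                flagged.append((component, term))  # candidates for red highlighting
    return flagged


# Hypothetical data for illustration.
current_go = {"nucleus": "GO:0005634", "cytoplasm": "GO:0005737"}
component_index = {
    "nuclear": ["nucleus", "nucleous"],  # second entry is a free-text typo
    "cytoplasmic": ["cytoplasm"],
}
print(flag_unmapped(component_index, current_go))
```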

Loading Files for Curation and Searching Files

  • Once a curator has logged in, they will be taken to the curation page with options for searching and loading files and annotations.
  • Search options may be used singly or in combination.
  • Search options are (in the order they appear on the form):
    • Sentence-curation - this allows you to search sentence files and exclude curated or noncurated sentences as needed
    • Papers to show - this allows you to set a limit on the number of papers that are returned for any given search. The default value is 10.
    • Source - this lists the names of files generated by the Textpresso search for curation. File names have a standardized name: date of search_MOD_type of search, e.g. 20130426_WB_ccc
      • Search or load one file - click on that file name
      • Search or load consecutive files - click on the first file name, hold down the shift key, click on the last file name
      • Search or load non-consecutive files - click on the first file name, hold down the command key (Ctrl on Windows or Linux), click on the next file name
    • Gene product - this allows you to search sentences for a gene product. Searches are case-insensitive and you can search using:
      • the gene product name
      • the MOD ID
      • the UniProtKB ID, either the ID in full or just the six-character accession (e.g., UniProtKB:P46549 or P46549)
    • Paper
      • Papers can be searched by the PMID using either the full identifier or just the number (e.g., PMID:23064028 or 23064028)
      • In the future we can add the capability to also search using a doi
    • Annotation Curator
      • You can search for any annotations made by a specific curator using their login name as shown on the login page.
    • Annotation Date
      • You can search using the date an annotation was made.
      • Searches may use only part of the date, e.g. 2013-06 instead of 2013-06-11. Note that all annotations to a given sentence are stored together, so if you also annotated that same sentence on a different date (e.g. 2013-08-12), that annotation will be returned as well.
    • Component
      • You can search files using a term from the Textpresso Cellular Component category, e.g. nuclear.
    • GO Term
      • You can search files using the term string of a GO term.
      • This search will return GO terms used for annotation as well as GO terms suggested by previous curation.
      • Note that the only way a GO term can become associated with a sentence is through the component-GO term index that is created during curation. Therefore, the only GO term searches that will be successful are those that identify sentences that have an associated component-GO term either through previous curation or suggested curation via the index.
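The search conventions above (the standardized source file name, identifiers accepted with or without their prefix, and partial-date matching) can be sketched in Python as follows. All function names and data structures are hypothetical and are not taken from the form's code; the sketch only illustrates the behavior described in the list.

```python
import re


def parse_source_name(name):
    """Split a source file name such as '20130426_WB_ccc' into its three parts."""
    match = re.fullmatch(r"(\d{8})_([A-Za-z]+)_(\w+)", name)
    if match is None:
        raise ValueError(f"unexpected source file name: {name!r}")
    date, mod, search_type = match.groups()
    return {"date": date, "mod": mod, "search_type": search_type}


def normalize_pmid(text):
    """Accept 'PMID:23064028' or '23064028' and return the full identifier."""
    digits = text.strip()
    if digits.upper().startswith("PMID:"):
        digits = digits[5:]
    if not digits.isdigit():
        raise ValueError(f"not a PMID: {text!r}")
    return f"PMID:{digits}"


def normalize_uniprot(text):
    """Accept 'UniProtKB:P46549' or 'P46549' (any case) and return the full identifier."""
    accession = text.strip()
    prefix = "uniprotkb:"
    if accession.lower().startswith(prefix):
        accession = accession[len(prefix):]
    return f"UniProtKB:{accession.upper()}"


def annotations_for_date(sentences, query):
    """Return every annotation on any sentence with at least one date-prefix match.

    sentences: list of annotation lists, one list per sentence, because
    annotations to the same sentence are stored (and returned) together.
    """
    hits = []
    for annotations in sentences:
        if any(a["date"].startswith(query) for a in annotations):
            hits.extend(annotations)
    return hits


# Hypothetical stored annotations: one sentence annotated on two different dates.
sentences = [
    [{"date": "2013-06-11"}, {"date": "2013-08-12"}],
    [{"date": "2013-07-01"}],
]
print(parse_source_name("20130426_WB_ccc"))
print(normalize_pmid("23064028"), normalize_uniprot("p46549"))
print(annotations_for_date(sentences, "2013-06"))
```

In the last call, the query 2013-06 returns the 2013-08-12 annotation too, because it belongs to the same sentence as the 2013-06-11 annotation.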

Making Annotations, Deleting and Editing Annotations, Classifying Sentences, Commenting on Sentences

Making Annotations

  • To make annotations, first load one or more source files by selecting them from the source list and clicking on Search.
  • Each matching Textpresso sentence is displayed in the curation form, along with the title and abstract of the corresponding paper.
  • Terms in the sentence that matched the Textpresso categories are color-coded:
    • Blue = Gene Product
    • Red/Brown = Textpresso Cellular Component
    • Orange = Assay Terms
    • Green = Verbs
  • Curators have the option to either make an annotation from the sentence or to otherwise classify that sentence.
  • Note that you don't have to classify sentences, but once you do these sentences will be considered curated in the same way as sentences from which you've made an annotation.
  • To make an annotation, select one or more gene products from the list of entities in the left-most column.
    • Each entity is mapped to its corresponding MOD and UniProt identifiers using the information in the gpi file. If more than one UniProt identifier exists, then curators have the option to select as many as they need for annotation.
  • Note that once you select an entity to curate in the left-most column a new line for annotation automatically appears. This allows curators to make as many annotations as needed for a given sentence.
  • The second column lists the Textpresso component terms identified in the sentence, each mapped to the GO terms suggested for it by previous curation.
    • If there is a suitable GO annotation in this list, you can select it by clicking on it.
    • If there is NOT a suitable suggested GO annotation, then the curator can enter a new one.
    • To enter a completely new GO annotation, either select a Textpresso component term from the drop-down menu, or enter a component term (if not originally recognized by Textpresso) in the free text box above the drop-down menu. Then enter a GO term by typing the text string of the term. The GO term entry is an auto-complete, so you must select a GO term from the GO file (updated nightly).
    • The default evidence code is IDA, but you can also make IPI annotations (e.g. for a protein complex) by selecting the IPI evidence code and adding a UniProtKB identifier (e.g., UniProtKB:P34688) in the 'with' box below the evidence codes.
  • Once you are ready to make your annotations, click on the Submit button at the beginning of that paper's entries and your annotations will be sent to UniProt GOA's Protein2GO tool as well as saved in a local curation database.
  • If there were any errors with your submission to UniProt, they will display in red at the top of the page and, depending on the error, you can either try the annotation again or information may need to be fixed at the source (e.g., the UniProtKB identifier).
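To make the evidence code rules concrete, here is a minimal Python sketch of an annotation record that defaults to IDA and requires a 'with' identifier for IPI. The class name, field names, and the `problems` check are hypothetical; the sketch covers only the two evidence codes mentioned above and does not reproduce the validation performed by the form or by Protein2GO.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    gene_product: str        # identifier selected in the left-most column
    go_id: str               # GO ID chosen from the suggestions or the autocomplete
    evidence: str = "IDA"    # the form's default evidence code
    with_id: str = ""        # contents of the 'with' box, used for IPI

    def problems(self):
        """Return a list of human-readable issues; an empty list means none found."""
        issues = []
        if self.evidence == "IPI" and not self.with_id:
            issues.append("IPI annotations need an identifier in the 'with' box")
        if self.evidence == "IDA" and self.with_id:
            issues.append("the 'with' box is only expected for IPI annotations")
        return issues


ida = Annotation("UniProtKB:P46549", "GO:0005634")
ipi_missing_with = Annotation("UniProtKB:P46549", "GO:0005634", evidence="IPI")
ipi_ok = Annotation("UniProtKB:P46549", "GO:0005634", evidence="IPI", with_id="UniProtKB:P34688")

print(ida.evidence, ida.problems())
print(ipi_missing_with.problems())
print(ipi_ok.problems())
```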

Editing and Deleting Annotations

  • If you need to change an annotation, you first need to delete it from the database by checking the delete box at the right end of the annotation line and then clicking on submit.
  • You can then re-enter an annotation with new or changed information.
  • If you need to delete an annotation, simply check the delete box and click on submit.
  • Note that the web services for Protein2GO only allow inserting annotations, not deleting them, so if you delete an annotation using the CCC form, you will also need to delete it in Protein2GO.

Classifying Sentences

  • To help make improvements to the specificity of the Textpresso searches, curators can classify sentences not otherwise used for GO curation.
  • Sentence classification is optional.
  • There are four sentence classifications and sentences can be assigned multiple classifications, if needed.
    • False positive - These sentences have nothing to do with protein localization.
      • Example: Because previous studies have indicated important roles of ITR-1 and UNC-68 in regulating intracellular calcium release (Maryon et al., 1996; Sakube et al., 1997; Clandinin et al., 1998; Baylis et al., 1999; Dal Santo et al., 1999; Busch et al., 2012), these results suggest that intracellular calcium stores are not likely to be the main source for the sensory-evoked calcium transients in ADF.
    • Positive for localization, but not for GO - These sentences might describe localization in a mutant background.
      • Example: Loss of gex-3 or arp-2 via RNAi resulted in reduced basolateral enrichment of AQP-1 (Figure 1D).
    • Run-on sentence - These are several sentences strung together in one entry. Run-on sentences are often false positives, as well.
      • Example: The transcriptional up-regulation of a glutathione biosynthesis gene, gcs-1, is a hallmark of phase II detoxification under the control of SKN-1 in C. elegans (Kell et al., 2007; Park et al., 2009 observations that knockdown of skn-1, gcs-1 and ugt-22 leads to increased sterility in the presence of dietary DGLA supports the model that sterility induced by dietary DGLA may occur through a toxic lipid metabolite that requires activation of detoxification systems for removal. Dietary DGLA disrupts germ cell membranes Previously we demonstrated that dietary DGLA leads to the loss of germ cell nuclei and increased apoptosis.
    • Scrambled sentence - These are fragments of different sentences and/or Figures and Tables that have been consolidated into one Textpresso sentence. Scrambled sentences tend to happen more often in older papers, where the PDF-to-Text conversion is not as robust.
      • Example: We next determined whether mutations in tax-6 / cnb-1 could bypass the requirement for KIN-29 in the regulation of str-1Hgfp expression . lf mutations in both tax-6 and % dauers at 25C Mean body length ( mm ) A 0 20 40 60 80 100 B WT kin-29 mef-2 hda-4 mef-2 ; kin-29 mef-2 ; kin-29 ; Ex [ gfp-tagged mef-2 genomic ] mef-2 ; kin-29 ; Ex [ unc-14 : : mef-2 ] mef-2 ; kin-29 ; Ex [ odr-4 : : mef-2 ] mef-2 ; kin-29 ; Ex [ lim-4 : : mef-2 ] kin-29 hda-4 ; Ex [ gfp-tagged hda-4 genomic ] kin-29 hda-4 ; Ex [ odr-4 : : hda-4 : : gfp ] kin-29 hda-4 ; Ex [ lim-4 : : hda-4 ] kin-29 hda-4 0 . 2 0 . 4 0 . 6 0 . 8 1 1 . 2 1 . 4 0 ND ND ND ND ND ND * * * * * * * * * * * * * * Figure 2 lf mutations in mef-2 and hda-4 suppress the decreased body size and pheromone hypersensitivity phenotypes of kin-29 mutants .
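Because a sentence can carry several classifications at once, and any classification marks it as curated, the bookkeeping can be modeled with a flag enumeration. The Python sketch below is only an illustration; the names `SentenceClass` and `is_curated` are hypothetical and do not reflect how the form stores classifications.

```python
from enum import Flag, auto


class SentenceClass(Flag):
    NONE = 0
    FALSE_POSITIVE = auto()
    LOCALIZATION_NOT_GO = auto()   # positive for localization, but not for GO
    RUN_ON = auto()
    SCRAMBLED = auto()


def is_curated(annotation_count, classification):
    """A sentence counts as curated once it has an annotation or any classification."""
    return annotation_count > 0 or classification != SentenceClass.NONE


# A run-on sentence that is also a false positive carries both labels.
labels = SentenceClass.RUN_ON | SentenceClass.FALSE_POSITIVE
print(SentenceClass.RUN_ON in labels)
print(is_curated(0, labels))
print(is_curated(0, SentenceClass.NONE))
```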

Commenting on Sentences

  • There is a box below the sentence classifications where curators can leave free text comments about a particular sentence.
  • Only the most recent comment is stored, so if you want to preserve an earlier comment, append the new text to it rather than replacing it.
