Difference between revisions of "Mf hmm tool"

From WormBaseWiki

Revision as of 19:54, 26 October 2010

Specifications/Instructions for the mf_hmm tool

Intended Use

The mf_hmm tool is to be used to categorize sentences with respect to whether they describe enzymatic or transporter activities. The results of the categorization will be used to continually train an HMM to identify sentences for curation from new papers.

Login page

http://textpresso-dev.caltech.edu/cgi-bin/azurebrd/mf_hmm.cgi

The page is password protected.

The front page is a curator login page. This will ensure that subsequent linking out from the form to the paper in the paper editor uses the appropriate curator ID.

Select a curator name from the drop-down menu and click on Login.

List of papers page

The next page contains a list of all papers that potentially contain positive sentences. The papers are sorted according to relative rank (i.e., combined sentence score of the HMM for that paper).

Each paper listed on this page will have one of three curation status tags: Done, Not Done, or Partial.

The curation status tag is based upon the selection of values (more on this below) for each sentence in the paper:

Done = all sentences have either a curated, TP, or FP flag.

  1. Curated indicates that the curator would make an annotation from that sentence.
  2. TP indicates that the sentence describes enzymatic or transporter activity, but that an annotation would not normally be made.
  3. FP indicates that the sentence does not describe enzymatic or transporter activity.

Not Done = all sentences have a blank flag.

Partial = a mix of curated, TP, FP, and blank.
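
The status rules above can be sketched in a few lines of Python. This is only an illustration, not the code of mf_hmm.cgi: the function name paper_status, the flag spellings "curated", "TP", and "FP", and the use of an empty string for a blank flag are all assumptions. Treating a paper with no sentences as Not Done is also an assumption, since the rules above do not cover that case.

```python
# Hypothetical flag spellings; the real tool may store them differently.
FLAGS = {"curated", "TP", "FP"}


def paper_status(flags):
    """Return 'Done', 'Not Done', or 'Partial' for a list of sentence flags.

    Any value outside FLAGS (including the empty string) counts as blank.
    """
    flagged = [flag in FLAGS for flag in flags]
    if not any(flagged):
        # Covers "all blank" and, by assumption, a paper with no sentences.
        return "Not Done"
    if all(flagged):
        return "Done"
    return "Partial"
```

For example, paper_status(["curated", "TP", "FP"]) returns "Done", while paper_status(["curated", ""]) returns "Partial".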

Clicking on a paper link takes the curator to a page listing the sentences and curation options.

Curation page

The curation page lists all of the sentences from a given paper that received an HMM score of 6 to 9.

TO BE CONTINUED.....

Caveats

Don't use tabs in the sentences of the HMM results or in comments; tabs separate the fields in the WBPaper######## files.

Don't overwrite, delete, or make unwritable the papers_done file.

Don't overwrite, delete, or make unwritable any WBPaper######## file that has been curated (it will have a done or partial status in the papers_done file).

How code works, in general

Data is in the directory: /data2/srv/textpresso-dev.caltech.edu/www/docroot/michael/mfea-curation/

The index.html file is where we get the paper order for the main page.

The papers_done file is where we store the done / partial / <blank> status of the papers that have been looked at.

The WBPaper######## files hold the HMM mark results. When this data is changed through curation, tab-separated fields are appended after each sentence's data, so that each line becomes <start line><hmm><tab><status><tab><curator_id><tab><comment><end of line>
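
As a rough sketch of this line layout, the Python below builds and splits one such line. It is not taken from the tool itself: the function names format_line and parse_line are hypothetical, and the handling of lines that have not been curated yet (only the <hmm> field present) is an assumption. It also shows why tabs must not appear inside sentences or comments: a stray tab would shift every field after it.

```python
FIELD_NAMES = ("hmm", "status", "curator_id", "comment")


def format_line(hmm, status, curator_id, comment):
    """Join the four fields with tabs, refusing values that would break the layout."""
    fields = [hmm, status, curator_id, comment]
    for field in fields:
        if "\t" in field or "\n" in field:
            raise ValueError("fields must not contain tabs or newlines")
    return "\t".join(fields)


def parse_line(line):
    """Split one line into a dict; fields that are missing are returned as ''."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) > len(FIELD_NAMES):
        raise ValueError("too many tab-separated fields")
    # Assumption: an uncurated line carries only the <hmm> field.
    parts += [""] * (len(FIELD_NAMES) - len(parts))
    return dict(zip(FIELD_NAMES, parts))
```

A line written with format_line can be read back with parse_line, and a line that holds only the HMM result parses with empty status, curator_id, and comment fields.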

The curator is stored in the format "two"<WBPersonID number>, to conform with the postgres format should the two ever need to talk to each other.
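
A minimal sketch of that conversion follows. The function name curator_key is hypothetical, as is the ID number used in the example, and the sketch assumes the input is a string of the form "WBPerson" followed by digits.

```python
def curator_key(wbperson_id):
    """Turn an ID such as 'WBPerson1234' into the stored form 'two1234'."""
    prefix = "WBPerson"  # assumed form of the incoming ID
    number = wbperson_id[len(prefix):]
    if not wbperson_id.startswith(prefix) or not number.isdigit():
        raise ValueError("expected 'WBPerson' followed by digits")
    return "two" + number
```

With the made-up ID above, curator_key("WBPerson1234") returns "two1234".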