Mf hmm tool

From WormBaseWiki
Revision as of 12:53, 8 October 2010 by Jchan (talk | contribs) (→‎Caveats)


Specifications/Instructions for the mf_hmm tool

Intended Use

The mf_hmm tool is to be used to categorize sentences with respect to whether they describe enzymatic or transporter activities. The results of the categorization will be used to continually train an HMM to identify sentences for curation from new papers.

Login page

The front page will be a curator login page. This will ensure that subsequent linking out from the form to the paper in the paper editor uses the appropriate curator ID.

List of papers page

The next page contains a list of all papers that potentially contain positive sentences. The papers are sorted according to relative rank (i.e., combined sentence score of the HMM for that paper).
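The ranking described above could be sketched as follows. This is only an illustration: it assumes that the "combined sentence score" is the sum of a paper's per-sentence HMM scores and that higher totals are listed first, neither of which is confirmed here. The function name rank_papers and the input dictionary are hypothetical.

```python
from typing import Dict, List


def rank_papers(scores_by_paper: Dict[str, List[float]]) -> List[str]:
    """Return paper IDs ordered by combined sentence score, highest first.

    Assumption: the combined score is the plain sum of the sentence scores.
    """
    return sorted(
        scores_by_paper,
        key=lambda paper_id: sum(scores_by_paper[paper_id]),
        reverse=True,
    )


if __name__ == "__main__":
    # Made-up paper IDs and scores, for illustration only.
    example = {
        "WBPaper00000001": [6, 7],
        "WBPaper00000002": [9, 9, 8],
        "WBPaper00000003": [6],
    }
    print(rank_papers(example))
```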

Each paper listed on this page will have one of three curation status tags: Done, Not Done, or Partial.

The curation status tag is based upon the selection of values (more on this below) for each sentence in the paper:

Done = every sentence has a curated, TP, or FP flag.

Not Done = every sentence has a blank flag.

Partial = a mix of flagged (curated, TP, or FP) and blank sentences.
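The three status rules can be expressed as a small function. This is a minimal sketch, not the tool's actual code: the name paper_status is hypothetical, and it assumes the flags are stored as the literal strings "curated", "TP", and "FP", with an empty string for a blank flag.

```python
from typing import Iterable

# Assumed literal flag values; a blank flag is an empty string.
REVIEWED_FLAGS = {"curated", "TP", "FP"}


def paper_status(flags: Iterable[str]) -> str:
    """Derive a paper's curation status from its per-sentence flags."""
    cleaned = [flag.strip() for flag in flags]
    if all(flag == "" for flag in cleaned):
        return "Not Done"  # nothing flagged yet (also covers a paper with no sentences)
    if all(flag in REVIEWED_FLAGS for flag in cleaned):
        return "Done"      # every sentence carries a curated, TP, or FP flag
    return "Partial"       # anything in between


if __name__ == "__main__":
    print(paper_status(["TP", "", "FP"]))
```

In this sketch an unrecognized, non-blank flag makes the paper Partial rather than raising an error; a stricter version could reject such values instead.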

Clicking on a paper link takes the curator to a page listing the sentences and curation options.

Curation page

The curation page lists all of the sentences from a given paper that received an HMM score of 6 to 9.

TO BE CONTINUED.....

Caveats

Do not use tabs in HMM result sentences or in comments; tabs are used as field separators in the curated data files.

Do not overwrite, delete, or make unwritable the papers_done file.

Do not overwrite, delete, or make unwritable any WBPaper######## file that has been curated (such files have a done or partial status in the papers_done file).

How code works, in general

The data is in the directory: /data2/srv/textpresso-dev.caltech.edu/www/docroot/michael/mfea-curation/

The index.html file is where the paper order for the main page comes from.

The papers_done file stores the done / partial / <blank> status of papers that have been looked at.

The WBPaper######## files hold the HMM results for each paper. When a sentence is curated, tab-separated fields are appended to its line, so that the line becomes <start line><hmm><tab><status><tab><curator_id><tab><comment><end of line>
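The line layout above, together with the no-tabs caveat, can be illustrated with a short sketch. The names SentenceRecord, format_record, and parse_record are hypothetical, as is the treatment of uncurated lines (assumed here to contain only the <hmm> field); this is not the tool's actual code.

```python
from typing import NamedTuple


class SentenceRecord(NamedTuple):
    hmm: str         # original HMM result text for the sentence
    status: str      # e.g. curated, TP, FP, or blank
    curator_id: str
    comment: str


def format_record(hmm: str, status: str, curator_id: str, comment: str) -> str:
    """Build one line: <hmm><tab><status><tab><curator_id><tab><comment>."""
    fields = (hmm, status, curator_id, comment)
    for value in fields:
        # A tab or newline inside a field would corrupt the layout.
        if "\t" in value or "\n" in value:
            raise ValueError("fields must not contain tabs or newlines")
    return "\t".join(fields) + "\n"


def parse_record(line: str) -> SentenceRecord:
    """Split a line back into its fields, padding uncurated lines with blanks."""
    parts = line.rstrip("\n").split("\t")
    parts += [""] * (4 - len(parts))  # assumption: uncurated lines hold only <hmm>
    if len(parts) > 4:
        raise ValueError("too many tab-separated fields")
    return SentenceRecord(*parts)


if __name__ == "__main__":
    # Made-up sentence, curator ID, and comment, for illustration only.
    line = format_record("A sentence.", "TP", "curator1", "looks right")
    print(parse_record(line))
```

Rejecting tabs at write time shows why the caveat matters: an extra tab in a sentence or comment would shift every later field when the line is read back.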