Training Set

09/17/09 email:

    seqchange is highlighted in yellow, which means that the SVM
    performance is good, judging by the papers that have been first-passed.
    I need to obtain a list of curated paper IDs to validate the results.

Data Type	ID_Num_0709	Method	IDs of curated papers	Flag email (from first pass form)
seqchange	1000	SVM	needed	genenames at wormbase dot org

09/23/09 email:

    Among the above, we have run the SVM on seqchange, using papers that have already been first-passed/flagged as the
    training/testing set, and the performance looks pretty good (recall/precision > 0.9). The next step is to see whether we
    have enough curated seqchange papers to use as a training set (normally >400 papers). If there are not enough curated
    papers, we'll need to check how clean the flagged seqchange papers are, i.e., the datatype curator for seqchange would
    have to check ~20 randomly selected papers among them. The datatype curator for seqchange may also want to validate the
    SVM results for a few runs on newly incoming IDs, to provide quick feedback so we can see how to improve the performance.
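
The decision described in the 09/23/09 email can be sketched roughly as follows. This is only an illustration, not the code actually used for the WormBase SVMs: the function names, the fixed random seed, and the representation of papers as plain ID lists are all assumptions made for the example. The 400-paper threshold and the ~20-paper review sample come from the email.

```python
import random

# Values taken from the email; everything else here is hypothetical.
MIN_CURATED = 400        # "normally >400 papers" for a curated training set
REVIEW_SAMPLE_SIZE = 20  # "~20 randomly selected papers" for a manual check


def precision_recall(predicted_ids, curated_ids):
    """Compare SVM-positive paper IDs against curated paper IDs."""
    predicted, curated = set(predicted_ids), set(curated_ids)
    true_positives = len(predicted & curated)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(curated) if curated else 0.0
    return precision, recall


def plan_training_set(curated_ids, flagged_ids, seed=0):
    """Decide which papers to train on.

    Returns ("curated", []) when there are enough curated papers, otherwise
    ("review_flagged", sample), where sample is a random subset of flagged
    papers for the datatype curator to check by hand.
    """
    if len(set(curated_ids)) > MIN_CURATED:
        return "curated", []
    rng = random.Random(seed)          # seeded only to make the sample repeatable
    pool = sorted(set(flagged_ids))    # de-duplicate before sampling
    sample_size = min(REVIEW_SAMPLE_SIZE, len(pool))
    return "review_flagged", rng.sample(pool, sample_size)
```

Calling plan_training_set with fewer than 400 curated IDs returns a review sample drawn from the flagged papers. Once the curator has checked that sample, precision_recall can be used in the same way to compare SVM output on new IDs against the curator's calls.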

October 21, 2009

    To try to improve precision, we constructed a training set using curated papers gleaned from the Evidence hash in the Type_of_mutation tag in the ?Variation model.

Random Selection of New Training Set Papers
