Training Set

09/17/09 email:

    seqchange is highlighted in yellow, which means that the SVM performance is good, 
    judging by the papers that have already been firstpassed. 
    I need to obtain a list of curated paper IDs to validate the results.
Data Type | ID_Num_0709 | Methods | IDs of curated papers | Flag email (from first pass form)
seqchange | 1000        | SVM     | need                  | genenames at wormbase dot org
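
The validation mentioned in the email above, comparing SVM-positive papers against a list of curated paper IDs, could be sketched roughly as follows. This is only an illustration, not the pipeline actually used: the function name `precision_recall` and the paper IDs are hypothetical, and it assumes both lists are drawn from the same pool of papers and that the curated list is complete for that pool.

```python
def precision_recall(predicted_ids, curated_ids):
    """Return (precision, recall) of SVM-positive paper IDs against curated IDs."""
    predicted = set(predicted_ids)
    curated = set(curated_ids)
    true_positives = len(predicted & curated)
    # Guard against empty inputs so the sketch never divides by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(curated) if curated else 0.0
    return precision, recall


# Hypothetical paper IDs, for illustration only.
svm_positive = ["paper1", "paper2", "paper3", "paper4"]
curated = ["paper2", "paper3", "paper4", "paper5"]
precision, recall = precision_recall(svm_positive, curated)
```

If the curated list is incomplete, papers the SVM flags correctly but that have not yet been curated would count against precision, so the number should be read as a lower bound in that case.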


09/23/09 email:

    Among the above, we have run the SVM on seqchange, using the papers that have already been 
    firstpassed/flagged as the training/testing set, and the performance looks pretty good 
    (recall/precision > 0.9). The next step is to see whether we have enough curated seqchange 
    papers to use as a training set (normally >400 papers). If there are not enough curated papers, 
    we'll need to check how clean these flagged seqchange papers are, i.e., the datatype curator for 
    seqchange would have to check ~20 randomly selected papers among them. The datatype curator for 
    seqchange may also want to validate the SVM results for a few runs on the newly incoming IDs, to 
    provide quick feedback so that we can see how to improve the performance.
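
The decision described in this email could be sketched as below: use the curated papers if there are more than 400 of them, otherwise draw about 20 flagged papers at random for the datatype curator to check. The function name `plan_training_set`, the returned labels, and the fixed random seed are assumptions made for illustration; only the two thresholds come from the email.

```python
import random

MIN_CURATED = 400      # "normally >400 papers" (from the email)
SPOT_CHECK_SIZE = 20   # "~20 randomly selected papers" (from the email)


def plan_training_set(curated_ids, flagged_ids, seed=0):
    """Pick the curated set if it is large enough, else a random spot-check sample."""
    curated = sorted(set(curated_ids))
    if len(curated) > MIN_CURATED:
        return "use_curated", curated
    # Not enough curated papers: sample flagged papers for a manual check.
    flagged = sorted(set(flagged_ids))
    rng = random.Random(seed)  # fixed seed so the same sample can be regenerated
    sample = rng.sample(flagged, min(SPOT_CHECK_SIZE, len(flagged)))
    return "spot_check_flagged", sample


# Hypothetical IDs: 100 curated papers, 1000 flagged papers.
action, paper_ids = plan_training_set(range(100), range(1000))
```

Sorting and seeding are a design choice to make the sample reproducible, so that the list sent to the curator can be rebuilt later from the same inputs.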


October 21, 2009

    To try to improve precision, we constructed a training set using curated papers gleaned from the 
    Evidence hash in the Type_of_mutation tag in the ?Variation model.


December 7, 2009

    Checking the new training set:  Random Selection of New Training Set Papers


