Difference between revisions of "Training Set"

From WormBaseWiki

Revision as of 12:26, 4 December 2009

09/17/09 email:

    seqchange is highlighted in yellow, which means that the SVM
    performance is good, judging by the papers that have been firstpassed.
    I need to obtain a list of curated paper IDs to validate the results.
Data Type | ID_Num_0709 | Methods | IDs of curated papers | Flag email (from first pass form)
seqchange | 1000 | SVM | need | genenames at wormbase dot org


09/23/09 email

    Among the above, we have run SVM on seqchange, using the papers that have already been firstpassed/flagged as the training/testing set, and
    the performance looks pretty good (recall/precision > 0.9). The next step is to see whether we have enough curated
    seqchange papers to use as a training set (normally >400 papers). If there are not enough curated papers, we will need to
    check how clean the flagged seqchange papers are, i.e., the datatype curator for seqchange would have to check ~20 randomly
    selected papers among them. The datatype curator for seqchange may also want to validate the SVM results for a few runs on
    newly incoming IDs, to provide quick feedback so that we can see how to improve the performance.
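
The checks described in this email can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual Caltech pipeline: the function names (precision_recall, enough_curated, sample_for_spot_check) and the paper ID format are hypothetical, and the thresholds (more than 400 curated papers, about 20 papers for the spot check) are taken from the email above.

```python
import random


def precision_recall(predicted_ids, curated_ids):
    """Precision and recall of SVM-positive papers against curated positives."""
    predicted, curated = set(predicted_ids), set(curated_ids)
    true_pos = len(predicted & curated)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(curated) if curated else 0.0
    return precision, recall


def enough_curated(curated_ids, minimum=400):
    """True if there are more than `minimum` distinct curated papers."""
    return len(set(curated_ids)) > minimum


def sample_for_spot_check(flagged_ids, k=20, seed=None):
    """Pick up to k distinct flagged papers at random for manual checking."""
    ids = sorted(set(flagged_ids))
    rng = random.Random(seed)  # a fixed seed makes the sample repeatable
    return rng.sample(ids, min(k, len(ids)))


if __name__ == "__main__":
    # Hypothetical paper IDs, for illustration only.
    flagged = [f"paper{i:04d}" for i in range(1000)]
    if not enough_curated([]):
        for paper_id in sample_for_spot_check(flagged, k=20, seed=1):
            print(paper_id)
```

If the curated set turns out to be too small, the sample printed by the last loop is the list a datatype curator would check by hand to estimate how clean the flagged papers are.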


