WormBase SVMs

From WormBaseWiki
Revision as of 20:42, 28 June 2012 by Vanaukenk (talk | contribs)

Back to Caltech documentation


Storing SVM Results in postgres on tazendra

What to store

  • SVM prediction
    • Positive - high confidence
    • Positive - medium confidence
    • Positive - low confidence
    • Negative
  • Paper ID
  • Data type
    • antibody
    • geneint
    • geneprod_GO
    • genereg
    • newmutant
    • otherexpr
    • overexpr
    • rnai
    • seqchange
    • structcorr
  • SVM version
  • Date the SVM was run? I don't know whether this is critical.
  • Curator assessment
    • Positive
    • Negative
  • Curator ID
  • Curator comment - refers to the result as a whole (i.e., paper, data type, SVM prediction, curator assessment, SVM version)
    • Curator comment will be tied to the curator assessment
    • Curator comments will use a controlled vocabulary, chosen from a drop-down list
  • Timestamp
  • Do we also want to store whether a paper was used in an SVM training set?
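
The fields listed above could be collected into a single record type. The following is only a sketch to make the list concrete: the class, field, and enum names are hypothetical, the paper ID is a placeholder, and the two open questions (SVM date, training-set membership) are modelled as optional fields rather than decided.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class Prediction(Enum):
    """SVM prediction levels from the list above."""
    POSITIVE_HIGH = "positive_high"
    POSITIVE_MEDIUM = "positive_medium"
    POSITIVE_LOW = "positive_low"
    NEGATIVE = "negative"


class Assessment(Enum):
    """Curator assessment values from the list above."""
    POSITIVE = "positive"
    NEGATIVE = "negative"


# Data types exactly as listed above.
DATA_TYPES = {
    "antibody", "geneint", "geneprod_GO", "genereg", "newmutant",
    "otherexpr", "overexpr", "rnai", "seqchange", "structcorr",
}


@dataclass
class SvmResult:
    """One SVM result for one paper and data type (hypothetical layout)."""
    paper_id: str
    data_type: str
    prediction: Prediction
    svm_version: str
    svm_date: Optional[str] = None            # open question above
    curator_assessment: Optional[Assessment] = None
    curator_id: Optional[str] = None
    curator_comment: Optional[str] = None     # controlled-vocabulary term
    timestamp: Optional[datetime] = None
    in_training_set: Optional[bool] = None    # open question above

    def __post_init__(self) -> None:
        if self.data_type not in DATA_TYPES:
            raise ValueError(f"unknown data type: {self.data_type}")
        # A comment is tied to an assessment, so it cannot exist on its own.
        if self.curator_comment is not None and self.curator_assessment is None:
            raise ValueError("curator comment requires a curator assessment")


# Placeholder values for illustration only.
example = SvmResult("paper_0001", "rnai", Prediction.POSITIVE_HIGH, "model_A")
```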

How to populate

  • SVM results could come directly from the SVM output (talk with Yuling)
  • Curator assessment will depend on the data type and where the curated data is stored
    • Data curated into postgres
      • If data type is curated in postgres, then curation of an SVM positive would indicate curator positive (i.e., true positive)
      • If data type is curated in postgres, then curation of an SVM negative would indicate curator positive (i.e., false negative)
      • If the data is curated in postgres, though, identifying false positives and true negatives will require curators to mark papers as such explicitly. Unless a curator specifically marks a paper as a false positive, there will be no way to tell a false positive apart from a paper that simply hasn't been curated yet (i.e., is still in the to-do pile). Likewise, a predicted negative can only be marked as a true negative once a curator has looked at the paper and confirmed the negative prediction. In practice, most of the predicted negatives will likely stay as predictions without a curator assessment.
    • Data not curated into postgres (e.g., geneace, BioGRID)
      • In theory, this data could be populated from WS or the file we'll get from BioGRID.
      • In practice, I think what we do for these data types will depend on what the individual curators for these data types want to do.
      • One option would be to populate from a web form, i.e. curators would have check boxes for individual paper IDs or large boxes to upload lists of papers according to their classification. For this option, we would like to display all four classifications: True Positive, False Positive, True Negative, False Negative. The reason for doing this is really just pragmatic and has to do with how curators are thinking about the papers while they're looking at the SVM results.
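
The mapping from SVM prediction plus curator assessment to the four classifications can be sketched as a small function. The function name and the use of `None` for "not yet assessed" are assumptions made for illustration; the point is that an unassessed paper stays a bare prediction rather than being counted as a false positive or true negative.

```python
from typing import Optional


def classify(svm_positive: bool, curator_positive: Optional[bool]) -> Optional[str]:
    """Return one of the four classifications, or None if no curator
    has assessed the paper yet (it is still just a prediction)."""
    if curator_positive is None:
        return None
    if svm_positive:
        return "True Positive" if curator_positive else "False Positive"
    return "False Negative" if curator_positive else "True Negative"
```

For a data type curated in postgres, `curator_positive=True` could be inferred from the presence of curated data, whereas `curator_positive=False` would only ever come from an explicit curator action.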

How to visualize

  • We will want to visualize data on a web display
    • Curators may want the option to search by data type, paper ID, or the time period over which the papers were classified (i.e., the SVM dates that we're now all used to seeing, e.g., 051812_042012_antibody)
    • Curators may want the ability to filter results, if possible
    • Curators may want the ability to sort particular "columns", if possible
    • If a paper has been tested by several different SVM models, then I think we would want the option to see the results from each of the models.
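
As a rough illustration of the filter, sort, and per-model views described above, here is a sketch over plain dictionaries. The rows are made-up sample data, and the key names and model labels are hypothetical; a real display would read from the postgres tables instead.

```python
# Made-up sample rows; keys and values are placeholders.
results = [
    {"paper_id": "paper_0001", "data_type": "antibody",
     "svm_version": "model_A", "prediction": "positive_high"},
    {"paper_id": "paper_0001", "data_type": "antibody",
     "svm_version": "model_B", "prediction": "negative"},
    {"paper_id": "paper_0002", "data_type": "antibody",
     "svm_version": "model_A", "prediction": "positive_low"},
    {"paper_id": "paper_0003", "data_type": "rnai",
     "svm_version": "model_A", "prediction": "negative"},
]


def filter_results(rows, **criteria):
    """Keep rows whose fields equal every given criterion."""
    return [r for r in rows if all(r.get(k) == v for k, v in criteria.items())]


def sort_results(rows, column, descending=False):
    """Sort rows by one 'column' (dictionary key)."""
    return sorted(rows, key=lambda r: r[column], reverse=descending)


def predictions_by_model(rows, paper_id, data_type):
    """Collect one paper's predictions for a data type, keyed by SVM model."""
    matching = filter_results(rows, paper_id=paper_id, data_type=data_type)
    return {r["svm_version"]: r["prediction"] for r in matching}
```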

How to query

  • This refers to how to easily get Yuling the data he needs.
    • SQL queries?
    • Download feature of the web form?
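
Either option could rest on the same query. The sketch below uses Python's built-in `sqlite3` with an in-memory database purely as a stand-in so that it runs anywhere; the real store would be postgres on tazendra, and the table name, column names, and sample rows here are all hypothetical.

```python
import csv
import io
import sqlite3

# In-memory stand-in for the real database; schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE svm_result (
           paper_id TEXT, data_type TEXT, svm_version TEXT,
           prediction TEXT, curator_assessment TEXT)"""
)
conn.executemany(
    "INSERT INTO svm_result VALUES (?, ?, ?, ?, ?)",
    [
        ("paper_0001", "antibody", "model_A", "positive_high", "positive"),
        ("paper_0002", "antibody", "model_A", "positive_low", None),
        ("paper_0003", "antibody", "model_A", "negative", "positive"),
        ("paper_0004", "rnai", "model_A", "negative", "negative"),
    ],
)

HEADER = ["paper_id", "svm_version", "prediction", "curator_assessment"]


def assessed_rows(connection, data_type):
    """Return curator-assessed results for one data type, ordered by paper."""
    cursor = connection.execute(
        """SELECT paper_id, svm_version, prediction, curator_assessment
           FROM svm_result
           WHERE data_type = ? AND curator_assessment IS NOT NULL
           ORDER BY paper_id""",
        (data_type,),
    )
    return cursor.fetchall()


def to_csv(rows, header):
    """Render rows as CSV text, e.g. for a download link on a web form."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


antibody_csv = to_csv(assessed_rows(conn, "antibody"), HEADER)
```

Only rows with a curator assessment are returned, since unassessed predictions cannot yet be used to score a model.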

geneprod_GO

Training Set

112009_geneprod_GO

Overall_performance

seqchange and genesymbol

Training Set

Test_Set_Results


Overall_performance

