WormBase SVMs

From WormBaseWiki

Storing SVM Results in postgres on tazendra

What to store

  • SVM prediction
    • Positive - high confidence
    • Positive - medium confidence
    • Positive - low confidence
    • Negative
  • Paper ID
  • Data type
    • antibody
    • geneint
    • geneprod_GO
    • genereg
    • newmutant
    • otherexpr
    • overexpr
    • rnai
    • seqchange
    • structcorr
  • SVM version
  • Date the SVM was run? I don't know whether this is critical.
  • Curator assessment
    • Positive
    • Negative
  • Curator ID
  • Curator comment
  • Timestamp
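
As a rough illustration of the fields listed above, one stored result could be modelled as a single record. This is only a sketch: the class and field names (`SvmResult`, `svm_date`, and so on) and the `paper-1` style IDs are placeholders invented here, not an agreed schema, and the actual postgres layout on tazendra may well differ.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class Prediction(Enum):
    # The four SVM prediction levels from the list above.
    POSITIVE_HIGH = "positive_high"
    POSITIVE_MEDIUM = "positive_medium"
    POSITIVE_LOW = "positive_low"
    NEGATIVE = "negative"


class Assessment(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"


# Data types exactly as listed above.
DATA_TYPES = frozenset({
    "antibody", "geneint", "geneprod_GO", "genereg", "newmutant",
    "otherexpr", "overexpr", "rnai", "seqchange", "structcorr",
})


@dataclass
class SvmResult:
    # Hypothetical field names, chosen for illustration only.
    paper_id: str
    data_type: str
    prediction: Prediction
    svm_version: str
    svm_date: Optional[str] = None  # optional, since its importance is an open question
    curator_assessment: Optional[Assessment] = None  # None = not yet assessed
    curator_id: Optional[str] = None
    curator_comment: Optional[str] = None
    timestamp: Optional[datetime] = None

    def __post_init__(self):
        # Reject data types that are not in the list above.
        if self.data_type not in DATA_TYPES:
            raise ValueError(f"unknown data type: {self.data_type!r}")


result = SvmResult("paper-1", "rnai", Prediction.POSITIVE_HIGH, "v1")
```

Leaving the curator fields empty by default reflects that a prediction exists before any curator has looked at the paper.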

How to populate

  • SVM results could come directly from output (talk with Yuling)
  • Curator assessment will depend on the data type and where the curated data is stored
    • Data curated into postgres
      • If data type is curated in postgres, then curation of an SVM positive would indicate curator positive (i.e., true positive)
      • If data type is curated in postgres, then curation of an SVM negative would indicate curator positive (i.e., false negative)
      • Distinguishing false positives from true negatives, though, will require curators to explicitly mark papers as such. Unless a curator marks a paper as a false positive, there is no way to tell it apart from a paper that simply hasn't been curated yet (still in the to-do pile). Likewise, a predicted negative can only be marked as a true negative once a curator has looked at the paper and confirmed the prediction. In practice, most predicted negatives will likely remain predictions without a curator assessment.
    • Data not curated into postgres (e.g., geneace, BioGRID)
      • In theory, this data could be populated from WS or the file we'll get from BioGRID.
      • In practice, I think what we do for these data types will depend on what the individual curators for these data types want to do.
      • One option would be to populate from a web form, i.e. curators would have check boxes for individual paper IDs or large boxes to upload lists of papers according to their classification. For this option, we would need to display all four outcomes: True Positive, False Positive, True Negative, False Negative.
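
The four outcomes follow mechanically from the SVM prediction and the curator assessment, with a fifth "unassessed" state for papers no curator has marked yet. A minimal sketch, assuming the three positive confidence levels are collapsed to a single boolean; the function name `outcome` is made up for this example:

```python
from typing import Optional


def outcome(svm_positive: bool, curator_positive: Optional[bool]) -> str:
    """Combine an SVM prediction with a curator assessment.

    curator_positive is None when no curator has assessed the paper,
    which covers both the to-do pile and unreviewed predicted negatives.
    """
    if curator_positive is None:
        return "unassessed"
    if svm_positive:
        return "true positive" if curator_positive else "false positive"
    return "false negative" if curator_positive else "true negative"


print(outcome(True, True))    # SVM positive, curated: true positive
print(outcome(False, True))   # SVM negative, curated: false negative
print(outcome(True, None))    # SVM positive, no curator mark: unassessed
```

Keeping "unassessed" separate from "false positive" is what prevents a paper in the to-do pile from being counted against the SVM.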

How to visualize

  • We will want to visualize data on a web display
    • Curators may want the option to search for a data type, a paper ID, the time period over which the papers were classified (i.e., the dates that we're now all used to seeing), and maybe filter on classification?
    • Curators may want the ability to sort based on a particular "column"
    • If a paper has been tested by several different SVM models, then we would want the option to see the different results?
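
The search, filter, and sort options above could sit on top of something like the following. It is a sketch only: rows are plain dictionaries, and the keys (`svm_date`, `svm_version`, ...), paper IDs, and dates are invented sample values rather than real results.

```python
from datetime import date


def search_results(rows, data_type=None, paper_id=None,
                   start=None, end=None, prediction=None,
                   sort_by="paper_id"):
    """Filter rows on the optional criteria, then sort by one column."""
    selected = [
        r for r in rows
        if (data_type is None or r["data_type"] == data_type)
        and (paper_id is None or r["paper_id"] == paper_id)
        and (start is None or r["svm_date"] >= start)
        and (end is None or r["svm_date"] <= end)
        and (prediction is None or r["prediction"] == prediction)
    ]
    return sorted(selected, key=lambda r: r[sort_by])


# Made-up sample rows; paper-1 was tested by two SVM versions.
rows = [
    {"paper_id": "paper-2", "data_type": "rnai", "svm_version": "v1",
     "svm_date": date(2009, 11, 20), "prediction": "negative"},
    {"paper_id": "paper-1", "data_type": "rnai", "svm_version": "v1",
     "svm_date": date(2009, 11, 20), "prediction": "positive_high"},
    {"paper_id": "paper-1", "data_type": "rnai", "svm_version": "v2",
     "svm_date": date(2010, 1, 15), "prediction": "positive_low"},
]

# All results for one paper, one row per SVM version.
for row in search_results(rows, paper_id="paper-1", sort_by="svm_version"):
    print(row["svm_version"], row["prediction"])
```

Storing one row per paper, data type, and SVM version keeps the results of different models side by side instead of overwriting them.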

How to query

  • This refers to how to easily get Yuling the data he needs.
    • SQL queries?
    • Download feature of the web form?
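
Both options could share one query. The sketch below uses Python's built-in `sqlite3` with an in-memory database purely as a stand-in for postgres so that it runs anywhere; the table name `svm_result`, its columns, and the sample rows are all hypothetical.

```python
import csv
import io
import sqlite3

# In-memory stand-in for the real database (assumption: one row per result).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE svm_result (
           paper_id TEXT, data_type TEXT, svm_version TEXT,
           prediction TEXT, curator_assessment TEXT)"""
)
conn.executemany(
    "INSERT INTO svm_result VALUES (?, ?, ?, ?, ?)",
    [
        ("paper-1", "rnai", "v1", "positive_high", "positive"),
        ("paper-2", "rnai", "v1", "negative", None),  # not yet assessed
        ("paper-3", "rnai", "v1", "positive_low", "negative"),
        ("paper-4", "antibody", "v1", "negative", "positive"),
    ],
)


def export_assessed(conn, data_type):
    """Return curator-assessed results for one data type as CSV text."""
    cur = conn.execute(
        """SELECT paper_id, svm_version, prediction, curator_assessment
           FROM svm_result
           WHERE data_type = ? AND curator_assessment IS NOT NULL
           ORDER BY paper_id""",
        (data_type,),
    )
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([col[0] for col in cur.description])  # header row
    writer.writerows(cur)
    return buf.getvalue()


print(export_assessed(conn, "rnai"))
```

A download button on the web form could return the same CSV text, so the SQL and download routes would not need separate logic.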

geneprod_GO

Training Set

112009_geneprod_GO

Overall_performance

seqchange and genesymbol

Training Set

Test_Set_Results


Overall_performance

