WormBase-Caltech Weekly Calls October 2011

October 6, 2011

SVM

  • SVM Manuscript in process of resubmission
  • Messy procedures in place; they need to be cleaned up
  • Yuling able to get pipeline running from beginning to the end
  • May take longer than previously expected to get up and running
  • FlyBase has code running for SVM
  • Yuling should contact FlyBase to make sure they're OK, once he has WormBase SVM working
  • SGD would like to incorporate SVM into their database
  • Yuri would work on SGD SVM, ultimately


Physical interaction model

  • Close to done
  • Once model approved, work on:
    • Converting the BioGRID file to the ACE model
    • Converting existing YH data into the new model


Image curation

  • Need to establish legal process for acquiring images
  • Contact Caltech Office of the General Counsel for approval


WORM Paper

  • Karen working on it
  • Need a clearer/new plan


Expression pattern display

  • Showing composite images with the option to see greater detail
  • Separate out detail images according to Qualifier tags (Certain, Uncertain, Partial)
  • Render oblique angles of cells/tissues to show 3D features/morphology of anatomy objects and, possibly, animations
  • "Real" images should always take precedence over "virtual" images
  • May talk to John Murray or Bill Mohler for confocal images of embryos


Worm Method (WormBook)

  • Any new web pages for C. elegans researchers, send to Raymond
  • URL-finder for neuroscience (Yuling)


Textpresso

  • Working on Fly Textpresso site (and others)
  • Rose Oughtred would like Wnt Textpresso tools
  • PDF tools (URL extractor)
  • SVM re-working
  • Arun working on GSA editor


Grants

  • Need to think about next grant in a couple of months (winter); due in October 2012? 30 pages, short?
  • Focus on writing papers (anatomy function, WORM, virtual worm, etc.)
  • Focus on human-relevance
  • Need Gantt charts and quarterly statements of progress
  • Stats
    • Where are we with each data type
    • Status of website
    • Automation
    • What is different in the field?
    • How is data changing?


October 13, 2011

WORM Publication

  • Looks good overall
  • Change "library" of papers to "index"
  • Putative deadline of November 15th
  • Put in more details about the new website?
  • Figures? Screenshots from new website? Gene page features?
  • Save other (detailed) content for future papers


Paper pipeline

  • Need to fix some PubMed IDs
  • Over 900 articles in Postgres do not have a PubMed ID; Why? How to fix?
  • Could some aspects of the pipeline be accomplished by a non-PhD?
  • Supplemental hires? Students?
  • What tasks would need to be done?
  • Can these tasks be scripted/automated? Community input?
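 
A first scripted step toward the missing PubMed ID question could simply list the affected records so they can be triaged. The sketch below is illustrative only: the record layout (dictionaries with `paper_id` and `pmid` keys) and the sample IDs are assumptions, not the actual Postgres schema or real papers.

```python
from typing import Iterable, List


def papers_missing_pmid(papers: Iterable[dict]) -> List[str]:
    """Return the IDs of paper records that have no usable PubMed ID.

    The 'paper_id' and 'pmid' keys are hypothetical field names.
    """
    missing = []
    for paper in papers:
        pmid = paper.get("pmid")
        # Treat None, empty strings, and non-numeric values as missing.
        if pmid is None or not str(pmid).strip().isdigit():
            missing.append(paper["paper_id"])
    return missing


# Made-up sample records for illustration.
sample = [
    {"paper_id": "WBPaper00000001", "pmid": "12345678"},
    {"paper_id": "WBPaper00000002", "pmid": None},
    {"paper_id": "WBPaper00000003", "pmid": ""},
]
print(papers_missing_pmid(sample))
```

A list like this could be split by journal or year to see whether the gaps come from one source, which would help decide what can be automated and what needs a person.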


SVM

  • Almost all scripts working
  • Output of SVM slightly different from Ruihua's results
  • Need to resolve the differences
  • Data inconsistencies
  • Need to spot check the papers to see what the major problems might be
  • Yuling ~90% confident in his results
  • Get feedback from relevant/interested curator(s)
  • One-day CPU/computation time (~30 hours for 70 papers with 9 models)
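 
To put the timing figure in per-unit terms, the arithmetic can be worked through as below. This assumes every paper is scored by all 9 models and that the cost is spread evenly across runs; neither assumption is stated in the notes.

```python
total_hours = 30   # approximate total computation time from the notes
papers = 70
models = 9

# Assumption: each paper is run through every model once.
runs = papers * models
minutes_per_run = total_hours * 60 / runs
minutes_per_paper = total_hours * 60 / papers

print(runs)                         # 630
print(round(minutes_per_run, 2))    # 2.86
print(round(minutes_per_paper, 2))  # 25.71
```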


NAR Publication Accepted


Genetic Interaction Curation

  • Sharing curated genetic interactions with BioGRID
  • What tools will each database use and how will we share data?
    • Both BioGRID and WormBase will use the IMS tool at BioGRID for physical interactions
    • WormBase will use the Ontology Annotator for genetic interactions
    • BioGRID may still use IMS for genetic interactions, but the format will have to be parsed to populate Postgres
  • BioGRID-curated genetic interactions can be flagged as such for later review


Physical Interaction model

  • Kimberly and Paul Davis discussing
  • XREF questions
    • Paul would like to minimize the number of XREFs in the model
    • Which XREFs can we remove?

Elsevier Linking

  • Establishing a pipeline for linking ScienceDirect papers to WormBase and vice versa


October 20, 2011

Data Submission Working Group

  • One representative from each site
  • Raymond from Caltech, with Juancarlos
  • Have a discussion for every data type
    • How are we doing?
    • Are we up to date?
    • Do we want unpublished data?
    • Quality control
    • Data type priority?
  • Groups/people that have a lot of data of a particular type; acquire their data
    • How can we facilitate data submission from these groups/individuals?
    • Ultimately, we would like to train these users/submitters to use curator tools
  • Form-filling as part of publication process?
  • What data types are we missing? Wiki Pages
    • qPCR?
    • Nanostring?
    • Single molecule studies, absolute quantities?
    • SAGE?
    • 3C (Chromosome Conformation Capture), 4C, 5C
    • Metabolome?
    • Pathways/Processes?
    • Drug/disease interactions?
    • Infections?
    • Examples of C. elegans as a model?
  • Expression data
    • How do we annotate a presence/absence call on expression of a given gene?
    • RPKM cutoff?
    • Number of molecules?
    • Case-by-case thresholds
  • Disease vs Process pages?
    • 100 diseases modeled in C. elegans?
    • Flat-file capture and display of disease-relevance information (as opposed to ontology-based)
  • "Subject" vs. "Process" vs. "Disease"
    • Examples of "Subject": Sewage treatment, Biofilm formation, etc.
    • Examples of "Disease": Alzheimer's, Huntington's, Cancer, etc.
    • Examples of "Process": Development, Innate immunity, Organ formation, etc.
    • User experience?
  • User-submitted data inconsistency/discrepancy flag?
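 
For the presence/absence question raised under Expression data, one option discussed was an RPKM cutoff with case-by-case thresholds. A minimal sketch of that idea follows; the default cutoff of 1.0, the function name, and the gene names are all hypothetical and no threshold was agreed in the meeting.

```python
from typing import Dict, Optional

# Hypothetical default; the notes do not specify any cutoff value.
DEFAULT_RPKM_CUTOFF = 1.0


def call_expression(
    rpkm_by_gene: Dict[str, float],
    default_cutoff: float = DEFAULT_RPKM_CUTOFF,
    per_gene_cutoff: Optional[Dict[str, float]] = None,
) -> Dict[str, str]:
    """Label each gene 'present' or 'absent' by comparing RPKM to a cutoff.

    per_gene_cutoff allows case-by-case thresholds that override the default.
    """
    overrides = per_gene_cutoff or {}
    calls = {}
    for gene, rpkm in rpkm_by_gene.items():
        cutoff = overrides.get(gene, default_cutoff)
        calls[gene] = "present" if rpkm >= cutoff else "absent"
    return calls


# Made-up values for illustration.
values = {"geneA": 5.0, "geneB": 0.2}
print(call_expression(values))
print(call_expression(values, per_gene_cutoff={"geneA": 10.0}))
```

Keeping the overrides in a separate mapping makes the case-by-case decisions explicit and reviewable, rather than burying them in the call logic.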


WORM Paper

  • Still requesting any major comments or content changes
  • "Final" version available next week


C. elegans Research Resources Chapter (WormBook)

  • Yahoo-like Yellow pages of useful URLs with brief descriptions of each site
  • URLs being pulled from papers by Yuling


October 27, 2011

NAR Proofs are in

  • Karen will send around
  • Going back out tomorrow; everyone check
  • Add Arun as author


Predicted Gene Interactions

  • Weiwei contacted Paul
  • Who is the contact person? Xiaodong can be point person
  • Add URLs to Gene Orienteer and Interolog database on gene pages
  • Issue is with syncing the databases and updating


SVM

  • Yuling finished running the SVM pipeline; can run it every two weeks
  • Need to learn how to best train the models; do we need to retrain?
  • Gary Williams sent e-mail; incorrect paper IDs on some SVM results
    • Gene structure correction papers have been incorrect
    • May need to retrain the gene structure correction SVM
    • Need someone at Caltech to work with Yuling to retrain the model
  • Kimberly will look through e-mail archive about the issue
  • We could really improve the training set, as original set may have been inadequate
  • What data types do we retrain: Expression pattern? Phenotype?
  • Improve training by including verified positive papers from last two years (since last training)
  • In past, needed 400 papers to train; would be best to lower requirement
  • Antibody and transgene SVMs work well; maybe incorporate them into the Expression pattern SVM?
  • Use pattern matching to correct SVM results or vice versa?
  • Good to make progress on Gene Structure correction papers
  • Keeping track of sentences/paragraphs; performing SVM on paragraphs (vs papers)?


Collecting URLs from papers (Raymond)

  • For interesting/relevant URLs for C. elegans
  • Journal URLs are problematic
  • Yuling runs scripts on Textpresso corpus to collect URLs
  • PubMed XML of journals (Kimberly)
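 
The URL collection described above can be approximated with a regular expression over plain text. This is only a sketch of the general technique, not Yuling's actual scripts: the pattern, the function name, and the sample sentence are assumptions, and it makes no attempt to filter out the problematic journal URLs.

```python
import re
from typing import List

# Matches http(s):// or bare www. prefixes, up to whitespace or brackets.
URL_PATTERN = re.compile(r"(?:https?://|www\.)[^\s<>()\[\]]+", re.IGNORECASE)


def extract_urls(text: str) -> List[str]:
    """Return unique URLs found in text, in order of first appearance."""
    seen = set()
    urls = []
    for match in URL_PATTERN.finditer(text):
        # Drop sentence punctuation that trails the URL.
        url = match.group(0).rstrip(".,;:")
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# Made-up sentence for illustration.
sample_text = (
    "Data are at http://www.wormbase.org. See also www.textpresso.org, "
    "and http://www.wormbase.org again."
)
print(extract_urls(sample_text))
```

A follow-up filter on the host name (for example, a list of known journal domains) would be one way to handle the journal URL problem, though the notes do not say how it was addressed.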


WORM Publication

  • Final version available
  • Review for corrections
  • To submit early next week (before Wednesday)