WormBase-Caltech Weekly Calls September 2012

From WormBaseWiki

September 6, 2012

Canopus

  • Will become a server (heavy outside access)
  • Raymond will take precautions to restore properly; should take a week or less
  • Used for:
    • Picture curation
    • Virtual worm web page & FTP


Grant Writing

  • Approach & future plans - do we need to discuss and add more to proposal?
  • Virtual worm access to data
  • Browsing data
    • Browse genes/proteins by class/GO annotation
  • InterMine queries
    • Queries can be saved and used again later (available to other users)
    • Results can be stored and displayed on web pages
  • Displaying large scale expression (microarray) data (from SPELL)
    • SPELL data currently stored in relational (MySQL?) form
    • We could extend the WormBase web app to interact with SPELL data
    • RESTful-compliant data serving would be best


Migration away from ACEDB?

  • Scalability - ACEDB does not scale to genome-scale studies
  • We (EBI & OICR) spend a lot of time building the ACEDB database
  • ACEDB is not actually used that much
  • Makes more sense to adopt a more central database
  • Migration would take place over entire grant cycle
  • We should continue extending models as we see fit, in the meantime


Massmails

  • Discuss again with Paul about Worm Breeder's Gazette and WormBook.


September 13, 2012

Grant

  • Two things to work on:
    • 1) Metrics and quality control
      • Metrics: Be transparent about our coverage (e.g. "We estimate XXX many RNAi experiments/papers and we have ~62% coverage")
        • SVM form (in development) vs Curation Status form (needs work/update); can generate numbers for coverage to display on the web?
        • What is the vision of a new Curation Status form? (Daniela will write something)
        • Stats with new SVM results on pre-2000 papers: state them in grant as pre-2000 corpus
      • Quality Control: How do we demonstrate that our curation is correct/accurate?
        • Sampling First-pass curation (false negative estimates)
        • WormBase person would check on case-by-case basis the accuracy of data (e.g. expression pattern, RNAi experiment, etc.); what percent do you need to check to ensure good quality control? Blind/independent re-curation? Collection of facts to use for metrics
        • Contact senior author on a paper to check the quality of data from a paper; experts for a process; prioritize based on who responds to author-first-pass
        • Data-type specific issues
        • Internal consistency checks
        • Ontology based curation checks (granularity; branch-consistency)
    • 2) Training and user interaction
      • Screencasts and FAQs list (Chris will write some sentences)
      • User guide status? Currently needs major revamp and new information (old Wiki User Guide is now obsolete)
      • Online chat: has been broken for months; should it only be allowed for logged-in users?
  • Daily updates with low-connectivity data (not relying on sequence mapping) from PostgreSQL database?
    • Have different CGI forms to populate the data on the website?
  • Are we committed to migrating away from ACEDB? Who then is responsible for managing data models?
    • Migrate to Chado? Do we need detailed plan for this?
    • FlyBase has migrated; they like Chado, but the migration was painful
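
The metrics and quality-control questions above ("what percent do you need to check?") can be made concrete with a standard sample-size calculation. The sketch below is only an illustration, not something agreed on the call: it uses Cochran's formula with a finite population correction, and the function names, the 95% confidence level, the 5% margin of error, and the example counts are all assumptions.

```python
import math


def coverage_percent(curated, estimated_total):
    """Percent of the estimated corpus that has been curated."""
    if estimated_total <= 0:
        raise ValueError("estimated_total must be positive")
    return 100 * curated / estimated_total


def qc_sample_size(population, margin=0.05, z=1.96, p=0.5):
    """Number of curated items to re-check for a given margin of error.

    Cochran's formula with a finite population correction.
    z=1.96 corresponds to 95% confidence; p=0.5 is the most
    conservative guess for the (unknown) error rate.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(coverage_percent(curated=620, estimated_total=1000))
    print(qc_sample_size(population=1000))
```

The same sample could serve both the blind re-curation check and the false negative estimate for first-pass curation, since both only need a random subset of papers of known size.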


Worm Breeder's Gazette & WormBook outreach

  • Mass mailings should be an Opt-in system, as opposed to blanket mass mailings
  • One mass mail for each to ask about Opting-in to e-mail alerts?
  • Regional & International worm meetings; don't hand all e-mails over to meeting organizers.


Itai Yanai expression datasets

  • Including images from microarray expression data directly from paper for now
  • Plan is to generate time-course expression data 'on-the-fly' later on so that other data sources can be compared and included in the same graph, given that they are the same type of data
  • Eventually, the paper images will be superseded by the new, calculated 'on-the-fly' images


Phenotype Images

  • We were asked if we wanted to join a consortium that is collecting images portraying different phenotypes
  • We may in the future, but not right now; we would need to develop a method for collecting and curating such images


September 20, 2012

Canopus back online

  • Cron job for pulling pictures; need to make sure it still works
  • Raymond lifted Todd's directories over to new hard drive
  • /usr/local gone; Raymond did not restore
  • Todd was pulling from somewhere in /usr/local (/usr/local/wormbase/image ?)
  • Pictures in OICR directory
  • Raymond will send directory to Todd


Molecule curation

  • "Drug" vs. "Molecule"
  • Synthetic vs Non-Synthetic?
  • Endogenous vs. Exogenous?
  • Focus of study
  • Textpresso categories including some molecules (e.g. 'glucose') throw off predictions for papers having 'drugs'


Remote backup storage

  • Juancarlos suggests storing some data/files remotely
  • Amazon will back up 1 GB of data for 12.5 cents/month; uploads are essentially free; downloads cost 12 cents/GB
  • Data to store: Postgres data (for example)
  • Data could be encrypted, if we are concerned
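
To get a feel for the cost, the prices quoted above can be turned into a small calculation. This is a rough sketch: the rates are the ones noted in these minutes (12.5 cents/GB/month for storage, 12 cents/GB for download), uploads are treated as free, and the backup size and number of restores are hypothetical inputs.

```python
# Rates as quoted in the minutes (USD); actual pricing may differ.
STORAGE_USD_PER_GB_MONTH = 0.125
DOWNLOAD_USD_PER_GB = 0.12


def monthly_storage_cost(size_gb):
    """Cost of keeping size_gb stored for one month."""
    return size_gb * STORAGE_USD_PER_GB_MONTH


def restore_cost(size_gb):
    """Cost of downloading size_gb once (e.g. for a full restore)."""
    return size_gb * DOWNLOAD_USD_PER_GB


def yearly_cost(size_gb, restores_per_year=1):
    """Twelve months of storage plus a number of full restores."""
    return 12 * monthly_storage_cost(size_gb) + restores_per_year * restore_cost(size_gb)


if __name__ == "__main__":
    # Hypothetical 8 GB Postgres dump with one test restore per year.
    print(f"${yearly_cost(8):.2f} per year")
```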


Curation Prioritization

  • Curating papers by process or topic
    • Reinstate some Textpresso categorization of papers by topic/process
    • Tied to pathway curation (LEGO and WikiPathways)
  • Prioritization of curation for genes that currently have no annotations


September 27, 2012

Microarrays from other species

  • Storing data?
  • Will keep in a separate database (separate from C. elegans)
  • Curating Itai Yanai's paper (5 species)
  • Only found 4 microarray data sets from C. briggsae (from GO search) including Yanai paper; very few from other species so far
  • There may be more microarray data on non-Caenorhabditis species (parasites, etc.)
  • No way, currently, to inter-compare data sets (across species)


Yanai data set

  • Everything in sandbox (images and data)


Process Pages

  • Marc Perry is working on generating the pages
  • Newer data not available on website yet
  • WikiPathway for male mating (Karen made); Marc made mock page and put it up
  • Interactive pages; can click on entities
  • Process search not up and running yet
  • Can we get WormBase data into WikiPathways programmatically? Not yet
  • Can use GenMAPP files
  • WikiPathways has node/edge hyperlinks
  • Figure out a way to pull data (programmatically) from WormBase process
    • Create and place GPML objects spaced evenly on a page
  • Process pages have lists of: genes, cells, molecules, microarray expression clusters
  • Process pages are a good home for expression clusters
  • Process page: Summary, WikiPathways widget, lists
  • Phenotype to Process mapping of objects
  • Big-picture mapping of pathways (top down approach)
  • LEGO will become the bottom up approach
  • Would be good to capture pathway connections/structure in the ACEDB data model
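
The idea of placing objects "spaced evenly on a page" can be sketched as a simple grid layout. This is a minimal illustration under assumptions: the page size, the near-square grid, and the function name are invented here, and the code only computes center coordinates; writing those coordinates into actual GPML elements is not shown.

```python
import math


def grid_positions(n, page_width=800.0, page_height=600.0, columns=None):
    """Return n (x, y) center points spread evenly over a page.

    Objects are laid out row by row on a grid. If columns is not
    given, a roughly square grid is used. Margins equal the spacing
    between neighbouring objects.
    """
    if n <= 0:
        return []
    if columns is None:
        columns = math.ceil(math.sqrt(n))
    rows = math.ceil(n / columns)
    dx = page_width / (columns + 1)   # horizontal spacing
    dy = page_height / (rows + 1)     # vertical spacing
    positions = []
    for i in range(n):
        row, col = divmod(i, columns)
        positions.append(((col + 1) * dx, (row + 1) * dy))
    return positions


if __name__ == "__main__":
    # Hypothetical list of objects pulled from a process page.
    genes = ["gene-1", "gene-2", "gene-3", "gene-4", "gene-5"]
    for name, (x, y) in zip(genes, grid_positions(len(genes))):
        print(name, round(x, 1), round(y, 1))
```

A grid like this only gives a starting layout; a curator would still need to rearrange nodes by hand to reflect pathway structure.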


GO Meeting @ Caltech


PWM data

  • Gary Stormo paper coming out with lots of new PWM data (5000 binding sites)
  • Genome wide conserved elements (some experimentally verified)


Transcription Factor -to- Target Gene Associations

  • We need better data display methods/tools
  • modENCODE ChIP-seq data really would benefit from conversion to digital/discrete objects/associations


Curating with respect to topic

  • Karen already doing this to some extent for processes
  • Clustering (papers) based on GO Biological Process terms
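
One simple way to picture clustering papers by GO Biological Process terms is to group papers that share a term and to score how similar two papers' term sets are. The sketch below is an assumption about how this might be done, not a description of an existing WormBase or Textpresso tool; the paper IDs and term labels are placeholders.

```python
from collections import defaultdict


def group_papers_by_term(paper_terms):
    """Invert {paper: set of terms} into {term: set of papers}."""
    groups = defaultdict(set)
    for paper, terms in paper_terms.items():
        for term in terms:
            groups[term].add(paper)
    return dict(groups)


def jaccard(terms_a, terms_b):
    """Jaccard similarity of two term sets (0.0 when both are empty)."""
    union = terms_a | terms_b
    if not union:
        return 0.0
    return len(terms_a & terms_b) / len(union)


if __name__ == "__main__":
    # Placeholder paper IDs and term labels, for illustration only.
    paper_terms = {
        "paper1": {"term_A", "term_B"},
        "paper2": {"term_B", "term_C"},
        "paper3": {"term_D"},
    }
    print(group_papers_by_term(paper_terms))
    print(jaccard(paper_terms["paper1"], paper_terms["paper2"]))
```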