WormBase-Caltech Weekly Calls

September 6, 2012

Canopus

Will become a server (heavy outside access)
Raymond will take precautions to restore properly; should take a week or less
Used for:
- Picture curation
- Virtual worm web page & FTP

Grant Writing

Approach & future plans - do we need to discuss and add more to proposal?
Virtual worm access to data
Browsing data
- Browse genes/proteins by class/GO annotation
Intermine queries
- Queries can be saved and used again later (available to other users)
- Results can be stored and displayed on web pages
Displaying large scale expression (microarray) data (from SPELL)
- SPELL data currently stored in relational (MySQL?) form
- We could extend the WormBase web app to interact with SPELL data
- RESTful data serving would be best
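
A minimal sketch of what RESTful serving of expression data could look like, using only the Python standard library. Everything here is an assumption made for illustration: the `/expression/<gene>` URL layout, the `EXPRESSION` dictionary standing in for rows read from the SPELL database, and the placeholder gene name and values. It is not the WormBase web app's actual API.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for rows that would be read from the SPELL database.
# The gene name, dataset name, and values are placeholders, not real data.
EXPRESSION = {
    "example_gene": [{"dataset": "example_dataset", "values": [0.1, 0.4, -0.2]}],
}


def route(path):
    """Map a request path to (HTTP status, JSON-serializable body)."""
    parts = [p for p in path.split("/") if p]
    if len(parts) == 2 and parts[0] == "expression":
        gene = parts[1]
        if gene in EXPRESSION:
            return 200, {"gene": gene, "datasets": EXPRESSION[gene]}
        return 404, {"error": "unknown gene: " + gene}
    return 404, {"error": "unknown resource"}


class Handler(BaseHTTPRequestHandler):
    """Answer GET requests with the JSON body chosen by route()."""

    def do_GET(self):
        status, body = route(self.path)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8000):
    """Start the sketch server on localhost; blocks until interrupted."""
    HTTPServer(("localhost", port), Handler).serve_forever()
```

Calling `serve()` and then requesting `http://localhost:8000/expression/example_gene` would return the placeholder record as JSON. Keeping the lookup in `route()` separate from the HTTP handler means the same logic could sit behind whatever framework the web app uses.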

Migration away from ACEDB?

Scalability - ACEDB does not scale to genome-scale studies
We (EBI & OICR) spend a lot of time building the ACEDB database
ACEDB is not actually used that much
Makes more sense to adopt a more central database
Migration would take place over entire grant cycle
We should continue extending models as we see fit, in the meantime

Massmails

Discuss again with Paul about Worm Breeder's Gazette and WormBook.

September 13, 2012

Grant

Two things to work on:
- 1) Metrics and quality control
  - Metrics: Be transparent about our coverage (e.g. "We estimate XXX RNAi experiments/papers and we have ~62% coverage")
    - SVM form (in development) vs Curation Status form (needs work/update); can generate numbers for coverage to display on the web?
    - What is the vision of a new Curation Status form? (Daniela will write something)
    - Stats with new SVM results on pre-2000 papers: state them in grant as pre-2000 corpus
  - Quality Control: How do we demonstrate that our curation is correct/accurate?
    - Sampling First-pass curation (false negative estimates)
    - WormBase person would check on case-by-case basis the accuracy of data (e.g. expression pattern, RNAi experiment, etc.); what percent do you need to check to ensure good quality control? Blind/independent re-curation? Collection of facts to use for metrics
    - Contact senior author on a paper to check the quality of data from a paper; experts for a process; prioritize based on who responds to author-first-pass
    - Data-type specific issues
    - Internal consistency checks
    - Ontology based curation checks (granularity; branch-consistency)
- 2) Training and user interaction
  - Screencasts and FAQs list (Chris will write some sentences)
  - User guide status? Currently needs major revamp and new information (old Wiki User Guide is now obsolete)
  - Online chat: has been broken for months; should it only be allowed for logged-in users?
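
On the quality-control question above ("what percent do you need to check"), one common way to frame it is as a sample-size calculation for estimating a proportion. The sketch below is only an illustration under stated assumptions: simple random sampling of annotations, a normal approximation, 95% confidence (`z = 1.96`), and the worst-case error rate `p = 0.5`. The function name and the population sizes in the usage note are hypothetical, not WormBase figures.

```python
import math


def sample_size(population, margin=0.05, z=1.96, p=0.5):
    """Annotations to re-check to estimate an error rate within +/- margin.

    population: total number of annotations of this data type (assumed known)
    margin:     acceptable half-width of the confidence interval
    z:          normal quantile for the chosen confidence level (1.96 ~ 95%)
    p:          assumed error rate; 0.5 is the most conservative choice
    """
    # Sample size for an effectively infinite population.
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    # Finite population correction: small corpora need fewer checks.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)
```

For example, `sample_size(10000)` gives the number of annotations to re-curate out of a hypothetical 10,000 to estimate the error rate to within five percentage points. The required fraction shrinks as the corpus grows, so a fixed "percent to check" is probably less useful than a fixed sample size per data type.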
Daily updates with low-connectivity data (not relying on sequence mapping) from PostgreSQL database?
- Have different CGI forms to populate the data on the website?
Are we committed to migrating away from ACEDB? Who then is responsible for managing data models?
- Migrate to Chado? Do we need detailed plan for this?
- FlyBase has migrated; they like Chado, but the migration was painful

Worm Breeder's Gazette & WormBook outreach

Mass mailings should be opt-in, as opposed to blanket mass mailings
One mass mail for each to ask about opting in to e-mail alerts?
Regional & International worm meetings; don't hand all e-mails over to meeting organizers.

Itai Yanai expression datasets

Including images from microarray expression data directly from paper for now
Plan is to generate time-course expression data 'on-the-fly' later on so that other data sources can be compared and included in the same graph, given that they are the same type of data
Eventually, the paper images will be superseded by the new, calculated 'on-the-fly' images

Phenotype Images

We were asked if we wanted to join a consortium that is collecting images portraying different phenotypes
We may in the future, but not right now; we would need to develop a method for collecting and curating such images

September 20, 2012

Canopus back online

Cron job for pulling pictures; need to make sure it still works
Raymond lifted Todd's directories over to new hard drive
/usr/local gone; Raymond did not restore
Todd was pulling from somewhere in /usr/local (/usr/local/wormbase/image ?)
Pictures in OICR directory
Raymond will send directory to Todd

Molecule curation

"Drug" vs. "Molecule"
Synthetic vs Non-Synthetic?
Endogenous vs. Exogenous?
Focus of study
Textpresso categories including some molecules (e.g. 'glucose') throw off predictions for papers having 'drugs'

Remote backup storage

Juancarlos suggests storing some data/files remotely
Amazon will back up 1 GB of data for 12.5 cents/month; essentially free to upload; pay for download (12 cents/GB)
Data to store: Postgres data (for example)
Data could be encrypted, if we are concerned
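
A rough cost sketch using the rates quoted above, assuming the 12.5 cents applies per GB per month, uploads cost nothing, and downloads cost 12 cents/GB. The function name and the 50 GB figure in the usage note are hypothetical; the actual size of the Postgres data was not stated.

```python
STORAGE_USD_PER_GB_MONTH = 0.125  # quoted rate: 12.5 cents per GB per month
DOWNLOAD_USD_PER_GB = 0.12        # quoted rate: 12 cents per GB downloaded


def monthly_cost(stored_gb, downloaded_gb=0.0):
    """Estimated cost in USD for one month of remote backup storage.

    stored_gb:     amount of data kept remotely during the month
    downloaded_gb: amount pulled back down (e.g. for a restore)
    Uploads are treated as free, as noted in the meeting.
    """
    storage = stored_gb * STORAGE_USD_PER_GB_MONTH
    download = downloaded_gb * DOWNLOAD_USD_PER_GB
    return storage + download
```

For instance, `monthly_cost(50)` estimates a month of storing a hypothetical 50 GB, and `monthly_cost(50, downloaded_gb=50)` adds one full restore in the same month.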

Curation Prioritization

Curating papers by process or topic
- Reinstate some Textpresso categorization of papers by topic/process
- Tied to pathway curation (LEGO and WikiPathways)
Prioritization of curation for genes that currently have no annotations

September 27, 2012

Microarrays from other species

Storing data?
Will keep in a separate database (separate from C. elegans)
Curating Itai Yanai's paper (5 species)
Only found 4 microarray data sets from C. briggsae (from GO search) including Yanai paper; very few from other species so far
There may be more microarray data on non-Caenorhabditis species (parasites, etc.)
No way, currently, to compare data sets across species

Yanai data set

Everything in sandbox (images and data)

Process Pages

Marc Perry is working on generating the pages
Newer data not available on website yet
WikiPathway for male mating (Karen made); Marc made mock page and put it up
Interactive pages; can click on entities
Process search not up and running yet
Can we get WormBase data into WikiPathways programmatically? Not yet
Can use GenMAPP files
WikiPathways has node/edge hyperlinks
Figure out a way to pull data (programmatically) from WormBase process
- Create and place GPML objects spaced evenly on a page
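
One simple way to space objects evenly on a page is a near-square grid. The sketch below only computes centre coordinates; the function name, the default page size, and the grid strategy are assumptions for illustration, and writing the coordinates into actual GPML elements is left out.

```python
import math


def grid_positions(n, page_width=800.0, page_height=600.0):
    """Return n (x, y) centre points spread evenly over a page.

    Nodes fill a near-square grid row by row, with equal margins
    between neighbours and between the outer nodes and the page edge.
    """
    if n <= 0:
        return []
    cols = math.ceil(math.sqrt(n))   # columns in the grid
    rows = math.ceil(n / cols)       # rows needed to hold n nodes
    dx = page_width / (cols + 1)     # horizontal spacing
    dy = page_height / (rows + 1)    # vertical spacing
    return [((i % cols + 1) * dx, (i // cols + 1) * dy) for i in range(n)]
```

Pairing the result with a list of entity labels, e.g. `zip(labels, grid_positions(len(labels)))`, gives each gene, cell, or molecule in a process its own slot; a curator could then rearrange the nodes by hand.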
Process pages have lists of: genes, cells, molecules, microarray expression clusters
Process pages are a good home for expression clusters
Process page: Summary, WikiPathways widget, lists
Phenotype to Process mapping of objects
Big-picture mapping of pathways (top down approach)
LEGO will become the bottom up approach
Would be good to capture pathway connections/structure in the ACEDB data model

GO Meeting @ Caltech

GO Wiki has details

PWM data

Gary Stormo paper coming out with lots of new PWM data (5000 binding sites)
Genome wide conserved elements (some experimentally verified)

Transcription Factor -to- Target Gene Associations

We need better data display methods/tools
modENCODE ChIP-seq data would really benefit from conversion to digital/discrete objects/associations

Curating with respect to topic

Karen already doing this to some extent for processes
