WormBase-Caltech Weekly Calls
2012 Meetings
September 6, 2012
Canopus
- Will become a server (heavy outside access)
- Raymond will take precautions to restore properly; should take a week or less
- Used for:
- Picture curation
- Virtual worm web page & FTP
Grant Writing
- Approach & future plans - do we need to discuss and add more to proposal?
- Virtual worm access to data
- Browsing data
- Browse genes/proteins by class/GO annotation
- Intermine queries
- Queries can be saved and used again later (available to other users)
- Results can be stored and displayed on web pages
- Displaying large scale expression (microarray) data (from SPELL)
- SPELL data currently stored in relational (MySQL?) form
- We could extend the WormBase web app to interact with SPELL data
- RESTful-compliant data serving would be best
Migration away from ACEDB?
- Scalability - ACEDB does not scale to genome-scale studies
- We (EBI & OICR) spend a lot of time building ACEDB database
- ACEDB not actually used so much
- Makes more sense to adopt a more central database
- Migration would take place over entire grant cycle
- We should continue extending models as we see fit, in the meantime
Massmails
- Discuss again with Paul about Worm Breeder's Gazette and WormBook.
September 13, 2012
Grant
- Two things to work on:
- 1) Metrics and quality control
- Metrics: Transparent about our coverage (e.g. "We estimate XXX many RNAi experiments/papers and we have ~62% coverage")
- SVM form (in development) vs Curation Status form (needs work/update); can generate numbers for coverage to display on the web?
- What is the vision of a new Curation Status form? (Daniela will write something)
- Stats with new SVM results on pre-2000 papers: state them in grant as pre-2000 corpus
- Quality Control: How do we demonstrate that our curation is correct/accurate?
- Sampling First-pass curation (false negative estimates)
- WormBase person would check on case-by-case basis the accuracy of data (e.g. expression pattern, RNAi experiment, etc.); what percent do you need to check to ensure good quality control? Blind/independent re-curation? Collection of facts to use for metrics
- Contact senior author on a paper to check the quality of data from a paper; experts for a process; prioritize based on who responds to author-first-pass
- Data-type specific issues
- Internal consistency checks
- Ontology based curation checks (granularity; branch-consistency)
- 2) Training and user interaction
- Screencasts and FAQs list (Chris will write some sentences)
- User guide status? Currently needs major revamp and new information (old Wiki User Guide is now obsolete)
- Online chat: has been broken for months; should it only be allowed for logged-in users?
- Daily updates with low-connectivity data (not relying on sequence mapping) from PostgreSQL database?
- Have different CGI forms to populate the data on the website?
- Are we committed to migrating away from ACEDB? Who then is responsible for managing data models?
- Migrate to Chado? Do we need detailed plan for this?
- FlyBase has migrated; they like Chado, but the migration was painful
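The coverage figure and the "what percent do you need to check" question raised under Metrics and Quality Control can be sketched numerically. This is only an illustration: the function names are hypothetical, the counts are made up, and the sample-size calculation uses the standard normal-approximation formula for estimating a proportion with a finite-population correction, which is one common choice and not something decided at the meeting.

```python
import math


def coverage(curated: int, estimated_total: int) -> float:
    """Fraction of the estimated corpus that has been curated."""
    return curated / estimated_total


def sample_size(population: int, margin: float = 0.05,
                z: float = 1.96, p: float = 0.5) -> int:
    """Number of curated items to re-check for a given margin of error.

    Assumptions (not from the notes): z = 1.96 for ~95% confidence and
    p = 0.5 as the most conservative guess for the true error rate.
    """
    n0 = z ** 2 * p * (1 - p) / margin ** 2      # infinite-population size
    n = n0 / (1 + (n0 - 1) / population)         # finite-population correction
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical numbers: 620 curated papers out of an estimated 1000.
    print(f"coverage: {coverage(620, 1000):.0%}")
    print("items to re-check:", sample_size(1000))
```

Under these assumptions the required sample grows slowly with corpus size, so blind re-curation of a few hundred items would bound the error rate for a data type regardless of how many thousand objects it holds.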
Worm Breeder's Gazette & WormBook outreach
- Mass mailings should be an Opt-in system, as opposed to blanket mass mailings
- One mass mail for each to ask about Opting-in to e-mail alerts?
- Regional & International worm meetings; don't hand all e-mails over to meeting organizers.
Itai Yanai expression datasets
- Including images from microarray expression data directly from paper for now
- Plan is to generate time-course expression data 'on-the-fly' later on so that other data sources can be compared and included in the same graph, given that they are the same type of data
- Eventually, the paper images will be superseded by the new, calculated 'on-the-fly' images
Phenotype Images
- We were asked if we wanted to join a consortium that is collecting images portraying different phenotypes
- We may in the future, but not right now; we would need to develop a method for collecting and curating such images
September 20, 2012
Canopus back online
- Cron job for pulling pictures; need to make sure it still works
- Raymond lifted Todd's directories over to new hard drive
- /usr/local gone; Raymond did not restore
- Todd was pulling from somewhere in /usr/local (/usr/local/wormbase/image ?)
- Pictures in OICR directory
- Raymond will send directory to Todd
Molecule curation
- "Drug" vs. "Molecule"
- Synthetic vs Non-Synthetic?
- Endogenous vs. Exogenous?
- Focus of study
- Textpresso categories including some molecules (e.g. 'glucose') throw off predictions for papers having 'drugs'
Remote backup storage
- Juancarlos suggests storing some data/files remotely
- Amazon will back up 1 GB of data for 12.5 cents/month; uploads are essentially free; downloads are paid (12 cents/GB)
- Data to store: Postgres data (for example)
- Data could be encrypted, if we are concerned
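As a rough check on the prices quoted above, the yearly cost can be estimated with a few lines of Python. The two rates come from the notes; the function name, the 50 GB dump size, and the restore count are hypothetical values chosen for illustration.

```python
# Rates taken from the notes above; everything else is an illustrative assumption.
STORAGE_USD_PER_GB_MONTH = 0.125   # "12.5 cents/month" per GB stored
DOWNLOAD_USD_PER_GB = 0.12         # "12 cents/GB" for downloads


def yearly_backup_cost(size_gb: float, restores_per_year: int = 0) -> float:
    """Estimate one year of remote backup cost in USD.

    Uploads are treated as free, as stated in the notes.
    """
    storage = size_gb * STORAGE_USD_PER_GB_MONTH * 12
    downloads = size_gb * DOWNLOAD_USD_PER_GB * restores_per_year
    return round(storage + downloads, 2)


if __name__ == "__main__":
    # Hypothetical 50 GB Postgres dump, fully restored once during the year.
    print(yearly_backup_cost(50, restores_per_year=1))
```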
Curation Prioritization
- Curating papers by process or topic
- Reinstate some Textpresso categorization of papers by topic/process
- Tied to pathway curation (LEGO and WikiPathways)
- Prioritization of curation for genes that currently have no annotations
September 27, 2012
Microarrays from other species
- Storing data?
- Will keep in a separate database (separate from C. elegans)
- Curating Itai Yanai's paper (5 species)
- Only found 4 microarray data sets from C. briggsae (from GO search) including Yanai paper; very few from other species so far
- There may be more microarray data on non-Caenorhabditis species (parasites, etc.)
- No way, currently, to inter-compare data sets (across species)
Yanai data set
- Everything in sandbox (images and data)
Process Pages
- Marc Perry is working on generating the pages
- Newer data not available on website yet
- WikiPathway for male mating (Karen made); Marc made mock page and put it up
- Interactive pages; can click on entities
- Process search not up and running yet
- Can we get WormBase data into WikiPathways programmatically? Not yet
- Can use GenMAPP files
- WikiPathways has node/edge hyperlinks
- Figure out a way to pull data (programmatically) from WormBase process
- Create and place GPML objects spaced evenly on a page
- Process pages have lists of: genes, cells, molecules, microarray expression clusters
- Process pages are a good home for expression clusters
- Process page: Summary, WikiPathways widget, lists
- Phenotype to Process mapping of objects
- Big-picture mapping of pathways (top down approach)
- LEGO will become the bottom up approach
- Would be good to capture pathway connections/structure in the ACEDB data model
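The idea of placing pathway objects "spaced evenly on a page" could be sketched as a simple grid layout. The following is a minimal illustration, not the method used for the mock page: the function name, cell sizes, margin, and gene labels are all hypothetical, and writing the resulting coordinates into actual GPML elements is left out because the notes do not describe that step.

```python
def grid_positions(labels, columns=4, cell_w=120.0, cell_h=60.0, margin=40.0):
    """Assign each label the center of a cell in a row-major grid.

    All dimensions are arbitrary page units chosen for illustration.
    """
    positions = []
    for i, label in enumerate(labels):
        row, col = divmod(i, columns)          # fill left to right, then down
        positions.append({
            "label": label,
            "center_x": margin + cell_w * (col + 0.5),
            "center_y": margin + cell_h * (row + 0.5),
        })
    return positions


if __name__ == "__main__":
    # Placeholder labels standing in for the genes listed on a process page.
    for node in grid_positions(["gene-1", "gene-2", "gene-3"], columns=2):
        print(node)
```

A grid keeps the layout deterministic, so regenerating a page from an updated gene list moves nodes only when the list itself changes.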
GO Meeting @ Caltech
- GO Wiki has details
PWM data
- Gary Stormo paper coming out with lots of new PWM data (5000 binding sites)
- Genome wide conserved elements (some experimentally verified)
Transcription Factor -to- Target Gene Associations
- We need better data display methods/tools
- modENCODE ChIP-seq data would really benefit from conversion to digital/discrete objects/associations
Curating with respect to topic
- Karen already doing this to some extent for processes