|
|
Line 24: |
Line 24: |
| [[WormBase-Caltech_Weekly_Calls_August_2012|August]] | | [[WormBase-Caltech_Weekly_Calls_August_2012|August]] |
| | | |
− | | + | [[WormBase-Caltech_Weekly_Calls_September_2012|September]] |
− | | |
− | == September 6, 2012 ==
| |
− | | |
− | Canopus
| |
− | *Will become a server (heavy outside access)
| |
− | *Raymond will take precautions to restore properly; should take a week or less
| |
− | *Used for:
| |
− | **Picture curation
| |
− | **Virtual worm web page & FTP
| |
− | | |
− | | |
− | Grant Writing
| |
− | *Approach & future plans - do we need to discuss and add more to proposal?
| |
− | *Virtual worm access to data
| |
− | *Browsing data
| |
− | **Browse genes/proteins by class/GO annotation
| |
− | *Intermine queries
| |
− | **Queries can be saved and used again later (available to other users)
| |
− | **Results can be stored and displayed on web pages
| |
− | *Displaying large scale expression (microarray) data (from SPELL)
| |
− | **SPELL data currently stored in relational (MySQL?) form
| |
− | **We could extend the WormBase web app to interact with SPELL data
| |
− | **RESTful-compliant data serving would be best
| |
− | | |
− | | |
− | Migration away from ACEDB?
| |
− | *Scalability - ACEDB does not scale to genome-scale studies
| |
− | *We (EBI & OICR) spend a lot of time building the ACEDB database
| |
− | *ACEDB not actually used so much
| |
− | *Makes more sense to adopt a more central database
| |
− | *Migration would take place over entire grant cycle
| |
− | *We should continue extending models as we see fit, in the meantime
| |
− | | |
− | | |
− | Massmails
| |
− | *Discuss again with Paul about Worm Breeder's Gazette and WormBook.
| |
− | | |
− | | |
− | == September 13, 2012 ==
| |
− | | |
− | | |
− | Grant
| |
− | *Two things to work on:
| |
− | **1) Metrics and quality control
| |
− | ***Metrics: Transparent about our coverage (e.g. "We estimate XXX many RNAi experiments/papers and we have ~62% coverage")
| |
− | ****SVM form (in development) vs Curation Status form (needs work/update); can generate numbers for coverage to display on the web?
| |
− | ****What is the vision of a new Curation Status form? (Daniela will write something)
| |
− | ****Stats with new SVM results on pre-2000 papers: state them in grant as pre-2000 corpus
| |
− | ***Quality Control: How do we demonstrate that our curation is correct/accurate?
| |
− | ****Sampling First-pass curation (false negative estimates)
| |
− | ****WormBase person would check on case-by-case basis the accuracy of data (e.g. expression pattern, RNAi experiment, etc.); what percent do you need to check to ensure good quality control? Blind/independent re-curation? Collection of facts to use for metrics
| |
− | ****Contact senior author on a paper to check the quality of data from a paper; experts for a process; prioritize based on who responds to author-first-pass
| |
− | ****Data-type specific issues
| |
− | ****Internal consistency checks
| |
− | ****Ontology based curation checks (granularity; branch-consistency)
| |
− | **2) Training and user interaction
| |
− | ***Screencasts and FAQs list (Chris will write some sentences)
| |
− | ***User guide status? Currently needs major revamp and new information (old Wiki User Guide is now obsolete)
| |
− | ***Online chat: has been broken for months; should it only be allowed for logged-in users?
| |
− | *Daily updates with low-connectivity data (not relying on sequence mapping) from PostgreSQL database?
| |
− | **Have different CGI forms to populate the data on the website?
| |
− | *Are we committed to migrating away from ACEDB? Who then is responsible for managing data models?
| |
− | **Migrate to Chado? Do we need detailed plan for this?
| |
− | **FlyBase has migrated; they like Chado, but the migration was painful
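The quality-control bullets above ask what percent of annotations would need to be checked. One way to frame that question is to re-check a random sample and report an interval around the observed accuracy. The following is only a minimal sketch, assuming a simple random sample and using the Wilson score interval; the function name and the example counts are hypothetical and are not taken from the notes.

```python
import math

def wilson_interval(correct, checked, z=1.96):
    """Wilson score interval for the fraction of correct annotations.

    correct: number of re-checked annotations judged correct (hypothetical input)
    checked: size of the random sample that was re-checked
    z:       normal quantile; 1.96 corresponds to roughly 95% confidence
    """
    if checked <= 0:
        raise ValueError("checked must be positive")
    p = correct / checked
    denom = 1 + z * z / checked
    centre = (p + z * z / (2 * checked)) / denom
    half = z * math.sqrt(p * (1 - p) / checked + z * z / (4 * checked * checked)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)

if __name__ == "__main__":
    # Hypothetical example: 95 of 100 sampled annotations were confirmed.
    low, high = wilson_interval(95, 100)
    print(f"estimated accuracy between {low:.3f} and {high:.3f}")
```

A larger sample narrows the interval, so the function can be run with different sample sizes to see how many checks a given precision would require.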
| |
− | | |
− | | |
− | Worm Breeder's Gazette & WormBook outreach
| |
− | *Mass mailings should be an Opt-in system, as opposed to blanket mass mailings
| |
− | *One mass mail for each to ask about Opting-in to e-mail alerts?
| |
− | *Regional & International worm meetings; don't hand all e-mails over to meeting organizers.
| |
− | | |
− | | |
− | Itai Yanai expression datasets
| |
− | *Including images from microarray expression data directly from paper for now
| |
− | *Plan is to generate time-course expression data 'on-the-fly' later on so that other data sources can be compared and included in the same graph, given that they are the same type of data
| |
− | *Eventually, the paper images will be superseded by the new, calculated 'on-the-fly' images
| |
− | | |
− | | |
− | | |
− | Phenotype Images
| |
− | *We were asked if we wanted to join a consortium that is collecting images portraying different phenotypes
| |
− | *We may in the future, but not right now; we would need to develop a method for collecting and curating such images
| |
− | | |
− | | |
− | == September 20, 2012 ==
| |
− | | |
− | | |
− | Canopus back online
| |
− | *Cron job for pulling pictures; need to make sure it still works
| |
− | *Raymond lifted Todd's directories over to new hard drive
| |
− | */usr/local gone; Raymond did not restore
| |
− | *Todd was pulling from somewhere in /usr/local (/usr/local/wormbase/image ?)
| |
− | *Pictures in OICR directory
| |
− | *Raymond will send directory to Todd
| |
− | | |
− | | |
− | Molecule curation
| |
− | *"Drug" vs. "Molecule"
| |
− | *Synthetic vs Non-Synthetic?
| |
− | *Endogenous vs. Exogenous?
| |
− | *Focus of study
| |
− | *Textpresso categories including some molecules (e.g. 'glucose') throw off predictions for papers having 'drugs'
| |
− | | |
− | | |
− | Remote backup storage
| |
− | *Juancarlos suggests storing some data/files remotely
| |
− | *Amazon will back up 1 GB of data for 12.5 cents/month; uploads are essentially free; downloads cost 12 cents/GB
| |
− | *Data to store: Postgres data (for example)
| |
− | *Data could be encrypted, if we are concerned
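The prices quoted above can be turned into a rough budget estimate. This is a minimal sketch, assuming the figures in the notes (12.5 cents per GB per month for storage, 12 cents per GB for download, uploads treated as free); the function name and the 50 GB example size are hypothetical.

```python
# Prices as quoted in the notes (USD); these are 2012 figures, not current rates.
STORAGE_USD_PER_GB_MONTH = 0.125
DOWNLOAD_USD_PER_GB = 0.12

def backup_cost(size_gb, months, restores=0):
    """Estimate the cost of storing size_gb for a number of months
    and downloading the full data set `restores` times."""
    storage = size_gb * STORAGE_USD_PER_GB_MONTH * months
    download = size_gb * DOWNLOAD_USD_PER_GB * restores
    return storage + download

if __name__ == "__main__":
    # Hypothetical case: a 50 GB Postgres dump kept for a year and restored once.
    print(f"${backup_cost(50, 12, restores=1):.2f}")
```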
| |
− | | |
− | | |
− | Curation Prioritization
| |
− | *Curating papers by process or topic
| |
− | **Reinstate some Textpresso categorization of papers by topic/process
| |
− | **Tied to pathway curation (LEGO and WikiPathways)
| |
− | *Prioritization of curation for genes that currently have no annotations
| |
− | | |
− | | |
− | | |
− | == September 27, 2012 ==
| |
− | | |
− | | |
− | Microarrays from other species
| |
− | *Storing data?
| |
− | *Will keep in a separate database (separate from C. elegans)
| |
− | *Curating Itai Yanai's paper (5 species)
| |
− | *Only found 4 microarray data sets from C. briggsae (from GO search) including Yanai paper; very few from other species so far
| |
− | *There may be more microarray data on non-Caenorhabditis species (parasites, etc.)
| |
− | *No way, currently, to inter-compare data sets (across species)
| |
− | | |
− | | |
− | Yanai data set
| |
− | *Everything in sandbox (images and data)
| |
− | | |
− | | |
− | Process Pages
| |
− | *Marc Perry is working on generating the pages
| |
− | *Newer data not available on website yet
| |
− | *WikiPathway for male mating (Karen made); Marc made mock page and put it up
| |
− | *Interactive pages; can click on entities
| |
− | *Process search not up and running yet
| |
− | *Can we get WormBase data into WikiPathways programmatically? Not yet
| |
− | *Can use GenMAPP files
| |
− | *WikiPathways has node/edge hyperlinks
| |
− | *Figure out a way to pull data (programmatically) from WormBase process
| |
− | **Create and place GPML objects spaced evenly on a page
| |
− | *Process pages have lists of: genes, cells, molecules, microarray expression clusters
| |
− | *Process pages are a good home for expression clusters
| |
− | *Process page: Summary, WikiPathways widget, lists
| |
− | *Phenotype to Process mapping of objects
| |
− | *Big-picture mapping of pathways (top-down approach)
| |
− | *LEGO will become the bottom-up approach
| |
− | *Would be good to capture pathway connections/structure in the ACEDB data model
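The idea above of placing GPML objects spaced evenly on a page could start from a simple grid layout. This is a minimal sketch that only computes centre coordinates; the function name, the page dimensions, and the near-square grid are assumptions for illustration, and writing the coordinates into an actual GPML file is not shown.

```python
import math

def grid_positions(n, page_width=800.0, page_height=600.0):
    """Return n (x, y) centre points spread evenly over a page.

    Objects are laid out row by row in a near-square grid; each point is
    the centre of its grid cell. Page dimensions are hypothetical defaults.
    """
    if n <= 0:
        return []
    cols = math.ceil(math.sqrt(n))   # number of columns in the grid
    rows = math.ceil(n / cols)       # rows needed to hold all n objects
    cell_w = page_width / cols
    cell_h = page_height / rows
    positions = []
    for i in range(n):
        row, col = divmod(i, cols)
        positions.append(((col + 0.5) * cell_w, (row + 0.5) * cell_h))
    return positions

if __name__ == "__main__":
    # Hypothetical example: centre points for four objects from a process page.
    for x, y in grid_positions(4):
        print(x, y)
```

Each returned pair could then be assigned to one gene, cell, or molecule from a process page before the objects are written out.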
| |
− | | |
− | | |
− | GO Meeting @ Caltech
| |
− | *[http://wiki.geneontology.org/index.php/Consortium_Meetings GO Wiki] has details
| |
− | *October 7-9
| |
− | | |
− | | |
− | PWM data
| |
− | *Gary Stormo paper coming out with lots of new PWM data (5000 binding sites)
| |
− | *Genome-wide conserved elements (some experimentally verified)
| |
− | | |
− | | |
− | Transcription Factor -to- Target Gene Associations
| |
− | *We need better data display methods/tools
| |
− | *modENCODE ChIP-seq data would really benefit from conversion to digital/discrete objects/associations
| |
− | | |
− | | |
− | Curating with respect to topic
| |
− | *Karen already doing this to some extent for processes
| |
− | *Clustering (papers) based on GO Biological Process terms
| |
| | | |
| | | |