Difference between revisions of "WormBase-Caltech Weekly Calls"

From WormBaseWiki
 
[[WormBase-Caltech_Weekly_Calls_August_2012|August]]
 
[[WormBase-Caltech_Weekly_Calls_September_2012|September]]
 
 
== September 6, 2012 ==
 
 
 
Canopus
 
*Will become a server (heavy outside access)
 
*Raymond will take precautions to restore properly; should take a week or less
 
*Used for:
 
**Picture curation
 
**Virtual worm web page & FTP
 
 
 
 
 
Grant Writing
 
*Approach & future plans - do we need to discuss and add more to proposal?
 
*Virtual worm access to data
 
*Browsing data
 
**Browse genes/proteins by class/GO annotation
 
*Intermine queries
 
**Queries can be saved and used again later (available to other users)
 
**Results can be stored and displayed on web pages
 
*Displaying large scale expression (microarray) data (from SPELL)
 
**SPELL data currently stored in relational (MySQL?) form
 
**We could extend the WormBase web app to interact with SPELL data
 
**RESTful-compliant data serving would be best
 
 
 
 
 
Migration away from ACEDB?
 
*Scalability - ACEDB does not scale to genome-scale studies
 
*We (EBI & OICR) spend a lot of time building the ACEDB database
 
*ACEDB is not actually used that much
 
*Makes more sense to adopt a more central database
 
*Migration would take place over entire grant cycle
 
*We should continue extending models as we see fit, in the meantime
 
 
 
 
 
Massmails
 
*Discuss again with Paul about Worm Breeder's Gazette and WormBook.
 
 
 
 
 
== September 13, 2012 ==
 
 
 
 
 
Grant
 
*Two things to work on:
 
**1) Metrics and quality control
 
***Metrics: Transparent about our coverage (e.g. "We estimate XXX many RNAi experiments/papers and we have ~62% coverage")
 
****SVM form (in development) vs Curation Status form (needs work/update); can generate numbers for coverage to display on the web?
 
****What is the vision of a new Curation Status form? (Daniela will write something)
 
****Stats with new SVM results on pre-2000 papers: state them in grant as pre-2000 corpus
 
***Quality Control: How do we demonstrate that our curation is correct/accurate?
 
****Sampling First-pass curation (false negative estimates)
 
****WormBase person would check on case-by-case basis the accuracy of data (e.g. expression pattern, RNAi experiment, etc.); what percent do you need to check to ensure good quality control? Blind/independent re-curation? Collection of facts to use for metrics
 
****Contact senior author on a paper to check the quality of data from a paper; experts for a process; prioritize based on who responds to author-first-pass
 
****Data-type specific issues
 
****Internal consistency checks
 
****Ontology based curation checks (granularity; branch-consistency)
 
**2) Training and user interaction
 
***Screencasts and FAQs list (Chris will write some sentences)
 
***User guide status? Currently needs major revamp and new information (old Wiki User Guide is now obsolete)
 
***Online chat: has been broken for months; should it only be allowed for logged-in users?
 
*Daily updates with low-connectivity data (not relying on sequence mapping) from PostgreSQL database?
 
**Have different CGI forms to populate the data on the website?
 
*Are we committed to migrating away from ACEDB? Who then is responsible for managing data models?
 
**Migrate to Chado? Do we need detailed plan for this?
 
**FlyBase has migrated; they like Chado, but the migration was painful
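
For the quality-control question above (what percent of curated data needs to be re-checked), one common starting point is the standard sample-size formula for estimating a proportion, with a finite-population correction. A minimal Python sketch follows; the function name, the 95% confidence level (z = 1.96), the ±5% margin and the example total are illustrative assumptions, not figures from the call.

```python
import math

def qc_sample_size(population, margin=0.05, z=1.96, expected_rate=0.5):
    """Number of items to re-check to estimate an error rate within +/- margin.

    Uses n0 = z^2 * p * (1 - p) / margin^2, then the finite-population
    correction n = n0 / (1 + (n0 - 1) / population).
    expected_rate=0.5 is the most conservative (largest-sample) choice.
    """
    if population <= 0:
        return 0
    n0 = (z ** 2) * expected_rate * (1 - expected_rate) / (margin ** 2)
    n = n0 / (1 + (n0 - 1) / population)
    return min(population, math.ceil(n))

# Hypothetical example: 10,000 curated objects of one data type
total = 10_000
n = qc_sample_size(total)
print(f"re-check {n} of {total} objects ({n / total:.1%})")
```

The fraction to check shrinks as the data type grows, so a fixed percentage across all data types would over-sample large sets and under-sample small ones.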
 
 
 
 
 
Worm Breeder's Gazette & WormBook outreach
 
*Mass mailings should be an Opt-in system, as opposed to blanket mass mailings
 
*One mass mail for each to ask about Opting-in to e-mail alerts?
 
*Regional & International worm meetings; don't hand all e-mails over to meeting organizers.
 
 
 
 
 
Itai Yanai expression datasets
 
*Including images from microarray expression data directly from paper for now
 
*Plan is to generate time-course expression data 'on-the-fly' later on so that other data sources can be compared and included in the same graph, given that they are the same type of data
 
*Eventually, the paper images will be superseded by the new, calculated 'on-the-fly' images
 
 
 
 
 
 
 
Phenotype Images
 
*We were asked if we wanted to join a consortium that is collecting images portraying different phenotypes
 
*We may in the future, but not right now; we would need to develop a method for collecting and curating such images
 
 
 
 
 
== September 20, 2012 ==
 
 
 
 
 
Canopus back online
 
*Cron job for pulling pictures; need to make sure it still works
 
*Raymond lifted Todd's directories over to new hard drive
 
*/usr/local gone; Raymond did not restore
 
*Todd was pulling from somewhere in /usr/local (/usr/local/wormbase/image ?)
 
*Pictures in OICR directory
 
*Raymond will send directory to Todd
 
 
 
 
 
Molecule curation
 
*"Drug" vs. "Molecule"
 
*Synthetic vs Non-Synthetic?
 
*Endogenous vs. Exogenous?
 
*Focus of study
 
*Textpresso categories including some molecules (e.g. 'glucose') throw off predictions for papers having 'drugs'
 
 
 
 
 
Remote backup storage
 
*Juancarlos suggests storing some data/files remotely
 
*Amazon will back up 1 GB of data for 12.5 cents/month; essentially free to upload; pay for download (12 cents/GB)
 
*Data to store: Postgres data (for example)
 
*Data could be encrypted, if we are concerned
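
The rates quoted above make the cost easy to estimate. A minimal Python sketch, assuming the two quoted rates hold and uploads are free; the function name and the 50 GB example size are hypothetical.

```python
from decimal import Decimal

# Rates as quoted in the notes above (US dollars); real pricing may differ.
STORAGE_PER_GB_MONTH = Decimal("0.125")
DOWNLOAD_PER_GB = Decimal("0.12")

def backup_cost(size_gb, months=12, restores=0):
    """Estimated cost of keeping size_gb stored for `months`,
    plus `restores` full downloads. Uploads are treated as free."""
    size = Decimal(str(size_gb))
    storage = size * STORAGE_PER_GB_MONTH * months
    download = size * DOWNLOAD_PER_GB * restores
    return storage + download

# Hypothetical example: a 50 GB Postgres dump kept for a year, restored once
print(f"${backup_cost(50, months=12, restores=1):.2f}")
```

Storage dominates the total unless the data is restored often, which fits the backup use case.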
 
 
 
 
 
Curation Prioritization
 
*Curating papers by process or topic
 
**Reinstate some Textpresso categorization of papers by topic/process
 
**Tied to pathway curation (LEGO and WikiPathways)
 
*Prioritization of curation for genes that currently have no annotations
 
 
 
 
 
 
 
== September 27, 2012 ==
 
 
 
 
 
Microarrays from other species
 
*Storing data?
 
*Will keep in a separate database (separate from C. elegans)
 
*Curating Itai Yanai's paper (5 species)
 
*Only found 4 microarray data sets from C. briggsae (from GO search) including Yanai paper; very few from other species so far
 
*There may be more microarray data on non-Caenorhabditis species (parasites, etc.)
 
*No way, currently, to inter-compare data sets (across species)
 
 
 
 
 
Yanai data set
 
*Everything in sandbox (images and data)
 
 
 
 
 
Process Pages
 
*Marc Perry is working on generating the pages
 
*Newer data not available on website yet
 
*WikiPathway for male mating (Karen made); Marc made mock page and put it up
 
*Interactive pages; can click on entities
 
*Process search not up and running yet
 
*Can we get WormBase data into WikiPathways programmatically? Not yet
 
*Can use GenMAPP files
 
*WikiPathways has node/edge hyperlinks
 
*Figure out a way to pull data (programmatically) from WormBase process
 
**Create and place GPML objects spaced evenly on a page
 
*Process pages have lists of: genes, cells, molecules, microarray expression clusters
 
*Process pages are a good home for expression clusters
 
*Process page: Summary, WikiPathways widget, lists
 
*Phenotype to Process mapping of objects
 
*Big-picture mapping of pathways (top down approach)
 
*LEGO will become the bottom up approach
 
*Would be good to capture pathway connections/structure in the ACEDB data model
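
The idea above of placing GPML objects evenly on a page could start from a plain grid layout. A minimal Python sketch that only computes centre coordinates; the function name, page dimensions and labels are hypothetical, and writing the coordinates into actual GPML elements is left out because the exact attributes should be checked against the GPML schema.

```python
def grid_positions(labels, page_width=800.0, columns=4, row_height=60.0, margin=40.0):
    """Return (label, x, y) tuples with centres spaced evenly, row by row."""
    if columns < 1:
        raise ValueError("columns must be at least 1")
    # Horizontal distance between neighbouring column centres
    step = (page_width - 2 * margin) / columns
    placed = []
    for i, label in enumerate(labels):
        row, col = divmod(i, columns)
        x = margin + step * (col + 0.5)
        y = margin + row_height * (row + 0.5)
        placed.append((label, x, y))
    return placed

# Hypothetical labels standing in for the genes listed on a process page
for label, x, y in grid_positions(["gene-1", "gene-2", "gene-3", "gene-4", "gene-5"]):
    print(label, x, y)
```

A grid ignores pathway topology, so it would only give a first draft for a curator to rearrange.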
 
 
 
 
 
GO Meeting @ Caltech
 
*[http://wiki.geneontology.org/index.php/Consortium_Meetings GO Wiki] has details
 
*October 7-9
 
 
 
 
 
PWM data
 
*Gary Stormo paper coming out with lots of new PWM data (5000 binding sites)
 
*Genome wide conserved elements (some experimentally verified)
 
 
 
 
 
Transcription Factor -to- Target Gene Associations
 
*We need better data display methods/tools
 
*modENCODE ChIP-seq data really would benefit from conversion to digital/discrete objects/associations
 
 
 
 
 
Curating with respect to topic
 
*Karen already doing this to some extent for processes
 
*Clustering (papers) based on GO Biological Process terms
 
  
  

Revision as of 16:58, 4 October 2012

2009 Meetings

2011 Meetings


2012 Meetings

January

February

March

April

May

June

July

August

September


October 4, 2012

FlyBase SAB topics

  • Genome Space - can integrate data in different formats
    • Cloud-based data integration
    • File conversion done automatically - no need to write scripts
  • FlyBase talking about joining Genome Space?


Protein-to-GO Tool

  • Ranjana and Kimberly performing data checks for testing
  • If file can get to Rachel and Tony, maybe can discuss this weekend (at GO meeting)


GO Annotator Tool

  • Kimberly looking over
  • James will make a couple slides to introduce tool and provide demo
    • 1-minute demo: can have file in XML/HTML format, can highlight, annotate a sentence, and save annotation as a link
      • Demo: simpler = better
      • Prepare for live feedback and discussion


Curation Status/Statistics Tool

  • How many papers have given data types?
  • How many papers have been curated, how many not?
  • How many objects/connections do we have of a given type?
  • How many objects per paper (average, distribution?)?
  • Estimated number of objects/papers that exist vs. how many we have curated?
  • Do we care about types of flagging? Curator first pass, author first pass, etc? Yes
    • All flagging types should be shown; combined and individual statistics would be useful
  • Interactive form vs. static page showing curation/flagging statistics
  • Tracking negatives (true vs false)
  • Data types in OA vs not (microarray, Protein-to-GO output)
    • Microarray data - Wen can write script to generate stats for microarray curation
  • Curation stats per paper (Raymond): write comments in a remark field; traceable, transparent, available
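
Several of the questions above reduce to simple counts over per-paper records. A minimal Python sketch for one data type, assuming each paper is a record with hypothetical fields `flagged`, `curated` and `objects`; the real Postgres tables and flagging types are not modeled here.

```python
from statistics import mean, median

def curation_stats(papers, estimated_total_objects=None):
    """Summarize curation for one data type.

    papers: list of dicts with 'flagged' (bool), 'curated' (bool)
    and 'objects' (int, number of objects curated from the paper).
    """
    flagged = [p for p in papers if p["flagged"]]
    curated = [p for p in flagged if p["curated"]]
    counts = [p["objects"] for p in curated]
    stats = {
        "papers_flagged": len(flagged),
        "papers_curated": len(curated),
        "papers_uncurated": len(flagged) - len(curated),
        "paper_coverage": len(curated) / len(flagged) if flagged else None,
        "objects_total": sum(counts),
        "objects_per_paper_mean": mean(counts) if counts else None,
        "objects_per_paper_median": median(counts) if counts else None,
    }
    if estimated_total_objects:
        # Coverage against an external estimate of how many objects exist
        stats["object_coverage"] = sum(counts) / estimated_total_objects
    return stats

# Made-up records for illustration only
example = [
    {"flagged": True, "curated": True, "objects": 4},
    {"flagged": True, "curated": True, "objects": 2},
    {"flagged": True, "curated": False, "objects": 0},
    {"flagged": False, "curated": False, "objects": 0},
]
print(curation_stats(example, estimated_total_objects=12))
```

Running the same function once per flagging type (curator first pass, author first pass, and so on) and once on the combined set would give both the individual and combined statistics asked for above.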


Process pages

  • GPML - may be able to automatically map Postgres data to GPML
  • WikiPathways has color, formatting, labels, etc. so we can define types/views of different relationships (should follow standards)
  • AWC-ON/AWC-OFF sample page: only single connection type; can be many different types, as we define it
  • "Too many arrows" editorial?


Physical interaction curation

  • On interaction OA (Tab2) physical interactions; 'Colocalizes', etc.
  • Need to establish our data exchange with BioGRID; get BioGRID data in Postgres/OA (daily cronjob?)
  • Interaction model was modeled to be compatible with BioGRID's data
  • Revive curator first pass for physical interactions? To tag papers (in the meantime)