Difference between revisions of "WormBase-Caltech Weekly Calls"

Revision as of 16:58, 4 October 2012

2009 Meetings

2011 Meetings

2012 Meetings

October 4, 2012

FlyBase SAB topics

GenomeSpace - can integrate data in different formats
- Cloud-based data integration
- File conversion done automatically - no need to write scripts
FlyBase talking about joining GenomeSpace?

Protein-to-GO Tool

Ranjana and Kimberly performing data checks for testing
If file can get to Rachel and Tony, maybe can discuss this weekend (at GO meeting)

GO Annotator Tool

Kimberly looking over
James will make a couple slides to introduce tool and provide demo
- 1-minute demo: can have file in XML/HTML format, can highlight, annotate a sentence, and save annotation as a link
  - Demo: simpler = better
  - Prepare for live feedback and discussion

Curation Status/Statistics Tool

How many papers have given data types?
How many papers have been curated, how many not?
How many objects/connections do we have of a given type?
How many objects per paper (average, distribution?)?
Estimated number of objects/papers that exist vs. how many we have curated?
Do we care about types of flagging? Curator first pass, author first pass, etc? Yes
- All flagging types should be shown; combined and individual statistics would be useful
Interactive form vs. static page showing curation/flagging statistics
Tracking negatives (true vs false)
Data types in OA vs not (microarray, Protein-to-GO output)
- Microarray data - Wen can write script to generate stats for microarray curation
Curation stats per paper (Raymond): write comments in a remark field; traceable, transparent, available
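
A minimal sketch of how the counts asked for above could be computed for a single data type. The record layout (paper, flagged, curated, objects) and the sample values are hypothetical and only for illustration; they are not the actual Postgres/OA schema.

```python
from collections import Counter
from statistics import mean

# Hypothetical records: one per paper, for a single data type (e.g. RNAi).
# "flagged" = paper is flagged as containing the data type,
# "curated" = paper has been curated for it,
# "objects" = number of curated objects that came from the paper.
papers = [
    {"paper": "paper1", "flagged": True, "curated": True, "objects": 3},
    {"paper": "paper2", "flagged": True, "curated": False, "objects": 0},
    {"paper": "paper3", "flagged": True, "curated": True, "objects": 1},
    {"paper": "paper4", "flagged": False, "curated": False, "objects": 0},
]


def curation_stats(records):
    """Summarize flagged vs. curated papers and objects per curated paper."""
    flagged = [r for r in records if r["flagged"]]
    curated = [r for r in flagged if r["curated"]]
    counts = [r["objects"] for r in curated]
    return {
        "papers_flagged": len(flagged),
        "papers_curated": len(curated),
        "papers_not_curated": len(flagged) - len(curated),
        "total_objects": sum(counts),
        # Average and distribution of objects per curated paper.
        "mean_objects_per_paper": mean(counts) if counts else 0.0,
        "objects_distribution": dict(Counter(counts)),
        # Fraction of flagged papers that have been curated.
        "coverage": len(curated) / len(flagged) if flagged else 0.0,
    }


stats = curation_stats(papers)
```

Running the same function once per flagging type (curator first pass, author first pass, etc.) and once on the combined set would give both the individual and the combined statistics.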

Process pages

GPML - may be able to automatically map Postgres data to GPML
WikiPathways has color, formatting, labels, etc. so we can define types/views of different relationships (should follow standards)
AWC-ON/AWC-OFF sample page: only single connection type; can be many different types, as we define it
"Too many arrows" editorial?

Physical interaction curation

On interaction OA (Tab2) physical interactions; 'Colocalizes', etc.
Need to establish our data exchange with BioGRID; get BioGRID data in Postgres/OA (daily cronjob?)
Interaction model was modeled to be compatible with BioGRID's data
Revive curator first pass for physical interactions? To tag papers (in the meantime)

@@ Line 24: / Line 24: @@
 [[WormBase-Caltech_Weekly_Calls_August_2012|August]]
+[[WormBase-Caltech_Weekly_Calls_September_2012|September]]
-== September 6, 2012 ==
-Canopus
-*Will become a server (heavy outside access)
-*Raymond will take precautions to restore properly; should take a week or less
-*Used for:
-**Picture curation
-**Virtual worm web page & FTP
-Grant Writing
-*Approach & future plans - do we need to discuss and add more to proposal?
-*Virtual worm access to data
-*Browsing data
-**Browse genes/proteins by class/GO annotation
-*Intermine queries
-**Queries can be saved and used again later (available to other users)
-**Results can be stored and displayed on web pages
-*Displaying large scale expression (microarray) data (from SPELL)
-**SPELL data currently stored in relational (MySQL?) form
-**We could extend the WormBase web app to interact with SPELL data
-**RESTful-compliant data serving would be best
-Migration away from ACEDB?
-*Scalability - ACEDB does not scale to genome-scale studies
-*We (EBI & OICR) spend a lot of time building ACEDB database
-*ACEDB not actually used so much
-*Makes more sense to adopt a more central database
-*Migration would take place over entire grant cycle
-*We should continue extending models as we see fit, in the meantime
-Massmails
-*Discuss again with Paul about Worm Breeder's Gazette and WormBook.
-== September 13, 2012 ==
-Grant
-*Two things to work on:
-**1) Metrics and quality control
-***Metrics: Transparent about our coverage (e.g. "We estimate XXX many RNAi experiments/papers and we have ~62% coverage")
-****SVM form (in development) vs Curation Status form (needs work/update); can generate numbers for coverage to display on the web?
-****What is the vision of a new Curation Status form? (Daniela will write something)
-****Stats with new SVM results on pre-2000 papers: state them in grant as pre-2000 corpus
-***Quality Control: How do we demonstrate that our curation is correct/accurate?
-****Sampling First-pass curation (false negative estimates)
-****WormBase person would check on case-by-case basis the accuracy of data (e.g. expression pattern, RNAi experiment, etc.); what percent do you need to check to ensure good quality control? Blind/independent re-curation? Collection of facts to use for metrics
-****Contact senior author on a paper to check the quality of data from a paper; experts for a process; prioritize based on who responds to author-first-pass
-****Data-type specific issues
-****Internal consistency checks
-****Ontology based curation checks (granularity; branch-consistency)
-**2) Training and user interaction
-***Screencasts and FAQs list (Chris will write some sentences)
-***User guide status? Currently needs major revamp and new information (old Wiki User Guide is now obsolete)
-***Online chat: has been broken for months; should it only be allowed for logged-in users?
-*Daily updates with low-connectivity data (not relying on sequence mapping) from PostgreSQL database?
-**Have different CGI forms to populate the data on the website?
-*Are we committed to migrating away from ACEDB? Who then is responsible for managing data models?
-**Migrate to Chado? Do we need detailed plan for this?
-**FlyBase has migrated; they like Chado, but the migration was painful
-Worm Breeder's Gazette & WormBook outreach
-*Mass mailings should be an Opt-in system, as opposed to blanket mass mailings
-*One mass mail for each to ask about Opting-in to e-mail alerts?
-*Regional & International worm meetings; don't hand all e-mails over to meeting organizers.
-Itai Yanai expression datasets
-*Including images from microarray expression data directly from paper for now
-*Plan is to generate time-course expression data 'on-the-fly' later on so that other data sources can be compared and included in the same graph, given that they are the same type of data
-*Eventually, the paper images will be superseded by the new, calculated 'on-the-fly' images
-Phenotype Images
-*We were asked if we wanted to join a consortium that is collecting images portraying different phenotypes
-*We may in the future, but not right now; we would need to develop a method for collecting and curating such images
-== September 20, 2012 ==
-Canopus back online
-*Cron job for pulling pictures; need to make sure it still works
-*Raymond lifted Todd's directories over to new hard drive
-*/usr/local gone; Raymond did not restore
-*Todd was pulling from somewhere in /usr/local (/usr/local/wormbase/image ?)
-*Pictures in OICR directory
-*Raymond will send directory to Todd
-Molecule curation
-*"Drug" vs. "Molecule"
-*Synthetic vs Non-Synthetic?
-*Endogenous vs. Exogenous?
-*Focus of study
-*Textpresso categories including some molecules (e.g. 'glucose') throw off predictions for papers having 'drugs'
-Remote backup storage
-*Juancarlos suggesting storing some data/files remotely
-*Amazon will back up 1 GB of data for 12.5 cents/month; essentially free to upload; pay for download (12 cents/GB)
-*Data to store: Postgres data (for example)
-*Data could be encrypted, if we are concerned
-Curation Prioritization
-*Curating papers by process or topic
-**Reinstate some Textpresso categorization of papers by topic/process
-**Tied to pathway curation (LEGO and WikiPathways)
-*Prioritization of curation for genes that currently have no annotations
-== September 27, 2012 ==
-Microarrays from other species
-*Storing data?
-*Will keep in a separate database (separate from C. elegans)
-*Curating Itai Yanai's paper (5 species)
-*Only found 4 microarray data sets from C. briggsae (from GO search) including Yanai paper; very few from other species so far
-*There may be more microarray data on non-Caenorhabditis species (parasites, etc.)
-*No way, currently, to inter-compare data sets (across species)
-Yanai data set
-*Everything in sandbox (images and data)
-Process Pages
-*Marc Perry is working on generating the pages
-*Newer data not available on website yet
-*WikiPathway for male mating (Karen made); Marc made mock page and put it up
-*Interactive pages; can click on entities
-*Process search not up and running yet
-*Can we get WormBase data into WikiPathways programmatically? Not yet
-*Can use GenMAPP files
-*WikiPathways has node/edge hyperlinks
-*Figure out a way to pull data (programmatically) from WormBase process
-**Create and place GPML objects spaced evenly on a page
-*Process pages have lists of: genes, cells, molecules, microarray expression clusters
-*Process pages are a good home for expression clusters
-*Process page: Summary, WikiPathways widget, lists
-*Phenotype to Process mapping of objects
-*Big-picture mapping of pathways (top down approach)
-*LEGO will become the bottom up approach
-*Would be good to capture pathway connections/structure in the ACEDB data model
-GO Meeting @ Caltech
-*[http://wiki.geneontology.org/index.php/Consortium_Meetings GO Wiki] has details
-*October 7-9
-PWM data
-*Gary Stormo paper coming out with lots of new PWM data (5000 binding sites)
-*Genome wide conserved elements (some experimentally verified)
-Transcription Factor -to- Target Gene Associations
-*We need better data display methods/tools
-*ModENCODE ChIP-seq data really would benefit from conversion to digital/discrete objects/associations
-Curating with respect to topic
-*Karen already doing this to some extent for processes
-*Clustering (papers) based on GO Biological Process terms
