WormBase-Caltech Weekly Calls

2012 Meetings

August 2, 2012

Grant updates

Topics
- Diseases
- Worm Phenotype Ontology: attempt applying it to other species? Some of this has been done already (e.g. C. briggsae)
  - Benefit of curating phenotypes in other species? Particularly useful for genes not in C. elegans, for example
- Textpresso for nematodes
  - >5000 papers now
  - Complete set of papers by ~Labor Day
- Anatomy ontology
  - Anatomy page is hub for data
    - We should strive for user friendly display of information
    - Cell functions, 10 or 100 highest/preferentially expressed genes, cell connections, cell signals
  - Upcoming challenge - male/female/hermaphrodite divergence
  - Anatomy with respect to life stage (e.g. life-stage-specific cells)
  - Multiple species
  - Uberon framework can be adapted for multiple (nematode) species
- Gene Function
  - RNAi, Allele-phenotype, Transgene-phenotype etc.
- Interactions
  - Integrated interaction model, genetic interaction ontology
  - Can we estimate how many interactions are left to curate? Can use OA/Postgres, SVM, and first-pass author forms etc. to estimate
- Gene expression and Pictures
  - Need to update Gene expression model to accommodate EPIC dataset (John Murray), 3D movies (Bill Mohler), and single-molecule FISH (van Oudenaarden et al.)
  - Itai Yanai dataset (embryo expression across several nematode species)
  - Expression SVM won't catch expression analyses of isolated tissues/cells or microarray data
  - Incorporating the virtual worm and browser
- Microarray and SPELL
  - Will incorporate microarray and RNA-Seq data sets for other species
  - Should let users download search results more easily (for single genes, for example)
  - Need to change SPELL database to incorporate new species
  - Users should be able to run clustering on data
  - Co-expression correlation; should recalculate each build (with flexible significance thresholds)
  - Provide Cytoscape view of genes connected by co-expression
- Pathways and Processes
  - Plan to work with Wikipathways
  - Vocabularies and annotation schemes like Systems Biology Graphical Notation (SBGN)
  - Trying to get data into BioPAX (Biological PAthway eXchange) format
  - BioPAX too detail oriented? Very biochemical?
  - Some databases dump BioPAX format, but won't read it in
- Paper and curation pipeline
- Concise description progress; coverage
  - Re-annotation efforts?
  - New concise description curation interface for easier writing and updating
- Annotating genes in the more expressive GO format
  - How would our data models need to change to curate with the new expressive GO
- SVMs
- Include collaborations?
  - GSA markup; encouraging other journals to adopt?
  - Web page links; electronic textbooks, etc.
  - Can automate linking, but can't support manual QC without more (financial) support
- Supporting links to WormBase (in general)
  - WormAtlas, for example
- Google-like entity info (e.g. George Washington) displayed on side of search results page
  - Could we provide short write-ups of genes to Google?
  - Google-funded? Google.org
- Transcriptional regulatory networks (TRNs)
  - Gene regulation curation
  - Limited amount of data for position weight matrices (PWMs) and TF-binding sites
  - Consolidation of TF-binding/target-gene data into one place (ChIP-Seq/modENCODE data, PWMs, Gene regulation interactions)
  - How to best visualize the available data?
  - We can design a new visual scheme for TRNs
  - Will we curate enhancers?
Suggestions for future
- Better integration across data types
- How can the OA evolve, and what can it be used for?

August 9, 2012

SVM Analysis Form

Sandbox version available for testing
Data stored on Postgres
Daniela can show how to use
SVM flags main papers and supplemental documents; should they be grouped into a single document or kept separate?
- Depends on curator
- Should have a direct (unambiguous) link to supplemental documents
Can flag false positive papers
Can query for papers batch by batch (by SVM analysis date)
False negatives are automatically annotated as such when an SVM-negative paper is curated for the respective data type in the OA
Curators CAN check SVM-negatives if they want to, but are not required to
Can query if a specific paper (or papers) has been flagged (by SVM) for certain data types
Proposed: an OA field to capture which supplemental document the data came from, if it came from a supplement
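
The automatic false-negative annotation described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name, the dictionary layout, and the paper IDs are hypothetical and do not reflect the actual Postgres tables or the OA.

```python
def false_negatives(svm_calls, curated):
    """Return (paper_id, data_type) pairs that the SVM called "Negative"
    but that a curator nevertheless curated for that data type.

    svm_calls: dict mapping (paper_id, data_type) -> "Positive" or "Negative"
    curated:   set of (paper_id, data_type) pairs with curated data (assumed layout)
    """
    return sorted(
        key
        for key, call in svm_calls.items()
        if call == "Negative" and key in curated
    )


# Hypothetical example data
svm_calls = {
    ("paper1", "rnai"): "Negative",
    ("paper1", "expression"): "Positive",
    ("paper2", "rnai"): "Negative",
}
curated = {("paper1", "rnai"), ("paper1", "expression")}
print(false_negatives(svm_calls, curated))
```

Only SVM-negative papers that were actually curated are reported, so curators are never required to review SVM-negatives for the annotation to work.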

Grant

People can add new ideas/visions for future development of WormBase
- Visualization, integration, graphs, etc.
- How do we visualize complex information?
- Do we need to group data types for visualization? E.g., transcriptional regulatory networks vs. genome browsing
Scaling?
Dependency on ACEDB?

August 16, 2012

Helpdesk

GitHub link to e-mail?
When an issue is submitted via the website, GitHub generates a unique e-mail which can be replied to. That e-mail thread is then included in the GitHub issue
Who should close an issue? Ultimately, an issue/ticket should be closed by whichever WormBase staff member addresses or resolves the issue
- If the issue is not closed by the one who resolves it, the help desk officer should follow up to check
When is an issue resolved? May depend on the nature of the issue and on what has to be done
Need project management tools? Redmine? Something else?

Large scale projects

Scripts and documentation used to deal with/handle large scale data should be put into GitHub
Example: Itai Yanai expression data; just another microarray paper, but with new oligo sets?
Need to store enough info to reproduce curation
Store all large scale data sets and scripts, documentation, etc. on a single computer with regular backup

August 23, 2012

Transgene tables

Not available on new site
Broken on legacy site
Integrated transgenes' location (which chromosome?)
Static page was proposed for new site
We (Caltech) could possibly put on the new WormBase Support section (which we have write-access to)

Finding all labs in a region (user request)

How to best identify all labs in, for example, England, Asia, South America, Canada, etc.
Can search using patterns for e-mail address (not optimal)
Maybe better search with physical mailing address, but need all country codes, and country-continent affiliations
We can possibly create a script to generate a table every release
Change data model to have a "Country" tag? And then programmatically assign continents based on Country tag
Juancarlos can extract PI address info from Postgres and set up a CGI for future searches
- Form at http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/generic.cgi?action=ContinentPIs
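
The idea of assigning continents programmatically from a "Country" tag could look something like the sketch below. The lookup table is hypothetical and deliberately incomplete, and the function and lab names are made up for illustration; this is not the code behind the form above.

```python
from collections import defaultdict

# Hypothetical, incomplete lookup; a real table would need every country.
COUNTRY_TO_CONTINENT = {
    "Canada": "North America",
    "United Kingdom": "Europe",
    "Japan": "Asia",
    "Brazil": "South America",
}


def labs_by_continent(labs):
    """Group PI names by continent.

    labs: iterable of (pi_name, country) pairs (assumed input layout).
    Countries missing from the lookup are collected under "Unassigned"
    so they can be fixed by hand rather than silently dropped.
    """
    table = defaultdict(list)
    for pi_name, country in labs:
        continent = COUNTRY_TO_CONTINENT.get(country, "Unassigned")
        table[continent].append(pi_name)
    return {continent: sorted(names) for continent, names in table.items()}


print(labs_by_continent([("Lab B", "Japan"), ("Lab A", "Japan"), ("Lab C", "Canada")]))
```

A table like this could be regenerated every release, as suggested above.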

SVM Tool and Data Type Statistics

Chris would like to get Interaction numbers (how complete are we with curation)
SVM Tool needs some tweaks:
- Any paper with supplementary material is broken up into several documents
- If a curator searches for all "Positive" papers, all papers that have at least ONE "Positive" document will be returned
- Likewise, if a curator searches for all "Negative" papers, all papers that have at least ONE "Negative" document will be returned
- This is a problem, since for "Negative" searches we would want only papers in which ALL documents are "Negative"
What are the benefits/drawbacks of separating a paper into multiple documents?
- Kimberly likes to have separate documents to reduce amount of data type searching to be done once we have SVM results
- Keeping multiple documents may complicate the search procedure, unless we can change the query process
Juancarlos will create a filter step in the SVM tool such that the user/curator can specify if they would like to search on the basis of "whole" paper versus individual document
In this way, searching on the basis of "whole" papers for "Positive" will return only papers for which at least one document is "Positive"; searching for "Negative" will return only papers for which ALL documents are "Negative"
Searching on the basis of individual documents will work as it does now: displaying all papers as individual documents with their respective SVM results
Juancarlos will also add a filter for "Primary" vs "Not Primary" vs "Not Designated"
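
The "whole" paper versus individual document filter described above can be sketched as follows. This is a minimal illustration, not the SVM tool's code: the function names, the tuple layout, and the paper and document names are all assumptions.

```python
from collections import defaultdict


def paper_level_results(doc_results):
    """Collapse per-document SVM calls into one call per paper.

    doc_results: iterable of (paper_id, document_name, label) tuples,
    where label is "Positive" or "Negative" (assumed layout).
    """
    labels_by_paper = defaultdict(list)
    for paper_id, _document, label in doc_results:
        labels_by_paper[paper_id].append(label)
    # A paper is "Positive" if ANY document is positive,
    # and "Negative" only if ALL of its documents are negative.
    return {
        paper_id: "Positive" if "Positive" in labels else "Negative"
        for paper_id, labels in labels_by_paper.items()
    }


def search(doc_results, label, whole_paper=True):
    """Return matching paper IDs (whole-paper mode) or
    (paper_id, document_name) pairs (individual-document mode)."""
    if whole_paper:
        summary = paper_level_results(doc_results)
        return sorted(p for p, call in summary.items() if call == label)
    return sorted((p, d) for p, d, call in doc_results if call == label)


# Hypothetical example: paperA has a positive supplement, paperB is negative throughout.
results = [
    ("paperA", "main", "Negative"),
    ("paperA", "supp1", "Positive"),
    ("paperB", "main", "Negative"),
    ("paperB", "supp1", "Negative"),
]
print(search(results, "Negative"))
print(search(results, "Negative", whole_paper=False))
```

In whole-paper mode, paperA no longer appears in a "Negative" search even though its main document is negative, which is the behavior the curators asked for.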
