WormBase-Caltech Weekly Calls

2012 Meetings

August 2, 2012

Grant updates

  • Topics
    • Diseases
    • Worm Phenotype Ontology: attempt to apply it to other species? Some of this has been done already (e.g. C. briggsae)
      • Benefit of curating phenotypes in other species? Particularly useful for genes not in C. elegans, for example
    • Textpresso for nematodes
      • >5000 papers now
      • Complete set of papers by ~Labor Day
    • Anatomy ontology
      • Anatomy page is hub for data
        • We should strive for user friendly display of information
        • Cell functions, 10 or 100 highest/preferentially expressed genes, cell connections, cell signals
      • Upcoming challenge - male/female/hermaphrodite divergence
      • Anatomy with respect to life stage (e.g. life-stage-specific cells)
      • Multiple species
      • Uberon framework can be adapted for multiple (nematode) species
    • Gene Function
      • RNAi, Allele-phenotype, Transgene-phenotype etc.
    • Interactions
      • Integrated interaction model, genetic interaction ontology
      • Can we estimate how many interactions are left to curate? Can use OA/Postgres, SVM, and first-pass author forms etc. to estimate
    • Gene expression and Pictures
      • Need to update Gene expression model to accommodate EPIC dataset (John Murray), 3D movies (Bill Mohler), and single-molecule FISH (van Oudenaarden et al.)
      • Itai Yanai dataset (embryo expression across several nematode species)
      • Expression SVM won't catch expression analysis of isolated tissues/cells or microarray data
      • Incorporating the virtual worm and browser
    • Microarray and SPELL
      • Will incorporate microarray and RNA-Seq data sets for other species
      • Should let users download search results more easily (for single genes, for example)
      • Need to change SPELL database to incorporate new species
      • Users should be able to run clustering on data
      • Co-expression correlation; should recalculate each build (with flexible significance thresholds)
      • Provide Cytoscape view of genes connected by co-expression
    • Pathways and Processes
      • Plan to work with WikiPathways
      • Vocabularies and annotation schemes like Systems Biology Graphical Notation (SBGN)
      • Trying to get data into BioPAX (Biological PAthway eXchange) format
      • BioPAX too detail oriented? Very biochemical?
      • Some databases dump BioPAX format, but won't read it in
    • Paper and curation pipeline
    • Concise description progress; coverage
      • Re-annotation efforts?
      • New concise description curation interface for easier writing and updating
    • Annotating genes in the more expressive GO format
      • How would our data models need to change to curate with the new expressive GO?
    • SVMs
    • Include collaborations?
      • GSA markup; encouraging other journals to adopt?
      • Web page links; electronic textbooks, etc.
      • Can automate linking, but can't support manual QC without more (financial) support
    • Supporting links to WormBase (in general)
      • WormAtlas, for example
    • Google-like entity info (e.g. George Washington) displayed on side of search results page
      • Could we provide a short write-up of genes to Google?
      • Google-funded? Google.org
    • Transcriptional regulatory networks (TRNs)
      • Gene regulation curation
      • Limited amount of data for position weight matrices (PWMs) and TF-binding sites
      • Consolidation of TF-binding/target-gene data into one place (ChIP-Seq/modENCODE data, PWMs, Gene regulation interactions)
      • How to best visualize the available data?
      • We can design a new visual scheme for TRNs
      • Will we curate enhancers?
  • Suggestions for future
    • Better integration across data types
    • How can the OA evolve, and what can it be used for?


August 9, 2012

SVM Analysis Form

  • Sandbox version available for testing
  • Data stored on Postgres
  • Daniela can show how to use
  • SVM flags main papers and supplemental documents; should they be grouped into a single document or kept separate?
    • Depends on curator
    • Should have a direct (unambiguous) link to supplemental documents
  • Can flag false positive papers
  • Can query for papers on a batch-by-batch (by SVM analysis date) basis
  • False negatives are automatically annotated as such when an SVM-negative paper is curated for the respective data type in the OA
  • Curators CAN check SVM-negatives if they want to, but are not required to
  • Can query if a specific paper (or papers) has been flagged (by SVM) for certain data types
  • Proposed: an OA field to capture which supplemental document the data came from, if from a supplement
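
The automatic false-negative annotation described above can be sketched as a simple comparison between SVM calls and curated records. This is only an illustration under stated assumptions: the function name, the data shapes, and the identifiers are all hypothetical and do not reflect the actual Postgres tables or the OA.

```python
def find_false_negatives(svm_calls, curated):
    """Return (paper, data_type) pairs the SVM called negative but a curator annotated.

    svm_calls: dict mapping (paper_id, data_type) -> "positive" or "negative"
    curated:   set of (paper_id, data_type) pairs that have curated data
    Both structures are assumptions made for this sketch.
    """
    return sorted(
        key for key, call in svm_calls.items()
        if call == "negative" and key in curated
    )


# Made-up identifiers, for illustration only.
svm_calls = {
    ("paper_1", "rnai"): "positive",
    ("paper_2", "rnai"): "negative",
    ("paper_3", "rnai"): "negative",
}
curated = {("paper_1", "rnai"), ("paper_3", "rnai")}

print(find_false_negatives(svm_calls, curated))
```

Here paper_3 is reported as a false negative because it was SVM-negative yet curated; paper_2 stays a plain negative because no curator has annotated it.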


Grant

  • People can add new ideas/visions for future development of WormBase
    • Visualization, integration, graphs, etc.
    • How do we visualize complex information?
    • Do we need to group data types for visualization? E.g. transcriptional regulatory networks vs genome browsing
  • Scaling?
  • Dependency on ACEDB?


August 16, 2012

Helpdesk

  • GitHub link to e-mail?
  • When an issue is submitted via the website, GitHub generates a unique e-mail which can be replied to. That e-mail thread is then included in the GitHub issue
  • Who should close an issue? Ultimately, an issue/ticket should be closed by whichever WormBase staff member addresses or resolves the issue
    • If the issue is not closed by the person who resolves it, the help desk officer should follow up to check
  • When is an issue resolved? May depend on the nature of the issue and on what has to be done
  • Need project management tools? Redmine? Something else?


Large scale projects

  • Scripts and documentation used to deal with/handle large scale data should be put into GitHub
  • Example: Itai Yanai expression data; just another microarray paper, but with new oligo sets?
  • Need to store enough info to reproduce curation
  • Store all large scale data sets and scripts, documentation, etc. on a single computer with regular backup


August 23, 2012

Transgene tables

  • Not available on new site
  • Broken on legacy site
  • Integrated transgenes' location (which chromosome?)
  • Static page was proposed for new site
  • We (Caltech) could possibly put it in the new WormBase Support section (which we have write-access to)


Finding all labs in a region (user request)

  • How to best identify all labs in, for example, England, Asia, South America, Canada, etc.
  • Can search using patterns for e-mail address (not optimal)
  • Searching by physical mailing address may be better, but it requires all country codes and country-continent affiliations
  • We can possibly create a script to generate a table every release
  • Change data model to have a "Country" tag? And then programmatically assign continents based on Country tag
  • Juancarlos can extract PI address info from Postgres and set up a CGI for future search
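
The idea of assigning continents programmatically from a "Country" tag could look roughly like the sketch below. Everything in it is assumed for illustration: the lookup table is a tiny hand-picked subset (a real one would need every country), and the lab names and function name are hypothetical.

```python
from collections import defaultdict

# Illustrative subset only; a complete country-to-continent table is assumed to exist.
COUNTRY_TO_CONTINENT = {
    "United Kingdom": "Europe",
    "Japan": "Asia",
    "Brazil": "South America",
    "Canada": "North America",
}


def labs_by_continent(labs):
    """Group labs by continent, given (lab_name, country) pairs.

    Labs whose country is missing from the lookup table are placed under
    "Unassigned" so they can be fixed by hand.
    """
    table = defaultdict(list)
    for lab_name, country in labs:
        continent = COUNTRY_TO_CONTINENT.get(country, "Unassigned")
        table[continent].append(lab_name)
    return {continent: sorted(names) for continent, names in table.items()}


# Hypothetical labs, for illustration only.
labs = [
    ("Lab A", "United Kingdom"),
    ("Lab B", "Japan"),
    ("Lab C", "Canada"),
    ("Lab D", "Atlantis"),
]

for continent, names in sorted(labs_by_continent(labs).items()):
    print(continent, names)
```

Keeping an explicit "Unassigned" group makes gaps in the country table visible each time the table is regenerated, rather than silently dropping labs.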


SVM Tool and Data Type Statistics

  • Chris would like to get Interaction numbers (how complete are we with curation)
  • SVM Tool needs some tweaks:
    • Papers that have supplementary material are broken up into several documents
    • If a curator searches for all "Positive" papers, all papers that have at least ONE "Positive" document will be returned
    • Conversely, if a curator searches for all "Negative" papers, all papers that have at least ONE "Negative" document will be returned
    • This is a problem, since we want only papers in which ALL documents are "Negative"
  • What are the benefits/drawbacks of separating a paper into multiple documents?
    • Kimberly likes to have separate documents to reduce amount of data type searching to be done once we have SVM results
    • Keeping multiple documents may complicate the search procedure, unless we can change the query process
  • Juancarlos will create a filter step in the SVM tool such that the user/curator can specify if they would like to search on the basis of "whole" paper versus individual document
  • In this way, searching for "whole" "Positive" papers will return only papers in which at least one document is "Positive"; searching for "whole" "Negative" papers will return only papers in which ALL documents are "Negative"
  • Searching on the basis of individual documents will work as it does now: displaying all papers as individual documents with their respective SVM results
  • Juancarlos will also add a filter for "Primary" vs "Not Primary" vs "Not Designated"
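
The "whole paper" rule discussed above (a paper is "Positive" if any of its documents is "Positive", and "Negative" only if all of them are "Negative") can be sketched as follows. This is a minimal illustration, not the SVM tool's actual code: the function names, the tuple layout, and the paper identifiers are assumptions.

```python
from collections import defaultdict


def whole_paper_calls(doc_results):
    """Collapse per-document SVM calls into one call per paper.

    doc_results: iterable of (paper_id, document_id, is_positive) tuples.
    A paper is "Positive" if at least one document is positive, and
    "Negative" only if every one of its documents is negative.
    """
    flags_by_paper = defaultdict(list)
    for paper_id, _document_id, is_positive in doc_results:
        flags_by_paper[paper_id].append(is_positive)
    return {
        paper_id: "Positive" if any(flags) else "Negative"
        for paper_id, flags in flags_by_paper.items()
    }


def search_whole_papers(doc_results, wanted):
    """Return the sorted paper IDs whose whole-paper call equals `wanted`."""
    calls = whole_paper_calls(doc_results)
    return sorted(paper_id for paper_id, call in calls.items() if call == wanted)


# Made-up papers: the first has a positive main text and a negative supplement.
docs = [
    ("paper_1", "main", True),
    ("paper_1", "supplement_1", False),
    ("paper_2", "main", False),
    ("paper_2", "supplement_1", False),
]

print(search_whole_papers(docs, "Positive"))
print(search_whole_papers(docs, "Negative"))
```

With this grouping, paper_1 no longer appears in a "Negative" search just because its supplement was negative, which is the problem the filter step is meant to solve.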