WormBase-Caltech Weekly Calls August 2012

From WormBaseWiki

August 2, 2012

Grant updates

  • Topics
    • Diseases
    • Worm Phenotype ontology; attempt to apply it to other species? Some of this has already been done (e.g. C. briggsae)
      • Benefit of curating phenotypes in other species? Particularly useful for genes not in C. elegans, for example
    • Textpresso for nematodes
      • >5000 papers now
      • Complete set of papers by ~Labor Day
    • Anatomy ontology
      • Anatomy page is hub for data
        • We should strive for user friendly display of information
        • Cell functions, 10 or 100 highest/preferentially expressed genes, cell connections, cell signals
      • Upcoming challenge - male/female/hermaphrodite divergence
      • Anatomy with respect to life stage (e.g. life-stage-specific cells)
      • Multiple species
      • Uberon framework can be adapted for multiple (nematode) species
    • Gene Function
      • RNAi, Allele-phenotype, Transgene-phenotype etc.
    • Interactions
      • Integrated interaction model, genetic interaction ontology
      • Can we estimate how many interactions are left to curate? Can use OA/Postgres, SVM, and first-pass author forms etc. to estimate
    • Gene expression and Pictures
      • Need to update Gene expression model to accommodate Epic dataset (John Murray), 3D movies (Bill Mohler), and single-molecule FISH (van Oudenaarden et al.)
      • Itai Yanai dataset (embryo expression across several nematode species)
      • Expression SVM won't catch isolated tissue/cells expression analysis or microarray data
      • Incorporating the virtual worm and browser
    • Microarray and SPELL
      • Will incorporate microarray and RNA-Seq data sets for other species
      • Should let users download search results more easily (for single genes, for example)
      • Need to change SPELL database to incorporate new species
      • Users should be able to run clustering on data
      • Co-expression correlation; should recalculate each build (with flexible significance thresholds)
      • Provide Cytoscape view of genes connected by co-expression
    • Pathways and Processes
      • Plan to work with Wikipathways
      • Vocabularies and annotation schemes like Systems Biology Graphical Notation (SBGN)
      • Trying to get data into BioPAX (Biological PAthway eXchange) format
      • BioPAX too detail oriented? Very biochemical?
      • Some databases dump BioPAX format, but won't read it in
    • Paper and curation pipeline
    • Concise description progress; coverage
      • Re-annotation efforts?
      • New concise description curation interface for easier writing and updating
    • Annotating genes in the more expressive GO format
      • How would our data models need to change to curate with the new expressive GO
    • SVMs
    • Include collaborations?
      • GSA markup; encouraging other journals to adopt?
      • Web page links; electronic text books, etc.
      • Can automate linking, but can't support manual QC without more (financial) support
    • Supporting links to WormBase (in general)
      • WormAtlas, for example
    • Google-like entity info (e.g. George Washington) displayed on side of search results page
      • We provide short write up of genes to Google?
      • Google-funded? Google.org
    • Transcriptional regulatory networks (TRNs)
      • Gene regulation curation
      • Limited amount of data for position weight matrices (PWMs) and TF-binding sites
      • Consolidation of TF-binding/target-gene data into one place (ChIP-Seq/modENCODE data, PWMs, Gene regulation interactions)
      • How to best visualize the available data?
      • We can design a new visual scheme for TRNs
      • Will we curate enhancers?
  • Suggestions for future
    • Better integration across data types
    • How can the OA evolve, and what can it be used for?


August 9, 2012

SVM Analysis Form

  • Sandbox version available for testing
  • Data stored on Postgres
  • Daniela can show how to use
  • SVM flags main papers and supplemental documents; should they be grouped into a single document or kept separate?
    • Depends on curator
    • Should have a direct (unambiguous) link to supplemental documents
  • Can flag false positive papers
  • Can query for papers on a batch-by-batch basis (by SVM analysis date)
  • False negatives are automatically annotated as such when an SVM-negative paper is curated for the respective data type in the OA
  • Curators CAN check SVM-negatives if they want to, but are not required to
  • Can query if a specific paper (or papers) has been flagged (by SVM) for certain data types
  • Proposed OA field to capture what supplemental document the data came from, if from supplement
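
The false-negative rule above (an SVM-negative paper that is later curated for the same data type in the OA) can be sketched as a simple set comparison. This is only an illustration: the function name, the in-memory data structures, and the paper IDs are hypothetical, and the real tool reads its data from Postgres rather than from Python dictionaries.

```python
def find_false_negatives(svm_results, curated):
    """Return (paper, data_type) pairs the SVM called negative but a curator annotated.

    svm_results: dict mapping (paper_id, data_type) -> "positive" or "negative"
    curated:     set of (paper_id, data_type) pairs that have OA annotations
    Both structures are assumptions made for this sketch.
    """
    return sorted(
        key
        for key, result in svm_results.items()
        if result == "negative" and key in curated
    )


# Hypothetical example data
svm_results = {
    ("WBPaper00000001", "otherexpr"): "negative",
    ("WBPaper00000002", "otherexpr"): "positive",
    ("WBPaper00000003", "otherexpr"): "negative",
}
curated = {
    ("WBPaper00000001", "otherexpr"),
    ("WBPaper00000002", "otherexpr"),
}

false_negatives = find_false_negatives(svm_results, curated)
```

Here only the first paper would be reported: it was SVM-negative but has been curated. The third paper is SVM-negative and uncurated, so nothing can be said about it yet.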


Grant

  • People can add new ideas/visions for future development of WormBase
    • Visualization, integration, graphs, etc.
    • How do we visualize complex information?
  • Do we need to group data types for visualization? E.g. transcriptional regulatory networks vs genome browsing
  • Scaling?
  • Dependency on ACEDB?


August 16, 2012

Helpdesk

  • GitHub link to e-mail?
  • When an issue is submitted via the website, GitHub generates a unique e-mail which can be replied to. That e-mail thread is then included in the GitHub issue
  • Who should close an issue? Ultimately, an issue/ticket should be closed by whichever WormBase staff member addresses or resolves the issue
    • If the issue is not closed by the one who resolves it, the help desk officer should follow up to check
  • When is an issue resolved? May depend on the nature of the issue and on what has to be done
  • Need project management tools? Redmine? Something else?


Large scale projects

  • Scripts and documentation used to deal with/handle large scale data should be put into GitHub
  • Example: Itai Yanai expression data; just another microarray paper, but with new oligo sets?
  • Need to store enough info to reproduce curation
  • Store all large scale data sets and scripts, documentation, etc. on a single computer with regular backup


August 23, 2012

Transgene tables

  • Not available on new site
  • Broken on legacy site
  • Integrated transgenes' location (which chromosome?)
  • Static page was proposed for new site
  • We (Caltech) could possibly put on the new WormBase Support section (which we have write-access to)


Finding all labs in a region (user request)

  • How to best identify all labs in, for example, England, Asia, South America, Canada, etc.
  • Can search using patterns for e-mail address (not optimal)
  • Maybe better search with physical mailing address, but need all country codes, and country-continent affiliations
  • We can possibly create a script to generate a table every release
  • Change data model to have a "Country" tag? And then programmatically assign continents based on Country tag
  • Juancarlos can extract PI address info from Postgres and setup a CGI for future search
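
The proposed "Country" tag with programmatic continent assignment could look roughly like the sketch below. It assumes each lab record already carries a country name. The lookup table is hypothetical and deliberately incomplete, and the lab names are invented. A real version would need the full list of countries and country-continent affiliations mentioned above.

```python
from collections import defaultdict

# Hypothetical, incomplete country -> continent table (assumption for this sketch)
COUNTRY_TO_CONTINENT = {
    "United Kingdom": "Europe",
    "Japan": "Asia",
    "Brazil": "South America",
    "Canada": "North America",
}


def labs_by_continent(labs):
    """Group lab names by continent.

    labs: iterable of (lab_name, country) pairs.
    Countries missing from the table are grouped under "Unknown"
    so they can be reviewed by hand.
    """
    grouped = defaultdict(list)
    for lab_name, country in labs:
        continent = COUNTRY_TO_CONTINENT.get(country, "Unknown")
        grouped[continent].append(lab_name)
    return {continent: sorted(names) for continent, names in grouped.items()}


# Invented example records
labs = [
    ("Lab A", "Japan"),
    ("Lab B", "Canada"),
    ("Lab C", "Japan"),
    ("Lab D", "Atlantis"),
]
table = labs_by_continent(labs)
```

Regenerating such a table at every release would only require rerunning the grouping step on the current set of PI addresses.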


SVM Tool and Data Type Statistics

  • Chris would like to get Interaction numbers (how complete are we with curation)
  • SVM Tool needs some tweaks:
    • All papers are broken up into several documents, if there is supplementary material
    • If a curator searches for all "Positive" papers, all papers that have at least ONE "Positive" document will be returned
    • Conversely, if a curator searches for all "Negative" papers, all papers that have at least ONE "Negative" document will be returned
    • This is a problem, since we would want all papers in which ALL documents are "Negative"
  • What are the benefits/drawbacks of separating a paper into multiple documents?
    • Kimberly likes to have separate documents to reduce amount of data type searching to be done once we have SVM results
    • Keeping multiple documents may complicate the search procedure, unless we can change the query process
  • Juancarlos will create a filter step in the SVM tool such that the user/curator can specify if they would like to search on the basis of "whole" paper versus individual document
  • In this way, searching for "whole" "Positive" papers will return only papers for which at least one document is "Positive"; searching for "whole" "Negative" papers will return only papers for which ALL documents are "Negative"
  • Searching on the basis of individual documents will work as it does now: displaying all papers as individual documents with their respective SVM results
  • Juancarlos will also add a filter for "Primary" vs "Not Primary" vs "Not Designated"
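
The "whole paper" rule discussed above (a paper is "Positive" if at least one of its documents is "Positive", and "Negative" only if ALL of its documents are "Negative") can be sketched as follows. The function names, the tuple layout, and the paper/document IDs are hypothetical. This is not the SVM tool's actual code.

```python
from collections import defaultdict


def paper_level_results(document_results):
    """Collapse per-document SVM calls into one call per paper.

    document_results: iterable of (paper_id, document_id, result),
    where result is "positive" or "negative".
    A paper is positive if ANY document is positive; otherwise negative.
    """
    by_paper = defaultdict(list)
    for paper_id, _document_id, result in document_results:
        by_paper[paper_id].append(result)
    return {
        paper_id: "positive" if "positive" in results else "negative"
        for paper_id, results in by_paper.items()
    }


def search_whole_papers(document_results, wanted):
    """Return the sorted paper IDs whose paper-level call equals `wanted`."""
    summary = paper_level_results(document_results)
    return sorted(paper_id for paper_id, call in summary.items() if call == wanted)


# Invented example: P1 has a positive supplement, P2 is negative throughout
docs = [
    ("P1", "main", "negative"),
    ("P1", "supp1", "positive"),
    ("P2", "main", "negative"),
    ("P2", "supp1", "negative"),
]
positive_papers = search_whole_papers(docs, "positive")
negative_papers = search_whole_papers(docs, "negative")
```

With this rule P1 appears only in the "Positive" search, even though its main document is negative, and P2 appears only in the "Negative" search. This avoids the current behaviour where P1 would show up in both.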


August 30, 2012

Transgenes

  • Transgenes now have transgene IDs
    • Each curator should check the dumpers for the next upload (in 2 months) on mangolassi and see that everything looks fine.
    • Each curator should also make sure that all the transgene objects they have in their curation pipeline are converted into IDs. E.g. Kimberly had a bunch of transgenes that were not converted.
    • Implement transgene ontology in GO?


Dead genes

  • How do we deal with dead genes?
  • Juancarlos mentioned that on Tazendra if dead genes are mapped, then those maps are reflected; there is no mapping otherwise.
  • Wen normally replaces dead genes with new ones. She will double check the scripts and see what happens when a dead gene is found.


Grant

  • By tomorrow evening, everyone should finish writing up their part in the Google Doc, as Paul will then remove it from there.
  • Each curator should estimate the effort, in terms of time, that their own data type is going to take; this is important for budgeting.
  • Everyone should add real references at the bottom of the document. In the text, use only 'Author, Date'.


SVM on other nematodes

  • Daniela and Yuling are trying SVM on other nematodes (P. pacificus as a first trial) to estimate how many papers could be positive for otherexpr and to see whether SVM could be used for triage in other species.


Papers with no PMIDs

  • James pointed out that many papers (for other nematodes) did not have PMIDs
  • Paul suggested checking agricultural databases, e.g. Agricola


Canopus died - R.I.P.

  • Canopus did not have any backup, but everyone knew this
  • The new Canopus will be used as a server
  • Raymond bought a new HD


Nikhil Bhatla joined the meeting from MIT with Xiaodong

  • Nikhil started curating neural connectivity in wormweb.org
  • He shared some ideas for implementing some data types in WormBase
    • Expression -> Nikhil started curating images of expression in the neuralnet portal, e.g. http://wormweb.org/neuralnet#c=AVA&m=1. Daniela will get in touch with him to see how we can set up a crosstalk with pictures in WormBase.
    • Loss of function studies -> Raymond will talk to Nikhil and see how he can contribute to curation. We have part of loss of function curation in anatomy function: site of action, direct manipulation of anatomical part, ablation.
    • Gain of function -> we have some gain of function in overexpression. Karen mentioned that she annotates overexpression and phenotypes and, when possible, associates them with a cell.
    • Physiology -> stimulus-dependent response, e.g. a Ca2+ increase in response to a given stimulus. We should be able to capture that information; it is worth thinking about.

Raymond mentioned that some of this information is already present in phenotype; Xiaodong said some is also in gene regulation. Paul suggested that we need to tie everything together and have a more unified view, with more integration of the data we already have.

    • Correlation physiology? How to implement correlation physiology data (see Hendricks et al, 2012 http://www.ncbi.nlm.nih.gov/pubmed/22722842)
    • Nikhil mentioned it would be useful, for instance, to be able to run a query such as: display all cells that respond to a certain stimulus, e.g. an increase in temperature.


FlyBase

  • Susan contacted Ranjana about our disease pipeline. They would like to use the pipeline for FlyBase. They were talking about having GO-style curation with evidence codes for a human disease relevance tag.


QCFast for GO

  • James will give a demo next week at our regular group meeting. The QCFast for GO is coming along very well. The QCFast for GO will be the precursor of the next generation curation interface.


Database paper

  • Kimberly is revising the Database paper. She is including author accuracy and identification of data types. It can vary from 70% (Raymond) to 98% (Juancarlos, Gary, Daniela) (not Juancarlos, I don't have afp_ data -- J)