WormBase-Caltech Weekly Calls March 2014

March 6, 2014


  • ?Construct model


  • WS242 began to include miRNA datasets; the probes are very short and do not map uniquely
  • Wen made modifications to SPELL to accommodate them
  • Wen has started separating datasets from papers
  • miRNA analyses have used completely different platforms than protein-coding gene analyses, so each requires a separate dataset
  • Wen started loading WB topics into SPELL
  • Wen would like to assign ~100 papers in SPELL to WB topics
  • Wen also modified SPELL script to annotate datasets applicable to specific tissues or cells
  • Wen may/will co-opt Juancarlos' SPELL instance for testing

?Construct model

  • New ?Construct model will be vetted and approved for early in the WS244 curation cycle
  • Some changes to the ?Construct model proposed over last week
  • Still discussing how to handle plasmids/vectors, whether they should go in the ?Clone class (there is precedent)
  • We should defer to the Sequence Ontology, wherever possible, for definitions and relationships
  • We will go ahead and consider including all plasmids and vectors (e.g. AddGene & Fire vectors) in the ?Clone class, which can then be referenced within the ?Construct class
  • Will change the "Identical_transgene"/"Identical_variation" tags to "Corresponding_transgene"/"Corresponding_variation"
  • If necessary, we will consider adding a ?Construct tag to the ?Interaction model to accommodate annotation of constructs used in in vitro physical interaction experiments; this is pending curation of (sufficient amounts of) the relevant data

March 13, 2014


Database Meeting Summary

  • Write-up: <https://www.dropbox.com/s/fr9qrsbup9djx4z/DatabaseFutureMeetingatOICR.pdf>
  • Working group from three sites will test candidate technologies against metrics, considerations and requirements and report by Oct 2014.
  • Not clear yet whether everyone (in WormBase) will use one universal database technology, or if each site might use different databases
  • Site requirements: web speed & performance, model flexibility with regular updates, understandability of modeling language/structure
  • Different options: relational, row, column, NoSQL, SQL, object-oriented, graph database (Neo4J), DynamoDB (Amazon NoSQL DB)
  • Central database for everyone? Real-time editing and updating, or regular updating/synchronization
  • Process will take ~2 years or so

Data visualization with Santiago Lombeyda

  • Will play with some potentially straightforward display options
  • Virtual worm renderings made into SVGs for layering and clicking/linking

?Infection_assay model

  • Broaden scope to include all types of species-species interaction (?Interspecies_interaction class ???)
  • Remove "Modifying_influence" tag and "Required*" tags in favor of "Resistance" and "Hypersensitivity" sets of tags
  • If we are going to include many types of species-species interactions, we need to consider how to make tag names that are unambiguous with respect to which species is playing a particular role
  • We will talk to parasite curators to see what we want to include in the model
  • Ranjana would like to add a concise-description-like text description to genes describing their role in infection
    • Will require retroactively making database connections once the ?Infection_assay (or equivalent) model is finalized and implemented (not this upcoming release)

WormBase Topic/Process hierarchy/ontology relationships

  • Karen will send around a working OBO file
  • Curators can look at which topics should be related to others (via parent-child relationships)
  • Also, we can look at trying to tie in to existing GO terms
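
As a rough illustration of how curators might review the parent-child relationships in such an OBO file, the sketch below collects the `is_a` lines from each `[Term]` stanza. The term IDs and names in the sample are made-up placeholders, not real WormBase topics, and the function name `parse_is_a` is hypothetical; a full OBO parser would handle many more tags than this.

```python
from collections import defaultdict

def parse_is_a(obo_text):
    """Map each term ID to the list of parent IDs named in its is_a lines."""
    parents = defaultdict(list)
    current_id = None
    in_term = False
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            # A new stanza begins; only [Term] stanzas are of interest here.
            in_term = line == "[Term]"
            current_id = None
        elif in_term and line.startswith("id: "):
            current_id = line[len("id: "):].strip()
        elif in_term and current_id and line.startswith("is_a: "):
            # Drop the trailing "! human-readable name" comment, if present.
            parent = line[len("is_a: "):].split("!", 1)[0].strip()
            parents[current_id].append(parent)
    return dict(parents)

# Placeholder stanzas (hypothetical IDs, not taken from the working OBO file).
SAMPLE = """\
[Term]
id: EX:0000001
name: parent topic

[Term]
id: EX:0000002
name: child topic
is_a: EX:0000001 ! parent topic
"""

relationships = parse_is_a(SAMPLE)
print(relationships)
```

A listing like this gives a quick view of which topics already have parents and which are still unattached.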

March 20, 2014


New Database Discussions

  • We want speed, efficiency, performance
  • Will keep WB 2.0 web architecture
  • One big database for everyone?
  • Timestamps would not be kept for all annotations (performance, economy issues)
  • We still keep "Date_last_updated" and "Curator_confirmed" in ACEDB in the #Evidence hash
  • We would want to specify when and where we would want to keep (and track) timestamps
  • ACEDB timestamps have proven useful

Padding WBPerson ID numbers with zeros

  • Laboratory search for "Raymond Chan" did not produce the desired/expected results
  • WBPerson ID is indexed but not the text string of the affiliated person
  • In this case WBPerson98 is listed, but automated searching for the full text person name was affected by the return of many WBPerson98* results
  • How much work would be required to change the WBPerson ID padding? Not trivial
  • We'll clarify what we think should be the user experience and Abby can decide the best way to fix it
  • We should thoroughly go through classes and clarify what fields should be indexed
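
The prefix problem described above can be sketched as follows: an unpadded ID such as WBPerson98 is a string prefix of WBPerson980, WBPerson9812 and so on, whereas fixed-width, zero-padded IDs are not prefixes of one another. The width of 7 digits and the helper name `pad_person_id` are assumptions made for illustration only; the other IDs in the list are arbitrary examples, and this is not how the WormBase search index is actually implemented.

```python
import re

def pad_person_id(person_id, width=7):
    """Return the WBPerson ID with its numeric part zero-padded to `width` digits."""
    match = re.fullmatch(r"WBPerson(\d+)", person_id)
    if match is None:
        raise ValueError(f"not a WBPerson ID: {person_id!r}")
    return f"WBPerson{int(match.group(1)):0{width}d}"

# Arbitrary example IDs that share the prefix "WBPerson98".
ids = ["WBPerson98", "WBPerson980", "WBPerson9812"]

# Prefix search on unpadded IDs returns every ID that starts with the query.
unpadded_hits = [i for i in ids if i.startswith("WBPerson98")]

# With fixed-width IDs, the padded query matches only itself.
padded_ids = [pad_person_id(i) for i in ids]
padded_hits = [i for i in padded_ids if i.startswith(pad_person_id("WBPerson98"))]

print(unpadded_hits)
print(padded_hits)
```

Padding is only one possible fix; indexing the person's name as text, as noted above, would address the same search without changing IDs.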

GO Meeting update (from Paul Sternberg and Kimberly Van Auken)

  • Chris Mungall presenting/discussing common annotation tool
  • Table view annotator
  • Graph view annotator (LEGO/ORION) (example: http://go-genkisugi.rhcloud.com/seed/model/gomodel:goa_human-5323da180000002)
  • Tree view, PAINT tool used
  • Text, Paper viewer (SAB was excited about)
  • Protein-2-GO tool: dumped or restructured/repurposed?
  • Protein-2-GO open source? Should be open to further development
  • OBO-Edit is no longer being supported; GO is changing from OBO to OWL format
  • Switching from OBO-Edit to Protégé (will take time, ~6 months development)
  • CANTO (PomBase)
  • AmiGO 2 now live on GO website
  • http://beta.geneontology.org/
  • Enrichment analysis (from PANTHER database)

Term Genie

  • New tool
  • Relies on logical definitions for GO terms
  • Web-based form for adding GO terms when you can create logical definitions (explicit use of defined relationships)
  • e.g. response to chemical (when chemical has a ChEBI ID), etc.
  • Great for requesting new terms
  • http://go.termgenie.org/
  • http://code.google.com/p/termgenie/

Disease curation

  • FlyBase went live with disease data
  • We want to continue to have (and maybe extend) our connections/links to and from OMIM
  • We could work on WormBase/OMIM portal

March 27, 2014


  • Removing NOT phenotype from Phenotype GAF file.
    • Michael's comment: Removing it is easy, but I don't know what the position of the GO curators on that is. Also I think the negative phenotypes are quite useful to have.
    • Kevin's comment: Indeed. Conversely, if negative phenotypes are uninformative and confusing, then shouldn't we remove them across the board? i.e. from the GAF file and the database and web displays?
  • WOBr ready for testing
    • Enter WOBr issues here

Disease data in WOBr

  • Ranjana will inform EBI how to generate the Gene Association File (GAF)
  • Once the GAF is generated, it will be sent to Raymond for inclusion in WOBr

Life stage Gene Association File (GAF)

  • We will also need to inform EBI how to generate this GAF
  • Daniela can e-mail Raymond to include the GAF in WOBr

Removing NOTs from Gene Association Files (GAFs)

  • Phenotype GAF has NOTs for unobserved phenotypes
  • Should we remove NOTs from GAF?
  • Conclusion: We will leave the GAFs as they are and perform a pre-filtering step during WOBr data incorporation to exclude NOT phenotype annotations (for now)
  • Something to consider for future: Separate NOTs into their own GAF?
  • We will need to understand all of the downstream uses of the GAFs
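
The pre-filtering step could look something like the sketch below. It assumes the phenotype GAF follows the GAF 2.0 layout, i.e. tab-delimited rows with the qualifier in the fourth column (possibly pipe-separated) and comment lines starting with `!`. The function name `drop_not_annotations` and the sample rows are hypothetical placeholders, not actual WormBase annotations or the actual WOBr loading code.

```python
QUALIFIER_COLUMN = 3  # fourth column in GAF 2.0 (0-based index)

def drop_not_annotations(lines):
    """Yield GAF lines, skipping annotation rows whose qualifier includes NOT."""
    for line in lines:
        if line.startswith("!"):
            # Header/comment lines pass through unchanged.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > QUALIFIER_COLUMN:
            qualifiers = fields[QUALIFIER_COLUMN].split("|")
        else:
            qualifiers = []
        if "NOT" in qualifiers:
            continue  # exclude unobserved (NOT) phenotype annotations
        yield line

# Placeholder rows (made-up identifiers), truncated to the first few columns.
sample = [
    "!gaf-version: 2.0\n",
    "WB\tGENE_A\tgene-a\t\tPHENOTYPE_1\tREF_1\tIMP\t\tP\n",
    "WB\tGENE_B\tgene-b\tNOT\tPHENOTYPE_2\tREF_2\tIMP\t\tP\n",
]

kept = list(drop_not_annotations(sample))
print("".join(kept), end="")
```

Because the filter runs at load time, the GAFs themselves stay unchanged for other downstream users, which matches the conclusion above.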


  • It is not yet clear what input we want from the parasite curators and Matt Berriman's group
  • This depends on the ?Infection_assay model and whether we plan to expand it to nematode species-to-species interactions