WormBase-Caltech Weekly Calls February 2018

From WormBaseWiki

February 1, 2018

Automated gene descriptions - orthology

  • Some genes have human orthologs mentioned in their automated descriptions, even though DIOPT does not make that orthology call
  • WormBase uses EnsemblCompara and other methods (not an aggregate method like DIOPT)
  • Orthology synchrony is a challenge; WormBase and FlyBase may need to pay special attention to orthology calls and discrepancies
  • DIOPT is purely automated, does not consider other information about orthology evidence
  • We should be clear about how the orthology calls are made

Next upload

  • Exact date unclear
  • Probably end of March

SimpleMine issue

  • Redundant genes in input list are merged
  • Should SimpleMine provide an option to keep redundancies?
    • Give option up front? Provide submission step to point out redundancies? Ask for choice?
  • We could default to showing row-by-row correspondence and display the number of redundant entries
  • Conclusion: Make an option for users to indicate if they want row-by-row correspondence or a merged list
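
A minimal sketch of the proposed option, assuming the input is a plain list of gene identifiers and that results come from a simple lookup table. The function name `simplemine_query`, the `keep_rows` flag, and the placeholder gene names are hypothetical and do not reflect SimpleMine's actual code. Both modes also report how many extra copies of each gene were submitted, so redundancies can be shown either way.

```python
from collections import Counter


def simplemine_query(input_genes, lookup, keep_rows=True):
    """Hypothetical sketch of the two SimpleMine output modes discussed above.

    keep_rows=True  -> one output row per input line, duplicates preserved
    keep_rows=False -> duplicates merged, keeping first-seen order
    Also returns how many extra copies of each repeated gene were submitted.
    """
    counts = Counter(input_genes)
    redundant = {gene: n - 1 for gene, n in counts.items() if n > 1}
    if keep_rows:
        genes = list(input_genes)
    else:
        # dict.fromkeys drops repeats while preserving insertion order
        genes = list(dict.fromkeys(input_genes))
    rows = [(gene, lookup.get(gene, "not found")) for gene in genes]
    return rows, redundant


# Example with placeholder identifiers (not real WormBase data)
lookup = {"geneA": "description of geneA", "geneB": "description of geneB"}
rows, redundant = simplemine_query(["geneA", "geneB", "geneA"], lookup)
print(len(rows), redundant)
```

Keeping the redundancy count separate from the rows lets the same query back either display without re-reading the user's input.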

Cell type expression

  • Waterston paper
  • 40,000 random cells; clusters sequenced individually to a depth of 20,000 reads; ~1000 genes per cell; data clustered, with a judgement call made as to which cell types the clusters likely represent
  • For now, we can do a simple annotation: significantly expressed genes for each cell type
  • Supplemental table S5 for neurons
  • Maybe just ignore the hybrid calls like AQM/PVM, etc.
  • It may be good to isolate the single cell data from other expression data
  • We should annotate/capture the expression clusters
  • Would be good to be able to do enrichment analysis on the clusters; compare data sets
  • Data has not been placed in SPELL yet; Gary considered the data a work in progress
  • We can communicate with the Waterston group; are they collecting more data?
  • Wen will take another look at the data
  • Gary W. concerned about the reported/assumed/inferred identity of the cells in the paper
  • Probably cannot curate to individual cells, but we can annotate to a higher level term
  • We want to annotate and display expression enrichment as well as presence/absence calls

February 8, 2018

Release schedule

  • Wen will ask Hinxton to update the published release schedule (for next data upload)

New York Worm Meeting

  • Wen and Kimberly will present a WormBase tutorial on March 24
  • Wen communicated with Oliver Hobert; suggested topics:
    • Multi-gene (batch) search tools
    • How does literature info get into WormBase? Curation process?
    • Should we discuss completeness?

GO curation

  • New simple input form for Noctua, being developed at USC
  • Not very much GO curation happening at WB right now
  • Protein-2-GO pipeline
  • Do we have a good Phenotype-2-GO(Process) mapping pipeline? We have our old mappings; not very reliable; would need to spend more time expanding the worm phenotype ontology and GO to improve
  • Cellular component curation will come in from WB expression curation
  • Don't have pipeline for Interactions-2-GO
  • Textpresso Molecular Functions pipeline?
  • geneprod and catalyticact data types for molecular function pipeline
  • Textpresso can send molecular function annotations to Noctua
  • For high-level pathway curation, we should probably read WormBook chapters (or other reviews) and develop pathways (using non-experimental evidence codes)
  • We could potentially seed Noctua models from Reactome
  • We would like to have complete curation for major pathways for gene enrichment analysis
  • Roles of small molecules in Noctua models still being worked out

Phenotype curation

  • Chris has had community curation pipeline on back burner while updating Wiki and dealing with AGR, WormMine, etc.
  • Will get back to it soon; will resend the email requests for newer papers that were first sent over a year ago

Expression curation

  • Daniela getting back to expression curation after Micropublication stuff has quieted down

Gene regulation curation

  • April came across a dataset involving regulation of siRNAs that don't seem to have gene objects in WB
  • May need to instantiate genes for these?

Physical interaction curation

  • SVM classification: should we flag as negative a paper that has protein interactions but none for C. elegans?
  • Can we generate a good SVM that only identifies WB-curatable papers?

Disease curation

  • Now curating the specific genetic entities involved in a disease model
  • Will also capture environmental conditions, treatments (e.g. ameliorates, exacerbates)
  • Curation in-line with AGR standards
  • An evidence code is needed for assertions that an animal is a model of a disease when the assertion is based on background knowledge and experimental evidence together
  • Evidence Code Ontology (ECO) is developing a new term to accommodate
  • Disease curators can use new evidence code as well as any existing codes
  • Is there a definition of a "disease model"?
  • What are the minimal criteria for considering something a disease model?
  • WB and FB curators focus on cellular phenotype and relation to the disease

Expression cluster curation

  • 27 papers in pipeline
  • Will then work on "single-cell" RNAseq
    • Wen, Raymond, and David should discuss

April and May Worm Meetings

  • Midwest and Colorado meetings
  • Wen submitting abstracts
  • Wen and Kimberly can write up an abstract template for the New York meeting and send it around to be modified for future meetings


  • Published last version for legacy site

Older papers for Textpresso

  • Daniel requested 13 (older) papers from the Caltech library through the inter-library loan process
  • More than half were received as images; these would need optical character recognition (OCR) for Textpresso purposes
  • What is the state of the art of OCR now? How good is it? Can we ask Caltech library for the service?
  • Are these high priority papers? Need to check to see if worth processing

AGR working groups

  • Disease working group setting up a face-to-face meeting
  • Variant working group may need a face-to-face meeting as well
  • Expression working group working out initial AGR site data display mockups
  • Interaction working group; we will want to incorporate miRNA/target interactions (RNA-RNA interactions); will look at miRBase

February 15, 2018

Model changes

  • Model freeze on March 2nd
  • Will need to get model changes proposed and tested by then

Sys admin of Tazendra/Mangolassi

  • Raymond will discuss centralizing sys admin with Juancarlos
  • Need good documentation for forms, tools, etc.
  • There will be a push to put all code for tools and forms on GitHub

Tazendra forms/tools bug this week

  • There is a dependency on Mangolassi for some tools
  • Mangolassi went down and caused problems
  • Would be good to decouple the two machines

AGR meetings and working groups

  • May not get an AGR all-hands face-to-face meeting before the summer
  • Working groups can decide to have face-to-face meeting
  • People should speak up if they have interest in visiting other MODs/sites; can be arranged
  • Consider what grant proposals could come out of such meetings/visits
  • Currently no ontology working group, no anatomy working group
  • Could establish a preliminary working group; reach out to relevant people
  • Anatomy working group issues may come up in expression working group
    • Daniela will keep Raymond updated on relevant issues that come up with the expression group

Ontology Browser gene lists

  • Chris requesting change to gene list display from WOBr
  • https://github.com/WormBase/website/issues/6190
  • Should provide WBGene IDs, not just gene public names
    • That was the original intent, but using WBGene IDs was, for some reason, causing issues when developing the tool; will need to revisit that issue to get WBGene IDs displayed

February 22, 2018

Making MOD data publicly available in a central location

  • Meets the NIH mandate for WB as a publicly funded project and helps researchers get their data highlighted faster than waiting for the database build
    • Would put in filters to avoid releasing sensitive data or incomplete curation annotations
    • Would be good for the journal hyperlinking project, since it needs access to up-to-date data (see more below about the project)
    • A central data repository for all MOD data files would be helpful to developers (and users)
    • Does Caltech have an FTP site that could be used?

Journal Hyperlinking project goals

The hyperlinking project, in production since 2009, links bioentities in worm, yeast, and fly research articles to the relevant databases. It requires the latest data from WormBase, SGD, and FlyBase and could use a central repository from which to pull entities (names, IDs, synonyms).

    • MOD curators check link accuracy and check for missed links (needs ongoing FTE support)
    • Supporting this project is not in the remit of any of the MODs; the project is not sustainable without outside funding, hence the effort to find funding outside of WB
    • Since its inception, the project has linked bioentities in GSA journal papers (Genetics and G3)
    • C. elegans genes, alleles, rearrangements, strains, clones, short phenotype names, and transgenes are linked in these papers
    • Karen's (InSilico) grant goals are to
      • expand the pipeline to other journals, specifically eLife (then PLoS)
      • expand to all AGR member MODs in addition to SGD and FlyBase; also bring in PomBase
      • not planning on expanding linking beyond simple text recognition of known lexica and entities that follow a regex
      • SBIR commercialization plan is to extend entity identification to commercial reagents and collect data for subscription-based access from biotech suppliers.
      • need data from Postgres, which is not available in the geneace dump. Could possibly just dump all Postgres data into one place; Karen's developers could write scripts to process that data
        • Juancarlos will set up a URL that can be used to access the data, updated by a cron job every day at 8pm
      • InSilico hyperlinks go through an InSilico page that embeds the MOD entity page in an iframe
        • allows tracking of link access, with stats that will be given to the MODs
        • allows monitoring and resolution of links that go dead
        • allows splash pages for silent links
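
The linking approach described above (recognition of known lexica plus entities that follow a regex) can be sketched as follows. This is an illustration only, not InSilico's pipeline: the lexicon contents and labels are placeholders, the function names are hypothetical, and the gene pattern is a simplified version of the common C. elegans convention of three or four lowercase letters, a hyphen, and a number (e.g. unc-22). Pattern-only hits are flagged as unverified because a name can fit the pattern without being in the database.

```python
import re

# Simplified C. elegans gene-name pattern: 3-4 lowercase letters, a hyphen,
# and a number (e.g. unc-22). Real nomenclature has more cases than this.
GENE_PATTERN = re.compile(r"\b[a-z]{3,4}-\d+\b")


def overlaps(start, end, spans):
    """True if [start, end) overlaps any (s, e) span."""
    return any(start < e and s < end for s, e in spans)


def find_entities(text, lexicon):
    """Return sorted (start, end, matched_text, label) tuples.

    lexicon maps known names/synonyms to a label; in a real pipeline it
    would be built from the MOD data exports discussed above (assumption).
    """
    hits = []
    # 1) Word-bounded exact matches against the known lexicon
    for name, label in lexicon.items():
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            hits.append((m.start(), m.end(), m.group(), label))
    # 2) Pattern matches that do not overlap a lexicon hit; flagged for
    #    curator review since they are not confirmed database entities
    known_spans = [(s, e) for s, e, _, _ in hits]
    for m in GENE_PATTERN.finditer(text):
        if not overlaps(m.start(), m.end(), known_spans):
            hits.append((m.start(), m.end(), m.group(), "gene-like (unverified)"))
    return sorted(hits)


lexicon = {"unc-22": "gene (example)", "N2": "strain (example)"}
text = "unc-22 and daf-16 mutants were crossed into N2."
for hit in find_entities(text, lexicon):
    print(hit)
```

Separating confirmed lexicon hits from pattern-only hits mirrors the curator check for link accuracy and missed links noted above.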

Alliance SAB meeting

  • SAB critical of:
    • not being unified
    • not being organized
  • Now everyone has committed
  • Concern still exists about autonomy of MODs
    • Will each user community still be served effectively by the Alliance?
  • Organization is easier when all are committed
  • Maybe bring in a professional organizer/project manager (long term)
  • New aggressive timeline for progress
    • April 23rd meeting; need to provide material a week in advance
    • Need a year-and-a-half plan; each working group will provide details
  • Only 2 full time Alliance staff; may need more on project; difficult for individuals to split time/effort
  • SAB member: curation involves expert decision-making/analysis of issues, not just straightforward data acquisition
  • Maybe we would have better curation consistency if individual curators focused on particular topics and became experts in certain subject matter
  • Possibility to have Alliance all-hands call in Fall
  • Working groups can have face-to-face meetings; have travel/meeting budget until July 31st then resets

Automated gene descriptions

  • Difficult to handle genes with high information content; many ontology term annotations
  • How do we simplify descriptions? Using higher-level terms, slim terms? Gets tricky

Micropublications

  • Michael Elowitz tweeted out micropublication stuff
  • Received feedback on Twitter; worth looking at threads, comments

Community curation plan

  • Alliance need for community curation
  • Micropublications are a bit of a pilot for community curation
  • Need shared curation/submission forms; fit shared data models?

RNAi secondary targets

  • WormMine and WormBase gene/RNAi pages include secondary RNAi targets but WOBr and SObA do not
  • Should we include or exclude secondary targets? We want consistency across data sets
  • No gold standard RNAi target prediction algorithm
  • We should be transparent about primary/secondary status wherever we include secondary targets in display
  • Could be addressed with phenotype display proposal
  • Should probably remove secondary targets from bulk data sets
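
One way to keep both decisions consistent is a shared filtering step, sketched below under stated assumptions: the `RNAiTarget` record, its field names, and both functions are hypothetical and are not WormBase's data model. Bulk exports keep primary targets only, while any display that includes secondary targets labels each row's status.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RNAiTarget:
    # Hypothetical record shape; field names are illustrative only
    rnai_id: str
    gene_id: str
    is_primary: bool


def bulk_export(targets):
    """Bulk data sets: keep primary targets only."""
    return [t for t in targets if t.is_primary]


def display_rows(targets):
    """Displays that do include secondary targets label each row's status."""
    return [
        (t.rnai_id, t.gene_id, "primary" if t.is_primary else "secondary")
        for t in targets
    ]


# Placeholder identifiers, not real WormBase objects
targets = [
    RNAiTarget("rnaiA", "geneA", True),
    RNAiTarget("rnaiA", "geneB", False),
]
print(bulk_export(targets))
print(display_rows(targets))
```

Routing WormMine, gene/RNAi pages, WOBr, and SObA through the same rule would give the cross-dataset consistency discussed above.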