WormBase-Caltech Weekly Calls January 2019

January 3rd, 2019

WS270 Citace upload

Next Tuesday, Jan 8th, 10am Pacific

Gene descriptions

Valerio generated new files to ignore/filter-out problematic genes
Still need to validate new pipeline
Barring any major issues, will submit new files for WS270 (can load old files if needed)
Maybe should define a test set (random sample) to test each release? Already have a test set
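
If a random-sample test set were used alongside the existing one, a reproducible draw per release could look like the sketch below. This is only an illustration: the function name, the placeholder gene IDs, and the idea of seeding with the release number are assumptions, not part of the current pipeline.

```python
import random


def sample_test_genes(gene_ids, sample_size, seed):
    """Return a reproducible random sample of gene IDs.

    The same seed and the same input set always give the same sample,
    so a release can be re-tested against an identical gene set.
    """
    rng = random.Random(seed)            # private RNG; does not touch global state
    unique_ids = sorted(set(gene_ids))   # sort so input order does not matter
    k = min(sample_size, len(unique_ids))
    return sorted(rng.sample(unique_ids, k))


# Placeholder IDs for illustration only; real IDs would come from the release files.
all_genes = [f"WBGene{n:08d}" for n in range(1, 1001)]

# Hypothetical convention: seed with the release number (270 for WS270).
test_set = sample_test_genes(all_genes, sample_size=50, seed=270)
```

Fixing the seed keeps the sample stable while a release is being validated; changing it for the next release gives a fresh sample.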

Protege Tutorial

Doodle poll open: https://doodle.com/poll/kn49rd3rggymn68g
Please fill out poll if you are interested in attending; have responses from Kimberly and Gary S.

January 10th, 2019

WB workshop at IWM 2019

Here's a draft, need to finalize as Jan 15th is the deadline


Title: WormBase 2019 - Data, Tools and Community Curation
This workshop will be an interactive session with users to discuss the types of data in WormBase and how to query them using specific tools.  We will discuss recent changes to WormBase community annotation forms and how to use them to contribute data to WormBase.  We will also present updates to ParaSite, a portal for parasitic worm genomic data, and guide participants on how to find data across model organisms at the Alliance of Genome Resources.

Format: 90 minutes: 1 section of 40 minutes, followed by a second section of 20 mins and a third section which will be a 30 minute open discussion/Q&A session.  Talks in each section will also be tailored to allow time for questions from the audience.

Section 1: Introduction to the WormBase gene page and tools such as SimpleMine, Tools for RNA seq data and enrichment analysis, gene-related data using WormBase Ontology Browser and Annotation Visualization tools.

Section 2: WormBase ParaSite database, Model Organism data at the Alliance of Genome Resources and Community Curation forms

Section 3: Open forum for discussion and Q&A.

Finalize Protege tutorial time

Best final options:
- Wed, Jan 16th, 1pm Pacific/4pm Eastern
- Thurs, Jan 17th, 11am Pacific/2pm Eastern
- Thurs, Jan 17th, 1pm Pacific/4pm Eastern
Propose we go with Wed, Jan 16th, 1pm Pacific/4pm Eastern

Automated descriptions

Distinguishing information rich vs. poor genes
Information poor genes can take advantage of information across MODs/species
Need more robust QC pipeline; can work on for WormBase, and later apply to Alliance once worked out
Working on expression statements for Alliance genes
Considering rearrangement of description so disease features more prominently

Disease curation

Disease model curation progressing; Lots of discussions about data standards and entities in Alliance Disease Working Group
Considering SVM for disease; current paper flagging pipeline is rather broad
200+ papers as positive training set available
Results sections are not being extracted in the latest Textpresso (paper sectioning in general not happening)

Noctua / GO-CAM

Making progress on best practices
Can use Noctua to generate GO annotations
Starting to incorporate proteins
Working with an ever changing Noctua platform; bugs emerge as it is developed; may benefit from frozen release of the software
Next month or two, will import entire set of C. elegans GO annotations into Noctua
- Many decisions to make: how to model?
- Each gene will become a single Noctua model; not linked to each other initially
- Working on batch updates/uploads to Noctua

Expression cluster curation

Wen working on 40 paper backlog; hoping to finish by WS271
Wen wants to work on RNA-Seq tools next
- FPKM tools
- Filtering by datasets
- Would like tools ready before International C. elegans Meeting (June 2019)

Neural function curation

Raymond: want to use design pattern strategy to curate

WOBr

Now incorporating non-IEA disease annotations into WOBr
Using disease-association file

Phenotype curation

Will run a new round of phenotype requests on ~3,000 papers in next few weeks (last one ran in October)
Processing community curation submissions
Will recurate some community curation papers to check:
- 1. completeness of community curation
- 2. the time-savings of the phenotype form pipeline
Have made recent improvements to phenotype request emails, allowing authors more feedback options which are now being readily used
Working with new phenotype ontology GitHub repository
- OBO Foundry now pointing phenotype ontology at the GitHub repository (both OBO and OWL files)
- Need to update the citace upload procedure to generate phenotype .ACE file; currently the script is still running on the old OBO Tazendra location; need to update to work off new OBO file at GitHub
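
As a rough illustration of the input side of that update, the sketch below reads `[Term]` stanzas from OBO-format text and collects each term's `id` and `name`. It is a minimal sketch under stated assumptions: the function name is hypothetical, the sample IDs and names are made-up placeholders, fetching the file from GitHub is left out, and the .ACE output step is not shown because its format is not described here.

```python
def parse_obo_terms(obo_text):
    """Collect id and name from every [Term] stanza in OBO-format text.

    Other stanza types (e.g. [Typedef]) and the file header are skipped.
    """
    terms = []
    current = None  # dict for the [Term] stanza being read, else None
    for raw_line in obo_text.splitlines():
        line = raw_line.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are kept.
            current = {} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            if tag in ("id", "name"):
                current[tag] = value
    return terms


# Placeholder content for illustration only; not real ontology terms.
sample_obo = """format-version: 1.2

[Term]
id: WBPhenotype:0000001
name: placeholder phenotype one

[Term]
id: WBPhenotype:0000002
name: placeholder phenotype two

[Typedef]
id: part_of
name: part of
"""

phenotype_terms = parse_obo_terms(sample_obo)
```

Because the parser takes text rather than a path, the same function would work whether the OBO file is read from the old Tazendra location or downloaded from the GitHub repository.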

Metabolomics

Karen working with Michael Witting to pull in metabolomics data
Integrating information about endogenous concentrations of metabolites

Automated descriptions React tool

Juancarlos developed tool to request versions of the automated descriptions
Will update pipeline to pull data from Alliance; currently coming from Tazendra
Tracking how the descriptions are changing, by data module for example
React tool currently on mangolassi but will move to Alliance at a location of Olin's choosing (AWS resource)

Transgenes in the Alliance

Are transgenes being discussed at the Alliance?
Yes, the phenotype and disease working group has been discussing
Hasn't come up in recent weeks, but was discussed at face-to-face meeting
One significant issue is that WormBase uniquely has extra-chromosomal arrays, whereas other MODs (always?) have integrated transgenes and consider them types of alleles
Chris will give Karen a heads up next time the issue is intended to be discussed within the Alliance

January 17th, 2019

Alliance Grant

Review grant and see if anything important is missing or if there are any needed edits
Tight on space but feel free to add a sentence here or there
Doc: https://docs.google.com/document/d/1HtTBnQYISfrMjnfFKEDaSjSazlyVBkvacOA8VWo8INY/edit?usp=sharing

January 24th, 2019

Author First Pass

For strain identification, we are using the obo_name_strain table.
There is an entry for 'Strain' in that table that leads to false positives.
Is this entry needed for curation?
If so, we will just filter it out for the purposes of AFP.
In Tazendra with timestamp Jan 23, 2019; on Mangolassi with timestamp Nov 15, 2018
In WS269 with timestamp '2018-09-25_17:00:39_pad'
Linked to paper WBPaper00055300; Location 'PS'; species C. elegans
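
If the entry does turn out to be needed for curation, the AFP-side filter could be as simple as the sketch below. The function and constant names are hypothetical, and the input is assumed to be a plain list of names already read from the obo_name_strain table.

```python
# Hypothetical exclusion list: names dropped for AFP purposes only.
# The underlying table is left untouched.
AFP_EXCLUDED_STRAIN_NAMES = frozenset({"Strain"})


def strain_names_for_afp(strain_names, excluded=AFP_EXCLUDED_STRAIN_NAMES):
    """Return the strain names AFP should match against.

    Matching is exact and case-sensitive, so only the literal entry
    'Strain' is removed; real strain names are unaffected.
    """
    return [name for name in strain_names if name not in excluded]


# Illustrative input; in practice the names would be read from obo_name_strain.
example_names = ["N2", "Strain", "CB4856"]
afp_names = strain_names_for_afp(example_names)
```

Keeping the exclusion in a named set makes it easy to add other entries that cause false positives without changing the curation data.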

Specifically expressed genes

On anatomy pages, in the Ontology Browser widget, we have a list of genes in a box that says "There are ### genes that may be specifically expressed."
This list combines (1) genes that expression pattern (Expr_pattern) objects show to be expressed only in that tissue/cell or subtype and in no other, AND (2) genes that expression cluster data show to be enriched in that tissue/cell or subtype; the second group may include genes that expression cluster data show to be expressed (to some degree, albeit at low levels) in other tissues
Wording is currently a bit misleading; should the statement/wording change or should we change the algorithm?
Could offer a specifically expressed list and an enriched list separately
Warrants more discussion
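
To make the "separate lists" idea concrete, here is a minimal sketch of how the two lists might be derived. The data layout (a dict from gene to a set of tissues for each evidence type), the function name, and the gene and tissue names are all assumptions for illustration; this is not the current widget code.

```python
def split_gene_lists(expr_pattern_tissues, enriched_tissues, tissue):
    """Build separate 'specifically expressed' and 'enriched' gene lists.

    expr_pattern_tissues: gene -> set of tissues from Expr_pattern objects
    enriched_tissues:     gene -> set of tissues where expression cluster
                          data show enrichment
    """
    # Specific: Expr_pattern data place the gene in this tissue and no other.
    specific = sorted(
        gene for gene, tissues in expr_pattern_tissues.items()
        if tissues == {tissue}
    )
    # Enriched: expression cluster data show enrichment in this tissue,
    # regardless of expression elsewhere.
    enriched = sorted(
        gene for gene, tissues in enriched_tissues.items()
        if tissue in tissues
    )
    return specific, enriched


# Made-up genes and tissues for illustration only.
expr_pattern = {
    "gene-a": {"pharynx"},
    "gene-b": {"pharynx", "intestine"},
}
cluster_enriched = {
    "gene-b": {"pharynx"},
    "gene-c": {"pharynx", "neuron"},
}

specific_genes, enriched_genes = split_gene_lists(
    expr_pattern, cluster_enriched, "pharynx"
)
```

In this sketch a gene can appear in both lists if both evidence types support it; whether to deduplicate across the two lists would be part of the further discussion.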

January 31st, 2019

IWM workshop

Organizers reluctant to give us three workshops
We could ask for two: Micropublication & WormBase
Will be difficult to cover all relevant material
WormBase content
- Would want to cover tools for retrieving data
- Want WormMine, JBrowse, ontologies and ontology tools
- Should consider: Paulo is working on InterMine 2.0 release which will have a different interface; we should think about the timing of that release with respect to the timing of the meeting
We can try to have tutorials/presentations at the booth
- Set up a schedule to cover certain topics when appropriate curators/staff are there
We can ask attendees of the workshop what additional material to cover at the booth later
- Prepare several presentations, and present those that get the most votes

Specifically expressed genes

Proposal to:
- A) Parse gene sets that fall into specific categories and provide several lists
- B) Keep as is but make the language of the statement more vague
- C) Keep as is but remove genes shown to be expressed (to any level) in other tissues by expression cluster data
Could reserve "specifically expressed" for genes shown to only be expressed in that tissue by Expr_pattern data and/or expression cluster data, but excluding genes expressed in any other tissue to any level
Decision: Will change text to "may be predominantly expressed" and add explanation text for users
WormMine currently linking genes to any anatomy term (?) associated with an expression cluster; we need to review this as sometimes genes are connected because they are depleted in a tissue.

