WormBase-Caltech Weekly Calls
Revision as of 17:09, 13 August 2015
- 1 Previous Years
- 2 2015 Meetings
- 2.1 July 9, 2015
- 2.2 July 16, 2015
- 2.3 July 23, 2015
- 2.3.1 Worm model for autism
- 2.3.2 Database Migration
- 2.3.3 WormBase ParaSite
- 2.3.4 Microarray datasets & modSeek
- 2.3.5 WormBook chapter reviewers
- 2.3.6 C. elegans proteome in UniProt
- 2.3.7 Gene Orienteer Data
- 2.3.8 Precanned queries for exclusive expression
- 2.3.9 Embryonic developmental timing
- 2.3.10 Genetic Interaction Ontology (GIO)
- 2.3.11 Phenotype (ontology) display
- 2.3.12 PATO-style EQ (Entity-Quality) phenotype annotations
- 2.4 July 30, 2015
- 2.5 August 6, 2015
- 2.6 August 13, 2015
July 9, 2015
- Certain/uncertain qualifiers not annotated before some date
- ~3,000 ?Expr_pattern objects without that annotation/tag
- Daniela will work on bringing these up to date; hopefully it won't take long
Expression Clusters to Anatomy & Life Stage annotations
- Many large scale datasets with tissue-specific expression data
- Much of what is in SPELL is not annotated to ?Anatomy or ?Life_stage terms
- A goal: make expression data queryable via ?Anatomy terms/pages
- Wen will make the model change proposal
- We may not want to show these annotations explicitly in the widget
- There is a need for a condensed display of expression data (per gene)
- Some datasets, like the EPIC data, explicitly mention each embryonic cell name
- Need for a condensed ontology browser per gene/anatomy and gene/life stage
- The Encyclopedia of Proteomic Dynamics contacted Wen about sharing their data
- Wen will meet/discuss with group soon to determine what the goals are
- It isn't clear what format the data has
- Should include Gary Williams in discussions, as he already processes mass spectrometry data
- To what extent can we take care of the data and display of other lab/publication databases?
- Many authors want to share and make links to their database/website via WormBase
- What is the best way to handle large-scale dataset sharing requests that don't necessarily (for the time being) fit our data model?
- We could use the "External Links" display on WBPaper pages to link out to the external databases affiliated with the paper, possibly including a link to our FTP site with shared data files
- At least a stop gap measure until we can properly model the data
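To illustrate the goal of making expression data queryable via ?Anatomy terms, here is a minimal sketch, assuming expression clusters have already been annotated to anatomy terms. The cluster IDs, gene names, and term labels are hypothetical placeholders, and the plain dictionaries only stand in for whatever model change Wen proposes; the sketch simply inverts cluster annotations into an anatomy-term index.

```python
from collections import defaultdict

# Hypothetical expression clusters already annotated to ?Anatomy terms.
CLUSTERS = {
    "cluster_A": {"genes": {"gene1", "gene2"}, "anatomy": {"neuron"}},
    "cluster_B": {"genes": {"gene2", "gene3"}, "anatomy": {"intestine", "neuron"}},
}


def build_anatomy_index(clusters):
    """Invert cluster annotations: anatomy term -> {cluster ID: genes}."""
    index = defaultdict(dict)
    for cluster_id, data in clusters.items():
        for term in data["anatomy"]:
            index[term][cluster_id] = set(data["genes"])
    return index


def genes_for_anatomy(index, term):
    """Union of genes from every cluster annotated to `term`."""
    genes = set()
    for members in index.get(term, {}).values():
        genes |= members
    return genes


if __name__ == "__main__":
    idx = build_anatomy_index(CLUSTERS)
    print(sorted(genes_for_anatomy(idx, "neuron")))
```

An anatomy term page could then list, per cluster, the genes reported there, which also gives a natural hook for the condensed per-gene display discussed above.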
Cis-regulatory site nomenclature?
- Barbara Meyer's lab published many "rex" (Recruitment Elements on X) sites, numbered sequentially
- Tim Schedl asked for others' opinions on how we might standardize the names of cis-regulatory elements
- Could be like gene names, without dash, e.g. "rex1", "rex2"
- We may want to try a "WBsf-" prefix on all element names, e.g. "WBsf-rex1", although it may only be used in-house
- Were there any conclusions about phenotype lookup from the Allele-Phenotype form?
- Chris spoke with Harald Hutter and others at the meeting about how to improve the lookup for phenotypes
- Would be good to provide an explicit option to see phenotypes of related (or allele-affiliated) genes, perhaps by shared GO-term annotation
- Need to think more on how to best compress display of phenotypes on gene pages as well
- We do already provide links to the Variation and Gene pages (with Phenotypes displayed) in the term information box of the form
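A minimal sketch of the naming convention floated for cis-regulatory elements: gene-style names without a dash (e.g. "rex1"), with the "WBsf-" prefix as an optional in-house form. The function name and the choice to drop leading zeros are assumptions for illustration, not an agreed standard.

```python
import re

# Gene-style element name: letters, optional dash, number (e.g. "rex-1", "rex1").
_ELEMENT_NAME = re.compile(r"^(?P<cls>[a-z]+)-?(?P<num>\d+)$", re.IGNORECASE)


def normalize_element_name(raw, in_house_prefix=False):
    """Return the dash-free, lowercase form ("rex1"), optionally as "WBsf-rex1"."""
    m = _ELEMENT_NAME.match(raw.strip())
    if m is None:
        raise ValueError(f"unrecognized element name: {raw!r}")
    # int() drops leading zeros, so "rex-01" also becomes "rex1" (an assumption).
    name = f"{m.group('cls').lower()}{int(m.group('num'))}"
    return f"WBsf-{name}" if in_house_prefix else name
```

Whether the prefix is stored or only added at display time is left open, matching the discussion.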
July 16, 2015
Anatomy term page expression
- Raymond and Juancarlos are working on a display of genes that may be exclusively expressed in a given anatomy term
- Karen is trying to make the curation of constructs & transgenes easier
- May consider merging the transgene and construct OAs
- Possibly add a construct/transgene request functionality in other OAs
- Would those need multiple input fields?
- Karen would take care of the details
- Exogenous/endogenous tags issue
- Scraping data from external chemical databases versus adding biologically relevant data from papers
- We pull data from, e.g., ChEBI, but not all molecules fall under their purview, e.g. proteins
- Promotion and outreach
- Micropublications discoverable in PubMed?
- Publisher = WormBase? Caltech?
- Minimal standards for publication?
July 23, 2015
Worm model for autism
- The idea: take human variants implicated in autism, look for orthologous genes in C. elegans/nematodes, and find/make equivalent mutations
- Prioritize based on worm phenotypes
- Generally applies to human disease variants
Database Migration
- Thomas Down leaving WormBase in September
- Moving ahead with Datomic
- Good starting use-case for Datomic is querying Datomic-version of GeneACE
- Need to make sure documentation for migration to Datomic is available and comprehensible
- Point-people at each site: Sibyl @ OICR, Juancarlos @ Caltech
- Now need to work out the mechanics of curating into Datomic
WormBase ParaSite
- Reciprocal searches (WB <-> PS) are working well
Microarray datasets & modSeek
- Some earlier datasets were re-processed (log-transformed, or re-annotated into original replicates instead of averaged results)
- Need to try out different methods of processing raw data (WB usually only takes in processed data)
- One pipeline can feed data into SPELL and modSeek
- It's difficult to establish/determine gold standards for assessing process performance
WormBook chapter reviewers
- Send reviewer suggestions to Paul ASAP
C. elegans proteome in UniProt
- Not a complete correspondence between WormBase and UniProt
- Cases: UniProt has an entry for a protein that differs by one or two amino acids from WormBase
- Made from translations of submitted cDNAs, etc.
- Partial data, e.g. partial cDNAs translated
- Anything we can do to achieve greater consistency?
- Protein data sets are important
- Hinxton can use discrepancies as a flag to check on the gene/protein models
- Would be good to have more reciprocal linkage between UniProt and WormBase
- For AVR-15, UniProt has two additional entries compared to WormBase, differing in only 1 or 2 amino acids
- Should we pick up the differing entries from UniProt and store/display them? If so, how do we reconcile them?
- Possible use case: enter a UniProt ID into the BLAST/BLAT tool to identify WormBase matches
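One way to use UniProt/WormBase discrepancies as a flag is to look for near-identical sequences. This is a minimal sketch, assuming both sequences are already paired up by gene and are the same length; the function names, toy sequences, and two-substitution threshold are hypothetical, and sequences that differ in length would need a real alignment (e.g. BLAST) instead.

```python
def substitution_positions(wb_seq, uniprot_seq):
    """1-based positions where two equal-length protein sequences differ.

    Returns None when the lengths differ, since indels need a real alignment.
    """
    if len(wb_seq) != len(uniprot_seq):
        return None
    return [i + 1 for i, (a, b) in enumerate(zip(wb_seq, uniprot_seq)) if a != b]


def flag_near_match(wb_seq, uniprot_seq, max_subs=2):
    """True if the sequences differ by 1..max_subs substitutions (worth a model check)."""
    diffs = substitution_positions(wb_seq, uniprot_seq)
    return diffs is not None and 0 < len(diffs) <= max_subs
```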
Gene Orienteer Data
- Sibyl and Xiaodong looking at data and scripts from Gene Orienteer
Precanned queries for exclusive expression
- Raymond & Juancarlos working on final details
- Intent is to display genes that may be specifically/exclusively expressed in e.g. an anatomy term
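The "exclusive expression" idea can be sketched as a subtree test: a gene is reported for an anatomy term when every one of its expression annotations falls on that term or one of its descendants. This is a hedged illustration, not the actual precanned query; the toy ontology, gene names, and the `children` mapping are hypothetical, and "exclusive" here only means "no annotations elsewhere" (absence of evidence).

```python
def descendants(term, children):
    """The term plus all of its descendants, given a term -> child terms mapping."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(children.get(t, ()))
    return seen


def exclusively_expressed(term, gene_to_terms, children):
    """Genes whose every expression annotation lies within `term`'s subtree."""
    subtree = descendants(term, children)
    return sorted(g for g, terms in gene_to_terms.items() if terms and terms <= subtree)


if __name__ == "__main__":
    toy_children = {"pharynx": ["pharyngeal muscle", "pharyngeal neuron"]}
    toy_expression = {
        "geneA": {"pharyngeal muscle"},
        "geneB": {"pharynx", "intestine"},
        "geneC": {"pharynx"},
    }
    print(exclusively_expressed("pharynx", toy_expression, toy_children))
```

Genes with no expression annotations at all are skipped rather than reported as exclusive.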
Embryonic developmental timing
- Sulston, Murray timing data sets for wild type embryonic cell division timing
- Mutant data sets are coming in as well
Genetic Interaction Ontology (GIO)
- Latest version of the GIO complete
- Juancarlos and Chris built a "genetic interaction calculator" to determine interaction types from quantitative phenotype inequalities
- Sending out to other MODs, etc.
- Seems that although there is buy-in conceptually, most curators can't afford the time for such detailed curation
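The idea of deriving an interaction type from quantitative phenotypes can be illustrated by comparing the double mutant against an expected value. This sketch assumes a multiplicative null model and three illustrative labels; it is not the calculator Juancarlos and Chris built, and the labels are not GIO terms.

```python
def classify_interaction(wt, a, b, ab, tol=0.05):
    """Call an interaction type by comparing the double mutant with a
    multiplicative expectation.

    Assumes one quantitative phenotype measured for wild type (wt), the single
    mutants (a, b) and the double mutant (ab), with wt != 0 and lower values
    meaning a stronger defect (e.g. relative brood size).
    """
    fa, fb, fab = a / wt, b / wt, ab / wt  # values relative to wild type
    expected = fa * fb                      # multiplicative null model
    if fab < expected - tol:
        return "synergistic"   # more severe than expected
    if fab > expected + tol:
        return "alleviating"   # less severe than expected
    return "no interaction"
```

Finer distinctions, such as epistasis or suppression, would add further inequalities, for example comparing `ab` directly with `a` and `b`.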
Phenotype (ontology) display
- Problems with display of phenotypes (and other annotations) on WormBase, as pointed out by several people at the IWM
- Karen would like to start creating allele concise descriptions
- We need compact, intelligently ordered annotation lists, not just alphabetical lists of ontology annotations
- It would be good to show ancestor terms to convey relatedness and ordering
- Chris working on Python script to display all annotations in the context of the entire ontology
- We will need to see if this approach is feasible/beneficial
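As a rough illustration of listing annotations in ontology order rather than alphabetically, the sketch below walks the ontology depth-first from the root and keeps only branches that lead to an annotated term, so each annotation appears with its ancestors. It is not Chris's script; the toy terms and the `children` mapping are assumptions, and in a DAG a term with several parents is simply listed under each parent.

```python
def ordered_annotations(root, children, annotated):
    """Depth-first (depth, term, is_annotated) rows, keeping only branches
    that lead to at least one annotated term."""
    memo = {}

    def relevant(term):
        # True if `term` or any of its descendants is annotated (memoized).
        if term not in memo:
            memo[term] = term in annotated or any(
                relevant(c) for c in children.get(term, ()))
        return memo[term]

    rows = []

    def walk(term, depth):
        if not relevant(term):
            return
        rows.append((depth, term, term in annotated))
        for child in sorted(children.get(term, ())):
            walk(child, depth + 1)

    walk(root, 0)
    return rows


if __name__ == "__main__":
    toy_children = {
        "phenotype": ["behavior", "development"],
        "behavior": ["locomotion", "egg laying"],
        "development": ["body size"],
        "locomotion": ["uncoordinated"],
    }
    for depth, term, hit in ordered_annotations(
            "phenotype", toy_children, {"uncoordinated", "body size"}):
        print("  " * depth + ("* " if hit else "  ") + term)
```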
PATO-style EQ (Entity-Quality) phenotype annotations
- It is clear that some phenotype annotations require details, e.g. "drug sensitivity" annotations should have the drug involved
- This drug/molecule annotation should be present in the details if not directly in the term itself
- Raises the issue of a number of cases where we need PATO-style EQ annotations, not just explicit phenotype terms for all possible scenarios
- This would be helpful in annotating embryonic timing and identity phenotype datasets
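A minimal sketch of what a PATO-style EQ record could carry, including the optional molecule needed for drug-sensitivity annotations. The class and field names are hypothetical, and plain-text labels stand in for real ontology IDs (e.g. PATO qualities).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EQAnnotation:
    """Hypothetical Entity-Quality phenotype record (sketch)."""
    gene: str
    entity: str                     # what is affected, e.g. a process or anatomy term
    quality: str                    # how it is affected, e.g. a PATO-style quality
    molecule: Optional[str] = None  # condition, e.g. the drug in a drug-sensitivity call

    def label(self) -> str:
        text = f"{self.gene} | E={self.entity} | Q={self.quality}"
        return f"{text} | with={self.molecule}" if self.molecule else text
```

A timing phenotype would then need no dedicated term: the entity names the embryonic event and the quality describes how its timing changed.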
July 30, 2015
Wen Chen helped Wen Chen
- Wen Chen (lab) has list of genes to analyze
- Wen Chen (WB) helped process the list
- Would be good to have a simple CGI to process a list of genes in a variety of ways
- For GeneTissueLifeStage and GeneConciseDescription, more datatypes can easily be slotted in if a curator makes a file
- Is this redundant with WormMine?
- Not for data that doesn't exist (in WormMine) yet; more agile: could be up and running within a matter of days
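A sketch of the simple gene-list tool described above, assuming each datatype (e.g. GeneTissueLifeStage) is a curator-made, tab-separated file with a gene ID and a value per line. The file format, function names, and gene IDs are hypothetical; a CGI or other web front end would just call these functions.

```python
import csv
import io


def load_datatype(tsv_text):
    """Parse a curator-made two-column TSV (gene ID <tab> value) into a dict."""
    table = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        # Skip blank/short lines and "#" comment lines.
        if len(row) >= 2 and not row[0].startswith("#"):
            table.setdefault(row[0], []).append(row[1])
    return table


def annotate_gene_list(genes, datatypes):
    """Return one row per input gene with values from each loaded datatype."""
    rows = []
    for gene in genes:
        row = {"gene": gene}
        for name, table in datatypes.items():
            row[name] = "; ".join(table.get(gene, [])) or "n/a"
        rows.append(row)
    return rows
```

Adding a datatype then only means loading one more curator file into the `datatypes` dictionary.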
Interconnections between WormBase and FlyBase
- We could create more inter-connectivity between the two databases
- Sharing concise descriptions of genes
- Would be good for FlyBase and WB curators (Xiaodong?) to talk about where the links should exist at each site
August 6, 2015
- Prioritize adding new data types to WormMine
- RNAi phenotype, interactions, human disease...
- WormMine wiki page: http://wiki.wormbase.org/index.php/WormMine
- Wen wants to use the machine when WormMart retires
UniProt/WormBase gene class
- Need to talk to the UniProt C. elegans curator
Raymond, Chris and Juancarlos are working on a phenotype viewer
James: given a list of genes, in which tissues are they enriched?
- Python code
- Biotype ontology and tissue expression from Postgres as input
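The question of which tissues a gene list is enriched in is commonly answered with a one-sided hypergeometric test. This sketch assumes tissue annotations are already loaded as gene sets (e.g. from the Postgres tissue-expression data) and uses only the standard library; the names and toy data are hypothetical, and it is not James's code.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated, n drawn)."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def tissue_enrichment(gene_list, tissue_to_genes, background):
    """Over-representation p-value per tissue, sorted from smallest."""
    query = set(gene_list) & background
    N, n = len(background), len(query)
    results = {}
    for tissue, genes in tissue_to_genes.items():
        annotated = set(genes) & background
        k = len(query & annotated)
        results[tissue] = hypergeom_sf(k, N, len(annotated), n)
    return dict(sorted(results.items(), key=lambda kv: kv[1]))
```

With many tissues tested, the p-values should be corrected for multiple testing (e.g. Benjamini-Hochberg) before interpretation.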
August 13, 2015
Phenotype term annotation summary graph
Goal: Provide an ontology-relationship-aware summary view of a gene's phenotype annotations.
Prototype links:
- aex-3 (fewer phenotypes): existing phenotype widget <http://www.wormbase.org/species/c_elegans/gene/WBGene00000086#-b-3>, summary graph <http://188.8.131.52/~azurebrd/cgi-bin/amigo.cgi?action=annotSummaryGraph&focusTermId=WBGene00000086>
- daf-2 (lots of phenotypes): phenotype widget <http://www.wormbase.org/species/c_elegans/gene/WBGene00000898#-b-3>, summary graph <http://184.108.40.206/~azurebrd/cgi-bin/amigo.cgi?action=annotSummaryGraph&focusTermId=WBGene00000898>
Proposed development procedure:
- standalone prototyping, commenting and improvements within the group.
- implementation as a widget on dev site (juancarlos.wormbase.org), more testing and soliciting comments from selected end users.
- committing to main site for general use.
Outline of graph processing:
To gather information:
- WOBr query to collect all phenotypes annotated to the gene of interest.
- WOBr query to collect all transitive relationships of the phenotypes from the previous step towards the ontology root.
To simplify and to control graph size:
- Remove all nodes (phenotype terms) that are neither directly annotated nor at branching points where two branches of annotations merge (i.e., lowest common ancestors, LCAs).
- Scale node size according to annotation count (includes inferred annotations).
- Limit appearance of label to nodes above a given size (roughly big enough to hold term name).
- Show annotation counts in a mouse-over bubble, and add a hyperlink from each node to its term page.
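The graph processing steps above can be sketched as follows, assuming the WOBr queries have already been turned into a `parents` mapping and a per-term direct annotation count. The function names, toy phenotype terms, and `label_min` threshold are hypothetical; the branch-point test (at least two child branches inside the annotation subgraph) approximates the LCA idea, and mouse-over bubbles and hyperlinks are left to the rendering layer.

```python
from collections import defaultdict


def ancestors(term, parents):
    """All transitive ancestors of `term`, walking parent links to the root."""
    seen, stack = set(), list(parents.get(term, ()))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, ()))
    return seen


def summary_graph(direct_counts, parents, label_min=3):
    """Prune a gene's phenotype subgraph to annotated terms and branch points.

    direct_counts: phenotype term -> number of direct annotations (hypothetical input)
    parents: term -> parent terms (stand-in for the WOBr closure query)
    Returns (nodes, edges): nodes maps term -> {"size", "label"};
    edges link each kept term to its nearest kept ancestor(s).
    """
    # Gather: annotated terms plus every ancestor up to the root.
    nodes = set(direct_counts)
    for term in direct_counts:
        nodes |= ancestors(term, parents)

    # Inferred count: a term's own annotations plus those of all descendants.
    total = defaultdict(int)
    for term, n in direct_counts.items():
        for t in {term} | ancestors(term, parents):
            total[t] += n

    # Children inside the subgraph; two or more means annotation branches merge here.
    children = defaultdict(set)
    for t in nodes:
        for p in parents.get(t, ()):
            if p in nodes:
                children[p].add(t)
    keep = {t for t in nodes if t in direct_counts or len(children[t]) >= 2}

    # Reconnect each kept term to kept ancestors with no kept term in between.
    edges = set()
    for t in keep:
        up = ancestors(t, parents) & keep
        for a in up:
            if not any(a in ancestors(b, parents) for b in up if b != a):
                edges.add((t, a))

    # Size follows the inferred count; only big-enough nodes get a label.
    out = {t: {"size": total[t], "label": total[t] >= label_min} for t in keep}
    return out, edges
```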
International Biocuration Conference
Proposal to submit a paper on Community Curation
- Mary Ann happy to lead.
- Daniela on board.