WormBase-Caltech Weekly Calls September 2014
From WormBaseWiki
September 4, 2014
Mary Ann needs to leave at 5:30 pm (9:30 am CA time)
Concise Descriptions
- Automated descriptions to go in for WS245
New Upload Schedule
- Delayed a couple weeks compared to original schedule
- Official citace upload to Hinxton on October 10th
- We can/should upload our data Wednesday before SAB trip (October 1st) to Hinxton
- Wen needs queries to include in the Citace upload summary by October 1st
- Upload contingent on model freeze
Data submission as part of publication process
- eLife considering micro-publication, addendums to papers (individual add-on experimental results)
- Can certain data (e.g., sequence information) be required for publication?
- Could there be a pilot with a specific publisher (like GSA markup)?
Elixir
- http://www.elixir-europe.org/
- Use RDF (Resource Description Framework) triples
- Checking individual statements/sentences from literature for data presence/absence in database
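The idea of checking literature statements against the database can be sketched with RDF-style triples. This is a minimal illustration only, assuming statements have already been extracted as (subject, predicate, object) triples; the `ex:` identifiers and predicates are hypothetical placeholders, not real WormBase or Elixir vocabulary, and a plain Python set stands in for a real triple store.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """An RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical database contents expressed as triples (placeholder identifiers).
database = {
    Triple("ex:geneA", "ex:expressed_in", "ex:pharynx"),
    Triple("ex:geneA", "ex:interacts_with", "ex:geneB"),
}


def check_statements(statements, store):
    """Pair each extracted statement with whether it is already in the store."""
    return [(t, t in store) for t in statements]


# Hypothetical statements extracted from a paper.
from_paper = [
    Triple("ex:geneA", "ex:expressed_in", "ex:pharynx"),
    Triple("ex:geneB", "ex:expressed_in", "ex:intestine"),
]

for triple, present in check_statements(from_paper, database):
    status = "present" if present else "absent"
    print(f"{triple.subject} {triple.predicate} {triple.obj}: {status}")
```

Because each statement is an atomic triple, presence or absence reduces to an exact-match lookup; a real system would also need identifier mapping and some tolerance for paraphrased statements.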
New tags in Qualifier Hash
- Life_stage and Anatomy_term
- Adding to enable annotation of EPIC data
- Couples (or attempts to couple) time and space (life stage and anatomy) annotation of expression patterns
- Can ambiguities be captured?
- This approach (a bit of a kludge) introduces some denormalization (normalization can be automated later)
LEGO Curation
- Setting up connection to Minerva
- Juancarlos working with Seth, Chris, Heiko to debug setup
- Would be good (necessary?) to establish a working protocol for collaboration
- Raymond's LEGO-like approach to curating anatomy function
- Annotate a phenotype by annotating relevant DB objects, e.g. anatomy term, GO term, etc. as well as context/condition
- Use minimal relationships (relationship ontologies are complicated and difficult to use)
September 11, 2014
SAB Meeting
- Can we start putting together a more detailed agenda, at least for Caltech?
- Would be good to decide on our talk topics so we can begin putting our presentation(s) together.
- Curation Stats numbers spreadsheet
- Good to capture amount of time (FTEs) on curation, but also software development, curation tools, pipelines, data modeling, help desk, fixing old data
- Would be good to have a rough breakdown of each curator's FTE allocation
- Allocation of resources
- Ontology development; how much time is spent? Is it worth it?
- What tools do we have, or could we develop, that could substantially improve efficiency/effectiveness of curation? Example: sequence generation tool
- What are considerations for future database migration? We should account for migration delays to curation, etc.
- The curation database (currently Postgres) may or may not be the same database that drives the website
- Are our curation pipelines capturing sufficient detail (or too much, unnecessary detail)?
- Is it worth capturing negative data?
SAB Talk Proposals
- Nomenclature (Mary Ann) - not stats, but what we do, how it's done, communication, etc.
- Sequence Features (Mary Ann) - developments
- Physical Interaction Curation - a relatively new data type for us, discuss existing data, strategies for going forward, what groups we could/do collaborate with, what files we could provide
- Community-Assisted Curation - what we currently do (author first pass, data submission forms), what more we could do (CANTO)
- Topic-Based Curation
September 18, 2014
Epic Data
- Daniela will test once Paul tags models (tomorrow, Friday, September 19, 2014) and then upload to citace minus
- Wen needs a couple days from model tagging to prepare the final upload
SAB
- Chris (Caltech curation overview)
- Pipeline/Mission of Caltech (pull biological data from papers and put into database)
- Curation data types
- Who is who, who does what? (Photos of curators?)
- Curation stats, what is up to date, what needs more effort?
- Rate-limiting steps, tools (OA, curation status form, etc.) (slide from Daniela)
- Topic curation, pathways curation
- Brief statement on curation of other nematode species
- Wen (Expression Clusters)
- Couple slides on expression curation from Daniela
- What are expression clusters? Come from microarray, proteomics, tiling array
- Triage, pattern matching
- SPELL tool: what is it? usage?
- Display of expression cluster data
- WormBase generated expression clusters (custom algorithm?)
- Enrichment of GO terms and anatomy terms (segue into WOBr talk)
- Raymond (WormBase ontology browser)
- We use and develop a number of ontologies
- The ontological structure allows hierarchical browsing and reasoning
- Co-opting AmiGO 2 (existing tool)
- OWL formats
- Future developments
- Mary Ann (Nomenclature & Sequence features)
- Sequence features
- Sequence feature data display
- JBrowse/GBrowse integration
- Nomenclature
- CRISPR alleles
- Sequence features
- Ranjana (Human Disease)
- Update
- Two classes of disease models: genes (via variations) and transgenes (overexpression, deleterious repeats, etc.)
- ~260 gene models for human disease
- Drugs & therapeutic compounds, metal toxicity (toxicology in general)
- Disease portal? Toxicology portal?
- Toxins in Textpresso?
- Kimberly (GO curation)
- Enrichment analysis
- Priorities for GO
- Annotation extensions & LEGO curation
September 25, 2014
SAB Preparations
- Review of overview presentation
- What data are we not curating from the newest elegans/nematode papers?
- The number of papers per data type per year is roughly flat, but the total number of papers is increasing ~10% each year
- How many minutes should each CIT curator get to present?
- Karen will present Process/Topic slides (take over from Chris)
- Chris - Curation Overview, Ranjana - Disease, Raymond - WOBr, Wen - Expression datasets, Kimberly - GO, Mary Ann - Nomenclature, Sequence
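To put the ~10% yearly growth in total papers in perspective, a small compound-growth sketch may help. It assumes the rate stays constant at 10%; the starting count of 1000 papers per year in the usage note is a hypothetical figure for illustration, not a WormBase statistic.

```python
import math

GROWTH = 0.10  # ~10% more papers each year (assumed constant)


def projected_papers(current: float, years: int, growth: float = GROWTH) -> float:
    """Compound growth: N_t = N_0 * (1 + g) ** t."""
    return current * (1 + growth) ** years


def doubling_time(growth: float = GROWTH) -> float:
    """Years until the yearly total doubles: ln 2 / ln(1 + g)."""
    return math.log(2) / math.log(1 + growth)


print(f"Doubling time: {doubling_time():.1f} years")
print(f"Hypothetical 1000 papers/year after 5 years: {projected_papers(1000, 5):.0f}")
```

At a steady 10% the yearly total roughly doubles about every 7.3 years, so any per-data-type curation rate that stays flat covers a shrinking share of the literature.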
New Backups
- Raymond set up backups through Duplicity
- Tazendra, Mangolassi, Athena, Canopus, other Linux machines backed up
- Backups are retained for two years; any data removed from a machine more than two years ago will be lost
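The two-year limit can be reasoned through with a short sketch. A deleted file survives only in backups taken before its deletion, and backups older than the retention window are pruned, so the file is recoverable only if it was deleted within the window. This assumes backups run frequently (e.g., daily) and approximates two years as 730 days; the function name is hypothetical. Duplicity removes whole backup chains rather than individual snapshots, so actual retention may run somewhat longer depending on the full-backup schedule.

```python
from datetime import date, timedelta

# Assumption: a two-year retention window, approximated as 730 days.
RETENTION = timedelta(days=730)


def is_recoverable(deleted_on: date, today: date, retention: timedelta = RETENTION) -> bool:
    """Return True if a file deleted on `deleted_on` still appears in some retained backup.

    The file exists only in backups taken before its deletion. Backups older than
    `today - retention` are pruned, so a retained backup containing the file exists
    only if the deletion happened after that cutoff (assuming frequent backups).
    """
    oldest_kept_backup = today - retention
    return deleted_on > oldest_kept_backup


today = date(2014, 9, 25)
print(is_recoverable(date(2013, 6, 1), today))   # deleted ~16 months ago
print(is_recoverable(date(2012, 1, 15), today))  # deleted ~32 months ago
```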
Sequence Feature OA
- Currently read only, makes data available to curators
- Populated OA on sandbox (Mangolassi); skeleton form on Tazendra