WormBase-Caltech Weekly Calls September 2014

September 4, 2014

I need to leave at 5.30pm (ermm, 9.30am CA time?) Mary Ann

Concise Descriptions

  • Automated descriptions to go in for WS245

New Upload Schedule

  • Delayed a couple weeks compared to original schedule
  • Official citace upload to Hinxton on October 10th
  • We can/should upload our data Wednesday before SAB trip (October 1st) to Hinxton
  • Wen needs queries to include in Citace Upload summary by October 1st
  • Upload contingent on models freeze

Data submission as part of publication process

  • eLife considering micro-publication, addendums to papers (individual add-on experimental results)
  • Can certain data be required to publish? Sequence info, etc. ?
  • Could there be a pilot with a specific publisher (like GSA markup)?

Elixir

  • http://www.elixir-europe.org/
  • Use RDF (Resource Description Framework) triples
  • Checking individual statements/sentences from literature for data presence/absence in database
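As a rough illustration of the triple-based check noted above, a statement taken from a paper can be modeled as a (subject, predicate, object) triple and looked up among the triples already in the database. This is only a sketch: the gene, predicate, and term names are hypothetical placeholders rather than real WormBase or Elixir identifiers, and a plain Python set stands in for an actual triple store.

```python
from typing import List, NamedTuple, Set


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical database contents (placeholder identifiers, not real data).
database: Set[Triple] = {
    Triple("gene:example-1", "expressed_in", "anatomy:pharynx"),
    Triple("gene:example-1", "involved_in", "process:locomotion"),
}


def is_in_database(statement: Triple, db: Set[Triple]) -> bool:
    """Return True if the exact statement is already present in the database."""
    return statement in db


# Hypothetical statements extracted from the literature.
literature_statements: List[Triple] = [
    Triple("gene:example-1", "expressed_in", "anatomy:pharynx"),
    Triple("gene:example-2", "expressed_in", "anatomy:intestine"),
]

# Statements found in papers but absent from the database.
missing = [s for s in literature_statements if not is_in_database(s, database)]
```

Exact-match lookup is the simplest possible comparison; a real check would presumably also need to handle synonyms and ontology parent/child terms.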

New tags in Qualifier Hash

  • Life_stage and Anatomy_term
  • Adding to enable annotation of EPIC data
  • Couples (or attempts to) time-and-space (life_stage-and-anatomy) annotation of expression pattern
  • Can ambiguities be captured?
  • This approach (bit of a kludge) introduces some denormalization (normalization can be automated later)
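One way to picture the denormalization mentioned above, assuming each (life stage, anatomy term) pair ends up recorded twice, once under the anatomy term and once under the life stage: the sketch below builds both views from a list of pairs and then recovers the unique pairs again, which is the kind of normalization that could be automated later. The function names and the identifiers used are hypothetical and do not reflect the actual model.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]  # (life_stage, anatomy_term)


def build_views(pairs: Iterable[Pair]) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Store every pair twice: keyed by anatomy term and keyed by life stage."""
    by_anatomy: Dict[str, List[str]] = defaultdict(list)
    by_stage: Dict[str, List[str]] = defaultdict(list)
    for life_stage, anatomy_term in pairs:
        by_anatomy[anatomy_term].append(life_stage)
        by_stage[life_stage].append(anatomy_term)
    return dict(by_anatomy), dict(by_stage)


def normalize(by_anatomy: Dict[str, List[str]], by_stage: Dict[str, List[str]]) -> List[Pair]:
    """Collapse the two redundant views back into one sorted list of unique pairs."""
    pairs = {(stage, term) for term, stages in by_anatomy.items() for stage in stages}
    pairs |= {(stage, term) for stage, terms in by_stage.items() for term in terms}
    return sorted(pairs)


# Placeholder identifiers for illustration only.
example_pairs = [("stage-1", "anatomy-1"), ("stage-2", "anatomy-1"), ("stage-2", "anatomy-2")]
anatomy_view, stage_view = build_views(example_pairs)
unique_pairs = normalize(anatomy_view, stage_view)
```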

LEGO Curation

  • Setting up connection to Minerva
  • Juancarlos working with Seth, Chris, Heiko to debug setup
  • Would be good (necessary?) to establish a working protocol for collaboration
  • Raymond's LEGO-like approach to curating anatomy function
    • Annotate a phenotype by annotating relevant DB objects, e.g. anatomy term, GO term, etc. as well as context/condition
    • Use minimal relationships (relationship ontologies complicated and difficult to use)


September 11, 2014

SAB Meeting

  • Can we start putting together a more detailed agenda, at least for Caltech?
  • Would be good to decide on our talk topics so we can begin putting our presentation(s) together.
  • Curation Stats numbers spreadsheet
    • Good to capture amount of time (FTEs) on curation, but also software development, curation tools, pipelines, data modeling, help desk, fixing old data
  • Would be good to have a rough breakdown of every curator's FTE allocation
  • Allocation of resources
  • Ontology development; how much time is spent? Is it worth it?
  • What tools do we have, or could we develop, that could substantially improve efficiency/effectiveness of curation? Example: sequence generation tool
  • What are considerations for future database migration? We should account for migration delays to curation, etc.
  • The curation database (like Postgres now) may or may not be the same database that drives the website
  • Are our curation pipelines capturing sufficient detail (or too much, unnecessary detail)?
  • Is it worth capturing negative data?

SAB Talk Proposals

  • Nomenclature - not stats, but what we do, how it's done, communication, etc. Mary Ann
  • Sequence Feature - developments Mary Ann
  • Physical Interaction Curation - a relatively new data type for us, discuss existing data, strategies for going forward, what groups we could/do collaborate with, what files we could provide
  • Community-Assisted Curation - what we currently do (author first pass, data submission forms), what more we could do (CANTO)
  • Topic-Based Curation


September 18, 2014

Epic Data

  • Daniela will test once Paul tags models (tomorrow, Friday Sept 19th, 2014) and then upload to citace minus
  • Wen needs a couple days from model tagging to prepare the final upload


SAB

  • Chris (Caltech curation overview)
    • Pipeline/Mission of Caltech (pull biological data from papers and put into database)
    • Curation data types
    • Who is who, who does what? (Photos of curators?)
    • Curation stats, what is up to date, what needs more effort?
    • Rate-limiting steps, tools (OA, curation status form, etc.) (slide from Daniela)
    • Topic curation, pathways curation
    • Brief statement on curation of other nematode species
  • Wen (Expression Clusters)
    • Couple slides on expression curation from Daniela
    • What are expression clusters? Come from microarray, proteomics, tiling array
    • Triage, pattern matching
    • SPELL tool: what is it? usage?
    • Display of expression cluster data
    • WormBase generated expression clusters (custom algorithm?)
    • Enrichment of GO terms and anatomy terms (segue into WOBr talk)
  • Raymond (WormBase ontology browser)
    • We use and develop a number of ontologies
    • The ontological structure allows hierarchical browsing and reasoning
    • Co-opting Amigo2 (existing tool)
    • OWL formats
    • Future developments
  • Mary Ann (Nomenclature & Sequence features)
    • Sequence features
      • Sequence feature data display
      • JBrowse/GBrowse integration
    • Nomenclature
      • CRISPR alleles
  • Ranjana (Human Disease)
    • Update
    • Two classes of disease models: Genes (via variations) & transgenes (overexpression, deleterious repeats, etc.)
    • ~260 gene models for human disease
    • Drugs & therapeutic compounds, metal toxicity (toxicology in general)
    • Disease portal? Toxicology portal?
    • Toxins in Textpresso?
  • Kimberly (GO curation)
    • Enrichment analysis
    • Priorities for GO
    • Annotation extensions & LEGO curation


September 25, 2014

SAB Preparations

  • Review of overview presentation
  • What data are we not curating from the newest elegans/nematode papers?
    • Papers per year per datatype is roughly flat, but the total number of papers is going up ~10% each year
  • How many minutes should each CIT curator get to present?
  • Karen will present Process/Topic slides (take over from Chris)
  • Chris - Curation Overview, Ranjana - Disease, Raymond - WOBr, Wen - Expression datasets, Kimberly - GO, Mary Ann - Nomenclature, Sequence


New Backups

  • Raymond set up backups through Duplicity
  • Tazendra, Mangolassi, Athena, Canopus, other Linux machines backed up
  • Backup is limited to two years; any data removed from a machine more than two years ago will be lost
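The two-year limit can be pictured with a small check like the one below. It only models the retention window stated above and is not the actual Duplicity configuration, which is not recorded in these notes; the function name and the 730-day approximation of two years are assumptions.

```python
from datetime import date, timedelta

# Assumption: "two years" approximated as 730 days for illustration.
RETENTION = timedelta(days=2 * 365)


def is_recoverable(removed_on: date, today: date) -> bool:
    """Return True if data removed on `removed_on` still falls inside the backup window."""
    return today - removed_on <= RETENTION
```

For example, a file deleted in January 2014 would still be inside the window on the date of this call, while one deleted in January 2012 would not.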


Sequence Feature OA

  • Currently read only, makes data available to curators
  • Populated OA on sandbox (Mangolassi); skeleton form on Tazendra