Hinxton 2015.08- Meeting minutes

From WormBaseWiki

August 2015

August 20, 2015

In attendance: KH, TD, GW, MP, JL

Build 250

  • Problems
    • A documentation error meant that a crucial step at the start of the build was missed.
    • As a result, all non-elegans species were built using old data
    • Needed to re-start all T2 builds from scratch
    • Re-run of BLASTs, InterProScan, Compara
  • Timing
    • Build will probably be one week late.
    • MP will be on vacation at end of build, GW to finish build

Database migration

  • Curation tools (Colonnade, editor)
    • Numerous fixes/changes based on feedback from Mary Ann
    • Added KeySet functionality
  • Importer/Exporter
    • Cannot re-implement AceDB's partial case-insensitivity in Datomic.
      • WBGene00000001 != WBgene00000001 in new system
    • GW to run the importer for WS250
    • Discussed cut-and-paste-into-shell method of running the importer
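
The case-sensitivity point above can be illustrated with a minimal sketch. It is not the importer's actual behaviour: the function name `canonical_gene_id` and the assumed ID shape ("WBGene" followed by eight digits) are hypothetical, chosen only to show one way mixed-case IDs could be normalised before they reach a case-sensitive store.

```python
import re

# Assumption: gene IDs look like "WBGene" followed by exactly eight digits.
_GENE_ID = re.compile(r"^wbgene(\d{8})$", re.IGNORECASE)


def canonical_gene_id(raw):
    """Return the canonical spelling of a gene ID, or None if it is not one.

    Hypothetical helper: matches case-insensitively, then rewrites the
    prefix so that every variant maps to the same string.
    """
    match = _GENE_ID.match(raw.strip())
    if match is None:
        return None
    return "WBGene" + match.group(1)


# In a case-sensitive system the two spellings are different keys...
print("WBGene00000001" == "WBgene00000001")
# ...unless they are normalised first.
print(canonical_gene_id("WBgene00000001") == "WBGene00000001")
```

Normalising on the way in keeps lookups exact-match, at the cost of requiring every entry point to apply the same rule.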

Curation

  • Weird genes
    • Handful of genes in C. elegans that have contrived structures to accommodate a frameshift / in-frame stop
    • In some cases, the frameshift/stop is conserved in other Caenorhabditis.
    • GW doing further work to investigate what this might mean
  • Mass spec data from Lamond lab
    • Aligning mass-spec peptides to genome, will appear as a genome browser track
  • Further discussed capturing/registering when a curator has reviewed a gene model but not made a change
    • Should boil this down to a proposal and circulate with whole group
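
As a rough illustration of what an "in-frame stop" means for the weird genes above, the sketch below scans a coding sequence codon by codon. It assumes the standard stop codons (TAA, TAG, TGA) and a CDS given as a plain DNA string starting in frame 0. The function name `in_frame_stops` is hypothetical and this is not the curation pipeline's code.

```python
# Assumption: standard genetic code stop codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_frame_stops(cds):
    """Return 0-based nucleotide offsets of premature in-frame stop codons.

    The final complete codon is skipped, since a stop there is the
    expected terminator. Trailing bases that do not fill a codon are ignored.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3   # length covered by complete codons
    last_codon_start = usable - 3      # offset of the terminal codon
    return [
        i
        for i in range(0, last_codon_start, 3)
        if seq[i:i + 3] in STOP_CODONS
    ]


# A clean CDS has no premature stops; "TGA" at offset 1 is out of frame.
print(in_frame_stops("ATGAAATAG"))
# A stop in the second codon is reported by its nucleotide offset.
print(in_frame_stops("ATGTAAAAATAG"))
```

A check like this only flags candidates. Whether a flagged stop is an artefact or a conserved feature still needs the comparative analysis mentioned above.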



August 14, 2015

In attendance: KH, TD, PD, GW, BB, MP

Database migration

  • Security discussion
    • Script will authenticate to the application using self-signed SSL certificates
    • What level of granularity for the certificates? Per center? Per user? Per update type (e.g. a separate cert for each build step)
  • Transaction log
    • Web app now allows unrolling of recent transactions
    • Functionality added to detect when unrolling a transaction (T) affects transactions made subsequently (e.g. T1 makes object, T2 refers to object, user attempts to undo T1).
  • Environment
    • GW previously had trouble getting Datomic running on an EBI machine on NFS-mounted storage.
    • The free storage engine with Datomic assumes/requires low-latency (ideally directly attached) storage
    • GW to try again on acedb.ebi.ac.uk using /nfs/acedb/vol1 (purportedly low-latency storage)
  • 2-way XREFs with different #models attached at either end
    • e.g. Gene -> Variation (#Evidence), Variation -> Gene (#Molecular_change)
    • New db puts #model annotation on the edges of the graph, so does not currently support this.
    • Discussed possible workarounds
  • Loading Ace into Datomic
    • Some testing done by GW, works well
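
The transaction-log check described above (T1 makes an object, T2 refers to it, a user tries to undo T1) can be sketched with plain Python data structures. This is not the Datomic API or the web app's implementation: the log format (a list of dicts with `id`, `creates` and `refers` keys) and the function name `blocking_transactions` are assumptions made for illustration.

```python
def blocking_transactions(log, target_id):
    """Return IDs of later transactions that refer to objects the target created.

    `log` is a list of dicts in commit order, each with:
      "id"      - transaction identifier
      "creates" - set of object names the transaction created
      "refers"  - set of object names the transaction refers to
    A non-empty result means undoing the target would leave dangling references.
    """
    for index, tx in enumerate(log):
        if tx["id"] == target_id:
            break
    else:
        raise KeyError(f"unknown transaction: {target_id}")

    created = log[index]["creates"]
    return [
        later["id"]
        for later in log[index + 1:]
        if later["refers"] & created
    ]


log = [
    {"id": "T1", "creates": {"obj"}, "refers": set()},
    {"id": "T2", "creates": set(), "refers": {"obj"}},
    {"id": "T3", "creates": {"other"}, "refers": set()},
]
print(blocking_transactions(log, "T1"))  # T2 depends on the object made by T1
```

The sketch only looks at direct references to created objects. A fuller check would also need to consider later edits or deletions of the same objects.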

Build 250

  • Status
    • BLAST - all 9 species run, dumped and loaded
    • Compara - finished and loaded
    • RNASeq - in progress
  • General problems/fire-fighting
    • Transcript builder (mysteriously did not complete for two genomes)
    • InterProScan (found a bug in the Ensembl code, fixed and provided patch)
    • BLAST dumping (exceeded MySQL max connections. Needed to tweak server conf)

ParaSite

  • Redesign of homepage
    • Beta version demonstrated. Will circulate some other possible layouts
  • New blog for ParaSite
    • wbparasite.wordpress.com

Data integration / curation

  • Brugia
    • Produced initial annotation of new assembly.
    • MP presented to Brugia analysis group in conference call
  • C. elegans
    • DB_remark clean up -> Brief_identification, pseudogenes (will go in as a note in ENA submissions)
    • Classification of pseudogenes. Will need a new unitary pseudogene tag in ?Pseudogene model
    • Consolidation of UniProt product names into the database
    • SVM paper => gene curation
    • Proteomics data from Lamond lab


August 6, 2015

In attendance: KH, PD, MP, TD, BB

Build 250

  • In progress. MP doing all 9 core species (no precedent for this).
  • First build from github code base (no problems so far)
  • How long will BLAST for all 9 species take? (All BLASTX and BLASTP will need to be re-done, since all of the worm proteomes have changed.)
  • Problems
    • RNAis with bad DNA_text. Cleaned and reported to Caltech
    • NCBI taxonomy database at EBI changed location, config needed a patch

Data integration / curation

  • B. malayi
    • WS248 Brugia submission now live at ENA
    • Gene prediction on new assembly
      • Looking at RATT transfers from old assembly
      • Using HAL mapping to project genes also works quite well.
      • AUGUSTUS to fill in the gaps
    • Handover will be Monday 10th August (during Brugia conference call)
  • C. remanei
    • Bad data. Submitter does not have time right now to clean. Will be deferred from this build
  • General
    • Hawaiian strain - will go in as T3. BLASTable.
    • Cleaning up T2. Getting rid of in-frame STOPs for P. pacificus
    • Cleaning up Brief_identification for protein-coding and pseudogenes.

Database migration

  • New interface for uploading ace files.
  • Discussed potential strategies for schema updates

ParaSite

  • New release out
  • BLAST
    • Fixed glitches in new service
    • Discussed potential minor improvements to front page
  • Analytics
  • Redesign of the home page. Make it more WormBase compliant.
    • Search box confusion