Hinxton 2015.07- Meeting minutes

July 2015

July 23, 2015

In attendance: KH, PD, MP, TD, JL, BB

General

  • Build 250
    • Plan to update all tier2 species for WS250
    • Some species (e.g. P. pac) have few/no gene model changes, but could do with being brought up to latest models and have newest processing code run over them
    • MP to create summary of gene model changes for each species
    • Lot of work, probably too much for a single builder
    • MP will initiate all builds, and then hand over to helpers
    • Will include the new C. remanei and C. elegans Hawaiian strain assemblies
  • WormBook chapters
    • Chapter on genomes and annotation submitted
    • Comparative genomics chapter nearly there (needs an extra figure or two)

Database migration

  • Deployed new curation database (the database currently known as "Genetomic") on fresh AWS instance, documenting along the way
  • Overhauled security code for access to the database
    • wormbase.org accounts via the web-app
    • SSL certificate for script access
  • Starting work on porting some of the old name server scripts. Agreed that a 1-for-1 rewrite is not appropriate, as many operations will be simpler in the new system

ParaSite

  • Test site for release 3 live: test.parasite.wormbase.org
  • Testing in progress
    • New Loa loa almost finished
    • E. canadensis started
  • FTP site nearly done. Needs release notes (KH)
  • Home page now includes a panel summarising what is new this release
    • Two new genomes
    • Updates of 5 other genomes
    • REST API
    • New BLAST
    • RNASeq views for tapeworm genomes (Sanger data)
    • Various bug fixes
  • For next release, need to import the gene descriptions (product names) generated by Avril Coghlan as part of the 50 HGP analysis

Curation / Data integration

  • Caenorhabditis
    • EMBL dumping code changes to support UniProt-recommended product names
    • Cleanup of pseudogene Brief_identification, in prep for ENA submissions
    • Work done toward splitting C. briggsae "chrun" into its constituent supercontigs (with a view to removing chrun, chrI_random etc. in a future release)
  • Parasitic worms
    • O. volvulus
      • Some gene model and GO term curation done (JL used EBI protein2go to do the GO curation)
    • S. ratti
      • Some curation will be done this week, to go into the WS250 build
      • S. ratti will become a UniProt reference proteome from October
        • Proteins will go into Panther, and thus be available in the TE tool
    • Brugia
      • v3 assembly ENA submission saga seems to be nearing completion
      • v4 annotation transfer work on-going
      • UCSC hub for v3/v4 comparison (including CACTUS SNAKE track)
  • Oscheius
    • New nematode genome from South African lab
    • Performed CEGMA analysis of the assembly (MP). Found to be grossly incomplete. Reported back to the authors. Integration into WormBase on ice



July 16, 2015

In attendance: KH, MP, BB, JL, PD, GW, TD. Minuted by MP. Start 10:35

ParaSite release 3

  • planned for 30th July
  • RNASeq tracks with data from 2 publications ready for release (JL)
  • assembly hubs will work on ParaSite and UCSC (JL)
  • work is being done to improve the display of BigWig files on ParaSite (JL,BB)
  • webcode bugfixes rolled into ParaSite 3 (BB)
  • Blast reworked to link results to WormBase-Core for WBC species and their GBrowse (BB)
  • REST API needs some testing
  • example ids are being reworked (BB/KH)
  • steering group meeting coming up, so talks need to be planned (KLH)

Infrastructure

  • production moved to GitHub, CVS only as archive (MP)
  • bug tracker consolidation into Jira (PD)
    • GitHub seqcur tracker migrated (PD)
    • BitBucket seqcur tracker being moved by EBI systems (KH)
    • RT migration planned (MT)
  • Datomic
    • new GeneAce import (TD)
    • open for wider testing of the GenEnTonic/TrAceView web interface, as backup/restore is just a matter of running a script (~2 minutes) (TD,MT)
    • get temporary resources to test the build process (GW/KLH)
  • should there be a separate ParaSite blog, to prevent too many parasite posts on the main blog? (BB,JL,KH)
  • GW attended WebApollo seminar run by VectorBase / EnsEMBL-Protists (GW)

Data Curation

  • C.briggsae chromosome cleanup for chromosome un and rnd (PD)
  • clones removed from toplevel C.elegans chromosomes (PD)
  • Brief_Identification cleanup for the collaboration with UniProt (PD)
  • tracking (last updated) rolled into the gene curation tool (PD)
    • last reviewed also needs to be designed/added (PD)
  • B.malayi v3.1 submission from WS248 got delayed. Problems fixed and resubmitted (MP)
  • RNASeq (GW)
    • 432 new C.elegans experiments added to WS250 (GW)
    • mockups of RNASeq display improvements as collaboration with Julie Ahringer (GW)
  • MassSpec (Wen / GW)
    • single worm data from Angus Lamond’s lab, Dundee
    • new peptides / translation levels / post-translational modifications

New genomes

  • new C.remanei assembly and C.elegans Hawaii strain added to Cactus (MP)
  • new C.remanei alternate assembly added to build process / core-databases (PD)
  • B.malayi v4
    • RATT gene projection and Augustus prediction done (MP/Sanger Parasite groups)
    • STAR alignments for RNASeq done (GW)
    • ACeDB database done (MP)
    • display and gene model refinements in process (MP)

July 9, 2015

In attendance: KH, MP, BB, JL, PD, GW. Minuted by PD. Start 16:00

Database migration

  • Thomas last day 18th Sept.
  • Documentation Documentation Documentation.....
    • Ideal: Anyone can take an ACeDB database and load into a datomic instance
    • Documentation is on the database github repository
    • Not everyone has access to the db repository currently.
    • There may be parts of the docs that need to be more prescriptive
  • Datomic set-up
    • Gary had trouble setting up Datomic on his desktop machine
    • Better test would be to document/try and set up Datomic from a fresh AWS instance? Can it be done from documentation?
  • Datomic build
    • WS250: GW to take a WS250 ACeDB database and try to work solely from the documentation
    • For any issues, use Thomas's expertise while he is still here and improve the documentation
    • WS249 could be used as a test case even though a Datomic instance already exists (rather than waiting for WS250)
  • Models updates
    • Done via an augmented models file to make the conversion.
    • Simple model additions = Simple extension to the augmented file
    • Complex changes/reworking of classes might not be so straightforward. Need to be aware of the process for doing this
  • Tools development: Colonnade and the ACeDB-like tree editor TrACeView
  • Thomas's replacement
    • Originally a database developer, but now poss looking for some web dev skills
    • Target a Clojure programmer?


Curation / Annotation

  • New Brugia assembly (MP)
    • state of the genome
      • "Final" version
      • "No bacterial contamination" as Avril's code has been used.
      • Chromosome naming and identification (Chromosomes scaffolded inc. X/Y + unlocalised contigs) (20-30 sequences where old assembly was >2000)
      • Haplotype sequences in additional file
    • Annotation
      • 16% (1.19/gene) duplication in CEGMA genes; old v3 assembly 7% (1.08/gene); TIGR >20%
      • Gene set will be projected from the old assembly with RATT, plus gap filling and extension from Augustus.
      • Training Augustus on old models (Min 2 exons with good intergenic spacing).
      • Issues will be:
        • Partially mapped manually curated genes
        • Curated and NOT mapped (Use Exonerate to recover).
        • Substring models (code for flattening them down already exists).
    • STAR RNASeq alignments (GW)
  • New C. remanei genome (PD)
    • Strain PX356, from Phillips lab (published)
      • Generated setup configs etc.
      • Have an e! database containing genome
      • Annotations provided by user assessed and found to be bad
        • CDS features found with more exons than CDS features under a parent mRNA.
  • C. elegans reference genome
    • Yet another C. elegans reference sequence error paper (Zhang).
      • Some examples look odd, e.g. a deletion in an intron where others have reported only a SNP.
      • Some inserts fix genes.
  • Re-working modENCODE RNASeq data
    • Working on test code for generating modENCODE expression graphs with the scientific scrutiny of Julie Ahringer (GW)
    • Will work with Sibyl, once the science is agreed, to try to utilise the code for web usage.
  • Gene curation
    • Continues as usual
    • Finished looking at IWM genes (PD)
    • Organise meeting with new UniProt C. elegans curator
    • Brief discussion on capturing date-last-reviewed (to be followed up at next meeting)
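
The CEGMA duplication figures quoted for the Brugia assemblies above pair two numbers: the percentage of detected core genes found in more than one copy, and the mean number of copies per detected core gene. A minimal sketch of how such a pair could be derived from per-gene copy counts is given below. The function name, the input layout and the toy gene IDs are assumptions for illustration only; this is not CEGMA's own code and the data are not the Brugia results.

```python
from statistics import mean


def cegma_duplication(copies_per_gene):
    """Return (percent duplicated, mean copies per gene) for detected core genes.

    copies_per_gene maps a core-gene ID to the number of copies found in
    the assembly. Genes with zero copies are treated as undetected and are
    left out of both figures.
    """
    detected = [n for n in copies_per_gene.values() if n > 0]
    if not detected:
        raise ValueError("no core genes detected")
    duplicated = sum(1 for n in detected if n > 1)
    return 100.0 * duplicated / len(detected), mean(detected)


# Hypothetical toy counts: four detected core genes, one of them in two copies.
toy = {"KOG0001": 1, "KOG0002": 2, "KOG0003": 1, "KOG0004": 1, "KOG0005": 0}
pct, per_gene = cegma_duplication(toy)
print(f"{pct:.0f}% duplicated, {per_gene:.2f} copies/gene")
```

Whether a rise in this figure reflects real duplication or uncollapsed haplotypes cannot be read from the numbers alone, which is presumably why the haplotype sequences are kept in a separate file.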

ParaSite

  • RNASeq data display (JL & BB)
    • bigwig track display for RNASeq data for 3 cestodes
    • Tracks look OK, but working through various ideas for display
    • Grouped by study
    • Track scaling
      • Dynamic normalisation causes some display issues as currently normalises over loaded area and not complete genome/chromosome/sequence.
      • Possible to compare tracks but the scale will invariably be different.
      • Need to get scaling right so as to not smear out points of biological interest.
      • How do other browsers perform?
        • JBrowse appears to do the same as the e! browser
        • Look at UCSC track hub handling
    • Possibly define a set of ubiquitous genes to normalise on?
      • Gary uses ama-1 for gleaning expression and RNASeq statistics as it appears to always be expressed at consistent level, but having a set of housekeeping genes might be a start.
      • ENCODE might have software for normalisation in rseq tools
    • Are we doing things wrong/different? No as e! does the same so not a major priority, but worth exploring.
  • Imminent release of ParaSite 3 (~30th July) (BB + KH)
    • Done
      • FTP dumps
      • blast dbs
      • REST API
    • To do
      • MartBuild takes ~1 week
      • Clearing critical healthchecks
      • Search dumps
      • Testing!
        • 2 New Species and 1 changed
    • Misc
      • Forum for announcing new features
        • News feed page/blog
        • What's new from old releases logged and poss turned into paragraphs for release blog?
        • Possibly re-work the front page to give more prominence to news/information on what's new in the release.
      • GO_term population: dropped by EG as they have UniProt assistance; we may not run this pipeline in future releases.
      • Look at Google search optimisation and creating a more user-friendly top entry in the Google results, as the current Google choices are a bit odd.
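
The track-scaling point in the RNASeq display discussion above can be illustrated with a small sketch. It compares three ways of choosing the denominator for a coverage track: the maximum of the loaded window (the dynamic behaviour described), the maximum of the whole sequence, and the mean coverage over a reference gene region (the ama-1 / housekeeping-gene idea). All function names and the coverage values are hypothetical; this is not how the e! browser, JBrowse or UCSC implement scaling.

```python
def scale(values, denominator):
    """Divide every coverage value by a fixed positive denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return [v / denominator for v in values]


def window_scaled(track, start, end):
    """Scale the visible window by its own maximum (dynamic normalisation)."""
    window = track[start:end]
    return scale(window, max(window))


def global_scaled(track, start, end):
    """Scale the visible window by the maximum over the whole sequence."""
    return scale(track[start:end], max(track))


def reference_scaled(track, start, end, ref_start, ref_end):
    """Scale the visible window by mean coverage over a reference region."""
    ref = track[ref_start:ref_end]
    return scale(track[start:end], sum(ref) / len(ref))


# Hypothetical per-base coverage; positions 0-2 stand in for a reference gene.
coverage = [2.0, 6.0, 4.0, 100.0, 50.0]

# Apparent height of position 1 depends on what else is loaded ...
print(window_scaled(coverage, 0, 3)[1], window_scaled(coverage, 0, 5)[1])
# ... but not when the denominator is fixed for the whole sequence.
print(global_scaled(coverage, 0, 3)[1], global_scaled(coverage, 0, 5)[1])
# Reference-gene scaling gives values comparable across tracks.
print(reference_scaled(coverage, 0, 5, 0, 3))
```

A fixed denominator keeps a feature at the same height as the user pans, at the cost of flattening low-coverage regions when one very high peak exists elsewhere on the sequence. That is the "smearing" trade-off raised in the discussion.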

Communication

  • Working on retirement of the old Sanger CVS for pipeline code.
    • Have always had a Git mirror; this will become the primary source
    • Need to consider the models and wspec in general (tagging etc)
    • Need to communicate this with Caltech (new place to pick up models)
  • Discussion about all the places we store work tickets

Misc

  • Work for Ann Hart: checking for motifs conserved across Caenorhabditis with PCactus (MP)