July 2015

July 23, 2015

In attendance: KH, PD, MP, TD, JL, BB

General

Build 250
- Plan to update all tier2 species for WS250
- Some species (e.g. P. pac) have few/no gene model changes, but could do with being brought up to latest models and have newest processing code run over them
- MP to create summary of gene model changes for each species
- Lot of work, probably too much for a single builder
- MP will initiate all builds, and then hand over to helpers
- Will include the new C. remanei and C. elegans Hawaiian strain assemblies
WormBook chapters
- Chapter on genomes and annotation submitted
- Comparative genomics chapter nearly there (needs an extra figure or two)

Database migration

Deployed new curation database (the database currently known as "Genetomic") on fresh AWS instance, documenting along the way
Overhauled security code for access to the database
- wormbase.org accounts via the web-app
- SSL certificate for script access
Starting work porting some of the old name server scripts. Agreed that a 1-for-1 re-write not appropriate for new system (many operations will be simplified in the new system)

ParaSite

Test site for release 3 live: test.parasite.wormbase.org
Testing in progress
- New Loa loa almost finished
- E. canadensis started
FTP site nearly done. Needs release notes (KH)
Home page now includes a panel summarising what is new this release
- Two new genomes
- Updates of 5 other genomes
- REST API
- New BLAST
- RNASeq views for tapeworm genomes (Sanger data)
- Various bug fixes
For next release, need to import the gene descriptions (product names) generated by Avril Coghlan as part of the 50 HGP analysis

Curation / Data integration

Caenorhabditis
- EMBL dumping code changes to support UniProt-recommended product names
- Cleanup of pseudogene Brief_identification, in prep for ENA submissions
- Work done toward splitting C. briggsae "chrun" into its constituent supercontigs (with a view to removing chrun, chrI_random etc. in a future release)
Parasitic worms
- O. volvulus
  - Some gene model and GO term curation done (JL used EBI protein2go to do the GO curation)
- S. ratti
  - Some curation will be done this week, to go into the WS250 build
  - S. ratti will become a UniProt reference proteome from October
    - Proteins will go into Panther, and thus be available in the TE tool
- Brugia
  - v3 assembly ENA submission to saga seems to be nearing completion
  - v4 annotation transfer work on-going
  - UCSC hub for v3/v4 comparison (including CACTUS SNAKE track)
Oscheius
- New nematode genome from South African lab
- Performed CEGMA analysis of the assembly (MP). Found to be grossly incomplete. Reported back to the authors. Integration into WormBase on ice

July 16, 2015

In attendance: KH, MP, BB, JL, PD, GW, TD Minuted by MP Start 10:35

ParaSite release 3

planned for 30th July
RNASeq tracks with data from 2 publications ready for release (JL)
assembly hubs will work on ParaSite and UCSC (JL)
work is being done to improve the display of BigWig files on ParaSite (JL,BB)
webcode bugfixes rolled into ParaSite 3 (BB)
Blast reworked to link results to WormBase-Core for WBC species and their GBrowse (BB)
REST API needs some testing
example ids are being reworked (BB/KH)
steering group meeting coming up, so talks need to be planned (KLH)

Infrastructure

production moved to GitHub, CVS only as archive (MP)
bug tracker consolidation into Jira (PD)
- GitHub seqcur tracker migrated (PD)
- BitBucket seqcur tracker being moved by EBI systems (KH)
- RT migration planned (MT)
Datomic
- new GeneAce import (TD)
- open for wider testing of the GenEnTonic/TrAceView web interface, as backup/restore is as simple as running a script (about 2 minutes) (TD,MT)
- get temporary resources to test the build process (GW/KLH)
should there be a separate ParaSite blog, to prevent too many parasite posts on the main blog? (BB,JL,KH)
GW attended WebApollo seminar run by VectorBase / EnsEMBL-Protists (GW)

Data Curation

C.briggsae chromosome cleanup for chromosome un and rnd (PD)
clones removed from toplevel C.elegans chromosomes (PD)
Brief_Identification cleanup for the collaboration with UniProt (PD)
tracking (last updated) rolled into the gene curation tool (PD)
- last reviewed needs to be also designed/added (PD)
B.malayi v3.1 submission from WS248 got delayed. Problems fixed and resubmitted (MP)
RNASeq (GW)
- 432 new C.elegans experiments added to WS250 (GW)
- mockups of RNASeq display improvements as collaboration with Julie Ahringer (GW)
MassSpec (Wen / GW)
- single worm data from Angus Lamond’s lab, Dundee
- new peptides / translation levels / post-translational modifications

New genomes

new C.remanei assembly and C.elegans Hawaii strain added to Cactus (MP)
new C.remanei alternate assembly added to build process / core-databases (PD)
B.malayi v4
- RATT gene projection and Augustus prediction done (MP/Sanger Parasite groups)
- STAR alignments for RNASeq done (GW)
- ACeDB database done (MP)
- display and gene model refinements in process (MP)

July 9, 2015

In attendance: KH, MP, BB, JL, PD, GW Minuted by PD Start 16:00

Database migration

Thomas last day 18th Sept.
Documentation Documentation Documentation.....
- Ideal: Anyone can take an ACeDB database and load into a datomic instance
- Documentation is on the database github repository
- Not everyone has access to the db repository currently.
- There may be parts of the docs that need to be more prescriptive
Datomic set-up
- Gary had trouble setting up Datomic on his desktop machine
- Better test would be to document/try and set up Datomic from a fresh AWS instance? Can it be done from documentation?
Datomic build
- WS250: GW to take a WS250 ACeDB database and try to work solely from the documentation
- Any issues then use Thomas's expertise while he is still here and improve the documentation
- WS249 could be used as a test case even though a datomic already exists (rather than waiting for WS250)
Models updates
- Done via an augmented models file to make the conversion.
- Simple model additions = Simple extension to the augmented file
- Complex changes/reworking of classes might not be so straightforward. Need to be aware of the process for doing this
Tools development: Colonnade and the ACeDB-like tree editor TrACeView
- Written in ClojureScript (Clojure compiled into JS)
Thomas's replacement
- Originally a database developer, but now poss looking for some web dev skills
- Target a Clojure programmer?

Curation / Annotation

New Brugia assembly (MP)
- state of the genome
  - "Final" version
  - "No bacterial contamination" as Avril's code has been used.
  - Chromosome naming and identification (Chromosomes scaffolded inc. X/Y + unlocalised contigs) (20-30 sequences where old assembly was >2000)
  - Haplotype sequences in additional file
- Annotation
  - 16% (1.19/gene) duplication in CEGMA genes; old v3 assembly 7% (1.08/gene); TIGR >20%
  - Gene set will be Projected from old assembly with RATT + gap filling and extension from Augustus.
  - Training Augustus on old models (Min 2 exons with good intergenic spacing).
  - Issues will be:
    - Partially mapped manually curated genes
    - Curated and NOT mapped (Use Exonerate to recover).
    - Substring models: code for flattening them down already exists.
- STAR RNASeq alignments (GW)

New C. remanei genome (PD)
- Strain PX356, from Phillips lab (published)
  - Generated setup configs etc.
  - Have an e! database containing genome
  - Annotations provided by user assessed and found to be bad
    - CDS features found with more exons than CDS features under a parent mRNA.

C. elegans reference genome
- Yet another C. elegans reference sequence error paper (Zhang).
  - Some examples look odd, e.g. a deletion in an intron where others have reported only a SNP.
  - Some inserts fix genes.

Re-working modENCODE RNASeq data
- Working on test code for generating modENCODE expression graphs, with scientific scrutiny from Julie Ahringer (GW)
- Will work with Sibyl, once the science is satisfactory, to try to utilise the code for web usage.

Gene curation
- Continues as usual
- Finished looking at IWM genes (PD)
- Organise meeting with new UniProt C. elegans curator
- Brief discussion on capturing date-last-reviewed (to be followed up at next meeting)

Parasite

RNASeq data display (JL & BB)
- bigwig track display for RNASeq data for 3 cestodes
- Tracks look OK, but working through various ideas for display
- Grouped by study
- Track scaling
  - Dynamic normalisation causes some display issues as currently normalises over loaded area and not complete genome/chromosome/sequence.
  - Possible to compare tracks but the scale will invariably be different.
  - Need to get scaling right so as to not smear out points of biological interest.
  - How do other browsers perform?
    - JBrowse appears to do the same as the e! browser
    - Look at UCSC track hub handling
- Possibly define a set of ubiquitous genes to normalise on?
  - Gary uses ama-1 for gleaning expression and RNASeq statistics as it appears to always be expressed at consistent level, but having a set of housekeeping genes might be a start.
  - ENCODE might have software for normalisation in rseq tools
- Are we doing things wrong/different? No as e! does the same so not a major priority, but worth exploring.
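The track-scaling issue noted above can be illustrated with a minimal sketch. It assumes a hypothetical per-base coverage list; the function names are illustrative only and are not taken from the e! browser, JBrowse or UCSC code.

```python
from typing import Sequence


def scale_to_window(coverage: Sequence[float], start: int, end: int) -> list[float]:
    """Dynamic scaling: normalise the visible window by its own maximum."""
    window = coverage[start:end]
    peak = max(window, default=0.0)
    return [v / peak if peak else 0.0 for v in window]


def scale_to_sequence(coverage: Sequence[float], start: int, end: int) -> list[float]:
    """Fixed scaling: normalise the visible window by the whole-sequence maximum."""
    peak = max(coverage, default=0.0)
    return [v / peak if peak else 0.0 for v in coverage[start:end]]


# Hypothetical coverage values: a low-expression region next to a strong peak.
coverage = [1, 2, 4, 2, 100, 50, 2, 1]

# Viewing only the low-expression region (positions 0-3):
dynamic = scale_to_window(coverage, 0, 4)    # fills the full track height
fixed = scale_to_sequence(coverage, 0, 4)    # stays small relative to the peak
```

With dynamic scaling the same region changes height as the loaded area changes, so tracks are hard to compare. With fixed scaling the heights are comparable, but low-level features may be flattened.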
Imminent release of ParaSite 3 (~30th July) (BB + KH)
- Done
  - FTP dumps
  - blast dbs
  - REST API
- To do
  - MartBuild takes ~1week
  - Clearing critical healthchecks
  - Search dumps
  - Testing!
    - 2 New Species and 1 changed
- Misc
  - Forum for announcing new features
    - News feed page/blog
    - What's new from old releases logged and poss turned into paragraphs for release blog?
    - Possibly re-work the front page to give more prominence to news/information on what's new in the release.
  - GO_term population, dropped by EG as have UniProt assistance, we may not run this pipeline in future releases.
  - Look at google search optimisation and creating a more user friendly top entry in the google results as current google choices are a bit odd.

Communication

Working on retirement of the old Sanger CVS for pipeline code.
- Have always had a Git mirror; this will become the primary source
- Need to consider the models and wspec in general (tagging etc)
- Need to communicate this with Caltech (new place to pick up models)
Discussion about all the places we store work tickets
- RT-sanger worm-bug@sanger.ac.uk >1000 of paper tickets
- https://bitbucket.org/pauld/seqcur_ticketing-system 84 tickets
- GitHub
  - github website - website issues and helpdesk
  - github pipeline - our code
  - https://github.com/Paul-Davis/seqcur_ticketing-system/issues 3 issues
- JIRA - EBI projects (incl Ensembl)
- Aim: rationalise
  - Build code tickets should go on the wormbase-pipeline GitHub repo
  - PD will work through the bitbucket ones.....mostly tier II
  - JIRA (KH the main user of this system), should migrate to something visible project-wide

Misc

Work for Ann Hart: checking for motifs conserved across Caenorhabditis with PCactus (MP)

Hinxton 2015.07- Meeting minutes
