Hinxton 2015.07- Meeting minutes
July 2015
July 16, 2015
In attendance: KH, MP, BB, JL, PD, GW, TD. Minuted by PD. Start 10:30.
ParaSite release 3
- planned for 30th July
- RNASeq tracks with data from 2 publications ready for release (JL)
- assembly hubs will work on ParaSite and UCSC (JL)
- work is being done to improve the display of BigWig files on ParaSite (JL,BB)
- webcode bugfixes rolled into ParaSite 3 (BB)
- Blast reworked to link results to WormBase-Core for WBC species and their GBrowse (BB)
- REST API needs some testing
- example ids are being reworked (BB/KH)
- steering group meeting coming up, so talks need to be planned (KLH)
Infrastructure
- production moved to GitHub, CVS only as archive (MP)
- bug tracker consolidation into Jira (PD)
- GitHub seqcur tracker migrated (PD)
- BitBucket seqcur tracker being moved by EBI systems (KH)
- RT migration planned (MT)
- Datomic
- new GeneAce import (TD)
- open for wider testing of the GenEnTonic web interface, as backup/restore is just a matter of running a script (~2 minutes) (TD,MT)
- get temporary resources to test the build process (GW/KLH)
- should there be a separate ParaSite blog, to prevent too many parasite posts on the main blog? (BB,JL,KH)
- GW attended WebApollo seminar run by VectorBase / EnsEMBL-Protists (GW)
Data Curation
- C.briggsae chromosome cleanup for chromosome un and rnd (PD)
- clones removed from toplevel C.elegans chromosomes (PD)
- Brief_Identification cleanup for the collaboration with UniProt (PD)
- tracking (last updated) rolled into the gene curation tool (PD)
- last reviewed needs to be also designed/added (PD)
- B.malayi v3.1 submission from WS248 was delayed. Problems fixed and resubmitted (MP)
- RNASeq (GW)
- 432 new C.elegans experiments added to WS250 (GW)
- mockups of RNASeq display improvements as collaboration with Julie Ahringer (GW)
- MassSpec (Wen / GW)
- single worm data from Angus Lamond's lab, Dundee
- new peptides / translation levels / post-translational modifications
new genomes
- new C.remanei assembly and C.elegans Hawaii strain added to Cactus (MP)
- B.malayi v4
- RATT gene projection and Augustus prediction done (MP/Sanger Parasite groups)
- STAR alignments for RNASeq done (GW)
- ACeDB database done (MP)
- display and gene model refinements in process (MP)
July 9, 2015
In attendance: KH, MP, BB, JL, PD, GW. Minuted by PD. Start 16:00.
Database migration
- Thomas last day 18th Sept.
- Documentation Documentation Documentation.....
- Ideal: Anyone can take an ACeDB database and load into a datomic instance
- Documentation is on the database github repository
- Not everyone has access to the db repository currently.
- There may be parts of the docs that need to be more prescriptive
- Datomic set-up
- Gary had trouble setting up Datomic on his desktop machine
- Better test would be to document/try and set up Datomic from a fresh AWS instance? Can it be done from documentation?
- Datomic build
- WS250: GW to take a WS250 ACeDB database and try to work solely from the documentation
- Any issues then use Thomas's expertise while he is still here and improve the documentation
- WS249 could be used as a test case even though a datomic already exists (rather than waiting for WS250)
- Models updates
- Done via an augmented models file to make the conversion.
- Simple model additions = Simple extension to the augmented file
- Complex changes/reworking of classes might not be so straightforward. Need to be aware of the process for doing this
- Tools development: Colonnade and the ACeDB-like tree editor TrACeView
- Written in ClojureScript (Clojure compiled into JS)
- Thomas's replacement
- Originally a database developer, but now poss looking for some web dev skills
- Target a Clojure programmer?
Curation / Annotation
- New Brugia assembly (MP)
- state of the genome
- "Final" version
- "No bacterial contamination" as Avril's code has been used.
- Chromosome naming and identification (Chromosomes scaffolded inc. X/Y + unlocalised contigs) (20-30 sequences where old assembly was >2000)
- Haplotype sequences in additional file
- Annotation
- 16% (1.19/gene) duplication in CEGMA genes; old v3 assembly 7% (1.08/gene); TIGR >20%
- Gene set will be Projected from old assembly with RATT + gap filling and extension from Augustus.
- Training Augustus on old models (Min 2 exons with good intergenic spacing).
- Issues will be:
- Partially mapped manually curated genes
- Curated and NOT mapped (Use Exonerate to recover).
- Substring models......already has code for flattening them down.
- STAR RNASeq alignments (GW)
- New C. remanei genome (PD)
- Strain PX356, from Phillips lab (published)
- Generated setup configs etc.
- Have an e! database containing genome
- Annotations provided by user assessed and found to be bad
- CDS features found with more exons than CDS features under a parent mRNA.
- C. elegans reference genome
- Yet another C. elegans reference sequence error paper (Zhang).
- Some examples look odd.......deletion in intron where others have reported only a SNP.
- Some inserts fix genes.
- Re-working modENCODE RNASeq data
- Working on test code for generating modENCODE expression graphs with the scientific scrutiny of Julie Ahringer (GW)
- Will work with Sibyl once the science is happy to try and utilise the code for web usage.
- Gene curation
- Continues as usual
- Finished looking at IWM genes (PD)
- Organise meeting with new UniProt C. elegans curator
- Brief discussion on capturing date-last-reviewed (to be followed up at next meeting)
Parasite
- RNASeq data display (JL & BB)
- bigwig track display for RNASeq data for 3 cestodes
- Tracks look OK but working with various ideas for display
- Grouped by study
- Track scaling
- Dynamic normalisation causes some display issues as currently normalises over loaded area and not complete genome/chromosome/sequence.
- Possible to compare tracks but the scale will invariably be different.
- Need to get scaling right so as to not smear out points of biological interest.
- How do other browsers perform?
- JBrowse appears to do the same as the e! browser
- Look at UCSC track hub handling
- Possibly define a set of ubiquitous genes to normalise on?
- Gary uses ama-1 for gleaning expression and RNASeq statistics as it appears to always be expressed at consistent level, but having a set of housekeeping genes might be a start.
- ENCODE might have software for normalisation in rseq tools
- Are we doing things wrong/different? No as e! does the same so not a major priority, but worth exploring.
- Imminent release of ParaSite 3 (~30th July) (BB + KH)
- Done
- FTP dumps
- blast dbs
- REST API
- To do
- MartBuild takes ~1 week
- Clearing critical healthchecks
- Search dumps
- Testing!
- 2 New Species and 1 changed
- Misc
- Forum for announcing new features
- News feed page/blog
- What's new from old releases logged and poss turned into paragraphs for release blog?
- Possibly re-work the front page to give more prominence to news/information on what's new in the release.
- GO_term population, dropped by EG as have UniProt assistance, we may not run this pipeline in future releases.
- Look at Google search optimisation and creating a more user-friendly top entry in the Google results, as the current Google choices are a bit odd.
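The track scaling point under RNASeq data display can be illustrated with a toy sketch. It is not the e! or JBrowse implementation; the function names, the per-base coverage list and the reference regions are all hypothetical, standing in for a BigWig track and a housekeeping gene set (e.g. ama-1). It only shows why scaling to the loaded window makes tracks hard to compare, while scaling to a fixed reference set does not depend on the view.

```python
from statistics import mean


def scale_by_window(coverage, start, end):
    """Scale to the peak of the loaded window only (view-dependent)."""
    window = coverage[start:end]
    peak = max(window) if window else 0
    return [v / peak if peak else 0.0 for v in window]


def scale_by_reference(coverage, start, end, reference_regions):
    """Scale by mean coverage over fixed reference regions (view-independent).

    reference_regions is a hypothetical list of (start, end) intervals,
    e.g. the coordinates of a set of housekeeping genes.
    """
    ref_values = [v for (s, e) in reference_regions for v in coverage[s:e]]
    ref_level = mean(ref_values) if ref_values else 0
    return [v / ref_level if ref_level else 0.0 for v in coverage[start:end]]


# Made-up per-base coverage for one sequence.
coverage = [2, 2, 4, 4, 8, 8, 2, 2]
reference = [(0, 2)]  # assumed "housekeeping" interval

# Window scaling: two regions with different expression look identical.
print(scale_by_window(coverage, 2, 4), scale_by_window(coverage, 4, 6))

# Reference scaling: the two-fold difference is preserved.
print(scale_by_reference(coverage, 2, 4, reference),
      scale_by_reference(coverage, 4, 6, reference))
```

With window scaling both regions fill the full height of the track, so the scale differs between views; with a fixed reference the same factor applies wherever the browser is looking.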
Communication
- Working on retirement of the old Sanger CVS for pipeline code.
- Have always had a Git mirror; this will become the primary source
- Need to consider the models and wspec in general (tagging etc)
- Need to communicate this with Caltech (new place to pick up models)
- Discussion about all the places we store work tickets
- RT-sanger worm-bug@sanger.ac.uk >1000 of paper tickets
- https://bitbucket.org/pauld/seqcur_ticketing-system 84 tickets
- GitHub
- github website - website issues and helpdesk
- github pipeline - our code
- https://github.com/Paul-Davis/seqcur_ticketing-system/issues 3 issues
- JIRA - EBI projects (incl Ensembl)
- Aim: rationalise
- Build code tickets should go on the wormbase-pipeline GitHub repo
- PD will work through the bitbucket ones.....mostly tier II
- JIRA (KH the main user of this system), should migrate to something visible project-wide
Misc
- Work for Ann Hart: checking for motifs conserved across Caenorhabditis with PCactus (MP)