Hinxton 2015.07- Meeting minutes
From WormBaseWiki
Latest revision as of 11:04, 27 July 2015
July 2015
July 23, 2015
In attendance: KH, PD, MP, TD, JL, BB
General
- Build 250
- Plan to update all tier2 species for WS250
- Some species (e.g. P. pac) have few/no gene model changes, but could do with being brought up to latest models and have newest processing code run over them
- MP to create summary of gene model changes for each species
- Lot of work, probably too much for a single builder
- MP will initiate all builds, and then hand over to helpers
- Will include the new C. remanei and C. elegans Hawaiian strain assemblies
- WormBook chapters
- Chapter on genomes and annotation submitted
- Comparative genomics chapter nearly there (needs an extra figure or two)
Database migration
- Deployed new curation database (the database currently known as "Genetomic") on fresh AWS instance, documenting along the way
- Overhauled security code for access to the database
- wormbase.org accounts via the web-app
- SSL certificate for script access
- Starting work porting some of the old name server scripts. Agreed that a 1-for-1 re-write not appropriate for new system (many operations will be simplified in the new system)
ParaSite
- Test site for release 3 live: test.parasite.wormbase.org
- Testing in progress
- New Loa loa almost finished
- E. canadensis started
- FTP site nearly done. Needs release notes (KH)
- Home page now includes a panel summarising what is new this release
- Two new genomes
- Updates of 5 other genomes
- REST API
- New BLAST
- RNASeq views for tapeworm genomes (Sanger data)
- Various bug fixes
- For next release, need to import the gene descriptions (product names) generated by Avril Coghlan as part of the 50 HGP analysis
Curation / Data integration
- Caenorhabditis
- EMBL dumping code changes to support UniProt-recommended product names
- Cleanup of pseudogene Brief_identification, in prep for ENA submissions
- Work done toward splitting C. briggsae "chrun" into its constituent supercontigs (with a view to removing chrun, chrI_random etc. in a future release)
- Parasitic worms
- O. volvulus
- Some gene model and GO term curation done (JL used EBI protein2go to do the GO curation)
- S. ratti
- Some curation will be done this week, to go into the WS250 build
- S. ratti will become a UniProt reference proteome from October
- Proteins will go into Panther, and thus be available in the TE tool
- Brugia
- v3 assembly ENA submission to saga seems to be nearing completion
- v4 annotation transfer work on-going
- UCSC hub for v3/v4 comparison (including CACTUS SNAKE track)
- Oscheius
- New nematode genome from South African lab
- Performed CEGMA analysis of the assembly (MP). Found to be grossly incomplete. Reported back to the authors. Integration into WormBase on ice
July 16, 2015
In attendance: KH, MP, BB, JL, PD, GW, TD
Minuted by MP
Start 10:35
ParaSite release 3
- planned for 30th July
- RNASeq tracks with data from 2 publications ready for release (JL)
- assembly hubs will work on ParaSite and UCSC (JL)
- work is being done to improve the display of BigWig files on ParaSite (JL,BB)
- webcode bugfixes rolled into ParaSite 3 (BB)
- Blast reworked to link results to WormBase-Core for WBC species and their GBrowse (BB)
- REST API needs some testing
- example ids are being reworked (BB/KH)
- steering group meeting coming up, so talks need to be planned (KLH)
Infrastructure
- production moved to GitHub, CVS only as archive (MP)
- bug tracker consolidation into Jira (PD)
- github seqcur tracker migrated (PD)
- BitBucket seqcur tracker being moved by EBI systems (KH)
- RT migration planned (MT)
- Datomic
- new GeneAce import (TD)
- open for wider testing of the GenEnTonic/TrAceView web interface (http://db.wormbase.org:8220/view/gene/WBGene00003020), as backup/restore is just a matter of running a script for about 2 minutes (TD,MT)
- get temporary resources to test the build process (GW/KLH)
- should there be a separate ParaSite blog, to prevent too many parasite posts on the main blog? (BB,JL,KH)
- GW attended WebApollo seminar run by VectorBase / EnsEMBL-Protists (GW)
Data Curation
- C.briggsae chromosome cleanup for chromosome un and rnd (PD)
- clones removed from toplevel C.elegans chromosomes (PD)
- Brief_Identification cleanup for the collaboration with UniProt (PD)
- tracking (last updated) rolled into the gene curation tool (PD)
- last reviewed needs to be also designed/added (PD)
- B.malayi v3.1 submission from WS248 got delayed. Problems fixed and resubmitted (MP)
- RNASeq (GW)
- 432 new C.elegans experiments added to WS250 (GW)
- mockups of RNASeq display improvements as collaboration with Julie Ahringer (GW)
- MassSpec (Wen / GW)
- single worm data from Angus Lamont’s lab Dundee
- new peptides / translation levels / post-translational modifications
New genomes
- new C.remanei assembly and C.elegans Hawaii strain added to Cactus (MP)
- new C.remanei alternate assembly added to build process / core-databases (PD)
- B.malayi v4
- RATT gene projection and Augustus prediction done (MP/Sanger Parasite groups)
- STAR alignments for RNASeq done (GW)
- ACeDB database done (MP)
- display and gene model refinements in process (MP)
July 9, 2015
In attendance: KH, MP, BB, JL, PD, GW
Minuted by PD
Start 16:00
Database migration
- Thomas's last day: 18th Sept.
- Documentation Documentation Documentation.....
- Ideal: Anyone can take an ACeDB database and load into a datomic instance
- Documentation is on the database github repository
- Not everyone has access to the db repository currently.
- There may be parts of the docs that need to be more prescriptive
- Datomic set-up
- Gary had trouble setting up Datomic on his desktop machine
- Better test would be to document/try and set up Datomic from a fresh AWS instance? Can it be done from documentation?
- Datomic build
- WS250: GW to take a WS250 ACeDB database and try to work solely from the documentation
- Any issues then use Thomas's expertise while he is still here and improve the documentation
- WS249 could be used as a test case even though a datomic already exists (rather than waiting for WS250)
- Models updates
- Done via an augmented models file to make the conversion.
- Simple model additions = Simple extension to the augmented file
- Complex changes/reworking of classes might not be so straightforward. Need to be aware of the process for doing this
- Tools development: Colonnade and the ACeDB-like tree editor TrACeView
- Written in ClojureScript (Clojure compiled into JS)
- Thomas's replacement
- Originally a database developer, but now poss looking for some web dev skills
- Target a Clojure programmer?
Curation / Annotation
- New Brugia assembly (MP)
- state of the genome
- "Final" version
- "No bacterial contamination" as Avril's code has been used.
- Chromosome naming and identification (Chromosomes scaffolded inc. X/Y + unlocalised contigs) (20-30 sequences where old assembly was >2000)
- Haplotype sequences in additional file
- Annotation
- 16% (1.19 /gene) duplication in CEGMA genes.........old v3 assembly 7% (1.08/gene) TIGR >20%
- Gene set will be Projected from old assembly with RATT + gap filling and extension from Augustus.
- Training Augustus on old models (Min 2 exons with good intergenic spacing).
- Issues will be:
- Partially mapped manually curated genes
- Curated and NOT mapped (Use Exonerate to recover).
- Substring models......already has code for flattening them down.
- STAR RNASeq alignments (GW)
- New C. remanei genome (PD)
- Strain PX356, from Phillips lab (published)
- Generated setup configs etc.
- Have an e! database containing genome
- Annotations provided by user assessed and found to be bad
- CDS features found with more exons than CDS features under a parent mRNA.
- C. elegans reference genome
- Yet another C. elegans reference sequence error paper (Zhang).
- Some examples look odd.......deletion in intron where others have reported only a SNP.
- Some inserts fix genes.
- Re-working modENCODE RNASeq data
- Working on test code for generating modENCODE expression graphs with the scientific scrutiny of Julie Ahringer (GW)
- Will work with Sibyl, once the science is agreed, to try and utilise the code for web usage.
- Gene curation
- Continues as usual
- Finished looking at IWM genes (PD)
- Organise meeting with new UniProt C. elegans curator
- Brief discussion on capturing date-last-reviewed (to be followed up at next meeting)
Parasite
- RNASeq data display (JL & BB)
- bigwig track display for RNASeq data for 3 cestodes
- Tracks look OK, but working through various ideas for display
- Grouped by study
- Track scaling
- Dynamic normalisation causes some display issues as currently normalises over loaded area and not complete genome/chromosome/sequence.
- Possible to compare tracks but the scale will invariably be different.
- Need to get scaling right so as to not smear out points of biological interest.
- How do other browsers perform?
- JBrowse appears to do the same as the e! browser
- Look at UCSC track hub handling
- Possibly define a set of ubiquitous genes to normalise on?
- Gary uses ama-1 for gleaning expression and RNASeq statistics as it appears to always be expressed at consistent level, but having a set of housekeeping genes might be a start.
- ENCODE might have software for normalisation in rseq tools
- Are we doing things wrong/different? No as e! does the same so not a major priority, but worth exploring.
- Imminent release of ParaSite 3 (~30th July) (BB + KH)
- Done
- FTP dumps
- blast dbs
- REST API
- To do
- MartBuild takes ~1week
- Clearing critical healthchecks
- Search dumps
- Testing!
- 2 New Species and 1 changed
- Misc
- Forum for announcing new features
- News feed page/blog
- What's new from old releases logged and poss turned into paragraphs for release blog?
- Possibly re-work the front page to give more prominence to news/information on what's new in the release.
- GO_term population, dropped by EG as have UniProt assistance, we may not run this pipeline in future releases.
- Look at google search optimisation and creating a more user friendly top entry in the google results as current google choices are a bit odd.
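The track scaling issue discussed above (normalising over the loaded area rather than a fixed reference) can be illustrated with a small sketch. This is not the browser's actual code: scaling by the window maximum is an assumption made for illustration, and the function names, coverage values and reference gene levels are all hypothetical. It only shows why the same raw coverage value is drawn differently in two windows under window-local scaling, and consistently when scaled against a fixed set of reference (housekeeping) genes.

```python
from statistics import mean


def scale_by_window(coverage):
    """Scale coverage by the maximum in the loaded window (assumed behaviour)."""
    peak = max(coverage)
    return [c / peak if peak else 0.0 for c in coverage]


def scale_by_reference(coverage, reference_levels):
    """Scale coverage by the mean level of a fixed reference gene set.

    reference_levels is a hypothetical list of coverage values measured over
    housekeeping genes (e.g. ama-1), computed once per track.
    """
    ref = mean(reference_levels)
    if ref == 0:
        raise ValueError("reference gene set has zero coverage")
    return [c / ref for c in coverage]


# Two windows from the same track; both start with a raw coverage of 10.
window_a = [10, 20, 40]
window_b = [10, 100, 200]
reference = [18, 22]  # hypothetical housekeeping-gene coverage for this track

local_a = scale_by_window(window_a)
local_b = scale_by_window(window_b)
fixed_a = scale_by_reference(window_a, reference)
fixed_b = scale_by_reference(window_b, reference)

print(local_a[0], local_b[0])  # differ: the scale depends on the window
print(fixed_a[0], fixed_b[0])  # equal: the scale is fixed per track
```

A fixed per-track reference keeps tracks comparable between windows, at the cost of choosing and validating the reference gene set for each species.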
Communication
- Working on retirement of the old Sanger CVS for pipeline code.
- Have always had a Git mirror; this will become the primary source
- Need to consider the models and wspec in general (tagging etc)
- Need to communicate this with Caltech (new place to pick up models)
- Discussion about all the places we store work tickets
- RT-sanger worm-bug@sanger.ac.uk >1000 of paper tickets
- https://bitbucket.org/pauld/seqcur_ticketing-system 84 tickets
- GitHub
- github website - website issues and helpdesk
- github pipeline - our code
- https://github.com/Paul-Davis/seqcur_ticketing-system/issues 3 issues
- JIRA - EBI projects (incl Ensembl)
- Aim: rationalise
- Build code tickets should go on the wormbase-pipeline GitHub repo
- PD will work through the bitbucket ones.....mostly tier II
- JIRA (KH the main user of this system), should migrate to something visible project-wide
Misc
- Work for Ann Hart: checking for motifs conserved across Caenorhabditis with PCactus (MP)