Difference between revisions of "Hinxton 2015.07- Meeting minutes"

= July 2015 =

== July 23, 2015 ==

In attendance: KH, PD, MP, TD, JL, BB

=== General ===

* Build 250
** Plan to update all tier2 species for WS250
** Some species (e.g. P. pac) have few or no gene model changes, but would benefit from being brought up to the latest models and having the newest processing code run over them
** MP to create a summary of gene model changes for each species
** A lot of work, probably too much for a single builder
** MP will initiate all builds, and then hand over to helpers
** Will include the new C. remanei and C. elegans Hawaiian strain assemblies
* WormBook chapters
** Chapter on genomes and annotation submitted
** Comparative genomics chapter nearly there (needs an extra figure or two)

=== Database migration ===
* Deployed new curation database (the database currently known as "Genetomic") on a fresh AWS instance, documenting along the way
* Overhauled security code for access to the database
** wormbase.org accounts via the web-app
** SSL certificate for script access
* Started porting some of the old name server scripts. Agreed that a 1-for-1 rewrite is not appropriate for the new system (many operations will be simplified in the new system)

=== ParaSite ===
* Test site for release 3 is live: test.parasite.wormbase.org
* Testing in progress
** New Loa loa almost finished
** E. canadensis started
* FTP site nearly done. Needs release notes (KH)
* Home page now includes a panel summarising what is new in this release
** Two new genomes
** Updates of 5 other genomes
** REST API
** New BLAST
** RNASeq views for tapeworm genomes (Sanger data)
** Various bug fixes
* For the next release, need to import the gene descriptions (product names) generated by Avril Coghlan as part of the 50 HGP analysis

=== Curation / Data integration ===

* Caenorhabditis
** EMBL dumping code changes to support UniProt-recommended product names
** Cleanup of pseudogene Brief_identification, in preparation for ENA submissions
** Work done toward splitting C. briggsae "chrun" into its constituent supercontigs (with a view to removing chrun, chrI_random etc. in a future release)
* Parasitic worms
** O. volvulus
*** Some gene model and GO term curation done (JL used EBI protein2go to do the GO curation)
** S. ratti
*** Some curation will be done this week, to go into the WS250 build
*** S. ratti will become a UniProt reference proteome from October
**** Proteins will go into Panther, and thus be available in the TE tool
** Brugia
*** v3 assembly ENA submission to saga seems to be nearing completion
*** v4 annotation transfer work on-going
*** UCSC hub for v3/v4 comparison (including CACTUS SNAKE track)
* Oscheius
** New nematode genome from South African lab
** Performed CEGMA analysis of the assembly (MP). Found to be grossly incomplete. Reported back to the authors. Integration into WormBase on ice
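
Splitting a pseudo-chromosome such as chrun into its supercontigs also means moving every annotation from pseudo-chromosome coordinates to supercontig coordinates. A minimal sketch of that remapping follows, assuming a hypothetical layout table: the names <code>sctg_1</code> to <code>sctg_3</code> and all coordinates are invented for illustration, the real layout would come from the assembly's own scaffold records, and this is not the build code.

```python
from bisect import bisect_right

# Hypothetical layout of supercontigs on a pseudo-chromosome:
# (name, start, end), 1-based inclusive, sorted by start, with gaps in between.
LAYOUT = [
    ("sctg_1", 1, 1000),
    ("sctg_2", 1101, 1600),
    ("sctg_3", 1701, 2500),
]


def to_supercontig(pos, layout=LAYOUT):
    """Map a 1-based pseudo-chromosome position to (supercontig, local position).

    Returns None when the position falls in a gap or outside the layout.
    """
    starts = [start for _, start, _ in layout]
    index = bisect_right(starts, pos) - 1  # last supercontig starting at or before pos
    if index < 0:
        return None
    name, start, end = layout[index]
    if pos > end:
        return None  # position lies in the gap after this supercontig
    return name, pos - start + 1
```

For example, <code>to_supercontig(1101)</code> gives <code>("sctg_2", 1)</code>, while a position inside a gap, such as 1050, gives <code>None</code>. Features that span a gap would need splitting or manual review, which this sketch does not attempt.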

== July 16, 2015 ==

In attendance: KH, MP, BB, JL, PD, GW, TD
Minuted by MP
Start 10:35

=== ParaSite release 3 ===
* planned for 30th July
* RNASeq tracks with data from 2 publications ready for release (JL)
* assembly hubs will work on ParaSite and UCSC (JL)
* work is being done to improve the display of BigWig files on ParaSite (JL, BB)
* webcode bugfixes rolled into ParaSite 3 (BB)
* BLAST reworked to link results to WormBase-Core for WBC species and their GBrowse (BB)
* REST API needs some testing
* example IDs are being reworked (BB/KH)
* steering group meeting coming up, so talks need to be planned (KLH)

=== Infrastructure ===
* production moved to GitHub, CVS kept only as an archive (MP)
* bug tracker consolidation into Jira (PD)
** GitHub seqcur tracker migrated (PD)
** BitBucket seqcur tracker being moved by EBI systems (KH)
** RT migration planned (MT)
* Datomic
** new GeneAce import (TD)
** open for wider testing of the [http://db.wormbase.org:8220/view/gene/WBGene00003020 GenEnTonic/TrAceView] web interface, as backup/restore amounts to running a script for about 2 minutes (TD, MT)
** get temporary resources to test the build process (GW/KLH)
* should there be a separate ParaSite blog, to prevent too many parasite posts on the main blog? (BB, JL, KH)
* GW attended WebApollo seminar run by VectorBase / EnsEMBL-Protists (GW)

=== Data Curation ===
* ''C.briggsae'' chromosome cleanup for chromosome un and rnd (PD)
* clones removed from toplevel ''C.elegans'' chromosomes (PD)
* Brief_Identification cleanup for the collaboration with UniProt (PD)
* tracking (last updated) rolled into the gene curation tool (PD)
** last reviewed also needs to be designed/added (PD)
* ''B.malayi'' v3.1 submission from WS248 got delayed. Problems fixed and resubmitted (MP)
* RNASeq (GW)
** 432 new ''C.elegans'' experiments added to WS250 (GW)
** mockups of RNASeq display improvements in collaboration with Julie Ahringer (GW)
* MassSpec (Wen / GW)
** single worm data from Angus Lamond’s lab (Dundee)
** new peptides / translation levels / post-translational modifications

=== New genomes ===
* new ''C.remanei'' assembly and ''C.elegans'' Hawaii strain added to Cactus (MP)
* new ''C.remanei'' alternate assembly added to build process / core-databases (PD)
* ''B.malayi'' v4
** RATT gene projection and Augustus prediction done (MP/Sanger Parasite groups)
** STAR alignments for RNASeq done (GW)
** ACeDB database done (MP)
** display and gene model refinements in progress (MP)
  
 
== July 9, 2015 ==

In attendance: KH, MP, BB, JL, PD, GW
Minuted by PD
Start 16:00

=== Database migration ===

* Thomas's last day is 18th Sept.
* Documentation, documentation, documentation
** Ideal: anyone can take an ACeDB database and load it into a Datomic instance
** Documentation is on the [https://github.com/WormBase/db/wiki database github repository]
** Not everyone has access to the db repository currently.
** There may be parts of the docs that need to be more prescriptive
* Datomic set-up
** Gary had trouble setting up Datomic on his desktop machine
** A better test would be to document/try setting up Datomic on a fresh AWS instance. Can it be done from the documentation?
* Datomic build
** WS250: GW to take a 250 ACeDB and try to work solely from the documentation
** If there are any issues, use Thomas's expertise while he is still here and improve the documentation
** WS249 could be used as a test case even though a Datomic database already exists (rather than waiting for WS250)
* Models updates
** Done via an augmented models file to make the conversion.
** Simple model additions = simple extension to the augmented file
** Complex changes/reworking of classes might not be so straightforward. Need to be aware of the process for doing this
* Tools development: [http://db.wormbase.org:8120/colonnade Colonnade] and the ACeDB-like tree editor [http://db.wormbase.org:8120/view/gene/WBGene00004013 TrACeView]
** Written in [https://github.com/clojure/clojurescript ClojureScript] (Clojure compiled into JS)
* Thomas's replacement
** Originally a database developer, but now possibly looking for some web dev skills
** Target a Clojure programmer?

=== Curation / Annotation ===

* New Brugia assembly (MP)
** State of the genome
*** "Final" version
*** "No bacterial contamination" as Avril's code has been used.
*** Chromosome naming and identification (chromosomes scaffolded inc. X/Y + unlocalised contigs) (20-30 sequences where the old assembly had >2000)
*** Haplotype sequences in additional file
** Annotation
*** 16% (1.19/gene) duplication in CEGMA genes; old v3 assembly 7% (1.08/gene); TIGR >20%
*** Gene set will be projected from the old assembly with RATT, plus gap filling and extension from Augustus.
*** Training Augustus on old models (min. 2 exons with good intergenic spacing).
*** Issues will be:
**** Partially mapped manually curated genes
**** Curated and NOT mapped (use Exonerate to recover).
**** Substring models: already has code for flattening them down.
** STAR RNASeq alignments (GW)

* New C. remanei genome (PD)
** Strain PX356, from Phillips lab (published)
*** Generated setup configs etc.
*** Have an e! database containing the genome
*** Annotations provided by user assessed and found to be bad
**** CDS features found with more exons than CDS features under a parent mRNA.

* C. elegans reference genome
** Yet another C. elegans reference sequence error paper (Zhang).
*** Some examples look odd, e.g. a deletion in an intron where others have reported only a SNP.
*** Some inserts fix genes.

* Re-working modENCODE RNASeq data
** Working on test code for generating modENCODE expression graphs with the scientific scrutiny of Julie Ahringer (GW)
** Will work with Sibyl, once the science is agreed, to try to utilise the code for web usage.

* Gene curation
** Continues as usual
** Finished looking at IWM genes (PD)
** Organise meeting with new UniProt C. elegans curator
** Brief discussion on capturing date-last-reviewed (to be followed up at next meeting)
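
The exon/CDS inconsistency noted for the supplied C. remanei annotations is the kind of problem a small structural check can surface. The sketch below is one plausible sanity check on GFF3 input, not necessarily the assessment that was actually run: for each parent mRNA it counts exon and CDS children and reports CDS segments that are not contained in any exon of the same parent. The function name and the demo records are invented for illustration.

```python
from collections import defaultdict


def check_cds_against_exons(gff3_lines):
    """Return {mRNA id: (n_exons, n_cds, cds_not_in_any_exon)} for suspect transcripts."""
    exons = defaultdict(list)
    cds = defaultdict(list)
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        ftype, start, end = cols[2], int(cols[3]), int(cols[4])
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        for parent in attrs.get("Parent", "").split(","):
            if not parent:
                continue
            if ftype == "exon":
                exons[parent].append((start, end))
            elif ftype == "CDS":
                cds[parent].append((start, end))

    problems = {}
    for parent, segments in cds.items():
        parent_exons = exons.get(parent, [])
        stray = [(s, e) for s, e in segments
                 if not any(xs <= s and e <= xe for xs, xe in parent_exons)]
        # A transcript cannot have more CDS segments than exons,
        # and every CDS segment must sit inside one of its exons.
        if stray or len(segments) > len(parent_exons):
            problems[parent] = (len(parent_exons), len(segments), stray)
    return problems


# Invented demo records: t1 has a CDS segment with no matching exon, t2 is consistent.
demo = [
    "##gff-version 3",
    "ctg1\tsrc\tmRNA\t1\t1000\t.\t+\t.\tID=t1",
    "ctg1\tsrc\texon\t1\t200\t.\t+\t.\tParent=t1",
    "ctg1\tsrc\tCDS\t50\t200\t.\t+\t0\tParent=t1",
    "ctg1\tsrc\tCDS\t400\t600\t.\t+\t0\tParent=t1",
    "ctg1\tsrc\tmRNA\t2000\t3000\t.\t+\t.\tID=t2",
    "ctg1\tsrc\texon\t2000\t3000\t.\t+\t.\tParent=t2",
    "ctg1\tsrc\tCDS\t2100\t2900\t.\t+\t0\tParent=t2",
]
report = check_cds_against_exons(demo)
```

Here only <code>t1</code> is reported, with one exon, two CDS segments and the stray segment (400, 600).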

=== Parasite ===

* RNASeq data display (JL & BB)
** BigWig track display for RNASeq data for 3 cestodes
** Tracks look OK, but working through various ideas for display
** Grouped by study
** Track scaling
*** Dynamic normalisation causes some display issues, as it currently normalises over the loaded area and not the complete genome/chromosome/sequence.
*** Possible to compare tracks, but the scale will invariably be different.
*** Need to get scaling right so as not to smear out points of biological interest.
*** How do other browsers perform?
**** JBrowse appears to do the same as the e! browser
**** Look at UCSC track hub handling
** Possibly define a set of ubiquitous genes to normalise on?
*** Gary uses ama-1 for gleaning expression and RNASeq statistics, as it appears always to be expressed at a consistent level, but having a set of housekeeping genes might be a start.
*** ENCODE might have software for normalisation in rseq tools
** Are we doing things wrong/differently? No, as e! does the same, so not a major priority, but worth exploring.
* Imminent release of ParaSite 3 (~30th July) (BB + KH)
** Done
*** FTP dumps
*** BLAST dbs
*** REST API
** To do
*** MartBuild takes ~1 week
*** Clearing critical healthchecks
*** Search dumps
*** Testing!
**** 2 new species and 1 changed
** Misc
*** Forum for announcing new features
**** News feed page/blog
**** What's new from old releases logged and possibly turned into paragraphs for the release blog?
**** Possibly re-work the front page to give more prominence to news/information on what's new in the release.
*** GO_term population: dropped by EG as they have UniProt assistance; we may not run this pipeline in future releases.
*** Look at Google search optimisation and creating a more user-friendly top entry in the Google results, as the current Google choices are a bit odd.
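
The track scaling discussion above contrasts three choices: scaling to the loaded window, scaling to the whole sequence, and scaling to a set of reference genes. A minimal sketch of the difference, using invented per-base coverage values and hypothetical function names (this is not the browser's code, and the reference region stands in for a housekeeping gene such as ama-1):

```python
from statistics import mean


def scale_window(coverage, lo, hi):
    """Dynamic scaling: divide by the maximum inside the loaded window only."""
    window = coverage[lo:hi]
    peak = max(window) or 1.0  # avoid dividing by zero on an all-zero window
    return [v / peak for v in window]


def scale_global(coverage, lo, hi):
    """Fixed scaling: divide by the maximum over the whole sequence."""
    peak = max(coverage) or 1.0
    return [v / peak for v in coverage[lo:hi]]


def scale_by_reference(coverage, lo, hi, ref_regions):
    """Divide by mean coverage over reference (e.g. housekeeping) gene regions."""
    ref = [v for start, end in ref_regions for v in coverage[start:end]]
    ref_mean = (mean(ref) if ref else 0.0) or 1.0
    return [v / ref_mean for v in coverage[lo:hi]]


# Invented coverage: a weak peak at index 2 and a strong peak at index 7.
coverage = [0.0, 2.0, 4.0, 2.0, 0.0, 0.0, 50.0, 100.0, 50.0, 0.0]

weak_dynamic = scale_window(coverage, 0, 5)    # weak peak is stretched to full height
weak_global = scale_global(coverage, 0, 5)     # weak peak stays small next to the strong one
weak_reference = scale_by_reference(coverage, 0, 5, [(1, 4)])
```

With window scaling the weak peak reaches 1.0 whenever the strong peak is off screen, so two views of the same track are not comparable. Global scaling keeps views comparable but flattens weak signal, and reference scaling expresses coverage relative to a gene assumed to be stably expressed.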

=== Communication ===

* Working on retirement of the old Sanger CVS for pipeline code.
** Have always had a Git mirror; this will become the primary source
** Need to consider the models and wspec in general (tagging etc.)
** Need to communicate this to Caltech (new place to pick up models)
* Discussion about all the places we store work tickets
** RT-sanger worm-bug@sanger.ac.uk >1000 of paper tickets
** https://bitbucket.org/pauld/seqcur_ticketing-system 84 tickets
** GitHub
*** github website - website issues and helpdesk
*** github pipeline - our code
*** https://github.com/Paul-Davis/seqcur_ticketing-system/issues 3 issues
** JIRA - EBI projects (incl. Ensembl)
** Aim: rationalise
*** Build code tickets should go on the wormbase-pipeline GitHub repo
*** PD will work through the bitbucket ones (mostly tier II)
*** JIRA (KH is the main user of this system): should migrate to something visible project-wide

=== Misc ===

* Work for Ann Hart: checking for motifs conserved across Caenorhabditis with PCactus (MP)

Latest revision as of 11:04, 27 July 2015
