Difference between revisions of "Hinxton 2015.07- Meeting minutes"

From WormBaseWiki
Line 2: Line 2:
  
 
== July 9, 2015 ==
 
== July 9, 2015 ==
 +
 +
In attendance: KH, MP, BB, JL, PD, GW
 +
Minuted by PD
 +
Start 16:00
 +
 +
=== Thomas's Departure ===
 +
 +
*Last day 18th Sept.
 +
* Documentation Documentation Documentation.....
 +
** Ideal: Anyone can take an ACeDB database and load it into a Datomic instance
 +
** Documentation is on the database GitHub repository
 +
*** Not everyone has access at the moment.
 +
*** There may be elements that Thomas assumes are so trivial that they have not been documented
 +
 +
* Action items:
 +
** WS250: Gary to take a WS250 ACeDB database and try to work solely from the documentation
 +
*** For any issues, use Thomas's expertise while he is still here and improve the documentation
 +
** WS249 could be used as a test case even though a Datomic instance already exists?
 +
 +
* Model updates are done via an augmented models file used to make the conversion.
 +
** Simple model additions = Simple extension to the augmented file
 +
** Complex changes/reworking of classes = augmented file needs munging; need to look at this process
 +
 +
* Gary has been through some of the steps necessary to set up a Datomic database, but this was on his desktop machine, which brought architecture issues, so he defaulted to an instance that Thomas had set up.
 +
** A better test would be to document/try setting up Datomic on a fresh AWS instance. Can it be done from the documentation?
 +
 +
* Tools development: Colonnade and the ACeDB-like tree editor
 +
** Written in ClojureScript (Clojure compiled into JS)
 +
 +
=== Job Advert ===
 +
 +
* Originally a database developer, but now possibly looking for some web dev skills
 +
** Clojure programming
 +
 +
 +
=== ParaSite ===
 +
 +
==== Working on ====
 +
 +
* bigwig track display for 3 cestodes
 +
** Tracks look OK, but working with various ideas for display
 +
** Grouped by study
 +
** Dynamic normalisation causes some display issues, as it currently normalises over the loaded area and not the complete genome/chromosome/sequence.
 +
*** Webcode doesn't appear to have a mechanism to sort out scaling
 +
** Possible to compare tracks but the scale will invariably be different.
 +
** Need to get this correct so as not to smear out points of biological interest.
 +
*** How do other browsers perform?
 +
**** JBrowse appears to do the same as the e! browser
 +
**** Look at UCSC track hub handling
 +
** Possibly define a set of ubiquitous genes to normalise on?
 +
** Gary uses ana-1 for gleaning expression and RNASeq statistics as it appears to always be expressed at a consistent level, but having a set of housekeeping genes might be a start.
 +
** ENCODE might have software for normalisation in rseq tools
 +
*** Are we doing things wrong/different? No, as e! does the same, so not a major priority, but worth exploring.
 +
 +
* Imminent release of ParaSite 3 (~30th July)
 +
** Finalising
 +
*** blast dbs
 +
*** REST
 +
*** healthchecks
 +
** Other
 +
*** GO_term population: dropped by EG as they have UniProt assistance; we do not, so we have to run this pipeline.
 +
*** Ready for search dumps, if no critical healthcheck failures
 +
*** GFF dumps all in place
 +
*** Need to do testing
 +
**** 2 New Species and 1 changed
 +
*** MartBuild takes ~1 week
 +
 +
* Better forum for announcing new features
 +
** News feed page/blog
 +
** What's new from old releases logged and possibly turned into paragraphs for release blog?
 +
** Possibly re-work the front page to give more prominence to news/information on what's new in the release.
 +
 +
* Links back to WormBase central are in place for release 3.
 +
 +
* Look at Google search optimisation and creating a more user-friendly top entry in the Google results, as the current Google choices are a bit odd.
 +
 +
=== WormBase Central ===
 +
 +
* Gary
 +
** STAR RNASeq alignments for the Brugia v4 assembly
 +
** Yet another C. elegans reference sequence error paper.
 +
*** Some examples look odd, e.g. a deletion in an intron where others have reported only a SNP.
 +
*** Some inserts fix genes.
 +
** Working on test code for generating modENCODE expression graphs with the scientific scrutiny of J.A.
 +
** Gary will work with Sibyl, once the science is satisfactory, to try to utilise the code for web usage.
 +
 +
* Michael
 +
** Ann Hart genes conserved across Caenorhabditis, checking for motifs
 +
** New Brugia assembly
 +
*** Chromosome naming and identification (Chromosomes scaffolded inc. X/Y + unlocalised contigs) (20-30 sequences where old assembly was >2000)
 +
*** Training Augustus on old models (Min 2 exons with good intergenic spacing).
 +
*** This is the FINAL version of the genome
 +
**** reasonably finished state
 +
**** Haplotype sequences in additional file
 +
**** No bacterial contamination as Avril's code has been used.
 +
**** 18% duplication in CEGMA genes (old assembly <10%, TIGR >20%)
 +
*** Gene set will be projected from old assembly + gap filling and extension from Augustus.
 +
**** Issues will be:
 +
***** Partially mapped manually curated genes
 +
***** Curated and NOT mapped (Use Exonerate to recover).
 +
***** Substring models: already has code for flattening them down.
 +
 +
* Paul
 +
** Working on retirement of the old Sanger CVS for pipeline code.
 +
*** Have always had a Git mirror; this will become the primary source
 +
*** Need to consider the models and wspec in general.
 +
*** C. remanei PX356
 +
**** Generated setup configs etc.
 +
**** Have an e! database containing genome
 +
**** Annotations provided by user assessed and found to be bad
 +
***** CDS features found with more exons than CDS features under a parent mRNA.
 +
*** Trying to meet with the new UniProt C. elegans curator
 +
*** Finished looking at IWM genes
 +
 +
* Discussion about all the places we store work tickets
 +
** RT-sanger worm-bug@sanger.ac.uk >1000 of paper tickets
 +
** https://bitbucket.org/pauld/seqcur_ticketing-system 84 tickets
 +
** GitHub
 +
*** github website - website issues and helpdesk
 +
*** github pipeline - our code
 +
*** https://github.com/Paul-Davis/seqcur_ticketing-system/issues 3 issues
 +
** JIRA - Ensembl Genomes
 +
*Need to limit to as few as possible
 +
** What can be lost
 +
*** Paul will work through the Bitbucket ones; mostly tier II
 +
*** JIRA: Kev is the main user of this system
 +
 +
* Need to discuss with the projects as a whole!!!!!!!!

Revision as of 10:22, 10 July 2015

July 2015

July 9, 2015

In attendance: KH, MP, BB, JL, PD, GW. Minuted by PD. Start 16:00.

Thomas's Departure

  • Last day 18th Sept.
  • Documentation Documentation Documentation.....
    • Ideal: Anyone can take an ACeDB database and load it into a Datomic instance
    • Documentation is on the database GitHub repository
      • Not everyone has access at the moment.
      • There may be elements that Thomas assumes are so trivial that they have not been documented
  • Action items:
    • WS250: Gary to take a WS250 ACeDB database and try to work solely from the documentation
      • For any issues, use Thomas's expertise while he is still here and improve the documentation
    • WS249 could be used as a test case even though a Datomic instance already exists?
  • Model updates are done via an augmented models file used to make the conversion.
    • Simple model additions = Simple extension to the augmented file
    • Complex changes/reworking of classes = augmented file needs munging; need to look at this process
  • Gary has been through some of the steps necessary to set up a Datomic database, but this was on his desktop machine, which brought architecture issues, so he defaulted to an instance that Thomas had set up.
    • A better test would be to document/try setting up Datomic on a fresh AWS instance. Can it be done from the documentation?
  • Tools development: Colonnade and the ACeDB-like tree editor
    • Written in ClojureScript (Clojure compiled into JS)

Job Advert

  • Originally a database developer, but now possibly looking for some web dev skills
    • Clojure programming


ParaSite

Working on

  • bigwig track display for 3 cestodes
    • Tracks look OK, but working with various ideas for display
    • Grouped by study
    • Dynamic normalisation causes some display issues, as it currently normalises over the loaded area and not the complete genome/chromosome/sequence.
      • Webcode doesn't appear to have a mechanism to sort out scaling
    • Possible to compare tracks but the scale will invariably be different.
    • Need to get this correct so as not to smear out points of biological interest.
      • How do other browsers perform?
        • JBrowse appears to do the same as the e! browser
        • Look at UCSC track hub handling
    • Possibly define a set of ubiquitous genes to normalise on?
    • Gary uses ana-1 for gleaning expression and RNASeq statistics as it appears to always be expressed at a consistent level, but having a set of housekeeping genes might be a start.
    • ENCODE might have software for normalisation in rseq tools
      • Are we doing things wrong/different? No, as e! does the same, so not a major priority, but worth exploring.
  • Imminent release of ParaSite 3 (~30th July)
    • Finalising
      • blast dbs
      • REST
      • healthchecks
    • Other
      • GO_term population: dropped by EG as they have UniProt assistance; we do not, so we have to run this pipeline.
      • Ready for search dumps, if no critical healthcheck failures
      • GFF dumps all in place
      • Need to do testing
        • 2 New Species and 1 changed
      • MartBuild takes ~1 week
  • Better forum for announcing new features
    • News feed page/blog
    • What's new from old releases logged and possibly turned into paragraphs for release blog?
    • Possibly re-work the front page to give more prominence to news/information on what's new in the release.
  • Links back to WormBase central are in place for release 3.
  • Look at Google search optimisation and creating a more user-friendly top entry in the Google results, as the current Google choices are a bit odd.
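
A minimal sketch of the dynamic normalisation issue noted under the bigwig tracks above, assuming a track is simply a list of per-position coverage values. The function name scale_window and the toy coverage values are hypothetical and are not taken from the e! webcode; the sketch only illustrates why scaling to the loaded area's own maximum gives a different picture from scaling to the maximum of the complete sequence.

```python
def scale_window(values, start, end, global_max=None):
    """Scale values[start:end] for display.

    If global_max is None, normalise on the window's own maximum
    (the "dynamic" behaviour described in the minutes). Otherwise
    normalise on the supplied whole-sequence maximum.
    """
    window = values[start:end]
    denom = max(window) if global_max is None else global_max
    if denom == 0:
        # Avoid division by zero for an empty-coverage region.
        return [0.0 for _ in window]
    return [v / denom for v in window]


# Hypothetical per-position coverage with one strong peak at index 3.
coverage = [1, 2, 1, 50, 2, 1, 2, 1]

# View a region that does not include the peak.
dynamic = scale_window(coverage, 4, 8)
fixed = scale_window(coverage, 4, 8, global_max=max(coverage))

print(dynamic)
print(fixed)
```

With window-based scaling the low-coverage region fills the full height of the track, so two tracks viewed over the same region end up on different scales. With a sequence-wide maximum the region stays small relative to the peak, which keeps tracks comparable but can flatten regions of lower coverage.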

WormBase Central

  • Gary
    • STAR RNASeq alignments for the Brugia v4 assembly
    • Yet another C. elegans reference sequence error paper.
      • Some examples look odd, e.g. a deletion in an intron where others have reported only a SNP.
      • Some inserts fix genes.
    • Working on test code for generating modENCODE expression graphs with the scientific scrutiny of J.A.
    • Gary will work with Sibyl, once the science is satisfactory, to try to utilise the code for web usage.
  • Michael
    • Ann Hart genes conserved across Caenorhabditis, checking for motifs
    • New Brugia assembly
      • Chromosome naming and identification (Chromosomes scaffolded inc. X/Y + unlocalised contigs) (20-30 sequences where old assembly was >2000)
      • Training Augustus on old models (Min 2 exons with good intergenic spacing).
      • This is the FINAL version of the genome
        • reasonably finished state
        • Haplotype sequences in additional file
        • No bacterial contamination as Avril's code has been used.
        • 18% duplication in CEGMA genes (old assembly <10%, TIGR >20%)
      • Gene set will be projected from old assembly + gap filling and extension from Augustus.
        • Issues will be:
          • Partially mapped manually curated genes
          • Curated and NOT mapped (Use Exonerate to recover).
          • Substring models: already has code for flattening them down.
  • Paul
    • Working on retirement of the old Sanger CVS for pipeline code.
      • Have always had a Git mirror; this will become the primary source
      • Need to consider the models and wspec in general.
      • C. remanei PX356
        • Generated setup configs etc.
        • Have an e! database containing genome
        • Annotations provided by user assessed and found to be bad
          • CDS features found with more exons than CDS features under a parent mRNA.
      • Trying to meet with the new UniProt C. elegans curator
      • Finished looking at IWM genes
  • Need to discuss with the projects as a whole!!!!!!!!