Difference between revisions of "Hinxton 2015.07- Meeting minutes"

* Discussion about all the places we store work tickets
** RT-sanger worm-bug@sanger.ac.uk >1000 of paper tickets
*** github pipeline - our code
*** https://github.com/Paul-Davis/seqcur_ticketing-system/issues 3 issues
** JIRA - EBI projects (incl. Ensembl)
** Aim: rationalise
*** Build code tickets should go on the wormbase-pipeline github repo
*** PD will work through the bitbucket ones; mostly tier II
*** JIRA (KH is the main user of this system) should migrate to something visible project-wide

Revision as of 13:25, 10 July 2015

July 2015

July 9, 2015

In attendance: KH, MP, BB, JL, PD, GW

Minuted by PD

Start 16:00

Database migration

  • Thomas's last day is 18th Sept.
  • Documentation Documentation Documentation.....
    • Ideal: anyone can take an ACeDB database and load it into a Datomic instance
    • Documentation is on the database GitHub repository (https://github.com/WormBase/db/wiki)
    • Not everyone has access to the db repository currently.
    • There may be parts of the docs that need to be more prescriptive
  • Datomic set-up
    • Gary had trouble setting up Datomic on his desktop machine
    • A better test would be to try to set up Datomic on a fresh AWS instance. Can it be done from the documentation alone?
  • Datomic build
    • WS250: GW to take a WS250 ACeDB database and try to work solely from the documentation
    • If any issues arise, use Thomas's expertise while he is still here and improve the documentation
    • WS249 could be used as a test case even though a Datomic database already exists (rather than waiting for WS250)
  • Models updates
    • Done via an augmented models file to make the conversion.
    • Simple model additions = Simple extension to the augmented file
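    • Complex changes/reworking of classes might not be so straightforward. Need to be aware of the process for doing this
  • Tools development: Colonnade (http://db.wormbase.org:8120/colonnade) and the ACeDB-like tree editor TrACeView (http://db.wormbase.org:8120/view/gene/WBGene00004013)
    • Written in ClojureScript (https://github.com/clojure/clojurescript), i.e. Clojure compiled into JS
  • Thomas's replacement
    • Originally a database developer, but now possibly looking for some web dev skills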
    • Target a Clojure programmer?


Curation / Annotation

  • New Brugia assembly (MP)
    • state of the genome
      • "Final" version
      • "No bacterial contamination" as Avril's code has been used.
      • Chromosome naming and identification (Chromosomes scaffolded inc. X/Y + unlocalised contigs) (20-30 sequences where old assembly was >2000)
      • Haplotype sequences in additional file
    • Annotation
      • 16% (1.19/gene) duplication in CEGMA genes; old v3 assembly 7% (1.08/gene); TIGR >20%
      • Gene set will be projected from the old assembly, plus gap filling and extension from Augustus.
      • Training Augustus on old models (min. 2 exons with good intergenic spacing).
      • Issues will be:
        • Partially mapped manually curated genes
        • Curated and NOT mapped (use Exonerate to recover).
        • Substring models: code for flattening them down already exists.
    • STAR RNASeq alignments (GW)
  • New C. remanei genome (PD)
    • Strain PX356, from the Phillips lab (published)
      • Generated setup configs etc.
      • Have an e! database containing the genome
      • Annotations provided by the user were assessed and found to be bad
        • CDS features found with more exons than CDS features under a parent mRNA.
  • C. elegans reference genome
    • Yet another C. elegans reference sequence error paper (Zhang).
      • Some examples look odd, e.g. a deletion in an intron where others have reported only a SNP.
      • Some inserts fix genes.
  • Re-working modENCODE RNASeq data
    • Working on test code for generating modENCODE expression graphs, with scientific scrutiny from Julie Ahringer (GW)
    • Will work with Sibyl, once the science is agreed, to try to utilise the code for web usage.
  • Gene curation
    • Continues as usual
    • Finished looking at IWM genes (PD)
    • Organise meeting with new UniProt C. elegans curator
    • Brief discussion on capturing date-last-reviewed (to be followed up at next meeting)


Parasite

  • RNASeq data display (JL & BB)
    • bigwig track display for RNASeq data for 3 cestodes
    • Tracks look OK, but working with various ideas for display
    • Grouped by study
    • Track scaling
      • Dynamic normalisation causes some display issues, as it currently normalises over the loaded area and not the complete genome/chromosome/sequence.
      • It is possible to compare tracks, but the scale will invariably be different.
      • Need to get scaling right so as not to smear out points of biological interest.
      • How do other browsers perform?
        • JBrowse appears to do the same as the e! browser
        • Look at UCSC track hub handling
    • Possibly define a set of ubiquitous genes to normalise on?
      • Gary uses ana-1 for gleaning expression and RNASeq statistics, as it appears to always be expressed at a consistent level, but having a set of housekeeping genes might be a start.
      • ENCODE might have software for normalisation in rseq tools
    • Are we doing things wrong/differently? No, as e! does the same, so this is not a major priority, but it is worth exploring.
  • Imminent release of ParaSite 3 (~30th July) (BB + KH)
    • Done
      • FTP dumps
      • BLAST dbs
      • REST API
    • To do
      • MartBuild takes ~1 week
      • Clearing critical healthchecks
      • Search dumps
      • Testing!
      • 2 new species and 1 changed
    • Misc
      • Forum for announcing new features
        • News feed page/blog
        • What's new from old releases logged and possibly turned into paragraphs for a release blog?
        • Possibly re-work the front page to give more prominence to news/information on what's new in the release.
      • GO_term population: dropped by EG as they have UniProt assistance; we may not run this pipeline in future releases.
      • Look at Google search optimisation and creating a more user-friendly top entry in the Google results, as the current Google choices are a bit odd.
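
The track scaling issue noted under RNASeq data display can be illustrated with a minimal sketch. It is not the e! or JBrowse implementation. The function names and the toy coverage values are hypothetical, and it assumes each track is a simple list of per-base coverage values. It contrasts dynamic scaling (normalising against the loaded window only) with scaling against the whole sequence.

```python
def scale_track(values, reference_max):
    """Scale coverage values against a fixed reference maximum."""
    if reference_max <= 0:
        raise ValueError("reference_max must be positive")
    return [v / reference_max for v in values]


def scale_window_dynamic(coverage, start, end):
    """Dynamic scaling: normalise against the maximum of the loaded window only."""
    window = coverage[start:end]
    return scale_track(window, max(window))


def scale_window_global(coverage, start, end):
    """Global scaling: normalise against the maximum of the whole sequence."""
    return scale_track(coverage[start:end], max(coverage))


# Toy per-base coverage for one sequence (made-up numbers, not real data).
coverage = [2, 4, 2, 100, 80, 2, 4, 2]

# The same low-coverage region, viewed under the two scaling schemes.
low_region_dynamic = scale_window_dynamic(coverage, 0, 3)
low_region_global = scale_window_global(coverage, 0, 3)

print(low_region_dynamic)
print(low_region_global)
```

Under dynamic scaling the low-coverage window fills the full height of the track, so it looks as strongly expressed as the high-coverage region would in its own window. Under global scaling the two remain comparable. Normalising on a set of housekeeping genes would replace `max(coverage)` with a reference value derived from those genes.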


Communication

  • Working on retirement of the old Sanger CVS for pipeline code.
    • Have always had a Git mirror; this will become the primary source
    • Need to consider the models and wspec in general (tagging etc)
    • Need to communicate this with Caltech (new place to pick up models)
  • Discussion about all the places we store work tickets

Misc

  • Work for Ann Hart: checking for motifs conserved across Caenorhabditis with PCactus (MP)