Revision as of 13:25, 10 July 2015

July 2015

July 9, 2015

In attendance: KH, MP, BB, JL, PD, GW
Minuted by PD
Start 16:00

Database migration

Thomas last day 18th Sept.
Documentation Documentation Documentation.....
- Ideal: Anyone can take an ACeDB database and load it into a Datomic instance
- Documentation is on the database github repository
- Not everyone has access to the db repository currently.
- There may be parts of the docs that need to be more prescriptive
Datomic set-up
- Gary had trouble setting up Datomic on his desktop machine
- Better test would be to document/try and set up Datomic from a fresh AWS instance? Can it be done from documentation?
Datomic build
- WS250: GW to take a WS250 ACeDB database and try to work solely from the documentation
- Any issues: use Thomas's expertise while he is still here and improve the documentation
- WS249 could be used as a test case even though a Datomic database already exists (rather than waiting for WS250)
Models updates
- Done via an augmented models file to make the conversion.
- Simple model additions = Simple extension to the augmented file
- Complex changes/reworking of classes might not be so straightforward. Need to be aware of the process for doing this
Tools development: Colonnade and the ACeDB-like tree editor TrACeView
- Written in ClojureScript (Clojure compiled into JS)
Thomas's replacement
- Originally a database developer, but now poss looking for some web dev skills
- Target a Clojure programmer?

Curation / Annotation

New Brugia assembly (MP)
- state of the genome
  - "Final" version
  - "No bacterial contamination" as Avril's code has been used.
  - Chromosome naming and identification (Chromosomes scaffolded inc. X/Y + unlocalised contigs) (20-30 sequences where old assembly was >2000)
  - Haplotype sequences in additional file
- Annotation
  - 16% (1.19 /gene) duplication in CEGMA genes.........old v3 assembly 7% (1.08/gene) TIGR >20%
  - Gene set will be Projected from old assembly + gap filling and extension from Augustus.
  - Training Augustus on old models (Min 2 exons with good intergenic spacing).
  - Issues will be:
    - Partially mapped manually curated genes
    - Curated and NOT mapped (Use Exonerate to recover).
    - Substring models......already has code for flattening them down.
- STAR RNASeq alignments (GW)

New C. remanei genome (PD)
- Strain PX356, from Phillips lab (published)
  - Generated setup configs etc.
  - Have an e! database containing genome
  - Annotations provided by user assessed and found to be bad
    - CDS features found with more exons than CDS features under a parent mRNA.

C. elegans reference genome
- Yet another C. elegans reference sequence error paper (Zhang).
  - Some examples look odd.......deletion in intron where others have reported only a SNP.
  - Some inserts fix genes.

Re-working modENCODE RNASeq data
- Working on test code for generating modENCODE expression graphs with the scientific scrutiny of Julie Ahringer (GW)
- Will work with Sibyl once the science is happy to try and utilise the code for web usage.

Gene curation
- Continues as usual
- Finished looking at IWM genes (PD)
- Organise meeting with new UniProt C. elegans curator
- Brief discussion on capturing date-last-reviewed (to be followed up at next meeting)

Parasite

RNASeq data display (JL & BB)
- bigwig track display for RNASeq data for 3 cestodes
- Tracks look OK but working with various ideas for display
- Grouped by study
- Track scaling
  - Dynamic normalisation causes some display issues as currently normalises over loaded area and not complete genome/chromosome/sequence.
  - Possible to compare tracks but the scale will invariably be different.
  - Need to get scaling right so as to not smear out points of biological interest.
  - How do other browsers perform?
    - JBrowse appears to do the same as the e! browser
    - Look at UCSC track hub handling
- Possibly define a set of ubiquitous genes to normalise on?
  - Gary uses ana-1 for gleaning expression and RNASeq statistics as it appears to always be expressed at consistent level, but having a set of housekeeping genes might be a start.
  - ENCODE might have software for normalisation in rseq tools
- Are we doing things wrong/different? No as e! does the same so not a major priority, but worth exploring.
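
The track-scaling issue noted above can be illustrated with a minimal Python sketch. This is not the e! or JBrowse code; the coverage values, function names and reference regions are all made up for illustration. It compares scaling a loaded window to its own maximum (the dynamic behaviour described), scaling to the maximum over the whole sequence, and scaling to the mean coverage over a chosen set of reference (e.g. housekeeping gene) regions.

```python
def scale_to_window(coverage, start, end):
    """Scale the loaded window to its own maximum (dynamic normalisation)."""
    window = coverage[start:end]
    peak = max(window) or 1  # fall back to 1 if the window has no signal
    return [value / peak for value in window]


def scale_to_global(coverage, start, end):
    """Scale the loaded window to the maximum over the whole sequence."""
    peak = max(coverage) or 1
    return [value / peak for value in coverage[start:end]]


def scale_to_reference(coverage, start, end, reference_regions):
    """Scale the loaded window by mean coverage over reference regions.

    reference_regions is a list of (start, end) index pairs, standing in
    for a hypothetical set of consistently expressed genes.
    """
    ref_values = [v for (s, e) in reference_regions for v in coverage[s:e]]
    ref_mean = sum(ref_values) / len(ref_values) if ref_values else 1
    return [value / (ref_mean or 1) for value in coverage[start:end]]


# Hypothetical per-base coverage: a weakly expressed region, then a strong peak.
coverage = [2, 4, 2, 0, 100, 50]

local = scale_to_window(coverage, 0, 3)    # weak region fills the full track height
global_ = scale_to_global(coverage, 0, 3)  # same region is tiny next to the peak
by_reference = scale_to_reference(coverage, 4, 6, [(0, 3)])
```

With window scaling, the weak region and the strong peak would each be drawn at full height when viewed separately, which is why tracks loaded over different areas are not directly comparable. A fixed denominator (whole-sequence maximum or a reference gene set) keeps the scale the same wherever the user is looking.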
Imminent release of ParaSite 3 (~30th July) (BB + KH)
- Done
  - FTP dumps
  - blast dbs
  - REST API
- To do
  - MartBuild takes ~1week
  - Clearing critical healthchecks
  - Search dumps
  - Testing!
  - 2 New Species and 1 changed
- Misc
  - Forum for announcing new features
    - News feed page/blog
    - What's new from old releases logged and poss turned into paragraphs for release blog?
    - Possibly re-work the front page to give more prominence to news/information on what's new in the release.
  - GO_term population, dropped by EG as have UniProt assistance, we may not run this pipeline in future releases.
  - Look at google search optimisation and creating a more user friendly top entry in the google results as current google choices are a bit odd.

Communication

Working on retirement of the old Sanger CVS for pipeline code.
- Have always had a Git mirror; this will become the primary source
- Need to consider the models and wspec in general (tagging etc)
- Need to communicate this with Caltech (new place to pick up models)
Discussion about all the places we store work tickets
- RT-sanger worm-bug@sanger.ac.uk >1000 of paper tickets
- https://bitbucket.org/pauld/seqcur_ticketing-system 84 tickets
- GitHub
  - github website - website issues and helpdesk
  - github pipeline - our code
  - https://github.com/Paul-Davis/seqcur_ticketing-system/issues 3 issues
- JIRA - EBI projects (incl Ensembl)
- Aim: rationalise
  - Build code tickets should go on the wormbase-pipeline GitHub repo
  - PD will work through the bitbucket ones.....mostly tier II
  - JIRA (KH the main user of this system), should migrate to something visible project-wide

Misc

Work for Ann Hart: checking for motifs conserved across Caenorhabditis with PCactus (MP)

@@ Line 7: / Line 7: @@
 Start 16:00
-=== Thomas's Departure ===
+=== Database migration ===
-*Last day 18th Sept.
+* Thomas last day 18th Sept.
 * Documentation Documentation Documentation.....
 ** Ideal: Anyone can take an ACeDB  database and load into a datomic instance
 ** Documentation is on the [https://github.com/WormBase/db/wiki database github repository]
-*** Not everyone has access at the moment.
+** Not everyone has access to the db repository currently.
-*** There may be elements that Thomas assumes are so trivial that they have not bee documented
+** There may be parts of docs that need to be more proscriptive
+* Datomic set-up
+** Gary had trouble setting up Datomic on his desktop machine
+** Better test would be to document/try and set up Datomic from a fresh AWS instance? Can it be done from documentation?
+* Datomic build
+** WS250 GW to take a 250 acedb and try and work solely from the documentation
+** Any issues then use Thomas's expertise while he is still here and improve the documentation
+** WS249 could be used as a test case even though a datomic already exists (rather than waiting for WS250)
+* Models updates
+** Done via an augmented models file to make the conversion.
+** Simple model additions = Simple extension to the augmented file
+** Complexed changes/reworking of classes might not be so straightforward. Need to be aware of process for doing this
+* Tools development: [http://db.wormbase.org:8120/colonnade Collonade] and the AceDb like tree editor [http://db.wormbase.org:8120/view/gene/WBGene00004013 TrACeView]
+** Written in [https://github.com/clojure/clojurescript ClojureScript] (Clojure compiled into JS)
+* Thomas's replacement
+** Originally a database developer, but now poss looking for some web dev skills
+** Target a Clojure programmer?
-* Action items:
+=== Curation / Annotation ===
-** WS250 Gary to take a 250 acedb and try and work solely from the documentation
-*** Any issues then use Thomas's expertise while he is still here and improve the documentation
-** WS249 could be used as a test case even though a datomic already exists????
-* Models updates are done via an augmented models file to make the conversion.
+* New Brugia assembly (MP)
-** Simple model additions = Simple extension to the augmented file
+** state of the genome
-**Complexed changes/reworking of classes = augmented file meeds munging.......need to look at this process
+*** "Final" version
+*** "No bacterial contamination" as Avril's code has been used.
+*** Chromosome naming and identification (Chromosomes scaffolded inc. X/Y + unlocalised contigs) (20-30 sequences where old assembly was >2000)
+*** Haplotype sequences in additional file
+** Annotation
+*** 16% (1.19 /gene) duplication in CEGMA genes.........old v3 assembly 7% (1.08/gene) TIGR >20%
+*** Gene set will be Projected from old assembly + gap filling and extension from Augustus.
+*** Training Augustus on old models (Min 2 exons with good intergenic spacing).
+*** Issues will be:
+**** Partially mapped manually curated genes
+**** Curated and NOT mapped (Use Exonerate to recover).
+**** Substring models......already has code for flattening them down.
+** STAR RNASeq alignments (GW)
-* Gary has been through some of the steps necessary to set up a Datomic database, but this was on his desktop machine that brought with it architecture issues so defaulted to an instance that Thomas had set up.
+* New C. remanri genome (PD)
-** Better test would be to document/try and set up Datomic from a fresh AWS instance? can it be done from documentation?
+** Strain PX356, from Philllips lab (published)
+*** Generated setup configs etc.
+*** Have an e! database containing genome
+*** Annotations provided by user assessed and found to be bad
+**** CDS features found with more exons that CDS features under a parent mRNA.
-* Tools development: [http://db.wormbase.org:8120/colonnade Collonade] and the AceDb like tree editor [http://db.wormbase.org:8120/view/gene/WBGene00004013 TrACeView]
+* C. elegans reference genome
-**Written in [https://github.com/clojure/clojurescript ClojureScript] (Clojure compiled into JS)
+** Yet another C. elegans reference sequence error paper (Zhang).
+*** Some examples look odd.......deletion in intron where others have reported only a SNP.
+*** Some inserts fix genes.
-=== Job Advert ===
+* Re-working modENCODE RNASeq data
+** Working on test code for generating MODEncode expression graphs with the scientific scrutiny of Julie Ahringer (GW)
+** Wiill work with Sibyl once the science is happy to try and utilise the code for web usage.
-*Originally a database developer, but now poss looking for some web dev skills
+* Gene curation
-**Clojure programming
+** Continues as usual
+** Finished looking at IWM genes (PD)
+** Organise meeting with new UniProt C. elegans curator
+** Brief discussion on capturing date-last-reviewed (to be followed up at next meeting)
 === Parasite ===
-* Jane and Bruce
+* RNASeq data display (JL & BB)
-* bigwig track display for 3 cestodes
+** bigwig track display for RNASeq data for 3 cestodes
 ** Tracks look of but working with various ideas for display
 ** Grouped by study
-** Dynamic normalisation causes some display issues as currently normalises over loaded area and not complete genome/chromosome/sequence.
+** Track scaling
-*** Webcode doesn't appear to have mechanism to sort scaling
+*** Dynamic normalisation causes some display issues as currently normalises over loaded area and not complete genome/chromosome/sequence.
-** Possible to compare tracks but the scale will invariably be different.
+*** Possible to compare tracks but the scale will invariably be different.
-** Need to get correct so as to not smear out points of biological interest.
+*** Need to get scaling right so as to not smear out points of biological interest.
 *** How do other browsers perform?
 **** Jbrowse appears to do same as e! browser
 **** Looks at UCSC track hub handling
 ** Possibly define a set of ubiquitous genes to normalise on?
-** Gary uses ana-1 for gleaning expression and RNASeq statistics as it appears to always be expressed at consistent level, but having a set of housekeeping genes might be a start.
+*** Gary uses ana-1 for gleaning expression and RNASeq statistics as it appears to always be expressed at consistent level, but having a set of housekeeping genes might be a start.
-** Encode might has software for normalisation in rseq tools
+*** Encode might has software for normalisation in rseq tools
-*** Are we doing things wrong/different? No as e! does the same so not a major priority, but worth exploring.
+** Are we doing things wrong/different? No as e! does the same so not a major priority, but worth exploring.
+* Imminent release of ParaSite 3 (~30th July) (BB + KH)
-* Imminent release of ParaSite 3 (~30th July)
+** Done
-** Finalising
+*** FTP dumps
 *** blast dbs
-*** REST
+*** REST API
-*** healthchecks
+** To do
-** Other
-*** GO_term population, dropped by EG as have UniProt assistance, we do not so have to run this pipeline.
-*** Ready for search dumps, if not critical healthcheck failures
-*** GFF dumps all in place
-*** Need to do testing
-**** 2 New Species and 1 changed
 *** MartBuild takes ~1week
+*** Clearing critical healthchecks
+*** Search dumps
+*** Testing!
+*** 2 New Species and 1 changed
+** Misc
+*** Forum for announcing new features
+**** News feed page/blog
+**** What's new from old releases logged and poss turned into paragraphs for release blog?
+**** Possibly re-work the front page to give more prominence to news/information on what's new in the release.
+*** GO_term population, dropped by EG as have UniProt assistance, we may not run this pipeline in future releases.
+*** Look at google search optimisation and creating a more user friendly top entry in the google results as current google choices are a bit odd.
-* Better forum for announcing new features
-** News feed page/blog
-** What's new from old releases logged and poss turned into paragraphs for release blog?
-** Possibly re-work the front page to give more prominence to news/information on what's new in the release.
-* Links back to WormBase central are in place for release 3.
-* Look at google search optimisation and creating a more user friendly top entry in the google results as current google choices are a bit odd.
-=== WormBase Central ===
-* Gary
+=== Communication ===
-** STAR RNASeq alignments for the brugia v4 assembly
-** Yet another C. elegans reference sequence error paper.
-*** Some examples look odd.......deletion in intron where others have reported only a SNP.
-*** Some inserts fix genes.
-** Working on test code for generating MODEncode expression graphs with the scientific scrutiny of J.A.
-** Gary will work with Sibyl once the science is happy to try and utilise the code for web usage.
-* Michael
-** for Ann Hart: checking for motifs conserved across Caenorhabditis with PCactus
-** New brugia assembly
-*** Chromosome naming and identification (Chromosomes scaffolded inc. X/Y + unlocalised contigs) (20-30 sequences where old assembly was >2000)
-*** Training Augustus on old models (Min 2 exons with good intergenic spacing).
-*** This is the FINAL version of the genome
-**** reasonably finished state
-**** Haplotype sequences in additional file
-**** No bacterial contamination as Avril's code has been used.
-**** 16% (1.19 /gene) duplication in CEGMA genes.........old v3 assembly 7% (1.08/gene) TIGR >20%
-*** Gene set will be Projected from old assembly + gap filling and extension from Augustus.
-**** Issues will be:
-***** Partially mapped manually curated genes
-***** Curated and NOT mapped (Use Exonerate to recover).
-***** Substring models......already has code for flattening them down.
-* Paul
-** Working on retirement of the old Sanger CVS for pipeline code.
-*** Have always has a GIT mirroring, this will become primary source
-*** Need to consider the models and wspec in general.
-*** C. remanei PX356
-**** Generated setup configs etc.
-**** Have an e! database containing genome
-**** Annotations provided by user assessed and found to be bad
-***** CDS features found with more exons that CDS features under a parent mRNA.
-*** Trying to meet with the new Uniprot C. elegans curator
-*** Finished looking at IWM genes
+* Working on retirement of the old Sanger CVS for pipeline code.
+** Have always has a GIT mirroring, this will become primary source
+** Need to consider the models and wspec in general (tagging etc)
+** Need to communicate this with Caltech (new place to pick up models)
 * Discussion about all the places we store work tickets
 ** RT-sanger worm-bug@sanger.ac.uk >1000 of paper tickets
@@ Line 122: / Line 124: @@
 *** github pipeline - our code
 *** https://github.com/Paul-Davis/seqcur_ticketing-system/issues 3 issues
-**JIRA - ensembl genomes
+**JIRA - EBI projects (incl Ensembl)
-*Need to limit to as few as possible
+** Aim: rationalise
-** What can be lost
+*** Build code tickets should go on wormbase-pipeline github repor
-*** Paul will work through the bitbucket ones.....mostly tier II
+*** PD will work through the bitbucket ones.....mostly tier II
-*** JIRA kev is the main user of this system
+*** JIRA (KH the main user of this system), should migrate to something visible project-wide
+=== Misc ===
-* Need to discuss with the projects as a whole!!!!!!!!
+* Work for Ann Hart: checking for motifs conserved across Caenorhabditis with PCactus (MP)

Difference between revisions of "Hinxton 2015.07- Meeting minutes"
