Hinxton 2015.07- Meeting minutes
From WormBaseWiki
Revision as of 13:25, 10 July 2015
July 2015
July 9, 2015
In attendance: KH, MP, BB, JL, PD, GW
Minuted by PD
Start 16:00
Database migration
- Thomas's last day: 18th Sept.
- Documentation, documentation, documentation!
  - Ideal: anyone can take an ACeDB database and load it into a Datomic instance
  - Documentation is on the database github repository
  - Not everyone has access to the db repository currently.
  - There may be parts of the docs that need to be more prescriptive
- Datomic set-up
  - Gary had trouble setting up Datomic on his desktop machine
  - A better test would be to document/try setting up Datomic from a fresh AWS instance. Can it be done from the documentation alone?
- Datomic build
  - WS250: GW to take the WS250 ACeDB database and try to work solely from the documentation
  - Any issues: use Thomas's expertise while he is still here, and improve the documentation
  - WS249 could be used as a test case even though a Datomic build already exists (rather than waiting for WS250)
- Model updates
  - Done via an augmented models file used to make the conversion.
  - Simple model additions = simple extension to the augmented file
  - Complex changes/reworking of classes might not be so straightforward. Need to be aware of the process for doing this
- Tools development: Colonnade and the ACeDB-like tree editor TrACeView
  - Written in ClojureScript (Clojure compiled into JS)
- Thomas's replacement
  - Originally a database developer, but now possibly also looking for some web dev skills
  - Target a Clojure programmer?
Curation / Annotation
- New Brugia assembly (MP)
  - State of the genome
    - "Final" version
    - "No bacterial contamination", as Avril's code has been used.
    - Chromosome naming and identification (chromosomes scaffolded, incl. X/Y, plus unlocalised contigs): 20-30 sequences, where the old assembly had >2000
    - Haplotype sequences in an additional file
  - Annotation
    - 16% (1.19/gene) duplication in CEGMA genes; old v3 assembly 7% (1.08/gene); TIGR >20%
    - Gene set will be projected from the old assembly, plus gap filling and extension from Augustus.
    - Training Augustus on old models (min. 2 exons, with good intergenic spacing).
    - Issues will be:
      - Partially mapped manually curated genes
      - Curated and NOT mapped (use Exonerate to recover).
      - Substring models: already have code for flattening them down.
  - STAR RNASeq alignments (GW)
- New C. remanei genome (PD)
  - Strain PX356, from Phillips lab (published)
    - Generated setup configs etc.
    - Have an e! database containing the genome
    - Annotations provided by the user were assessed and found to be bad
      - CDS features found with more exons than CDS features under a parent mRNA.
- C. elegans reference genome
  - Yet another C. elegans reference sequence error paper (Zhang).
    - Some examples look odd, e.g. a deletion in an intron where others have reported only a SNP.
    - Some inserts fix genes.
- Re-working modENCODE RNASeq data
  - Working on test code for generating modENCODE expression graphs, under the scientific scrutiny of Julie Ahringer (GW)
  - Will work with Sibyl, once the science is settled, to try to use the code on the web.
- Gene curation
  - Continues as usual
  - Finished looking at IWM genes (PD)
  - Organise a meeting with the new UniProt C. elegans curator
  - Brief discussion on capturing date-last-reviewed (to be followed up at next meeting)
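The Augustus training criterion noted above (at least 2 exons, with good intergenic spacing) can be sketched as a simple filter over gene models. This is a minimal illustration, not the pipeline actually used: the `GeneModel` record, the `select_training_models` function and the 1,000 bp flank threshold are all hypothetical, since the minutes do not say what counts as "good" spacing.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class GeneModel:
    """Hypothetical minimal record for a curated gene model."""
    name: str
    seqid: str
    start: int       # 1-based, inclusive
    end: int         # 1-based, inclusive
    exon_count: int


def select_training_models(models: List[GeneModel],
                           min_exons: int = 2,
                           min_flank: int = 1000) -> List[GeneModel]:
    """Keep models with at least `min_exons` exons and at least `min_flank`
    bp of intergenic sequence to the nearest other model on either side.
    Strand is ignored; min_flank=1000 is an illustrative assumption."""
    by_seq: Dict[str, List[GeneModel]] = {}
    for m in models:
        by_seq.setdefault(m.seqid, []).append(m)

    selected: List[GeneModel] = []
    for seq_models in by_seq.values():
        seq_models.sort(key=lambda g: (g.start, g.end))
        max_prev_end = None  # furthest end reached by any earlier model
        for i, m in enumerate(seq_models):
            # Overlapping neighbours give a negative gap and so fail the check.
            left_ok = max_prev_end is None or m.start - max_prev_end - 1 >= min_flank
            right_ok = (i + 1 == len(seq_models)
                        or seq_models[i + 1].start - m.end - 1 >= min_flank)
            if m.exon_count >= min_exons and left_ok and right_ok:
                selected.append(m)
            max_prev_end = m.end if max_prev_end is None else max(max_prev_end, m.end)
    return selected


# Made-up example models.
models = [
    GeneModel("g1", "chrI", 1, 2000, 3),       # kept
    GeneModel("g2", "chrI", 4000, 6000, 1),    # single exon: dropped
    GeneModel("g3", "chrI", 6500, 8000, 4),    # only 499 bp from g2: dropped
    GeneModel("g4", "chrX", 10000, 12000, 2),  # alone on its sequence: kept
]
print([m.name for m in select_training_models(models)])  # ['g1', 'g4']
```

Raising or lowering `min_flank` trades training-set size against how clean the flanking intergenic regions are.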
Parasite
- RNASeq data display (JL & BB)
  - bigwig track display for RNASeq data for 3 cestodes
  - Tracks look OK, but working through various ideas for display
  - Grouped by study
  - Track scaling
    - Dynamic normalisation causes some display issues, as it currently normalises over the loaded area and not the complete genome/chromosome/sequence.
    - Possible to compare tracks, but the scale will invariably be different.
    - Need to get the scaling right so as not to smear out points of biological interest.
    - How do other browsers perform?
      - JBrowse appears to do the same as the e! browser
      - Look at UCSC track hub handling
  - Possibly define a set of ubiquitous genes to normalise on?
    - Gary uses ana-1 for gleaning expression and RNASeq statistics, as it appears to always be expressed at a consistent level, but having a set of housekeeping genes might be a start.
    - ENCODE might have software for normalisation in rseq tools
  - Are we doing things wrong/differently? No, as e! does the same, so not a major priority, but worth exploring.
- Imminent release of ParaSite 3 (~30th July) (BB + KH)
  - Done
    - FTP dumps
    - BLAST dbs
    - REST API
  - To do
    - MartBuild (takes ~1 week)
    - Clearing critical healthchecks
    - Search dumps
    - Testing!
    - 2 new species and 1 changed
  - Misc
    - Forum for announcing new features
      - News feed page/blog
      - What's new from old releases logged, and possibly turned into paragraphs for a release blog?
      - Possibly re-work the front page to give more prominence to news/information on what's new in the release.
    - GO_term population: dropped by EG as they have UniProt assistance; we may not run this pipeline in future releases.
    - Look at Google search optimisation and creating a more user-friendly top entry in the Google results, as the current Google choices are a bit odd.
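The track-scaling issue above comes down to which maximum a track is divided by. The sketch below contrasts scaling the loaded window to its own maximum (roughly the behaviour described for the current tracks) with scaling to the whole sequence, and shows one simple way a set of housekeeping genes could supply per-sample scale factors. The function names are illustrative only, not e! or JBrowse code, and the coverage values and gene names (`hkA`, `hkB`, `geneX`) are made up.

```python
from typing import Dict, List, Sequence


def normalise_to_window(coverage: Sequence[float], start: int, end: int) -> List[float]:
    """Scale the loaded window [start, end) by its own maximum."""
    window = list(coverage[start:end])
    peak = max(window)
    return [v / peak if peak > 0 else 0.0 for v in window]


def normalise_to_sequence(coverage: Sequence[float], start: int, end: int) -> List[float]:
    """Scale the loaded window by the maximum over the whole sequence, so a
    given raw value maps to the same height wherever the user is looking."""
    peak = max(coverage)
    return [v / peak if peak > 0 else 0.0 for v in coverage[start:end]]


def housekeeping_scale_factors(samples: Dict[str, Dict[str, float]],
                               housekeeping: List[str]) -> Dict[str, float]:
    """Per-sample factor that brings the mean expression of a housekeeping
    gene set to the same (cross-sample average) level in every sample.
    Assumes every sample has non-zero housekeeping expression."""
    means = {name: sum(expr[g] for g in housekeeping) / len(housekeeping)
             for name, expr in samples.items()}
    reference = sum(means.values()) / len(means)
    return {name: reference / m for name, m in means.items()}


# Made-up per-base coverage: a small peak followed by a larger one.
coverage = [0, 2, 4, 2, 0, 10, 20, 10, 0]
print(normalise_to_window(coverage, 0, 5))    # small peak scaled to 1.0
print(normalise_to_sequence(coverage, 0, 5))  # same peak scaled to 0.2

# Made-up expression values for two hypothetical housekeeping genes.
samples = {
    "sample1": {"hkA": 10.0, "hkB": 30.0, "geneX": 5.0},
    "sample2": {"hkA": 20.0, "hkB": 60.0, "geneX": 5.0},
}
print(housekeeping_scale_factors(samples, ["hkA", "hkB"]))
```

With window scaling the small peak fills the track height on its own; with sequence-wide scaling it sits at a fifth of the height of the larger peak, so the same raw value looks the same as the user scrolls.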
Communication
- Working on retirement of the old Sanger CVS for pipeline code.
  - Have always had Git mirroring; this will become the primary source
  - Need to consider the models and wspec in general (tagging etc.)
  - Need to communicate this with Caltech (new place to pick up models)
- Discussion about all the places we store work tickets
  - RT-sanger worm-bug@sanger.ac.uk: >1000 paper tickets
  - https://bitbucket.org/pauld/seqcur_ticketing-system: 84 tickets
  - GitHub
    - github website: website issues and helpdesk
    - github pipeline: our code
    - https://github.com/Paul-Davis/seqcur_ticketing-system/issues: 3 issues
  - JIRA: EBI projects (incl. Ensembl)
  - Aim: rationalise
    - Build code tickets should go on the wormbase-pipeline github repo
    - PD will work through the bitbucket ones (mostly tier II)
    - JIRA (KH is the main user of this system) should migrate to something visible project-wide
Misc
- Work for Ann Hart: checking for motifs conserved across Caenorhabditis with PCactus (MP)