Hinxton 2015.08- Meeting minutes

Latest revision as of 10:57, 21 August 2015

August 2015

August 20, 2015

In attendance: KH, TD, GW, MP, JL

Build 250

  • Problems
    • Documentation error meant that crucial step at start of build was missed.
    • Result was that all non-elegans species were built using old data
    • Needed to re-start all T2 builds from scratch
    • Re-run of BLASTs, InterProScan, Compara
  • Timing
    • Build will probably be one week late.
    • MP will be on vacation at end of build, GW to finish build

Database migration

  • Curation tools (Colonnade, editor)
    • Numerous fixes/changes based on feedback from Mary Ann
    • Added KeySet functionality
  • Importer/Exporter
    • Cannot re-implement AceDB's partial case-insensitivity in Datomic.
      • WBGene00000001 != WBgene00000001 in new system
    • GW to run the importer for WS250
    • Discussed cut-and-paste-into-shell method of running the importer
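
The case-sensitivity point above could be checked before an import with something like the following. This is only a sketch, not part of the actual importer: the function name `find_case_collisions` is hypothetical, and it simply reports identifiers that would be the same object under case-insensitive matching but distinct objects under exact matching.

```python
from collections import defaultdict


def find_case_collisions(identifiers):
    """Group identifiers that match when case is ignored but differ as exact strings.

    Hypothetical helper for illustration; not taken from the real importer.
    """
    groups = defaultdict(set)
    for ident in identifiers:
        # casefold() gives a case-insensitive key for grouping.
        groups[ident.casefold()].add(ident)
    # Keep only keys that have more than one distinct spelling.
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


ids = ["WBGene00000001", "WBgene00000001", "WBGene00000002"]
print(find_case_collisions(ids))
```

Any group reported here would need to be reconciled to a single spelling before loading, since the new system would otherwise treat the variants as separate objects.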

Curation

  • Weird genes
    • Handful of genes in C. elegans that have contrived structures to accommodate a frameshift / in-frame stop
    • In some cases, the frameshift/stop is conserved in other Caenorhabditis species.
    • GW doing further work to investigate what this might mean
  • Mass spec data from Lamond lab
    • Aligning mass-spec peptides to genome, will appear as a genome browser track
  • Further discussed capturing/registering when a curator has reviewed a gene model but not made a change
    • Should boil this down to a proposal and circulate with whole group



August 14, 2015

In attendance: KH, TD, PD, GW, BB, MP

Database migration

  • Security discussion
    • Script will authenticate to the application using self-signed SSL certificates
    • What level of granularity for the certificates? Per center? Per user? Per update type (e.g. a separate cert for each build step)
  • Transaction log
    • Web app now allows unrolling of recent transactions
    • Functionality added to detect when unrolling of a T impacts on Ts made subsequently (e.g. T1 makes object, T2 refers to object, user attempts to undo T1).
  • Environment
    • GW previously had trouble getting Datomic running on an EBI machine on NFS-mounted storage.
    • The free storage engine with Datomic assumes/requires low-latency (ideally directly attached) storage
    • GW to try again on acedb.ebi.ac.uk using /nfs/acedb/vol1 (purportedly low-latency storage)
  • 2-way XREFs with different attached #models at either end
    • e.g. Gene -> Variation (#Evidence), Variation -> Gene (#Molecular_change)
    • New db puts #model annotation on the edges of the graph, so does not currently support this.
    • Discussed possible workarounds
  • Loading Ace into Datomic
    • Some testing done by GW, works well
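
The transaction-log check described above (T1 makes an object, T2 refers to it, the user attempts to undo T1) can be illustrated with a minimal sketch. This is not the web app's implementation: the `Transaction` class, the `blocking_transactions` function and the example identifiers are all hypothetical, and the sketch assumes transaction ids increase in the order the transactions were made.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    """Hypothetical record of what one transaction created and referred to."""
    tx_id: int
    created: set = field(default_factory=set)
    referenced: set = field(default_factory=set)


def blocking_transactions(log, undo_id):
    """Return ids of later transactions that refer to objects created by undo_id."""
    target = next(tx for tx in log if tx.tx_id == undo_id)
    return [
        tx.tx_id
        for tx in log
        # A later transaction blocks the undo if it references a created object.
        if tx.tx_id > undo_id and tx.referenced & target.created
    ]


log = [
    Transaction(1, created={"WBGene00000001"}),
    Transaction(2, referenced={"WBGene00000001"}),
    Transaction(3, created={"WBVar00000001"}),
]
print(blocking_transactions(log, 1))  # [2]
print(blocking_transactions(log, 3))  # []
```

A non-empty result means the undo would leave dangling references, so the user could be warned or the dependent transactions could be offered for unrolling first.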

Build 250

  • Status
    • BLAST - all 9 species run, dumped and loaded
    • Compara - finished and loaded
    • RNASeq - in progress
  • General problems/fire-fighting
    • Transcript builder (mysteriously did not complete for two genomes)
    • InterProScan (found a bug in the Ensembl code, fixed and provided patch)
    • BLAST dumping (exceeded MySQL max connections. Needed to tweak server conf)

ParaSite

  • Redesign of homepage
    • Beta version demonstrated. Will circulate some other possible layouts
  • New blog for ParaSite
    • wbparasite.wordpress.com

Data integration / curation

  • Brugia
    • Produced initial annotation of new assembly.
    • MP presented to Brugia analysis group in conference call
  • C. elegans
    • DB_remark clean up -> Brief_identification, pseudogenes (will go in as a note in ENA submissions)
    • Classification of pseudogenes. Will need a new unitary pseudogene tag in ?Pseudogene model
    • Consolidation of UniProt product names into the database
    • SVM paper => gene curation
    • Proteomics data from Lamond lab


August 6, 2015

In attendance: KH, PD, MP, TD, BB

Build 250

  • In progress. MP doing all 9 core species (no precedent for this).
  • First build from github code base (no problems so far)
  • How long will BLAST for all 9 species take? (All BLASTX and BLASTP will need to be re-done, since all of the worm proteomes have changed.)
  • Problems
    • RNAis with bad DNA_text. Cleaned and reported to Caltech
    • NCBI taxonomy database at EBI changed location, config needed a patch

Data integration / curation

  • B. malayi
    • WS248 Brugia submission now live at ENA
    • Gene prediction on new assembly
      • Looking at RATT transfers from old assembly
      • Using HAL mapping to project genes also works quite well.
      • AUGUSTUS to fill in the gaps
    • Handover will be Monday 10th August (during Brugia conference call)
  • C. remanei
    • Bad data. Submitter does not have time right now to clean. Will be deferred from this build
  • General
    • Hawaiian strain - will go in as T3. BLASTable.
    • Cleaning up T2. Getting rid of in-frame STOPs for P. pacificus
    • Cleaning up Brief_identification for protein-coding and pseudogenes.

Database migration

  • New interface for uploading ace files.
  • Discussed potential strategies for schema updates

ParaSite

  • New release out
  • BLAST
    • Fixed glitches in new service
    • Discussed potential minor improvements to front page
  • Analytics
  • Redesign of the home page. Make it more WormBase compliant.
    • Search box confusion