Difference between revisions of "WormBase-Caltech Weekly Calls"

From WormBaseWiki
 
 
[[WormBase-Caltech_Weekly_Calls_2016|2016 Meetings]]
 
[[WormBase-Caltech_Weekly_Calls_2016|2016 Meetings]]
  
= 2017 Meetings =
+
[[WormBase-Caltech_Weekly_Calls_2017|2017 Meetings]]
  
[[WormBase-Caltech_Weekly_Calls_January_2017|January]]
+
[[WormBase-Caltech_Weekly_Calls_2018|2018 Meetings]]
  
[[WormBase-Caltech_Weekly_Calls_February_2017|February]]
+
[[WormBase-Caltech_Weekly_Calls_2019|2019 Meetings]]
  
[[WormBase-Caltech_Weekly_Calls_March_2017|March]]
+
[[WormBase-Caltech_Weekly_Calls_2020|2020 Meetings]]
  
[[WormBase-Caltech_Weekly_Calls_April_2017|April]]
+
[[WormBase-Caltech_Weekly_Calls_2021|2021 Meetings]]
  
[[WormBase-Caltech_Weekly_Calls_May_2017|May]]
+
= 2022 Meetings =
  
[[WormBase-Caltech_Weekly_Calls_June_2017|June]]
+
[[WormBase-Caltech_Weekly_Calls_January_2018|January]]
  
== July 6, 2017 ==
+
= January 13th, 2022 =
 +
== tm variation - gene associations ==
 +
*Update on progress and some questions for the Caltech curators
 +
*Background: not all variations were being associated with genes in the OA table because some of those associations are in WS but not in geneace, so weren't coming through in the nightly geneace dump.  Some variation-gene associations are made as part of the VEP pipeline during the build.
 +
**https://github.com/WormBase/website/issues/8262
 +
**https://wiki.wormbase.org/index.php/WBGene_information_and_status_pipeline
 +
**https://wiki.wormbase.org/index.php/Source_and_maintenance_of_non-WBGene_info
 +
**https://wiki.wormbase.org/index.php/Updating_Postgres_with_New_WS_Information
 +
*Wen now downloads several full ACeDB classes from the latest WS release in the form of .ace files so we can also have whatever information is in WS.  Raymond wrote a script to sync those files to tazendra for further processing/use.
 +
*A few questions that we want to confirm before going forward:
 +
**In the WS variations file, there are 2,130,801 total variations (1,911,339 total Live) while in postgres there are currently 106,080.
 +
***Only include Status = Live variations?
 +
***Include regardless of whether there is an associated gene (this seems to be the current practice?).
 +
***Currently, some variations with a given Method, e.g. Million_mutation, are NOT included.  We would continue this filtering.
 +
****SNP
 +
****WGS_Hawaiian_Waterston
 +
****WGS_Pasadena_Quinlan
 +
****WGS_Hobert
 +
****Million_mutation
 +
****WGS_Yanai
 +
****WGS_De_Bono
 +
****WGS_Andersen
 +
****WGS_Flibotte
 +
****WGS_Rose
 +
***Do we want other filters?
 +
**For genes, the ace file contains ALL the gene objects in WB regardless of species.
 +
***We've recently had an author request, via the Acknowledge pipeline, to associate genes of other, less well studied Caenorhabditis species, e.g. C. inopinata, to [https://academic.oup.com/g3journal/article/11/3/jkab022/6121926 their paper].
 +
***Do we want all Caenorhabditis (and other nematode) species genes in our various gene tables, e.g. obo, paper? Any other species?
 +
***The effect on the autocomplete, if we include all, probably won't be a problem (1,018,332 vs 306,116)
 +
***Some of the gene ids from other species don't have 'WBGene' prefixes, e.g. Sp34_10109610.  Should we keep these in a separate table from genes with 'WBGene' prefixes?
  
=== Noctua ===
+
= January 20th, 2022 =
* Use for all literature curation in future?
 
* We would need to see how it could be used for all data types
 
* We would also want a full-time developer working on Noctua; maybe bring Juancarlos into the project
 
  
=== SPELL ===
+
== Proposal for updating gene and variation information from WS releases ==
* SPELL related data - how to deal with?
+
=== Genes ===
** Data: recalculated, mapping, results dependent on SPELL
+
*Have two tables:
** Analysis (clustering tool): clunky, freezes w/limited resources
+
**One continues as is - contains only ids for the [https://wormbase.org/species/all core nematode species] (all have WBGene ids)
* Wen will work with Juancarlos to generate a SimpleMine-like tool for downloading data sets
+
**Second, new table - contains non-WBGene ids for [https://wormbase.org/species/al comparator nematode genomes]
* Wen has to remove log files to make space on machine
+
***Include other elegans and remanei strains?
* ~360 data sets
+
**Would not include ids for non-WB (and WBParaSite) genomes, e.g. Drosophila or budding yeast
* Current virtual machine has 10G memory; is becoming limiting
+
=== Variations ===
* SPELL has lots of dependencies; difficult to account for them all when building SPELL
+
*Include all variations that have a value for:
* Would be good if we could set up a new (larger) virtual machine to run SPELL; how much time/resources do we want to spend on it?
+
**Method - current filters applied (filter SNP, Million_mutation, WGS's)
 
+
**Species - all
=== Micropublications ===
+
**Status - include all three status values (Live, Dead, Suppressed)
* Poster at Zebrafish meeting pointing to micropublications
+
*Whether a variation has a gene association doesn't matter (not a filter criterion for postgres)
* Microarray micropublications? Unpublished results
+
*From Paul D. - a number of variations in geneace were not making their way as individual objects to WS during the build and so were only created in WS via xref (hence the lack of other information). He's updated geneace with Species and other information wherever possible for the next build.
* Can make links to GEO submissions
+
*Variation merges are infrequent; previous ones may have been due to nameserver issues
* Daniela and Karen can draw up a mock publication to look at as a template
+
*New Methods arise infrequently, but we could check our parsing script against the list of Methods in each release to make sure we're up-to-date.  Would need an inclusion and exclusion list.
 
 
=== Community Curation ===
 
* Considering sending requests to only one person, the first author, when we have their info
 
* This should allow for more emails to be sent out per week, and more submissions, even if we maintain the 16% response rate
 
* Chris will work with Juancarlos once he's back from vacation
 
* What's next data type? Interactions? Site of action? Anatomy?
 
* What's required for site of action? Phenotype, gene, cell/tissue where gene is introduced or removed
 
* Can send requests to lab heads to fill in phenotype info for alleles from their lab
 
 
 
=== Linking ===
 
* Will human genes link to AGR human gene pages? Gene Cards? HGNC?
 
 
 
 
 
== July 13, 2017 ==
 
 
 
=== Allele attribute assignment ===
 
* Some attributes of alleles are currently only assigned in the context of a phenotype annotation
 
* Many times these attributes are reported independently of a phenotype experiment
 
* It would be good to have a mechanism to curate these attributes to alleles independent of phenotype curation
 
* As far as modeling goes I (Chris) had intended to dump these attributes directly into the ?Variation object like this:
 
?Variation 
 
Null  ?Phenotype_experiment
 
* This would allow us to always point to a ?Phenotype_experiment object as a container of evidence, but if we don't always have a ?Phenotype_experiment object to point to then maybe a better approach is just to reference the paper like this:
 
?Variation 
 
Null  ?Paper
 
* This would require a separate curation pipeline from phenotype to capture these attributes; will discuss more with Mary Ann when she's back from vacation
 
* We discussed and came up with a couple of potential solutions:
 
* First:
 
?Variation
 
Variation_effect  Null  ?Text  #Evidence
 
* Second:
 
?Variation
 
Variation_effect  Null  ?Paper  #Evidence
 
* The first solution allows us to put in a free text remark, potentially referencing a phenotype or some detail, with the #Evidence hash allowing reference to a paper or a person_evidence
 
* The second solution requires reference to a ?Paper object, which would require a reworking of all person evidence/personal communication to become ?Paper objects, with an #Evidence hash to capture remarks
 
 
 
=== Phenotype form submission requests ===
 
* Originally we sent email requests independently of Author First Pass pipeline emails
 
* Now a link to phenotype form is being included in Author First Pass emails, and we're putting a hold on all new email requests to corresponding author for one month
 
* I (Chris) am considering returning to original approach of sending requests independently of AFP (and focusing on first authors), to see if we can get response rate up at all
 
 
 
=== New WormBase Caltech server ===
 
* We're getting a new server, to consolidate different computers and their services
 
* Let Raymond know if you have an idea for another (extra) use of the server
 
* Will likely re-use altair.caltech.edu
 
* 96 GB memory, ~6 TB hard drive
 
 
 
=== SPELL ===
 
* SPELL could go on new server
 
* Moved from current virtual machine to KVM
 
* Could expand memory and CPU
 
 
 
 
 
== July 20, 2017 ==
 
 
 
=== Variation effect assignment ===
 
* We had discussed last week having either of two models in the ?Variation class
 
* First:
 
?Variation  Variation_effect  Null  ?Text  #Evidence
 
* Second:
 
?Variation  Variation_effect  Null  ?Paper  #Evidence
 
* The first option can keep things as they are and can allow a remark in the ?Text entry, with paper or person evidence coming from the #Evidence hash
 
* The second option likely requires that we consider person evidence/communication as ?Paper objects
 
* Many entries in the Phenotype OA coming from Jonathan Hodgkin are from the C. elegans I and II books
 
** Would be good to create ?Paper objects for C. elegans I and II and reference them accordingly
 
*** Already exist:
 
**** C. elegans II WBPaper00004071
 
**** C. elegans I  WBPaper00004052
 
* Probably best to go with
 
?Variation  Variation_effect  Null  ?Text  #Evidence
 
* Going forward we can encourage people to micropublish if they want to submit personal communication
 
 
 
=== Blog posts for WormMine ===
 
* Not quite ready; Chris will contact Ranjana about it when ready
 
 
 
=== WormMine templates ===
 
* Todd had encouraged development of template queries
 
* Older templates are still difficult to access and edit, but new queries going forward are OK
 
 
 
=== New Server ===
 
* New WormBase server (@ Caltech) is online
 
* Let Raymond know if anyone has uses for the server
 
* Anything that needs to be running 24-7 (e.g. services) can be considered
 
 
 
=== Supplement for AGR ===
 
* Proposing tuning up of orthology calls
 
* Paul S. wants to get Gene Orienteer back up, and have RNASeq analysis pipeline
 
* Pull in WormNet data (predicted interactions)
 
 
 
 
 
== July 27, 2017 ==
 
 
 
=== AGR GO Slim ===
 
*Discussion of new AGR GO slim and representation of C. elegans genes
 
*[https://github.com/geneontology/go-ontology/issues/13791 Modification to AGR slim]
 
*Key to graphs
 
**Red = AGR slim
 
**Orange = C. elegans orphans
 
**Multi-colors = Other organism orphans, as well
 
 
 
 
 
== August 3, 2017 ==
 
 
 
=== WB grant ===
 
* Progress (maintenance stuff) can be written as of last grant cycle
 
* Current projects and report the steady state
 
 
 
=== CRISPR alleles ===
 
* Some authors not reporting CRISPR alleles with standard nomenclature
 
* Some in Paul's lab creating same knockin with same name (should be distinct). Mary Ann to update nomenclature guidelines.
 
* Some (e.g. Bruce Bowerman) knock in GFP in frame and use RNAi against GFP to knock down the endogenous gene
 
** RNAi mapping pipeline not set up to handle this
 
** Could maybe use endogenous sequence near insertion site (not ideal, for off-target effects)
 
** Maybe best, once new ?Phenotype_experiment model is in place, to just refer to perturbed gene
 
 
 
=== Ontology Browser ===
 
* Has been less stable lately; Raymond has needed to restart machine
 
* Being served from Dell server (in Braun building)
 

Latest revision as of 15:24, 20 January 2022

Previous Years

2009 Meetings

2011 Meetings

2012 Meetings

2013 Meetings

2014 Meetings

2015 Meetings

2016 Meetings

2017 Meetings

2018 Meetings

2019 Meetings

2020 Meetings

2021 Meetings

2022 Meetings

January

January 13th, 2022

tm variation - gene associations

  • Update on progress and some questions for the Caltech curators
  • Background: not all variations were being associated with genes in the OA table because some of those associations are in WS but not in geneace, so weren't coming through in the nightly geneace dump. Some variation-gene associations are made as part of the VEP pipeline during the build.
  • Wen now downloads several full ACeDB classes from the latest WS release in the form of .ace files so we can also have whatever information is in WS. Raymond wrote a script to sync those files to tazendra for further processing/use.
  • A few questions that we want to confirm before going forward:
    • In the WS variations file, there are 2,130,801 total variations (1,911,339 total Live) while in postgres there are currently 106,080.
      • Only include Status = Live variations?
      • Include regardless of whether there is an associated gene (this seems to be the current practice?).
      • Currently, some variations with a given Method, e.g. Million_mutation, are NOT included. We would continue this filtering.
        • SNP
        • WGS_Hawaiian_Waterston
        • WGS_Pasadena_Quinlan
        • WGS_Hobert
        • Million_mutation
        • WGS_Yanai
        • WGS_De_Bono
        • WGS_Andersen
        • WGS_Flibotte
        • WGS_Rose
      • Do we want other filters?
    • For genes, the ace file contains ALL the gene objects in WB regardless of species.
      • We've recently had an author request, via the Acknowledge pipeline, to associate genes of other, less well studied Caenorhabditis species, e.g. C. inopinata, to their paper.
      • Do we want all Caenorhabditis (and other nematode) species genes in our various gene tables, e.g. obo, paper? Any other species?
      • The effect on the autocomplete, if we include all, probably won't be a problem (1,018,332 vs 306,116)
      • Some of the gene ids from other species don't have 'WBGene' prefixes, e.g. Sp34_10109610. Should we keep these in a separate table from genes with 'WBGene' prefixes?
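
The Method and Status questions above could be sketched roughly as follows. This is only an illustration, not the existing parsing script: it assumes each variation has already been parsed from the .ace file into a dict with hypothetical "id", "method" and "status" keys. The exclusion list is copied from the Methods listed above, and the status filter is left optional because that question was still open.

```python
from typing import Dict, Iterable, Iterator, Optional, Set

# Methods currently filtered out, copied from the list in these notes.
EXCLUDED_METHODS = frozenset({
    "SNP",
    "WGS_Hawaiian_Waterston",
    "WGS_Pasadena_Quinlan",
    "WGS_Hobert",
    "Million_mutation",
    "WGS_Yanai",
    "WGS_De_Bono",
    "WGS_Andersen",
    "WGS_Flibotte",
    "WGS_Rose",
})


def filter_variations(
    records: Iterable[Dict[str, str]],
    allowed_statuses: Optional[Set[str]] = None,
) -> Iterator[Dict[str, str]]:
    """Yield variations to keep for postgres.

    records: dicts with hypothetical keys "id", "method", "status".
    allowed_statuses: if given, keep only these Status values;
    if None, no Status filtering is applied.
    Gene association is deliberately not used as a filter.
    """
    for rec in records:
        if rec.get("method") in EXCLUDED_METHODS:
            continue
        if allowed_statuses is not None and rec.get("status") not in allowed_statuses:
            continue
        yield rec


if __name__ == "__main__":
    # Placeholder records, not real WormBase data.
    example = [
        {"id": "var_a", "method": "SNP", "status": "Live"},
        {"id": "var_b", "method": "Some_other_method", "status": "Dead"},
    ]
    for kept in filter_variations(example, allowed_statuses={"Live", "Dead"}):
        print(kept["id"])
```

Passing `allowed_statuses={"Live"}` would answer the first question with "Live only"; leaving it as `None` keeps every Status value.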

January 20th, 2022

Proposal for updating gene and variation information from WS releases

Genes

  • Have two tables:
    • One continues as is - contains only ids for the core nematode species (all have WBGene ids)
    • Second, new table - contains non-WBGene ids for comparator nematode genomes
      • Include other elegans and remanei strains?
    • Would not include ids for non-WB (and WBParaSite) genomes, e.g. Drosophila or budding yeast
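
A minimal sketch of the two-table split, assuming the only criterion is whether an id starts with the 'WBGene' prefix. The function name and the returned pair of lists are hypothetical; how the ids are actually loaded into postgres is not covered here, and excluding non-WB genomes would have to happen before this step.

```python
from typing import Iterable, List, Tuple


def split_gene_ids(gene_ids: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split gene ids into (WBGene ids, non-WBGene ids), preserving order."""
    wbgene_ids: List[str] = []
    other_ids: List[str] = []
    for gene_id in gene_ids:
        if gene_id.startswith("WBGene"):
            wbgene_ids.append(gene_id)
        else:
            other_ids.append(gene_id)
    return wbgene_ids, other_ids


if __name__ == "__main__":
    # "Sp34_10109610" is the example id from the notes; the WBGene id is illustrative.
    core, comparator = split_gene_ids(["WBGene00000001", "Sp34_10109610"])
    print("existing table:", core)
    print("new table:", comparator)
```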

Variations

  • Include all variations that have a value for:
    • Method - current filters applied (filter SNP, Million_mutation, WGS's)
    • Species - all
    • Status - include all three status values (Live, Dead, Suppressed)
  • Whether a variation has a gene association doesn't matter (not a filter criterion for postgres)
  • From Paul D. - a number of variations in geneace were not making their way as individual objects to WS during the build and so were only created in WS via xref (hence the lack of other information). He's updated geneace with Species and other information wherever possible for the next build.
  • Variation merges are infrequent; previous ones may have been due to nameserver issues
  • New Methods arise infrequently, but we could check our parsing script against the list of Methods in each release to make sure we're up-to-date. Would need an inclusion and exclusion list.
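
The per-release Method check mentioned in the last point could look something like the sketch below. It assumes the set of Method names has already been extracted from the release and that the inclusion and exclusion lists are kept as plain sets. The function name and the example Method names other than SNP and Million_mutation are placeholders.

```python
from typing import Iterable, List


def unclassified_methods(
    release_methods: Iterable[str],
    include: Iterable[str],
    exclude: Iterable[str],
) -> List[str]:
    """Return Methods seen in the release that are on neither list.

    Raises ValueError if a Method appears on both lists, since the
    parsing script could not then decide what to do with it.
    """
    include_set, exclude_set = set(include), set(exclude)
    overlap = include_set & exclude_set
    if overlap:
        raise ValueError(f"Methods on both lists: {sorted(overlap)}")
    return sorted(set(release_methods) - include_set - exclude_set)


if __name__ == "__main__":
    # Placeholder lists; the real ones would come from the curators.
    include = {"Method_A"}
    exclude = {"SNP", "Million_mutation"}
    seen_in_release = ["SNP", "Method_A", "Method_new"]
    for method in unclassified_methods(seen_in_release, include, exclude):
        print("Needs a decision:", method)
```

A non-empty result after a release would flag a new Method that has to be added to one list or the other before the next update.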