Difference between revisions of "WormBase-Caltech Weekly Calls"

From WormBaseWiki
 
 
[[WormBase-Caltech_Weekly_Calls_2018|2018 Meetings]]

[[WormBase-Caltech_Weekly_Calls_2019|2019 Meetings]]

GoToMeeting link: https://www.gotomeet.me/wormbase1

= 2020 Meetings =

[[WormBase-Caltech_Weekly_Calls_January_2020|January]]

[[WormBase-Caltech_Weekly_Calls_February_2020|February]]

[[WormBase-Caltech_Weekly_Calls_March_2020|March]]

[[WormBase-Caltech_Weekly_Calls_April_2020|April]]

[[WormBase-Caltech_Weekly_Calls_May_2020|May]]

[[WormBase-Caltech_Weekly_Calls_June_2020|June]]

[[WormBase-Caltech_Weekly_Calls_July_2020|July]]

= 2019 Meetings =

[[WormBase-Caltech_Weekly_Calls_January_2019|January]]

[[WormBase-Caltech_Weekly_Calls_February_2019|February]]

[[WormBase-Caltech_Weekly_Calls_March_2019|March]]

[[WormBase-Caltech_Weekly_Calls_April_2019|April]]

[[WormBase-Caltech_Weekly_Calls_May_2019|May]]

[[WormBase-Caltech_Weekly_Calls_June_2019|June]]

[[WormBase-Caltech_Weekly_Calls_July_2019|July]]

[[WormBase-Caltech_Weekly_Calls_August_2019|August]]

== September 12, 2019 ==

=== Update on SVM pipeline ===
* New SVM pipeline: more analysis and more parameter tuning
* Avoiding precision (and F-value) as a measure, since both depend on the ratio of positives to negatives in the test set
* In the example shown, a "dumb" machine starts out with precision above 0.6
* G-value (Michael's invention) does not depend on the distribution of the sets
* Applied to various data types
* Analysis: 10-fold cross validation
** Randomly select 10% of positives and negatives (without replacement) and repeat until all papers are sampled
* F-value changes over different p/n values; G-value does not (essentially flat)
* Area Under the Curve (AUC): probability that a random positive scores higher than a random negative
* AUC values for many WB data types range from the upper 80%'s into the 90%'s
* Ranjana: How many papers for a good training set? Michael: we don't know yet
* Can't reproduce old training sets (for old SVM); provide Michael better training sets if you want improved SVM
* If SVM still not good enough, Michael will work on deep neural networks (TensorFlow)
* Michael can provide training sets he has used recently
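
As a rough illustration of the SVM evaluation points above (not Michael's actual pipeline), the sketch below shows why precision tracks the positive/negative ratio while AUC does not, and one way to build stratified 10-fold splits. All function names and numbers are made up for the example. The G-value is not shown because these notes do not give its definition.

```python
import random


def dumb_precision(n_pos, n_neg):
    """Precision of a 'dumb' classifier that calls every paper positive.

    It equals the fraction of positives in the test set, so it moves
    with the positive/negative ratio even though nothing was learned.
    """
    return n_pos / (n_pos + n_neg)


def auc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative.

    Computed by brute force over all (positive, negative) pairs;
    ties count as half a win.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def stratified_folds(pos, neg, k=10, seed=0):
    """Split positives and negatives into k folds without replacement.

    Each fold receives roughly 1/k of the positives and 1/k of the
    negatives, and every paper lands in exactly one fold.
    """
    rng = random.Random(seed)
    pos, neg = list(pos), list(neg)
    rng.shuffle(pos)
    rng.shuffle(neg)
    return [(pos[i::k], neg[i::k]) for i in range(k)]


# Hypothetical test sets: same "dumb" classifier, different class ratios.
print(dumb_precision(60, 40))  # 60% positives
print(dumb_precision(10, 90))  # 10% positives

# Hypothetical scores: adding more negatives with the same score
# distribution leaves AUC unchanged.
pos_scores = [0.9, 0.8]
neg_scores = [0.1, 0.85]
print(auc(pos_scores, neg_scores))
print(auc(pos_scores, neg_scores * 5))
```

In a real run, each fold would serve once as the held-out test set while the other nine train the classifier.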
 
  
=== Clarifying definitions of "defective" and "deficient" for phenotypes ===
* WB phenotype ontology has many "variant/abnormal" terms and distinct subclass terms for "defective/deficient"
* Have tried to create a logical definition pattern for these terms, but the vagueness of the meaning of "defective" and how it is distinct from "abnormal" has stalled the process
* What do we mean exactly by "defective" and how, specifically, is this distinct from "abnormal"?
* Definitions include these meanings or words:
** "Variations in the ability"
** "aberrant"
** "defect"
** "defective"
** "defects"
** "deficiency"
** "deficient"
** "disrupted"
** "impaired"
** "incompetent"
** "ineffective"
** "perturbation that disrupts"
** Failure to execute the characteristic response = abnormal?
** abnormal
** abnormality leading to specific outcomes
** fail to exhibit the same taxis behavior = abnormal?
** failure
** failure OR delayed
** failure, slower OR late
** failure/abnormal
** reduced
** slower
 
  
=== Citace upload ===
* Tuesday, Sep 24th

=== Strain to ID mapping ===
* Waiting on Hinxton to send strain ID mapping file?
* Hopefully we can all get that well before the upload deadline
* Will do global replacement at time of citace upload (at least for now)

=== New name server ===
* When will this officially go live?
* Will we now be able to request strain IDs through the server? Yes

=== SObA Graphs ===
* New graphs now live on site (Expression, Gene Ontology, Human Diseases, Phenotypes)
* A lot of whitespace padding above and below graph; maybe trim? Trimming vertically would ultimately limit the view pane when the user wants to zoom in, so we should leave as is for now
* Diff tool: Raymond and Juancarlos created a prototype diff tool (for comparing two genes, for example)
** Paul: compared two genes that should be very similar, but there are a lot of differences; may reflect annotation coverage rather than biology

Latest revision as of 21:01, 13 August 2020

Previous Years

2009 Meetings

2011 Meetings

2012 Meetings

2013 Meetings

2014 Meetings

2015 Meetings

2016 Meetings

2017 Meetings

2018 Meetings

2019 Meetings



2020 Meetings

January

February

March

April

May

June

July


August 6th, 2020

Experimental conditions data flow into Alliance

  • Experimental conditions in disease annotations: WB has inducers (used to recapitulate the disease condition) and modifiers (a modifier can ameliorate, exacerbate, or have no effect on the disease condition)
  • We use the WB Molecule CV for Inducers and Modifiers in disease annotation
  • Experimental conditions in phenotype annotations are free text (captured in remarks); these will probably need to be formalized later on
  • So for data flow into Alliance:
    • In the short term we will load the Molecule CV into the Alliance (Ranjana and Michael P. will work on this)
    • Groups will switch to a common data model that works for all, and to common ontology/ontologies, in the near future.
  • How do we handle genetic sex? Part of condition?
    • Condition has been intended for external/environmental conditions, whereas genetic sex is inherent to the organism of study
    • Expression pattern curation needs genetic sex; needs a model at the Alliance for capturing sex
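
To make the inducer/modifier distinction above concrete, here is a minimal sketch of how a disease-annotation condition might be represented. It is not the WB or Alliance data model. The class names, field names, and molecule IDs are placeholders, and the rule that only modifiers carry an effect is an assumption drawn from the description above. Genetic sex is deliberately left out, since the notes treat it as separate from conditions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Role(Enum):
    INDUCER = "inducer"    # used to recapitulate the disease condition
    MODIFIER = "modifier"  # changes (or fails to change) the condition


class Effect(Enum):
    AMELIORATES = "ameliorates"
    EXACERBATES = "exacerbates"
    NO_EFFECT = "no effect"


@dataclass(frozen=True)
class ExperimentalCondition:
    molecule_id: str                 # placeholder for a Molecule CV term ID
    role: Role
    effect: Optional[Effect] = None  # assumed to apply to modifiers only

    def __post_init__(self):
        # Assumption: a modifier must state its effect; an inducer has none.
        if self.role is Role.MODIFIER and self.effect is None:
            raise ValueError("a modifier needs an effect")
        if self.role is Role.INDUCER and self.effect is not None:
            raise ValueError("an inducer does not carry an effect")


# Hypothetical annotation: one inducer plus one ameliorating modifier.
conditions = [
    ExperimentalCondition("MOL:0001", Role.INDUCER),
    ExperimentalCondition("MOL:0002", Role.MODIFIER, Effect.AMELIORATES),
]
for c in conditions:
    print(c.molecule_id, c.role.value, c.effect.value if c.effect else "-")
```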


August 13, 2020

Species in Postgres and ACEDB/Datomic

  • Want to dump "Affected By Pathogen" fields in Phenotype OA and RNAi OA
  • Want to be sure that what gets dumped aligns with species loaded into ACEDB
  • Currently one species annotated not in WS277: Streptococcus gallolyticus subsp. gallolyticus
  • We currently have multiple Postgres tables for storing species lists:
    • pap_species_index (used by "Affected By Pathogen" fields, AFP); Kimberly uses it to assign species to papers and occasionally adds new ones
    • obo_name_ncbitaxonid
    • obo_name_taxon (original, smaller list)
    • h_pap_species_index (history for pap_species_index)
  • How do species get loaded into ACEDB? Dumps from Postgres? Which table(s)?
  • WS277 has 7,906 species (1,936 have no NCBI Taxon ID)
  • Kimberly has occasionally uploaded a species.ace file in the context of GO curation, but Hinxton otherwise handles it; we should ask them
  • New species are associated with paper objects, but otherwise no additional data for those species come from Caltech
  • It might be useful to have species pages in WB that at least list papers for which we have species associations, maybe include other information?
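
The alignment check described above (annotated species versus species loaded into ACEDB) amounts to a set difference. The sketch below assumes both lists have already been exported as plain species names. The function name and the sample lists are illustrative only, apart from the one species the notes report as missing from WS277.

```python
def missing_from_acedb(annotated_species, acedb_species):
    """Return annotated species names that are absent from the ACEDB list.

    Names are compared after stripping surrounding whitespace; the
    result is sorted so the report is stable between runs.
    """
    loaded = {name.strip() for name in acedb_species}
    annotated = {name.strip() for name in annotated_species}
    return sorted(annotated - loaded)


# Illustrative sample data, not a real dump of either source.
annotated = [
    "Pseudomonas aeruginosa",
    "Streptococcus gallolyticus subsp. gallolyticus",
    "Caenorhabditis elegans ",
]
acedb = [
    "Caenorhabditis elegans",
    "Pseudomonas aeruginosa",
]

for name in missing_from_acedb(annotated, acedb):
    print(name)
```

The same comparison run in the other direction would list ACEDB species that no annotation uses.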

WS279 Citace upload

  • When is it happening? Not sure; not on release schedule right now

SOLR server security (IMSS)

  • IMSS network security blocked network access to our server because of its open SOLR web access.
  • SOLR is part of the AMIGO stack (a very old version); it drives our ontology browser directly, and SObA and the Enrichment tools indirectly.
  • We added a firewall/URL filter and IMSS reopened the network (for now). IMSS still gripes that the service is open to the world.

Alzheimer's disease portal

  • Supplement grant awarded to Alliance for an Alzheimer's disease portal
  • Could involve automated/concise descriptions, interactions, etc.
  • Could establish useful pipelines that could be reused in other contexts