Difference between revisions of "WormBase-Caltech Weekly Calls"

From WormBaseWiki
(497 intermediate revisions by 9 users not shown)
Line 16: Line 16:
  
 
[[WormBase-Caltech_Weekly_Calls_2017|2017 Meetings]]
 
[[WormBase-Caltech_Weekly_Calls_2017|2017 Meetings]]
 +
 +
[[WormBase-Caltech_Weekly_Calls_2018|2018 Meetings]]
  
  
Line 21: Line 23:
  
  
= 2018 Meetings =
+
= 2019 Meetings =
 
 
[[WormBase-Caltech_Weekly_Calls_January_2018|January]]
 
 
 
[[WormBase-Caltech_Weekly_Calls_February_2018|February]]
 
 
 
[[WormBase-Caltech_Weekly_Calls_March_2018|March]]
 
 
 
[[WormBase-Caltech_Weekly_Calls_April_2018|April]]
 
 
 
[[WormBase-Caltech_Weekly_Calls_May_2018|May]]
 
 
 
  
== June 7, 2018 ==
+
[[WormBase-Caltech_Weekly_Calls_January_2019|January]]
  
=== Alliance literature pipeline working group ===
+
[[WormBase-Caltech_Weekly_Calls_February_2019|February]]
* Has anyone gotten back to Carol Bult about WB/Textpresso membership in the group? Not yet
 
* Paul & Kimberly can join; Kimberly will sit in on group meetings
 
  
=== SPELL migration ===
+
[[WormBase-Caltech_Weekly_Calls_March_2019|March]]
* SGD put SPELL into the cloud; will give us code to set up WB SPELL in cloud
 
* Want to have common interface but two instances (yeast & worm)
 
* Worm data in SPELL may be more complicated than the yeast data (different species, platforms, meta data, etc.)
 
* Mike Cherry proposed to use GTEx to replace SPELL. (www.gtexportal.org)
 
* Trying to get GTEx to have the same functionality as SPELL
 
  
=== Progress report ===
+
[[WormBase-Caltech_Weekly_Calls_April_2019|April]]
* Wen can generate numbers of changes since last year
 
* 5-year progress report coming up (for funding agencies)
 
* Want a progress report-like document to give to SAB in July
 
* WS259 - WS265
 
  
=== Paper author name ===
+
[[WormBase-Caltech_Weekly_Calls_May_2019|May]]
* Jonathan Ewbank wrote in about an incorrect author name
 
* Issue for WBPaper00048704: https://wormbase.org/resources/paper/WBPaper00048704#0--10
 
* First author is "Li, C" in PubMed (the correct name) but appears in WormBase as "Chun, L"
 
* Will make a GitHub ticket; web team may need to look at
 
  
=== Lab class ===
+
[[WormBase-Caltech_Weekly_Calls_June_2019|June]]
* Lab data is not clean/consistent; has gotten messy
 
* Cecilia trying to clean it up
 
* There is conflicting data from lab class, author class, and person class
 
* Currently there is a ~redundant curation pipeline; considering pulling person info into lab class
 
* When looking at lab page, if there is a problem with a person's info, changes would be made (requested) in the person class, not the lab class
 
* Cecilia will contact labs to ask about correctness of information
 
* Can have a discussion with Ann and Aric at CGC
 
  
=== How many C. elegans genes have "good" knockouts? ===
+
[[WormBase-Caltech_Weekly_Calls_July_2019|July]]
* Paul giving talk next week, wants to report
 
* Chris will look into
 
* Mitani did 1,000 deletions in last year
 
  
=== Server logs ===
+
[[WormBase-Caltech_Weekly_Calls_August_2019|August]]
* SimpleMine logs are all coming from Amazon; will need to ask Todd about SimpleMine usage (from AWS stats?)
 
  
=== Migration of lab data ===
 
* Data for labs was coming from Name Server (in Hinxton) - <font color= red>Comment:</font>[PD] Not coming from the Nameserver, geneace was the only source of Lab data prior to handover.
 
* Now Paul D has stopped curating lab info - <font color= red>Comment:</font>[PD] Build config has been updated to take the majority of lab data from citace rather than geneace
 
Database        File                                    Class                  remove/format
 
----------------------------------------------------------------------------------------------------------------
 
db=geneace      file=geneace_Laboratory.ace            class=Laboratory        format="Alleles WBVar\d{8}"    delete=Representative  delete=Registered_lab_members  delete=Past_lab_members delete=Allele_designation      delete=Strain_designation      delete=Address
 
db=citace      file=caltech_Laboratory.ace            class=Laboratory        delete=Alleles  format="Representative WBPerson\d{1,5}" format="Registered_lab_members WBPerson\d{1,5}" format="Past_lab_members WBPerson\d{1,5}"
 
  
* Now that Cecilia is curating, will pull lab info from Postgres
+
== September 12, 2019 ==
  
 +
=== Update on SVM pipeline ===
 +
* New SVM pipeline: more analysis and more parameter tuning
 +
* Avoiding precision (and F-value) as a measure, since both depend on the ratio of positives to negatives in the test set
 +
* For example shown, "dumb" machine starts out with precision above 0.6
 +
* G-value (Michael's invention); does not depend on distribution of sets
 +
* Applied to various data types
 +
* Analysis: 10-fold cross validation
 +
** Randomly select 10% pos and neg (without replacement) and repeat until all papers sampled
 +
* F-value changes over different p/n values; G-value does not (essentially flat)
 +
* Area Under the Curve (AUC): probability that a random positive scores higher than random negative
 +
* AUC values for many WB data types are in the upper 80s to 90s (percent)
 +
* Ranjana: How many papers for a good training set? Michael: we don't know yet
 +
* Can't reproduce old training sets (for old SVM); provide Michael better training sets if you want improved SVM
 +
* If SVM is still not good enough, Michael will work on deep neural networks (TensorFlow)
 +
* Michael can provide training sets he has used recently
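
The 10-fold cross-validation step in the list above (sample 10% of positives and 10% of negatives without replacement, repeat until every paper has been sampled) could be sketched roughly as follows. This is a minimal illustration, not the actual pipeline code: the function names, the fixed seed, and the use of plain paper identifiers are all assumptions.

```python
import random


def stratified_folds(positives, negatives, k=10, seed=0):
    """Split papers into k folds, keeping the pos/neg ratio similar in each.

    Shuffling once and dealing papers out round-robin is equivalent to
    repeatedly drawing ~1/k of each class without replacement.
    """
    rng = random.Random(seed)  # fixed seed (assumption) for reproducibility
    pos = list(positives)
    neg = list(negatives)
    rng.shuffle(pos)
    rng.shuffle(neg)
    folds = [[] for _ in range(k)]
    for i, paper in enumerate(pos):
        folds[i % k].append((paper, 1))  # label 1 = positive
    for i, paper in enumerate(neg):
        folds[i % k].append((paper, 0))  # label 0 = negative
    return folds


def train_test_splits(folds):
    """Yield (train, test) pairs; each fold is the test set exactly once."""
    for i, test in enumerate(folds):
        train = [item for j, fold in enumerate(folds) if j != i for item in fold]
        yield train, test
```

Each paper ends up in exactly one test fold, so every paper is scored once by a model that never saw it during training.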
  
== June 21, 2018 ==
+
=== Clarifying definitions of "defective" and "deficient" for phenotypes ===
 +
* WB phenotype ontology has many "variant/abnormal" terms and distinct subclass terms for "defective/deficient"
 +
* Have tried to create a logical definition pattern for these terms, but the vagueness of the meaning of "defective" and how it is distinct from "abnormal" has stalled the process
 +
* What do we mean exactly by "defective" and how, specifically, is this distinct from "abnormal"?
 +
* Definitions include meanings or words:
 +
** "Variations in the ability"
 +
** "aberrant"
 +
** "defect"
 +
** "defective"
 +
** "defects"
 +
** "deficiency"
 +
** "deficient"
 +
** "disrupted"
 +
** "impaired"
 +
** "incompetent"
 +
** "ineffective"
 +
** "perturbation that disrupts"
 +
** Failure to execute the characteristic response = abnormal?
 +
** abnormal
 +
** abnormality leading to specific outcomes
 +
** fail to exhibit the same taxis behavior = abnormal?
 +
** failure
 +
** failure OR delayed
 +
** failure, slower OR late
 +
** failure/abnormal
 +
** reduced
 +
** slower
  
=== SAB ===
+
=== Citace upload ===
* Lightning talks at project meeting; everyone should consider what they want to present (5 minutes with 5 minutes for questions)
+
** Tuesday, Sep 24th
* 45 minutes curation talk to SAB; Chris volunteered; anyone else interested?
 
* What data types are being incorporated with the Alliance? What have we gained from those conversations?
 
* How have Alliance interactions benefited us?
 
* How does the Datomic migration affect our curation and data models?
 
* Alliance data models should be union of existing MOD data models, but does not require curation of all attributes at each MOD
 
* Do WB and our users benefit from creating a ?Genotype class?
 
  
=== Phenotype & Disease Face-to-Face ===
+
=== Strain to ID mapping ===
* Reviewed phenotype curation practices at each MOD
+
* Waiting on Hinxton to send strain ID mapping file?
* Discussed strain and genotype classes
+
* Hopefully we can all get that well before the upload deadline
* Generally it's felt that genotypes should only represent full genotypes of actual individuals or strains
+
* Will do global replacement at time of citace upload (at least for now)
* For phenotype annotations WB will still need a mechanism to attribute the specific genotypic components that are responsible for the observed phenotype along with a way to capture the complete or background genotype for context
 
* MGI considers strains to be part of a genotype (i.e. background inbred strain into which alleles are introduced)
 
* WB and SGD consider genotype to be part of strain
 
* WB will still consider moving ahead with instantiating a ?Genotype class for capturing genotypes (e.g. for disease models) that don't have an explicitly reported strain name; also for capturing transient genotypes like heterozygosity as well as paternal and maternal contributions/genotypes
 
  
 +
=== New name server ===
 +
* When will this officially go live?
 +
* Will we now be able to request strain IDs through the server? Yes
  
 +
=== SObA Graphs ===
 +
* New graphs now live on site (Expression, Gene Ontology, Human Diseases, Phenotypes)
 +
* A lot of whitespace padding above and below graph; maybe trim? Trimming vertically would ultimately limit the view pane when a user wants to zoom in, so we should leave as is for now
 +
* Diff tool: Raymond and Juancarlos created a prototype diff tool (for comparing two genes, for example)
 +
** Paul: compared two genes that should be very similar, but there are a lot of differences; may reflect annotation coverage rather than biology
  
== June 28, 2018 ==
 
  
=== Bulk emailing community for phenotypes ===
+
== September 19, 2019 ==
* Chris & Juancarlos have now set up a pipeline to send requests in bulk
 
* Can send 75 emails at a time; Gmail may limit to 500 per day (was hoping for 3,000 at a time)
 
* Sent out 338 emails in past week; already received annotations for 18 papers, ~50 annotations
 
* Papers go back to 1987; the earliest paper for which we received curation in the last week is from 2008
 
* Have sent about 25 papers from 2018, got annotations for 5 of them
 
* Ask SAB about email frequency
 
* Would be good to link to a Textpresso search
 
* Maybe make a link people can click on to indicate there is no phenotype data
 
* Can a submitter chat with a curator? Maybe integrate Olark chat? Would be good to make available, indicate how to chat
 
  
=== SAB ===
+
=== Strains ===
* Chris & Paul will present (curation & intro respectively)
+
* Need to wait for new strain IDs from Hinxton before running dumping scripts
* Tools? Raymond can bring up enrichment analysis
+
* Don't edit multi-ontology strain fields in OA for now!
* Project meeting
+
* Juancarlos will map free text and ontology-name strain entries to strain IDs once we have the complete mapping file
** Everyone should sign up for a topic if they haven't already
+
* "Requested strain" field in Disease OA; not dumped, so don't need to worry about right now
  
=== Phenotype Ontology ===
+
=== Alliance literature curation ===
* Is the Phenotype ontology growing? Very slowly
+
* Working group will be formed soon
* Received several suggested phenotypes from community in last batch
+
* Will work out general common pipelines for literature curation
* Want to focus on logical definitions
 
* Mammalian Phenotype ontology - precomposed terms with logical definitions
 
* Raymond trying to use the SObA approach to display logically defined phenotype terms; needs to figure out how to effectively model it
 
* Should get useful feedback at the ICBO meeting in August
 
** Would be good to establish ahead of time what we want to get from the meeting
 
* Want feedback on:
 
** Pre-composed vs. post-composed
 
** Granularity
 
** Logical definitions
 
* Looking up phenotypes: want to make more user friendly
 
** We could map phenotype terms to processes to allow community to find relevant terms
 
** Want to find "male hook" phenotypes; correct term is "male copulatory structure variant"; the term "hook" should find "male copulatory structure"
 
* What is the best use of a user's time? curator's time?
 
* How are ontology logical definitions being used now?
 
** Kimberly: To calculate/reason the inferences based on ontology parentage
 
** Logical definitions are not very visible on relevant pages (like Amigo); should we make it more visible to users?
 
  
 +
=== SObA Graph relations ===
 +
* Currently only integrating over "is a", "part of" and "regulates"
 +
* Maybe we could provide users an option to specify which relations to include, or maybe just exclude "regulates"
  
=== Protein interactions ===
+
=== Author First Pass ===
* Now up to date; how should we publish this info?
+
* Putting together paper for AFP
* Can make a blog post and/or micropublication and put in next NAR paper
+
* Reviewing all user input for paper
 +
* Asking individual curators to check input

Revision as of 16:39, 19 September 2019

Previous Years

2009 Meetings

2011 Meetings

2012 Meetings

2013 Meetings

2014 Meetings

2015 Meetings

2016 Meetings

2017 Meetings

2018 Meetings


GoToMeeting link: https://www.gotomeet.me/wormbase1


2019 Meetings

January

February

March

April

May

June

July

August


September 12, 2019

Update on SVM pipeline

  • New SVM pipeline: more analysis and more parameter tuning
  • Avoiding precision (and F-value) as a measure, since both depend on the ratio of positives to negatives in the test set
  • For example shown, "dumb" machine starts out with precision above 0.6
  • G-value (Michael's invention); does not depend on distribution of sets
  • Applied to various data types
  • Analysis: 10-fold cross validation
    • Randomly select 10% pos and neg (without replacement) and repeat until all papers sampled
  • F-value changes over different p/n values; G-value does not (essentially flat)
  • Area Under the Curve (AUC): probability that a random positive scores higher than random negative
  • AUC values for many WB data types are in the upper 80s to 90s (percent)
  • Ranjana: How many papers for a good training set? Michael: we don't know yet
  • Can't reproduce old training sets (for old SVM); provide Michael better training sets if you want improved SVM
  • If SVM is still not good enough, Michael will work on deep neural networks (TensorFlow)
  • Michael can provide training sets he has used recently
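
The point about precision depending on the positive/negative ratio, and AUC being the probability that a random positive outscores a random negative, can be illustrated with a small sketch. This is not the pipeline's code and does not attempt to reproduce the G-value (its definition is not given in these notes); the function names and toy numbers are assumptions.

```python
def precision(labels, predictions):
    """Precision = true positives / all predicted positives."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0


def dumb_precision(n_pos, n_neg):
    """Precision of a 'dumb' machine that calls every paper positive.

    It simply equals the fraction of positives in the test set.
    """
    labels = [1] * n_pos + [0] * n_neg
    predictions = [1] * len(labels)
    return precision(labels, predictions)


def auc(pos_scores, neg_scores):
    """AUC as the probability that a random positive scores higher than
    a random negative (ties count as half a win)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Same dumb machine, different class ratios, different precision.
    print(dumb_precision(60, 40), dumb_precision(10, 90))
    # Repeating the negatives changes the class ratio but not the AUC.
    pos, neg = [0.9, 0.8], [0.1, 0.85]
    print(auc(pos, neg), auc(pos, neg * 9))
```

With a test set that is 60% positive, the dumb machine already reaches a precision of 0.6, which matches the behaviour noted above; AUC is unchanged when the negatives are over-represented.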

Clarifying definitions of "defective" and "deficient" for phenotypes

  • WB phenotype ontology has many "variant/abnormal" terms and distinct subclass terms for "defective/deficient"
  • Have tried to create a logical definition pattern for these terms, but the vagueness of the meaning of "defective" and how it is distinct from "abnormal" has stalled the process
  • What do we mean exactly by "defective" and how, specifically, is this distinct from "abnormal"?
  • Definitions include meanings or words:
    • "Variations in the ability"
    • "aberrant"
    • "defect"
    • "defective"
    • "defects"
    • "deficiency"
    • "deficient"
    • "disrupted"
    • "impaired"
    • "incompetent"
    • "ineffective"
    • "perturbation that disrupts"
    • Failure to execute the characteristic response = abnormal?
    • abnormal
    • abnormality leading to specific outcomes
    • fail to exhibit the same taxis behavior = abnormal?
    • failure
    • failure OR delayed
    • failure, slower OR late
    • failure/abnormal
    • reduced
    • slower

Citace upload

    • Tuesday, Sep 24th

Strain to ID mapping

  • Waiting on Hinxton to send strain ID mapping file?
  • Hopefully we can all get that well before the upload deadline
  • Will do global replacement at time of citace upload (at least for now)
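
A global replacement of strain names by strain IDs at upload time might look something like the sketch below. The mapping file format (tab-separated ID and name, one pair per line), the function names, and the example IDs are all assumptions for illustration; the real mapping file from Hinxton may differ.

```python
import re


def load_mapping(lines):
    """Build a {strain name: strain ID} dict from tab-separated lines.

    Assumed format: '<strain ID>\\t<strain name>'; blank lines and lines
    starting with '#' are skipped.
    """
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        strain_id, name = line.split("\t", 1)
        mapping[name] = strain_id
    return mapping


def replace_strain_names(text, mapping):
    """Replace whole strain names in text with their IDs in one pass."""
    if not mapping:
        return text
    # Longest names first so a short name never shadows a longer one.
    names = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(
        r"(?<![\w-])(" + "|".join(re.escape(n) for n in names) + r")(?![\w-])"
    )
    return pattern.sub(lambda m: mapping[m.group(1)], text)


if __name__ == "__main__":
    # Placeholder IDs, not taken from the real mapping file.
    mapping = load_mapping(["WBStrain00000001\tN2", "WBStrain00000002\tCB4856"])
    print(replace_strain_names('Strain "N2"\nStrain "CB4856"', mapping))
```

The lookbehind and lookahead keep a name like N2 from being replaced inside a longer token, which matters when the replacement is applied blindly across a whole dump file.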

New name server

  • When will this officially go live?
  • Will we now be able to request strain IDs through the server? Yes

SObA Graphs

  • New graphs now live on site (Expression, Gene Ontology, Human Diseases, Phenotypes)
  • A lot of whitespace padding above and below graph; maybe trim? Trimming vertically would ultimately limit the view pane when a user wants to zoom in, so we should leave as is for now
  • Diff tool: Raymond and Juancarlos created a prototype diff tool (for comparing two genes, for example)
    • Paul: compared two genes that should be very similar, but there are a lot of differences; may reflect annotation coverage rather than biology


September 19, 2019

Strains

  • Need to wait for new strain IDs from Hinxton before running dumping scripts
  • Don't edit multi-ontology strain fields in OA for now!
  • Juancarlos will map free text and ontology-name strain entries to strain IDs once we have the complete mapping file
  • "Requested strain" field in Disease OA; not dumped, so don't need to worry about right now

Alliance literature curation

  • Working group will be formed soon
  • Will work out general common pipelines for literature curation

SObA Graph relations

  • Currently only integrating over "is a", "part of" and "regulates"
  • Maybe we could provide users an option to specify which relations to include, or maybe just exclude "regulates"
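
Letting users choose which relations to integrate over could be sketched as an ancestor lookup that only follows the selected relation types. This is a toy illustration, not the SObA implementation: the edge representation, the relation identifiers ("is_a", "part_of", "regulates"), and the term names are assumptions.

```python
from collections import defaultdict


def ancestors(term, edges, relations):
    """Return all ancestors of term reachable via the allowed relations.

    edges: iterable of (child, relation, parent) triples.
    relations: set of relation names to follow.
    """
    parents = defaultdict(list)
    for child, rel, parent in edges:
        if rel in relations:
            parents[child].append(parent)

    seen = set()
    stack = [term]
    while stack:
        node = stack.pop()
        for parent in parents[node]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


if __name__ == "__main__":
    # Hypothetical toy ontology.
    edges = [
        ("A", "is_a", "B"),
        ("B", "part_of", "C"),
        ("A", "regulates", "D"),
        ("D", "is_a", "E"),
    ]
    print(sorted(ancestors("A", edges, {"is_a", "part_of", "regulates"})))
    print(sorted(ancestors("A", edges, {"is_a", "part_of"})))
```

Dropping "regulates" from the set removes not only the directly regulated term but everything reached through it, which is the effect an "exclude regulates" option would have on the graph.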

Author First Pass

  • Putting together paper for AFP
  • Reviewing all user input for paper
  • Asking individual curators to check input