Revision as of 16:39, 19 September 2019

Previous Years

2019 Meetings

January

February

September 12, 2019

Update on SVM pipeline

New SVM pipeline: more analysis and more parameter tuning
Avoiding precision (and F-value) as a measure, since both depend on the ratio of positives to negatives in the test set
For example shown, "dumb" machine starts out with precision above 0.6
G-value (Michael's invention); does not depend on distribution of sets
Applied to various data types
Analysis: 10-fold cross validation
- Randomly select 10% pos and neg (without replacement) and repeat until all papers sampled
F-value changes over different p/n values; G-value does not (essentially flat)
Area Under the Curve (AUC): probability that a random positive scores higher than random negative
AUC values for many WB data types are in the upper 80s into the 90s (percent)
Ranjana: How many papers for a good training set? Michael: we don't know yet
Can't reproduce old training sets (for old SVM); provide Michael better training sets if you want improved SVM
If SVM still not good enough, Michael will work on deep neural networks (TensorFlow)
Michael can provide training sets he has used recently
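
The points above about precision, AUC, and the 10-fold sampling can be illustrated with a minimal Python sketch. It is not the pipeline's code: the function names, the all-positive "dumb" classifier, and the toy numbers are assumptions made for illustration, and the G-value is deliberately left out because its definition is not given in these notes.

```python
import random

def precision_at_ratio(tpr, fpr, n_pos, n_neg):
    """Expected precision for fixed true/false positive rates on a test set
    with n_pos positives and n_neg negatives."""
    tp = tpr * n_pos
    fp = fpr * n_neg
    return tp / (tp + fp) if (tp + fp) else 0.0

def auc(pos_scores, neg_scores):
    """Probability that a random positive scores higher than a random
    negative (ties count as half a win)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def ten_fold_split(pos, neg, k=10, seed=0):
    """Shuffle positives and negatives separately, then deal each into k
    folds so every paper lands in exactly one fold (no replacement)."""
    rng = random.Random(seed)
    pos, neg = list(pos), list(neg)
    rng.shuffle(pos)
    rng.shuffle(neg)
    return [(pos[i::k], neg[i::k]) for i in range(k)]

# A classifier that labels everything positive (tpr = fpr = 1) gets a
# precision equal to the fraction of positives in the test set.
same_classifier_60_40 = precision_at_ratio(1.0, 1.0, 60, 40)
same_classifier_10_90 = precision_at_ratio(1.0, 1.0, 10, 90)
```

With the same classifier, the precision moves from 0.6 to 0.1 purely because the positive/negative ratio of the test set changed, which is the dependence the notes describe. The pairwise AUC only compares scores of positives against scores of negatives, so it is unaffected by how many of each are present.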

Clarifying definitions of "defective" and "deficient" for phenotypes

WB phenotype ontology has many "variant/abnormal" terms and distinct subclass terms for "defective/deficient"
Have tried to create a logical definition pattern for these terms, but the vagueness of the meaning of "defective" and how it is distinct from "abnormal" has stalled the process
What do we mean exactly by "defective" and how, specifically, is this distinct from "abnormal"?
Definitions include meanings or words:
- "Variations in the ability"
- "aberrant"
- "defect"
- "defective"
- "defects"
- "deficiency"
- "deficient"
- "disrupted"
- "impaired"
- "incompetent"
- "ineffective"
- "perturbation that disrupts"
- Failure to execute the characteristic response = abnormal?
- abnormal
- abnormality leading to specific outcomes
- fail to exhibit the same taxis behavior = abnormal?
- failure
- failure OR delayed
- failure, slower OR late
- failure/abnormal
- reduced
- slower

Citace upload

- Tuesday, Sep 24th

Strain to ID mapping

Waiting on Hinxton to send strain ID mapping file?
Hopefully we can all get that well before the upload deadline
Will do global replacement at time of citace upload (at least for now)

New name server

When will this officially go live?
Will we now be able to request strain IDs through the server? Yes

SObA Graphs

New graphs now live on site (Expression, Gene Ontology, Human Diseases, Phenotypes)
A lot of whitespace padding above and below graph; maybe trim? Trimming vertically would ultimately limit the view pane when a user wants to zoom in, so we should leave as is for now
Diff tool: Raymond and Juancarlos created a prototype diff tool (for comparing two genes, for example)
- Paul: compared two genes that should be very similar, but there are a lot of differences; may reflect annotation coverage rather than biology

September 19, 2019

Strains

Need to wait for new strain IDs from Hinxton before running dumping scripts
Don't edit multi-ontology strain fields in OA for now!
Juancarlos will map free text and ontology-name strain entries to strain IDs once we have the complete mapping file
"Requested strain" field in Disease OA is not dumped, so no need to worry about it right now
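
The mapping step described above could look roughly like the following Python sketch. The mapping file format (tab-separated ID and strain name), the function names, and the example IDs and names are all hypothetical placeholders; the real file from Hinxton may be laid out differently.

```python
def load_mapping(lines):
    """Build a name -> ID lookup from lines of the assumed form 'ID<TAB>name'.
    Blank lines and lines starting with '#' are skipped."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        strain_id, name = line.split("\t")[:2]
        mapping[name] = strain_id
    return mapping

def map_strains(entries, mapping):
    """Map free-text strain entries to IDs; entries with no match are
    returned separately so a curator can review them."""
    mapped, unmapped = {}, []
    for entry in entries:
        key = entry.strip()
        if key in mapping:
            mapped[entry] = mapping[key]
        else:
            unmapped.append(entry)
    return mapped, unmapped

# Placeholder data, not real strain names or ID assignments.
example_mapping = load_mapping(["# id\tname", "ID001\tEX1", "ID002\tEX2", ""])
example_mapped, example_unmapped = map_strains(["EX1", " EX2 ", "EX9"], example_mapping)
```

Keeping the unmatched entries in their own list, rather than dropping them, leaves a record of which free-text values still need manual attention once the complete mapping file is available.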

Alliance literature curation

Working group will be formed soon
Will work out general common pipelines for literature curation

SObA Graph relations

Currently only integrating over "is a", "part of" and "regulates"
Maybe we could provide users an option to specify which relations to include, or maybe just exclude "regulates"
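
A user-selectable set of relations, as suggested above, amounts to filtering edges before computing each term's ancestors. The Python sketch below shows the idea on a toy graph; the edge list, relation labels, and function name are assumptions for illustration and do not reflect how the SObA code is actually written.

```python
from collections import defaultdict

# Hypothetical toy ontology edges: (child, relation, parent).
EDGES = [
    ("B", "is_a", "A"),
    ("C", "part_of", "B"),
    ("D", "regulates", "C"),
]

def ancestors(term, edges, relations):
    """Return every term reachable from `term` by following only edges
    whose relation is in `relations`."""
    parents = defaultdict(list)
    for child, rel, parent in edges:
        if rel in relations:
            parents[child].append(parent)
    seen = set()
    stack = [term]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

all_relations = ancestors("D", EDGES, {"is_a", "part_of", "regulates"})
without_regulates = ancestors("D", EDGES, {"is_a", "part_of"})
```

In this toy graph, term "D" rolls up to "A", "B", and "C" when "regulates" is included and to nothing when it is excluded, so annotations on "D" would no longer be counted toward those terms.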

Author First Pass

Putting together paper for AFP
Reviewing all user input for paper
Asking individual curators to check input

@@ Line 16: / Line 16: @@
 [[WormBase-Caltech_Weekly_Calls_2017|2017 Meetings]]
+[[WormBase-Caltech_Weekly_Calls_2018|2018 Meetings]]
@@ Line 21: / Line 23: @@
-= 2018 Meetings =
+= 2019 Meetings =
-[[WormBase-Caltech_Weekly_Calls_January_2018|January]]
-[[WormBase-Caltech_Weekly_Calls_February_2018|February]]
-[[WormBase-Caltech_Weekly_Calls_March_2018|March]]
-[[WormBase-Caltech_Weekly_Calls_April_2018|April]]
-[[WormBase-Caltech_Weekly_Calls_May_2018|May]]
-== June 7, 2018 ==
+[[WormBase-Caltech_Weekly_Calls_January_2019|January]]
-=== Alliance literature pipeline working group ===
+[[WormBase-Caltech_Weekly_Calls_February_2019|February]]
-* Has anyone gotten back to Carol Bult about WB/Textpresso membership in the group? Not yet
-* Paul & Kimberly can join; Kimberly will sit in on group meetings
-=== SPELL migration ===
+[[WormBase-Caltech_Weekly_Calls_March_2019|March]]
-* SGD put SPELL into the cloud; will give us code to set up WB SPELL in cloud
-* Want to have common interface but two instances (yeast & worm)
-* Worm data in SPELL may be more complicated than the yeast data (different species, platforms, meta data, etc.)
-* Mike Cherry proposed to use GTEx to replace SPELL. (www.gtexportal.org)
-* Trying to get GTEX to have same functionality as SPELL
-=== Progress report ===
+[[WormBase-Caltech_Weekly_Calls_April_2019|April]]
-* Wen can generate numbers of changes since last year
-* 5-year progress report coming up (for funding agencies)
-* Want a progress report-like document to give to SAB in July
-* WS259 - WS265
-=== Paper author name ===
+[[WormBase-Caltech_Weekly_Calls_May_2019|May]]
-* Jonathan Ewbank wrote in about an incorrect author name
-* Issue for WBPaper00048704: https://wormbase.org/resources/paper/WBPaper00048704#0--10
-* First author is "Li, C" as in PubMed (correct name) in WormBase as "Chun, L"
-* Will make a GitHub ticket; web team may need to look at
-=== Lab class ===
+[[WormBase-Caltech_Weekly_Calls_June_2019|June]]
-* Lab data is not clean/consistent; has gotten messy
-* Cecilia trying to clean it up
-* There is conflicting data from lab class, author class, and person class
-* Currently there is a ~redundant curation pipeline; considering pulling person info into lab class
-* When looking at lab page, if there is a problem with a person's info, changes would be made (requested) in the person class, not the lab class
-* Cecilia will contact labs to ask about correctness of information
-* Can have a discussion with Ann and Aric at CGC
-=== How many C. elegans genes have "good" knockouts? ===
+[[WormBase-Caltech_Weekly_Calls_July_2019|July]]
-* Paul giving talk next week, wants to report
-* Chris will look into
-* Mitani did 1,000 deletions in last year
-=== Server logs ===
+[[WormBase-Caltech_Weekly_Calls_August_2019|August]]
-* SimpleMine logs are all coming from Amazon; will need to ask Todd about SimpleMine usage (from AWS stats?)
-=== Migration of lab data ===
-* Data for labs was coming from Name Server (in Hinxton) - <font color= red>Comment:</font>[PD] Not coming from the Nameserver, geneace was the only source of Lab data prior to handover.
-* Now Paul D has stopped curating lab info - <font color= red>Comment:</font>[PD] Build config has been updated to take the majority of lab data from citace rather than geneace
- Database        File                                    Class                   remove/format
- ----------------------------------------------------------------------------------------------------------------
- db=geneace      file=geneace_Laboratory.ace             class=Laboratory        format="Alleles WBVar\d{8}"     delete=Representative   delete=Registered_lab_members   delete=Past_lab_members delete=Allele_designation       delete=Strain_designation       delete=Address
- db=citace       file=caltech_Laboratory.ace             class=Laboratory        delete=Alleles  format="Representative WBPerson\d{1,5}" format="Registered_lab_members WBPerson\d{1,5}" format="Past_lab_members WBPerson\d{1,5}"
-* Now that Cecilia is curating, will pull lab info from Postgres
+== September 12, 2019 ==
+=== Update on SVM pipeline ===
+* New SVM pipeline: more analysis and more parameter tuning
+* avoiding precision (and F-value) as a measure (dependent on ratio of positives and negatives in test set)
+* For example shown, "dumb" machine starts out with precision above 0.6
+* G-value (Michael's invention); does not depend on distribution of sets
+* Applied to various data types
+* Analysis: 10-fold cross validation
+** Randomly select 10% pos and neg (without replacement) and repeat until all papers sampled
+* F-value changes over different p/n values; G-value does not (essentially flat)
+* Area Under the Curve (AUC): probability that a random positive scores higher than random negative
+* AUC values for many WB data types are in the upper 80s into the 90s (percent)
+* Ranjana: How many papers for a good training set? Michael: we don't know yet
+* Can't reproduce old training sets (for old SVM); provide Michael better training sets if you want improved SVM
+* If SVM still not good enough, Michael will work on deep neural networks (TensorFlow)
+* Michael can provide training sets he has used recently
-== June 21, 2018 ==
+=== Clarifying definitions of "defective" and "deficient" for phenotypes ===
+* WB phenotype ontology has many "variant/abnormal" terms and distinct subclass terms for "defective/deficient"
+* Have tried to create a logical definition pattern for these terms, but the vagueness of the meaning of "defective" and how it is distinct from "abnormal" has stalled the process
+* What do we mean exactly by "defective" and how, specifically, is this distinct from "abnormal"?
+* Definitions include meanings or words:
+** "Variations in the ability"
+** "aberrant"
+** "defect"
+** "defective"
+** "defects"
+** "deficiency"
+** "deficient"
+** "disrupted"
+** "impaired"
+** "incompetent"
+** "ineffective"
+** "perturbation that disrupts"
+** Failure to execute the characteristic response = abnormal?
+** abnormal
+** abnormality leading to specific outcomes
+** fail to exhibit the same taxis behavior = abnormal?
+** failure
+** failure OR delayed
+** failure, slower OR late
+** failure/abnormal
+** reduced
+** slower
-=== SAB ===
+=== Citace upload ===
-* Lightning talks at project meeting; everyone should consider what they want to present (5 minutes with 5 minutes for questions)
+** Tuesday, Sep 24th
-* 45 minutes curation talk to SAB; Chris volunteered; anyone else interested?
-* What data types are being incorporated with the Alliance? What have we gained from those conversations?
-* How have Alliance interactions benefited us?
-* How does the Datomic migration affect our curation and data models?
-* Alliance data models should be union of existing MOD data models, but does not require curation of all attributes at each MOD
-* Do WB and our users benefit from creating a ?Genotype class?
-=== Phenotype & Disease Face-to-Face ===
+=== Strain to ID mapping ===
-* Reviewed phenotype curation practices at each MOD
+* Waiting on Hinxton to send strain ID mapping file?
-* Discussed strain and genotype classes
+* Hopefully we can all get that well before the upload deadline
-* Generally it's felt that genotypes should only represent full genotypes of actual individuals or strains
+* Will do global replacement at time of citace upload (at least for now)
-* For phenotype annotations WB will still need a mechanism to attribute the specific genotypic components that are responsible for the observed phenotype along with a way to capture the complete or background genotype for context
-* MGI considers strains to be part of a genotype (i.e. background inbred strain into which alleles are introduced)
-* WB and SGD consider genotype to be part of strain
-* WB will still consider moving ahead with instantiating a ?Genotype class for capturing genotypes (e.g. for disease models) that don't have an explicitly reported strain name; also for capturing transient genotypes like heterozygosity as well as paternal and maternal contributions/genotypes
+=== New name server ===
+* When will this officially go live?
+* Will we now be able to request strain IDs through the server? Yes
+=== SObA Graphs ===
+* New graphs now live on site (Expression, Gene Ontology, Human Diseases, Phenotypes)
+* A lot of whitespace padding above and below graph; maybe trim? trimming vertically would ultimately limit the view pane when user wants to zoom in, so we should leave as is for now
+* Diff tool: Raymond and Juancarlos created a prototype diff tool (for comparing two genes, for example)
+** Paul: compared two genes that should be very similar, but there are a lot of differences; may reflect annotation coverage rather than biology
-== June 28, 2018 ==
-=== Bulk emailing community for phenotypes ===
+== September 19, 2019 ==
-* Chris & Juancarlos have now setup a pipeline to send requests in bulk
-* Can send 75 emails at a time; GMail may limit to 500 per day (was hoping for 3,000 at a time)
-* Sent out 338 emails in past week; already received annotations for 18 papers, ~50 annotations
-* Papers go back to 1987, earliest paper received curation from in last week 2008
-* Have sent about 25 papers from 2018, got annotations for 5 of them
-* Ask SAB about email frequency
-* Would be good to link to a Textpresso search
-* Maybe make a link people can click on to indicate there is no phenotype data
-* Can a submitter chat with a curator? Maybe integrate Olark chat? Would be good to make available, indicate how to chat
-=== SAB ===
+=== Strains ===
-* Chris & Paul will present (curation & intro respectively)
+* Need to wait for new strain IDs from Hinxton before running dumping scripts
-* Tools? Raymond can bring up enrichment analysis
+* Don't edit multi-ontology strain fields in OA for now!
-* Project meeting
+* Juancarlos will map free text and ontology-name strain entries to strain IDs once we have the complete mapping file
-** Everyone should sign up for a topic if they haven't already
+* "Requested strain" field in Disease OA; not dumped, so don't need to worry about right now
-=== Phenotype Ontology ===
+=== Alliance literature curation ===
-* Is the Phenotype ontology growing? Very slowly
+* Working group will be formed soon
-* Received several suggested phenotypes from community in last batch
+* Will work out general common pipelines for literature curation
-* Want to focus on logical definitions
-* Mammalian Phenotype ontology - precomposed terms with logical definitions
-* Raymond trying to use the SObA approach to display logically defined phenotype terms; needs to figure out how to effectively model it
-* Should get useful feedback at the ICBO meeting in August
-** Would be good to establish ahead of time what we want to get from the meeting
-* Want feedback on:
-** Pre-composed vs. post-composed
-** Granularity
-** Logical definitions
-* Looking up phenotypes: want to make more user friendly
-** We could map phenotype terms to processes to allow community to find relevant terms
-** Want to find "male hook" phenotypes; correct term is "male copulatory structure variant"; the term "hook" should find "male copulatory structure"
-* What is the best use of a user's time? curator's time?
-* How are ontology logical definitions being used now?
-** Kimberly: To calculate/reason the inferences based on ontology parentage
-** Logical definitions are not very visible on relevant pages (like Amigo); should we make it more visible to users?
+=== SObA Graph relations ===
+* Currently only integrating over "is a", "part of" and "regulates"
+* Maybe we could provide users an option to specify which relations to include, or maybe just exclude "regulates"
-=== Protein interactions ===
+=== Author First Pass ===
-* Now up to date; how should we publish this info?
+* Putting together paper for AFP
-* Can make a blog post and/or micropublication and put in next NAR paper
+* Reviewing all user input for paper
+* Asking individual curators to check input

Difference between revisions of "WormBase-Caltech Weekly Calls"
