Revision as of 16:39, 19 September 2019

Previous Years

2019 Meetings

January

February

September 12, 2019

Update on SVM pipeline

New SVM pipeline: more analysis and more parameter tuning
Avoiding precision (and F-value) as a measure, since both depend on the ratio of positives to negatives in the test set
In the example shown, a "dumb" machine starts out with precision above 0.6
G-value (Michael's invention); does not depend on distribution of sets
Applied to various data types
Analysis: 10-fold cross validation
- Randomly select 10% pos and neg (without replacement) and repeat until all papers sampled
F-value changes over different p/n values; G-value does not (essentially flat)
Area Under the Curve (AUC): probability that a random positive scores higher than a random negative
AUC values for many WB data types range from the upper 80s into the 90s (percent)
Ranjana: How many papers for a good training set? Michael: we don't know yet
Can't reproduce old training sets (for old SVM); provide Michael better training sets if you want improved SVM
If the SVM is still not good enough, Michael will work on deep neural networks (TensorFlow)
Michael can provide training sets he has used recently
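
The points above about precision, AUC and 10-fold sampling can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names are hypothetical, the data are synthetic, and it is not the pipeline's actual code. The G-value is not defined in these notes, so it is not shown.

```python
import random

def precision(scores, labels, threshold=0.5):
    """Fraction of papers called positive that really are positive."""
    called = [l for s, l in zip(scores, labels) if s >= threshold]
    return sum(called) / len(called) if called else 0.0

def auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def stratified_folds(labels, k=10, seed=0):
    """Deal positives and negatives separately into k folds, without replacement."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, l in enumerate(labels) if l == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds

# A "dumb" classifier gives every paper the same score, so everything is called positive.
# Its precision is simply the fraction of positives in the test set; its AUC stays at 0.5.
for n_pos, n_neg in [(60, 40), (10, 90)]:
    labels = [1] * n_pos + [0] * n_neg
    dumb_scores = [1.0] * len(labels)
    print(n_pos, n_neg, precision(dumb_scores, labels), auc(dumb_scores, labels))
```

With 60 positives and 40 negatives the dumb classifier reaches a precision of 0.6 without learning anything, while with 10 positives and 90 negatives it drops to 0.1. The AUC is 0.5 in both cases, which is why a measure that is independent of the class ratio is easier to compare across data types.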

Clarifying definitions of "defective" and "deficient" for phenotypes

WB phenotype ontology has many "variant/abnormal" terms and distinct subclass terms for "defective/deficient"
Have tried to create a logical definition pattern for these terms, but the vagueness of the meaning of "defective" and how it is distinct from "abnormal" has stalled the process
What do we mean exactly by "defective" and how, specifically, is this distinct from "abnormal"?
Definitions include meanings or words:
- "Variations in the ability"
- "aberrant"
- "defect"
- "defective"
- "defects"
- "deficiency"
- "deficient"
- "disrupted"
- "impaired"
- "incompetent"
- "ineffective"
- "perturbation that disrupts"
- Failure to execute the characteristic response = abnormal?
- abnormal
- abnormality leading to specific outcomes
- fail to exhibit the same taxis behavior = abnormal?
- failure
- failure OR delayed
- failure, slower OR late
- failure/abnormal
- reduced
- slower

Citace upload

- Tuesday, Sep 24th

Strain to ID mapping

Waiting on Hinxton to send strain ID mapping file?
Hopefully we can all get that well before the upload deadline
Will do global replacement at time of citace upload (at least for now)
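
The global replacement could look roughly like the sketch below. It assumes the mapping file is tab-separated with one "strain name, strain ID" pair per line; the actual file format from Hinxton is not known here, and the function names and IDs are hypothetical placeholders.

```python
import csv
import io

def load_mapping(tsv_text):
    """Parse 'strain name<TAB>strain ID' lines into a dict (assumed file format)."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return {row[0]: row[1] for row in reader if len(row) == 2}

def replace_strains(values, mapping):
    """Swap strain names for IDs; also return the names that had no ID."""
    converted, unmapped = [], []
    for value in values:
        if value in mapping:
            converted.append(mapping[value])
        else:
            converted.append(value)   # leave unknown entries untouched
            unmapped.append(value)    # and report them for manual review
    return converted, unmapped

# Placeholder IDs, not real WormBase strain IDs.
mapping = load_mapping("N2\tID0001\nCB4856\tID0002\n")
converted, unmapped = replace_strains(["N2", "some free text", "CB4856"], mapping)
print(converted)
print(unmapped)
```

Returning the unmapped entries separately keeps the replacement non-destructive, so free-text values that do not match any strain name can be checked by a curator instead of being dropped.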

New name server

When will this officially go live?
Will we now be able to request strain IDs through the server? Yes

SObA Graphs

New graphs now live on site (Expression, Gene Ontology, Human Diseases, Phenotypes)
A lot of whitespace padding above and below the graph; maybe trim? Trimming vertically would ultimately limit the view pane when a user wants to zoom in, so we should leave it as is for now
Diff tool: Raymond and Juancarlos created a prototype diff tool (for comparing two genes, for example)
- Paul: compared two genes that should be very similar, but there are a lot of differences; may reflect annotation coverage rather than biology

September 19, 2019

Strains

Need to wait for new strain IDs from Hinxton before running dumping scripts
Don't edit multi-ontology strain fields in OA for now!
Juancarlos will map free text and ontology-name strain entries to strain IDs once we have the complete mapping file
"Requested strain" field in Disease OA is not dumped, so we don't need to worry about it right now

Alliance literature curation

Working group will be formed soon
Will work out general common pipelines for literature curation

SObA Graph relations

Currently only integrating over "is a", "part of" and "regulates"
Maybe we could provide users an option to specify which relations to include, or maybe just exclude "regulates"
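
A user-selectable set of relations could work along the lines of the sketch below. It is not the SObA code: the edge list, term names and function name are hypothetical, and it only shows how restricting the relation types changes which ancestor terms an annotation is integrated over.

```python
from collections import defaultdict

def ancestors(term, edges, relations):
    """Return all terms reachable from `term` by following only the given relation types."""
    parents = defaultdict(list)
    for child, rel, parent in edges:
        if rel in relations:
            parents[child].append(parent)
    seen, stack = set(), [term]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

# Toy ontology edges: (child, relation, parent). Names are made up.
EDGES = [
    ("term_A", "is_a", "term_B"),
    ("term_B", "part_of", "term_C"),
    ("term_A", "regulates", "term_D"),
    ("term_D", "is_a", "term_E"),
]

print(sorted(ancestors("term_A", EDGES, {"is_a", "part_of", "regulates"})))
print(sorted(ancestors("term_A", EDGES, {"is_a", "part_of"})))
```

In this toy example, excluding "regulates" removes term_D and everything reached through it (term_E), so the choice of relations can change the resulting graph considerably.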

Author First Pass

Putting together paper for AFP
Reviewing all user input for paper
Asking individual curators to check input

Difference between revisions of "WormBase-Caltech Weekly Calls"

@@ Line 1: / Line 1: @@
+= Previous Years =
 [[WormBase-Caltech_Weekly_Calls_2009|2009 Meetings]]
-==February 10, 2011==
+[[WormBase-Caltech_Weekly_Calls_2011|2011 Meetings]]
+[[WormBase-Caltech_Weekly_Calls_2012|2012 Meetings]]
-*Should we have a Caltech project/site manager? We'll look at the issues to be solved first
+[[WormBase-Caltech_Weekly_Calls_2013|2013 Meetings]]
-*Todd would like to know more about what's going on at Caltech
+[[WormBase-Caltech_Weekly_Calls_2014|2014 Meetings]]
-**What would Todd like to know about? New ideas, data types?
-**What can we communicate more effectively to Todd?
-*Should we take minutes of Caltech WormBase group meetings and send around?
+[[WormBase-Caltech_Weekly_Calls_2015|2015 Meetings]]
-*Kimberly: helpful (for off site individuals) if we go around the room to talk about what each of us is working on
+[[WormBase-Caltech_Weekly_Calls_2016|2016 Meetings]]
-*Bitbucket use
+[[WormBase-Caltech_Weekly_Calls_2017|2017 Meetings]]
-**How do we see only what pertains to us, individually?
-**Paul: good way to document bugs and fixes, problems/solutions
-**Bitbucket Wiki - does that capture what people want/need to see?
-**Who should Todd follow? What would Todd like to know about?
-**Raymond: used for code development/versioning by OICR; may be the best use for Bitbucket
-**Should we develop best practices guideline for Bitbucket use?
-***Avoid posting topics that are too specific or too vague?
-***What do we want most to get out of Bitbucket?
-**Curation efforts? No
-*SAB Meeting
+[[WormBase-Caltech_Weekly_Calls_2018|2018 Meetings]]
-**Not much feedback for literature curators
-**User interface stuff took precedence
-**Paul: Testing, spot-checking website for errors
-**Web issues, e-mails from Gary and Oliver
-*WormBase-wide conference call twice per month
-**First & third Thursday of the month
-**Starting next Thursday (2-17) @ 8:30am PST
-**Web redesign meetings on Thursdays will have to be every other week? Do both same day?
-*Raymond: WormBase mirrors
+GoToMeeting link: https://www.gotomeet.me/wormbase1
-**Do we want to implement the Genome Browser for the mirror(s)? If practical
-**It seems as though Caltech mirror has been crashing often
-**Has the Caltech mirror been working for people lately? Yes, more or less
-== February 17, 2011 ==
+= 2019 Meetings =
-Interaction dumping script
+[[WormBase-Caltech_Weekly_Calls_January_2019|January]]
-*Spot check table
-*Push changes to Git? Yes, to main branch
+[[WormBase-Caltech_Weekly_Calls_February_2019|February]]
-Interactions automatically downloaded to FTP every release? Yes
+[[WormBase-Caltech_Weekly_Calls_March_2019|March]]
-*People should check their own curation data
-*Users query frequently?
-*Can WormMart work instead?
-*Ruihua - feature can be added
-*Todd - good to have pre-defined queries generated automatically
-*Parse/remove predicted interactions (400,000) from others during dump? Yes
-*Paul - Wei Wei wants to help with updates - who should she contact at WB?
+[[WormBase-Caltech_Weekly_Calls_April_2019|April]]
-Ranjana - Solar flares causing static on phone? ;)
+[[WormBase-Caltech_Weekly_Calls_May_2019|May]]
+[[WormBase-Caltech_Weekly_Calls_June_2019|June]]
-Newsletter for new website release? Yes
+[[WormBase-Caltech_Weekly_Calls_July_2019|July]]
-*Todd - 5 development milestones
-**teleconference with Gary Ruvkun's lab next week
-**Go live with Beta version in June (@IWM)
-**Go live live with new site in September
-**Retire old site at end of year
-**Need outreach, documentation, screencasts
+[[WormBase-Caltech_Weekly_Calls_August_2019|August]]
-BioGRID
-*WormBase to BioGRID curation
-*Issues of mapping interaction types: WB -> BioGRID vs. BioGRID -> WB
-*Moving towards using WB interaction types
-*Definitions of our interaction types
-*Rose will propose to BioGRID community
-*Physical interactions - BioGRID has a better format
-**break physical interactions away from other interaction types?
-**Kimberly can propose changes to model: YH separate from other interactions?
-**Physical vs Genetic interactions? keep separate
-*We need an interaction ontology, of some sort
-*Who should Wei Wei contact, how to coordinate with Hinxton; getting data from FlyBase and SGD?
-**Automation?
-*Interaction pages - explain what the interaction is based on; provenance/reference
-*Gene Orienteer shows data, but user has to link out
+== September 12, 2019 ==
-WormBase IWM souvenir?
+=== Update on SVM pipeline ===
-*Computer stress ball? with new site address?
+* New SVM pipeline: more analysis and more parameter tuning
-*bouncy ball?
+* avoiding precision (and F-value) as a measure (dependent on ratio of positives and negatives in test set)
-*worm rubber band thing?
+* For example shown, "dumb" machine starts out with precision above 0.6
-*Screen shield?
+* G-value (Michael's invention); does not depend on distribution of sets
-*Complementary iPads? ;)
+* Applied to various data types
-*Antifungal socks? wristbands?
+* Analysis: 10-fold cross validation
+** Randomly select 10% pos and neg (without replacement) and repeat until all papers sampled
+* F-value changes over different p/n values; G-value does not (essentially flat)
+* Area Under the Curve (AUC): probability that a random positive scores higher than random negative
+* AUC values for many WB data types upper 80%'s into 90%'s
+* Ranjana: How many papers for a good training set? Michael: we don't know yet
+* Can't reproduce old training sets (for old SVM); provide Michael better training sets if you want improved SVM
+* If SVM still not good enough, Michael will work on deep neural networks (Tensor Flow)
+* Michael can provide training sets he has used recently
+=== Clarifying definitions of "defective" and "deficient" for phenotypes ===
+* WB phenotype ontology has many "variant/abnormal" terms and distinct subclass terms for "defective/deficient"
+* Have tried to create a logical definition pattern for these terms, but the vagueness of the meaning of "defective" and how it is distinct from "abnormal" has stalled the process
+* What do we mean exactly by "defective" and how, specifically, is this distinct from "abnormal"?
+* Definitions include meanings or words:
+** "Variations in the ability"
+** "aberrant"
+** "defect"
+** "defective"
+** "defects"
+** "deficiency"
+** "deficient"
+** "disrupted"
+** "impaired"
+** "incompetent"
+** "ineffective"
+** "perturbation that disrupts"
+** Failure to execute the characteristic response = abnormal?
+** abnormal
+** abnormality leading to specific outcomes
+** fail to exhibit the same taxis behavior = abnormal?
+** failure
+** failure OR delayed
+** failure, slower OR late
+** failure/abnormal
+** reduced
+** slower
-SPELL
+=== Citace upload ===
-*new data doesn't load; problem runnning search engine; can load it on athena, not generalizable problem?
+** Tuesday, Sep 24th
-*OS was not complete with an update; try again
-*two virtual machines working back and forth for support
-*how often is SPELL being used? log?; had 6-7 users within two weeks complaining of SPELL problems
-*Todd - can send Google analytic tool to put at bottom of page
-*Wen - SPELL testing server; official and mirror; want a separate testing server for new releases; mirror on athena (Wen's working machine); Raymond says don't want to attract users to that machine; WB's running SPELL on altair; heavily loaded and has had problems; relying on athena may be problematic
-*maybe we need analysis of all of our machines to see big picture; can we consolidate machines; make use of ones we have? If not we can get another machine.
-*what type of machine do we need?
-*Log of all machines with purchase date and functionality?
-*Linux vs Mac curators? athena development machine;
-*how much power do the scripts need? can check
-*may not be appropriate as official server if main machine goes down
-*WB and athena can run SPELL server, but not others. security problems?
-*prefer work machine is readily re-bootable; reconfigure software easily without affecting other things that people rely on
-*need 10GB space; 4GB memory? SPELL production server; want it stable for outside world; hardware designed for that purpose; farm it out to IMSS?
-*complexity of the application; Linux can run multiple applications; caprica has SVM production and WormMart; understand demand of the application; requirements; development vs running scripts, how often? Efficient use of machines. Have 2 computers, getting a third; for WormMart need power; traffic suddenly increased yesterday; monitor traffic; not much personal experience with server maintenance; cluster needs lots of expertise
-*building official sever of SPELL at OICR? Yes
+=== Strain to ID mapping ===
+* Waiting on Hinxton to send strain ID mapping file?
+* Hopefully we can all get that well before the upload deadline
+* Will do global replacement at time of citace upload (at least for now)
-WormMart
+=== New name server ===
-*with WormMart and WormBase, people just want to know if it's down, just to know to lessen frustration; Official statement from dev team
+* When will this officially go live?
-*WormMart - we give users testing url; can we just change WormBase link to that testing server?
+* Will we now be able to request strain IDs through the server? Yes
-*going to change HTML to update users with most recent information; discuss with Lincoln, 5 datasets on testing server; remaining 3 datasets. if WormMart not working, put a message on main site
-*it doesn't look good to have a link to a tool linked from a production server; if we don't trust the data we shouldn't put it up.
-*would like to keep the html page as a testing server; could modify page
-*if WormMart is working OK but not perfect, that's OK; Comment in the banner
-*the html page is a useful tool to get feedback from users
+=== SObA Graphs ===
+* New graphs now live on site (Expression, Gene Ontology, Human Diseases, Phenotypes)
+* A lot of whitespace padding above and below graph; maybe trim? trimming vertically would ultimately limit the view pane when user wants to zoom in, so we should leave as is for now
+* Diff tool: Raymond and Juancarlos created a prototype diff tool (for comparing two genes, for example)
+** Paul: compared two genes that should be very similar, but there are a lot of differences; may reflect annotation coverage rather than biology
-== February 24, 2011 ==
-CIT Computer Survey - done
+== September 19, 2019 ==
-CD/DVD burner not working (Wen)
+=== Strains ===
+* Need to wait for new strain IDs from Hinxton before running dumping scripts
+* Don't edit multi-ontology strain fields in OA for now!
+* Juancarlos will map free text and ontology-name strain entries to strain IDs once we have the complete mapping file
+* "Requested strain" field in Disease OA; not dumped, so don't need to worry about right now
-Build Scripts:
+=== Alliance literature curation ===
-*run @ OICR
+* Working group will be formed soon
-*dynamic/static files
+* Will work out general common pipelines for literature curation
-*problem: no one takes care of them
-*curators should claim them and take care of them
-**Gene interaction dump script claimed by Raymond and Xiaodong
-**256 interaction objects empty (should not be); emptied during build?
-*Monitor usage with new site
-*Files (like FASTA, GFF) should be immediately available to users
-*Page that links to the scripts (FTP)
-*Need to define what we should have (common datasets)
-**Example: Table with all genes, RNAi phenotype, genetic phenotype, etc.
-**How many genes have been knocked out by RNAi?, etc.
-**Metrics table on each species: #genes, #chromosomes, etc.
-*Curators think about what data should be available
-*Modifying existing script is easier than making new one
-*Gary Williams already generates RNAi-phenotype-gene connections?
-Snehal work half-time at WormBase?
+=== SObA Graph relations ===
-*Will interview
+* Currently only integrating over "is a", "part of" and "regulates"
-*UI testing?
+* Maybe we could provide users an option to specify which relations to include, or maybe just exclude "regulates"
-*Curation? GO? Phenotype?
-*Start April 15th - 8 months? (Sept.) maybe longer
-*Focus on particular papers?
-Physical Interaction model
+=== Author First Pass ===
-*Have been discussing with Rose at BioGRID
+* Putting together paper for AFP
-*Y2H, CoIPs, Pull downs, etc.
+* Reviewing all user input for paper
-*Store physical interactions in WB outside of GO curation
+* Asking individual curators to check input
-*Data model
-**YH model vs Interaction model
-**Physical interaction tag in current interaction model unused
-**Generalize YH model -> physical interaction model
-**Add experiment types
-**Issue: what do we do with existing physical interaction tag in interaction model?
-**Split out different interaction types into individual classes?
-**Separate into physical, genetic, predicted classes? Textpresso?
-**Create separate classes for each experimental type? (Y2H, CoIP, etc.)
-**XREF Interaction to Physical interaction, Genetic interaction
-**Goal is for cleanliness and completeness
-**How to capture experimental-specific info in curation and database?
-**Suppress Y2H import from BioGRID?
-**Keep interaction models as they are?
-**All curators: Review interaction models to see what they want on the curation end