Difference between revisions of "WormBase-Caltech Weekly Calls"

From WormBaseWiki
==June 11, 2009==
*Discussion of allele-paper connections
**Jolene has a Textpresso-based pipeline for identifying alleles in papers
**Alleles get associated with a paper if there is phenotypic or molecular data
**The list of all alleles associated with a paper could possibly be passed on to Sanger for subsequent connection

*Discussion of gene-paper connections
**What criteria should be used for associating genes with papers?
**Currently, genes are associated manually by a curator during first-pass curation
**Could Textpresso be harnessed for this task?
**There are different types of evidence attached to gene-paper connections
**Genes associated with a paper via Textpresso searches could be presented to the first-pass curator for approval in the first-pass form
**Need to fix the current abstract-gene association script to include proteins (e.g. PIE-1)

*Incorporate a customized Textpresso query on WormBase object pages so users can look for additional information if they want
**Data type curators could help design a useful query

*Other gene-object connections
**Develop a canned query for all objects that should be associated with the gene page

*Worm Meeting
**Pre-meeting - meeting rooms?
**Pre-meeting - practice talks for WormBase workshop
**Storage for swag - pavilion should be okay

==June 18, 2009==

*Worm Meeting and Pre-Meeting
**WormBase Help Desk schedule - set for Worm Meeting
**Wen may need help carrying boxes of swag
**Do any Sanger or WashU people need rides to UCLA?
**Mary Alvarez is working on finding rooms for the pre-meeting
**Be prepared to report on the curation status of your data type and what we might need

*Author contact information on first-pass form
**Email goes to the first email address found by Textpresso
**Raymond - we do care where this data comes from
**Erich - can we send it to the designated corresponding author?
**First-pass curation could capture the corresponding author
**If the email didn't go to the corresponding author, it may affect our response rate
**Two-stage approach proposal
***Respond to Bernard - we'll try to find the corresponding author in the future and add his name to the comments field for now
***Going forward, how best to determine the corresponding author?
***Keep track of the IP address from which the data was sent?
***Add text to the email saying that authors may include their contact information in the comments box, if they wish
***Raymond will proofread papers to check on corresponding author designation and email addresses

*ZFIN visit - PATO progress, worth looking at
**Erich noted no genome browser - what is the issue?
**Their experience with getting figures from journals might be helpful.

==February 10, 2011==

*Should we have a Caltech project/site manager? We'll look at the issues to be solved first

*Todd would like to know more about what's going on at Caltech
**What would Todd like to know about? New ideas, data types?
**What can we communicate more effectively to Todd?

*Should we take minutes of Caltech WormBase group meetings and send them around?

*Kimberly: it would be helpful (for off-site individuals) if we went around the room to talk about what each of us is working on

*Bitbucket use
**How do we see only what pertains to us individually?
**Paul: good way to document bugs and fixes, problems/solutions
**Bitbucket Wiki - does that capture what people want/need to see?
**Who should Todd follow? What would Todd like to know about?
**Raymond: used for code development/versioning by OICR; may be the best use for Bitbucket
**Should we develop a best-practices guideline for Bitbucket use?
***Avoid posting topics that are too specific or too vague?
***What do we want most to get out of Bitbucket?
**Curation efforts? No

*SAB Meeting
**Not much feedback for literature curators
**User interface issues took precedence
**Paul: testing, spot-checking website for errors
**Web issues, e-mails from Gary and Oliver

*WormBase-wide conference call twice per month
**First & third Thursday of the month
**Starting next Thursday (2-17) @ 8:30am PST
**Web redesign meetings on Thursdays will have to be every other week? Do both on the same day?

*Raymond: WormBase mirrors
**Do we want to implement the Genome Browser for the mirror(s)? If practical
**It seems as though the Caltech mirror has been crashing often
**Has the Caltech mirror been working for people lately? Yes, more or less

== February 17, 2011 ==

Interaction dumping script
*-Spot check table
*-Push changes to Git? Yes, to main branch


Interactions automatically downloaded to FTP every release? Yes
*- People should check their own curation data
*- Users query frequently?
*- Can WormMart work instead?
*Ruihua - feature can be added
*Todd - good to have pre-defined queries generated automatically
*- Parse/remove predicted interactions (400,000) from others during dump? Yes
*Paul - Wei Wei wants to help with updates - who should she contact at WB?

Ranjana - Solar flares causing static on phone? ;)

Newsletter for new website release? Yes
*Todd - 5 development milestones
**Teleconference with Gary Ruvkun's lab next week
**Go live with beta version in June (@IWM)
**Go live with new site in September
**Retire old site at end of year
**Need outreach, documentation, screencasts

BioGRID
*WormBase to BioGRID curation
*Issues of mapping interaction types: WB -> BioGRID vs. BioGRID -> WB
*Moving towards using WB interaction types
*Definitions of our interaction types
*Rose will propose to the BioGRID community
*Physical interactions - BioGRID has a better format
**- Break physical interactions away from other interaction types?
**- Kimberly can propose changes to model: YH separate from other interactions?
**- Physical vs. genetic interactions? Keep separate
*We need an interaction ontology of some sort
*Who should Wei Wei contact? How to coordinate with Hinxton; getting data from FlyBase and SGD?
**-Automation?
*Interaction pages - explain what the interaction is based on; provenance/reference
*Gene Orienteer shows data, but the user has to link out

WormBase IWM souvenir?
*- Computer stress ball? With new site address?
*- Bouncy ball?
*- Worm rubber band thing?
*- Screen shield?
*- Complimentary iPads? ;)
*- Antifungal socks? Wristbands?

SPELL
*- New data doesn't load; problem running search engine; can load it on athena, not a generalizable problem?
*- OS update was not complete; try again
*- Two virtual machines working back and forth for support
*- How often is SPELL being used? Log? Had 6-7 users within two weeks complaining of SPELL problems
*Todd - can send Google Analytics tool to put at bottom of page
*Wen - SPELL testing server; official and mirror; want a separate testing server for new releases; mirror on athena (Wen's working machine); Raymond says we don't want to attract users to that machine; WB is running SPELL on altair, which is heavily loaded and has had problems; relying on athena may be problematic
*- Maybe we need an analysis of all of our machines to see the big picture; can we consolidate machines and make use of the ones we have? If not, we can get another machine.
*- What type of machine do we need?
*- Log of all machines with purchase date and functionality?
*- Linux vs. Mac curators? athena development machine
*- How much power do the scripts need? Can check
*- May not be appropriate as official server if main machine goes down
*- WB and athena can run SPELL server, but not others. Security problems?
*- Prefer that the work machine is readily rebootable; reconfigure software easily without affecting other things that people rely on
*- Need 10GB space; 4GB memory? SPELL production server; want it stable for outside world; hardware designed for that purpose; farm it out to IMSS?
*- Complexity of the application; Linux can run multiple applications; caprica has SVM production and WormMart; understand demand of the application; requirements; development vs. running scripts, how often? Efficient use of machines. Have 2 computers, getting a third; for WormMart need power; traffic suddenly increased yesterday; monitor traffic; not much personal experience with server maintenance; cluster needs lots of expertise
*- Building official server of SPELL at OICR? Yes

WormMart
*- With WormMart and WormBase, people just want to know if it's down, to lessen frustration; official statement from dev team
*- WormMart - we give users a testing URL; can we just change the WormBase link to that testing server?
*- Going to change HTML to update users with most recent information; discuss with Lincoln; 5 datasets on testing server, remaining 3 datasets. If WormMart is not working, put a message on main site
*- It doesn't look good to link to a tool from a production server; if we don't trust the data we shouldn't put it up.
*- Would like to keep the HTML page as a testing server; could modify page
*- If WormMart is working OK but not perfect, that's OK; comment in the banner
*- The HTML page is a useful tool to get feedback from users

Revision as of 16:39, 19 September 2019

Previous Years

2009 Meetings

2011 Meetings

2012 Meetings

2013 Meetings

2014 Meetings

2015 Meetings

2016 Meetings

2017 Meetings

2018 Meetings


GoToMeeting link: https://www.gotomeet.me/wormbase1


2019 Meetings

January

February

March

April

May

June

July

August


September 12, 2019

Update on SVM pipeline

  • New SVM pipeline: more analysis and more parameter tuning
  • Avoiding precision (and F-value) as a measure, because both depend on the ratio of positives to negatives in the test set
  • In the example shown, a "dumb" machine already starts out with precision above 0.6
  • G-value (Michael's invention) does not depend on the distribution of positives and negatives in the sets
  • Applied to various data types
  • Analysis: 10-fold cross-validation
    • Randomly select 10% of positives and negatives (without replacement) and repeat until all papers are sampled
  • F-value changes across different positive/negative ratios; G-value does not (essentially flat)
  • Area Under the Curve (AUC): probability that a randomly chosen positive scores higher than a randomly chosen negative
  • AUC values for many WB data types range from the upper 80s into the 90s (%)
  • Ranjana: How many papers make a good training set? Michael: we don't know yet
  • Can't reproduce old training sets (for the old SVM); provide Michael with better training sets if you want an improved SVM
  • If the SVM is still not good enough, Michael will work on deep neural networks (TensorFlow)
  • Michael can provide training sets he has used recently
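A minimal Python sketch of the evaluation ideas above, assuming nothing about Michael's actual pipeline: it shows why an all-positive "dumb" classifier's precision simply equals the fraction of positives in the test set, computes AUC directly as the probability that a random positive outscores a random negative, and partitions papers into 10 folds without replacement. The G-value is not reproduced because its definition is not given in these notes; all function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def precision_of_all_positive(n_pos: int, n_neg: int) -> float:
    """Precision of a 'dumb' classifier that labels every paper positive.

    Every positive becomes a true positive and every negative a false
    positive, so precision is just the fraction of positives in the test set.
    """
    return n_pos / (n_pos + n_neg)


def auc_by_pairs(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as P(random positive scores higher than random negative).

    Ties count as half a win (the usual Mann-Whitney convention).
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def ten_fold_splits(
    pos: List[str], neg: List[str], k: int = 10, seed: int = 0
) -> List[Tuple[List[str], List[str]]]:
    """Partition positives and negatives into k folds of ~1/k of each class.

    Sampling is without replacement: every paper lands in exactly one fold.
    """
    rng = random.Random(seed)
    pos, neg = pos[:], neg[:]  # shuffle copies, not the caller's lists
    rng.shuffle(pos)
    rng.shuffle(neg)
    return [(pos[i::k], neg[i::k]) for i in range(k)]


if __name__ == "__main__":
    # Same all-positive classifier, different class ratios: precision moves.
    print(precision_of_all_positive(60, 40))  # 0.6
    print(precision_of_all_positive(20, 80))  # 0.2
    print(auc_by_pairs([0.9, 0.8], [0.1, 0.85]))  # 0.75
```

The pairwise AUC only compares positives against negatives, so the class ratio itself does not enter it, whereas the all-positive precision tracks the ratio exactly.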

Clarifying definitions of "defective" and "deficient" for phenotypes

  • WB phenotype ontology has many "variant/abnormal" terms and distinct subclass terms for "defective/deficient"
  • Have tried to create a logical definition pattern for these terms, but the vagueness of the meaning of "defective" and how it is distinct from "abnormal" has stalled the process
  • What do we mean exactly by "defective" and how, specifically, is this distinct from "abnormal"?
  • Existing definitions use the following words or meanings:
    • "Variations in the ability"
    • "aberrant"
    • "defect"
    • "defective"
    • "defects"
    • "deficiency"
    • "deficient"
    • "disrupted"
    • "impaired"
    • "incompetent"
    • "ineffective"
    • "perturbation that disrupts"
    • Failure to execute the characteristic response = abnormal?
    • abnormal
    • abnormality leading to specific outcomes
    • fail to exhibit the same taxis behavior = abnormal?
    • failure
    • failure OR delayed
    • failure, slower OR late
    • failure/abnormal
    • reduced
    • slower

Citace upload

  • Tuesday, Sep 24th

Strain to ID mapping

  • Waiting on Hinxton to send strain ID mapping file?
  • Hopefully we can all get that well before the upload deadline
  • Will do global replacement at time of citace upload (at least for now)
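For the global replacement at citace upload, a minimal Python sketch, assuming the Hinxton mapping file arrives as two tab-separated columns (strain name, strain ID); that file format, the function names, and the placeholder IDs are all assumptions for illustration. Replacing every name in a single regex pass, matching only whole words, keeps "N2" from matching inside "N2a" and never re-matches text that was already replaced.

```python
import csv
import io
import re
from typing import Dict


def load_mapping(tsv_text: str) -> Dict[str, str]:
    """Parse a hypothetical two-column 'strain name<TAB>strain ID' file."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return {row[0].strip(): row[1].strip() for row in reader if len(row) >= 2}


def replace_strain_names(text: str, mapping: Dict[str, str]) -> str:
    """Replace whole-word strain names with their IDs in a single pass."""
    if not mapping:
        return text
    # Longest names first: regex alternation takes the first alternative that
    # matches, so a longer name such as 'XY1-2' must be tried before 'XY1'.
    names = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], text)


if __name__ == "__main__":
    # Placeholder IDs, not real WormBase assignments.
    mapping = load_mapping("N2\tSTRAIN_ID_1\nCB4856\tSTRAIN_ID_2\n")
    print(replace_strain_names("N2 and CB4856; not N2a", mapping))
    # STRAIN_ID_1 and STRAIN_ID_2; not N2a
```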

New name server

  • When will this officially go live?
  • Will we now be able to request strain IDs through the server? Yes

SObA Graphs

  • New graphs now live on site (Expression, Gene Ontology, Human Diseases, Phenotypes)
  • There is a lot of whitespace padding above and below the graph; maybe trim it? Trimming vertically would ultimately limit the view pane when a user wants to zoom in, so we should leave it as is for now
  • Diff tool: Raymond and Juancarlos created a prototype diff tool (for comparing two genes, for example)
    • Paul: compared two genes that should be very similar, but there are a lot of differences; may reflect annotation coverage rather than biology
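The gene-to-gene comparison can be thought of as per-category set differences. A minimal Python sketch, assuming each gene's annotations are available as a mapping from data category to a set of term IDs (a hypothetical structure, not the prototype's actual one). The extra check flags categories where only one gene has any annotation at all, one simple way to separate a possible coverage gap from a biological difference, in the spirit of Paul's comment.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical structure: data category -> set of term IDs annotated to a gene.
Annotations = Dict[str, Set[str]]


def diff_genes(
    a: Annotations, b: Annotations
) -> Dict[str, Tuple[Set[str], Set[str], Set[str]]]:
    """Per category, return (shared, only_in_a, only_in_b) term sets."""
    result = {}
    for category in sorted(set(a) | set(b)):
        terms_a = a.get(category, set())
        terms_b = b.get(category, set())
        result[category] = (terms_a & terms_b, terms_a - terms_b, terms_b - terms_a)
    return result


def coverage_gaps(a: Annotations, b: Annotations) -> List[str]:
    """Categories where exactly one of the two genes has any annotation.

    Such differences may reflect missing curation rather than biology.
    """
    categories = sorted(set(a) | set(b))
    return [c for c in categories if bool(a.get(c)) != bool(b.get(c))]


if __name__ == "__main__":
    gene_a = {"GO": {"T1", "T2"}, "Phenotype": {"P1"}}
    gene_b = {"GO": {"T2", "T3"}}
    print(diff_genes(gene_a, gene_b)["GO"])  # (shared, only in A, only in B)
    print(coverage_gaps(gene_a, gene_b))  # ['Phenotype']
```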


September 19, 2019

Strains

  • Need to wait for new strain IDs from Hinxton before running dumping scripts
  • Don't edit multi-ontology strain fields in OA for now!
  • Juancarlos will map free text and ontology-name strain entries to strain IDs once we have the complete mapping file
  • "Requested strain" field in Disease OA; not dumped, so no need to worry about it right now

Alliance literature curation

  • Working group will be formed soon
  • Will work out general common pipelines for literature curation

SObA Graph relations

  • Currently only integrating over "is a", "part of" and "regulates"
  • Maybe we could provide users an option to specify which relations to include, or maybe just exclude "regulates"
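Letting users choose relations is a small change if each ontology edge keeps its relation label. A minimal Python sketch, assuming the ontology is stored as child -> list of (relation, parent) edges; the labels is_a, part_of and regulates stand in for the three relations named above, and the toy terms are hypothetical. Passing a smaller allowed set (for example, dropping regulates) changes which ancestors receive inherited counts.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (relation, parent term)
Graph = Dict[str, List[Edge]]  # child term -> outgoing edges


def ancestors(term: str, graph: Graph, allowed: Set[str]) -> Set[str]:
    """All terms reachable from `term` via edges whose relation is allowed."""
    seen: Set[str] = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for relation, parent in graph.get(current, []):
            if relation in allowed and parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def propagate_counts(
    annotated_terms: Iterable[str], graph: Graph, allowed: Set[str]
) -> Dict[str, int]:
    """Count each annotation at its own term and once at every allowed ancestor."""
    counts: Dict[str, int] = defaultdict(int)
    for term in annotated_terms:
        for t in {term} | ancestors(term, graph, allowed):
            counts[t] += 1
    return dict(counts)


if __name__ == "__main__":
    toy = {"C": [("is_a", "B"), ("regulates", "R")], "B": [("part_of", "A")]}
    everything = {"is_a", "part_of", "regulates"}
    print(sorted(ancestors("C", toy, everything)))  # ['A', 'B', 'R']
    print(sorted(ancestors("C", toy, everything - {"regulates"})))  # ['A', 'B']
    print(propagate_counts(["C", "B"], toy, {"is_a", "part_of"}))
```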

Author First Pass

  • Putting together paper for AFP
  • Reviewing all user input for paper
  • Asking individual curators to check input