Difference between revisions of "WormBase-Caltech Weekly Calls"

From WormBaseWiki
 
(226 intermediate revisions by 9 users not shown)
Line 22: Line 22:
  
  
GoToMeeting link: https://www.gotomeet.me/wormbase1
+
 
  
 
= 2020 Meetings =
 
Line 32: Line 32:
 
[[WormBase-Caltech_Weekly_Calls_March_2020|March]]
 
  
 +
[[WormBase-Caltech_Weekly_Calls_April_2020|April]]
  
== April 2, 2020 ==
+
[[WormBase-Caltech_Weekly_Calls_May_2020|May]]
 
 
=== Community phenotype requests ===
 
* March 9-28
 
* 2,548 emails went out; 89 bounced; 6 resent; 13 backup; 2,478 successful emails
 
* 361 annotations overall
 
* 48 papers requested received curation (2% response rate)
 
* 53 distinct papers overall (5 papers without request)
 
* 53 distinct persons overall
 
 
 
=== Community curation volunteers ===
 
* Tracking volunteers [https://docs.google.com/spreadsheets/d/1ldECC44PXMilcDO6ctz-8AkRZntfDoV0Wtc4F-T_Zvg/edit?usp=sharing here]
 
* 14 volunteers so far, all have been assigned a WBPerson ID
 
* Chris will set up a webinar tutorial in the coming week or two
 
 
 
=== AFP pipeline ===
 
* Will resend email requests to authors that haven't already responded
 
* May also send out for older papers
 
* May work with people to help
 
* Does the old AFP form still work? It should
 
* If someone has a link to the old form, they won't get one for the new form
 
* Maybe could set up an automatic redirect from the old form to the new form
 
* Received many submissions recently (>20% response rate)
 
 
 
=== Ontology Annotator ===
 
* Need to work on Genotype OA dumper
 
* Turns out semicolons are problematic (currently in genotypes and transgenes) for object names (ontology fields)
 
* Ampersands (&) are also problematic for object names in the OA
 
** 20237  | Is[Pgcy-5::daf-2a::venus; Punc-122::mCherry]                          | 2014-10-08 10:32:45.874519-07
 
** 20239  | Ex[Pgcy-5::casy-1::venus; Pgcy-5::aman-2::mCherry; Punc-122::mCherry] | 2014-10-08 10:45:23.202362-07
 
** 20238  | Is[Pgcy-5::daf-2c::venus; Punc-122::mCherry]                         | 2014-10-08 10:38:19.859078-07
 
** 25249  | Ex[Prheb-1::rheb-1::GFP; unc-119(+]                                   | 2018-06-29 10:16:40.784295-07
 
** 16283  | [hlh-13::GFP;unc-119(+)]                                              | 2013-02-07 17:43:22.384819-08
 
** 26131  | Ex[pedc-3EDC-3::DsRed;pRF4]                                          | 2019-08-14 08:44:49.91063-07
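The notes do not say why semicolons and ampersands break object names in the OA, so the following is only a minimal sketch for finding affected names before they reach the dumper. The function name `find_problem_names` and the sample name `syIs1` are hypothetical and are not part of any WormBase code.

```python
import re

# Characters the notes report as problematic in OA object names.
PROBLEM_CHARS = re.compile(r"[;&]")


def find_problem_names(names):
    """Return (name, offending characters) for every name that needs review."""
    flagged = []
    for name in names:
        hits = sorted(set(PROBLEM_CHARS.findall(name)))
        if hits:
            flagged.append((name, hits))
    return flagged


if __name__ == "__main__":
    sample = [
        "Is[Pgcy-5::daf-2a::venus; Punc-122::mCherry]",
        "[hlh-13::GFP;unc-119(+)]",
        "syIs1",  # hypothetical name without problem characters
    ]
    for name, chars in find_problem_names(sample):
        print(name, chars)
```

A check like this could be run over the genotype and transgene name columns to list the entries that need attention.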
 
  
=== Use Slack More ===
+
[[WormBase-Caltech_Weekly_Calls_June_2020|June]]
* Slack is a good tool for quick communication among team members; would be good for all curators to join Slack to enable efficient communication
 
  
 +
[[WormBase-Caltech_Weekly_Calls_July_2020|July]]
  
== April 9, 2020 ==
+
[[WormBase-Caltech_Weekly_Calls_August_2020|August]]
  
=== Volunteer curators ===
+
[[WormBase-Caltech_Weekly_Calls_September_2020|September]]
* Have sent out emails to schedule tutorials
 
* Chris had one tutorial with Michael Davies (Alyson Ashe's lab) yesterday
 
* One already scheduled for next Monday with Wilber and Stephanie from Paul's lab
 
* Two others already scheduled for next Tuesday with Lina Dahlberg and Colin Dolphin
 
  
===TAGC is virtual (4.22-25.2020)===
 
FYI in case you missed it
 
*You still have to register (it's free), if you hadn't before
 
https://genetics-gsa.org/tagc-2020/registration/
 
  
===summer students===
+
== October 1, 2020 ==
* Caltech SURF students (and other summer students worldwide) now are looking for projects
 
* Maybe they could curate for WormBase
 
* In addition to phenotype, they could curate:
 
** Allele/lesion sequence curation (using Allele Sequence form); maybe Paul Davis could make a tutorial video?
 
** Anatomy function, looking for novel info; opportunity to program/code
 
  
=== OA semicolon issue ===
+
=== Gene association file formats on FTP ===
* Juancarlos has fixed the issues on sandbox
+
* For example, current production release ONTOLOGY directory: ftp://ftp.wormbase.org/pub/wormbase/releases/current-production-release/ONTOLOGY/
* Curators should test on Mangolassi
+
* Our association files have format "*.wb"; is this useful or necessary?
 +
* Other than referring to GAF in the header, it isn't clear to users what the columns refer to or what the column headers should be
 +
* We could add a README file and/or convert to the new [https://github.com/geneontology/geneontology.github.io/blob/issue-go-annotation-2917-gaf-2_2-doc/_docs/go-annotation-file-gaf-format-22.md GAF 2.2 format] which would have a more expressive file header and possibly column headers(?)
 +
** File headers could possibly link to the format specification page
  
=== Textmining/automation ===
+
=== Phenotype association file idiosyncrasy ===
* Daniela will discuss with Christina Zorn from Xenbase
+
* As we've discussed previously, there is an oddity in how the phenotype association file we provide lists (or fails to list) references
* Will discuss SVM, AFP, Textpresso, etc.
+
* According to the GAF spec, column 6 is for reference and is required, whereas column 8 is "With (or) From" and is optional
 +
* When we have a reference, the WBPaper ID is provided in column 6 and the WBVar ID or RNAi ID is provided in column 8
 +
* However, when we have no reference (personal communication, e.g. from NBP allele submissions), the WBVar ID is instead put in column 6 (because we need something there), and column 8 is blank.
 +
** This results in (1) column 6 having a mix of paper/reference IDs (good) and WBVar IDs (not good) and (2) WBVar IDs split between column 6 and 8; thus making it tedious to parse this file
 +
* Proposed solution: Can we come up with some type of reference object ID to associate to the personal communications (or any annotations currently lacking a formal reference)?
 +
* With the proposed solution, we can always have a reference ID in column 6 (the intended purpose of the column) and WBVar IDs for alleles can always remain consistently in column 8
 +
* Proposal is to put WBPerson IDs in column 6 for personal communications. Chris & Karen will check if this will work.
  
=== Retracted WBPapers ===
+
=== Server space in Chen Building ===
* Jae & Kimberly put in GitHub ticket to make retractions clear on WormBase site
+
* It looks like we will not have a dedicated space for server computers.
* https://github.com/WormBase/website/issues/7637
 
* Can we systematically detect retractions? Yes
 
* What about finding papers that cite retractions? Maybe, but likely tricky
 
  
  
== April 16, 2020 ==
+
== October 8, 2020 ==
  
=== Community Phenotype Curation Tutorials ===
+
=== Webinar Announcement ===
* Chris has run 6 tutorials, recorded 4
+
* Here is the live registration site: http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/webinar.cgi
* MPG files saved on DropBox; ask Chris for access
+
* Caltech Zoom allows 300 attendees.
* Plan to edit videos to make tutorial video to post on WB YouTube channel
 
  
=== Author First Pass ===
+
=== Descriptions from GO-CAM models ===
* May run a webinar and use Zoom to record
+
* One suggestion for the Alliance is to create a description based on a GO-CAM model
* May make a short tutorial video
+
* Could also micropublish some descriptions (semi-automated?)
* Jae: Is there documentation for terminology used in the form?
+
* Can make curators authors of micropublications for GO-CAM models/pathways
  
=== Zoom accounts ===
+
=== Transcription Factors in WormBase ===
* People can try to use Caltech Zoom account
+
* WormBase has a ?Transcription_factor class that is currently underutilized
 +
* Chris spoke with Gary Williams about the status as he has done much of the work on the class
 +
* Because transcription factors can often be complexes, it was decided to create the ?Transcription_factor class rather than simply an extension of tags to the existing ?Gene class
 +
* The class seems reasonably complete; it's important to note that some TFs are general transcription factors, not necessarily gene-specific or sequence-specific DNA-binding TFs; it will be good to make that distinction clear to users
 +
* Chris has compiled a [https://docs.google.com/spreadsheets/d/1KdmvybWDWHXdlJwZgfleL4xHDoyPoYR13WAUcERF82g/edit?usp=sharing Google sheet] to assess the class before Gary W. leaves WB in the next couple of weeks
 +
* The Google sheet has several tabs/worksheets, including one for the ACEDB data model (and notes about usage of tags), a summary table of associated genes, bound sequence features, existence of other protein-DNA binding data, etc.
 +
* It would be good to make TF binding info (per gene and globally) more accessible to our users, maybe via a new widget on gene pages (e.g. list incoming, regulating TFs and, for TF genes themselves, list potential target genes)
  
 +
== October 15, 2020 ==
  
== April 23, 2020 ==
+
=== BioGRID data sharing ===
 +
* Rose from BioGRID proposed that BioGRID curate high-throughput C. elegans interaction datasets, capturing confidence scores when available, and making those annotations available to WormBase for regular ingest
 +
* Will need to consider a few points:
 +
** BioGRID doesn't curate protein-DNA interactions
 +
** We don't yet know the turn-around timeline for BioGRID curation of worm datasets; WB may be able to curate them much sooner
 +
* Chris and Jae will work with Rose et al. to coordinate HTP curation
  
=== Community Phenotype Curation Tutorials ===
+
=== Enriched genes ===
* Chris has finished first round of tutorials; 8 tutorials, 6 video recordings
+
* Some genes are considered "enriched" for an expression cluster data set even if the enrichment was in comparison to another cell or tissue (not whole animal)
* There are ~8 new volunteers; will set up tutorials for them soon
+
* We should reconsider the ?Expression_cluster model to make sure we can appropriately model and communicate enrichment or subtypes thereof
  
=== ECO code implementation ===
 
* ?ECO_term to replace ?GO_code in ACEDB models
 
* GAF files with three-letter codes can still be generated by mapping
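As a rough illustration of the mapping step, annotations stored with ECO term IDs could be translated back to three-letter GO evidence codes at dump time. This is a sketch only: the table below is a small illustrative subset and should be checked against the official ECO-to-GO mapping, and the name `go_code_for` is hypothetical.

```python
# Illustrative subset only; the authoritative mapping should come from the
# official ECO/GO evidence code mapping, not from this sketch.
ECO_TO_GO_CODE = {
    "ECO:0000314": "IDA",  # direct assay evidence, manual assertion
    "ECO:0000315": "IMP",  # mutant phenotype evidence, manual assertion
    "ECO:0000316": "IGI",  # genetic interaction evidence, manual assertion
    "ECO:0000353": "IPI",  # physical interaction evidence, manual assertion
    "ECO:0000270": "IEP",  # expression pattern evidence, manual assertion
    "ECO:0000501": "IEA",  # evidence used in automatic assertion
}


def go_code_for(eco_id):
    """Return the three-letter GO evidence code for an ECO term ID."""
    try:
        return ECO_TO_GO_CODE[eco_id]
    except KeyError:
        # Fail loudly so unmapped ECO terms are noticed when dumping GAF.
        raise KeyError(f"No GO evidence code mapped for {eco_id}") from None


if __name__ == "__main__":
    print(go_code_for("ECO:0000315"))
```

Raising on unmapped terms, rather than writing a blank code, keeps an incomplete mapping from silently producing invalid GAF rows.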
 
  
=== Simplemine for Alliance ===
+
== October 22, 2020 ==
* Wen has presented proposal to Search group
 
* Plan is to have a link to the Alliance Simplemine prototype from the Alliance web page
 
  
=== Venn diagram tool ===
+
=== CHEBI ===
* Conceived by Jae, implemented by Sibyl
+
* Karen spoke to CHEBI personnel on Tuesday
* Currently used for interactions data
+
* CHEBI only has ~2 curators to create new entities
* Could use for other data types like phenotype (e.g. comparing RNAi vs. allele phenotype)
+
* CHEBI had submitted a proposal to establish pipelines to process requests from MODs
* Could also use for Expression data, e.g. comparing results from different methods
+
* Chemical Translation Service (CTS)
* Could maybe use for disease data
+
* OxO = https://www.ebi.ac.uk/spot/oxo/search
  
=== AFP tutorial ===
+
=== Training Webinar ===
* Daniela, Kimberly, Valerio will run through the AFP form with Nikita from Gupta lab tomorrow
+
* Scheduled for tomorrow at 1pm Pacific/4pm Eastern
* May record in the future to make a tutorial video
 
 
  
=== Expression markers ===
 
* SURF student projects: Identifying good expression markers? Maybe, but may require more curation experience
 
* Wen looked at expression cluster data; hard to find good, very specific (i.e. neuron) markers
 
* Daniela may (re-)start curating markers for relevant expression patterns
 
* Wen noticed that many tissue markers are artificial (not necessarily endogenous sequence)
 
* Already have an "Expression markers" widget on anatomy term pages
 
* Could combinations of genes (e.g. cGal) act as markers?
 
  
== April 30, 2020 ==
+
== October 29, 2020 ==
  
=== Adding ?ECO_term class for WS278 ===
+
=== Overview Webinar debriefing ===
* Proposed [https://wiki.wormbase.org/index.php/Evidence_Code_Ontology#.3FECO_term_Model ?ECO_term model]
+
* What's Good
** How are the Parent/Child and Ancestor/Descendant tags used in WB for ontology classes?  Do we still need them in .ace files?
+
* What needs improvement
*Confirm proposed changes to class models that will use this tag:
+
* Participant requests:
** ?GO_annotation
+
** A place to look for Worm methods (a public {moderated} wiki page?)
** ?Phenotype
 
** ?Disease_model_annotation
 
  
=== Ontology term models in WB ===
 
* Discuss using ?RO_term values in our WB ontology term models
 
* Currently relations between ontology terms are captured with text that is sometimes inconsistent for the same concept, e.g. is_a
 
* Where possible, should we use ?RO_term to express the relations between ontology terms in our WB models?
 
* Impact on web display?
 
  
===Entries in the new Genotype OA===
+
=== New alleles extraction pipeline ===
*21 genotype entries created in the Genotype OA required for disease curation
+
* The current pipeline (on textpresso-dev) sends data to the Sanger RT system, which is being retired
*Few more to come, and at some point need to work on the dumper, in order to submit for WS278
+
* The plan is to build a new pipeline to send AFP-like alerts with new entities
*The use of the Genotype class across disease related classes waiting on Paul D. for approval, will need dumper changes as well; hopefully we have enough time to get all this done for WS278
+
* The current pipeline reads allele data from GSA and gene lists from Sanger, but I (Valerio) would need help from curators to understand how to get these data

Latest revision as of 18:06, 29 October 2020

Previous Years

2009 Meetings

2011 Meetings

2012 Meetings

2013 Meetings

2014 Meetings

2015 Meetings

2016 Meetings

2017 Meetings

2018 Meetings

2019 Meetings



2020 Meetings

January

February

March

April

May

June

July

August

September


October 1, 2020

Gene association file formats on FTP

  • For example, current production release ONTOLOGY directory: ftp://ftp.wormbase.org/pub/wormbase/releases/current-production-release/ONTOLOGY/
  • Our association files have format "*.wb"; is this useful or necessary?
  • Other than referring to GAF in the header, it isn't clear to users what the columns refer to or what the column headers should be
  • We could add a README file and/or convert to the new GAF 2.2 format which would have a more expressive file header and possibly column headers(?)
    • File headers could possibly link to the format specification page

Phenotype association file idiosyncrasy

  • As we've discussed previously, there is an oddity in how the phenotype association file we provide lists (or fails to list) references
  • According to the GAF spec, column 6 is for reference and is required, whereas column 8 is "With (or) From" and is optional
  • When we have a reference, the WBPaper ID is provided in column 6 and the WBVar ID or RNAi ID is provided in column 8
  • However, when we have no reference (personal communication, e.g. from NBP allele submissions), the WBVar ID is instead put in column 6 (because we need something there), and column 8 is blank.
    • This results in (1) column 6 having a mix of paper/reference IDs (good) and WBVar IDs (not good) and (2) WBVar IDs split between column 6 and 8; thus making it tedious to parse this file
  • Proposed solution: Can we come up with some type of reference object ID to associate to the personal communications (or any annotations currently lacking a formal reference)?
  • With the proposed solution, we can always have a reference ID in column 6 (the intended purpose of the column) and WBVar IDs for alleles can always remain consistently in column 8
  • Proposal is to put WBPerson IDs in column 6 for personal communications. Chris & Karen will check if this will work.
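To make the parsing burden concrete, here is a minimal sketch of the special-casing a consumer of the current file needs. It assumes tab-separated GAF-style rows and assumes variation IDs can be recognised by the substring `WBVar`; the function name `normalize_row` and the IDs in the example are hypothetical.

```python
def normalize_row(line):
    """Split one GAF-style row into a reference and a With/From value.

    Works around the current layout, where a WBVar ID sits in column 6
    (and column 8 is blank) whenever an annotation has no paper reference.
    """
    cols = line.rstrip("\n").split("\t")
    col6 = cols[5]                             # GAF column 6: reference
    col8 = cols[7] if len(cols) > 7 else ""    # GAF column 8: With (or) From
    if "WBVar" in col6 and not col8:
        # No real reference: the variation ID was placed in the reference slot.
        return {"reference": None, "with_from": col6}
    return {"reference": col6, "with_from": col8 or None}


if __name__ == "__main__":
    # Hypothetical rows for illustration only.
    with_paper = "\t".join(
        ["WB", "WBGene00000001", "aap-1", "", "WBPhenotype:0000001",
         "WBPaper00000001", "IMP", "WBVar00000002"])
    without_paper = "\t".join(
        ["WB", "WBGene00000001", "aap-1", "", "WBPhenotype:0000001",
         "WBVar00000002", "IMP", ""])
    print(normalize_row(with_paper))
    print(normalize_row(without_paper))
```

If column 6 always held a reference-type ID (for example a WBPerson ID for personal communications, as proposed), the conditional branch would no longer be needed.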

Server space in Chen Building

  • It looks like we will not have a dedicated space for server computers.


October 8, 2020

Webinar Announcement

  • Here is the live registration site: http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/webinar.cgi
  • Caltech Zoom allows 300 attendees.

Descriptions from GO-CAM models

  • One suggestion for the Alliance is to create a description based on a GO-CAM model
  • Could also micropublish some descriptions (semi-automated?)
  • Can make curators authors of micropublications for GO-CAM models/pathways

Transcription Factors in WormBase

  • WormBase has a ?Transcription_factor class that is currently underutilized
  • Chris spoke with Gary Williams about the status as he has done much of the work on the class
  • Because transcription factors can often be complexes, it was decided to create the ?Transcription_factor class rather than simply an extension of tags to the existing ?Gene class
  • The class seems reasonably complete; it's important to note that some TFs are general transcription factors, not necessarily gene-specific or sequence-specific DNA-binding TFs; it will be good to make that distinction clear to users
  • Chris has compiled a Google sheet to assess the class before Gary W. leaves WB in the next couple of weeks
  • The Google sheet has several tabs/worksheets, including one for the ACEDB data model (and notes about usage of tags), a summary table of associated genes, bound sequence features, existence of other protein-DNA binding data, etc.
  • It would be good to make TF binding info (per gene and globally) more accessible to our users, maybe via a new widget on gene pages (e.g. list incoming, regulating TFs and, for TF genes themselves, list potential target genes)

October 15, 2020

BioGRID data sharing

  • Rose from BioGRID proposed that BioGRID curate high-throughput C. elegans interaction datasets, capturing confidence scores when available, and making those annotations available to WormBase for regular ingest
  • Will need to consider a few points:
    • BioGRID doesn't curate protein-DNA interactions
    • We don't yet know the turn-around timeline for BioGRID curation of worm datasets; WB may be able to curate them much sooner
  • Chris and Jae will work with Rose et al. to coordinate HTP curation

Enriched genes

  • Some genes are considered "enriched" for an expression cluster data set even if the enrichment was in comparison to another cell or tissue (not whole animal)
  • We should reconsider the ?Expression_cluster model to make sure we can appropriately model and communicate enrichment or subtypes thereof


October 22, 2020

CHEBI

  • Karen spoke to CHEBI personnel on Tuesday
  • CHEBI only has ~2 curators to create new entities
  • CHEBI had submitted a proposal to establish pipelines to process requests from MODs
  • Chemical Translation Service (CTS)
  • OxO = https://www.ebi.ac.uk/spot/oxo/search

Training Webinar

  • Scheduled for tomorrow at 1pm Pacific/4pm Eastern


October 29, 2020

Overview Webinar debriefing

  • What's Good
  • What needs improvement
  • Participant requests:
    • A place to look for Worm methods (a public {moderated} wiki page?)


New alleles extraction pipeline

  • The current pipeline (on textpresso-dev) sends data to the Sanger RT system, which is being retired
  • The plan is to build a new pipeline to send AFP-like alerts with new entities
  • The current pipeline reads allele data from GSA and gene lists from Sanger, but I (Valerio) would need help from curators to understand how to get these data