Difference between revisions of "WormBase-Caltech Weekly Calls"

From WormBaseWiki
 
Line 15: Line 15:
 
[[WormBase-Caltech_Weekly_Calls_2016|2016 Meetings]]
 
[[WormBase-Caltech_Weekly_Calls_2016|2016 Meetings]]
  
= 2017 Meetings =
+
[[WormBase-Caltech_Weekly_Calls_2017|2017 Meetings]]
  
[[WormBase-Caltech_Weekly_Calls_January_2017|January]]
+
[[WormBase-Caltech_Weekly_Calls_2018|2018 Meetings]]
  
[[WormBase-Caltech_Weekly_Calls_February_2017|February]]
+
[[WormBase-Caltech_Weekly_Calls_2019|2019 Meetings]]
  
[[WormBase-Caltech_Weekly_Calls_March_2017|March]]
 
  
[[WormBase-Caltech_Weekly_Calls_April_2017|April]]
 
  
[[WormBase-Caltech_Weekly_Calls_May_2017|May]]
 
  
[[WormBase-Caltech_Weekly_Calls_June_2017|June]]
+
= 2020 Meetings =
  
[[WormBase-Caltech_Weekly_Calls_July_2017|July]]
+
[[WormBase-Caltech_Weekly_Calls_January_2020|January]]
  
[[WormBase-Caltech_Weekly_Calls_August_2017|August]]
+
[[WormBase-Caltech_Weekly_Calls_February_2020|February]]
  
[[WormBase-Caltech_Weekly_Calls_September_2017|September]]
+
[[WormBase-Caltech_Weekly_Calls_March_2020|March]]
  
[[WormBase-Caltech_Weekly_Calls_October_2017|October]]
+
[[WormBase-Caltech_Weekly_Calls_April_2020|April]]
  
 +
[[WormBase-Caltech_Weekly_Calls_May_2020|May]]
  
== November 2, 2017 ==
+
[[WormBase-Caltech_Weekly_Calls_June_2020|June]]
  
=== Site visits ===
+
[[WormBase-Caltech_Weekly_Calls_July_2020|July]]
* Invited to give 25+5 minute talk at Worcester Area Worm Meeting on November 14th
 
* Had particular interest in micropublications
 
* 25-30 minutes probably insufficient to present everything that we want
 
* Also, fairly short notice
 
* Travel budget?
 
* Would be good to meet with individual labs and lab members to discuss WormBase
 
* Will probably decline this offer and wait until a longer talk next year
 
* Will point them to micropublication.org to find out more info
 
* Maybe Karen/Daniela could do a webinar during the Nov 14 slot
 
  
=== GO annotation for Expression cluster ===
+
[[WormBase-Caltech_Weekly_Calls_August_2020|August]]
* Wen asking about some GO annotation details for expression clusters
 
* Want to annotate a particular data set with a GO term
 
* Wen will discuss with Kimberly
 
  
=== Tazendra issue ===
+
[[WormBase-Caltech_Weekly_Calls_September_2020|September]]
* Had a problem Sunday-Monday
 
* We should consider moving curation database to a new machine/location and/or creating a backup system
 
* Move to the cloud? Would reduce maintenance time but will add cost
 
* Install on different local server?
 
* What are the requirements? Disk space? Computation?
 
  
=== Marker help desk question ===
 
* Someone looking for promoter sequences of pan-neuronal genes; neuronal marker
 
* Can go to "neuron" anatomy page and search expression patterns table for the term "marker"; not optimal
 
* Can also (reverse) sort the "Expression Pattern" column of table to pull up "Marker" annotations
 
* Create a "Markers" widget? "Tissue Marker" or "Expression Marker"? Daniela will create ticket
 
* Existing markers in Associations widget will remain there
 
* Are markers still being curated?
 
  
 +
== October 1, 2020 ==
  
== November 9, 2017 ==
+
=== Gene association file formats on FTP ===
 +
* For example, current production release ONTOLOGY directory: ftp://ftp.wormbase.org/pub/wormbase/releases/current-production-release/ONTOLOGY/
 +
* Our association files have format "*.wb"; is this useful or necessary?
 +
* Other than referring to GAF in the header, it isn't clear to users what the columns refer to or what the column headers should be
 +
* We could add a README file and/or convert to the new [https://github.com/geneontology/geneontology.github.io/blob/issue-go-annotation-2917-gaf-2_2-doc/_docs/go-annotation-file-gaf-format-22.md GAF 2.2 format] which would have a more expressive file header and possibly column headers(?)
 +
** File headers could possibly link to the format specification page
  
=== Citace upload ===
+
=== Phenotype association file idiosyncrasy ===
* Upload files to Spica for Wen by 6pm next Friday (17th)
+
* As we've discussed previously, there is an oddity to how the phenotype association file we provide lists, or doesn't, references
 +
* According to the GAF spec, column 6 is for reference and is required, whereas column 8 is "With (or) From" and is optional
 +
* When we have a reference, the WBPaper ID is provided in column 6 and the WBVar ID or RNAi ID is provided in column 8
 +
* However, when we have no reference (personal communication, e.g. from NBP allele submissions), the WBVar ID is instead put in column 6 (because we need something there), and column 8 is blank.
 +
** This results in (1) column 6 having a mix of paper/reference IDs (good) and WBVar IDs (not good) and (2) WBVar IDs split between column 6 and 8; thus making it tedious to parse this file
 +
* Proposed solution: Can we come up with some type of reference object ID to associate to the personal communications (or any annotations currently lacking a formal reference)?
 +
* With the proposed solution, we can always have a reference ID in column 6 (the intended purpose of the column) and WBVar IDs for alleles can always remain consistently in column 8
 +
* Proposal is to put WBPerson IDs in column 6 for personal communications. Chris & Karen will check if this will work.
  
=== New models file for WS263 ===
+
=== Server space in Chen Building ===
* Changes snuck up on some CIT curators
+
* It looks like we will not have a dedicated space for server computers.
* Anatomy function model: Proposal to make Remark entry unique; Raymond asking to remove the UNIQUE in the model
 
* Still want to have multiple remarks (each on a separate, new line)
 
* May request a rollback of Anatomy function model change
 
* Are curators tracking model changes on GitHub?
 
  
=== Data migration call ===
 
* At noon PST today, if people want to join and ask questions
 
  
=== Caltech library ===
+
== October 8, 2020 ==
* Wen asked about getting articles that we can't get from Caltech library
 
* Was told each person can get 10 articles per year through inter-library loan
 
* We need about 500 articles per year that aren't in Caltech library
 
* Can get articles within an hour
 
* Maybe we can talk to library to make an agreement to get more articles
 
* If we get preprints, will it be additional cost to get official final version?
 
  
=== Micropublications ===
+
=== Webinar Announcement ===
* Currently, curation is performed manually
+
* Here is the live registration site: http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/webinar.cgi
 +
* Caltech zoom allows 300 attendees.
  
=== AGR disease working group update ===
+
=== Descriptions from GO-CAM models ===
* Next focus, for AGR 1.3 release in March 2018, is to pull allele objects with basic info into AGR
+
* One suggestion for the Alliance is to create a description based on a GO-CAM model
* First we will only pull in alleles that have disease data and that only have associations to a single gene
+
* Could also micropublish some descriptions (semi-automated?)
* Basic allele information will be provided on respective AGR gene pages in an Alleles table, along with associated disease data (?)
+
* Can make curators authors of micropublications for GO-CAM models/pathways
* Alleles may also be referenced within a gene page disease table
 
* Disease pages will show Association tables with alleles in addition to genes
 
* In both contexts, alleles will link to the MOD allele page (until AGR develops an allele page)
 
* Alleles will be added to the disease association file (DAF) and respective JSON files
 
* Dedicated AGR allele pages will come at a later release, probably as a work product of the AGR Variants working group
 
* We will want to work closely with the AGR Variants working group
 
* Alleles will be stepping stone to other genotype components and then complex genotypes, as well as treatments/conditions/chemicals etc.
 
* Also discussed the possibility of a Disease ribbon, showing diseases in AGR orthologs
 
* If we show disease associations inferred by orthology, we need to be explicit about evidence codes and data provenance; maybe different tables for experimental versus inferred associations?
 
  
=== AGR orthology ===
+
=== Transcription Factors in WormBase ===
* AGR orthology based exclusively on sequence similarity
+
* WormBase has a ?Transcription_factor class that is currently underutilized
* Some "orthologs" are actually just homologs
+
* Chris spoke with Gary Williams about the status as he has done much of the work on the class
* Jae sent email to info@alliancegenome.org last week (Wed Nov 1) but hasn't heard back from anyone
+
* Because transcription factors can often be complexes, it was decided to create the ?Transcription_factor class rather than simply an extension of tags to the existing ?Gene class
* How do we explicitly define "ortholog"? Sequence similarity? Synteny? Functional complementation?
+
* The class seems reasonably complete; it's important to note that some TFs are general transcription factors, not necessarily gene-specific or sequence-specific DNA-binding TFs; it will be good to make that distinction clear to users
* How do we accommodate both manual and automated assertions of orthology?
+
* Chris has compiled a [https://docs.google.com/spreadsheets/d/1KdmvybWDWHXdlJwZgfleL4xHDoyPoYR13WAUcERF82g/edit?usp=sharing Google sheet] to assess the class before Gary W. leaves WB in the next couple of weeks
 +
* The Google sheet has several tabs/worksheets, including one for the ACEDB data model (and notes about usage of tags), a summary table of associated genes, bound sequence features, existence of other protein-DNA binding data, etc.
 +
* It would be good to make TF binding info (per gene and globally) more accessible to our users, maybe via a new widget on gene pages (e.g. list incoming, regulating TFs and, for TF genes themselves, list potential target genes)
  
=== Help Desk question: finding all alcohol dehydrogenase genes ===
+
== October 15, 2020 ==
* There is no single root "alcohol dehydrogenase" term in the GO MF branch, but many more specific terms that exist in different branches
 
* This means that one has to search for terms that explicitly have "alcohol dehydrogenase" in the name of the term and cannot take advantage of the Ontology Browser to see all gene associations, direct and inferred
 
* This approach also will miss terms like "methanol dehydrogenase" or other logical descendant terms
 
* Also, can search for protein domains/motifs that explicitly have "alcohol dehydrogenase" in the name of the motif, but domains are not (as far as I'm aware, in InterPro or PFAM at least) organized into an ontology
 
* Note: GO tries to classify MFs according to the [http://www.enzyme-database.org/class.php?c=1&sc=1&ssc=*&sh=1 Enzyme Classification] system.  In this classification, the alcohol dehydrogenase (NAD) activity is a sibling of methanol dehydrogenase activity.
 
*There is a comment associated with the alcohol dehydrogenase entry, E.C. 1.1.1.1, that says:
 
    Comments: A zinc protein. Acts on primary or secondary alcohols or hemi-acetals with very broad specificity; however the enzyme oxidizes methanol much more
 
    poorly than ethanol. The animal, but not the yeast, enzyme acts also on cyclic secondary alcohols.
 
*This may explain why methanol dehydrogenase is a sibling and not a child of alcohol dehydrogenase in this classification.
 
  
=== Alliance/AGR interactions working group ===
+
=== BioGRID data sharing ===
* Now formed, first meeting tomorrow (Friday Nov 10th) at 1pm PST/4pm EST
+
* Rose from BioGRID proposed that BioGRID curate high-throughput C. elegans interaction datasets, capturing confidence scores when available, and making those annotations available to WormBase for regular ingest
* Can find folder and related documents in the Alliance "Working Groups" Google folder
+
* Will need to consider a few points:
 +
** BioGRID doesn't curate protein-DNA interactions
 +
** We don't yet know the turn-around timeline for BioGRID curation of worm datasets; WB may be able to curate them much sooner
 +
* Chris and Jae will work with Rose et al. to coordinate HTP curation
  
=== AGR gene expression data working group ===
+
=== Enriched genes ===
* Starting up now; awaiting first meeting
+
* Some genes are considered "enriched" for an expression cluster data set even if the enrichment was in comparison to another cell or tissue (not whole animal)
* Talk to Wen and/or Daniela if you want to join
+
* We should reconsider the ?Expression_cluster model to make sure we can appropriately model and communicate enrichment or subtypes thereof
  
=== Site visits ===
 
* Daniela and Karen might be able to join Bay Area worm meeting
 
  
 +
== October 22, 2020 ==
  
== November 16, 2017 ==
+
=== CHEBI ===
 +
* Karen spoke to CHEBI personnel on Tuesday
 +
* CHEBI only has ~2 curators to create new entities
 +
* CHEBI had submitted a proposal to establish pipelines to process requests from MODs
 +
* Chemical Translation Service (CTS)
 +
* OxO = https://www.ebi.ac.uk/spot/oxo/search
  
=== Inter-library loan ===
+
=== Training Webinar ===
* Getting papers (otherwise unavailable) through inter-library loan
+
* Scheduled for tomorrow at 1pm Pacific/4pm Eastern
* Have to fill out request forms for (each?) paper that we request
 
* No charge unless it is a rush request
 
  
=== Social media ===
 
* Wen attended social media training at Caltech
 
* How to use social media to promote their work
 
* Sean Carroll has > 1 million Twitter followers
 
* Facebook Caltech posts; target students and faculty, as well as donors
 
* Social media strategies: videos useful, schedule posts
 
* Interviews
 
* Videos: viewers lose interest after ~1 minute
 
* Would be good to make blog post announcements on FaceBook
 
* Use Twitter, FaceBook, WB blog to announce next WB tutorial
 
* Can post several times per month, rather than once per ~3 months
 
* Wen can work with Ranjana and Todd, to make blog posts available on FB and Twitter
 
* Helpful to have fun posts; photos, videos, interviews
 
* Wen connected with Andy Golden about Baltimore worm meeting on FB
 
* WB needs to follow other members of the community, PIs, etc.
 
* Can make posts about interesting papers
 
* Can post top community curator for the month
 
* Would be good to have a social media point person, Wen and Ranjana? Involve Todd
 
* One post per week?
 
* Good to have specific posts about papers and researchers
 
* Reward labs with most community curation, highlight their research
 
  
=== Juancarlos' vacation ===
+
== October 29, 2020 ==
* May want contingency plan while he is away (Dec 19 - Feb 15)
 
* Next upload Jan 19
 
* Will need to consider required changes to models/dumpers before he leaves
 
  
 +
=== Overview Webinar debriefing ===
 +
* What's Good
 +
* What needs improvement
 +
* Participant requests:
 +
  A place to look for Worm methods (a public {moderated} wiki page?)
  
== November 30, 2017 ==
 
  
=== ISB Meeting, April 2018 ===
+
=== New alleles extraction pipeline ===
*https://www.biocuration.org/biocuration-2018-call-for-abstracts/
+
* current pipeline (on textpresso-dev) is sending data to Sanger RT system, which is being retired
*This year abstracts are invited for the following topic areas:
+
* the plan is to build a new pipeline to send AFP-like alerts with new entities
**Precision Medicine
+
* current pipeline reads alleles data from GSA and gene lists from Sanger, but I (Valerio) would need help from curators to understand how to get these data
**Phenotypes, genotypes, and variants
 
**Data Standards and Ontologies
 
**Text Mining
 
**Functional Annotation
 
**Community Annotation
 
**Data Integration and Visualization
 
**Deep Learning in curation process
 
**Softwares, Applications and Systems in biocuration
 
**Curation Standards and Best Practice; inference from evidence; data and annotation quality
 
* What does WormBase need to take away from the meeting? What does WB have to present? Important projects
 
* Micropublication group/member will go
 
* Could send people on behalf of AGR or AGR projects/working group
 
 
 
=== Expression Cluster -> WOBr ===
 
 
 
It looks like we have a data annotation issue where the data is about {AFD and/or AWB} but WormBase annotation usage is meant as {AFD} and {AWB}.
 
 
 
The data source publication is <http://www.wormbase.org/tools/tree/run?name=WBPaper00024671;class=Paper;expand=Refers_to#Refers_to>, and the dataset is  Expression cluster » WBPaper00024671:AFD_AWB_vs_unsorted_upregulated <http://www.wormbase.org/species/c_elegans/expression_cluster/WBPaper00024671:AFD_AWB_vs_unsorted_upregulated#013--10>.
 
 
 
Given that {AFD and/or AWB} is not a natural anatomy group (that is, there is no specific functional meaning nor is it a frequently used grouping), I don't think the solution is to invent an anatomy ontology term just for this dataset.
 
 
 
Instead, I propose that we remove the {AFD and/or AWB} datasets from associating with AFD and AWB respectively, so that the associations don't get interpreted incorrectly. This 'rule' should be applied to all ambiguous datasets.
 

Latest revision as of 18:06, 29 October 2020

Previous Years

2009 Meetings

2011 Meetings

2012 Meetings

2013 Meetings

2014 Meetings

2015 Meetings

2016 Meetings

2017 Meetings

2018 Meetings

2019 Meetings



2020 Meetings

January

February

March

April

May

June

July

August

September


October 1, 2020

Gene association file formats on FTP

  • For example, current production release ONTOLOGY directory: ftp://ftp.wormbase.org/pub/wormbase/releases/current-production-release/ONTOLOGY/
  • Our association files have format "*.wb"; is this useful or necessary?
  • Other than referring to GAF in the header, it isn't clear to users what the columns refer to or what the column headers should be
  • We could add a README file and/or convert to the new GAF 2.2 format which would have a more expressive file header and possibly column headers(?)
    • File headers could possibly link to the format specification page
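
A minimal sketch of why a more expressive header helps file consumers, assuming only that the association files follow the GAF convention of "!"-prefixed header lines above tab-delimited rows. The function name `split_gaf` and the sample lines (including the placeholder gene ID) are hypothetical and not taken from any WormBase file.

```python
from typing import Iterable


def split_gaf(lines: Iterable[str]) -> tuple[dict[str, str], list[list[str]]]:
    """Separate '!' header lines from tab-delimited annotation rows."""
    header: dict[str, str] = {}
    rows: list[list[str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue  # skip blank lines
        if line.startswith("!"):
            # Header lines of the form "!key: value" become dictionary entries;
            # free-text comment lines without a colon are ignored.
            key, sep, value = line[1:].partition(":")
            if sep:
                header[key.strip()] = value.strip()
            continue
        rows.append(line.split("\t"))
    return header, rows


# Hypothetical sample content, for illustration only.
sample = [
    "!gaf-version: 2.2\n",
    "!generated-by: WB\n",
    "WB\tWBGene00000000\tgene-x\n",
]
header, rows = split_gaf(sample)
print(header.get("gaf-version"))
```

With key/value header lines, a script can check the declared format version before interpreting the columns, which is not possible when the header only mentions GAF in passing.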

Phenotype association file idiosyncrasy

  • As we've discussed previously, there is an oddity in how the phenotype association file we provide lists (or fails to list) references
  • According to the GAF spec, column 6 is for reference and is required, whereas column 8 is "With (or) From" and is optional
  • When we have a reference, the WBPaper ID is provided in column 6 and the WBVar ID or RNAi ID is provided in column 8
  • However, when we have no reference (personal communication, e.g. from NBP allele submissions), the WBVar ID is instead put in column 6 (because we need something there), and column 8 is blank.
    • This results in (1) column 6 having a mix of paper/reference IDs (good) and WBVar IDs (not good) and (2) WBVar IDs split between column 6 and 8; thus making it tedious to parse this file
  • Proposed solution: Can we come up with some type of reference object ID to associate to the personal communications (or any annotations currently lacking a formal reference)?
  • With the proposed solution, we can always have a reference ID in column 6 (the intended purpose of the column) and WBVar IDs for alleles can always remain consistently in column 8
  • Proposal is to put WBPerson IDs in column 6 for personal communications. Chris & Karen will check if this will work.
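
The parsing burden described above, and the effect of the proposal, can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual dumper logic: it assumes variation IDs can be recognized by the substring "WBVar", and all IDs, the evidence-code placeholder, and the function names (`classify_row`, `normalize_row`) are hypothetical.

```python
REFERENCE_COL = 5  # GAF column 6 (zero-based index)
WITH_FROM_COL = 7  # GAF column 8 (zero-based index)


def classify_row(cols):
    """Return (reference, variation_or_rnai) for one association row.

    Current behaviour as described in the minutes: with no paper, the
    WBVar ID sits in column 6 and column 8 is blank.
    """
    ref = cols[REFERENCE_COL]
    with_from = cols[WITH_FROM_COL]
    if "WBVar" in ref and not with_from:
        return None, ref  # personal communication: no real reference
    return ref, with_from or None


def normalize_row(cols, person_id):
    """Apply the proposal: person ID in column 6, WBVar ID in column 8."""
    ref, obj = classify_row(cols)
    fixed = list(cols)
    if ref is None:
        fixed[REFERENCE_COL] = person_id
        fixed[WITH_FROM_COL] = obj
    return fixed


# Placeholder rows (made-up IDs); only columns 6 and 8 matter here.
row_with_paper = ["WB", "WBGene00000000", "gene-x", "", "WBPhenotype:0000000",
                  "WBPaper00000000", "IMP", "WBVar00000000"]
row_personal = ["WB", "WBGene00000000", "gene-x", "", "WBPhenotype:0000000",
                "WBVar00000000", "IMP", ""]

print(classify_row(row_with_paper))
print(normalize_row(row_personal, "WBPerson0000"))
```

After normalization, column 6 always holds a reference-like ID and column 8 always holds the WBVar or RNAi ID, so the special case in `classify_row` would no longer be needed.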

Server space in Chen Building

  • It looks like we will not have a dedicated space for server computers.


October 8, 2020

Webinar Announcement

  • Here is the live registration site: http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/webinar.cgi
  • Caltech Zoom allows 300 attendees.

Descriptions from GO-CAM models

  • One suggestion for the Alliance is to create a description based on a GO-CAM model
  • Could also micropublish some descriptions (semi-automated?)
  • Can make curators authors of micropublications for GO-CAM models/pathways

Transcription Factors in WormBase

  • WormBase has a ?Transcription_factor class that is currently underutilized
  • Chris spoke with Gary Williams about the status as he has done much of the work on the class
  • Because transcription factors can often be complexes, it was decided to create the ?Transcription_factor class rather than simply an extension of tags to the existing ?Gene class
  • The class seems reasonably complete; it's important to note that some TFs are general transcription factors, not necessarily gene-specific or sequence-specific DNA-binding TFs; it will be good to make that distinction clear to users
  • Chris has compiled a Google sheet to assess the class before Gary W. leaves WB in the next couple of weeks
  • The Google sheet has several tabs/worksheets, including one for the ACEDB data model (and notes about usage of tags), a summary table of associated genes, bound sequence features, existence of other protein-DNA binding data, etc.
  • It would be good to make TF binding info (per gene and globally) more accessible to our users, maybe via a new widget on gene pages (e.g. list incoming, regulating TFs and, for TF genes themselves, list potential target genes)

October 15, 2020

BioGRID data sharing

  • Rose from BioGRID proposed that BioGRID curate high-throughput C. elegans interaction datasets, capturing confidence scores when available, and making those annotations available to WormBase for regular ingest
  • Will need to consider a few points:
    • BioGRID doesn't curate protein-DNA interactions
    • We don't yet know the turn-around timeline for BioGRID curation of worm datasets; WB may be able to curate them much sooner
  • Chris and Jae will work with Rose et al. to coordinate HTP curation

Enriched genes

  • Some genes are considered "enriched" for an expression cluster data set even if the enrichment was in comparison to another cell or tissue (not whole animal)
  • We should reconsider the ?Expression_cluster model to make sure we can appropriately model and communicate enrichment or subtypes thereof


October 22, 2020

CHEBI

  • Karen spoke to CHEBI personnel on Tuesday
  • CHEBI only has ~2 curators to create new entities
  • CHEBI had submitted a proposal to establish pipelines to process requests from MODs
  • Chemical Translation Service (CTS)
  • OxO = https://www.ebi.ac.uk/spot/oxo/search

Training Webinar

  • Scheduled for tomorrow at 1pm Pacific/4pm Eastern


October 29, 2020

Overview Webinar debriefing

  • What's Good
  • What needs improvement
  • Participant requests:
    • A place to look for Worm methods (a public {moderated} wiki page?)


New alleles extraction pipeline

  • The current pipeline (on textpresso-dev) sends data to the Sanger RT system, which is being retired
  • The plan is to build a new pipeline that sends AFP-like alerts with new entities
  • The current pipeline reads allele data from GSA and gene lists from Sanger, but I (Valerio) would need help from curators to understand how to get these data