Difference between revisions of "WormBase-Caltech Weekly Calls"

From WormBaseWiki
 
(287 intermediate revisions by 9 users not shown)
Line 21: Line 21:
 
[[WormBase-Caltech_Weekly_Calls_2019|2019 Meetings]]
 
  
 +
[[WormBase-Caltech_Weekly_Calls_2020|2020 Meetings]]
  
 +
= 2021 Meetings =
  
 +
[[WormBase-Caltech_Weekly_Calls_January_2021|January]]
  
= 2020 Meetings =
+
[[WormBase-Caltech_Weekly_Calls_February_2021|February]]
  
[[WormBase-Caltech_Weekly_Calls_January_2020|January]]
+
[[WormBase-Caltech_Weekly_Calls_March_2021|March]]
  
[[WormBase-Caltech_Weekly_Calls_February_2020|February]]
+
[[WormBase-Caltech_Weekly_Calls_April_2021|April]]
  
[[WormBase-Caltech_Weekly_Calls_March_2020|March]]
+
[[WormBase-Caltech_Weekly_Calls_May_2021|May]]
  
[[WormBase-Caltech_Weekly_Calls_April_2020|April]]
 
  
[[WormBase-Caltech_Weekly_Calls_May_2020|May]]
+
== June 3, 2021 ==
  
[[WormBase-Caltech_Weekly_Calls_June_2020|June]]
+
=== Reserving meeting rooms ===
 +
* Raymond encountering challenges with setting up regular meeting room reservations in Chen building
 +
* We've been asked to make reservations one week in advance
 +
* Need to use a room if we reserve it
  
[[WormBase-Caltech_Weekly_Calls_July_2020|July]]
+
=== Summer student(s) ===
 +
* Anatomy function project with Raymond
 +
* Many types of anatomy function data submitted via AFP
  
[[WormBase-Caltech_Weekly_Calls_August_2020|August]]
+
== June 10, 2021 ==
  
[[WormBase-Caltech_Weekly_Calls_September_2020|September]]
+
=== Variation-Gene Associations ===
 +
*Some QC on AFP-extracted data led to the realization that at least some of the 'tm' variations aren't associated with genes on tazendra
 +
*https://github.com/WormBase/author-first-pass/issues/204
 +
*https://github.com/WormBase/website/issues/8262
 +
*It looks like non-manually asserted variation-gene associations will be generated via the VEP pipeline during the build, so Caltech would need to get this information from each WB release
  
 +
===Variation in name service but not in OA===
 +
*Ranjana: I could not find gk315316 in the OA though it exists in the name server. I agree that we probably don’t want to let all the million mutations into the OA since that would slow the drop-downs, but when we need one for curation, what needs to be done?
 +
*Juancarlos: That might be right. It seems to try to create the variation in the name service, and if it gets a 409 Conflict error, it adds it to the temp variation file, and the obo_ tables in postgres. Since it fails to create in the name service, that's probably okay with Hinxton, and since it gets added to postgres, you should be able to use it in the OA, and since it gets added to the temp variation file, on future updates of the ontology it gets added again. Probably best if someone confirms that's the process (and maybe points us to a wiki?)
  
== October 1, 2020 ==
+
*Solution from Karen and Chris: If the Hinxton name server already has the variation but it isn't in the OA (as expected for Million Mutation Project variants like gk315316), we just need to add it through the old temp variations CGI:
  
=== Gene association file formats on FTP ===
+
http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/generic.cgi?action=TempVariationObo
* For example, current production release ONTOLOGY directory: ftp://ftp.wormbase.org/pub/wormbase/releases/current-production-release/ONTOLOGY/
 
* Our association files have format "*.wb"; is this useful or necessary?
 
* Other than referring to GAF in the header, it isn't clear to users what the columns refer to or what the column headers should be
 
* We could add a README file and/or convert to the new [https://github.com/geneontology/geneontology.github.io/blob/issue-go-annotation-2917-gaf-2_2-doc/_docs/go-annotation-file-gaf-format-22.md GAF 2.2 format] which would have a more expressive file header and possibly column headers(?)
 
** File headers could possibly link to the format specification page
 
  
=== Phenotype association file idiosyncrasy ===
+
making sure to enter the variation as its public name, a single space, and its WBVarID, like:
* As we've discussed previously, there is an oddity to how the phenotype association file we provide lists, or doesn't, references
 
* According to the GAF spec, column 6 is for reference and is required, whereas column 8 is "With (or) From" and is optional
 
* When we have a reference, the WBPaper ID is provided in column 6 and the WBVar ID or RNAi ID is provided in column 8
 
* However, when we have no reference (personal communication, e.g. from NBP allele submissions), the WBVar ID is instead put in column 6 (because we need something there), and column 8 is blank.
 
** This results in (1) column 6 having a mix of paper/reference IDs (good) and WBVar IDs (not good) and (2) WBVar IDs split between column 6 and 8; thus making it tedious to parse this file
 
* Proposed solution: Can we come up with some type of reference object ID to associate to the personal communications (or any annotations currently lacking a formal reference)?
 
* With the proposed solution, we can always have a reference ID in column 6 (the intended purpose of the column) and WBVar IDs for alleles can always remain consistently in column 8
 
* Proposal is to put WBPerson IDs in column 6 for personal communications. Chris & Karen will check if this will work.
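
A minimal sketch of how the column 6 / column 8 mix-up described above could be detected. It assumes a tab-separated, GAF-style file in which header lines start with "!" and variation IDs look like "WBVar" followed by digits, optionally prefixed with "WB:". The function name and the example rows are hypothetical and are not taken from the real phenotype association file.

```python
import re

# Assumption: variation IDs are "WBVar" plus digits, optionally prefixed "WB:".
WBVAR_ID = re.compile(r"^(?:WB:)?WBVar\d+$")


def misplaced_variations(lines):
    """Return (line_number, id) pairs where column 6 holds a WBVar ID."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if line.startswith("!") or not line.strip():
            continue  # skip header/comment lines and blank lines
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 8:
            continue  # too short to have both column 6 and column 8
        reference = columns[5]  # column 6 in 1-based GAF numbering
        if WBVAR_ID.match(reference):
            hits.append((number, reference))
    return hits


# Hypothetical rows: the first has a paper in column 6 and a variation in
# column 8; the second has the variation in column 6 and an empty column 8.
example = [
    "!gaf-version: 2.0",
    "WB\tWBGene00000001\tgene-1\t\tWBPhenotype:0000001\tWB_REF:WBPaper00000001\tIMP\tWB:WBVar00000001",
    "WB\tWBGene00000002\tgene-2\t\tWBPhenotype:0000002\tWB:WBVar00000002\tIMP\t",
]
print(misplaced_variations(example))
```

Rows flagged this way are the ones that would need a reference-type ID (for example a WBPerson ID, per the proposal) in column 6 so that the WBVar ID can move to column 8.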
 
  
=== Server space in Chen Building ===
+
gk315316 WBVar01148785
* It looks like we will not have a specific space for server computers.
 
  
 +
and then, after refresh, it should be available to the OA. Hinxton never has to get involved in this scenario.
  
== October 8, 2020 ==
+
=== Confirm WS282 Upload Dates ===
 +
*July 6th?
 +
*Data freeze/upload date on the release schedule is July 12th
  
=== Webinar Announcement ===
+
=== CenGen bar plots ===
* Here is the live registration site: http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/webinar.cgi
+
*Initially discussed bringing in the bar plot images as image data
* Caltech zoom allows 300 attendees.
+
*CenGen group wants interactive bar plots similar to the modENCODE bar plots currently displayed in the FPKM expression data section on the expression widget. That way users could hover over a bar plot and see the cell type, the expression value (TPM, in our case) and the proportion of cells of each neuron type expressing the gene.
 +
*They can provide the underlying data and have the WB team generate interactive plots for each gene
 +
*Sibyl said that this is feasible and we could: 1. bring the data files in OR 2. call the CenGen API on the fly
 +
*The first approach may be more work but better in the long run as we store the data
 +
*Will ping Hinxton and see how they can integrate the data
  
=== Descriptions from GO-CAM models ===
+
* Bring in data both as pictures and interactive bar plots
* One suggestion for the Alliance is to create a description based on a GO-CAM model
+
* Ping Hinxton on GitHub to move this forward
* Could also micropublish some descriptions (semi-automated?)
 
* Can make curators authors of micropublications for GO-CAM models/pathways
 
 
 
=== Transcription Factors in WormBase ===
 
* WormBase has a ?Transcription_factor class that is currently underutilized
 
* Chris spoke with Gary Williams about the status as he has done much of the work on the class
 
* Because transcription factors can often be complexes, it was decided to create the ?Transcription_factor class rather than simply an extension of tags to the existing ?Gene class
 
* The class seems reasonably complete; it's important to note that some TFs are general transcription factors, not necessarily gene-specific or sequence-specific DNA-binding TFs; it will be good to make that distinction clear to users
 
* Chris has compiled a [https://docs.google.com/spreadsheets/d/1KdmvybWDWHXdlJwZgfleL4xHDoyPoYR13WAUcERF82g/edit?usp=sharing Google sheet] to assess the class before Gary W. leaves WB in the next couple of weeks
 
* The Google sheet has several tabs/worksheets, including one for the ACEDB data model (and notes about usage of tags), a summary table of associated genes, bound sequence features, existence of other protein-DNA binding data, etc.
 
* It would be good to make TF binding info (per gene and globally) more accessible to our users, maybe via a new widget on gene pages (e.g. list incoming, regulating TFs and, for TF genes themselves, list potential target genes)
 
 
 
== October 15, 2020 ==
 
 
 
=== BioGRID data sharing ===
 
* Rose from BioGRID proposed that BioGRID curate high-throughput C. elegans interaction datasets, capturing confidence scores when available, and making those annotations available to WormBase for regular ingest
 
* Will need to consider a few points:
 
** BioGRID doesn't curate protein-DNA interactions
 
** We don't yet know the turn-around timeline for BioGRID curation of worm datasets; WB may be able to curate them much sooner
 
* Chris and Jae will work with Rose et al. to coordinate HTP curation
 
 
 
=== Enriched genes ===
 
* Some genes are considered "enriched" for an expression cluster data set even if the enrichment was in comparison to another cell or tissue (not whole animal)
 
* We should reconsider the ?Expression_cluster model to make sure we can appropriately model and communicate enrichment or subtypes thereof
 
 
 
 
 
== October 22, 2020 ==
 
 
 
=== CHEBI ===
 
* Karen spoke to CHEBI personnel on Tuesday
 
* CHEBI only has ~2 curators to create new entities
 
* CHEBI had submitted a proposal to establish pipelines to process requests from MODs
 
* Chemical Translation Service (CTS)
 
* OxO = https://www.ebi.ac.uk/spot/oxo/search
 
 
 
=== Training Webinar ===
 
* Scheduled for tomorrow at 1pm Pacific/4pm Eastern
 
 
 
 
 
== October 29, 2020 ==
 
 
 
=== Overview Webinar debriefing ===
 
* What's Good
 
* What needs improvement
 
* Participant requests:
 
** A place to look for Worm methods (a public {moderated} wiki page?)
 
 
 
 
 
=== New alleles extraction pipeline ===
 
* Current pipeline (on textpresso-dev) sends data to the Sanger RT system, which is being retired
 
* The plan is to build a new pipeline to send AFP-like alerts with new entities
 
* Current pipeline reads allele data from GSA and gene lists from Sanger, but I (Valerio) would need help from curators to understand how to get these data
 

Latest revision as of 18:59, 10 June 2021

Previous Years

2009 Meetings

2011 Meetings

2012 Meetings

2013 Meetings

2014 Meetings

2015 Meetings

2016 Meetings

2017 Meetings

2018 Meetings

2019 Meetings

2020 Meetings

2021 Meetings

January

February

March

April

May


June 3, 2021

Reserving meeting rooms

  • Raymond encountering challenges with setting up regular meeting room reservations in Chen building
  • We've been asked to make reservations one week in advance
  • Need to use a room if we reserve it

Summer student(s)

  • Anatomy function project with Raymond
  • Many types of anatomy function data submitted via AFP

June 10, 2021

Variation-Gene Associations

  • Some QC on AFP-extracted data led to the realization that at least some of the 'tm' variations aren't associated with genes on tazendra
  • https://github.com/WormBase/author-first-pass/issues/204
  • https://github.com/WormBase/website/issues/8262
  • It looks like non-manually asserted variation-gene associations will be generated via the VEP pipeline during the build, so Caltech would need to get this information from each WB release

Variation in name service but not in OA

  • Ranjana: I could not find gk315316 in the OA though it exists in the name server. I agree that we probably don’t want to let all the million mutations into the OA since that would slow the drop-downs, but when we need one for curation, what needs to be done?
  • Juancarlos: That might be right. It seems to try to create the variation in the name service, and if it gets a 409 Conflict error, it adds it to the temp variation file, and the obo_ tables in postgres. Since it fails to create in the name service, that's probably okay with Hinxton, and since it gets added to postgres, you should be able to use it in the OA, and since it gets added to the temp variation file, on future updates of the ontology it gets added again. Probably best if someone confirms that's the process (and maybe points us to a wiki?)
  • Solution from Karen and Chris: If the Hinxton name server already has the variation but it isn't in the OA (as expected for Million Mutation Project variants like gk315316), we just need to add it through the old temp variations CGI:

http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/generic.cgi?action=TempVariationObo

making sure to enter the variation as its public name, a single space, and its WBVarID, like:

gk315316 WBVar01148785

and then, after refresh, it should be available to the OA. Hinxton never has to get involved in this scenario.
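
As a hedged illustration of the entry format above, the following sketch checks a line before it is pasted into the temp variations form. It assumes the entry is a public name, whitespace, and a WBVarID made of "WBVar" plus eight digits (as in the example); the function name is hypothetical and this is not part of the CGI itself.

```python
import re

# Assumption: a WBVarID is "WBVar" followed by exactly eight digits,
# matching the example "WBVar01148785"; adjust if other widths exist.
ENTRY = re.compile(r"^(?P<name>\S+)\s+(?P<wbvar>WBVar\d{8})$")


def parse_temp_variation(entry):
    """Split 'name WBVarID' into its two parts, or raise ValueError."""
    match = ENTRY.match(entry.strip())
    if match is None:
        raise ValueError(f"expected '<name> <WBVarID>', got: {entry!r}")
    return match.group("name"), match.group("wbvar")


print(parse_temp_variation("gk315316 WBVar01148785"))
```

A check like this would catch the most likely slips, such as omitting the WBVarID or reversing the two fields, before the entry reaches the obo_ tables.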

Confirm WS282 Upload Dates

  • July 6th?
  • Data freeze/upload date on the release schedule is July 12th

CenGen bar plots

  • Initially discussed bringing in the bar plot images as image data
  • CenGen group wants interactive bar plots similar to the modENCODE bar plots currently displayed in the FPKM expression data section on the expression widget. That way users could hover over a bar plot and see the cell type, the expression value (TPM, in our case) and the proportion of cells of each neuron type expressing the gene.
  • They can provide the underlying data and have the WB team generate interactive plots for each gene
  • Sibyl said that this is feasible and we could: 1. bring the data files in OR 2. call the CenGen API on the fly
  • The first approach may be more work but better in the long run as we store the data
  • Will ping Hinxton and see how they can integrate the data
  • Bring in data both as pictures and interactive bar plots
  • Ping Hinxton on GitHub to move this forward
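
One possible shape for the per-gene data behind an interactive bar plot is sketched below, based only on the three hover values mentioned above (cell type, TPM, and proportion of cells expressing the gene). The class name, field names, and the sample values are assumptions for illustration; they do not describe the CenGen API or the actual WormBase widget code.

```python
from dataclasses import dataclass


@dataclass
class NeuronExpression:
    """One bar in the plot: expression of a gene in one neuron type."""
    cell_type: str
    tpm: float
    proportion: float  # fraction (0 to 1) of cells of this type expressing the gene


def hover_text(record):
    """Format the tooltip shown when a user hovers over a bar."""
    return f"{record.cell_type}: {record.tpm:.1f} TPM, {record.proportion:.0%} of cells"


# Made-up values, purely to show the output format.
bar = NeuronExpression(cell_type="AWA", tpm=12.34, proportion=0.5)
print(hover_text(bar))
```

Storing records like these per gene would fit the first approach (bringing the data files in), while the same structure could also be filled from an API response if the data were fetched on the fly.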