Difference between revisions of "WormBase-Caltech Weekly Calls"

From WormBaseWiki
 
(157 intermediate revisions by 8 users not shown)
Line 22: Line 22:
  
 
[[WormBase-Caltech_Weekly_Calls_2020|2020 Meetings]]
 
 
  
 
= 2021 Meetings =
 
Line 28: Line 27:
 
[[WormBase-Caltech_Weekly_Calls_January_2021|January]]
 
  
 +
[[WormBase-Caltech_Weekly_Calls_February_2021|February]]
  
== Feb 4th, 2021 ==
+
[[WormBase-Caltech_Weekly_Calls_March_2021|March]]
===How the "duplicate" function works in OAs with respect to object IDs (Ranjana and Juancarlos)===
 
*A word of caution: in OAs with Object IDs (e.g., WBGenotype00000014), duplicating a row also duplicates the Object ID; it does not advance to the next ID the way the PGID does
 
*If you do use the "duplicate" function, remember to manually change the Object ID
 
* We can implement checks to make sure distinct annotations/objects don't share the same ID
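
A check of the kind proposed above could look roughly like the following. This is a minimal Python sketch, not the OA's actual code: the row layout (one PGID plus one Object ID per row), the function name <code>find_shared_object_ids</code>, and the sample rows (including WBGenotype00000013) are assumptions made for illustration.

```python
from collections import defaultdict


def find_shared_object_ids(rows):
    """Return {object_id: [pgids]} for Object IDs used by more than one row.

    `rows` is an iterable of (pgid, object_id) pairs; this layout is a
    hypothetical stand-in for however the OA stores its annotations.
    """
    pgids_by_object = defaultdict(list)
    for pgid, object_id in rows:
        if object_id:  # skip rows with no Object ID assigned yet
            pgids_by_object[object_id].append(pgid)
    return {
        object_id: pgids
        for object_id, pgids in pgids_by_object.items()
        if len(pgids) > 1
    }


# Illustrative rows: PGID 3 was made with "duplicate" and kept the ID of PGID 2.
rows = [
    (1, "WBGenotype00000013"),
    (2, "WBGenotype00000014"),
    (3, "WBGenotype00000014"),
]
print(find_shared_object_ids(rows))  # {'WBGenotype00000014': [2, 3]}
```

Such a check only reports the PGIDs that share an ID; deciding whether they are truly distinct objects would still be up to the curator.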
 
 
 
=== GAF Wiki and headers ===
 
* Any more comments about the Wiki page and the proposal? https://wiki.wormbase.org/index.php/WormBase_gene_association_file
 
 
 
=== Missing references in expression GAFs ===
 
* ~300 missing from anatomy association file and ~45 missing from development association file
 
* Daniela looking into missing refs; many are personal communications or very old papers
 
* Will change ?Expr_pattern model to possibly remove ?Author reference and add in a ?Person reference instead
 
** 399 objects in WS279 reference an author; Daniela will take a look
 
* Would be good to have some reference for those objects in the GAF file on the FTP site; could use WBPerson when ready
 
 
 
 
 
== Feb 11th, 2021 ==
 
=== Alliance Literature Paper Tags ===
 
*What do we definitely want to transfer to the Alliance?
 
*Alliance literature group [https://docs.google.com/spreadsheets/d/1d3Y73x1BFiARkbxrvPPX2tCh5rFeBQRoBMHOcaXmijA/edit#gid=1866989939 spreadsheet]
 
*Current flags vs legacy flags
 
*Can we map everything to the proposed hierarchy or do we need to add some more classes?
 
*Kimberly will review existing tags/flags to sort out what we know we need and what is questionable
 
 
 
=== Personal communications in Expr_pattern ===
 
* 27 objects missing reference (personal communications)
 
** Even if we capture the WBPerson in the Person tag, how are we submitting these to the Alliance? The evidence required by the expression JSON spec https://github.com/alliance-genome/agr_schemas/blob/master/ingest/expression/wildtypeExpressionModelAnnotation.json (and other specs) must be a publication, as defined by publicationRef.json https://github.com/alliance-genome/agr_schemas/blob/master/ingest/publicationRef.json. If there is no PMID for a publication listed as evidence, a MOD ID will suffice for the "publicationId", but no WBPaperID has been created for such objects.
 
** One way to solve this: Daniela can go over the list and see whether the initial personal communication later resulted in a publication. One example is Expr181 (expression of cpl-1 in hypodermis and pharynx), communicated via email by Sarwar Hashmi in 2000, and Expr450 (expression of cpl-1 in hypodermis, intestine), communicated by Britton in 2001. The pattern was published in 2002 by Hashmi and Britton (WBPaper00005099). Daniela can then associate WBPaper00005099 with Expr181 and Expr450.
 
** The solution above still does not work for all cases. An example is the lad-2 personal communication from Oliver Hobert (2002), later published by Lishia Chen (2008). Removing Oliver's personal communication would remove evidence of data provenance from the Hobert lab, unless Oliver published this in a paper that eluded our flagging system (e.g. flagged SVM negative).
 
** Daniela can go over the entire list and contact the authors for such cases.
 
** Are personal communications used in other classes?
 
** Action item: Daniela will add Persons in the Person tag for such communications, request a model change from Hinxton, ask Magda to populate column 6 of the GAF file with Author data, and ask the DQMs to allow Persons, in addition to Papers, as 'Evidence' in the JSON
 
 
 
=== Author data in Expr_pattern ===
 
* 399 Expression objects have the Author tag populated. Most of them were submitted even before Wen started working on Expr_pattern.
 
** Out of the 399 objects, 32 have authors that only partially match. One example is Expr60, which has Bauer as an extra author in the .ace file; Bauer is not listed as an author on the paper.
 
** should we keep the author info and store it in the Person tag? Even if we do, how are we submitting these to Alliance? And should we at all? This is legacy data
 
** Decision: we can remove the authors and add in the remarks the historic info
 
 
 
=== Date tag in Expr_pattern ===
 
* The Date tag seems to be populated for objects that have authors (above), probably to capture when the submission occurred.
 
* In addition, Date is populated for a large-scale submission from Ian Hope (2006-03), later published.
 
* We can still keep this info as is for WB (currently stored in citace minus), but what are we going to do for the Alliance submission? The tag was last used in 2006 for the Hope study; before that it was used in the '90s (1990, 1998).
 
** We can get rid of Date, too, and pull the info for the ones for which authors do not match
 
 
 
=== Proposed WormBase metrics page ===
 
* Inspired by MGI's stats page:
 
** http://www.informatics.jax.org/mgihome/homepages/stats/all_stats.shtml
 
* Sibyl and Paulo are working on it. Prototype here: https://master.d25n59ij2csrbn.amplifyapp.com/
 
** Current prototype is C. elegans specific
 
* Chris is collecting ideas and queries here:
 
** https://docs.google.com/spreadsheets/d/1OeZuMRSHelVD7tGRIxEkCKOyDaBPN29wrGbNU3TGxKU/edit?usp=sharing
 
* Could eventually be used across the Alliance
 
 
 
 
 
== Feb 18th, 2021 ==
 
  
=== CenGen data ===
+
[[WormBase-Caltech_Weekly_Calls_April_2021|April]]
* How can we incorporate the CenGen data into WormBase pages? i.e. provide users info:
 
** Per gene: what cells express this gene?
 
** Per cell: what genes are expressed in this cell?
 
** May be derived from Eduardo's data processing
 
* CenGen has a weekly call: have invited Wen, Daniela, and Raymond
 
** Too much for all three to join?
 
* Good to establish healthy boundaries for responsibilities
 
* Do they want to collaborate or no?
 
* We can link to the main CenGen page; once gene-level data is available we can consume and make available
 
* Eduardo's tool is one WormBase tool for processing and providing CenGen data; will make them aware
 
* Ultimately this data (and its presentation) will need to get into the Alliance; may remain a WB-specific/portal feature for the near future
 
* Alaska? Probably won't be maintained
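
The two lookups asked for above (per gene: which cells; per cell: which genes) are two views of the same set of gene-cell pairs. A minimal Python sketch, assuming the processed data can be reduced to (gene, cell) pairs; the function name <code>build_indexes</code> and the placeholder names are hypothetical and are not CenGen data or the output format of Eduardo's tool:

```python
from collections import defaultdict


def build_indexes(pairs):
    """Build both lookups from (gene, cell) pairs.

    Returns (cells_by_gene, genes_by_cell); each maps a name to a sorted list.
    """
    cells_by_gene = defaultdict(set)
    genes_by_cell = defaultdict(set)
    for gene, cell in pairs:
        cells_by_gene[gene].add(cell)
        genes_by_cell[cell].add(gene)
    return (
        {gene: sorted(cells) for gene, cells in cells_by_gene.items()},
        {cell: sorted(genes) for cell, genes in genes_by_cell.items()},
    )


# Placeholder pairs for illustration only; these are not CenGen results.
pairs = [("gene-A", "cell-1"), ("gene-A", "cell-2"), ("gene-B", "cell-2")]
cells_by_gene, genes_by_cell = build_indexes(pairs)
print(cells_by_gene["gene-A"])  # ['cell-1', 'cell-2']
print(genes_by_cell["cell-2"])  # ['gene-A', 'gene-B']
```

In practice an expression threshold would probably decide which pairs count as "expressed"; that choice is left out of the sketch.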
 
  
=== Cleaning up bounced emails to outreach@wormbase.org ===
+
[[WormBase-Caltech_Weekly_Calls_May_2021|May]]
* Many unread messages (~140) in inbox
 
* Many of those are bounced emails from AFP pipeline and webinar announcements
 
* If relevant people could review those bounced emails and, as appropriate, add people or email addresses to the Omit list using the Omit Form CGI (http://tazendra.caltech.edu/~postgres/cgi-bin/omit_form.cgi) that would be appreciated.
 
  
  
== Feb 25th, 2021 ==
+
== June 3, 2021 ==
  
===Expr_pattern clean up===
+
=== Reserving meeting rooms ===
* Seldom-populated tags. Can we move the associated info into remarks and get rid of the tags? This is in view of the Alliance import
+
* Raymond encountering challenges with setting up regular meeting room reservations in Chen building
** Protein_description 33 objects
+
* We've been asked to make reservations one week in advance
<pre>Example: Expr_pattern : "Expr450"
+
* Need to use a room if we reserve it
Gene "WBGene00000776"
 
Protein_description "CPL-1"
 
  
Expr_pattern : "Expr552"
+
=== Summer student(s) ===
Gene "WBGene00006528"
+
* Anatomy function project with Raymond
Protein_description "Tubulin alpha"</pre>
+
* Many types of anatomy function data submitted via AFP
  
** Sequence 12 objects
+
== June 10, 2021 ==
<pre>Example: Expr_pattern : "Expr12"
 
      Gene "WBGene00003976"
 
      Sequence "Z28377|Z28375|Z28376"</pre>
 
  
** Laboratory 23 objects -> can infer via publication
+
=== Variation-Gene Associations ===
<pre>Example: Expr_pattern : "Expr87"
+
*Some QC on AFP-extracted data led to the realization that at least some of the 'tm' variations aren't associated with genes on tazendra
        …
+
*https://github.com/WormBase/author-first-pass/issues/204
Laboratory "ML"
+
*https://github.com/WormBase/website/issues/8262
Gene "WBGene00003012"</pre>
+
*It looks like non-manually asserted variation-gene associations will be generated via the VEP pipeline during the build, so Caltech would need to get this information from each WB release
  
 +
===Variation in name service but not in OA===
 +
*Ranjana: I could not find gk315316 in the OA though it exists in the name server. I agree that we probably don’t want to let all the million mutations into the OA since that would slow the drop-downs, but when we need one for curation, what needs to be done?
 +
*Juancarlos: That might be right.  It seems to try to create the variation in the name service, and if it gets a 409 Conflict error, it adds it to the temp variation file, and the obo_ tables in postgres. Since it fails to create in the name service, that's probably okay with Hinxton, and since it gets added to postgres, you should be able to use it in the OA, and since it gets added to the temp variation file, on future updates of the ontology it gets added again. Probably best if someone confirms that's the process (and maybe points us to a wiki ?)
  
* Empty tags. Can remove tags from WB Expression model?
+
*Solution from Karen and Chris: If the Hinxton name server already has the variation but it isn't in the OA (as expected for Million Mutation Project variants like gk315316), we just need to add it through the old temp variations CGI:
** Cell -> 0 objects
 
** Expressed_in -> 0 objects
 
** Protein -> 0 objects
 
** Pseudogene -> 0 objects
 
  
* Microarray, Tiling Array, RNAseq associations
+
http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/generic.cgi?action=TempVariationObo
**Microarray, Microarray_results
 
<pre>example: Expr1050000 -Yanai study
 
Currently accessible via the schema
 
https://wormbase.org/species/c_briggsae/expr_pattern/Expr1050000#0123--10</pre>
 
  
 +
making sure to enter the variation as its name, a space, and its WBVarID, like:
  
**Tiling_array
+
gk315316 WBVar01148785
<pre>Example: Expr1040545 - Miller study
 
Currently accessible via the schema
 
https://wormbase.org/species/all/expr_pattern/Expr1040545#0123--10</pre>
 
  
 +
and then, after refresh, it should be available to the OA. Hinxton never has to get involved in this scenario.
  
**RNASeq
+
=== Confirm WS282 Upload Dates ===
<pre>Example:
+
*July 6th?
Expr_pattern : "Expr1142792" - Yanai study
+
*Data freeze/upload date on the release schedule is July 12th
Gene "WBGene00007063"
 
RNASeq "RNASeq_Study.SRP029448"
 
Currently accessible  via the schema
 
https://wormbase.org/species/all/expr_pattern/Expr1142792#0123--10</pre>
 
  
* Others
+
=== CenGen bar plots ===
** Historical_gene -> 51 objects. How are historical gene tags treated for other classes in Alliance?
+
*Initially discussed to have the bar plot images going in as image data
** EPIC -> Ad hoc Tag for Murray, no correspondence with method ontology terms
+
*CenGen group wants interactive bar plots similar to the modENCODE bar plots currently displayed in the FPKM expression data section on the expression widget. That way users could hover over a bar plot and see the cell type, the expression value (TPM, in our case) and the proportion of cells of each neuron type expressing the gene.
** Species -> what are we doing with non elegans annotations?
+
*They can provide the underlying data and have the WB team generate interactive plots for each gene
** MovieURL -> 32 - Mohler -> move to  movies
+
*Sibyl said that this is feasible and we could: 1. bring the data files in OR 2. call the CenGen API on the fly
 +
*The first approach may be more work but better in the long run as we store the data
 +
*Will ping Hinxton and see how they can integrate the data
  
===Braun Server Room===
+
* Bring in data  both as pictures and interactive bar plots
* Manager Dave Mathog retired. Uncertain about its management or fate.
+
* Ping Hinxton on GitHub to move this forward

Latest revision as of 18:59, 10 June 2021

Previous Years

2009 Meetings

2011 Meetings

2012 Meetings

2013 Meetings

2014 Meetings

2015 Meetings

2016 Meetings

2017 Meetings

2018 Meetings

2019 Meetings

2020 Meetings

2021 Meetings

January

February

March

April

May


June 3, 2021

Reserving meeting rooms

  • Raymond encountering challenges with setting up regular meeting room reservations in Chen building
  • We've been asked to make reservations one week in advance
  • Need to use a room if we reserve it

Summer student(s)

  • Anatomy function project with Raymond
  • Many types of anatomy function data submitted via AFP

June 10, 2021

Variation-Gene Associations

Variation in name service but not in OA

  • Ranjana: I could not find gk315316 in the OA though it exists in the name server. I agree that we probably don’t want to let all the million mutations into the OA since that would slow the drop-downs, but when we need one for curation, what needs to be done?
  • Juancarlos: That might be right. It seems to try to create the variation in the name service, and if it gets a 409 Conflict error, it adds it to the temp variation file, and the obo_ tables in postgres. Since it fails to create in the name service, that's probably okay with Hinxton, and since it gets added to postgres, you should be able to use it in the OA, and since it gets added to the temp variation file, on future updates of the ontology it gets added again. Probably best if someone confirms that's the process (and maybe points us to a wiki ?)
  • Solution from Karen and Chris: If the Hinxton name server already has the variation but it isn't in the OA (as expected for Million Mutation Project variants like gk315316), we just need to add it through the old temp variations CGI:

http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/generic.cgi?action=TempVariationObo

making sure to enter the variation as its name, a space, and its WBVarID, like:

gk315316 WBVar01148785

and then, after refresh, it should be available to the OA. Hinxton never has to get involved in this scenario.
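
Before pasting an entry into the form, its shape can be checked with something like the sketch below. This is a hedged Python illustration, not part of the CGI: the function name <code>parse_temp_variation_entry</code> is hypothetical, and the pattern (a public name without whitespace, one space, then "WBVar" plus 8 digits) is an assumption inferred from the single example above.

```python
import re

# Assumed pattern: a public name with no whitespace, one space, then
# "WBVar" followed by 8 digits (as in the example WBVar01148785).
ENTRY_PATTERN = re.compile(r"^(\S+) (WBVar\d{8})$")


def parse_temp_variation_entry(line):
    """Split a 'name WBVarID' entry into (name, wbvar_id), or raise ValueError."""
    match = ENTRY_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"expected 'name WBVarID', got: {line!r}")
    return match.group(1), match.group(2)


print(parse_temp_variation_entry("gk315316 WBVar01148785"))
# ('gk315316', 'WBVar01148785')
```

An entry with the ID missing, or with the two fields separated by a tab, raises <code>ValueError</code> under these assumptions.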

Confirm WS282 Upload Dates

  • July 6th?
  • Data freeze/upload date on the release schedule is July 12th

CenGen bar plots

  • Initially discussed to have the bar plot images going in as image data
  • CenGen group wants interactive bar plots similar to the modENCODE bar plots currently displayed in the FPKM expression data section on the expression widget. That way users could hover over a bar plot and see the cell type, the expression value (TPM, in our case) and the proportion of cells of each neuron type expressing the gene.
  • They can provide the underlying data and have the WB team generate interactive plots for each gene
  • Sibyl said that this is feasible and we could: 1. bring the data files in OR 2. call the CenGen API on the fly
  • The first approach may be more work but better in the long run as we store the data
  • Will ping Hinxton and see how they can integrate the data
  • Bring in data both as pictures and interactive bar plots
  • Ping Hinxton on GitHub to move this forward