Difference between revisions of "WormBase-Caltech Weekly Calls"

From WormBaseWiki
(41 intermediate revisions by 4 users not shown)
Line 25: Line 25:
  
 
= 2021 Meetings =
 
 +
 +
[[WormBase-Caltech_Weekly_Calls_January_2021|January]]
 +
  
 
== Feb 4th, 2021 ==
 
+
===How the "duplicate" function works in OAs with respect to object IDs (Ranjana and Juancarlos)===
 
*A word of caution: when you duplicate a row in an OA that uses Object IDs (e.g., WBGenotype00000014), the object ID is duplicated as well; it does not advance to the next ID the way the PGID does
 
 
*If you do use the "duplicate" function, remember to manually change the Object ID
 +
* We can implement checks to make sure distinct annotations/objects don't share the same ID
 +
 +
=== GAF Wiki and headers ===
 +
* Any more comments about the Wiki page and the proposal? https://wiki.wormbase.org/index.php/WormBase_gene_association_file
 +
 +
=== Missing references in expression GAFs ===
 +
* ~300 missing from anatomy association file and ~45 missing from development association file
 +
* Daniela looking into missing refs; many are personal communications or very old papers
 +
* Will change ?Expr_pattern model to possibly remove ?Author reference and add in a ?Person reference instead
 +
** 399 objects in WS279 reference an author; Daniela will take a look
 +
* Would be good to have some reference for those objects in the GAF file on the FTP site; could use WBPerson when ready
 +
 +
 +
== Feb 11th, 2021 ==
 +
=== Alliance Literature Paper Tags ===
 +
*What do we definitely want to transfer to the Alliance?
 +
*Alliance literature group [https://docs.google.com/spreadsheets/d/1d3Y73x1BFiARkbxrvPPX2tCh5rFeBQRoBMHOcaXmijA/edit#gid=1866989939 spreadsheet]
 +
*Current flags vs legacy flags
 +
*Can we map everything to the proposed hierarchy or do we need to add some more classes?
 +
*Kimberly will review existing tags/flags to sort out what we know we need and what is questionable
 +
 +
=== Personal communications in Expr_pattern ===
 +
* 27 objects missing reference (personal communications)
 +
** Even if we capture the WBPerson in the Person tag, how do we submit these to the Alliance? The evidence required by the expression JSON spec https://github.com/alliance-genome/agr_schemas/blob/master/ingest/expression/wildtypeExpressionModelAnnotation.json (and other specs) must be a publication, as defined by publicationRef.json (https://github.com/alliance-genome/agr_schemas/blob/master/ingest/publicationRef.json). If a publication listed as evidence has no PMID, a MOD ID will suffice for the "publicationId", but no WBPaper ID has been created for such objects.
 +
** One way to solve this: Daniela can go over the list and check whether the initial personal communication later resulted in a publication. For example, Expr181 (expression of cpl-1 in hypodermis and pharynx) was communicated via email by Sarwar Hashmi in 2000, and Expr450 (expression of cpl-1 in hypodermis, intestine) was communicated by Britton in 2001. The pattern was published by Hashmi and Britton in 2002 (WBPaper00005099). Daniela can then associate WBPaper00005099 with Expr181 and Expr450.
 +
** The solution above still does not work for all cases. An example is the lad-2 personal communication from Oliver Hobert (2002), later published by Lishia Chen (2008). Removing Oliver’s personal communication would remove evidence of data provenance from the Hobert lab, unless Oliver published this in a paper that eluded our flagging system (e.g. flagged SVM negative).
 +
** Daniela can go over the entire list and contact the authors for such cases.
 +
** Are personal communications used in other classes?
 +
** Action item: Daniela will add Persons in the person tag for such communications. Will request a model change to Hinxton. Will ask Magda to populate column 6 of the GAF file with Author data. Will add a request for the DQMs to allow Persons in the 'Evidence' in the JSON in addition to Papers
 +
 +
=== Author data in Expr_pattern ===
 +
* 399 Expression objects have the Author tag populated. Most of them were submitted before Wen started working on Expr_pattern.
 +
** Out of the 399 objects, 32 have authors that only partially match the paper's author list. One example is Expr60, which has Bauer as an extra author in the .ace file; Bauer is not listed as an author in the paper.
 +
** should we keep the author info and store it in the Person tag? Even if we do, how are we submitting these to Alliance? And should we at all? This is legacy data
 +
** Decision: we can remove the authors and add in the remarks the historic info
 +
 +
=== Date tag in Expr_pattern ===
 +
* The Date tag seems to be populated for objects that have authors (above), probably to capture when the submission occurred.
 +
* In addition, Date is populated for a large-scale submission from Ian Hope (2006-03), later published.
 +
* We can still keep this info as is for WB (currently stored in citace minus), but what are we going to do for the Alliance submission? The tag was last used in 2006 for the Hope study; before that it was used in the ‘90s (1990, 1998).
 +
** We can get rid of Date, too, and pull the info for the objects whose authors do not match
 +
 +
=== Proposed WormBase metrics page ===
 +
* Inspired by MGI's stats page:
 +
** http://www.informatics.jax.org/mgihome/homepages/stats/all_stats.shtml
 +
* Sibyl and Paulo working on. Prototype here: https://master.d25n59ij2csrbn.amplifyapp.com/
 +
** Current prototype is C. elegans specific
 +
* Chris is collecting ideas and queries here:
 +
** https://docs.google.com/spreadsheets/d/1OeZuMRSHelVD7tGRIxEkCKOyDaBPN29wrGbNU3TGxKU/edit?usp=sharing
 +
* Could eventually be used across the Alliance
  
== January 28, 2021 ==
 
  
=== String Matching Pipelines ===
+
== Feb 18th, 2021 ==
* Old pipelines on textpresso-dev are not compatible with the new TPC system
 
* New TPC API does not support string matching
 
* New Python library (wbtools) - used by variation pipeline - supports batch processing of WB literature and regex matching
 
* Email extraction
 
** No longer needed for concise description community curation tracker
 
** Juancarlos, Valerio, and Chris will meet to establish a new, streamlined email address extraction pipeline
 
* Old AFP Display CGI (http://tazendra.caltech.edu/~postgres/cgi-bin/author_fp_display.cgi)
 
** Still uses old Textpresso-Dev; no longer needed? Probably not; Karen can look if there's anything there worth keeping (nothing critical)
 
* Valerio will determine priorities (e.g. antibody stuff first), and send issues to curators as needed
 
  
=== Tracking interlibrary loan requests ===
+
=== CenGen data ===
* Raymond: Do we have a common place to track interlibrary loan requests? Could be useful for Alliance/WB
+
* How can we incorporate the CenGen data into WormBase pages? i.e. provide users info:
 +
** Per gene: what cells express this gene?
 +
** Per cell: what genes are expressed in this cell?
 +
** May be derived from Eduardo's data processing
 +
* CenGen has a weekly call: have invited Wen, Daniela, and Raymond
 +
** Too much for all three to join?
 +
* Good to establish healthy boundaries for responsibilities
 +
* Do they want to collaborate or no?
 +
* We can link to the main CenGen page; once gene-level data is available we can consume and make available
 +
* Eduardo's tool is one WormBase tool for processing and providing CenGen data; will make them aware
 +
* Ultimately this data (and its presentation) will need to get into the Alliance; may remain a WB-specific/portal feature for the near future
 +
* Alaska? Probably won't be maintained
  
=== Cengen data ===
+
=== Cleaning up bounced emails to outreach@wormbase.org ===
 +
* Many unread messages (~140) in inbox
 +
* Many of those are bounced emails from AFP pipeline and webinar announcements
 +
* If relevant people could review those bounced emails and, as appropriate, add people or email addresses to the Omit list using the Omit Form CGI (http://tazendra.caltech.edu/~postgres/cgi-bin/omit_form.cgi) that would be appreciated.
  
Eduardo: I got the CenGen 2020 data over Christmas, and I have repackaged the full release data (all 100k cells with annotations, but prior to the SoupX processing) in an h5ad file, which I make available here: https://wormcells.com/
 
  
* This is the repo that makes that website: https://github.com/Munfred/wormcells-data
+
== Feb 25th, 2021 ==
* I figured out a way to spin up the interface I made for doing differential expression through Google Colab, so now people can do it with any h5ad file they want. As an example I wrote a notebook that runs it with the 100k cells from CenGen: https://colab.research.google.com/github/Munfred/scdefg/blob/main/scdefg.ipynb
 
* Since they are thinking about simple things that can be offered in WormBase, I will also briefly talk about this dashboard that I made for a UCLA group that wanted to look at nuclei data. It uses the tissue enrichment analysis code for the bottom 3 plots.
 
  
=== Tracking corresponding authors for papers at Alliance ===
+
===Expr_pattern clean up===
* Corresponding authors not tracked in ACEDB, because authors are just text names not IDs
+
* Seldom populated  tags. Can we move the associated info into remarks and get rid of the tag? This is in view of the Alliance import
* Maybe Cecilia could link a WBPerson as corresponding author for a paper during curation?
+
** Protein_description 33 objects -> Decision: Move to remarks -> Done DR 2021/02/26. Redundant info, such as CPL-1 in Protein_description when the gene name is already cpl-1, was omitted.
 +
<pre>Example: Expr_pattern : "Expr450"
 +
Gene "WBGene00000776"
 +
Protein_description "CPL-1"
  
== January 21, 2021 ==
+
Expr_pattern : "Expr552"
 +
Gene "WBGene00006528"
 +
Protein_description "Tubulin alpha"</pre>
  
=== Neural Network (NN) Paper Classification Results ===
+
** Sequence 12 objects -> Decision: Move to remarks
* Linking to Paper Display tool (as opposed to Paper Editor) from Michael's webpage for NN results (Michael will make change)
+
<pre>Example: Expr_pattern : "Expr12"
* NN results will be incorporated into the Curation Status Form
+
      Gene "WBGene00003976"
* For AFP and VFP, there is now a table with mixed SVM and NN results ("blackbox" results); for a given paper, if NN results exist, they take priority over any SVM results
+
      Sequence "Z28377|Z28375|Z28376"</pre>
* Decision: we will omit blackbox results (at least for now) from curation status form (just add the new NN results separately)
 
* We have stopped running SVM on new papers
 
* Interactions SVM has performed better than new NN results; would be worth attempting a retraining
 
  
=== Community Phenotype Curation ===
+
** Laboratory 23 objects -> can infer via publication -> Decision: good to ignore
* On hold for a few months to commit time to updating the phenotype annotation model to accommodate, e.g. double mutant phenotypes, multiple RNAi targets (intended or otherwise), mutant transgene products causing phenotypes, expressed human genes causing phenotypes, etc.
+
<pre>Example: Expr_pattern : "Expr87"
* Changes made for WB phenotypes may carry over to Alliance phenotype work
+
        …
* [https://www.preprints.org/manuscript/202101.0169/v1 Paper out now] on undergrad community phenotype curation project with Lina Dahlberg; we may get more requests for trying this with other undergrad classes
+
Laboratory "ML"
 +
Gene "WBGene00003012"</pre>
  
=== AFP Anatomy Function flagging ===
 
* Sometimes it is difficult to assess whether an author flag is correct (often times can be wrong/absent)
 
* What about giving authors/users feedback on their flagging results?
 
* Would be good to provide content from paper where this data is said to exist (automatically from a Textpresso pipeline or manually from author identified data)
 
* We want to be careful about how we provide feedback; we should be proactive to make improvements/modifications on our end and bring those back to users for feedback to us
 
  
== January 14th, 2021 ==
+
* Empty tags. Can remove tags from WB Expression model? Yes
 +
** Cell -> 0 objects
 +
** Expressed_in -> 0 objects
 +
** Protein -> 0 objects
 +
** Pseudogene -> 0 objects
  
===PubMed LinkOut to WormBase Paper Pages (Kimberly) ===
 
* Other databases [https://www.ncbi.nlm.nih.gov/projects/linkout/doc/nonbiblinkout.html link out from PubMed] to their respective paper pages
 
* For example, https://pubmed.ncbi.nlm.nih.gov/20864032/ links out to GO and MGI paper pages
 
* Would like to set this up for WormBase and ultimately for the Alliance, but this will require some developer help
 
* Work on this next month (after AFP and GO grant submissions)?
 
  
===Update cycle for HGNC data in the OA (Ranjana) ===
+
* Microarray, Tiling Array, RNAseq associations -> Discuss with the Expression WG how to bring in images for these and treat them as image objects linked to the high-throughput data
*Juancarlos had these questions for us:
+
**Microarray, Microarray_results
<pre style="white-space: pre-wrap;
+
<pre>Example: Expr1050000 - Yanai study
white-space: -moz-pre-wrap;
+
Currently accessible via the schema
white-space: -pre-wrap;
+
https://wormbase.org/species/c_briggsae/expr_pattern/Expr1050000#0123--10</pre>
white-space: -o-pre-wrap;
 
word-wrap: break-word">
 
  
There's a script here that repopulates the postgres obo_*_hgnc tables
 
based off of Chris and Wen's data
 
/home/postgres/work/pgpopulation/obo_oa_ontologies/populate_obo_hgnc.pl
 
  
It's not on a cronjob, because I think the files are not updated that
+
**Tiling_array
often. Do we want to run this every night, or run it manually when
+
<pre>Example: Expr1040545 - Miller study
the files get re-generated ?  Or run every night, and check if the
+
Currently accessible via the schema
files's timestamps have changed, then repopulate postgres ?
+
https://wormbase.org/species/all/expr_pattern/Expr1040545#0123--10</pre>
  
</pre>
 
  
===Minutes===
+
**RNASeq
====PubMed LinkOut to WormBase Paper Pages====
+
<pre>Example:
 +
Expr_pattern : "Expr1142792" - Yanai study
 +
Gene "WBGene00007063"
 +
RNASeq "RNASeq_Study.SRP029448"
 +
Currently accessible via the schema
 +
https://wormbase.org/species/all/expr_pattern/Expr1142792#0123--10</pre>
  
====Update cycle for HGNC data in the OA====
+
* Others
*We will update when Alliance updates the data
+
** Historical_gene -> 51 objects. How are historical gene tags treated for other classes in Alliance? -> keep in Alliance and maintain the same mechanism when curation is moved over
*Juancarlos will set it to check the timestamps and if they change will do an update for the OAs
+
** EPIC -> Ad hoc Tag for Murray, no correspondence with method ontology terms -> ok to ignore
 +
** Species -> what are we doing with non-elegans annotations?
 +
** MovieURL -> 32 - Mohler -> move to movies -> talk again with Raymond
  
====CENGEN====
+
===Braun Server Room===
*Wen, Daniela, and Raymond will look at the datasets to work out how to incorporate. Start simple.
+
* Manager Dave Mathog has retired; the server room's future management and fate are uncertain.
*We will make links to pages on their site.
 

Revision as of 17:40, 26 February 2021

Previous Years

2009 Meetings

2011 Meetings

2012 Meetings

2013 Meetings

2014 Meetings

2015 Meetings

2016 Meetings

2017 Meetings

2018 Meetings

2019 Meetings

2020 Meetings


2021 Meetings

January


Feb 4th, 2021

How the "duplicate" function works in OAs with respect to object IDs (Ranjana and Juancarlos)

  • A word of caution: when you duplicate a row in an OA that uses Object IDs (e.g., WBGenotype00000014), the object ID is duplicated as well; it does not advance to the next ID the way the PGID does
  • If you do use the "duplicate" function, remember to manually change the Object ID
  • We can implement checks to make sure distinct annotations/objects don't share the same ID
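
A check like the one proposed above could be sketched as follows. This is only an illustration: it assumes each OA row can be reduced to a (PGID, object ID) pair, and the function name and the sample PGIDs are hypothetical rather than part of the existing OA code.

```python
from collections import defaultdict


def find_shared_object_ids(rows):
    """Return {object_id: [pgid, ...]} for object IDs used by more than one row.

    `rows` is an iterable of (pgid, object_id) pairs (an assumed representation).
    """
    pgids_by_object = defaultdict(list)
    for pgid, object_id in rows:
        if object_id:  # rows without an object ID cannot collide
            pgids_by_object[object_id].append(pgid)
    return {
        object_id: sorted(pgids)
        for object_id, pgids in pgids_by_object.items()
        if len(pgids) > 1
    }


# Hypothetical rows: PGID 102 was made with "duplicate" and its ID was never changed.
rows = [
    (101, "WBGenotype00000014"),
    (102, "WBGenotype00000014"),
    (103, "WBGenotype00000015"),
]
shared = find_shared_object_ids(rows)
```

Any object ID reported in `shared` would then need a curator to decide whether the rows really describe the same object or whether one of them needs a new ID.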

GAF Wiki and headers

Missing references in expression GAFs

  • ~300 missing from anatomy association file and ~45 missing from development association file
  • Daniela looking into missing refs; many are personal communications or very old papers
  • Will change ?Expr_pattern model to possibly remove ?Author reference and add in a ?Person reference instead
    • 399 objects in WS279 reference an author; Daniela will take a look
  • Would be good to have some reference for those objects in the GAF file on the FTP site; could use WBPerson when ready
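
One way to list the rows that lack a reference is sketched below. It assumes the association files follow the usual GAF layout, with tab-separated columns, header lines starting with "!", and the reference in column 6. The function name and the sample rows are placeholders, not real WormBase data.

```python
def rows_missing_reference(gaf_lines):
    """Return (line_number, object_id) for data rows whose column 6 is empty."""
    missing = []
    for line_number, line in enumerate(gaf_lines, start=1):
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and header/comment lines
        columns = line.rstrip("\n").split("\t")
        reference = columns[5].strip() if len(columns) > 5 else ""
        if not reference:
            object_id = columns[1] if len(columns) > 1 else ""
            missing.append((line_number, object_id))
    return missing


# Placeholder rows; the identifiers are invented for illustration only.
sample = [
    "!gaf-version: 2.0\n",
    "WB\tGENE_A\tsym-a\t\tTERM_1\tWB_REF:PAPER_1\tIDA\n",
    "WB\tGENE_B\tsym-b\t\tTERM_2\t\tIDA\n",
]
missing = rows_missing_reference(sample)
```

Running this over the anatomy and development association files would give a list that can be compared with the ~300 and ~45 counts noted above.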


Feb 11th, 2021

Alliance Literature Paper Tags

  • What do we definitely want to transfer to the Alliance?
  • Alliance literature group spreadsheet
  • Current flags vs legacy flags
  • Can we map everything to the proposed hierarchy or do we need to add some more classes?
  • Kimberly will review existing tags/flags to sort out what we know we need and what is questionable

Personal communications in Expr_pattern

  • 27 objects missing reference (personal communications)
    • Even if we capture the WBPerson in the Person tag, how do we submit these to the Alliance? The evidence required by the expression JSON spec https://github.com/alliance-genome/agr_schemas/blob/master/ingest/expression/wildtypeExpressionModelAnnotation.json (and other specs) must be a publication, as defined by publicationRef.json (https://github.com/alliance-genome/agr_schemas/blob/master/ingest/publicationRef.json). If a publication listed as evidence has no PMID, a MOD ID will suffice for the "publicationId", but no WBPaper ID has been created for such objects.
    • One way to solve this: Daniela can go over the list and check whether the initial personal communication later resulted in a publication. For example, Expr181 (expression of cpl-1 in hypodermis and pharynx) was communicated via email by Sarwar Hashmi in 2000, and Expr450 (expression of cpl-1 in hypodermis, intestine) was communicated by Britton in 2001. The pattern was published by Hashmi and Britton in 2002 (WBPaper00005099). Daniela can then associate WBPaper00005099 with Expr181 and Expr450.
    • The solution above still does not work for all cases. An example is the lad-2 personal communication from Oliver Hobert (2002), later published by Lishia Chen (2008). Removing Oliver’s personal communication would remove evidence of data provenance from the Hobert lab, unless Oliver published this in a paper that eluded our flagging system (e.g. flagged SVM negative).
    • Daniela can go over the entire list and contact the authors for such cases.
    • Are personal communications used in other classes?
    • Action item: Daniela will add Persons in the person tag for such communications. Will request a model change to Hinxton. Will ask Magda to populate column 6 of the GAF file with Author data. Will add a request for the DQMs to allow Persons in the 'Evidence' in the JSON in addition to Papers
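
The split between objects that can already be submitted and those that still need follow-up might look like the sketch below. It uses only the "publicationId" field named above; the "WB:" prefix on the MOD ID, the function name, and the "ExprWithoutPaper" placeholder are assumptions for illustration, not taken from the schema or from WormBase data.

```python
def build_evidence(expr_id, paper_by_expr):
    """Return a publication evidence dict, or None when no paper is known."""
    paper_id = paper_by_expr.get(expr_id)
    if paper_id is None:
        return None
    # Assumption: MOD IDs are written with a "WB:" prefix.
    return {"publicationId": f"WB:{paper_id}"}


# Associations taken from the Expr181/Expr450 example in the minutes.
paper_by_expr = {
    "Expr181": "WBPaper00005099",
    "Expr450": "WBPaper00005099",
}

submittable = {}
needs_follow_up = []
# "ExprWithoutPaper" is a made-up ID standing in for an unresolved object.
for expr_id in ["Expr181", "Expr450", "ExprWithoutPaper"]:
    evidence = build_evidence(expr_id, paper_by_expr)
    if evidence is None:
        needs_follow_up.append(expr_id)
    else:
        submittable[expr_id] = evidence
```

Objects left in `needs_follow_up` are the ones that would depend on the requested change allowing Persons as evidence.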

Author data in Expr_pattern

  • 399 Expression objects have the Author tag populated. Most of them were submitted before Wen started working on Expr_pattern.
    • Out of the 399 objects, 32 have authors that only partially match the paper's author list. One example is Expr60, which has Bauer as an extra author in the .ace file; Bauer is not listed as an author in the paper.
    • should we keep the author info and store it in the Person tag? Even if we do, how are we submitting these to Alliance? And should we at all? This is legacy data
    • Decision: we can remove the authors and add in the remarks the historic info
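
The partial-match comparison described above can be illustrated with a small sketch. It assumes author names are compared as whole strings after normalizing case and whitespace, which is a simplification. "Author One" and "Author Two" are placeholders, since the minutes only name Bauer.

```python
def normalize(name):
    """Collapse whitespace and ignore case so trivial differences do not count."""
    return " ".join(name.split()).casefold()


def compare_authors(ace_authors, paper_authors):
    """Classify the overlap and list authors present only in the .ace record."""
    ace = {normalize(name) for name in ace_authors}
    paper = {normalize(name) for name in paper_authors}
    extra_in_ace = sorted(ace - paper)
    if ace == paper:
        status = "match"
    elif ace & paper:
        status = "partial"
    else:
        status = "mismatch"
    return status, extra_in_ace


# Placeholder author lists modelled on the Expr60 case (one extra author).
status, extra = compare_authors(
    ["Author One", "Bauer"],
    ["Author One", "Author Two"],
)
```

The `extra` list is the kind of historic information that could be copied into the remarks before the Author tag is removed.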

Date tag in Expr_pattern

  • The Date tag seems to be populated for objects that have authors (above), probably to capture when the submission occurred.
  • In addition, Date is populated for a large-scale submission from Ian Hope (2006-03), later published.
  • We can still keep this info as is for WB (currently stored in citace minus), but what are we going to do for the Alliance submission? The tag was last used in 2006 for the Hope study; before that it was used in the ‘90s (1990, 1998).
    • We can get rid of Date, too, and pull the info for the objects whose authors do not match

Proposed WormBase metrics page


Feb 18th, 2021

CenGen data

  • How can we incorporate the CenGen data into WormBase pages? i.e. provide users info:
    • Per gene: what cells express this gene?
    • Per cell: what genes are expressed in this cell?
    • May be derived from Eduardo's data processing
  • CenGen has a weekly call: have invited Wen, Daniela, and Raymond
    • Too much for all three to join?
  • Good to establish healthy boundaries for responsibilities
  • Do they want to collaborate or no?
  • We can link to the main CenGen page; once gene-level data is available we can consume and make available
  • Eduardo's tool is one WormBase tool for processing and providing CenGen data; will make them aware
  • Ultimately this data (and its presentation) will need to get into the Alliance; may remain a WB-specific/portal feature for the near future
  • Alaska? Probably won't be maintained
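
The two lookups mentioned above (per gene and per cell) can be served from the same table. The sketch below assumes the processed data can be reduced to an expression level per (gene, cell type) pair and that a simple threshold decides "expressed". The data layout, threshold, and names are all hypothetical and not taken from CenGen or from Eduardo's tool.

```python
from collections import defaultdict


def build_indexes(expression, threshold=0.0):
    """Build gene -> cell types and cell type -> genes lookups.

    `expression` maps (gene, cell_type) to an expression level; a pair counts
    as expressed when its level is strictly above `threshold`.
    """
    cells_by_gene = defaultdict(set)
    genes_by_cell = defaultdict(set)
    for (gene, cell_type), level in expression.items():
        if level > threshold:
            cells_by_gene[gene].add(cell_type)
            genes_by_cell[cell_type].add(gene)
    return dict(cells_by_gene), dict(genes_by_cell)


# Made-up values for illustration only.
expression = {
    ("gene-a", "CELL_1"): 5.0,
    ("gene-a", "CELL_2"): 0.0,
    ("gene-b", "CELL_1"): 2.5,
}
cells_by_gene, genes_by_cell = build_indexes(expression)
```

A gene page would read from `cells_by_gene` and a cell page from `genes_by_cell`; the real threshold would have to come from whoever processes the data.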

Cleaning up bounced emails to outreach@wormbase.org

  • Many unread messages (~140) in inbox
  • Many of those are bounced emails from AFP pipeline and webinar announcements
  • If relevant people could review those bounced emails and, as appropriate, add people or email addresses to the Omit list using the Omit Form CGI (http://tazendra.caltech.edu/~postgres/cgi-bin/omit_form.cgi) that would be appreciated.


Feb 25th, 2021

Expr_pattern clean up

  • Seldom populated tags. Can we move the associated info into remarks and get rid of the tag? This is in view of the Alliance import
    • Protein_description 33 objects -> Decision: Move to remarks -> Done DR 2021/02/26. Redundant info, such as CPL-1 in Protein_description when the gene name is already cpl-1, was omitted.
      Example:
        Expr_pattern : "Expr450"
        Gene "WBGene00000776"
        Protein_description "CPL-1"

        Expr_pattern : "Expr552"
        Gene "WBGene00006528"
        Protein_description "Tubulin alpha"
    • Sequence 12 objects -> Decision: Move to remarks
      Example:
        Expr_pattern : "Expr12"
        Gene "WBGene00003976"
        Sequence "Z28377|Z28375|Z28376"
    • Laboratory 23 objects -> can infer via publication -> Decision: good to ignore
      Example:
        Expr_pattern : "Expr87"
        …
        Laboratory "ML"
        Gene "WBGene00003012"


  • Empty tags. Can remove tags from WB Expression model? Yes
    • Cell -> 0 objects
    • Expressed_in -> 0 objects
    • Protein -> 0 objects
    • Pseudogene -> 0 objects


  • Microarray, Tiling Array, RNAseq associations -> Discuss with the Expression WG how to bring in images for these and treat them as image objects linked to the high-throughput data
    • Microarray, Microarray_results
Example: Expr1050000 - Yanai study
Currently accessible via the schema
https://wormbase.org/species/c_briggsae/expr_pattern/Expr1050000#0123--10


    • Tiling_array
Example: Expr1040545 - Miller study
Currently accessible via the schema
https://wormbase.org/species/all/expr_pattern/Expr1040545#0123--10


    • RNASeq
Example: 
	Expr_pattern : "Expr1142792" - Yanai study
	Gene	 "WBGene00007063"
	RNASeq	 "RNASeq_Study.SRP029448"
Currently accessible via the schema
https://wormbase.org/species/all/expr_pattern/Expr1142792#0123--10
  • Others
    • Historical_gene -> 51 objects. How are historical gene tags treated for other classes in Alliance? -> keep in Alliance and maintain the same mechanism when curation is moved over
    • EPIC -> Ad hoc Tag for Murray, no correspondence with method ontology terms -> ok to ignore
    • Species -> what are we doing with non-elegans annotations?
    • MovieURL -> 32 - Mohler -> move to movies -> talk again with Raymond
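
The "move to remarks" decisions above could be applied with a small script along these lines. This is a sketch under assumptions: it treats each record as a list of lines of the form Tag "value", it writes the remark as `Remark "<tag>: <value>"`, and it drops values judged redundant with the gene name. The function name and the remark wording are illustrative, not the procedure that was actually used.

```python
import re

# Matches lines such as: Protein_description "Tubulin alpha"
TAG_LINE = re.compile(r'^(\w+)\s+"(.*)"\s*$')


def move_tag_to_remark(lines, tag, redundant_values=()):
    """Rewrite `tag` lines as Remark lines, dropping redundant values."""
    redundant = {value.casefold() for value in redundant_values}
    output = []
    for line in lines:
        match = TAG_LINE.match(line.strip())
        if match and match.group(1) == tag:
            value = match.group(2)
            if value.casefold() not in redundant:
                output.append(f'Remark\t"{tag}: {value}"')
            continue  # the original tag line is removed either way
        output.append(line)
    return output


expr552 = [
    'Expr_pattern : "Expr552"',
    'Gene\t "WBGene00006528"',
    'Protein_description\t "Tubulin alpha"',
]
expr450 = [
    'Expr_pattern : "Expr450"',
    'Gene\t "WBGene00000776"',
    'Protein_description\t "CPL-1"',
]
moved = move_tag_to_remark(expr552, "Protein_description")
# CPL-1 repeats the gene name cpl-1, so it is dropped rather than moved.
dropped = move_tag_to_remark(expr450, "Protein_description", redundant_values=["cpl-1"])
```

The same function would cover the Sequence tag by passing `"Sequence"` as `tag`; whether the value should be split on "|" first is left open here.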

Braun Server Room

  • Manager Dave Mathog has retired; the server room's future management and fate are uncertain.