Difference between revisions of "WormBase-Caltech Weekly Calls"

From WormBaseWiki
 
 
[[WormBase-Caltech_Weekly_Calls_2017|2017 Meetings]]
 
[[WormBase-Caltech_Weekly_Calls_2017|2017 Meetings]]
  
 +
[[WormBase-Caltech_Weekly_Calls_2018|2018 Meetings]]
  
= 2018 Meetings =
+
[[WormBase-Caltech_Weekly_Calls_2019|2019 Meetings]]
  
== January 4, 2018 ==
+
[[WormBase-Caltech_Weekly_Calls_2020|2020 Meetings]]
  
=== WS264 Upload ===
 
* Citace upload to Wen, Tuesday January 16th, by 10am PST
 
* Upload to Hinxton on Jan 19th
 
  
=== Strain data import to AGR for disease ===
+
= 2021 Meetings =
* Will begin to consider pulling in strains into AGR
 
* Will need to think about how genotypes are built and stored at other MODs
 
* We should encourage authors to include strain IDs
 
* Diseases are annotated to genes, alleles, and strains within WB
 
  
=== Curating phenotypes and diseases to strains or genotypes ===
+
[[WormBase-Caltech_Weekly_Calls_January_2021|January]]
* Should we generate a ?Genotype class to capture genotypes without a known strain name? Or to capture relevant/relative genotypes thought to be responsible for a phenotype or disease?
 
* We could create un-named strain objects, that use a new unique identifier as a primary identifier and represent the entire genotype of a strain used
 
** Introduction of a new ?Strain class attribute of a unique serial identifier (like WBStrain00001) would be very costly to implement; would need to consider how crucial this is before implementing
 
** We can, instead, use new strain (public) names like "WBPaper00012345_Strain1", etc. instead of creating new unique ID attribute for un-named strains
 
* When curating phenotypes to strains, we will want to specify what is the relevant/relative genotype that is causative/correlated with the disease or phenotype observation
 
** Would be best if the specification of the relevant genotype used controlled vocabularies (when possible) and free text (when needed); would need to work out the logistics/mechanics of such curation
 
** Transgene-phenotype curation currently specifies causative gene, but would be more complicated for strains
 
* Alternatively, we could create the ?Genotype class to represent the abstract "relative"/"relevant" genotype thought to be responsible for the phenotype or disease, and annotate directly to that ?Genotype object
 
* ?Strain approach:
 
** Use strain if named (but important to know if the control strain is not simply N2)
 
*** If control strain is simply N2, causative genotype (and respective components) can be inferred from strain genotype
 
*** If control strain is not N2, causative genotype and components would need to be specified at the moment of phenotype/disease curation (by mechanism to be worked out)
 
** If no strain name provided, create "un-named" strain that contains the entire genotype provided by authors
 
*** Control strain issues above would still need to be addressed
 
* ?Genotype approach:
 
** ?Genotype class could represent individual instances of relevant/relative genotypes that are suggested to be causative for a disease or phenotype
 
** ?Genotype objects would be created with formal construction, with DB associations to each component object (e.g. alleles, transgenes, etc.) as well as free text descriptions (for components with no corresponding DB object)
 
** Such ?Genotype objects could be used repeatedly throughout a paper when applicable, but would likely not be used in any other papers (we would likely accumulate redundant objects in the DB)
 
* We may want to consider strains with same public name that have diverged
 
** Apply new strain names with prefixes/suffixes? Create new strain objects? Keep original?
 
* Need to determine how each AGR member DB curates phenotypes or diseases to genotypes: is each "genotype" a relative or absolute genotype?
 
  
  
== January 11, 2018 ==
+
== Feb 4th, 2021 ==
 +
===How the "duplicate" function works in OAs with respect to object IDs (Ranjana and Juancarlos)===
 +
*A word of caution: when you duplicate a row in an OA that has Object IDs (e.g., WBGenotype00000014), the Object ID is duplicated as well; unlike the PGID, it does not advance to the next ID
 +
*If you do use the "duplicate" function, remember to manually change the Object ID
 +
* We can implement checks to make sure distinct annotations/objects don't share the same ID
  
=== IWM swag ===
+
=== GAF Wiki and headers ===
* Eppendorf tube openers with WormBase logo?
+
* Any more comments about the Wiki page and the proposal? https://wiki.wormbase.org/index.php/WormBase_gene_association_file
  
=== Update on AFP Form ===
+
=== Missing references in expression GAFs ===
*[https://docs.google.com/presentation/d/1anFOFRK9Ida1UEvrXWf2OBAJ9F4xyU4lWJGy-qr6wRU/edit#slide=id.p3 In-progress mock-up of new form]
+
* ~300 missing from anatomy association file and ~45 missing from development association file
*[https://docs.google.com/spreadsheets/d/1sS_uAjBJ2r5H90Lam62Ai0HunjwvfjnklkFNrDoNXeU/edit#gid=1929595460 Data type spreadsheet]
+
* Daniela looking into missing refs; many are personal communications or very old papers
*[http://wiki.wormbase.org/index.php/First-pass_flagging_pipelines#Author_first-pass_form_revisions Curation forms info]
+
* Will change ?Expr_pattern model to possibly remove ?Author reference and add in a ?Person reference instead
 +
** 399 objects in WS279 reference an author; Daniela will take a look
 +
* Would be good to have some reference for those objects in the GAF file on the FTP site; could use WBPerson when ready
  
*Idea is to move from author flagging to author validation of text mining and data submission wherever possible
 
*Goal is to flag all data types in a paper and either curate at WB or share with a group that does curate that data
 
*SVM flags and author flags can/will be used as filters in TPC
 
*Provide examples of what we want for each type of data to help avoid confusion
 
* Recognize entities automatically and show list to author
 
** Species, strains, genes, alleles, transgenes, etc.
 
** Ask to verify or add unrecognized
 
** Could show known/existing objects with checkboxes
 
** Possibly include unrecognized pattern matching objects? Ask author to verify if these are real?
 
** For strains:
 
*** Show recorded genotype for verification; maybe ask to update/modify if needed?
 
** For transgenes:
 
*** When author submits new transgene, send them to a transgene form, or send them an email asking for details?
 
*** Form could be for both strain and transgene
 
* Mapping data: still ask for? Maybe for balancers, but no one is reporting that. Could still ask if there's interest
 
* Maybe provide option for author to save their progress and return to the form later
 
* Phenotypes
 
** Ask for allele, RNAi and overexpression phenotypes with links to Phenotype form
 
** Also ask for drug/chemical and environmental perturbations (call treatment?); store as free text for now, accommodate with new data model when available
 
* Gene site- and time-of-action, mosaic
 
** Appears to be confusion from authors about mosaics. Should we keep this?
 
** Will keep gene site-of-action and time-of-action; leave unchecked (no SVM, yet) but allow users to indicate
 
* Cell and anatomy data
 
** Cell function ("Cell ablation (laser/genetic) data, optogenetics")
 
** Ultrastructural analysis
 
* Interaction data
 
** Genetic interactions
 
** Physical interactions
 
** Functional complementation
 
* Comparative genomics
 
* Gene expression & regulation
 
  
 +
== Feb 11th, 2021 ==
 +
=== Alliance Literature Paper Tags ===
 +
*What do we definitely want to transfer to the Alliance?
 +
*Alliance literature group [https://docs.google.com/spreadsheets/d/1d3Y73x1BFiARkbxrvPPX2tCh5rFeBQRoBMHOcaXmijA/edit#gid=1866989939 spreadsheet]
 +
*Current flags vs legacy flags
 +
*Can we map everything to the proposed hierarchy or do we need to add some more classes?
 +
*Kimberly will review existing tags/flags to sort out what we know we need and what is questionable
  
 +
=== Personal communications in Expr_pattern ===
 +
* 27 objects missing reference (personal communications)
 +
** Even if we capture the WBPerson in the Person tag, how are we submitting these to Alliance? The evidence required by the expression JSON spec https://github.com/alliance-genome/agr_schemas/blob/master/ingest/expression/wildtypeExpressionModelAnnotation.json (and other specs) must be a publication, as defined by the publicationRef.json https://github.com/alliance-genome/agr_schemas/blob/master/ingest/publicationRef.json. If there's no PMID for a publication listed as evidence, a MOD ID will suffice for the "publicationId" but we have no WBPaperID created for such  objects.
 +
** One way to solve this: Daniela can go over the list and see whether the initial personal communication later resulted in a publication. For example, Expr181 (expression of cpl-1 in hypodermis and pharynx) was communicated via email by Sarwar Hashmi in 2000, and Expr450 (expression of cpl-1 in hypodermis and intestine) was communicated by Britton in 2001. The pattern was published by Hashmi and Britton in 2002 (WBPaper00005099). Daniela can then associate WBPaper00005099 with Expr181 and Expr450.
 +
** The solution above still does not work for all cases. An example is the lad-2 personal communication from Oliver Hobert (2002), later published by Lishia Chen (2008). Removing Oliver’s personal communication would remove evidence of data provenance from the Hobert lab, unless Oliver published this in a paper that eluded our flagging system (e.g. flagged SVM negative).
 +
** Daniela can go over the entire list and contact the authors for such cases.
 +
** Are personal communications used in other classes?
 +
** Action item: Daniela will add Persons in the person tag for such communications. Will request a model change to Hinxton. Will ask Magda to populate column 6 of the GAF file with Author data. Will add a request for the DQMs to allow Persons in the 'Evidence' in the JSON in addition to Papers
  
== January 18, 2018 ==
+
=== Author data in Expr_pattern ===
 +
* 399 Expression objects have the author tag populated. Most of them were submitted even before Wen started working on Expr_pattern.
 +
** Out of 399 objects, we have 32 for which the authors only partially match. One example is Expr60, which has Bauer as an extra author in the .ace file; Bauer is not listed as an author on the paper.
 +
** Should we keep the author info and store it in the Person tag? Even if we do, how would we submit these to the Alliance? And should we at all? This is legacy data
 +
** Decision: we can remove the authors and record the historical info in the remarks
  
=== WormBase Tutorials ===
+
=== Date tag in Expr_pattern ===
* May be good to get (possibly anonymous) written questions or suggestions after presenting
+
* The date tag seems to be populated for objects that have authors (above), probably to capture when the submission occurred.
* Wen will have Skype call with Yishi Jin
+
* In addition, Date is populated for a large-scale submission from Ian Hope (2006-03), later published.
* Micropublications
+
* We can still keep this info as is for WB (currently stored in citace minus), but what are we going to do for the Alliance submission? The tag was last used in 2006 for the Hope study; before that it was used in the ‘90s (1990, 1998).
** How do we peer-review a single experiment? There is no supporting information to corroborate a larger story
+
** We can get rid of the date, too, and pull the info for the ones for which authors do not match
** Is the greater benefit of peer review that the whole story is assessed by reviewers?
 
** Do MPs help or hurt reproducibility?
 
** Larger papers may have lots of poor experiments that don't get much attention but still pass peer review
 
** Dedicated peer review on single experiment may be more rigorous
 
** What are the criteria/minimal requirements to micropublish?
 
* Concise descriptions
 
** SimpleMine has multiple descriptions output; people asked about the different types
 
** Yishi Jin suggested that we remind users to update manually written descriptions
 
** Showing last-updated date is important
 
** Automated descriptions rely on primary data; will rely on forms and community submissions
 
** Microreviews? Would want to guide authors what data we want; provide a template?
 
* Public/community education issues
 
** Users shouldn't assume that WormBase is comprehensively up to date
 
* Wen will also present at MidWest meeting (Ann Arbor, MI) in April and Boulder, Colorado in May
 
** Will assess topic interest ahead of time
 
  
=== New Cytoscape display for interactions ===
+
=== Proposed WormBase metrics page ===
* Sibyl developed a new Cytoscape display for interactions, now live with WS262 release
+
* Inspired by MGI's stats page:
* Simplified colors and subtypes
+
** http://www.informatics.jax.org/mgihome/homepages/stats/all_stats.shtml
* Redraw button to clean up the graph based on what you want to see
+
* Sibyl and Paulo working on. Prototype here: https://master.d25n59ij2csrbn.amplifyapp.com/
* Play around and let Sibyl and/or Chris know about issues
+
** Current prototype is C. elegans specific
 +
* Chris is collecting ideas and queries here:
 +
** https://docs.google.com/spreadsheets/d/1OeZuMRSHelVD7tGRIxEkCKOyDaBPN29wrGbNU3TGxKU/edit?usp=sharing
 +
* Could eventually be used across the Alliance
  
  
== January 25, 2018 ==
+
== Feb 18th, 2021 ==
  
=== UCSF visit report ===
+
=== CenGen data ===
 +
* How can we incorporate the CenGen data into WormBase pages? i.e. provide users info:
 +
** Per gene: what cells express this gene?
 +
** Per cell: what genes are expressed in this cell?
 +
** May be derived from Eduardo's data processing
 +
* CenGen has a weekly call: have invited Wen, Daniela, and Raymond
 +
** Too much for all three to join?
 +
* Good to establish healthy boundaries for responsibilities
 +
* Do they want to collaborate or no?
 +
* We can link to the main CenGen page; once gene-level data is available we can consume and make available
 +
* Eduardo's tool is one WormBase tool for processing and providing CenGen data; will make them aware
 +
* Ultimately this data (and its presentation) will need to get into the Alliance; may remain a WB-specific/portal feature for the near future
 +
* Alaska? Probably won't be maintained
 +
 
 +
=== Cleaning up bounced emails to outreach@wormbase.org ===
 +
* Many unread messages (~140) in inbox
 +
* Many of those are bounced emails from AFP pipeline and webinar announcements
 +
* If relevant people could review those bounced emails and, as appropriate, add people or email addresses to the Omit list using the Omit Form CGI (http://tazendra.caltech.edu/~postgres/cgi-bin/omit_form.cgi) that would be appreciated.
 +
 
 +
 
 +
== Feb 25th, 2021 ==
 +
 
 +
===Expr_pattern clean up===
 +
* Seldom populated  tags. Can we move the associated info into remarks and get rid of the tag? This is in view of the Alliance import
 +
** Protein_description 33 objects -> Decision: Move to remarks -> Done DR 2021/02/26. Redundant info such as CPL-1 in Protein_description and CPL-1 in gene name were omitted.
 +
<pre>Example: Expr_pattern : "Expr450"
 +
Gene "WBGene00000776"
 +
Protein_description "CPL-1"
 +
 
 +
Expr_pattern : "Expr552"
 +
Gene "WBGene00006528"
 +
Protein_description "Tubulin alpha"</pre>
 +
 
 +
** Sequence 12 objects -> Decision: Move to remarks
 +
<pre>Example: Expr_pattern : "Expr12"
 +
      Gene "WBGene00003976"
 +
      Sequence "Z28377|Z28375|Z28376"</pre>
 +
 
 +
** Laboratory 23 objects -> can infer via publication -> Decision: good to ignore
 +
<pre>Example: Expr_pattern : "Expr87"
 +
        …
 +
Laboratory "ML"
 +
Gene "WBGene00003012"</pre>
 +
 
 +
 
 +
* Empty tags. Can remove tags from WB Expression model? Yes
 +
** Cell -> 0 objects
 +
** Expressed_in -> 0 objects
 +
** Protein -> 0 objects
 +
** Pseudogene -> 0 objects
 +
 
 +
 
 +
* Microarray, Tiling Array, RNAseq associations -> Discuss with the Expression WG how to bring in images for these and treat them as image objects linked to the high-throughput data
 +
**Microarray, Microarray_results
 +
<pre>example: Expr1050000 -Yanai study
 +
Currently accessible via the schema
 +
https://wormbase.org/species/c_briggsae/expr_pattern/Expr1050000#0123--10</pre>
 +
 
 +
 
 +
**Tiling_array
 +
<pre>Example: Expr1040545 - Miller study
 +
Currently accessible via the schema
 +
https://wormbase.org/species/all/expr_pattern/Expr1040545#0123--10</pre>
 +
 
 +
 
 +
**RNASeq
 +
<pre>Example:
 +
Expr_pattern : "Expr1142792" - Yanai study
 +
Gene "WBGene00007063"
 +
RNASeq "RNASeq_Study.SRP029448"
 +
Currently accessible  via the  schema
 +
https://wormbase.org/species/all/expr_pattern/Expr1142792#0123--10</pre>
 +
 
 +
* Others
 +
** Historical_gene -> 51 objects. How are historical gene tags treated for other classes in Alliance? -> keep in Alliance and maintain the same mechanism when curation is moved over
 +
** EPIC -> Ad hoc Tag for Murray, no correspondence with method ontology terms -> ok to ignore
 +
** Species -> what are we doing with non elegans annotations?
 +
** MovieURL -> 32 - Mohler -> move to  movies -> talk again with Raymond
 +
 
 +
===Braun Server Room===
 +
* Manager Dave Mathog retired. Uncertain about its management or fate.

Latest revision as of 17:40, 26 February 2021

Previous Years

2009 Meetings

2011 Meetings

2012 Meetings

2013 Meetings

2014 Meetings

2015 Meetings

2016 Meetings

2017 Meetings

2018 Meetings

2019 Meetings

2020 Meetings


2021 Meetings

January


Feb 4th, 2021

How the "duplicate" function works in OAs with respect to object IDs (Ranjana and Juancarlos)

  • A word of caution: when you duplicate a row in an OA that has Object IDs (e.g., WBGenotype00000014), the Object ID is duplicated as well; unlike the PGID, it does not advance to the next ID
  • If you do use the "duplicate" function, remember to manually change the Object ID
  • We can implement checks to make sure distinct annotations/objects don't share the same ID
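
The check mentioned in the last point could look roughly like the sketch below. It assumes each OA row can be read as a (PGID, Object ID) pair; the PGID values, the second genotype ID, and the function name are made up for illustration and do not reflect the actual OA tables.

```python
from collections import defaultdict


def find_shared_object_ids(rows):
    """Return {object_id: [pgids]} for every Object ID used by more than one row.

    `rows` is an iterable of (pgid, object_id) pairs (a hypothetical view of an OA table).
    """
    pgids_by_object = defaultdict(list)
    for pgid, object_id in rows:
        pgids_by_object[object_id].append(pgid)
    # Keep only the Object IDs that appear on two or more rows.
    return {oid: pgids for oid, pgids in pgids_by_object.items() if len(pgids) > 1}


# Hypothetical rows: 102 was created with "duplicate", so its PGID advanced
# but its Object ID was copied from row 101.
rows = [
    (101, "WBGenotype00000014"),
    (102, "WBGenotype00000014"),
    (103, "WBGenotype00000015"),
]
shared = find_shared_object_ids(rows)
print(shared)
```

Rows flagged this way would still need a curator to decide whether the shared ID is intentional or left over from a duplicated row.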

GAF Wiki and headers

Missing references in expression GAFs

  • ~300 missing from anatomy association file and ~45 missing from development association file
  • Daniela looking into missing refs; many are personal communications or very old papers
  • Will change ?Expr_pattern model to possibly remove ?Author reference and add in a ?Person reference instead
    • 399 objects in WS279 reference an author; Daniela will take a look
  • Would be good to have some reference for those objects in the GAF file on the FTP site; could use WBPerson when ready


Feb 11th, 2021

Alliance Literature Paper Tags

  • What do we definitely want to transfer to the Alliance?
  • Alliance literature group spreadsheet
  • Current flags vs legacy flags
  • Can we map everything to the proposed hierarchy or do we need to add some more classes?
  • Kimberly will review existing tags/flags to sort out what we know we need and what is questionable

Personal communications in Expr_pattern

  • 27 objects missing reference (personal communications)
    • Even if we capture the WBPerson in the Person tag, how are we submitting these to Alliance? The evidence required by the expression JSON spec https://github.com/alliance-genome/agr_schemas/blob/master/ingest/expression/wildtypeExpressionModelAnnotation.json (and other specs) must be a publication, as defined by the publicationRef.json https://github.com/alliance-genome/agr_schemas/blob/master/ingest/publicationRef.json. If there's no PMID for a publication listed as evidence, a MOD ID will suffice for the "publicationId" but we have no WBPaperID created for such objects.
    • One way to solve this: Daniela can go over the list and see whether the initial personal communication later resulted in a publication. For example, Expr181 (expression of cpl-1 in hypodermis and pharynx) was communicated via email by Sarwar Hashmi in 2000, and Expr450 (expression of cpl-1 in hypodermis and intestine) was communicated by Britton in 2001. The pattern was published by Hashmi and Britton in 2002 (WBPaper00005099). Daniela can then associate WBPaper00005099 with Expr181 and Expr450.
    • The solution above still does not work for all cases. An example is the lad-2 personal communication from Oliver Hobert (2002), later published by Lishia Chen (2008). Removing Oliver’s personal communication would remove evidence of data provenance from the Hobert lab, unless Oliver published this in a paper that eluded our flagging system (e.g. flagged SVM negative).
    • Daniela can go over the entire list and contact the authors for such cases.
    • Are personal communications used in other classes?
    • Action item: Daniela will add Persons in the person tag for such communications. Will request a model change to Hinxton. Will ask Magda to populate column 6 of the GAF file with Author data. Will add a request for the DQMs to allow Persons in the 'Evidence' in the JSON in addition to Papers
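
To make the evidence problem concrete, here is a minimal sketch of how a "publicationId" might be chosen per object: prefer a PMID, fall back to a WBPaper ID, and report objects that have neither. The "PMID:" and "WB:" prefixes, the function name, and the `Expr_hypothetical` record are assumptions for illustration; the authoritative format is whatever publicationRef.json defines.

```python
def choose_publication_id(pmid=None, wbpaper_id=None):
    """Return a publication identifier string, or None if no paper is available.

    The "PMID:" and "WB:" prefixes are assumed here, not taken from the schema.
    """
    if pmid:
        return f"PMID:{pmid}"
    if wbpaper_id:
        return f"WB:{wbpaper_id}"
    return None  # personal communication only: nothing to submit as evidence


records = [
    # Expr181 and Expr450 can be tied to WBPaper00005099, as discussed above.
    {"expr": "Expr181", "pmid": None, "wbpaper_id": "WBPaper00005099"},
    {"expr": "Expr450", "pmid": None, "wbpaper_id": "WBPaper00005099"},
    # Hypothetical object backed only by a personal communication.
    {"expr": "Expr_hypothetical", "pmid": None, "wbpaper_id": None},
]

unresolved = [
    r["expr"]
    for r in records
    if choose_publication_id(r["pmid"], r["wbpaper_id"]) is None
]
print(unresolved)
```

Objects that end up in `unresolved` are the ones that would need either a later publication, a Person-based evidence option in the JSON, or follow-up with the authors.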

Author data in Expr_pattern

  • 399 Expression objects have the author tag populated. Most of them were submitted even before Wen started working on Expr_pattern.
    • Out of 399 objects, we have 32 for which the authors only partially match. One example is Expr60, which has Bauer as an extra author in the .ace file; Bauer is not listed as an author on the paper.
    • Should we keep the author info and store it in the Person tag? Even if we do, how would we submit these to the Alliance? And should we at all? This is legacy data
    • Decision: we can remove the authors and record the historical info in the remarks
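
A partial author match like the Expr60 case could be detected with a comparison along these lines. This is only a sketch: it assumes author names are stored as "Surname Initials" and compares surnames case-insensitively, and all names other than the surname Bauer (including Bauer's initial) are placeholders, not the real Expr60 data.

```python
def extra_authors(ace_authors, paper_authors):
    """Return authors listed on the Expr_pattern object but not on the paper.

    Names are assumed to be "Surname Initials"; only surnames are compared.
    """
    def surname(name):
        return name.split()[0].strip(",").lower()

    paper_surnames = {surname(a) for a in paper_authors}
    return [a for a in ace_authors if surname(a) not in paper_surnames]


# Placeholder author lists, for illustration only.
ace_authors = ["Alpha A", "Beta B", "Bauer C"]
paper_authors = ["Alpha A", "Beta B"]

extras = extra_authors(ace_authors, paper_authors)
print(extras)
```

Any non-empty result marks an object whose author info would be worth preserving in the remarks before the Author tag is removed.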

Date tag in Expr_pattern

  • The date tag seems to be populated for objects that have authors (above), probably to capture when the submission occurred.
  • In addition, Date is populated for a large-scale submission from Ian Hope (2006-03), later published.
  • We can still keep this info as is for WB (currently stored in citace minus), but what are we going to do for the Alliance submission? The tag was last used in 2006 for the Hope study; before that it was used in the ‘90s (1990, 1998).
    • We can get rid of the date, too, and pull the info for the ones for which authors do not match

Proposed WormBase metrics page


Feb 18th, 2021

CenGen data

  • How can we incorporate the CenGen data into WormBase pages? i.e. provide users info:
    • Per gene: what cells express this gene?
    • Per cell: what genes are expressed in this cell?
    • May be derived from Eduardo's data processing
  • CenGen has a weekly call: have invited Wen, Daniela, and Raymond
    • Too much for all three to join?
  • Good to establish healthy boundaries for responsibilities
  • Do they want to collaborate or no?
  • We can link to the main CenGen page; once gene-level data is available we can consume and make available
  • Eduardo's tool is one WormBase tool for processing and providing CenGen data; will make them aware
  • Ultimately this data (and its presentation) will need to get into the Alliance; may remain a WB-specific/portal feature for the near future
  • Alaska? Probably won't be maintained
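
The two views asked for above (per gene and per cell) are inverses of each other, so one could be derived from the other once gene-level data is available. A small sketch, using invented gene and cell names rather than real CenGen data:

```python
from collections import defaultdict


def invert_expression(gene_to_cells):
    """Turn {gene: set of cells} into {cell: set of genes}."""
    cell_to_genes = defaultdict(set)
    for gene, cells in gene_to_cells.items():
        for cell in cells:
            cell_to_genes[cell].add(gene)
    return dict(cell_to_genes)


# Hypothetical per-gene data: which cells express each gene.
gene_to_cells = {
    "geneA": {"cell1", "cell2"},
    "geneB": {"cell2"},
}

# Per-cell view: which genes are expressed in each cell.
cell_to_genes = invert_expression(gene_to_cells)
print(sorted(cell_to_genes["cell2"]))
```

How expression is thresholded per cell type is a separate question that this sketch does not address.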

Cleaning up bounced emails to outreach@wormbase.org

  • Many unread messages (~140) in inbox
  • Many of those are bounced emails from AFP pipeline and webinar announcements
  • If relevant people could review those bounced emails and, as appropriate, add people or email addresses to the Omit list using the Omit Form CGI (http://tazendra.caltech.edu/~postgres/cgi-bin/omit_form.cgi) that would be appreciated.


Feb 25th, 2021

Expr_pattern clean up

  • Seldom populated tags. Can we move the associated info into remarks and get rid of the tag? This is in view of the Alliance import
    • Protein_description 33 objects -> Decision: Move to remarks -> Done DR 2021/02/26. Redundant info such as CPL-1 in Protein_description and CPL-1 in gene name were omitted.
Example: Expr_pattern : "Expr450"
	Gene	 "WBGene00000776"
	Protein_description	 "CPL-1"

	Expr_pattern : "Expr552"
	Gene	 "WBGene00006528"
	Protein_description	 "Tubulin alpha"
    • Sequence 12 objects -> Decision: Move to remarks
Example: Expr_pattern : "Expr12"
		      Gene	 "WBGene00003976"
		      Sequence	 "Z28377|Z28375|Z28376"
    • Laboratory 23 objects -> can infer via publication -> Decision: good to ignore
Example: Expr_pattern : "Expr87"
        …
	Laboratory	 "ML"
	Gene	 "WBGene00003012"


  • Empty tags. Can remove tags from WB Expression model? Yes
    • Cell -> 0 objects
    • Expressed_in -> 0 objects
    • Protein -> 0 objects
    • Pseudogene -> 0 objects


  • Microarray, Tiling Array, RNAseq associations -> Discuss with the Expression WG how to bring in images for these and treat them as image objects linked to the high-throughput data
    • Microarray, Microarray_results
example: Expr1050000 -Yanai study
Currently accessible via the schema
https://wormbase.org/species/c_briggsae/expr_pattern/Expr1050000#0123--10


    • Tiling_array
Example: Expr1040545 - Miller study
Currently accessible via the schema
https://wormbase.org/species/all/expr_pattern/Expr1040545#0123--10


    • RNASeq
Example: 
	Expr_pattern : "Expr1142792" - Yanai study
	Gene	 "WBGene00007063"
	RNASeq	 "RNASeq_Study.SRP029448"
Currently accessible via the schema
https://wormbase.org/species/all/expr_pattern/Expr1142792#0123--10
  • Others
    • Historical_gene -> 51 objects. How are historical gene tags treated for other classes in Alliance? -> keep in Alliance and maintain the same mechanism when curation is moved over
    • EPIC -> Ad hoc Tag for Murray, no correspondence with method ontology terms -> ok to ignore
    • Species -> what are we doing with non elegans annotations?
    • MovieURL -> 32 - Mohler -> move to movies -> talk again with Raymond
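
The "move to remarks" decision for seldom-populated tags such as Protein_description can be sketched as follows. The record is modelled as a plain dict of tag to list of values, and the "Tag: value" remark wording is an assumption; this is not the actual script used for the 2021/02/26 clean-up.

```python
def move_tag_to_remark(record, tag, gene_name=None):
    """Return a copy of `record` with `tag` removed and its values added as remarks.

    A value that merely repeats the gene name (case-insensitive) is dropped as redundant.
    """
    updated = {k: list(v) for k, v in record.items() if k != tag}
    for value in record.get(tag, []):
        if gene_name and value.lower() == gene_name.lower():
            continue  # e.g. "CPL-1" on an object already attached to cpl-1
        updated.setdefault("Remark", []).append(f"{tag}: {value}")
    return updated


expr450 = {"Gene": ["WBGene00000776"], "Protein_description": ["CPL-1"]}
expr552 = {"Gene": ["WBGene00006528"], "Protein_description": ["Tubulin alpha"]}

# Expr450 is the cpl-1 pattern mentioned earlier, so its description is redundant.
cleaned450 = move_tag_to_remark(expr450, "Protein_description", gene_name="cpl-1")
# No gene name is supplied for Expr552, so its description is kept as a remark.
cleaned552 = move_tag_to_remark(expr552, "Protein_description")
print(cleaned450)
print(cleaned552)
```

The same function would cover the Sequence tag by passing `"Sequence"` as the tag name.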

Braun Server Room

  • Manager Dave Mathog retired. Uncertain about its management or fate.