Difference between revisions of "WBConfCall 2021.03.04-Agenda and Minutes"

From WormBaseWiki

Latest revision as of 12:22, 5 March 2021

Help Desk

  • Chrome failing to download FTP data (#8087, https://github.com/WormBase/website/issues/8087)
    • Todd suggests enabling SSL/FTPS; we also need to let users know that Chrome won't work for FTP downloads.

Agenda and Minutes

Alliance harmonized vs unionized data (might want Paul around)

  • Background and things that might not be obvious to some:
    • Harmonized Alliance model would be persistent and integrated.
    • Unionized Alliance model would have MODs move DBs to central architecture, but not integrated with each other.
    • WormBase curates to datatypes, most other MODs curate all data per paper.
    • WormBase worked off of aCeDB, an object database where modeling is easy for curators to do, and acedb handles XREFs and connecting data automatically.
    • Caltech curators were writing .ace text files to load into acedb, and early forms were UIs for templates for .ace text files.
    • PostgreSQL was implemented to allow storage, retrieval, and editing, but its purpose was to generate flat .ace files of slices of data.
    • Curation forms could straightforwardly capture the slices of data that needed curation, and let acedb handle how the data relates to each other.
    • Caltech PostgreSQL works really well for curation and is easy to adapt and expand, but none of the data relates to each other at the database level (no foreign keys, denormalized data).
    • .ace dumpers are a translation layer into acedb, the source of truth. If we had a different source of truth we could make translation layers into that, but we couldn't directly serve user webpages from the way the data is modeled in Caltech postgres.
  • If unionized at Alliance, Caltech postgres could move to AWS, but it isn't helpful without the acedb/datomic layer. Is that layer on AWS, or can it be on AWS? (Does anyone on the calls know who is pushing for unionized?)
  • If harmonized at Alliance, progress is going slowly. Sierra hoped we could look at database UMLs to harmonize, but the Caltech curation DB doesn't have any relationships; it's just a flat mapping that stores form data in a straightforward way to generate .ace files. The source of truth, and where curators do their modeling, is acedb.
  • Can we make a harmonized model off of what's curated at Caltech?
    • Of the datatypes that are dumped from Caltech, how much of their data is merged with non-Caltech data?
    • How many datatypes are not at Caltech? (Genes, etc.)
  • Hinxton have a script that automatically converts the acedb data model to biolink. That could be used to move all WB data to the persistent store, even for data classes not yet harmonised by the Alliance.
    • Magdalena suggests this is the most effective path to AWS harmonization.
    • Biggest difference between biolink and acedb is that biolink supports inheritance and can have a hierarchy of features.
  • With regard to AWS unionization, "we are in a bit of a mess right now".
  • AGR biolink hackathon in the near future?
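
The translation-layer role of the .ace dumpers noted in the background above can be illustrated with a minimal sketch. It assumes flat (object, tag, value) rows such as a denormalized curation table might hold. The function name `to_ace`, the row layout, and the gene IDs are hypothetical, not the actual Caltech dumpers or data, and the sketch does not escape quotes inside values.

```python
from collections import OrderedDict


def to_ace(class_name, rows):
    """Render flat (object_name, tag, value) rows as .ace text.

    Hypothetical helper: groups rows by object and writes one
    'Class : "name"' paragraph per object, separated by blank lines.
    """
    objects = OrderedDict()
    for obj, tag, value in rows:
        objects.setdefault(obj, []).append((tag, value))

    blocks = []
    for obj, pairs in objects.items():
        lines = ['{} : "{}"'.format(class_name, obj)]
        lines += ['{} "{}"'.format(tag, value) for tag, value in pairs]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


# Made-up example rows; only the paper ID appears in these minutes.
rows = [
    ("Expr1", "Gene", "WBGene00000001"),
    ("Expr1", "Reference", "WBPaper00002175"),
    ("Expr2", "Gene", "WBGene00000002"),
]
ace_text = to_ace("Expr_pattern", rows)
print(ace_text)
```

The rows carry no relationships of their own; how `Gene` or `Reference` connects to other objects is left to whatever loads the file, which is the division of labor the minutes describe.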

YouTube WB seminars

  • Danger of having videos public?
    • Not a danger, just not comfortable with video of self on YouTube.
    • Need a site-wide agreement on video and privacy issues.
    • Making slides with audio/subtitles available is an alternative.
    • WB YouTube channel should be curated to remove out-of-date videos.

Model change for the Expr_pattern class

  • Proposal to add Person? UNIQUE Person to accommodate personal communications
    • perhaps the evidence tag is better? Except there isn't one in Expr_pattern :-)
  • Proposal to remove the Author and Date tags. Author and Date are legacy data. We discussed the issue at a Caltech call a couple of weeks ago and agreed it was fine to remove.

Antibodies MOD IDs

  • The Alliance Expression WG is working on the Antibody schema. WB does not have Antibody IDs.
  • Best to generate in WB or have an Alliance-minted ID?
    • Will generate WBAbIDs
  • Some antibodies are shared across Alliance species. MODs will still mint IDs.
  • Other MODs generate MOD-based antibody IDs and will probably keep doing that, so we may as well.
  • Current names are [WBPaperID]::anti-<protein_name>. Do we need a global ID, since this is unique?
  • Might be good to have IDs that don't have colons in their name.
  • Some names are not consistent with the pattern (e.g. [WBPaper00002175]:cct-1); might be better to have consistent serial IDs.
  • All data that is in the current antibody name is also curated in the object, so nothing is lost if renamed.
  • Do IDs need to be transacted on for merges and splits? No.
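
A minimal sketch of the naming points above: checking which current names follow the [WBPaperID]::anti-<protein_name> pattern, and assigning colon-free serial IDs. The ID format (prefix `WBAntibody`, eight digits) is an assumption for illustration; the minutes only say that WB will generate its own antibody IDs. The first name in the example list is also made up.

```python
import re

# Pattern described in the minutes: [WBPaperID]::anti-<protein_name>
NAME_PATTERN = re.compile(r"^\[(WBPaper\d{8})\]::anti-(.+)$")


def follows_pattern(name):
    """Return True if an antibody name matches the documented pattern."""
    return NAME_PATTERN.match(name) is not None


def assign_serial_ids(names, prefix="WBAntibody", width=8):
    """Map each distinct name to a serial ID (hypothetical ID format)."""
    mapping = {}
    for name in names:
        if name not in mapping:
            mapping[name] = "{}{:0{}d}".format(prefix, len(mapping) + 1, width)
    return mapping


names = [
    "[WBPaper00002175]::anti-cct-1",  # made-up name that follows the pattern
    "[WBPaper00002175]:cct-1",        # inconsistent name cited in the minutes
]
inconsistent = [n for n in names if not follows_pattern(n)]
id_map = assign_serial_ids(names)
print(inconsistent)
print(id_map)
```

Because the serial ID carries no paper or protein information, this only works under the point made above that everything in the current name is also curated in the object.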