WBConfCall 2021.03.04-Agenda and Minutes
Revision as of 16:16, 4 March 2021 by Jchan (→Alliance harmonized vs unionized data (might want Paul around))
- Chrome failing to download FTP data (#8087)
Agenda and Minutes
Alliance harmonized vs unionized data (might want Paul around)
- Background and things that might not be obvious to some:
- Harmonized Alliance model would be persistent and integrated.
- Unionized Alliance model would have MODs move their DBs to a central architecture, but the DBs would not be integrated with each other.
- WormBase curates by datatype; most other MODs curate all data per paper.
- WormBase was built on ACeDB, an object database in which modeling is easy for curators, and ACeDB handles XREFs and connects data automatically.
- Caltech curators were writing .ace text files to load into acedb, and the early curation forms were UIs for filling in .ace text-file templates.
- PostgreSQL was implemented to allow storage, retrieval, and editing, but its purpose was to generate flat .ace files of slices of data.
- Curation forms could straightforwardly capture the slices of data that needed curation, and let acedb handle how the data relates to each other.
- Caltech PostgreSQL works really well for curation and is easy to adapt and expand, but none of the data relates to each other at the database level (no foreign keys, denormalized data).
- The .ace dumpers are a translation layer into acedb, the source of truth. If we had a different source of truth, we could write translation layers into that instead, but we could not serve user webpages directly from the way the data is modeled in Caltech postgres.
- If unionized at Alliance, Caltech postgres could move to AWS, but that isn't helpful without the acedb/datomic layer. Is that layer on AWS, or can it be? (Does anyone on the calls know who is pushing for the unionized model?)
- If harmonized at Alliance, progress is going slowly. Sierra hoped we could look at database UMLs to harmonize, but the Caltech curation DB doesn't have any relationships; it is just a flat mapping for storing form data in a straightforward way to generate .ace files. The source of truth, and the place where curators do their modeling, is acedb.
- Can we make a harmonized model from what's curated at Caltech?
- Of the datatypes dumped from Caltech, how much of their data is merged with non-Caltech data?
- How many datatypes are not at Caltech? (Genes, etc.)
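To make the "flat postgres → .ace dumper → acedb" point from the background notes concrete, here is a minimal sketch of what such a dumper does, written under stated assumptions. It is not the actual Caltech dumper: the row layout, the object IDs (`WBVar_example_1`, `WBGene_example_1`), and the `dump_ace`/`ace_quote` helpers are all hypothetical, the tag names are illustrative and not checked against the WormBase models file, and the backslash escaping of quotes is an assumption about .ace string syntax. What it shows is that a gene reference is just a string in the flat rows; only acedb, on load, turns it into a link.

```python
from collections import defaultdict

# Hypothetical flat form rows: (object ID, tag, value). There are no foreign keys;
# "WBGene_example_1" is just text to PostgreSQL. acedb resolves it as an XREF on load.
ROWS = [
    ("WBVar_example_1", "Gene", "WBGene_example_1"),
    ("WBVar_example_1", "Public_name", "example_allele_1"),
    ("WBVar_example_2", "Public_name", "example_allele_2"),
]


def ace_quote(value: str) -> str:
    """Wrap a value in double quotes, backslash-escaping backslashes and quotes (assumed .ace syntax)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def dump_ace(class_name: str, rows) -> str:
    """Group flat rows by object ID and render each object as a .ace block."""
    objects = defaultdict(list)
    for obj_id, tag, value in rows:
        objects[obj_id].append((tag, value))
    blocks = []
    for obj_id in sorted(objects):
        # Object header line: Class : "Name"
        lines = [f"{class_name} : {ace_quote(obj_id)}"]
        # One tag/value line per stored form field
        lines += [f"{tag}\t{ace_quote(value)}" for tag, value in objects[obj_id]]
        blocks.append("\n".join(lines))
    # A blank line separates objects in the .ace output
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(dump_ace("Variation", ROWS))
```

The design point this illustrates for the harmonization question: a second dumper targeting a different source of truth would consume the same flat rows, but the relationships themselves live only in whichever database receives the output, not in the Caltech tables.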
YouTube WB seminars
- Is there any danger in making the videos public?