WormBase-Caltech Weekly Calls November 2019

November 7, 2019

The Allied Genetics Conference next April (2020) in/near Washington DC
Abstract deadline is Dec 5th
Alliance has a shared booth (3 adjacent booths)
Micropublications will have a booth (Karen and Daniela will attend)
Focus will be on highlighting the Alliance
Workshop at NLM in days following TAGC about curation at scale (Kimberly attending and chairing session)

Alliance needs a curation database
- A curation working group was proposed
- What needs to happen to get this going?
- Would include text mining tools/resources
- Would be good to have something like the curation status form
- MODs likely have their own special requirements, but there should probably be at least a common minimal set of features
- Variant sequence curation could be a good starting point as a common data type (if all MODs handle their own variant sequence curation)
Micropubs pushing data submission forms; might as well house them within the Alliance
Would be good to have a common (or individually relevant) AFP form(s) for all Alliance members
- Maybe MOD curators can manage configuration files to indicate what is relevant for their species
- First priority is to focus on automatically recognizable entities/features from papers
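
A minimal sketch of the configuration-file idea above, in Python. All names here (COMMON_FIELDS, MOD_CONFIG, form_fields, the placeholder MODs "MOD_A" and "MOD_B", and the field names) are hypothetical and only illustrate a common minimal set plus per-MOD additions; nothing below reflects an existing AFP or Alliance implementation.

```python
# Hypothetical common minimal set of form fields shared by all MODs.
COMMON_FIELDS = ["genes", "alleles"]

# Hypothetical per-MOD configuration that curators could maintain.
# "MOD_A" and "MOD_B" are placeholders, not real Alliance members.
MOD_CONFIG = {
    "MOD_A": {"extra_fields": ["strains", "transgenes"]},
    "MOD_B": {"extra_fields": ["antibodies"]},
}


def form_fields(mod):
    """Return the common fields plus any fields the given MOD has enabled."""
    extra = MOD_CONFIG.get(mod, {}).get("extra_fields", [])
    # Keep the common fields first and avoid listing a field twice.
    return COMMON_FIELDS + [f for f in extra if f not in COMMON_FIELDS]
```

A MOD without its own entry would simply fall back to the common set, so the shared form keeps working while individual groups add what is relevant for their species.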

Currently, Textpresso searches in paper main documents and all individual supplemental documents separately
This results in possibly getting many results for the same publication, each scored and displayed separately
Do we want Textpresso to search on a single, consolidated file containing the main document of a paper AND the supplementals?
Currently, the scoring algorithm often scores supplemental documents higher than main papers, presumably because it favors documents in which a higher percentage of sentences match the keyword(s)
Consolidation cannot be done completely manually; agreed, it would have to be largely (completely?) automated
Would be good to check how PMC/Europe PMC handles articles in which main docs and supps are consolidated into a single PDF already (in addition to individual files)
Detecting duplicated sentences would be useful, but may be quite a thorny issue (need to research)
Chris will update GitHub ticket to ask Sibyl to NOT search on C. elegans supplementals, for now, and only search on main documents
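
A toy Python sketch of the scoring concern discussed above. It assumes, as the notes only presume, that a document's score is the fraction of its sentences that match the keyword; this is not Textpresso's actual algorithm. The sentence splitter, the normalization used to detect duplicated sentences, and all function names are hypothetical simplifications.

```python
import re


def split_sentences(text):
    # Naive splitter for illustration; real text needs a proper tokenizer.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def match_fraction(text, keyword):
    """Assumed score: fraction of sentences containing the keyword."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    hits = sum(1 for s in sentences if keyword.lower() in s.lower())
    return hits / len(sentences)


def normalize(sentence):
    # Lowercase and collapse punctuation so near-identical sentences compare equal.
    return re.sub(r"\W+", " ", sentence.lower()).strip()


def consolidate(documents):
    """Merge documents into one text, dropping exact (normalized) duplicate sentences."""
    seen = set()
    kept = []
    for doc in documents:
        for sentence in split_sentences(doc):
            key = normalize(sentence)
            if key and key not in seen:
                seen.add(key)
                kept.append(sentence)
    return " ".join(kept)


main_doc = (
    "We studied daf-16 in worms. "
    "Animals were grown at 20 degrees. "
    "Lifespan was measured daily. "
    "Results are discussed below."
)
supplement = "Table S1 lists daf-16 alleles. We studied daf-16 in worms."

main_score = match_fraction(main_doc, "daf-16")    # 1 of 4 sentences match
supp_score = match_fraction(supplement, "daf-16")  # 2 of 2 sentences match
combined_score = match_fraction(consolidate([main_doc, supplement]), "daf-16")
```

In this toy case the short supplement outscores the main document, while the consolidated text yields a single score per publication. Exact-match deduplication is the easy part; sentences that differ slightly between files would need fuzzier matching, which is the thorny issue noted above.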

Dayane Araújo has asked that a curator (Chris currently) attend a conference call (next Monday, Nov 25) hosted by Europe PMC about assessing biocuration across databases
Chris has asked for details but has so far not received anything specific
Should we attend? Yes, at least to listen. If complex questions come up, we can just tell them we'll look it up
Would be great if there were aggregated references for particular datasets so that users of data and analyses could be given all references to properly cite in their own article