WormBase-Caltech Weekly Calls November 2019
November 7, 2019
WS275 Citace upload
- Maybe Nov 22 upload to Hinxton
- CIT curators upload to Spica on Tues Nov 19
- Working data model document
- Several classes have a "Genotype" tag with text entry
- Mass_spec_experiment (no data as of WS273)
- Collecting all genotype text entries yields ~33,000 unique entries, with many different forms:
- Species entries, like "Acrobeloides butschlii wild isolate" or "C. briggsae"
- Strain entries, like "BA17[fem-1(hc-17)]" or "BB21" or "BL1[pK08F4.7::K08F4.7::GFP; rol-6(+)]"
- Anonymous transgenes, like "BEC-1::GFP" or "CAM-1-GFP" or "Ex[Pnpr-9::unc-103(gf)]"
- Complex constructs, like "C56C10.9(gk5253[loxP + Pmyo-2::GFP::unc-54 3' UTR + Prps-27::neoR::unc-54 3' UTR + loxP]) II"
- Text descriptions, like "Control" or "WT" or "Control worms fed on HT115 containing the L4440 vector without insert" or "N.A."
- Bacterial genotypes, like "E. coli [argA, lysA, mcrA, mcrB, IN(rrnD-rrnE)1, lambda-, rcn14::Tn10(DE3 lysogen::lavUV5 promoter -T7 polymerase]"
- Including balancers, like "F26H9.8(ok2510) I/hT2 [bli-4(e937) let-?(q782) qIs48] (I;III)"
- Reference to parent strain, like "Parent strain is AG359"
- Referring to RNAi, like "Pglr-1::wrm-1(RNAi)" or "Phsp-6::gfp; phb-1(RNAi)"
- Referring to apparent null or loss of function alleles, like "Phsp-4::GFP(zcIs4); daf-2(-)" or "ced-10(lf)"
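As a rough illustration of how these free-text forms might be sorted automatically, here is a minimal sketch of a rule-based classifier. The category names, the regular expressions, and the `classify` function are all hypothetical and were written only against the example entries listed above; they are not an existing WormBase tool, and the full set of ~33,000 entries would certainly need more rules and manual review.

```python
import re

# Hypothetical categories and patterns, checked in order (most specific first).
# These were derived only from the example entries in the notes above.
RULES = [
    ("bacterial", re.compile(r"^E\. coli\b")),
    ("parent_strain", re.compile(r"^Parent strain is\b", re.IGNORECASE)),
    ("rnai", re.compile(r"\(RNAi\)")),
    # A chromosome numeral followed by "/" suggests a balanced genotype.
    ("balancer", re.compile(r"\)\s*[IVX]+/")),
    ("null_or_lf", re.compile(r"\((?:-|lf)\)")),
    # An allele name immediately followed by a bracketed construct description.
    ("complex_construct", re.compile(r"\([a-z]{1,3}\d+\[")),
    ("text_description", re.compile(r"^(?:Control\b|WT$|N\.A\.$)")),
    ("anonymous_transgene", re.compile(r"^(?:Ex\[|[A-Z]+-\d+(?:::|-)GFP$)")),
    ("strain", re.compile(r"^[A-Z]{2,3}\d+(?:\[.*\])?$")),
    ("species", re.compile(r"^(?:[A-Z]\. [a-z]+|[A-Z][a-z]+ [a-z]+)")),
]


def classify(genotype: str) -> str:
    """Return the first matching category name, or 'unclassified'."""
    text = genotype.strip()
    for name, pattern in RULES:
        if pattern.search(text):
            return name
    return "unclassified"


if __name__ == "__main__":
    for entry in ["C. briggsae", "BB21", "BEC-1::GFP", "ced-10(lf)", "N.A."]:
        print(f"{entry!r}: {classify(entry)}")
```

Rule order matters in this sketch: for example, "Parent strain is AG359" would also match the loose species pattern, so the more specific rules are tried first.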
Gene comparison SObA
November 14, 2019
- The Allied Genetics Conference next April (2020) in/near Washington DC
- Abstract deadline is Dec 5th
- Alliance has a shared booth (3 adjacent booths)
- Micropublications will have a booth (Karen and Daniela will attend)
- Focus will be on highlighting the Alliance
- Workshop at NLM in days following TAGC about curation at scale (Kimberly attending and chairing session)
Alliance all hands meeting
- Lightning talk topics?
- Single-cell RNA-Seq (Eduardo)
- SimpleMine? (Wen)
- SObA? (Raymond); still working on multi-species SObA
- Phenotype community curation?
- Alliance needs a curation database
- A curation working group was proposed
- What needs to happen to get this going?
- Would include text mining tools/resources
- Would be good to have something like the curation status form
- MODs likely have their own special requirements, but there should probably be at least a common minimal set of features
- Variant sequence curation could be a good starting point as a common data type (if all MODs handle their own variant sequence curation)
- Micropubs pushing data submission forms; might as well house them within the Alliance
- Would be good to have a common (or individually relevant) AFP form(s) for all Alliance members
- Maybe MOD curators can manage configuration files to indicate what is relevant for their species
- First priority is to focus on automatically recognizable entities/features from papers
November 21, 2019
Textpresso: merging main docs and supps?
- Currently, Textpresso searches in paper main documents and all individual supplemental documents separately
- This results in possibly getting many results for the same publication, each scored and displayed separately
- Do we want Textpresso to search on a single, consolidated file containing the main document of a paper AND the supplementals?
- Currently, the scoring algorithm is often scoring supplemental documents higher than main papers, presumably due to a weighting of documents in which there is a higher percentage of sentences with matches to the keyword(s)
- This cannot be done completely manually; agreed, this would have to be largely (completely?) automated
- Would be good to check how PMC/Europe PMC handles articles in which main docs and supps are consolidated into a single PDF already (in addition to individual files)
- Detecting duplicated sentences would be useful, but may be quite a thorny issue (need to research)
- Chris will update GitHub ticket to ask Sibyl to NOT search on C. elegans supplementals, for now, and only search on main documents
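To make the scoring concern and the consolidation idea concrete, here is a minimal sketch. It assumes, as the notes above only presume, that a document's score grows with the fraction of its sentences matching the keyword; this is not Textpresso's actual scoring algorithm. The function names, the paper ID, and the toy sentences are hypothetical, and duplicate detection here is limited to exact matches after case and whitespace normalization, which is far simpler than the real problem.

```python
from collections import defaultdict


def match_fraction(sentences, keyword):
    """Fraction of sentences containing the keyword (case-insensitive)."""
    if not sentences:
        return 0.0
    kw = keyword.lower()
    hits = sum(1 for s in sentences if kw in s.lower())
    return hits / len(sentences)


def consolidate(documents):
    """Merge (paper_id, sentences) pairs into one sentence list per paper.

    Sentences already seen for the same paper (after lowercasing and
    collapsing whitespace) are dropped, so text repeated between the main
    document and a supplement is only counted once.
    """
    merged = defaultdict(list)
    seen = defaultdict(set)
    for paper_id, sentences in documents:
        for s in sentences:
            key = " ".join(s.lower().split())
            if key not in seen[paper_id]:
                seen[paper_id].add(key)
                merged[paper_id].append(s)
    return dict(merged)


# Toy data: a main document and one short supplement for the same paper.
main_doc = [
    "daf-2 regulates lifespan.",
    "Worms were grown at 20C.",
    "We used standard methods.",
    "Results are shown.",
]
supplement = [
    "daf-2 regulates lifespan.",
    "daf-2 primers are listed.",
]

if __name__ == "__main__":
    print(match_fraction(main_doc, "daf-2"))    # main document alone
    print(match_fraction(supplement, "daf-2"))  # short supplement alone
    merged = consolidate([("paper1", main_doc), ("paper1", supplement)])
    print(match_fraction(merged["paper1"], "daf-2"))  # one result per paper
```

Under this assumed scoring, the short supplement outscores the main document on its own, while the consolidated file yields a single result per publication.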
Europe PMC: biocuration landscape analysis
- Dayane Araújo has asked that a curator (Chris currently) attend a conference call (next Monday, Nov 25) hosted by Europe PMC about assessing biocuration across databases
- Chris has asked for details but has so far not received anything specific
- Should we attend? Yes, at least to listen. If complex questions come up, we can just tell them we'll look it up
- Would be great if there were aggregated references for particular datasets so that users of data and analyses could be given all references to properly cite in their own article