WormBase-Caltech Weekly Calls November 2019

From WormBaseWiki
Revision as of 16:31, 5 December 2019 by Cgrove

November 7, 2019

WS275 Citace upload

  • Maybe Nov 22 upload to Hinxton
  • CIT curators upload to Spica on Tues Nov 19

?Genotype class

  • Working data model document
  • Several classes have a "Genotype" tag with text entry
    • Strain
    • 2_point_data
    • Pos_neg_data
    • Multi_pt_data
    • RNAi
    • Phenotype_info
    • Mass_spec_experiment (no data as of WS273)
    • Condition
  • Collecting all genotype text entries yields ~33,000 unique entries, with many different forms:
    • Species entries, like "Acrobeloides butschlii wild isolate" or "C. briggsae"
    • Strain entries, like "BA17[fem-1(hc-17)]" or "BB21" or "BL1[pK08F4.7::K08F4.7::GFP; rol-6(+)]"
    • Anonymous transgenes, like "BEC-1::GFP" or "CAM-1-GFP" or "Ex[Pnpr-9::unc-103(gf)]"
    • Complex constructs, like "C56C10.9(gk5253[loxP + Pmyo-2::GFP::unc-54 3' UTR + Prps-27::neoR::unc-54 3' UTR + loxP]) II"
    • Text descriptions, like "Control" or "WT" or "Control worms fed on HT115 containing the L4440 vector without insert" or "N.A."
    • Bacterial genotypes, like "E. coli [argA, lysA, mcrA, mcrB, IN(rrnD-rrnE)1, lambda-, rcn14::Tn10(DE3 lysogen::lavUV5 promoter -T7 polymerase]"
    • Including balancers, like "F26H9.8(ok2510) I/hT2 [bli-4(e937) let-?(q782) qIs48] (I;III)"
    • Reference to parent strain, like "Parent strain is AG359"
    • Referring to RNAi, like "Pglr-1::wrm-1(RNAi)" or "Phsp-6::gfp; phb-1(RNAi)"
    • Referring to apparent null or loss of function alleles, like "Phsp-4::GFP(zcIs4); daf-2(-)" or "ced-10(lf)"
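
To illustrate how the free-text forms listed above might be triaged before a ?Genotype model is settled, here is a minimal sketch in Python. The category names, the regular expressions, and the `classify` helper are all hypothetical (nothing here was agreed on the call), and the rules are tuned only to the example entries quoted in these notes, so a real pass over the ~33,000 entries would need many more patterns and manual review.

```python
import re
from collections import Counter

# Hypothetical heuristic rules, checked in order; the first match wins.
# Order matters: e.g. strain and RNAi entries can also contain "::".
RULES = [
    ("bacterial", re.compile(r"^E\. coli\b")),
    ("parent_strain", re.compile(r"^Parent strain is ")),
    ("text_description", re.compile(r"^(?:control|wt|n\.?a\.?)(?:\W|$)", re.IGNORECASE)),
    ("species", re.compile(r"wild isolate|^[A-Z]\. [a-z]+$")),
    ("strain", re.compile(r"^[A-Z]{2,3}\d+(?:\[.*\])?$")),
    ("balancer", re.compile(r"/\s*[a-z]{1,2}(?:T|C|In|Dp)\d+")),
    ("complex_construct", re.compile(r"loxP")),
    ("rnai", re.compile(r"\(RNAi\)")),
    ("null_or_lof", re.compile(r"\((?:-|lf)\)")),
    ("transgene", re.compile(r"::|GFP|^Ex\[")),
]


def classify(entry: str) -> str:
    """Return the first matching (hypothetical) category for a genotype string."""
    text = entry.strip()
    for name, pattern in RULES:
        if pattern.search(text):
            return name
    return "unclassified"


# Example entries copied from the list above.
EXAMPLES = [
    "C. briggsae",
    "BA17[fem-1(hc-17)]",
    "BEC-1::GFP",
    "Control",
    "F26H9.8(ok2510) I/hT2 [bli-4(e937) let-?(q782) qIs48] (I;III)",
    "Parent strain is AG359",
    "Pglr-1::wrm-1(RNAi)",
    "ced-10(lf)",
]

if __name__ == "__main__":
    # Tally how many entries fall into each category.
    counts = Counter(classify(e) for e in EXAMPLES)
    for category, n in counts.most_common():
        print(category, n)
```

Putting the rules in an ordered list keeps the precedence explicit: a strain entry whose bracketed genotype contains a transgene is still counted as a strain, and anything that matches no rule is reported as "unclassified" for a curator to inspect.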

Gene comparison SObA

November 14, 2019

TAGC meeting

  • The Allied Genetics Conference next April (2020) in/near Washington DC
  • Abstract deadline is Dec 5th
  • Alliance has a shared booth (3 adjacent booths)
  • Micropublications will have a booth (Karen and Daniela will attend)
  • Focus will be on highlighting the Alliance
  • Workshop at NLM in days following TAGC about curation at scale (Kimberly attending and chairing session)

Alliance all hands meeting

  • Lightning talk topics?
    • Single cell RNA Seq (Eduardo)
    • SimpleMine? (Wen)
    • SObA? (Raymond); still working on multi-species SObA
    • Phenotype community curation?
    • Micropublications?
    • AFP?

Alliance general

  • Alliance needs a curation database
    • A curation working group was proposed
    • What needs to happen to get this going?
    • Would include text mining tools/resources
    • Would be good to have something like the curation status form
    • MODs likely have their own special requirements, but there should probably be at least a common minimal set of features
    • Variant sequence curation could be a good starting point as a common data type (if all MODs handle their own variant sequence curation)
  • Micropubs pushing data submission forms; might as well house them within the Alliance
  • Would be good to have a common (or individually relevant) AFP form(s) for all Alliance members
    • Maybe MOD curators can manage configuration files to indicate what is relevant for their species
    • First priority is to focus on automatically recognizable entities/features from papers

November 21, 2019

Textpresso: merging main docs and supps?

  • Currently, Textpresso searches in paper main documents and all individual supplemental documents separately
  • This results in possibly getting many results for the same publication, each scored and displayed separately
  • Do we want Textpresso to search on a single, consolidated file containing the main document of a paper AND the supplementals?
  • Currently, the scoring algorithm often scores supplemental documents higher than main papers, presumably because it favors documents in which a higher percentage of sentences match the keyword(s)
  • Consolidating the files cannot be done entirely by hand; agreed that it would have to be largely (completely?) automated
  • Would be good to check how PMC/Europe PMC handles articles in which main docs and supps are consolidated into a single PDF already (in addition to individual files)
  • Detecting duplicated sentences would be useful, but may be quite a thorny issue (need to research)
  • Chris will update GitHub ticket to ask Sibyl to NOT search on C. elegans supplementals, for now, and only search on main documents
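
The scoring concern and the consolidation idea discussed above can be illustrated with a toy example in Python. This is only a sketch under stated assumptions: it is not Textpresso's actual scoring algorithm (the notes only presume a percentage-of-matching-sentences weighting), the functions `sentences`, `match_fraction`, and `consolidate` and the paper ID `WBPaper_example` are hypothetical, and duplicate detection here is limited to exact matches after case and whitespace normalization, which is far simpler than the "thorny" general problem.

```python
import re


def sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def match_fraction(sents, keyword):
    """Assumed score: fraction of sentences that contain the keyword."""
    if not sents:
        return 0.0
    kw = keyword.lower()
    hits = sum(1 for s in sents if kw in s.lower())
    return hits / len(sents)


def consolidate(docs):
    """Merge (paper_id, text) documents per paper, dropping exact duplicate sentences."""
    merged = {}
    seen = {}
    for paper_id, text in docs:
        for s in sentences(text):
            key = " ".join(s.lower().split())  # normalize case and whitespace
            if key in seen.setdefault(paper_id, set()):
                continue
            seen[paper_id].add(key)
            merged.setdefault(paper_id, []).append(s)
    return merged


# Made-up main text and supplemental text for one hypothetical paper.
MAIN = ("We studied aging. daf-2 mutants live longer. "
        "Worms were grown at 20C. Results are discussed.")
SUPP = "daf-2 mutants live longer. Table S1 lists daf-2 alleles."

if __name__ == "__main__":
    print("main:", match_fraction(sentences(MAIN), "daf-2"))
    print("supp:", match_fraction(sentences(SUPP), "daf-2"))
    merged = consolidate([("WBPaper_example", MAIN), ("WBPaper_example", SUPP)])
    print("merged:", match_fraction(merged["WBPaper_example"], "daf-2"))
```

Under this assumed score, the short supplemental file outranks the main document because nearly all of its sentences mention the keyword, whereas the consolidated file yields a single result per publication with the repeated sentence counted once.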

Europe PMC: biocuration landscape analysis

  • Dayane Araújo has asked that a curator (Chris currently) attend a conference call (next Monday, Nov 25) hosted by Europe PMC about assessing biocuration across databases
  • Chris has asked for details but has so far not received anything specific
  • Should we attend? Yes, at least to listen. If complex questions come up, we can just tell them we'll look it up
  • Would be great if there were aggregated references for particular datasets so that users of data and analyses could be given all references to properly cite in their own article