|
|
Line 45: |
| [[WormBase-Caltech_Weekly_Calls_October_2019|October]] | | [[WormBase-Caltech_Weekly_Calls_October_2019|October]] |
| | | |
− | | + | [[WormBase-Caltech_Weekly_Calls_November_2019|November]] |
− | == November 7, 2019 ==
| |
− | | |
− | === WS275 Citace upload ===
| |
− | * Maybe Nov 22 upload to Hinxton
| |
− | * CIT curators upload to Spica on Tues Nov 19
| |
− | | |
− | === ?Genotype class ===
| |
− | * [https://docs.google.com/document/d/19hP9r6BpPW3FSAeC_67FNyNq58NGp4eaXBT42Ch3gDE/edit?usp=sharing Working data model document]
| |
− | * Several classes have a "Genotype" tag with text entry
| |
− | ** Strain
| |
− | ** 2_point_data
| |
− | ** Pos_neg_data
| |
− | ** Multi_pt_data
| |
− | ** RNAi
| |
− | ** Phenotype_info
| |
− | ** Mass_spec_experiment (no data as of WS273)
| |
− | ** Condition
| |
− | * Collecting all genotype text entries yields ~33,000 unique entries, with many different forms:
| |
− | ** Species entries, like "Acrobeloides butschlii wild isolate" or "C. briggsae"
| |
− | ** Strain entries, like "BA17[fem-1(hc-17)]" or "BB21" or "BL1[pK08F4.7::K08F4.7::GFP; rol-6(+)]"
| |
− | ** Anonymous transgenes, like "BEC-1::GFP" or "CAM-1-GFP" or "Ex[Pnpr-9::unc-103(gf)]"
| |
− | ** Complex constructs, like "C56C10.9(gk5253[loxP + Pmyo-2::GFP::unc-54 3' UTR + Prps-27::neoR::unc-54 3' UTR + loxP]) II"
| |
− | ** Text descriptions, like "Control" or "WT" or "Control worms fed on HT115 containing the L4440 vector without insert" or "N.A."
| |
− | ** Bacterial genotypes, like "E. coli [argA, lysA, mcrA, mcrB, IN(rrnD-rrnE)1, lambda-, rcn14::Tn10(DE3 lysogen::lavUV5 promoter -T7 polymerase]"
| |
− | ** Including balancers, like "F26H9.8(ok2510) I/hT2 [bli-4(e937) let-?(q782) qIs48] (I;III)"
| |
− | ** Reference to parent strain, like "Parent strain is AG359"
| |
− | ** Referring to RNAi, like "Pglr-1::wrm-1(RNAi)" or "Phsp-6::gfp; phb-1(RNAi)"
| |
− | ** Referring to apparent null or loss of function alleles, like "Phsp-4::GFP(zcIs4); daf-2(-)" or "ced-10(lf)"
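One way to get a first look at how the ~33,000 free-text entries fall into the forms listed above is a small rule-based triage. The following is only a minimal sketch: the category names paraphrase the list, while the regular expressions, the rule order, and the `classify_genotype` function are hypothetical illustrations and are not part of the working data model. Real entries would need far more rules plus manual review.

```python
import re

# Hypothetical, ordered heuristics: the first matching rule wins.
# Patterns are illustrative guesses based only on the example entries above.
RULES = [
    ("parent_strain_reference", re.compile(r"^Parent strain is\b", re.I)),
    ("bacterial_genotype", re.compile(r"^E\. coli\b")),
    ("rnai_reference", re.compile(r"\(RNAi\)")),
    ("balancer", re.compile(r"/[a-z]{1,3}[TC]\d+\b")),  # e.g. "/hT2"
    ("loss_of_function", re.compile(r"\((?:-|lf)\)")),  # e.g. "daf-2(-)"
    ("complex_construct", re.compile(r"\([a-z]+\d+\[")),  # allele with bracketed detail
    ("species", re.compile(r"^(?:[A-Z]\. [a-z]+|[A-Z][a-z]+ [a-z]+ wild isolate)$")),
    ("text_description", re.compile(r"^(?:Control\b|WT$|N\.A\.$)")),
    ("anonymous_transgene", re.compile(r"^(?:Ex\[|[A-Z]+-\d+(?:::|-)GFP$)")),
    ("strain", re.compile(r"^[A-Z]{2,3}\d+(?:\[.*\])?$")),
]


def classify_genotype(text):
    """Return the name of the first rule matching the entry, else 'unclassified'."""
    entry = text.strip()
    for category, pattern in RULES:
        if pattern.search(entry):
            return category
    return "unclassified"


if __name__ == "__main__":
    examples = [
        "C. briggsae",
        "BA17[fem-1(hc-17)]",
        "BEC-1::GFP",
        "F26H9.8(ok2510) I/hT2 [bli-4(e937) let-?(q782) qIs48] (I;III)",
        "Parent strain is AG359",
        "ced-10(lf)",
    ]
    for example in examples:
        print(f"{classify_genotype(example):25s} {example}")
```

Because the rules are ordered, an entry matching several forms (for example, a transgene combined with RNAi) is assigned only to the first one; counting the "unclassified" remainder would show how much text still needs a curator's eye.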
| |
− | | |
− | === Gene comparison SObA ===
| |
− | * http://wobr2.caltech.edu/~azurebrd/cgi-bin/soba_multi.cgi?action=Gene+Pair+to+SObA+Graph
| |
− | | |
− | | |
− | == November 14, 2019 ==
| |
− | | |
− | === TAGC meeting ===
| |
− | * The Allied Genetics Conference next April (2020) in/near Washington DC
| |
− | * Abstract deadline is Dec 5th
| |
− | * Alliance has a shared booth (3 adjacent booths)
| |
− | * Micropublications will have a booth (Karen and Daniela will attend)
| |
− | * Focus will be on highlighting the Alliance
| |
− | * Workshop at NLM in days following TAGC about curation at scale (Kimberly attending and chairing session)
| |
− | | |
− | === Alliance all hands meeting ===
| |
− | * Lightning talk topics?
| |
− | ** Single cell RNA Seq (Eduardo)
| |
− | ** SimpleMine? (Wen)
| |
− | ** SObA? (Raymond); still working on multi-species SObA
| |
− | ** Phenotype community curation?
| |
− | ** Micropublications?
| |
− | ** AFP?
| |
− | | |
− | === Alliance general ===
| |
− | * Alliance needs a curation database
| |
− | ** A curation working group was proposed
| |
− | ** What needs to happen to get this going?
| |
− | ** Would include text mining tools/resources
| |
− | ** Would be good to have something like the curation status form
| |
− | ** MODs likely have their own special requirements, but there should probably be at least a common minimal set of features
| |
− | ** Variant sequence curation, as a common data type, could be a good starting point (if all MODs handle their own variant sequence curation)
| |
− | * Micropubs pushing data submission forms; might as well house them within the Alliance
| |
− | * Would be good to have a common (or individually relevant) AFP form(s) for all Alliance members
| |
− | ** Maybe MOD curators can manage configuration files to indicate what is relevant for their species
| |
− | ** First priority is to focus on automatically recognizable entities/features from papers
| |
− | | |
− | | |
− | == November 21, 2019 ==
| |
− | | |
− | === Textpresso: merging main docs and supps? ===
| |
− | * Currently, Textpresso searches in paper main documents and all individual supplemental documents separately
| |
− | * This results in possibly getting many results for the same publication, each scored and displayed separately
| |
− | * Do we want Textpresso to search on a single, consolidated file containing the main document of a paper AND the supplementals?
| |
− | * Currently, the scoring algorithm often scores supplemental documents higher than main papers, presumably because it favors documents in which a higher percentage of sentences match the keyword(s)
| |
− | * Consolidation cannot be done completely manually; agreed, it would have to be largely (completely?) automated
| |
− | * Would be good to check how PMC/Europe PMC handles articles in which main docs and supps are consolidated into a single PDF already (in addition to individual files)
| |
− | * Detecting duplicated sentences would be useful, but may be quite a thorny issue (need to research)
| |
− | * Chris will update GitHub ticket to ask Sibyl to NOT search on C. elegans supplementals, for now, and only search on main documents
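The scoring concern and the consolidation idea above can be illustrated with a toy model. This is a sketch under stated assumptions, not Textpresso's actual algorithm: it assumes a score equal to the fraction of sentences containing the keyword (the presumed weighting mentioned in the notes), and the functions `split_sentences`, `match_fraction`, and `consolidate` are hypothetical helpers. Duplicate detection here is a naive exact match after normalization, which would miss near-duplicates.

```python
import re
from collections import defaultdict


def split_sentences(text):
    """Naive sentence splitter: break on whitespace after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def match_fraction(sentences, keyword):
    """Toy score: fraction of sentences that contain the keyword (case-insensitive)."""
    if not sentences:
        return 0.0
    hits = sum(1 for s in sentences if keyword.lower() in s.lower())
    return hits / len(sentences)


def consolidate(documents):
    """Merge (paper_id, text) pairs into one sentence list per paper.

    Sentences that repeat within a paper (after lowercasing and collapsing
    punctuation) are kept only once.
    """
    merged = defaultdict(list)
    seen = defaultdict(set)
    for paper_id, text in documents:
        for sentence in split_sentences(text):
            key = re.sub(r"\W+", " ", sentence.lower()).strip()
            if key not in seen[paper_id]:
                seen[paper_id].add(key)
                merged[paper_id].append(sentence)
    return dict(merged)


if __name__ == "__main__":
    # Made-up example texts, for illustration only.
    main_text = (
        "We studied daf-2 mutants. Lifespan was measured at 20 degrees. "
        "Animals were scored daily. Results are shown in Table S1."
    )
    supp_text = "Table S1 lists daf-2 alleles. We studied daf-2 mutants."

    print("main:", match_fraction(split_sentences(main_text), "daf-2"))
    print("supp:", match_fraction(split_sentences(supp_text), "daf-2"))

    merged = consolidate([("paper1", main_text), ("paper1", supp_text)])
    print("consolidated:", match_fraction(merged["paper1"], "daf-2"))
```

In this toy case the short supplemental scores higher than the main document when the two are scored separately, while the consolidated file yields one result per paper with the repeated sentence counted once.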
| |
− | | |
− | === Europe PMC: biocuration landscape analysis ===
| |
− | * Dayane Araújo has asked that a curator (Chris currently) attend a conference call (next Monday, Nov 25) hosted by Europe PMC about assessing biocuration across databases
| |
− | * Chris has asked for details but has so far not received anything specific
| |
− | * Should we attend? Yes, at least to listen. If complex questions come up, we can just tell them we'll look it up
| |
− | * Would be great if there were aggregated references for particular datasets so that users of data and analyses could be given all references to properly cite in their own article
| |