Difference between revisions of "WormBase-Caltech Weekly Calls"

From WormBaseWiki
 
(128 intermediate revisions by 6 users not shown)
Line 39: Line 39:
 [[WormBase-Caltech_Weekly_Calls_July_2019|July]]
+[[WormBase-Caltech_Weekly_Calls_August_2019|August]]
+[[WormBase-Caltech_Weekly_Calls_September_2019|September]]
+[[WormBase-Caltech_Weekly_Calls_October_2019|October]]

-== August 1, 2019 ==
-=== Life stage public names missing in WS271 ===
-* Did we ever get a patch in for this?
-* WormMine only has WBls IDs (no public name) for almost all life stages
-* Wen will resend the patch file
-=== 2020 WB NAR paper ===
-* Who's contributing?
-** Raymond, Chris, Ranjana, Valerio, Daniela, Kimberly
-* Topics
-** Automated descriptions
-** Ontology tools (SObA)
-** Community curation
-** Author first pass
-=== 2019 IWM Workshop videos ===
-* On YouTube, but not public yet
-* Chris will make them public and send links to Ranjana for the blog post
-== August 15, 2019 ==
-=== GO Alliance slim terms ===
-* We need to update our GO slim terms for WB GO ribbons to be in sync with Alliance
-* May need to watch out for terms that don't apply to worms
-* Raymond gets slim terms into Solr from OBO release file; Sibyl collecting from different source; should make the same (pull from WB FTP site?)
-=== Phenotype ontology patternization ===
-* Now have 676 terms patternized (27% of 2,506 terms total)
-* Have reviewed the class hierarchy, collecting list of unexpected class subsumptions
-* Issues to address collected here: https://docs.google.com/document/d/1IWtQbEQ-elM-U5SQyU4VfIH3vdJp6taVMGViIjGyVks/edit?usp=sharing
-== August 22, 2019 ==
-=== Obsolete ontology terms in Postgres ===
-* There are currently 172 GO annotations in the GO OA referring to obsolete GO terms
-** https://docs.google.com/spreadsheets/d/14iG3-s0GrZ3_W87iOjD6tZiQiUJklRRjgs6ARFWi9E4/edit?usp=sharing
-* We would like a mechanism for detecting and alerting curators to obsolete ontology terms in the OA/Postgres
+== November 7, 2019 ==
+=== WS275 Citace upload ===
+* Maybe Nov 22 upload to Hinxton
+* CIT curators upload to Spica on Tues Nov 19
+=== ?Genotype class ===
+* [https://docs.google.com/document/d/19hP9r6BpPW3FSAeC_67FNyNq58NGp4eaXBT42Ch3gDE/edit?usp=sharing Working data model document]
+* Several classes have a "Genotype" tag with text entry
+** Strain
+** 2_point_data
+** Pos_neg_data
+** Multi_pt_data
+** RNAi
+** Phenotype_info
+** Mass_spec_experiment (no data as of WS273)
+** Condition
+* Collecting all genotype text entries yields ~33,000 unique entries, with many different forms:
+** Species entries, like "Acrobeloides butschlii wild isolate" or "C. briggsae"
+** Strain entries, like "BA17[fem-1(hc-17)]" or "BB21" or "BL1[pK08F4.7::K08F4.7::GFP; rol-6(+)]"
+** Anonymous transgenes, like "BEC-1::GFP" or "CAM-1-GFP" or "Ex[Pnpr-9::unc-103(gf)]"
+** Complex constructs, like "C56C10.9(gk5253[loxP + Pmyo-2::GFP::unc-54 3' UTR + Prps-27::neoR::unc-54 3' UTR + loxP]) II"
+** Text descriptions, like "Control" or "WT" or "Control worms fed on HT115 containing the L4440 vector without insert" or "N.A."
+** Bacterial genotypes, like "E. coli [argA, lysA, mcrA, mcrB, IN(rrnD-rrnE)1, lambda-, rcn14::Tn10(DE3 lysogen::lavUV5 promoter -T7 polymerase]"
+** Including balancers, like "F26H9.8(ok2510) I/hT2 [bli-4(e937) let-?(q782) qIs48] (I;III)"
+** Reference to parent strain, like "Parent strain is AG359"
+** Referring to RNAi, like "Pglr-1::wrm-1(RNAi)" or "Phsp-6::gfp; phb-1(RNAi)"
+** Referring to apparent null or loss of function alleles, like "Phsp-4::GFP(zcIs4); daf-2(-)" or "ced-10(lf)"
+=== Gene comparison SObA ===
+* http://wobr2.caltech.edu/~azurebrd/cgi-bin/soba_multi.cgi?action=Gene+Pair+to+SObA+Graph
+== November 14, 2019 ==
+=== TAGC meeting ===
+* The Allied Genetics Conference next April (2020) in/near Washington DC
+* Abstract deadline is Dec 5th
+* Alliance has a shared booth (3 adjacent booths)
+* Micropublications will have a booth (Karen and Daniela will attend)
+* Focus will be on highlighting the Alliance
+* Workshop at NLM in days following TAGC about curation at scale (Kimberly attending and chairing session)
+=== Alliance all hands meeting ===
+* Lightning talk topics?
+** Single cell RNA Seq (Eduardo)
+** SimpleMine? (Wen)
+** SObA? (Raymond); still working on multi-species SObA
+** Phenotype community curation?
+** Micropublications?
+** AFP?
+=== Alliance general ===
+* Alliance needs a curation database
+** A curation working group was proposed
+** What needs to happen to get this going?
+** Would include text mining tools/resources
+** Would be good to have something like the curation status form
+** MODs likely have their own special requirements, but there should probably be at least a common minimal set of features
+** Variant sequence curation could be a good first common data type to start with (if all MODs handle their own variant sequence curation)
+* Micropubs pushing data submission forms; might as well house them within the Alliance
+* Would be good to have a common (or individually relevant) AFP form(s) for all Alliance members
+** Maybe MOD curators can manage configuration files to indicate what is relevant for their species
+** First priority is to focus on automatically recognizable entities/features from papers
+== November 21, 2019 ==
+=== Textpresso: merging main docs and supps? ===
+* Currently, Textpresso searches in paper main documents and all individual supplemental documents separately
+* This can return many results for the same publication, each scored and displayed separately
+* Do we want Textpresso to search a single, consolidated file containing the main document of a paper AND the supplementals?
+* Currently, the scoring algorithm often scores supplemental documents higher than main papers, presumably because documents with a higher percentage of sentences matching the keyword(s) are weighted more heavily
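The August 22 notes above ask for a mechanism to detect obsolete ontology terms in the OA/Postgres and alert curators. Below is a minimal sketch of one possible check. It assumes the ontology is read from an OBO-format file, where retired terms carry an "is_obsolete: true" tag, and that annotations can be exported as (annotation ID, term ID) pairs. The function names and the export format are hypothetical; this is not an existing WormBase tool.

```python
from typing import Dict, Iterable, List, Set, Tuple


def parse_stanzas(obo_text: str) -> List[Tuple[str, Dict[str, List[str]]]]:
    """Split OBO text into (stanza_type, {tag: [values]}) pairs.

    Header lines before the first stanza are skipped. This is a simplified
    reader, not a full OBO parser (no escapes, qualifiers or trailing
    modifiers are handled).
    """
    stanzas: List[Tuple[str, Dict[str, List[str]]]] = []
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            # A new stanza such as [Term] or [Typedef] begins.
            stanzas.append((line, {}))
        elif stanzas and ":" in line:
            # Split on the first colon only, so IDs like "GO:0008150" survive.
            tag, _, value = line.partition(":")
            stanzas[-1][1].setdefault(tag.strip(), []).append(value.strip())
    return stanzas


def obsolete_term_ids(obo_text: str) -> Set[str]:
    """Return the IDs of [Term] stanzas tagged 'is_obsolete: true'."""
    return {
        tags["id"][0]
        for kind, tags in parse_stanzas(obo_text)
        if kind == "[Term]" and "id" in tags and "true" in tags.get("is_obsolete", [])
    }


def flag_obsolete(
    annotations: Iterable[Tuple[str, str]], obsolete: Set[str]
) -> List[Tuple[str, str]]:
    """Keep only the (annotation_id, term_id) pairs whose term is obsolete."""
    return [(ann_id, term) for ann_id, term in annotations if term in obsolete]
```

One way to handle alerting would be to run the check after each ontology release and send any flagged pairs to the responsible curator. The Postgres query that would produce the annotation pairs is left out because the notes do not describe the table layout.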

Latest revision as of 21:29, 20 November 2019

Previous Years

2009 Meetings

2011 Meetings

2012 Meetings

2013 Meetings

2014 Meetings

2015 Meetings

2016 Meetings

2017 Meetings

2018 Meetings


GoToMeeting link: https://www.gotomeet.me/wormbase1


2019 Meetings

January

February

March

April

May

June

July

August

September

October


November 7, 2019

WS275 Citace upload

  • Maybe Nov 22 upload to Hinxton
  • CIT curators upload to Spica on Tues Nov 19

?Genotype class

  • Working data model document: https://docs.google.com/document/d/19hP9r6BpPW3FSAeC_67FNyNq58NGp4eaXBT42Ch3gDE/edit?usp=sharing
  • Several classes have a "Genotype" tag with text entry
    • Strain
    • 2_point_data
    • Pos_neg_data
    • Multi_pt_data
    • RNAi
    • Phenotype_info
    • Mass_spec_experiment (no data as of WS273)
    • Condition
  • Collecting all genotype text entries yields ~33,000 unique entries, with many different forms:
    • Species entries, like "Acrobeloides butschlii wild isolate" or "C. briggsae"
    • Strain entries, like "BA17[fem-1(hc-17)]" or "BB21" or "BL1[pK08F4.7::K08F4.7::GFP; rol-6(+)]"
    • Anonymous transgenes, like "BEC-1::GFP" or "CAM-1-GFP" or "Ex[Pnpr-9::unc-103(gf)]"
    • Complex constructs, like "C56C10.9(gk5253[loxP + Pmyo-2::GFP::unc-54 3' UTR + Prps-27::neoR::unc-54 3' UTR + loxP]) II"
    • Text descriptions, like "Control" or "WT" or "Control worms fed on HT115 containing the L4440 vector without insert" or "N.A."
    • Bacterial genotypes, like "E. coli [argA, lysA, mcrA, mcrB, IN(rrnD-rrnE)1, lambda-, rcn14::Tn10(DE3 lysogen::lavUV5 promoter -T7 polymerase]"
    • Including balancers, like "F26H9.8(ok2510) I/hT2 [bli-4(e937) let-?(q782) qIs48] (I;III)"
    • Reference to parent strain, like "Parent strain is AG359"
    • Referring to RNAi, like "Pglr-1::wrm-1(RNAi)" or "Phsp-6::gfp; phb-1(RNAi)"
    • Referring to apparent null or loss of function alleles, like "Phsp-4::GFP(zcIs4); daf-2(-)" or "ced-10(lf)"
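Here is a rough illustration of how the ~33,000 free-text genotype entries might be triaged into the forms listed above. The sketch assigns each string to one category using ordered regular expressions. The category names mirror the notes, but the patterns are hypothetical heuristics tuned only to the quoted examples. This is not an existing WormBase procedure, and real entries would still need manual review.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical heuristics, one per category in the notes. The first match
# wins, so specific rules (RNAi, loss of function, ...) run before broad ones.
RULES: List[Tuple[str, re.Pattern]] = [
    ("rnai", re.compile(r"\(RNAi\)")),
    ("loss_of_function", re.compile(r"\((?:-|lf)\)")),
    ("parent_strain", re.compile(r"^Parent strain is\b", re.IGNORECASE)),
    ("text_description", re.compile(r"^(?:control\b|wt\b|wild[- ]?type\b|n\.a\.)", re.IGNORECASE)),
    ("bacterial", re.compile(r"^E\. coli\b")),
    ("balancer", re.compile(r"/\s*[A-Za-z]+\d+")),            # e.g. "I/hT2"
    ("strain", re.compile(r"^[A-Z]{2,3}\d+(?:\[|$)")),         # e.g. "BB21", "BA17[...]"
    ("complex_construct", re.compile(r"\[[^\]]*\s\+\s")),      # bracketed "a + b" parts
    ("transgene", re.compile(r"::|-GFP\b|^Ex\[")),
    ("species", re.compile(r"^(?:[A-Z]\.|[A-Z][a-z]+)\s[a-z]+")),
]


def classify(entry: str) -> str:
    """Return the label of the first matching rule, or 'unclassified'."""
    text = entry.strip()
    for label, pattern in RULES:
        if pattern.search(text):
            return label
    return "unclassified"


def summarize(entries: Iterable[str]) -> "Counter[str]":
    """Count unique genotype strings per category (exact duplicates collapsed)."""
    return Counter(classify(entry) for entry in set(entries))
```

Rule order matters. A string like "Phsp-4::GFP(zcIs4); daf-2(-)" also contains "::", so the loss-of-function rule has to run before the transgene rule. Entries left as "unclassified" would be the natural queue for manual curation.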

Gene comparison SObA

  • http://wobr2.caltech.edu/~azurebrd/cgi-bin/soba_multi.cgi?action=Gene+Pair+to+SObA+Graph

November 14, 2019

TAGC meeting

  • The Allied Genetics Conference next April (2020) in/near Washington DC
  • Abstract deadline is Dec 5th
  • Alliance has a shared booth (3 adjacent booths)
  • Micropublications will have a booth (Karen and Daniela will attend)
  • Focus will be on highlighting the Alliance
  • Workshop at NLM in days following TAGC about curation at scale (Kimberly attending and chairing session)

Alliance all hands meeting

  • Lightning talk topics?
    • Single cell RNA Seq (Eduardo)
    • SimpleMine? (Wen)
    • SObA? (Raymond); still working on multi-species SObA
    • Phenotype community curation?
    • Micropublications?
    • AFP?

Alliance general

  • Alliance needs a curation database
    • A curation working group was proposed
    • What needs to happen to get this going?
    • Would include text mining tools/resources
    • Would be good to have something like the curation status form
    • MODs likely have their own special requirements, but there should probably be at least a common minimal set of features
    • Variant sequence curation could be a good first common data type to start with (if all MODs handle their own variant sequence curation)
  • Micropubs pushing data submission forms; might as well house them within the Alliance
  • Would be good to have a common (or individually relevant) AFP form(s) for all Alliance members
    • Maybe MOD curators can manage configuration files to indicate what is relevant for their species
    • First priority is to focus on automatically recognizable entities/features from papers


November 21, 2019

Textpresso: merging main docs and supps?

  • Currently, Textpresso searches in paper main documents and all individual supplemental documents separately
  • This can return many results for the same publication, each scored and displayed separately
  • Do we want Textpresso to search a single, consolidated file containing the main document of a paper AND the supplementals?
  • Currently, the scoring algorithm often scores supplemental documents higher than main papers, presumably because documents with a higher percentage of sentences matching the keyword(s) are weighted more heavily
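A toy model can show why supplements might outrank main papers under the suspected weighting. This sketch assumes a file's score is simply the fraction of its sentences that contain the keyword. The notes do not give Textpresso's actual scoring formula, so the weighting, the naive sentence splitter, and the function names are all illustrative assumptions.

```python
import re
from typing import Dict, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def match_fraction(text: str, keyword: str) -> float:
    """Assumed score: fraction of sentences containing the keyword (case-insensitive)."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    kw = keyword.lower()
    hits = sum(1 for s in sentences if kw in s.lower())
    return hits / len(sentences)


def score_separately(docs: Dict[str, str], keyword: str) -> Dict[str, float]:
    """Toy model of scoring the main text and each supplement as separate files."""
    return {name: match_fraction(text, keyword) for name, text in docs.items()}


def score_consolidated(docs: Dict[str, str], keyword: str) -> float:
    """Toy model of scoring the main text and supplements as one merged file."""
    return match_fraction(" ".join(docs.values()), keyword)


if __name__ == "__main__":
    main_text = "daf-2 regulates lifespan. " + "Other sentence. " * 6 + "We studied daf-2 mutants."
    supplement = "Table S1 lists daf-2 alleles. Primers are in Table S2."
    papers = {"main": main_text, "supp1": supplement}
    print(score_separately(papers, "daf-2"))  # {'main': 0.25, 'supp1': 0.5}
    print(score_consolidated(papers, "daf-2"))  # 0.3
```

In this example, a short supplement with one matching sentence out of two outscores a main text with two matches out of eight. The consolidated file scores in between, so under this model merging would stop the supplement from being ranked above its own paper.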