Difference between revisions of "WormBase-Caltech Weekly Calls"

From WormBaseWiki
 
Line 17: Line 17:
 
[[WormBase-Caltech_Weekly_Calls_2017|2017 Meetings]]
 
[[WormBase-Caltech_Weekly_Calls_2017|2017 Meetings]]
  
 +
[[WormBase-Caltech_Weekly_Calls_2018|2018 Meetings]]
  
GoToMeeting link: https://www.gotomeet.me/wormbase1
+
[[WormBase-Caltech_Weekly_Calls_2019|2019 Meetings]]
  
  
= 2018 Meetings =
 
  
[[WormBase-Caltech_Weekly_Calls_January_2018|January]]
 
  
[[WormBase-Caltech_Weekly_Calls_February_2018|February]]
+
= 2020 Meetings =
  
[[WormBase-Caltech_Weekly_Calls_March_2018|March]]
+
[[WormBase-Caltech_Weekly_Calls_January_2020|January]]
  
[[WormBase-Caltech_Weekly_Calls_April_2018|April]]
+
[[WormBase-Caltech_Weekly_Calls_February_2020|February]]
  
 +
[[WormBase-Caltech_Weekly_Calls_March_2020|March]]
  
== May 3, 2018 ==
+
[[WormBase-Caltech_Weekly_Calls_April_2020|April]]
  
=== SimpleMine output ===
+
[[WormBase-Caltech_Weekly_Calls_May_2020|May]]
* Wen was working on simplification of SimpleMine output
 
* Considering removing general terms in an ontology when more specific terms exist
 
* Concern that we would be removing information; will keep terms
 
  
=== SPELL topics ===
+
[[WormBase-Caltech_Weekly_Calls_June_2020|June]]
* Topics in SPELL need some organization, possibly trimming
 
* We could create a graph (SObA) display of topics (based on GO process)
 
  
=== SPELL problem with WS265 ===
+
[[WormBase-Caltech_Weekly_Calls_July_2020|July]]
* Wen had to debug; SPELL has limit of how many genes can be processed per data set (46,340)
 
* Wen trying to accommodate, deleting some genes from data set that had no expression (kludge)
 
* Wen will write to Matt Hibbs to ask how to deal with
 
* Will Alliance work together on a system to analyze large scale expression data?
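A note on the 46,340 figure: it equals the integer square root of the largest signed 32-bit integer. This suggests, as an assumption not confirmed in these notes, that SPELL indexes something of size genes × genes with a 32-bit signed integer. A quick arithmetic check:

```python
import math

INT32_MAX = 2**31 - 1  # largest signed 32-bit integer (2,147,483,647)

# Largest n such that n * n still fits in a signed 32-bit integer.
limit = math.isqrt(INT32_MAX)

print(limit)                         # 46340
print(limit * limit <= INT32_MAX)    # True
print((limit + 1) ** 2 > INT32_MAX)  # True
```

If that assumption holds, deleting unexpressed genes only postpones the problem; the underlying fix would be a wider integer type in the SPELL code.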
 
  
=== Curator candidate ===
+
[[WormBase-Caltech_Weekly_Calls_August_2020|August]]
* Will arrive at 10am
 
* Skype calls with remote curators
 
* Curators will send group Skype handle and requested time to talk
 
  
  
== May 10, 2018 ==
+
== September 3, 2020 ==
  
=== ECO terms for genome editing ===
+
=== WS279 Citace upload ===
* Asking group's feedback on ECO terms for genome editing (Daniela)
+
* September 25
* What would be used for fly-enhancer trap experiment?
+
* Local CIT upload (to Wen) Tuesday September 22 by 10am Pacific
* Genomically encoded GFP, for example
 
* ECO: GFP localization
 
* ECO term, example: Fluorescent protein transcript localization evidence
 
* Single copy transgene? Endogenous locus?
 
* Whether it is CRISPR or not may not be relevant
 
* May request ECO terms that capture distinct types of transgenes evidence
 
* Will use generic term for now
 
* Do other MODs use ECO?
 
* May want to capture endogenous/non-endogenous, multi-copy/single-copy distinctions
 
* Many of these features are captured in the transgene and construct objects already; specific ECO code redundant?
 
  
=== ZFIN SAB ===
+
=== New AFP datatype for curation status form (CSF)? ===
* Significant involvement in Alliance
+
* afp_othergenefunc (to capture gene function other than enzymatic activity)
* Interest in micropublications; will push a pilot
+
* Can wait a bit before adding to CSF; if there will be dedicated curation for that data type we may add later
* June 19, moving from older DB to Postgres
 
* Investigating automation for some curation processes
 
* Students review which papers to include; acquire PDF
 
* Curate paper-by-paper
 
  
=== Alliance ===
+
=== GO annotation for description ===
* Supplement request for year 3, due soon (May 15)
+
* Kimberly will look into 'male tail tip development' terms for description
* Formal report from NHGRI, 18-month plan looks good
 
* Further future plans (from NHGRI perspective) aren't quite clear
 
* Software infrastructure?
 
* Central- vs. MOD-control of resources questions
 
* Likely will have to write a NIH proposal in Fall or Winter
 
* How much is Alliance going to handle human variants?
 
* NHGRI interested in metabolomics; Alliance plans?
 
  
=== Genome-wide screens ===
+
=== Migrate wobr1 server to AWS ===
* Had help desk question about phenotypic screens in organisms other than worms, flies, yeast, bacteria
+
* SOLR, WOBr, SObA, Enrichment analysis
* There have been human cell line phenotypic screens (e.g. siRNA/shRNA); who curates these, if anyone?
+
* Working with Sibyl on the process
* Also, induced pluripotent stem cell experiments
+
* Will try migrating wobr1 first, as a test case
 +
* May eventually move, for example, Tazendra
 +
** Will there be drawbacks to doing this?
 +
** Can we regularly ssh into Tazendra on AWS without issue? Costs for transferring large files
 +
* Don't yet know the details of the costs, but we can try and keep track
 +
* We should move into WB or Alliance AWS instances (or Stanford)
  
=== Nameserver issues ===
+
=== WormBase talk at Boston Area Worm Meeting ===
* Issue came up about assigning unique WBStrain IDs
+
* https://www.umassmed.edu/ambroslab/meetings/bawm/
* Can use a nightly nameserver dump from Hinxton to populate Postgres/OA
+
* Meeting will be virtual on Zoom
* Will need to clean up existing strains in Postgres
+
* Chris will give a talk September 23rd, 6pm Eastern, 3pm Pacific (probably first speaker but not sure)
* Also, considering unique IDs for genotypes
+
* Send topic requests to Chris for the talk
* Mechanics of naming and managing naming of objects
+
* Current topics:
** Nightly syncing (cronjob) to nameserver
+
** Micropublications
** Ideally, we would have instant updates; Hinxton firewall prevents direct access; Matt working on establishing a separate nameserver location to gain direct access
+
** Author First Pass
** Strain names (at least historically) have been updated quarterly from CGC file
+
** Automated Gene Descriptions
** Curators need mechanism to create and use strain (and variation) names right away
+
** Community Curation
** Current system requires manual denormalization step; has worked so far
+
** WB Query Tools
 +
* Chris will ask organizers if others can join: post on WormBase blog? May draw too large an audience for the Zoom channel
 +
* If the organizers record the talk, we can post it on the blog and WB YouTube channel
  
  
== May 17, 2018 ==
+
== September 10, 2020 ==
  
=== Strains ===
+
=== GO GAF Files ===
* Create a strain OA? Central curation tool for strain data?
+
* WS278 GO GAF is using the new 2.2 file format
* Would need to maintain synchrony with CGC and Hinxton
+
* [https://github.com/geneontology/geneontology.github.io/blob/issue-go-annotation-2917-gaf-2_2-doc/_docs/go-annotation-file-gaf-format-22.md Specifications]
* Postgres/Tazendra variation adding CGI: http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/generic.cgi?action=TempVariationObo
+
* GO will not be emitting GAF 2.2 until the end of the year (will try to get a fixed date)
* Will add similar link for Strains, adding info to obo_name_strain and obo_data_strain as well as a tempfile, which will those objects in postgres when the nightly_geneace.pl updates the OA strain info
+
* Implications for gene descriptions, but what about other tools, applications at WB?
 +
* Some errors in the current WS278 GAF, but will get fixed soon
 +
** Impacts gene descriptions for WS278, so using WS277 for now; will be OK once source GAF is fixed
  
=== SAB 2018 ===
+
===Disease files on FTP===
* Who will present? Present what?
+
*No changes for WS278 FTP files, except an extra custom file that Hinxton is producing mostly for gene descriptions so that we don't get the wrong human diseases via poor orthology calls (see GitHub issue https://github.com/WormBase/website/issues/7839)
* We can generate a central document with stats to give to SAB
+
*Will be some consolidation of disease data possibly for WS279 FTP files, implications for downstream tools--gene descriptions, WOBR, ??
* Ask SAB for opinions and guidance?
 
* Would be good to assess current efforts and priorities, ask if we should stay the course or make modifications to our approach
 
  
 +
=== WormBase talk at Worcester Area Worm Meeting ===
 +
* Chris set to give talk on December 1st, 2020, via Zoom, 4pm (or 4:30pm) EST / 1pm Pacific
 +
* 30 minute slot; complementary to BAWM talk?
 +
* WormBase members can attend
  
== May 24th, 2018 ==
+
=== WormBase talk at Boston Area Worm Meeting ===
 +
* https://www.umassmed.edu/ambroslab/meetings/bawm/
 +
* Chris will give a talk September 23rd (< 2 weeks), 6pm Eastern, 3pm Pacific (probably first speaker but not sure)
 +
* WormBase members can attend; Zoom link will be sent next Monday
  
=== Helpdesk ===
+
=== Worm Anatomy Ontology Fixes ===
*[https://github.com/WormBase/website/issues/6423 Not sure if this falls under content....]
+
* Raymond working on addressing warnings and errors (missing definitions, duplicate labels, duplicate definitions)
* User asked about phenotype information being incorporated into the gene description
+
* Assessing the best way to address them
* Chris directed him to phenotype submission form; he has since submitted the phenotypes
+
* Would be good to be able to automate some edits; options:
* Phenotype data hasn't yet been incorporated into the automated descriptions pipeline
+
** Use OWL API (need someone proficient in coding with OWL API)
* We can direct him to the gene description submission form
+
** Convert to OBO, programmatically edit, convert back to OWL?
* Ranjana will respond to user email
+
** Use Cellfie plugin for Protege?
* Will likely swap out older WB automated description pipeline with newer Alliance automated description pipeline
+
** Should discuss with Nico once we have a sense as to what changes need to be made
  
=== Relations Ontology ===
+
=== Dead Variations in Postgres ===
*Working on new model for importing RO terms into WormBase
+
* Currently pulling in variations/alleles from Postgres to populate "allele" and "genetic perturbation" categories in Textpresso
*Question: Do we need to import all of RO?
+
* Noticed that many transgene names were being included, which can result in false positives for the categories
**For example, RO also has terms from other ontologies:
+
* Looking at Postgres table "obo_data_variation", there are ~100,000 entries, ~40,000 with status "Dead" and ~60,000 with status "Live"
    [Term]
+
* Trying to determine the history of the "Dead" variations, where most of the transgene names are coming from
    id: GO:0003674
+
* May still want to include "Dead" variations/alleles in Textpresso category for historical reasons
    name: molecular_function
+
* We could use pattern matching to filter out transgene names
    is_a: BFO:0000015 ! process
+
* Possible to search C. elegans corpus for all "real" allele public names in the "Dead" set to see if they should still be included in categories
    property_value: IAO:0000589 "molecular process" xsd:string
 
*Question: How much term information do we want and/or need?
 
**For example:
 
    [Typedef]
 
    id: BFO:0000050
 
    name: part of
 
    def: "a core relation that holds between a part and its whole" []
 
    property_value: IAO:0000111 "is part of" xsd:string
 
    property_value: IAO:0000112 "my brain is part of my body (continuant parthood, two material entities)" xsd:string
 
    property_value: IAO:0000112 "my stomach cavity is part of my stomach (continuant parthood, immaterial entity is part of material entity)" xsd:string
 
    property_value: IAO:0000112 "this day is part of this year (occurrent parthood)" xsd:string
 
    property_value: IAO:0000116 "Everything is part of itself. Any part of any part of a thing is itself part of that thing. Two distinct things cannot be part of each other." xsd:string
 
    property_value: IAO:0000116 "Occurrents are not subject to change and so parthood between occurrents holds for all the times that the part exists. Many continuants are subject to change, so parthood between continuants will only hold at
 
    certain times, but this is difficult to specify in OWL. See https://code.google.com/p/obo-relations/wiki/ROAndTime" xsd:string
 
    property_value: IAO:0000116 "Parthood requires the part and the whole to have compatible classes: only an occurrent can be part of an occurrent; only a process can be part of a process; only a continuant can be part of a continuant; only an
 
    independent continuant can be part of an independent continuant; only an immaterial entity can be part of an immaterial entity; only a specifically dependent continuant can be part of a specifically dependent continuant; only a generically
 
    dependent continuant can be part of a generically dependent continuant. (This list is not exhaustive.)\n\nA continuant cannot be part of an occurrent: use 'participates in'. An occurrent cannot be part of a continuant: use 'has participant'. A
 
    material entity cannot be part of an immaterial entity: use 'has location'. A specifically dependent continuant cannot be part of an independent continuant: use 'inheres in'. An independent continuant cannot be part of a specifically dependent
 
    continuant: use 'bearer of'." xsd:string
 
    property_value: IAO:0000118 "part_of" xsd:string
 
    property_value: RO:0001900 RO:0001901
 
    property_value: seeAlso http://ontologydesignpatterns.org/wiki/Community:Parts_and_Collections
 
    property_value: seeAlso http://ontologydesignpatterns.org/wiki/Submissions:PartOf
 
    property_value: seeAlso http://www.obofoundry.org/ro/#OBO_REL:part_of xsd:string
 
    is_transitive: true
 
    is_a: RO:0002131 ! overlaps
 
    inverse_of: BFO:0000051 ! has part
 
  
*For curation, we could import only the BFO and RO ID spaces of RO, but include all of the tag-value pairs (the usage examples might be helpful)
+
=== WS279 Citace Upload ===
*For WB, though, we could ingest only the BFO and RO ID spaces of RO, and only include in the model: id, name, def, is_a, domain, range, and inverse_of tags
+
* Local Caltech upload to Spica, Tuesday September 22, 10am Pacific
**We can always link out from WB to pages with more detail on RO terms
 
  
*Will use RO ids in ?GO_annotation model in the Annotation_relation part (model will need an update)
 
**The ?GO_annotation model also refers to relations used in annotation extensions.  Unfortunately, though, not all annotation extension relations are in RO, so we can't yet use RO in this part of the model.
 
**We can either import these other GO relations as a separate class, or import them if/when they get included in RO (there is a PRO/RO meeting scheduled for late October with some preliminary phone conferences prior).
 
**The Alliance Gene Expression group is also dealing with this issue.
 
*Where can we use RO terms in other curation models?
 
* Kimberly will add to the agenda for the next WB site-wide conference call
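The slimmed-down RO import proposed above (BFO and RO ID spaces only, with id, name, def, is_a, domain, range, and inverse_of tags) could be sketched roughly as follows. This is a minimal sketch that assumes the ontology is available as OBO-format text; the function name `slim_stanzas` and the sample text are illustrative only and not part of any existing WormBase pipeline.

```python
KEEP_PREFIXES = ("BFO:", "RO:")
KEEP_TAGS = {"id", "name", "def", "is_a", "domain", "range", "inverse_of"}

def slim_stanzas(obo_text):
    """Return [(stanza_type, [(tag, value), ...])] for BFO/RO stanzas only."""
    stanzas, current = [], None
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            # Start of a new stanza such as [Term] or [Typedef].
            current = (line[1:-1], [])
            stanzas.append(current)
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            current[1].append((tag, value))
    kept = []
    for stanza_type, pairs in stanzas:
        ids = [value for tag, value in pairs if tag == "id"]
        # Keep only stanzas whose id is in the BFO or RO ID space.
        if ids and ids[0].startswith(KEEP_PREFIXES):
            kept.append((stanza_type, [(t, v) for t, v in pairs if t in KEEP_TAGS]))
    return kept

# Abbreviated versions of the two stanzas quoted in the notes.
SAMPLE = """\
[Term]
id: GO:0003674
name: molecular_function
is_a: BFO:0000015 ! process

[Typedef]
id: BFO:0000050
name: part of
property_value: IAO:0000118 "part_of" xsd:string
is_transitive: true
is_a: RO:0002131 ! overlaps
inverse_of: BFO:0000051 ! has part
"""

for stanza_type, pairs in slim_stanzas(SAMPLE):
    print(stanza_type, pairs)
```

On the sample, the GO term is dropped and the BFO:0000050 typedef keeps only its id, name, is_a, and inverse_of lines. For the curation-side import, the same ID-space filter could be applied with the tag filter removed so the usage examples are retained.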
 
  
=== Methods in Molecular Biology book ===
+
== September 17, 2020 ==
* Eukaryotic Genome Databases, has WormBase chapter
 
* Book arrived at Caltech
 
* Chris will ask publishers about getting PDFs without watermarks
 
  
=== ICBO 2018 meeting ===
+
=== Species errors in CITace ===
* International Conference on Biological Ontology (2018)
+
* Wen is reviewing species coming from CIT and will send individual curators lists of species that need correction
* Raymond considering submitting abstract
 
* Not clear if it needs to be a full paper; can we resubmit Biocuration meeting abstract?
 
* Can the content be published elsewhere once submitted to the meeting?
 
  
 +
=== Webinars ===
 +
* We could perform a general WormBase webinar to cover all of WormBase, maybe one hour long
 +
* Allow 20-30 minutes for discussion? Can allow interruptions but plan for one hour
 +
* Follow up with more specific webinars in following months: JBrowse, WormMine, Micropublication, AFP, Textpresso Central, Ontology browser & Enrichment tools, SPELL, ParaSite BioMart
 +
* Chris and Wen can discuss how to set up
 +
* Should we have people register? Maybe
 +
* AFP probably doesn't need an hour; maybe split an hour across micropublications and AFP?
  
 +
=== Transcription factors and regulatory networks ===
 +
* Had another question about TFs, asking for common TFs for a list of genes
 +
* An issue is that the TF binding data we have is in disparate forms, trying to reconcile
 +
* We have a ?Transcription_factor class; it would be good to update and integrate with other related data types
  
== May 31st, 2018 ==
+
=== Alzheimer's disease portal ===
 +
* Funding has been awarded for Alzheimer's research
 +
* Ranjana: should the disease working group consider more about an Alliance Alzheimer's disease portal?
 +
* Paul S: Alliance SAB tomorrow; we'll see what the SAB says
 +
* Can look at other resources like RGD disease portals and Reactome disease-related pathway models
 +
* Ruth Lovering is doing some work in this regard
  
=== Feedback from Front Range Worm Meeting ===
+
=== GO meeting ===
* Is it possible to collect old theses online and load them into Textpresso?
+
* All are welcome to attend
* Shall we suggest authors put "elegans" in titles and abstracts? Min Han said some of his papers do not have this keyword.  
+
* Will discuss GAF format changes, etc.
* Community curation. Erin Osborne Nishimura mentioned camps and courses for undergraduate research. Can they do allele and phenotype curation for WormBase? Who will follow up with her?
+
* Ranjana: should she and Valerio attend to learn about GO changes that affect the gene descriptions pipeline?
* Shall we send the AFP form to some users for feedback?
+
** Kimberly: there could be a particular breakout group that Ranjana and Valerio could attend to discuss
 +
 
 +
=== New GAF 2.2 file ===
 +
* Kimberly has reviewed and sent feedback to Michael P
 +
* Valerio would like to stay in the loop to test the new files
 +
 
 +
=== Data mining tool comparison sheet ===
 +
* https://docs.google.com/spreadsheets/d/1vBTDBOKfXn9GcdpF1bXI62VEJ7hwyz2hOyAZcoV1_ng/edit?usp=sharing
 +
* Needs an update
 +
* Could we make this available to users? A link in the Tools menu?
 +
* Is this useful to users? Would they understand it? Maybe be better as a curator resource
 +
* Could this be micropublished?
 +
** Possible; may want to consider a series of publications with videos of webinars, etc.

Latest revision as of 16:39, 17 September 2020

Previous Years

2009 Meetings

2011 Meetings

2012 Meetings

2013 Meetings

2014 Meetings

2015 Meetings

2016 Meetings

2017 Meetings

2018 Meetings

2019 Meetings



2020 Meetings

January

February

March

April

May

June

July

August


September 3, 2020

WS279 Citace upload

  • September 25
  • Local CIT upload (to Wen) Tuesday September 22 by 10am Pacific

New AFP datatype for curation status form (CSF)?

  • afp_othergenefunc (to capture gene function other than enzymatic activity)
  • Can wait a bit before adding to CSF; if there will be dedicated curation for that data type we may add later

GO annotation for description

  • Kimberly will look into 'male tail tip development' terms for description

Migrate wobr1 server to AWS

  • SOLR, WOBr, SObA, Enrichment analysis
  • Working with Sibyl on the process
  • Will try migrating wobr1 first, as a test case
  • May eventually move, for example, Tazendra
    • Will there be drawbacks to doing this?
    • Can we regularly ssh into Tazendra on AWS without issue? Costs for transferring large files
  • Don't yet know the details of the costs, but we can try and keep track
  • We should move into WB or Alliance AWS instances (or Stanford)

WormBase talk at Boston Area Worm Meeting

  • https://www.umassmed.edu/ambroslab/meetings/bawm/
  • Meeting will be virtual on Zoom
  • Chris will give a talk September 23rd, 6pm Eastern, 3pm Pacific (probably first speaker but not sure)
  • Send topic requests to Chris for the talk
  • Current topics:
    • Micropublications
    • Author First Pass
    • Automated Gene Descriptions
    • Community Curation
    • WB Query Tools
  • Chris will ask organizers if others can join: post on WormBase blog? May draw too large an audience for the Zoom channel
  • If the organizers record the talk, we can post it on the blog and WB YouTube channel


September 10, 2020

GO GAF Files

  • WS278 GO GAF is using the new 2.2 file format
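  • Specifications: https://github.com/geneontology/geneontology.github.io/blob/issue-go-annotation-2917-gaf-2_2-doc/_docs/go-annotation-file-gaf-format-22.md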
  • GO will not be emitting GAF 2.2 until the end of the year (will try to get a fixed date)
  • Implications for gene descriptions, but what about other tools, applications at WB?
  • Some errors in the current WS278 GAF, but will get fixed soon
    • Impacts gene descriptions for WS278, so using WS277 for now; will be OK once source GAF is fixed
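Since some tools may receive either GAF version for a while, a downstream script could check the file's declared version before parsing. The sketch below relies on the `!gaf-version:` header line that GAF files carry; the function name `gaf_version` and the sample lines are hypothetical.

```python
def gaf_version(lines):
    """Return the version from a '!gaf-version:' header line, or None if absent."""
    for line in lines:
        if not line.startswith("!"):
            break  # header comments end at the first annotation row
        if line.lower().startswith("!gaf-version:"):
            return line.split(":", 1)[1].strip()
    return None

# Illustrative input only; the annotation row is a placeholder, not real data.
sample_lines = [
    "!gaf-version: 2.2",
    "!generated-by: example",
    "WB\tEXAMPLE_ID\texample-1",
]
print(gaf_version(sample_lines))  # 2.2
```

A pipeline such as the gene descriptions one could use the returned value to choose a parser or to stop early with a clear message when it meets a version it does not yet support.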

Disease files on FTP

  • No changes for WS278 FTP files, except an extra custom file that Hinxton is producing mostly for gene descriptions so that we don't get the wrong human diseases via poor orthology calls (see GitHub issue https://github.com/WormBase/website/issues/7839)
  • Will be some consolidation of disease data possibly for WS279 FTP files, implications for downstream tools--gene descriptions, WOBR, ??

WormBase talk at Worcester Area Worm Meeting

  • Chris set to give talk on December 1st, 2020, via Zoom, 4pm (or 4:30pm) EST / 1pm Pacific
  • 30 minute slot; complementary to BAWM talk?
  • WormBase members can attend

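WormBase talk at Boston Area Worm Meeting

  • https://www.umassmed.edu/ambroslab/meetings/bawm/
  • Chris will give a talk September 23rd (< 2 weeks), 6pm Eastern, 3pm Pacific (probably first speaker but not sure)
  • WormBase members can attend; Zoom link will be sent next Monday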

Worm Anatomy Ontology Fixes

  • Raymond working on addressing warnings and errors (missing definitions, duplicate labels, duplicate definitions)
  • Assessing the best way to address them
  • Would be good to be able to automate some edits; options:
    • Use OWL API (need someone proficient in coding with OWL API)
    • Convert to OBO, programmatically edit, convert back to OWL?
    • Use Cellfie plugin for Protege?
    • Should discuss with Nico once we have a sense as to what changes need to be made
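For the "convert to OBO, programmatically edit" option, the checks themselves (missing definitions, duplicate labels) are simple to script once the ontology is in OBO text form. The following is a minimal sketch using only the standard library; the function names and the `EX:` sample IDs are placeholders, and it only reports problems rather than editing the file.

```python
from collections import defaultdict

def parse_obo_terms(text):
    """Return one dict (tag -> list of values) per [Term] stanza."""
    terms, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # Only [Term] stanzas are collected; others (e.g. [Typedef]) are skipped.
            current = defaultdict(list) if line == "[Term]" else None
            if current is not None:
                terms.append(current)
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            current[tag].append(value)
    return terms

def audit_terms(terms):
    """Return (ids lacking a def, {label: [ids]} for labels used more than once)."""
    by_label = defaultdict(list)
    missing_def = []
    for term in terms:
        term_id = term.get("id", ["<no id>"])[0]
        for label in term.get("name", []):
            by_label[label].append(term_id)
        if not term.get("def"):
            missing_def.append(term_id)
    duplicates = {label: ids for label, ids in by_label.items() if len(ids) > 1}
    return missing_def, duplicates

# Placeholder stanzas, not real anatomy terms.
SAMPLE = """\
[Term]
id: EX:0000001
name: example cell
def: "A placeholder definition." []

[Term]
id: EX:0000002
name: example cell

[Typedef]
id: part_of
name: part of
"""

missing, dups = audit_terms(parse_obo_terms(SAMPLE))
print(missing)  # ['EX:0000002']
print(dups)     # {'example cell': ['EX:0000001', 'EX:0000002']}
```

Duplicate definitions could be found the same way by grouping on the `def` value instead of `name`. Whether a round trip through OBO preserves everything in the OWL source is a separate question to raise with Nico.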

Dead Variations in Postgres

  • Currently pulling in variations/alleles from Postgres to populate "allele" and "genetic perturbation" categories in Textpresso
  • Noticed that many transgene names were being included, which can result in false positives for the categories
  • Looking at Postgres table "obo_data_variation", there are ~100,000 entries, ~40,000 with status "Dead" and ~60,000 with status "Live"
  • Trying to determine the history of the "Dead" variations, where most of the transgene names are coming from
  • May still want to include "Dead" variations/alleles in Textpresso category for historical reasons
  • We could use pattern matching to filter out transgene names
  • Possible to search C. elegans corpus for all "real" allele public names in the "Dead" set to see if they should still be included in categories
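The pattern-matching idea could look something like the sketch below. It assumes transgene names follow the usual C. elegans convention of a lowercase lab allele prefix, then Is, Ex, Si, or Ti, then a number; the exact pattern, the function name `split_names`, and the sample names are illustrative and would need checking against the actual contents of obo_data_variation.

```python
import re

# Assumed transgene shape: 1-3 lowercase letters, then Is/Ex/Si/Ti, then digits.
TRANSGENE_RE = re.compile(r"^[a-z]{1,3}(?:Is|Ex|Si|Ti)\d+$")

def split_names(names):
    """Split public names into (likely alleles, likely transgenes)."""
    alleles, transgenes = [], []
    for name in names:
        if TRANSGENE_RE.match(name):
            transgenes.append(name)
        else:
            alleles.append(name)
    return alleles, transgenes

# Illustrative names only.
names = ["e1370", "syIs123", "oxEx45", "oxSi10", "n765"]
alleles, transgenes = split_names(names)
print(alleles)     # ['e1370', 'n765']
print(transgenes)  # ['syIs123', 'oxEx45', 'oxSi10']
```

Names that match neither shape cleanly would still fall into the allele list, so the corpus search for "real" allele names mentioned above remains a useful second check.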

WS279 Citace Upload

  • Local Caltech upload to Spica, Tuesday September 22, 10am Pacific


September 17, 2020

Species errors in CITace

  • Wen is reviewing species coming from CIT and will send individual curators lists of species that need correction

Webinars

  • We could perform a general WormBase webinar to cover all of WormBase, maybe one hour long
  • Allow 20-30 minutes for discussion? Can allow interruptions but plan for one hour
  • Follow up with more specific webinars in following months: JBrowse, WormMine, Micropublication, AFP, Textpresso Central, Ontology browser & Enrichment tools, SPELL, ParaSite BioMart
  • Chris and Wen can discuss how to set up
  • Should we have people register? Maybe
  • AFP probably doesn't need an hour; maybe split an hour across micropublications and AFP?

Transcription factors and regulatory networks

  • Had another question about TFs, asking for common TFs for a list of genes
  • An issue is that the TF binding data we have is in disparate forms, trying to reconcile
  • We have a ?Transcription_factor class; it would be good to update and integrate with other related data types

Alzheimer's disease portal

  • Funding has been awarded for Alzheimer's research
  • Ranjana: should the disease working group consider more about an Alliance Alzheimer's disease portal?
  • Paul S: Alliance SAB tomorrow; we'll see what the SAB says
  • Can look at other resources like RGD disease portals and Reactome disease-related pathway models
  • Ruth Lovering is doing some work in this regard

GO meeting

  • All are welcome to attend
  • Will discuss GAF format changes, etc.
  • Ranjana: should she and Valerio attend to learn about GO changes that affect the gene descriptions pipeline?
    • Kimberly: there could be a particular breakout group that Ranjana and Valerio could attend to discuss

New GAF 2.2 file

  • Kimberly has reviewed and sent feedback to Michael P
  • Valerio would like to stay in the loop to test the new files

Data mining tool comparison sheet