Difference between revisions of "WormBase-Caltech Weekly Calls"

From WormBaseWiki
 
 
[[WormBase-Caltech_Weekly_Calls_2018|2018 Meetings]]
 
  
 +
[[WormBase-Caltech_Weekly_Calls_2019|2019 Meetings]]
  
GoToMeeting link: https://www.gotomeet.me/wormbase1
 
  
  
= 2019 Meetings =
 
  
[[WormBase-Caltech_Weekly_Calls_January_2019|January]]
+
= 2020 Meetings =
  
[[WormBase-Caltech_Weekly_Calls_February_2019|February]]
+
[[WormBase-Caltech_Weekly_Calls_January_2020|January]]
  
[[WormBase-Caltech_Weekly_Calls_March_2019|March]]
+
[[WormBase-Caltech_Weekly_Calls_February_2020|February]]
  
[[WormBase-Caltech_Weekly_Calls_April_2019|April]]
+
[[WormBase-Caltech_Weekly_Calls_March_2020|March]]
  
[[WormBase-Caltech_Weekly_Calls_May_2019|May]]
+
[[WormBase-Caltech_Weekly_Calls_April_2020|April]]
  
 +
[[WormBase-Caltech_Weekly_Calls_May_2020|May]]
  
== June 6, 2019 ==
 
  
=== New SObA graphs ===
+
== June 4, 2020 ==
* May put graphs within existing widgets, but don't need to rush to get that ready for IWM
 
  
=== Phenotype association file format ===
+
=== Citace (tentative) upload ===
* May be best to leave the format as is
+
* CIT curators upload to citace on Tuesday, July 7th, 10am Pacific
* There are problems; paper IDs keep switching columns
+
* Citace upload to Hinxton on Friday, July 10th
* Would need to revisit the reasoning for why we do it that way
 
* When will the Alliance produce a similar/replacement file? Not sure
 
  
=== Phenotype requests ===
+
=== Caltech reopening ===
* Sent out 1140 emails on May 30
+
* Paul looking to get plan approved
* Since then, have received 374 annotations from 54 papers (42 requested, 12 additional)
+
* People that want to come to campus need to watch training video
* 21 papers flagged as not having phenotypes
+
* Masks available in Paul's lab
* Of the 1140 papers emailed about, 35 emails bounced; in the first week, received some flagging or curation on 63 (63/1105 = ~6% response rate)
+
* Can have maximum of 3 people in WormBase rooms at a time; probably best to only allow one person per WB room
 +
** Could possibly have 2 people in big room (Church 64) as long as they stay at least 10 feet apart
 +
* Need to coordinate, maybe make a Google calendar to do so (also Slack)
 +
* Before and after you go to campus, you need to take your temperature and assess your symptoms (if any) and submit info on form
 +
* Also, need to submit who you were in contact with for contact tracing
 +
* The same form is used all week; hold on to it until asked to submit it
 +
* If someone goes in to the office, they could print several forms for people to pick up in WB offices
  
 +
=== Nameserver ===
 +
* Nameserver was down
 +
* CIT curators would still like to have a single form to interact with
 +
* Is it possible to create objects at Caltech and let a cronjob assign IDs via the nameserver? May not be a good idea
 +
* Still putting genotype and all info for a strain in the reason/why field in the nameserver
 +
* We plan to eventually connect strains to genotypes, but need model changes and curation effort to sort out
 +
* Hinxton is pulling in CGC strains, how often?
 +
* Caltech could possibly get a block of IDs
  
== June 13, 2019 ==
+
=== Alliance SimpleMine ===
 +
* Any updates? 3.1 feature freeze is tomorrow
 +
* Pending on PI decision; Paul S. will bring it up tomorrow on the Alliance PI call
  
=== IWM ===
 
* Coordinating transportation of swag boxes to Pauley Pavilion
 
* Workshop on Saturday June 22, from 1pm to 2:30pm
 
* Saturday morning micropublication breakfast 7:30 - 8:30am
 
* Workshop
 
** Presenters: it may be best to present as use cases rather than a research project
 
** Chris will cover SimpleMine for Wen
 
** Chris: won't do live demo; only screenshots, maybe some video
 
* Paul's lab will do marathon bibs to show lab affiliation and graphical abstract
 
* Paul's talk
 
** Cover Alliance
 
** New features
 
*** SObA (for new data)
 
*** Complete for protein-protein interactions
 
*** RNASeq tools
 
*** Updated automated gene concise descriptions?
 
** Phenotype community curation
 
*** Chris will send Paul numbers on: top community curators, overall stats (number of annotations, papers, curators)
 
** Author First Pass
 
** Micropublication
 
  
=== SGD SAB ===
+
== June 11, 2020 ==
* Paul attended
 
* Alliance publicity was discussed
 
* SAB likes the Alliance orthology features
 
* Working on topics: displaying papers and data
 
* Pathways: discussion about best approach
 
* Metabolic engineering
 
* Meta data about RNASeq data
 
** SPELL tool, basically only tool of its kind available; need new tools
 
* Species-specific proteins: how best to find them? HMMs (jackhmmer)?
 
  
=== Concise descriptions ===
+
=== Name Service ===
* Progress being made within the Alliance to update the automated concise gene descriptions
+
* Testing site now up; linked to Mangolassi
* We will still accept manually written descriptions and display them in parallel with automated descriptions
+
* CGI from Juancarlos not accepting all characters, including double quotes like "
 +
* Example submission that fails via CGI
 +
WBPaper000XXXX; genotype: blah::' " ` / < > [ ] { } ? , . ( ) * ^ & % $ # @ ! \ | &alpha; &beta; Ω ≈ µ ≤ ≥ ÷ æ … ˚ ∆ ∂ ß œ ∑ † ¥ ¨ ü i î ø π “  ‘ « • – ≠ Å ´ ∏ » ± — ‚ °
 +
* Juancarlos will look into and try to fix
  
=== Micropublications ===
+
=== Alliance Literature group ===
* If people are requesting manually written gene descriptions, they could submit a microreview
+
* Textpresso vs. OntoMate vs. PubMed
* Concern was expressed about how to handle a really high throughput of submissions:
+
* Still some confusion about which tasks can be performed in each tool
** Daniela: Working towards automating as much of the processing pipeline as possible
+
* Working on collecting different use cases on spreadsheet
** Raymond: The throughput will be handled appropriately depending on demand; priority scheme will help
+
* Sentence-based search is big strength of Textpresso
** Not getting lots of submissions yet, probably won't be inundated in the near future
+
* At latest meeting performed some large searches for OntoMate and Textpresso
** Karen: tools are still being developed; the platform is not being advertised as much as it could be; will ramp up outreach and communication once tools are in place to handle more submissions
+
* Literature acquisition: still needs work
* Karen: Micropublications team will reach out to curators to help build submission forms for respective data types
+
** Using SVM vs. Textpresso search to find relevant papers
 +
** Species based SVM? Currently use string matching to derive different corpora
 +
** Finding genes and determining which species those genes belong to?
  
=== Undiagnosed Disease Network data ===
+
=== Alliance priorities? ===
* Andy Golden will meet with Ranjana and Chris at IWM to discuss
+
* Transcription regulatory networks
* Andy asked about protocol pages at WormBase?
+
* Interactions can focus on network viewer eventually
* Paul: Bioprotocols and Protocols IO
+
** May want different versions/flavors of interaction viewers
* Maybe we could interface with those existing resources to link to relevant protocols from WormBase (and WormBook)
+
** May also want to work closely with GO and GO-CAMs
 +
* Gene descriptions can focus on information poor genes, protein domains, etc.
  
 +
=== Sandbox visual cues ===
 +
* Juancarlos and Daniela will discuss ways to provide visual cues that a curator is on a sandbox form (on Mangolassi) vs live form (on Tazendra)
 +
* AFP and Micropub dev sites have indicators
 +
* Could play with changing the background color? Maybe too hard to look at?
 +
* Change the color of the title of the form, e.g. the OA?
 +
* Will add red text "Development Site" at top of the OA form
  
==June 27th, 2019==
+
=== Evidence Code Ontology ===
===IWM 2019: Feedback from Users===
+
* Kimberly and Juancarlos have worked on a parser
* Anatomy term synonym search
+
* Will load into ACEDB soon
** User pointed out that "RnB" search returns 0 results; GitHub ticket made to index anatomy synonyms
 
* Ciliated neurons
 
** User pointed out that male ciliated neurons are missing as a subclass of term "ciliated neuron"; GitHub ticket made, easy fix
 
* Import of 22G and 26G RNAs
 
** Spoke to Julie Claycomb
 
** These are short RNAs transcribed by RNA-dependent RNA Polymerase (RdRP) off of mRNA molecules
 
** Should these be instantiated as gene objects in WormBase? Julie argues that these are not genes
 
*** Should these just be transcript objects? Would they be linked to a gene? Or maybe also to any transcripts from which they could be derived?
 
** Many map uniquely to the genome, but some map in multiple locations
 
** Associated data for now would likely just be protein-RNA interactions (Argonaute-RNA interactions)
 
*** May eventually include phenotype and/or gene ontology (biological process) annotations
 
* Ranjana & Chris spoke with Andy Golden
 
** Andy and his lab will submit phenotype and disease data as they become available (likely pre-publication); we will likely point to a consortium as source until paper is published
 
** There is still a strong need for cross-species variant mapping/visualization
 
* miRNA binding sites
 
** User asked at workshop and at booth; can we show miRNA binding sites in JBrowse? We would need to collect the data
 
** There are many sources of miRNA-target interactions, some experimental, most predicted
 
*** Chris compiled [https://docs.google.com/spreadsheets/d/19-txXrGi-ROFuByyLnQYvWbKl3o12KZON56qto3MN6g/edit?usp=sharing list of interaction databases] for Alliance interactions working group
 
* Ontology aware diffs of annotations (gene1 expression vs. gene2 expression)
 
* Promoter sequence in experimental constructs
 
* Workshop went well
 
** Next time, maybe have people bring laptops and follow along; be more interactive
 
** We could do webinars, for WormMine for example, allow people to work along with the presentation
 
*** Do other MODs/groups do webinars? How have they been? Useful?
 
** Competing with other workshops during the IWM
 
** Can focus on new, less-used features for webinars, tutorial videos
 
* Hawaiian genome in JBrowse
 
* Had internet stability issues at UCLA; can we get a local, dedicated WiFi?
 
* Next meeting (2021) will likely be in Europe (Cambridge UK?)
 
* User at cGal workshop asked about tissue-specific promoters/transgenes
 
** Have ~30 drivers and ~30 effectors; will WB take them in unpublished? Could make bioRxiv preprint (quick, before peer-review) and/or micropublication (after peer-review)
 
  
=== TAGC meeting ===
 
* Next April (2020)
 
* Alliance representation needed
 
  
===Giving disease model annotations a stable identifier===
+
== June 18, 2020 ==
*Currently disease model annotations get a temporary ID at the time of dump,
 
<pre style="white-space: pre-wrap;
 
white-space: -moz-pre-wrap;
 
white-space: -pre-wrap;
 
white-space: -o-pre-wrap;
 
word-wrap: break-word">
 
Disease_model_annotation : "00000004"
 
Disease_term "DOID:0050833"
 
Disease_of_species "Homo sapiens"
 
Variation "WBVar00275555"
 
Disease_relevant_gene "WBGene00011559"
 
Inferred_gene "WBGene00011559"
 
Association_type "is_implicated_in"
 
Evidence_code "IMP"
 
Genetic_sex "hermaphrodite"
 
Paper_evidence "WBPaper00035924"
 
Database "OMIM" "gene" "613891 "
 
Database "OMIM" "disease" "258900"
 
Curator_confirmed "WBPerson324"
 
Date_last_updated "2017-04-24"
 
</pre>
 
*Would like to institute stable identifiers across releases, so the plan is to call these objects, 'WBDisease_annotation:<number>', so then the above identifier would become 'WBDisease_annotation:00000004', or 'WBDiseaseannot:00000004' or 'WBDiseaseAnnot:00000004'
 
*ID convention--is underscore allowed?
 
*Is 'WBDisease_annotation:00000004' too long for acedb?
 
* Need to ask Kevin, Hinxton; what are the other implications for maintaining and generating persistent, unique IDs
 
  
=== Anatomy ontology issues ===
+
=== Undergrad phenotype submissions ===
* Currently, "intestinal muscle" is considered "part of" intestine
+
* Chris gave presentation to Lina Dahlberg's class about community phenotype curation
** User asked for intestinally expressed genes; using WOBr would also retrieve genes in intestinal muscle
+
* Class took survey about experience with presentation and experience trying to curate worm phenotypes
** David Hall confirmed that intestinal muscle cells are not part of intestine
+
** Survey results: https://www.dropbox.com/s/00cit5aitv8yu27/Dahlberg_class_survey_results.xlsx?dl=0
** Maybe we can change to: "intestinal muscle" part_of "alimentary system"
+
** Some students didn't benefit, but most did; nice feedback!
* Currently, "amphid process" is considered "part of" each type of amphid neuron like AWC, AFD, etc.
+
** Lina intends to publish/micropublish the survey results so please don't share
** Problem is that users looking in WOBr for AWC-expressed genes will be given genes expressed in ANY amphid process regardless of which cell
+
* Since April 24, the class has submitted 171 annotations from 23 papers (some redundant and some still under review)
** Propose to change to: "amphid process" part_of "amphid neuron" only
+
 
 +
=== Special characters in OA/Postgres ===
 +
* There are many special characters in free text entries in the OA; probably all from copy-pasting directly from PDF
 +
* In some cases it seems the special characters cause problems for downstream scripts (e.g. FTP interactions file generator)
 +
* It would probably be good to script the replacement of special characters with their appropriate simple characters or encoded characters
 +
* Juancarlos wrote Perl script on Mangolassi at:
 +
** /home/postgres/work/pgpopulation/grg_generegulation/20200618_summary_characters/get_summary_characters.pl
 +
** Will find bad characters and their pgids for a given Postgres table
 +
** Will find bad data and their pgids for the same table
 +
** People can query their data tables for these characters
 +
* Chris & Wen will work on compiling a list of bad characters that tend to come up
 +
 
 +
=== Citace upload ===
 +
* July 10th citace-to-Hinxton upload
 +
* The citace upload is scheduled for July 7th, but Wen will be on vacation, so curators will upload to Wen on Tuesday, June 30th
 +
 
 +
 
 +
== June 25, 2020 ==
 +
 
 +
=== Caltech Summer Student ===
 +
* Paul has new summer student
 +
** Molecular lesion curation, maybe
 +
** Are early stops more or less likely to be null mutations?
 +
** Alleles are flagged as null in WB in the context of phenotypes
 +
** Would be good to query Postgres for null alleles and work from there
 +
* Fernando
 +
** Anatomy function
 +
** GO curation? Curating transcription factors?
 +
*** Checking for consistent curation
 +
 
 +
=== Worm Community Diversity Meeting ===
 +
* Organized by Ahna Skop and Dana Miller
 +
* Invite posted on Facebook "C. elegans Researchers" group
 +
* Two meetings held: one Thursday (June 18th), one Friday (June 19th)
 +
* Chris attended last Friday (June 19th)
 +
* Worm Board looking to take input and ideas from this meeting and incorporate into meetings and events
 +
* One idea was to document and track outreach efforts and what people have learned from them and organize them in a central location, maybe WormBase or Worm Community Forum
 +
* Also, there was a suggestion to have a tool that could inform potential students of worm labs in their respective local area
 +
** Ask Todd; he used to have a map of researchers; Todd had asked Cecilia to curate lab location and institution
 +
* Person and Laboratory addresses in ACEDB have a different format, so looking to reconcile
 +
* Do we know how many labs are still viable? Check for a paper verified in the last 5 years, or requested strains from the CGC recently
 +
** Most labs were real
 +
 
 +
=== C_elegans Slack group ===
 +
* Called "C_elegans"
 +
* Chris made a "WormBase" channel for people to post questions, comments
 +
* Chris will look into inviting everyone and possibly integrating with help@wormbase.org email list
 +
 
 +
=== WormBase Outreach Webinars ===
 +
* While travel is still restricted, we should consider WormBase webinars
 +
* Scott working on a JBrowse webinar
 +
* Could have a different topic each month
 +
* Should collect topics to cover and assign speakers (maybe multiple speakers per topic; keep it lively)
 +
* Should set up a schedule
 +
* How should we advertise? Can post on blog, twitter, etc.
 +
 
 +
=== New transcripts expanding gene range ===
 +
* Will bring up at next week's site-wide call
 +
* Possibly due to incorporation of newer nanopore reads
 +
* Many genes coming in WS277 have expanded well beyond the gene limits as seen in WS276
 +
** Example genes: pes-2.2, pck-2, herc-1, atic-1
 +
* Has several repercussions:
 +
** WormBase does not submit alleles affecting more than one gene; with these gene expansions suddenly alleles once only affecting a single gene are now affecting two genes, and so are now omitted from loading into the Alliance (including any phenotype and/or disease annotations)
 +
** Some expanded genes are now being attributed with thousands of alleles/variants
 +
 
 +
=== Citace upload ===
 +
* Upload files to Spica/Wen by Tuesday (June 30th) 10am
 +
* Wen will clean up folders in Spica (older files from WS277 not cleared out for some reason)
 +
 
 +
==July 9th, 2020==
 +
===Gene names issue in SimpleMine and other mining tools===
 +
*Wen: Last week, Jonathan Ewbank raised the issue of gene names that may refer to multiple objects.
 +
*This can be an issue for multiple data mining tools, including WormMine, BioMart, and Gene Set Enrichment.
 +
*Perhaps have a standalone approach to check if any gene name among a list may refer to multiple objects (users check their name lists before submitting them to any data mining tool).
 +
*Jae: The public name issue is heterogeneous in nature, so there may be no single solution to all of these problems.
 +
*In gene lists curated from high-throughput studies, confusing usage of public names is probably under 2%, but it still cannot be ignored. Examples:
 +
**A single public name is assigned to multiple WBGene IDs; Wen has a list of these genes
 +
**Overlapping or dicistronic genes, e.g. mrpl-44 and F02A9.10
 +
**Overlapping or dicistronic genes that share a single sequence name, e.g.:
 +
    exos-4.1 and tin-9.2 (B0564.1)
 +
    eat-18 and lev-10 (Y105E8A.7)
 +
    cha-1 and unc-17 (ZC416.8)
 +
 
 +
**Simple confusion by authors, e.g. mdh-1 and mdh-2
 +
*One of the most significant problems is that these gene name issues propagate to other databases and papers.
 +
*We could add a special note to each affected gene page, but people running batch analyses would not easily see it.
 +
*Conclusion: Jae and Wen will work on a tool that lets users "sanitize" their gene lists before submission to data mining tools. They will also write a microPub explaining this issue to the community.
 +
 
 +
===Wormicloud===
 +
*Please test and leave any feedback on the word cloud tool (Wormicloud), https://wormicloud.textpressolab.com/
 +
*Valerio and Jae have worked on a tool that uses data in Textpresso; given a keyword, e.g. "transposon", the tool generates a word cloud and word trend.
 +
*Any keyword can generate a graph that plots trends of occurrence across the years in publication abstracts.
 +
 
 +
===Noctua 2.0 form ready to use===
 +
*Caltech summer student will try using Noctua initially for dauer (neuronal signaling) pathways
 +
 
 +
===Nightly names service updates to postgres===
 +
*A nightly job uses Matt's wb-names-export.jar to get the full output of genes from the datomic/names service and updates postgres based on that.

Latest revision as of 22:11, 9 July 2020

Previous Years

2009 Meetings

2011 Meetings

2012 Meetings

2013 Meetings

2014 Meetings

2015 Meetings

2016 Meetings

2017 Meetings

2018 Meetings

2019 Meetings



2020 Meetings

January

February

March

April

May


June 4, 2020

Citace (tentative) upload

  • CIT curators upload to citace on Tuesday, July 7th, 10am Pacific
  • Citace upload to Hinxton on Friday, July 10th

Caltech reopening

  • Paul looking to get plan approved
  • People that want to come to campus need to watch training video
  • Masks available in Paul's lab
  • Can have maximum of 3 people in WormBase rooms at a time; probably best to only allow one person per WB room
    • Could possibly have 2 people in big room (Church 64) as long as they stay at least 10 feet apart
  • Need to coordinate, maybe make a Google calendar to do so (also Slack)
  • Before and after you go to campus, you need to take your temperature and assess your symptoms (if any) and submit info on form
  • Also, need to submit who you were in contact with for contact tracing
  • The same form is used all week; hold on to it until asked to submit it
  • If someone goes in to the office, they could print several forms for people to pick up in WB offices

Nameserver

  • Nameserver was down
  • CIT curators would still like to have a single form to interact with
  • Is it possible to create objects at Caltech and let a cronjob assign IDs via the nameserver? May not be a good idea
  • Still putting genotype and all info for a strain in the reason/why field in the nameserver
  • We plan to eventually connect strains to genotypes, but need model changes and curation effort to sort out
  • Hinxton is pulling in CGC strains, how often?
  • Caltech could possibly get a block of IDs

Alliance SimpleMine

  • Any updates? 3.1 feature freeze is tomorrow
  • Pending on PI decision; Paul S. will bring it up tomorrow on the Alliance PI call


June 11, 2020

Name Service

  • Testing site now up; linked to Mangolassi
  • CGI from Juancarlos not accepting all characters, including double quotes like "
  • Example submission that fails via CGI
WBPaper000XXXX; genotype: blah::' " ` / < > [ ] { } ? , . ( ) * ^ & % $ # @ ! \ | α β Ω ≈ µ ≤ ≥ ÷ æ … ˚ ∆ ∂ ß œ ∑ † ¥ ¨ ü i î ø π “   ‘ « • – ≠ Å ´ ∏ » ± — ‚ °
  • Juancarlos will look into and try to fix
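
The cause of the CGI failure is not stated in these notes. As a minimal sketch only, assuming the form data travels in a URL or form body, the following shows how Python's standard library percent-encodes a shortened version of the failing submission so that quotes and non-ASCII characters survive the round trip. The variable names are illustrative and not taken from the actual CGI.

```python
from urllib.parse import quote, unquote

# Shortened version of the example submission from the notes above.
submission = 'WBPaper000XXXX; genotype: blah::\' " < > & % # α β ≤'

# Percent-encode everything except unreserved characters (letters, digits, "_.-~").
encoded = quote(submission, safe="")

# Decoding restores the original text, including quotes and Greek letters.
decoded = unquote(encoded)

print(encoded)
print(decoded == submission)
```

Whether the fix belongs in the encoding step, the CGI parameter parsing, or the database layer would need to be checked against the real code.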

Alliance Literature group

  • Textpresso vs. OntoMate vs. PubMed
  • Still some confusion about which tasks can be performed in each tool
  • Working on collecting different use cases on spreadsheet
  • Sentence-based search is big strength of Textpresso
  • At latest meeting performed some large searches for OntoMate and Textpresso
  • Literature acquisition: still needs work
    • Using SVM vs. Textpresso search to find relevant papers
    • Species based SVM? Currently use string matching to derive different corpora
    • Finding genes and determining which species those genes belong to?

Alliance priorities?

  • Transcription regulatory networks
  • Interactions can focus on network viewer eventually
    • May want different versions/flavors of interaction viewers
    • May also want to work closely with GO and GO-CAMs
  • Gene descriptions can focus on information poor genes, protein domains, etc.

Sandbox visual cues

  • Juancarlos and Daniela will discuss ways to provide visual cues that a curator is on a sandbox form (on Mangolassi) vs live form (on Tazendra)
  • AFP and Micropub dev sites have indicators
  • Could play with changing the background color? Maybe too hard to look at?
  • Change the color of the title of the form, e.g. the OA?
  • Will add red text "Development Site" at top of the OA form

Evidence Code Ontology

  • Kimberly and Juancarlos have worked on a parser
  • Will load into ACEDB soon


June 18, 2020

Undergrad phenotype submissions

  • Chris gave presentation to Lina Dahlberg's class about community phenotype curation
  • Class took survey about experience with presentation and experience trying to curate worm phenotypes
  • Since April 24, the class has submitted 171 annotations from 23 papers (some redundant and some still under review)

Special characters in OA/Postgres

  • There are many special characters in free text entries in the OA; probably all from copy-pasting directly from PDF
  • In some cases it seems the special characters cause problems for downstream scripts (e.g. FTP interactions file generator)
  • It would probably be good to script the replacement of special characters with their appropriate simple characters or encoded characters
  • Juancarlos wrote Perl script on Mangolassi at:
    • /home/postgres/work/pgpopulation/grg_generegulation/20200618_summary_characters/get_summary_characters.pl
    • Will find bad characters and their pgids for a given Postgres table
    • Will find bad data and their pgids for the same table
    • People can query their data tables for these characters
  • Chris & Wen will work on compiling a list of bad characters that tend to come up
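
The find-and-replace idea above can be sketched in a few lines. This is not Juancarlos's Perl script; it is a hypothetical Python illustration, and the replacement table is an assumption, since the list of bad characters is still to be compiled.

```python
import unicodedata

# Hypothetical replacement table (assumption); the real list is not yet compiled.
REPLACEMENTS = {
    "\u201c": '"',      # left double quotation mark
    "\u201d": '"',      # right double quotation mark
    "\u2018": "'",      # left single quotation mark
    "\u2019": "'",      # right single quotation mark
    "\u2013": "-",      # en dash
    "\u2014": "-",      # em dash
    "\u2026": "...",    # horizontal ellipsis
    "\u03b1": "alpha",  # Greek small letter alpha
    "\u03b2": "beta",   # Greek small letter beta
}


def find_special_characters(text):
    """Return the non-ASCII characters in text, sorted by code point."""
    return sorted({ch for ch in text if ord(ch) > 127})


def describe(ch):
    """Return a readable label such as 'U+03B1 GREEK SMALL LETTER ALPHA'."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}"


def simplify(text):
    """Replace known special characters; leave unknown ones untouched."""
    return "".join(REPLACEMENTS.get(ch, ch) for ch in text)


sample = "\u201cdaf-16\u201d \u2013 \u03b1"
for ch in find_special_characters(sample):
    print(describe(ch))
print(simplify(sample))
```

Characters that are not in the table are left as they are, so anything unexpected can still be reported by `find_special_characters` and reviewed by a curator.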

Citace upload

  • July 10th citace-to-Hinxton upload
  • The citace upload is scheduled for July 7th, but Wen will be on vacation, so curators will upload to Wen on Tuesday, June 30th


June 25, 2020

Caltech Summer Student

  • Paul has new summer student
    • Molecular lesion curation, maybe
    • Are early stops more or less likely to be null mutations?
    • Alleles are flagged as null in WB in the context of phenotypes
    • Would be good to query Postgres for null alleles and work from there
  • Fernando
    • Anatomy function
    • GO curation? Curating transcription factors?
      • Checking for consistent curation

Worm Community Diversity Meeting

  • Organized by Ahna Skop and Dana Miller
  • Invite posted on Facebook "C. elegans Researchers" group
  • Two meetings held: one Thursday (June 18th), one Friday (June 19th)
  • Chris attended last Friday (June 19th)
  • Worm Board looking to take input and ideas from this meeting and incorporate into meetings and events
  • One idea was to document and track outreach efforts and what people have learned from them and organize them in a central location, maybe WormBase or Worm Community Forum
  • Also, there was a suggestion to have a tool that could inform potential students of worm labs in their respective local area
    • Ask Todd; he used to have a map of researchers; Todd had asked Cecilia to curate lab location and institution
  • Person and Laboratory addresses in ACEDB have a different format, so looking to reconcile
  • Do we know how many labs are still viable? Check for a paper verified in the last 5 years, or requested strains from the CGC recently
    • Most labs were real

C_elegans Slack group

  • Called "C_elegans"
  • Chris made a "WormBase" channel for people to post questions, comments
  • Chris will look into inviting everyone and possibly integrating with help@wormbase.org email list

WormBase Outreach Webinars

  • While travel is still restricted, we should consider WormBase webinars
  • Scott working on a JBrowse webinar
  • Could have a different topic each month
  • Should collect topics to cover and assign speakers (maybe multiple speakers per topic; keep it lively)
  • Should set up a schedule
  • How should we advertise? Can post on blog, twitter, etc.

New transcripts expanding gene range

  • Will bring up at next week's site-wide call
  • Possibly due to incorporation of newer nanopore reads
  • Many genes coming in WS277 have expanded well beyond the gene limits as seen in WS276
    • Example genes: pes-2.2, pck-2, herc-1, atic-1
  • Has several repercussions:
    • WormBase does not submit alleles affecting more than one gene; with these gene expansions suddenly alleles once only affecting a single gene are now affecting two genes, and so are now omitted from loading into the Alliance (including any phenotype and/or disease annotations)
    • Some expanded genes are now being attributed with thousands of alleles/variants
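
To illustrate why an expanded gene range changes which alleles count as single-gene, here is a minimal sketch of an interval overlap check. The gene names and coordinates are made up for the example, and it assumes inclusive coordinates on a single chromosome; it is not the pipeline WormBase uses.

```python
def overlapping_genes(allele, genes):
    """Return the names of genes whose range overlaps the allele.

    allele: (start, end) tuple; genes: dict of name -> (start, end).
    Coordinates are inclusive and assumed to be on the same chromosome.
    """
    a_start, a_end = allele
    return sorted(
        name
        for name, (g_start, g_end) in genes.items()
        if a_start <= g_end and g_start <= a_end
    )


allele = (150, 160)  # hypothetical allele position

# Hypothetical gene ranges before and after one gene model is extended.
genes_before = {"geneA": (100, 200), "geneB": (300, 400)}
genes_after = {"geneA": (100, 200), "geneB": (120, 400)}

print(overlapping_genes(allele, genes_before))  # one gene
print(overlapping_genes(allele, genes_after))   # two genes after expansion
```

With the extended range, the same allele overlaps two genes, so under the rule described above it would no longer be submitted.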

Citace upload

  • Upload files to Spica/Wen by Tuesday (June 30th) 10am
  • Wen will clean up folders in Spica (older files from WS277 not cleared out for some reason)

July 9th, 2020

Gene names issue in SimpleMine and other mining tools

  • Wen: Last week, Jonathan Ewbank raised the issue of gene names that may refer to multiple objects.
  • This can be an issue for multiple data mining tools, including WormMine, BioMart, and Gene Set Enrichment.
  • Perhaps have a standalone approach to check whether any gene name in a list may refer to multiple objects (users would check their name lists before submitting them to any data mining tool).
  • Jae: The public name issue is heterogeneous in nature, so there may be no single solution to all of these problems.
  • In gene lists curated from high-throughput studies, confusing usage of public names is probably under 2%, but it still cannot be ignored. Examples:
    • A single public name is assigned to multiple WBGene IDs; Wen has a list of these genes
    • Overlapping or dicistronic genes, e.g. mrpl-44 and F02A9.10
    • Overlapping or dicistronic genes that share a single sequence name, e.g.:
      • exos-4.1 and tin-9.2 (B0564.1)
      • eat-18 and lev-10 (Y105E8A.7)
      • cha-1 and unc-17 (ZC416.8)
    • Simple confusion by authors, e.g. mdh-1 and mdh-2
  • One of the most significant problems is that these gene name issues propagate to other databases and papers.
  • We could add a special note to each affected gene page, but people running batch analyses would not easily see it.
  • Conclusion: Jae and Wen will work on a tool that lets users "sanitize" their gene lists before submission to data mining tools. They will also write a microPub explaining this issue to the community.
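
A gene list check of the kind proposed above could look roughly like the sketch below. It is a hypothetical illustration, not the planned tool. The shared sequence names come from the examples in these notes; the `placeholder-1` entry and the function names are assumptions added for the example.

```python
from collections import defaultdict

# (submitted name, gene it can refer to). The first three names are the shared
# sequence names listed in the notes; the last entry is a made-up placeholder.
NAME_PAIRS = [
    ("B0564.1", "exos-4.1"),
    ("B0564.1", "tin-9.2"),
    ("Y105E8A.7", "eat-18"),
    ("Y105E8A.7", "lev-10"),
    ("ZC416.8", "cha-1"),
    ("ZC416.8", "unc-17"),
    ("placeholder-1", "gene-X"),
]


def build_index(pairs):
    """Map each name to the set of genes it may refer to."""
    index = defaultdict(set)
    for name, gene in pairs:
        index[name].add(gene)
    return index


def sanitize(names, index):
    """Split names into unique, ambiguous, and unknown groups."""
    unique, ambiguous, unknown = {}, {}, []
    for raw in names:
        name = raw.strip()
        genes = index.get(name)
        if not genes:
            unknown.append(name)
        elif len(genes) == 1:
            unique[name] = next(iter(genes))
        else:
            ambiguous[name] = sorted(genes)
    return unique, ambiguous, unknown


index = build_index(NAME_PAIRS)
unique, ambiguous, unknown = sanitize(["B0564.1", "placeholder-1", "nope"], index)
print(unique)
print(ambiguous)
print(unknown)
```

Reporting ambiguous names separately, rather than silently picking one gene, lets the user decide before the list reaches a data mining tool.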

Wormicloud

  • Please test and leave any feedback on the word cloud tool (Wormicloud), https://wormicloud.textpressolab.com/
  • Valerio and Jae have worked on a tool that uses data in Textpresso; given a keyword, e.g. "transposon", the tool generates a word cloud and word trend.
  • Any keyword can generate a graph that plots trends of occurrence across the years in publication abstracts.
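
The word trend described above amounts to counting, per year, how many abstracts mention the keyword. The sketch below shows that idea only; it is not Wormicloud's implementation, and the abstracts are invented sample text.

```python
import re
from collections import Counter


def keyword_trend(abstracts, keyword):
    """Count abstracts per year that contain keyword as a whole word.

    abstracts: iterable of (year, text) pairs. Matching ignores case.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    counts = Counter()
    for year, text in abstracts:
        if pattern.search(text):
            counts[year] += 1
    return dict(sorted(counts.items()))


# Invented sample abstracts, for illustration only.
abstracts = [
    (2018, "A transposon insertion disrupts the gene."),
    (2018, "We describe a new dauer phenotype."),
    (2019, "Transposon silencing requires small RNAs."),
    (2019, "The transposon was mobilized in the germline."),
]

print(keyword_trend(abstracts, "transposon"))
```

Because the match is on whole words, plural forms such as "transposons" are not counted in this sketch; a real tool would likely normalize word forms first.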

Noctua 2.0 form ready to use

  • Caltech summer student will try using Noctua initially for dauer (neuronal signaling) pathways

Nightly names service updates to postgres

  • A nightly job uses Matt's wb-names-export.jar to get the full output of genes from the datomic/names service and updates postgres based on that.