Difference between revisions of "WormBase-Caltech Weekly Calls"

From WormBaseWiki
 
(423 intermediate revisions by 10 users not shown)
Line 21: Line 21:
 
[[WormBase-Caltech_Weekly_Calls_2019|2019 Meetings]]
 
[[WormBase-Caltech_Weekly_Calls_2019|2019 Meetings]]
  
 +
[[WormBase-Caltech_Weekly_Calls_2020|2020 Meetings]]
  
 +
= 2021 Meetings =
  
 +
[[WormBase-Caltech_Weekly_Calls_January_2021|January]]
  
= 2020 Meetings =
+
[[WormBase-Caltech_Weekly_Calls_February_2021|February]]
  
[[WormBase-Caltech_Weekly_Calls_January_2020|January]]
+
[[WormBase-Caltech_Weekly_Calls_March_2021|March]]
  
[[WormBase-Caltech_Weekly_Calls_February_2020|February]]
+
[[WormBase-Caltech_Weekly_Calls_April_2021|April]]
  
[[WormBase-Caltech_Weekly_Calls_March_2020|March]]
+
[[WormBase-Caltech_Weekly_Calls_May_2021|May]]
  
[[WormBase-Caltech_Weekly_Calls_April_2020|April]]
 
  
[[WormBase-Caltech_Weekly_Calls_May_2020|May]]
+
== June 3, 2021 ==
  
 +
=== Reserving meeting rooms ===
 +
* Raymond encountering challenges with setting up regular meeting room reservations in Chen building
 +
* We've been asked to make reservations one week in advance
 +
* Need to use a room if we reserve it
  
== June 4, 2020 ==
+
=== Summer student(s) ===
 +
* Anatomy function project with Raymond
 +
* Many types of anatomy function data submitted via AFP
  
=== Citace (tentative) upload ===
+
== June 10, 2021 ==
* CIT curators upload to citace on Tuesday, July 7th, 10am Pacific
 
* Citace upload to Hinxton on Friday, July 10th
 
  
=== Caltech reopening ===
+
=== Variation-Gene Associations ===
* Paul looking to get plan approved
+
*Some QC on AFP-extracted data led to the realization that at least some of the 'tm' variations aren't associated with genes on tazendra
* People that want to come to campus need to watch training video
+
*https://github.com/WormBase/author-first-pass/issues/204
* Masks available in Paul's lab
+
*https://github.com/WormBase/website/issues/8262
* Can have maximum of 3 people in WormBase rooms at a time; probably best to only allow one person per WB room
+
*It looks like non-manually asserted variation-gene associations will be generated via the VEP pipeline during the build, so Caltech would need to get this information from each WB release
** Could possibly have 2 people in big room (Church 64) as long as they stay at least 10 feet apart
 
* Need to coordinate, maybe make a Google calendar to do so (also Slack)
 
* Before and after you go to campus, you need to take your temperature and assess your symptoms (if any) and submit info on form
 
* Also, need to submit who you were in contact with for contact tracing
 
* The same form is used all week; hold on to it until you are asked to submit it
 
* If someone goes in to the office, they could print several forms for people to pick up in WB offices
 
  
=== Nameserver ===
+
===Variation in name service but not in OA===
* Nameserver was down
+
*Ranjana: I could not find gk315316 in the OA though it exists in the name server. I agree that we probably don’t want to let all the million mutations into the OA since that would slow the drop-downs, but when we need one for curation, what needs to be done?
* CIT curators would still like to have a single form to interact with
+
*Juancarlos: That might be right.  It seems to try to create the variation in the name service, and if it gets a 409 Conflict error, it adds it to the temp variation file, and the obo_ tables in postgres. Since it fails to create in the name service, that's probably okay with Hinxton, and since it gets added to postgres, you should be able to use it in the OA, and since it gets added to the temp variation file, on future updates of the ontology it gets added again. Probably best if someone confirms that's the process (and maybe points us to a wiki ?)
* Is it possible to create objects at Caltech and let a cronjob assign IDs via the nameserver? May not be a good idea
 
* Still putting genotype and all info for a strain in the reason/why field in the nameserver
 
* We plan to eventually connect strains to genotypes, but need model changes and curation effort to sort out
 
* Hinxton is pulling in CGC strains, how often?
 
* Caltech could possibly get a block of IDs
 
  
=== Alliance SimpleMine ===
+
*Solution from Karen and Chris: If the Hinxton name server already has the variation but it isn't in the OA (as expected for Million Mutation Project variants like gk315316), we just need to add it through the old temp variations CGI:
* Any updates? 3.1 feature freeze is tomorrow
 
* Pending on PI decision; Paul S. will bring it up tomorrow on the Alliance PI call
 
  
 +
http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/generic.cgi?action=TempVariationObo
  
== June 11, 2020 ==
+
making sure to enter the variation with name-space-WBVarID like:
  
=== Name Service ===
+
gk315316 WBVar01148785
* Testing site now up; linked to Mangolassi
 
* CGI from Juancarlos not accepting all characters, including double quotes like "
 
* Example submission that fails via CGI
 
WBPaper000XXXX; genotype: blah::' " ` / < > [ ] { } ? , . ( ) * ^ & % $ # @ ! \ | &alpha; &beta; Ω ≈ µ ≤ ≥ ÷ æ … ˚ ∆ ∂ ß œ ∑ † ¥ ¨ ü i î ø π “  ‘ « • – ≠ Å ´ ∏ » ± — ‚ °
 
* Juancarlos will look into and try to fix
 
  
=== Alliance Literature group ===
+
and then, after refresh, it should be available to the OA. Hinxton never has to get involved in this scenario.
* Textpresso vs. OntoMate vs. PubMed
 
* Still some confusion about which tasks can be performed in each tool
 
* Working on collecting different use cases on spreadsheet
 
* Sentence-based search is big strength of Textpresso
 
* At latest meeting performed some large searches for OntoMate and Textpresso
 
* Literature acquisition: still needs work
 
** Using SVM vs. Textpresso search to find relevant papers
 
** Species based SVM? Currently use string matching to derive different corpora
 
** Finding genes and determining which species those genes belong to?
 
  
=== Alliance priorities? ===
+
=== Confirm WS282 Upload Dates ===
* Transcription regulatory networks
+
*July 6th?
* Interactions can focus on network viewer eventually
+
*Data freeze/upload date on the release schedule is July 12th
** May want different versions/flavors of interaction viewers
 
** May also want to work closely with GO and GO-CAMs
 
* Gene descriptions can focus on information poor genes, protein domains, etc.
 
  
=== Sandbox visual cues ===
+
=== CenGen bar plots ===
* Juancarlos and Daniela will discuss ways to provide visual cues that a curator is on a sandbox form (on Mangolassi) vs live form (on Tazendra)
+
*Initially discussed bringing in the bar plot images as image data
* AFP and Micropub dev sites have indicators
+
*CenGen group wants interactive bar plots similar to the modENCODE bar plots currently displayed in the FPKM expression data section on the expression widget. That way users could hover over a bar plot and see the cell type, the expression value (TPM, in our case) and the proportion of cells of each neuron type expressing the gene.
* Could play with changing the background color? Maybe too hard to look at?
+
*They can provide the underlying data and have the WB team generate interactive plots for each gene
* Change the color of the title of the form, e.g. the OA?
+
*Sibyl said that this is feasible and we could: 1. bring the data files in OR 2. call the CenGen API on the fly
* Will add red text "Development Site" at top of the OA form
+
*The first approach may be more work but better in the long run as we store the data
 +
*Will ping Hinxton and see how they can integrate the data
  
=== Evidence Code Ontology ===
+
* Bring in data both as pictures and interactive bar plots
* Kimberly and Juancarlos have worked on a parser
+
* Ping Hinxton on GitHub to move this forward
* Will load into ACEDB soon
 
 
 
 
 
== June 18, 2020 ==
 
 
 
=== Undergrad phenotype submissions ===
 
* Chris gave presentation to Lina Dahlberg's class about community phenotype curation
 
* Class took survey about experience with presentation and experience trying to curate worm phenotypes
 
** Survey results: https://www.dropbox.com/s/00cit5aitv8yu27/Dahlberg_class_survey_results.xlsx?dl=0
 
** Some students didn't benefit, but most did; nice feedback!
 
** Lina intends to publish/micropublish the survey results so please don't share
 
* Since April 24, the class has submitted 171 annotations from 23 papers (some redundant and some still under review)
 
 
 
=== Special characters in OA/Postgres ===
 
* There are many special characters in free text entries in the OA; probably all from copy-pasting directly from PDF
 
* In some cases it seems the special characters cause problems for downstream scripts (e.g. FTP interactions file generator)
 
* It would probably be good to script the replacement of special characters with their appropriate simple characters or encoded characters
 
* Juancarlos wrote Perl script on Mangolassi at:
 
** /home/postgres/work/pgpopulation/grg_generegulation/20200618_summary_characters/get_summary_characters.pl
 
** Will find bad characters and their pgids for a given Postgres table
 
** Will find bad data and their pgids for the same table
 
** People can query their data tables for these characters
 
* Chris & Wen will work on compiling a list of bad characters that tend to come up
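The kind of check described above can be sketched in a few lines of Python. This is not Juancarlos's Perl script; it is a minimal illustration that assumes the rows have already been fetched as (pgid, text) pairs and that "special" simply means any non-ASCII character. The function name and the sample rows are hypothetical.

```python
from collections import defaultdict


def find_special_characters(rows):
    """Map each non-ASCII character to the sorted pgids of rows that contain it."""
    hits = defaultdict(set)
    for pgid, text in rows:
        for ch in text:
            if ord(ch) > 127:  # assumption: anything outside ASCII counts as "special"
                hits[ch].add(pgid)
    return {ch: sorted(pgids) for ch, pgids in hits.items()}


# Hypothetical rows standing in for one Postgres data table: (pgid, free text)
rows = [
    (1, "plain ASCII summary"),
    (2, "temperature shift of 5 \u00b0C"),
    (3, "\u201cquoted\u201d text at 20 \u00b0C"),
]

report = find_special_characters(rows)
for ch, pgids in sorted(report.items()):
    print(f"U+{ord(ch):04X} {ch!r}: pgids {pgids}")
```

A report keyed by character makes it easy to compile the list of commonly occurring bad characters and to decide, per character, whether to replace it with a simple or an encoded equivalent.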
 
 
 
=== Citace upload ===
 
* July 10th citace-to-Hinxton upload
 
* The citace upload is scheduled for July 7th, but Wen will be on vacation, so curators will upload to Wen on Tuesday, June 30th
 
 
 
 
 
== June 25, 2020 ==
 
 
 
=== Caltech Summer Student ===
 
* Paul has new summer student
 
** Molecular lesion curation, maybe
 
** Are early stops more or less likely to be null mutations?
 
** Alleles are flagged as null in WB in the context of phenotypes
 
** Would be good to query Postgres for null alleles and work from there
 
* Fernando
 
** Anatomy function
 
** GO curation? Curating transcription factors?
 
*** Checking for consistent curation
 
 
 
=== Worm Community Diversity Meeting ===
 
* Organized by Ahna Skop and Dana Miller
 
* Invite posted on Facebook "C. elegans Researchers" group
 
* Two meetings held: one Thursday (June 18th), one Friday (June 19th)
 
* Chris attended last Friday (June 19th)
 
* Worm Board looking to take input and ideas from this meeting and incorporate into meetings and events
 
* One idea was to document and track outreach efforts and what people have learned from them and organize them in a central location, maybe WormBase or Worm Community Forum
 
* Also, there was a suggestion to have a tool that could inform potential students of worm labs in their respective local area
 
** Ask Todd; he used to have a map of researchers; Todd had asked Cecilia to curate lab location and institution
 
* Person and Laboratory addresses in ACEDB have a different format, so looking to reconcile
 
* Do we know how many labs are still viable? Check for a paper verified in the last 5 years, or requested strains from the CGC recently
 
** Most labs were real
 
 
 
=== C_elegans Slack group ===
 
* Called "C_elegans"
 
* Chris made a "WormBase" channel for people to post questions, comments
 
* Chris will look into inviting everyone and possibly integrating with help@wormbase.org email list
 
 
 
=== WormBase Outreach Webinars ===
 
* While travel is still restricted, we should consider WormBase webinars
 
* Scott working on a JBrowse webinar
 
* Could have a different topic each month
 
* Should collect topics to cover and assign speakers (maybe multiple speakers per topic; keep it lively)
 
* Should set up a schedule
 
* How should we advertise? Can post on blog, twitter, etc.
 
 
 
=== New transcripts expanding gene range ===
 
* Will bring up at next week's site-wide call
 
* Possibly due to incorporation of newer nanopore reads
 
* Many genes coming in WS277 have expanded well beyond the gene limits as seen in WS276
 
** Example genes: pes-2.2, pck-2, herc-1, atic-1
 
* Has several repercussions:
 
** WormBase does not submit alleles affecting more than one gene; with these gene expansions suddenly alleles once only affecting a single gene are now affecting two genes, and so are now omitted from loading into the Alliance (including any phenotype and/or disease annotations)
 
** Some expanded genes are now being attributed with thousands of alleles/variants
 
 
 
=== Citace upload ===
 
* Upload files to Spica/Wen by Tuesday (June 30th) 10am
 
* Wen will clean up folders in Spica (older files from WS277 not cleared out for some reason)
 
 
 
==July 9th, 2020==
 
===Gene names issue in SimpleMine and other mining tools===
 
*Wen: Last week, Jonathan Ewbank raised the issue of gene names that may refer to multiple objects.
 
*This can be an issue for multiple data mining tools, including WormMine, BioMart, and Gene Set Enrichment.
 
*Perhaps have a standalone approach to check if any gene name among a list may refer to multiple objects (users check their name lists before submitting them to any data mining tool).
 
*Jae: The public name issue is heterogeneous, so there may be no single solution that solves all of these problems.
 
*In gene list curation from high-throughput studies, confusing usage of public names probably occurs in less than 2% of cases (but still cannot be ignored). See examples below:
 
**A single public name is assigned to multiple WBGene IDs; Wen has a list of these genes
 
**Overlapping or dicistronic genes, e.g. mrpl-44 and F02A9.10
 
**Overlapping or dicistronic genes that share a single sequence name, for example:
 
    exos-4.1 and tin-9.2 (B0564.1)
 
    eat-18 and lev-10 (Y105E8A.7)
 
    cha-1 and unc-17 (ZC416.8)
 
 
 
**Simple confusion by authors, e.g. mdh-1 and mdh-2
 
*One of the most significant problems is that these gene name issues propagate to other databases and papers.
 
*We can add a special note to each gene page, but people running batch analyses would not easily catch it.
 
*Conclusion: Jae and Wen will work on a tool that lets users "sanitize" their gene lists before submission to data mining tools. They will also write a microPub explaining this issue to the community.
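A gene-list check of the kind proposed above could look roughly like the following Python sketch. It is not the planned tool: the function name, the lookup table and the identifiers (GENE_ID_1, GENE_ID_2) are hypothetical placeholders, and only the B0564.1 / exos-4.1 / tin-9.2 relationship is taken from the notes.

```python
def check_gene_names(names, name_to_ids):
    """Split a submitted list into unambiguous, ambiguous and unknown names."""
    unambiguous, ambiguous, unknown = {}, {}, []
    for name in names:
        ids = name_to_ids.get(name, [])
        if len(ids) == 1:
            unambiguous[name] = ids[0]
        elif len(ids) > 1:
            ambiguous[name] = list(ids)  # the user must pick one before submitting
        else:
            unknown.append(name)
    return unambiguous, ambiguous, unknown


# Hypothetical lookup table; the identifiers are placeholders, not real WBGene IDs.
name_to_ids = {
    "B0564.1": ["GENE_ID_1", "GENE_ID_2"],  # one sequence name shared by two genes
    "exos-4.1": ["GENE_ID_1"],
    "tin-9.2": ["GENE_ID_2"],
}

ok, needs_review, not_found = check_gene_names(
    ["exos-4.1", "B0564.1", "not-a-gene"], name_to_ids
)
print(ok)
print(needs_review)
print(not_found)
```

Returning the ambiguous names together with all candidate identifiers lets users resolve each case themselves before handing the list to WormMine, BioMart or an enrichment tool.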
 
 
 
===Wormicloud===
 
*Valerio and Jae have worked on a tool that uses data in Textpresso; given a keyword, e.g. "transposon", the tool generates a word cloud
 
*Given a pair of keywords, it can generate a graph that plots trends in their occurrence across the years in publication abstracts
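The per-year trend idea can be illustrated with a small Python sketch. This is not how Wormicloud is implemented; it assumes abstracts are available as (year, text) pairs and uses plain case-insensitive substring matching, which is a simplification. The sample abstracts are made up.

```python
from collections import Counter


def keyword_trend(abstracts, keyword):
    """Count, per publication year, the abstracts that mention the keyword."""
    needle = keyword.lower()
    counts = Counter()
    for year, text in abstracts:
        if needle in text.lower():  # simplification: substring match, no tokenizing
            counts[year] += 1
    return dict(sorted(counts.items()))


# Made-up abstracts for illustration only: (year, text)
abstracts = [
    (2018, "A transposon insertion screen"),
    (2018, "Dauer formation under stress"),
    (2019, "Transposon silencing in the germline"),
    (2019, "Mobile elements: transposon activity"),
]

trend_a = keyword_trend(abstracts, "transposon")
trend_b = keyword_trend(abstracts, "dauer")
print(trend_a)
print(trend_b)
```

Plotting the two dictionaries on a shared year axis would give the kind of paired-keyword comparison described above.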
 
 
 
===Noctua 2.0 form ready to use===
 
*Caltech summer student will try using Noctua initially for dauer (neuronal signaling) pathways
 
 
 
===Nightly names service updates to postgres===
 
*A nightly job uses Matt's wb-names-export.jar to get the full output of genes from the datomic/names service and updates postgres based on that.
 

Latest revision as of 18:59, 10 June 2021

Previous Years

2009 Meetings

2011 Meetings

2012 Meetings

2013 Meetings

2014 Meetings

2015 Meetings

2016 Meetings

2017 Meetings

2018 Meetings

2019 Meetings

2020 Meetings

2021 Meetings

January

February

March

April

May


June 3, 2021

Reserving meeting rooms

  • Raymond encountering challenges with setting up regular meeting room reservations in Chen building
  • We've been asked to make reservations one week in advance
  • Need to use a room if we reserve it

Summer student(s)

  • Anatomy function project with Raymond
  • Many types of anatomy function data submitted via AFP

June 10, 2021

Variation-Gene Associations

Variation in name service but not in OA

  • Ranjana: I could not find gk315316 in the OA though it exists in the name server. I agree that we probably don’t want to let all the million mutations into the OA since that would slow the drop-downs, but when we need one for curation, what needs to be done?
  • Juancarlos: That might be right. It seems to try to create the variation in the name service, and if it gets a 409 Conflict error, it adds it to the temp variation file, and the obo_ tables in postgres. Since it fails to create in the name service, that's probably okay with Hinxton, and since it gets added to postgres, you should be able to use it in the OA, and since it gets added to the temp variation file, on future updates of the ontology it gets added again. Probably best if someone confirms that's the process (and maybe points us to a wiki ?)
  • Solution from Karen and Chris: If the Hinxton name server already has the variation but it isn't in the OA (as expected for Million Mutation Project variants like gk315316), we just need to add it through the old temp variations CGI:

http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/generic.cgi?action=TempVariationObo

making sure to enter the variation with name-space-WBVarID like:

gk315316 WBVar01148785

and then, after refresh, it should be available to the OA. Hinxton never has to get involved in this scenario.
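The flow Juancarlos describes, which the notes themselves say still needs confirming, can be sketched in Python as follows. Everything here is a stand-in: FakeNameService, register_variation and the in-memory list and dict are hypothetical, not the actual CGI or name service code, and the WBVar pattern (prefix plus eight digits) is inferred only from the example entry above.

```python
import re

# Assumption: an entry is "<public name> <WBVar + 8 digits>", as in the example above.
ENTRY = re.compile(r"^(\S+) (WBVar\d{8})$")


def parse_entry(line):
    """Split a temp-variation entry into (public name, WBVar ID)."""
    match = ENTRY.match(line.strip())
    if match is None:
        raise ValueError(f"expected 'name WBVarID', got {line!r}")
    return match.group(1), match.group(2)


class FakeNameService:
    """Stand-in for the name service; returns HTTP-like status codes."""

    def __init__(self, existing):
        self.existing = set(existing)

    def create(self, name):
        if name in self.existing:
            return 409  # Conflict: the name already exists
        self.existing.add(name)
        return 201


def register_variation(line, service, temp_variations, obo_table):
    """On a 409 Conflict, record the entry locally so the OA can use it."""
    name, var_id = parse_entry(line)
    status = service.create(name)
    if status == 409:
        temp_variations.append(f"{name} {var_id}")  # stands in for the temp variation file
        obo_table[var_id] = name  # stands in for the obo_ tables in postgres
    return status


service = FakeNameService(existing={"gk315316"})
temp_variations, obo_table = [], {}
status = register_variation("gk315316 WBVar01148785", service, temp_variations, obo_table)
print(status, temp_variations, obo_table)
```

In this sketch only the 409 case writes to the local stores, which mirrors the scenario where the name server already has the variation and Hinxton does not need to be involved.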

Confirm WS282 Upload Dates

  • July 6th?
  • Data freeze/upload date on the release schedule is July 12th

CenGen bar plots

  • Initially discussed bringing in the bar plot images as image data
  • CenGen group wants interactive bar plots similar to the modENCODE bar plots currently displayed in the FPKM expression data section on the expression widget. That way users could hover over a bar plot and see the cell type, the expression value (TPM, in our case) and the proportion of cells of each neuron type expressing the gene.
  • They can provide the underlying data and have the WB team generate interactive plots for each gene
  • Sibyl said that this is feasible and we could: 1. bring the data files in OR 2. call the CenGen API on the fly
  • The first approach may be more work but better in the long run as we store the data
  • Will ping Hinxton and see how they can integrate the data
  • Bring in data both as pictures and interactive bar plots
  • Ping Hinxton on GitHub to move this forward
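Whichever route is chosen (stored data files or on-the-fly API calls), each bar needs the three values the CenGen group wants shown on hover. A minimal Python sketch of such a record is below; the class name, field names and tooltip wording are hypothetical, and the sample values are made-up placeholders rather than CenGen data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NeuronBar:
    """One bar of a per-gene plot: a neuron type and its expression summary."""

    cell_type: str
    tpm: float                    # expression value in TPM
    proportion_expressing: float  # fraction of cells of this type expressing the gene (0 to 1)

    def tooltip(self):
        """Text shown when the user hovers over the bar."""
        return (
            f"{self.cell_type}: {self.tpm:.1f} TPM, "
            f"{self.proportion_expressing:.0%} of cells expressing"
        )


# Placeholder values for illustration only
bar = NeuronBar(cell_type="NEURON_A", tpm=12.34, proportion_expressing=0.5)
print(bar.tooltip())
```

Keeping the record independent of how it is loaded means the same structure would work for either of the two integration options Sibyl outlined.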