WormBase-Caltech Weekly Calls
Previous Years
2020 Meetings
June 4, 2020
Citace (tentative) upload
- CIT curators upload to citace on Tuesday, July 7th, 10am Pacific
- Citace upload to Hinxton on Friday, July 10th
Caltech reopening
- Paul looking to get plan approved
- People that want to come to campus need to watch training video
- Masks available in Paul's lab
- Can have maximum of 3 people in WormBase rooms at a time; probably best to only allow one person per WB room
- Could possibly have 2 people in big room (Church 64) as long as they stay at least 10 feet apart
- Need to coordinate, maybe make a Google calendar to do so (also Slack)
- Before and after you go to campus, you need to take your temperature and assess your symptoms (if any) and submit info on form
- Also, need to submit who you were in contact with for contact tracing
- The same form is used all week; hold on to it until asked to submit it
- If someone goes in to the office, they could print several forms for people to pick up in WB offices
Nameserver
- Nameserver was down
- CIT curators would still like to have a single form to interact with
- Is it possible to create objects at Caltech and let a cronjob assign IDs via the nameserver? May not be a good idea
- Still putting genotype and all info for a strain in the reason/why field in the nameserver
- We plan to eventually connect strains to genotypes, but need model changes and curation effort to sort out
- Hinxton is pulling in CGC strains, how often?
- Caltech could possibly get a block of IDs
Alliance SimpleMine
- Any updates? 3.1 feature freeze is tomorrow
- Pending on PI decision; Paul S. will bring it up tomorrow on the Alliance PI call
June 11, 2020
Name Service
- Testing site now up; linked to Mangolassi
- CGI from Juancarlos does not accept all characters, e.g. double quotes (")
- Example submission that fails via CGI
WBPaper000XXXX; genotype: blah::' " ` / < > [ ] { } ? , . ( ) * ^ & % $ # @ ! \ | α β Ω ≈ µ ≤ ≥ ÷ æ … ˚ ∆ ∂ ß œ ∑ † ¥ ¨ ü i î ø π “ ‘ « • – ≠ Å ´ ∏ » ± — ‚ °
- Juancarlos will look into and try to fix
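The notes do not record why the CGI rejects these characters. One common cause is text sent without escaping, so as a minimal sketch (not the actual fix), the snippet below percent-encodes a field as UTF-8 before submission and decodes it on the other side. The function names `encode_field` and `decode_field` are hypothetical; the sample string is a shortened version of the failing submission above.

```python
from urllib.parse import quote, unquote

# Shortened version of the failing example from the notes.
submission = "WBPaper000XXXX; genotype: blah::' \" ` / < > α β Ω ≈ µ ≤ ≥"


def encode_field(value: str) -> str:
    """Percent-encode everything except unreserved ASCII characters, using UTF-8."""
    return quote(value, safe="", encoding="utf-8")


def decode_field(value: str) -> str:
    """Reverse encode_field, restoring the original text."""
    return unquote(value, encoding="utf-8")


encoded = encode_field(submission)
restored = decode_field(encoded)

print(encoded)
print(restored == submission)
```

The encoded form contains only ASCII letters, digits, and `%XX` escapes, so quotes and Greek letters survive the round trip unchanged.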
Alliance Literature group
- Textpresso vs. OntoMate vs. PubMed
- Still some confusion about which tasks can be performed in each tool
- Working on collecting different use cases on spreadsheet
- Sentence-based search is big strength of Textpresso
- At latest meeting performed some large searches for OntoMate and Textpresso
- Literature acquisition: still needs work
- Using SVM vs. Textpresso search to find relevant papers
- Species based SVM? Currently use string matching to derive different corpora
- Finding genes and determining which species those genes belong to?
Alliance priorities?
- Transcription regulatory networks
- Interactions can focus on network viewer eventually
- May want different versions/flavors of interaction viewers
- May also want to work closely with GO and GO-CAMs
- Gene descriptions can focus on information poor genes, protein domains, etc.
Sandbox visual cues
- Juancarlos and Daniela will discuss ways to provide visual cues that a curator is on a sandbox form (on Mangolassi) vs live form (on Tazendra)
- AFP and Micropub dev sites have indicators
- Could play with changing the background color? Maybe too hard to look at?
- Change the color of the title of the form, e.g. the OA?
- Will add red text "Development Site" at top of the OA form
Evidence Code Ontology
- Kimberly and Juancarlos have worked on a parser
- Will load into ACEDB soon
June 18, 2020
Undergrad phenotype submissions
- Chris gave presentation to Lina Dahlberg's class about community phenotype curation
- Class took survey about experience with presentation and experience trying to curate worm phenotypes
- Survey results: https://www.dropbox.com/s/00cit5aitv8yu27/Dahlberg_class_survey_results.xlsx?dl=0
- Some students didn't benefit, but most did; nice feedback!
- Lina intends to publish/micropublish the survey results so please don't share
- Since April 24, the class has submitted 171 annotations from 23 papers (some redundant and some still under review)
Special characters in OA/Postgres
- There are many special characters in free text entries in the OA; probably all from copy-pasting directly from PDF
- In some cases it seems the special characters cause problems for downstream scripts (e.g. FTP interactions file generator)
- It would probably be good to script the replacement of special characters with their appropriate simple characters or encoded characters
- Juancarlos wrote Perl script on Mangolassi at:
- /home/postgres/work/pgpopulation/grg_generegulation/20200618_summary_characters/get_summary_characters.pl
- Will find bad characters and their pgids for a given Postgres table
- Will find bad data and their pgids for the same table
- People can query their data tables for these characters
- Chris & Wen will work on compiling a list of bad characters that tend to come up
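The idea of finding special characters with their pgids and then scripting a replacement can be sketched as follows. This is not Juancarlos's Perl script. It is an illustration in Python that works on in-memory `(pgid, text)` rows instead of a Postgres table. The rows are invented, and the `REPLACEMENTS` table is a hypothetical stand-in for the list Chris and Wen plan to compile.

```python
import unicodedata
from collections import defaultdict

# Hypothetical replacement table; the real list is still to be compiled.
REPLACEMENTS = {
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis
}


def find_special_characters(rows):
    """Map each non-ASCII character to the pgids whose text contains it."""
    found = defaultdict(list)
    for pgid, text in rows:
        for ch in sorted(set(text)):
            if ord(ch) > 127:
                found[ch].append(pgid)
    return dict(found)


def simplify(text):
    """Replace characters listed in REPLACEMENTS; leave all others untouched."""
    return "".join(REPLACEMENTS.get(ch, ch) for ch in text)


# Invented example rows: (pgid, free-text entry).
rows = [
    (101, "gene X is \u201crequired\u201d for process Y"),
    (102, "plain ASCII summary"),
    (103, "expressed in 10\u201312 cells\u2026"),
]

report = find_special_characters(rows)
for ch, pgids in report.items():
    print(f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}: pgids {pgids}")

cleaned = [(pgid, simplify(text)) for pgid, text in rows]
```

Characters not in the table (for example Greek letters that carry meaning) are reported but left in place, so a curator can decide case by case.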
Citace upload
- July 10th citace-to-Hinxton upload
- July 7th is the citace upload date, but Wen will be on vacation, so curators will upload to Wen on Tuesday, June 30th
June 25, 2020
Caltech Summer Student
- Paul has new summer student
- Molecular lesion curation, maybe
- Are early stops more or less likely to be null mutations?
- Alleles are flagged as null in WB in the context of phenotypes
- Would be good to query Postgres for null alleles and work from there
- Fernando
- Anatomy function
- GO curation? Curating transcription factors?
- Checking for consistent curation
Worm Community Diversity Meeting
- Organized by Ahna Skop and Dana Miller
- Invite posted on Facebook "C. elegans Researchers" group
- Two meetings held: one Thursday (June 18th), one Friday (June 19th)
- Chris attended last Friday (June 19th)
- Worm Board looking to take input and ideas from this meeting and incorporate into meetings and events
- One idea was to document and track outreach efforts and what people have learned from them and organize them in a central location, maybe WormBase or Worm Community Forum
- Also, there was a suggestion to have a tool that could inform potential students of worm labs in their respective local area
- Ask Todd; he used to have a map of researchers; Todd had asked Cecilia to curate lab location and institution
- Person and Laboratory addresses in ACEDB have a different format, so looking to reconcile
- Do we know how many labs are still viable? Check for a paper verified in the last 5 years, or requested strains from the CGC recently
- Most labs were real
C_elegans Slack group
- Called "C_elegans"
- Chris made a "WormBase" channel for people to post questions, comments
- Chris will look into inviting everyone and possibly integrating with help@wormbase.org email list
WormBase Outreach Webinars
- While travel is still restricted, we should consider WormBase webinars
- Scott working on a JBrowse webinar
- Could have a different topic each month
- Should collect topics to cover and assign speakers (maybe multiple speakers per topic; keep it lively)
- Should set up a schedule
- How should we advertise? Can post on blog, twitter, etc.
New transcripts expanding gene range
- Will bring up at next week's site-wide call
- Possibly due to incorporation of newer nanopore reads
- Many genes coming in WS277 have expanded well beyond the gene limits as seen in WS276
- Example genes: pes-2.2, pck-2, herc-1, atic-1
- Has several repercussions:
- WormBase does not submit alleles affecting more than one gene; with these gene expansions, alleles that once affected a single gene now affect two genes and so are omitted from loading into the Alliance (including any phenotype and/or disease annotations)
- Some expanded genes are now being attributed with thousands of alleles/variants
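To illustrate the first repercussion, here is a small sketch of how widening one gene's range can turn a single-gene allele into a two-gene allele. The coordinates and the names `geneA`, `geneB`, and `allele` are invented for illustration, and the overlap test is a generic interval check, not WormBase's actual mapping pipeline.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # inclusive start and end on the same chromosome


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two inclusive intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def genes_hit(allele: Interval, genes: Dict[str, Interval]) -> List[str]:
    """Names of all genes whose range overlaps the allele, sorted."""
    return sorted(name for name, span in genes.items() if overlaps(allele, span))


def affects_multiple_genes(allele: Interval, genes: Dict[str, Interval]) -> bool:
    return len(genes_hit(allele, genes)) > 1


# Invented coordinates: geneA grows past the start of geneB in the newer release.
older_release = {"geneA": (1000, 2000), "geneB": (3000, 4000)}
newer_release = {"geneA": (1000, 3200), "geneB": (3000, 4000)}
allele = (3100, 3100)  # a point variant inside geneB

print(genes_hit(allele, older_release), affects_multiple_genes(allele, older_release))
print(genes_hit(allele, newer_release), affects_multiple_genes(allele, newer_release))
```

Nothing about the allele itself changes between the two releases. Only the gene range does, which is why the omissions appear suddenly.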
Citace upload
- Upload files to Spica/Wen by Tuesday (June 30th) 10am
- Wen will clean up folders in Spica (older files from WS277 not cleared out for some reason)
July 9th, 2020
Gene names issue in SimpleMine and other mining tools
- Wen: Last week, Jonathan Ewbank raised the issue of gene names that may refer to multiple objects.
- This can be an issue for multiple data mining tools including WormMine, BioMart, and Gene Set Enrichment.
- Perhaps have a standalone approach to check if any gene name among a list may refer to multiple objects (users check their name lists before submitting them to any data mining tool).
- Jae: The public name issue is heterogeneous in nature, so there may be no single solution to all of these problems.
- In gene list curation from high-throughput studies, confusing usage of public names is probably under 2% (but still cannot be ignored). See examples below:
- a single public name assigned to multiple WBGene IDs; Wen has a list of these genes
- overlapping or dicistronic genes, e.g. mrpl-44 and F02A9.10
- overlapping or dicistronic genes that share a single sequence name, e.g.:
- exos-4.1 and tin-9.2 (B0564.1)
- eat-18 and lev-10 (Y105E8A.7)
- cha-1 and unc-17 (ZC416.8)
- simple confusion by authors, e.g. mdh-1 and mdh-2
- One of the most significant problems is the propagation of these gene name issues to other databases and papers.
- We can add a special note to each gene page, but people running batch analyses would not easily catch it.
- Conclusion: Jae and Wen will work on a tool that lets Users "sanitize" their gene lists before submission to data mining tools. They will also write a microPub explaining this issue to the community.
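As a rough sketch of what such a "sanitize" step might do (the actual tool is still to be written), the snippet below sorts a submitted name list into names that resolve to one gene, names that resolve to several, and names that are not recognized. The gene IDs (`GENE_0001`, etc.) are placeholders, not real WBGene IDs, and `NAME_TABLE`, `build_index`, and `sanitize` are hypothetical names. The shared sequence name B0564.1 comes from the examples above.

```python
from collections import defaultdict

# Hypothetical (name, gene ID) pairs; IDs are placeholders, not real WBGene IDs.
NAME_TABLE = [
    ("exos-4.1", "GENE_0001"),
    ("B0564.1", "GENE_0001"),
    ("tin-9.2", "GENE_0002"),
    ("B0564.1", "GENE_0002"),
    ("mdh-1", "GENE_0003"),
    ("mdh-2", "GENE_0004"),
]


def build_index(pairs):
    """Map each name to the set of gene IDs it can refer to."""
    index = defaultdict(set)
    for name, gene_id in pairs:
        index[name].add(gene_id)
    return dict(index)


def sanitize(names, index):
    """Split names into unique, ambiguous, and unknown groups."""
    unique, ambiguous, unknown = {}, {}, []
    for name in names:
        ids = index.get(name)
        if not ids:
            unknown.append(name)
        elif len(ids) == 1:
            unique[name] = next(iter(ids))
        else:
            ambiguous[name] = sorted(ids)
    return unique, ambiguous, unknown


index = build_index(NAME_TABLE)
unique, ambiguous, unknown = sanitize(["exos-4.1", "B0564.1", "abc-99"], index)
print("unique:", unique)
print("ambiguous:", ambiguous)
print("unknown:", unknown)
```

Reporting ambiguous names back to the user, instead of silently picking one gene, keeps the choice visible before the list reaches WormMine, BioMart, or an enrichment tool.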
Wormicloud
- Please test and leave any feedback on the word cloud tool (Wormicloud), https://wormicloud.textpressolab.com/
- Valerio and Jae have worked on a tool that uses data in Textpresso; given a keyword, e.g. "transposon", the tool generates a word cloud and word trend.
- Any keyword can generate a graph plotting its trend of occurrence in publication abstracts across the years.
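How Wormicloud computes its trend is not described here. One simple way to get a per-year trend, shown only as a sketch, is to count the abstracts in each year that mention the keyword. The abstracts below are invented, and `keyword_trend` is a hypothetical helper, not part of Wormicloud or Textpresso.

```python
import re
from collections import Counter


def keyword_trend(abstracts, keyword):
    """Count, per year, the abstracts that mention the keyword as a whole word."""
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    counts = Counter()
    for year, text in abstracts:
        if pattern.search(text):
            counts[year] += 1
    return dict(sorted(counts.items()))


# Invented (year, abstract) pairs for illustration.
abstracts = [
    (2018, "A transposon insertion screen."),
    (2018, "An unrelated abstract."),
    (2019, "Transposon silencing in the germline."),
    (2019, "Tc1 transposon mobility."),
]

trend = keyword_trend(abstracts, "transposon")
print(trend)
```

Whole-word matching means plural or derived forms are not counted; a real tool would likely normalize word forms first.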
Noctua 2.0 form ready to use
- Caltech summer student will try using Noctua initially for dauer (neuronal signaling) pathways
Nightly names service updates to postgres
- A nightly job uses Matt's wb-names-export.jar to get the full output of genes from the datomic/names service and updates postgres based on that.
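The output format of wb-names-export.jar and the Postgres tables involved are not given in these notes, so the following is only a generic sketch of the "full export, then update" pattern. It assumes both sides can be reduced to a mapping from gene ID to name. The IDs, names, and the `plan_sync` helper are all hypothetical.

```python
def plan_sync(exported, current):
    """Compare a full export against current rows and list the needed changes."""
    to_insert = {k: v for k, v in exported.items() if k not in current}
    to_update = {
        k: v for k, v in exported.items() if k in current and current[k] != v
    }
    missing_from_export = sorted(set(current) - set(exported))
    return to_insert, to_update, missing_from_export


# Invented data: gene ID -> public name.
exported = {"GENE_0001": "abc-1", "GENE_0002": "abc-2b", "GENE_0003": "abc-3"}
current = {"GENE_0001": "abc-1", "GENE_0002": "abc-2", "GENE_0009": "old-9"}

to_insert, to_update, missing_from_export = plan_sync(exported, current)
print("insert:", to_insert)
print("update:", to_update)
print("in postgres but not in export:", missing_from_export)
```

Rows present locally but absent from the export are only reported here, since whether they should be removed or kept is a curation decision.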