WormBase-Caltech Weekly Calls
July 9th, 2020
Gene names issue in SimpleMine and other mining tools
- Wen: Last week, Jonathan Ewbank raised the issue of gene names that may refer to multiple objects.
- This can be an issue for multiple data mining tools, including WormMine, BioMart, and Gene Set Enrichment.
- Perhaps provide a standalone check for whether any gene name in a list refers to multiple objects, so that users can check their name lists before submitting them to any data mining tool.
- Jae: The public name issue is heterogeneous in nature, so there may be no single solution to all of these problems.
- In gene list curation from high-throughput studies, confusing usage of public names probably accounts for less than 2% (this still cannot be ignored). See the examples below:
- A single public name is assigned to multiple WBGene IDs; Wen has a list of these genes.
- Overlapping or dicistronic genes, e.g. mrpl-44 and F02A9.10
- Overlapping or dicistronic genes that share a single sequence name, e.g. exos-4.1 and tin-9.2 (B0564.1); eat-18 and lev-10 (Y105E8A.7); cha-1 and unc-17 (ZC416.8)
- Simple confusion by authors, e.g. mdh-1 and mdh-2
- One of the most significant problems is that these gene name issues propagate to other databases and papers.
- We can add a special note to each affected gene page, but people running batch analyses would not easily see it.
- Conclusion: Jae and Wen will work on a tool that lets users "sanitize" their gene lists before submission to data mining tools. They will also write a microPub explaining this issue to the community.
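As a rough illustration of the "sanitize" idea, here is a minimal Python sketch that flags names in a list that map to more than one gene. It is not the planned tool. The names come from the examples above, but the gene IDs are placeholders (not real WBGene IDs), and the table layout and function names are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical (name, gene ID) pairs. The names are taken from the examples
# in these notes; the IDs are placeholders, NOT real WBGene identifiers.
NAME_TABLE = [
    ("B0564.1", "GENE_exos-4.1"),
    ("B0564.1", "GENE_tin-9.2"),
    ("exos-4.1", "GENE_exos-4.1"),
    ("tin-9.2", "GENE_tin-9.2"),
    ("ZC416.8", "GENE_cha-1"),
    ("ZC416.8", "GENE_unc-17"),
    ("mdh-1", "GENE_mdh-1"),
]


def build_index(pairs):
    """Map each name to the set of gene IDs it may refer to."""
    index = defaultdict(set)
    for name, gene_id in pairs:
        index[name].add(gene_id)
    return index


def sanitize(names, index):
    """Sort names into unique, ambiguous, and unknown groups."""
    report = {"unique": {}, "ambiguous": {}, "unknown": []}
    for raw in names:
        name = raw.strip()
        ids = index.get(name, set())
        if len(ids) == 1:
            report["unique"][name] = next(iter(ids))
        elif len(ids) > 1:
            # The name refers to several genes; list all candidates.
            report["ambiguous"][name] = sorted(ids)
        else:
            report["unknown"].append(name)
    return report
```

A user would call `sanitize(my_names, build_index(NAME_TABLE))` and resolve everything under `"ambiguous"` by hand before submitting the list to a mining tool.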
- Please test and leave any feedback on the word cloud tool (Wormicloud), https://wormicloud.textpressolab.com/
- Valerio and Jae have worked on a tool that uses data in Textpresso; given a keyword, e.g. "transposon", the tool generates a word cloud and word trend.
- Any keyword can generate a graph that plots its trend of occurrence in publication abstracts across the years.
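The word trend idea can be sketched in a few lines of Python. This is an illustration only, not how Wormicloud is implemented. The `(year, abstract)` input format and the function name are assumptions, and the matching rule (any word starting with the keyword, case-insensitive) is a simplification chosen for the example.

```python
import re
from collections import Counter


def keyword_trend(abstracts, keyword):
    """Count, per year, how many abstracts mention the keyword.

    `abstracts` is an iterable of (year, text) pairs (an assumed format).
    A word matches if it starts with the keyword, so "transposons"
    counts as a mention of "transposon".
    """
    pattern = re.compile(r"\b" + re.escape(keyword), re.IGNORECASE)
    counts = Counter()
    for year, text in abstracts:
        if pattern.search(text):
            counts[year] += 1
    # Return the years in chronological order, ready for plotting.
    return dict(sorted(counts.items()))
```

The resulting year-to-count mapping is what a trend graph would plot, with year on the x-axis and number of matching abstracts on the y-axis.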
Noctua 2.0 form ready to use
- A Caltech summer student will try using Noctua, initially for dauer (neuronal signaling) pathways.
Nightly names service updates to postgres
- Each night, Matt's wb-names-export.jar is used to get the full output of genes from the datomic/names service, and postgres is updated based on that.
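One way to think about the "update based on the full export" step is as a comparison of two snapshots. The Python sketch below is only an illustration of that idea. The notes do not describe the export format or the postgres schema, so the `{gene_id: name}` dictionaries and the function name are assumptions, and no database calls are shown.

```python
def plan_sync(exported, stored):
    """Compare two {gene_id: name} snapshots and list the changes needed.

    `exported` stands in for the nightly full export and `stored` for the
    current postgres contents; both layouts are assumed for this sketch.
    """
    # Genes present in the export but not yet stored.
    to_insert = {g: n for g, n in exported.items() if g not in stored}
    # Genes whose stored name differs from the exported name.
    to_update = {
        g: n for g, n in exported.items() if g in stored and stored[g] != n
    }
    # Stored genes absent from the export; how to handle these is left open.
    missing = sorted(g for g in stored if g not in exported)
    return to_insert, to_update, missing
```

Because the export is a full dump, computing a plan like this first means only changed rows need to be written each night.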