|
|
(37 intermediate revisions by 3 users not shown) |
Line 36: |
Line 36: |
| [[WormBase-Caltech_Weekly_Calls_May_2020|May]] | | [[WormBase-Caltech_Weekly_Calls_May_2020|May]] |
| | | |
| + | [[WormBase-Caltech_Weekly_Calls_June_2020|June]] |
| | | |
− | == June 4, 2020 ==
| + | [[WormBase-Caltech_Weekly_Calls_July_2020|July]] |
| | | |
− | === Citace (tentative) upload ===
| |
− | * CIT curators upload to citace on Tuesday, July 7th, 10am Pacific
| |
− | * Citace upload to Hinxton on Friday, July 10th
| |
| | | |
− | === Caltech reopening === | + | ==August 6th, 2020== |
− | * Paul looking to get plan approved
| + | ===Experimental conditions data flow into Alliance=== |
− | * People that want to come to campus need to watch training video
| + | *Experimental conditions in disease annotations: WB has inducers (used to recapitulate the disease condition) and modifiers (a modifier can ameliorate, exacerbate, or have no effect on the disease condition) |
− | * Masks available in Paul's lab
| + | *We use the WB Molecule CV for Inducers and Modifiers in disease annotation |
− | * Can have maximum of 3 people in WormBase rooms at a time; probably best to only allow one person per WB room
| + | *Experimental conditions in phenotype annotations are free text (captured in remarks); they will probably need to be formalized later on |
− | ** Could possibly have 2 people in big room (Church 64) as long as they stay at least 10 feet apart
| + | *So for data flow into Alliance: |
− | * Need to coordinate, maybe make a Google calendar to do so (also Slack)
| + | **In the short term we will load the Molecule CV into the Alliance (Ranjana and Michael P. will work on this) |
− | * Before and after you go to campus, you need to take your temperature and assess your symptoms (if any) and submit info on form
| + | **Groups will switch to using a common data model that works for all, and a common ontology (or ontologies), in the near future. |
− | * Also, need to submit who you were in contact with for contact tracing
| + | * How do we handle genetic sex? Part of condition? |
− | * Form is used all week, and hold on to it until asked to be submitted
| + | ** Condition has been intended for external/environmental conditions, whereas genetic sex is inherent to the organism of study |
− | * If someone goes in to the office, they could print several forms for people to pick up in WB offices
| + | ** Expression pattern curation needs genetic sex; needs a model at the Alliance for capturing sex |
− | | |
− | === Nameserver ===
| |
− | * Nameserver was down
| |
− | * CIT curators would still like to have a single form to interact with
| |
− | * Is it possible to create objects at Caltech and let a cronjob assign IDs via the nameserver? May not be a good idea
| |
− | * Still putting genotype and all info for a strain in the reason/why field in the nameserver
| |
− | * We plan to eventually connect strains to genotypes, but need model changes and curation effort to sort out
| |
− | * Hinxton is pulling in CGC strains, how often?
| |
− | * Caltech could possibly get a block of IDs
| |
− | | |
− | === Alliance SimpleMine ===
| |
− | * Any updates? 3.1 feature freeze is tomorrow
| |
− | * Pending on PI decision; Paul S. will bring it up tomorrow on the Alliance PI call
| |
− | | |
− | | |
− | == June 11, 2020 ==
| |
− | | |
− | === Name Service === | |
− | * Testing site now up; linked to Mangolassi
| |
− | * CGI from Juancarlos not accepting all characters, including double quotes like "
| |
− | * Example submission that fails via CGI
| |
− | WBPaper000XXXX; genotype: blah::' " ` / < > [ ] { } ? , . ( ) * ^ & % $ # @ ! \ | α β Ω ≈ µ ≤ ≥ ÷ æ … ˚ ∆ ∂ ß œ ∑ † ¥ ¨ ü i î ø π “ ‘ « • – ≠ Å ´ ∏ » ± — ‚ °
| |
− | * Juancarlos will look into and try to fix
| |
− | | |
− | === Alliance Literature group ===
| |
− | * Textpresso vs. OntoMate vs. PubMed | |
− * Still some confusion about which tasks can be performed in each tool
| |
− | * Working on collecting different use cases on spreadsheet
| |
− | * Sentence-based search is big strength of Textpresso
| |
− | * At latest meeting performed some large searches for OntoMate and Textpresso
| |
− | * Literature acquisition: still needs work
| |
− | ** Using SVM vs. Textpresso search to find relevant papers
| |
− | ** Species based SVM? Currently use string matching to derive different corpora
| |
− | ** Finding genes and determining which species those genes belong to?
| |
− | | |
− | === Alliance priorities? ===
| |
− | * Transcription regulatory networks
| |
− | * Interactions can focus on network viewer eventually
| |
− | ** May want different versions/flavors of interaction viewers
| |
− | ** May also want to work closely with GO and GO-CAMs
| |
− | * Gene descriptions can focus on information poor genes, protein domains, etc.
| |
− | | |
− | === Sandbox visual cues ===
| |
− | * Juancarlos and Daniela will discuss ways to provide visual cues that a curator is on a sandbox form (on Mangolassi) vs live form (on Tazendra)
| |
− | * AFP and Micropub dev sites have indicators
| |
− | * Could play with changing the background color? Maybe too hard to look at?
| |
− | * Change the color of the title of the form, e.g. the OA?
| |
− | * Will add red text "Development Site" at top of the OA form | |
− | | |
− | === Evidence Code Ontology ===
| |
− | * Kimberly and Juancarlos have worked on a parser
| |
− | * Will load into ACEDB soon | |
− | | |
− | | |
− | == June 18, 2020 ==
| |
− | | |
− | === Undergrad phenotype submissions ===
| |
− | * Chris gave presentation to Lina Dahlberg's class about community phenotype curation
| |
− | * Class took survey about experience with presentation and experience trying to curate worm phenotypes
| |
− | ** Survey results: https://www.dropbox.com/s/00cit5aitv8yu27/Dahlberg_class_survey_results.xlsx?dl=0
| |
− | ** Some students didn't benefit, but most did; nice feedback!
| |
− | ** Lina intends to publish/micropublish the survey results so please don't share
| |
− | * Since April 24, the class has submitted 171 annotations from 23 papers (some redundant and some still under review)
| |
− | | |
− | === Special characters in OA/Postgres ===
| |
− | * There are many special characters in free text entries in the OA; probably all from copy-pasting directly from PDF
| |
− | * In some cases it seems the special characters cause problems for downstream scripts (e.g. FTP interactions file generator)
| |
− | * It would probably be good to script the replacement of special characters with their appropriate simple characters or encoded characters
| |
− | * Juancarlos wrote Perl script on Mangolassi at:
| |
− | ** /home/postgres/work/pgpopulation/grg_generegulation/20200618_summary_characters/get_summary_characters.pl | |
− | ** Will find bad characters and their pgids for a given Postgres table
| |
− | ** Will find bad data and their pgids for the same table | |
− | ** People can query their data tables for these characters
| |
− | * Chris & Wen will work on compiling a list of bad characters that tend to come up
| |
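A minimal sketch of the kind of check and replacement discussed above, in Python. This is not Juancarlos's Perl script; the `REPLACEMENTS` table is a hypothetical starting point (the real list of bad characters is still to be compiled), and rows are assumed to arrive as `(pgid, text)` pairs from whatever query a curator runs against their table.

```python
from collections import defaultdict

# Hypothetical replacement table: typographic characters that commonly
# arrive via copy-paste from PDFs, mapped to simple ASCII equivalents.
REPLACEMENTS = {
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u2013": "-",   # en dash
    "\u2014": "-",   # em dash
    "\u2026": "...", # horizontal ellipsis
}


def find_non_ascii(rows):
    """Map each non-ASCII character to the sorted pgids whose text contains it."""
    hits = defaultdict(set)
    for pgid, text in rows:
        for ch in text:
            if ord(ch) > 127:
                hits[ch].add(pgid)
    return {ch: sorted(pgids) for ch, pgids in hits.items()}


def simplify(text):
    """Replace characters listed in REPLACEMENTS; leave everything else untouched."""
    return "".join(REPLACEMENTS.get(ch, ch) for ch in text)


if __name__ == "__main__":
    # Made-up example rows, for illustration only.
    rows = [(1, "plain text"), (2, "unc\u201322 \u03b1"), (3, "\u03b1-tubulin")]
    print(find_non_ascii(rows))
    print(simplify("\u201cnull\u201d allele \u2013 \u03b1"))
```

Characters without an agreed replacement (Greek letters, for example) are only reported, not changed, so a curator can decide per character whether to encode or substitute it.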
− | | |
− | === Citace upload ===
| |
− | * July 10th citace-to-Hinxton upload
| |
− * The July 7th citace upload is moved up: Wen will be on vacation, so curators will upload to Wen on Tuesday, June 30th
| |
− | | |
− | | |
− | == June 25, 2020 ==
| |
− | | |
− | === Caltech Summer Student ===
| |
− | * Paul has new summer student
| |
− | ** Molecular lesion curation, maybe
| |
− | ** Are early stops more or less likely to be null mutations?
| |
− | ** Alleles are flagged as null in WB in the context of phenotypes
| |
− | ** Would be good to query Postgres for null alleles and work from there
| |
− | * Fernando
| |
− | ** Anatomy function
| |
− | ** GO curation? Curating transcription factors?
| |
− | *** Checking for consistent curation
| |
− | | |
− | === Worm Community Diversity Meeting ===
| |
− | * Organized by Ahna Skop and Dana Miller
| |
− | * Invite posted on Facebook "C. elegans Researchers" group
| |
− | * Two meetings held: one Thursday (June 18th), one Friday (June 19th)
| |
− | * Chris attended last Friday (June 19th) | |
− | * Worm Board looking to take input and ideas from this meeting and incorporate into meetings and events | |
− | * One idea was to document and track outreach efforts and what people have learned from them and organize them in a central location, maybe WormBase or Worm Community Forum
| |
− | * Also, there was a suggestion to have a tool that could inform potential students of worm labs in their respective local area
| |
− | ** Ask Todd; he used to have a map of researchers; Todd had asked Cecilia to curate lab location and institution
| |
− | * Person and Laboratory addresses in ACEDB have a different format, so looking to reconcile
| |
− | * Do we know how many labs are still viable? Check for a paper verified in the last 5 years, or requested strains from the CGC recently
| |
− | ** Most labs were real
| |
− | | |
− | === C_elegans Slack group ===
| |
− | * Called "C_elegans"
| |
− | * Chris made a "WormBase" channel for people to post questions, comments
| |
− | * Chris will look into inviting everyone and possibly integrating with help@wormbase.org email list
| |
− | | |
− | === WormBase Outreach Webinars ===
| |
− | * While travel is still restricted, we should consider WormBase webinars
| |
− | * Scott working on a JBrowse webinar
| |
− | * Could have a different topic each month
| |
− | * Should collect topics to cover and assign speakers (maybe multiple speakers per topic; keep it lively)
| |
− | * Should set up a schedule
| |
− | * How should we advertise? Can post on blog, twitter, etc. | |
− | | |
− | === New transcripts expanding gene range ===
| |
− | * Will bring up at next week's site-wide call
| |
− | * Possibly due to incorporation of newer nanopore reads
| |
− | * Many genes coming in WS277 have expanded well beyond the gene limits as seen in WS276 | |
− | ** Example genes: pes-2.2, pck-2, herc-1, atic-1 | |
− | * Has several repercussions:
| |
− ** WormBase does not submit alleles affecting more than one gene; with these gene expansions, alleles that once affected only a single gene now affect two genes, and so are omitted from loading into the Alliance (including any phenotype and/or disease annotations)
| |
− | ** Some expanded genes are now being attributed with thousands of alleles/variants
| |
− | | |
− | === Citace upload ===
| |
− | * Upload files to Spica/Wen by Tuesday (June 30th) 10am
| |
− | * Wen will clean up folders in Spica (older files from WS277 not cleared out for some reason)
| |
− | | |
− | ==July 9th, 2020==
| |
− | ===Gene names issue in SimpleMine and other mining tools===
| |
− | *Wen: Last week, Jonathan Ewbank raised the issue of gene names that may refer to multiple objects.
| |
− *This can be an issue for multiple data mining tools including WormMine, BioMart, and Gene Set Enrichment.
| |
− | *Perhaps have a standalone approach to check if any gene name among a list may refer to multiple objects (users check their name lists before submitting them to any data mining tool).
| |
− *Jae: The public name issue is heterogeneous in nature, so there may be no single solution for all of these problems.
| |
− *In gene list curation from high-throughput studies, confusing usage of public names is probably less than 2% (but still cannot be ignored). See examples below:
| |
− **A single public name is assigned to multiple WBGene IDs; Wen has a list of these genes
| |
− **Overlapping or dicistronic genes, e.g. mrpl-44 and F02A9.10
| |
− **Overlapping or dicistronic genes that share a single sequence name, examples:
| |
− | exos-4.1 and tin-9.2 (B0564.1)
| |
− | eat-18 and lev-10 (Y105E8A.7)
| |
− | cha-1 and unc-17 (ZC416.8)
| |
− | | |
− **Simple confusion by authors, e.g. mdh-1 and mdh-2
| |
− *One of the most significant problems is the propagation of these gene name issues to other databases and papers.
| |
− *We can add a special note to each gene page, but people doing batch analysis would not easily catch it. | |
− *Conclusion: Jae and Wen will work on a tool that lets users "sanitize" their gene lists before submission to data mining tools. They will also write a microPub explaining this issue to the community. | |
− | | |
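The "sanitize" step could look something like the following Python sketch. It is only an illustration of the idea, not the tool Jae and Wen plan to write: the function name `check_gene_names`, the shape of the `name_to_ids` lookup, and the placeholder IDs in the example are all assumptions.

```python
def check_gene_names(names, name_to_ids):
    """Sort submitted names into unique, ambiguous, and unknown.

    name_to_ids is assumed to map every public or sequence name to the
    set of gene IDs it may refer to.
    """
    report = {"unique": {}, "ambiguous": {}, "unknown": []}
    for name in names:
        ids = name_to_ids.get(name, set())
        if not ids:
            report["unknown"].append(name)
        elif len(ids) == 1:
            report["unique"][name] = next(iter(ids))
        else:
            report["ambiguous"][name] = sorted(ids)
    return report


if __name__ == "__main__":
    # Placeholder IDs, not real WBGene identifiers. ZC416.8 is the shared
    # sequence name for cha-1 and unc-17 mentioned in the notes.
    lookup = {
        "cha-1": {"ID_cha-1"},
        "unc-17": {"ID_unc-17"},
        "ZC416.8": {"ID_cha-1", "ID_unc-17"},
    }
    print(check_gene_names(["cha-1", "ZC416.8", "abc-99"], lookup))
```

Run before submission to a mining tool, a report like this lets the user resolve the ambiguous names by hand instead of having them silently mapped to one gene.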
− | ===Wormicloud===
| |
− | *Please test and leave any feedback on the word cloud tool (Wormicloud), https://wormicloud.textpressolab.com/
| |
− *Valerio and Jae have worked on a tool that uses data in Textpresso; given a keyword, e.g. "transposon", the tool generates a word cloud and word trend.
| |
− *Any keyword can generate a graph that plots trends of occurrence across the years in publication abstracts.
| |
− | | |
− | ===Noctua 2.0 form ready to use===
| |
− | *Caltech summer student will try using Noctua initially for dauer (neuronal signaling) pathways
| |
− | | |
− | ===Nightly names service updates to postgres===
| |
− *A nightly job uses Matt's wb-names-export.jar to get the full output of genes from the datomic/names service, and updates postgres based on that.
| |