WormBase-Caltech Weekly Calls April 2021
April 1, 2021
Antibodies
- Alignment of the antibody class to Alliance:
- Propose to move possible_pseudonym (192) and Other_animal (37) to remarks. Those tags are not currently used for curation.
- Other_animal is sometimes used in older annotations, e.g. when authors say that the antibodies were raised in both rats and rabbits. Standard practice would be to create 2 records, one for the rat antibody and one for the rabbit antibody.
- Possible_pseudonym was used when a curator was not able to unambiguously assign a previous antibody to a record (we have an Other name (synonym) tag to capture unambiguous ones). When moving these to remarks, we can keep a controlled vocabulary for easy future parsing, e.g. "possible_pseudonym:"
- Antigen field: currently separated into Protein, peptide, and other_antigen (e.g. homogenate of early C. elegans embryos, sperm). Propose to use just one antigen field to capture antigen info.
All changes proposed above were approved by the group
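If retired tags become remarks with a controlled prefix, they stay machine-readable. Below is a minimal sketch of parsing such remarks, assuming a remark looks like "possible_pseudonym: <name>". The function names are hypothetical, and the "other_animal" prefix is an illustrative assumption (only "possible_pseudonym:" was discussed).

```python
from typing import List, Optional, Tuple

# Hypothetical controlled prefixes for remarks that replace retired tags.
# Only "possible_pseudonym" was mentioned in the call; "other_animal" is an
# illustrative assumption.
REMARK_PREFIXES = ("possible_pseudonym", "other_animal")


def parse_remark(remark: str) -> Tuple[Optional[str], str]:
    """Split a remark into (prefix, value); prefix is None for free text."""
    head, sep, tail = remark.partition(":")
    key = head.strip().lower()
    if sep and key in REMARK_PREFIXES:
        return key, tail.strip()
    return None, remark.strip()


def possible_pseudonyms(remarks: List[str]) -> List[str]:
    """Collect the values of all possible_pseudonym remarks for one antibody."""
    return [value for key, value in map(parse_remark, remarks)
            if key == "possible_pseudonym"]
```

Free-text remarks that happen to contain a colon (e.g. "Note: ...") are left alone because only listed prefixes are recognized.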
Textpresso-dev clean-up
- Michael has asked curators to assess what they have on textpresso-dev as it will not be around forever :-(
- Is it okay to transfer data and files we want to keep to tazendra, and then to our own individual machines?
- Direct access may be possible via Caltech VPN
- Do we want to move content to AWS? May be complicated; it is still easy and cheap to maintain local file systems/machines
Braun servers
- 3 servers stored in Braun server room; is there a new contact person for accessing these servers?
- Mike Miranda's replacement is just getting settled; Paul will find out who is managing the server room and let Raymond know
Citace upload
- Next Friday, April 9th, by end of the day
- Wen will contact Paul Davis for the frozen WS280 models file
April 8, 2021
Braun server outage
- Raymond fixed it; Spica, wobr, and wobr2 are now back up
Textpresso API
- Was down yesterday, affecting WormiCloud; Michael has fixed it
- Valerio will learn how to manage the API for the future
Grant opportunities
- Possibilities to apply for supplements
- May 15th deadline
- Druggable genome project
- Pharos: https://pharos.nih.gov/
- Could we contribute?
- Visualization, tools, etc.
- Automated person descriptions?
- Automated descriptions for proteins, ion channels, druggable targets, etc.?
New WS280 ONTOLOGY FTP directory
- Changes requested here: https://github.com/WormBase/website/issues/7900
- Here's the FTP URL: ftp://ftp.wormbase.org/pub/wormbase/releases/WS280/ONTOLOGY/
- Known issues (Chris will report):
- Ontology files are provided with a ".gaf" extension in addition to ".obo"; we need to remove the OBO files that carry the ".gaf" extension
- Some files are duplicated and/or have inappropriate file extensions
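Both known issues can be checked mechanically on a local copy of the directory. The sketch below is a heuristic, not an official WormBase tool: it assumes GAF files begin with a "!gaf-version" line and OBO files with a "format-version:" header, handles optional ".gz" compression, and flags files whose extension disagrees with their content as well as byte-identical files under different names. All function names are hypothetical.

```python
import gzip
import hashlib
from collections import defaultdict
from pathlib import Path


def open_text(path: Path):
    """Open a plain or gzip-compressed file as text."""
    if path.suffix == ".gz":
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return path.open(encoding="utf-8", errors="replace")


def declared_extension(path: Path) -> str:
    """Extension ignoring a trailing .gz, e.g. 'x.obo.gz' -> 'obo'."""
    name = Path(path.stem) if path.suffix == ".gz" else path
    return name.suffix.lstrip(".").lower()


def sniff_format(path: Path) -> str:
    """Heuristic: GAF starts with '!gaf-version', OBO headers with 'format-version:'."""
    with open_text(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith("!gaf-version"):
                return "gaf"
            if not line or line.startswith("!"):
                continue  # blank line or comment
            return "obo" if line.startswith("format-version:") else "unknown"
    return "unknown"


def audit(directory: str) -> dict:
    """Report files whose extension disagrees with their content, and duplicates."""
    mismatched = []
    by_digest = defaultdict(list)
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        ext, fmt = declared_extension(path), sniff_format(path)
        if fmt in ("gaf", "obo") and fmt != ext:
            mismatched.append((path.name, ext, fmt))
        # Identical bytes under two names are flagged as duplicates.
        by_digest[hashlib.sha256(path.read_bytes()).hexdigest()].append(path.name)
    duplicates = [names for names in by_digest.values() if len(names) > 1]
    return {"mismatched": mismatched, "duplicates": duplicates}
```

Running `audit()` on a mirrored copy of the WS280 ONTOLOGY directory would give a list to attach to the GitHub issue.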
Odd characters in Postgres
- Daniela and Juancarlos discovered some errors with respect to special characters pasted into the OA
- Daniela would like to automatically pull in micropublication text (e.g. figure captions) into Postgres
- We would need an automated way to convert special characters, like the degree symbol °, into HTML entities (e.g. &deg;)
- Juancarlos and Valerio will look into possibly switching from a Perl module to a Python module to handle special characters
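As an illustration of what a Python approach could look like, the standard library's `html.entities.codepoint2name` table maps code points to HTML entity names. This is a minimal sketch, not the module the team will choose; it uses a named entity where one exists and a numeric reference otherwise, and it deliberately leaves ASCII characters (including "&") untouched.

```python
from html.entities import codepoint2name


def to_html_entities(text: str) -> str:
    """Replace every non-ASCII character with an HTML entity.

    Uses a named entity (e.g. &deg;) when one is defined, otherwise a
    numeric character reference (e.g. &#8451;).
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)  # plain ASCII passes through unchanged
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")
        else:
            out.append(f"&#{cp};")
    return "".join(out)
```

For example, `to_html_entities("grown at 20°C")` gives `grown at 20&deg;C`.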
April 15, 2021
Special characters in Postgres/OA
- Juancarlos is working on/proposing a plan to store UTF-8 characters in Postgres and the OA, which would then be converted at dumping time to HTML entities (e.g. &alpha; for α) for the ACE files
- There is still a bit of cleanup needed to fix or remove special characters (not necessarily UTF-8) that apparently got munged upon copy/pasting into the OA in the past
- Note: copy/paste from a PDF often works fine, but sometimes does not work as expected, so manual intervention is needed (e.g. entering Greek characters by hand in UTF-8 format)
- Would copy/pasting from HTML be better than PDF?
- For Person curation it would be good to be able to faithfully store and display appropriate foreign characters (e.g. Chinese characters, Danish characters, etc.)
- Mangolassi script called "get_summary_characters.pl" located here: /home/postgres/work/pgpopulation/grg_generegulation/20200618_summary_characters
- Juancarlos will modify script to take a data type code as an argument on the command line and return all Postgres tables (and their respective PGIDs) that have special characters, e.g.
- $ ./get_summary_characters.pl exp
- $ ./get_summary_characters.pl int
- $ ./get_summary_characters.pl grg
- Or we could pass just the datatype + field (Postgres table), e.g.
- $ ./get_summary_characters.pl pic_description
- Juancarlos will email everyone once it's ready. Update: it's ready and the email has been sent. The script is at /home/postgres/work/pgpopulation/oa_general/20210411_unicode_html/get_summary_characters.pl; symlink it into your own directory and run it from there, since it creates its output files in the directory you run it from.
- Action items:
- Juancarlos will update the "get_summary_characters.pl" script as described above
- Curators should use the "get_summary_characters.pl" to look for (potentially) bad characters in their OAs/Postgres tables
- Need to perform bulk (automated) replacement of existing HTML entities with the corresponding UTF-8 characters
- Curators will need to work with Juancarlos for each OA to modify the dumper
- Juancarlos will write (or append to existing) Postgres/OA dumping scripts to:
- 1) Convert UTF-8 characters to HTML entities in ACE files
- 2) Convert special quote and hyphen characters into simple versions that don't need special handling
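The bulk clean-up and dumper step 2 can be sketched with the Python standard library: `html.unescape` turns stored entities back into UTF-8, and `str.translate` folds typographic quotes and dashes into ASCII. The exact set of characters to fold is an assumption here and would need to be agreed per OA. Step 1 (UTF-8 to HTML entities) can then use Python's `html.entities.codepoint2name` table.

```python
import html

# Assumed mapping of "special" quotes and hyphens to plain ASCII; extend as needed.
SIMPLE_PUNCTUATION = str.maketrans({
    "\u2018": "'", "\u2019": "'",  # curly single quotes
    "\u201c": '"', "\u201d": '"',  # curly double quotes
    "\u2010": "-", "\u2011": "-",  # hyphen, non-breaking hyphen
    "\u2013": "-", "\u2014": "-",  # en dash, em dash
})


def entities_to_utf8(stored: str) -> str:
    """Bulk clean-up: turn legacy HTML entities in stored values into UTF-8."""
    return html.unescape(stored)


def simplify_punctuation(text: str) -> str:
    """Dumper step 2: replace special quotes and hyphens with ASCII versions."""
    return text.translate(SIMPLE_PUNCTUATION)
```

Folding punctuation before converting to entities keeps curly quotes from turning into `&ldquo;`-style entities in the ACE files. Note that `html.unescape` also decodes `&amp;` and some entities without a trailing semicolon, so a bulk run is worth reviewing with `get_summary_characters.pl` first.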
CeNGEN pictures
- Model change went in to accommodate images from the CeNGEN project
- Want gene page images for CeNGEN data; have the specifications for such images been worked out? Maybe not yet
- Raymond and Daniela will work with data producers to acquire images when ready
Supplement opportunities
- Money available for software development to "harden" existing software
- Might be possible to make Eduardo's single cell analysis tools more sustainable
- Could we adapt WormiCloud to the Alliance?
- Put Noctua on more stable production footing? (GO cannot apply as they are in final year of existing grant)
Student project for Textpresso
- Create a tool that allows a user to submit text and returns a list of similar papers
- Use cases:
- curator wants an alert to find papers similar to what they've curated
- look for potential reviewers of a paper based on similar text content
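One simple baseline a student could start from is TF-IDF weighting with cosine similarity. This is a generic technique, not Textpresso's actual method, and the function names are hypothetical. The sketch ranks a dictionary of paper IDs to texts against a query text using only the standard library.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens, keeping gene-style names like 'cep-1' intact."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight) per document, plus the IDF table."""
    counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    df = Counter(term for c in counts for term in c)
    # Smoothed IDF: rare terms weigh more than terms found in every paper.
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts], idf


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def similar_papers(query, papers, top_k=3):
    """Rank papers (id -> text) by cosine similarity to the query text."""
    ids = list(papers)
    vectors, idf = tfidf_vectors([papers[i] for i in ids])
    # Query terms absent from the corpus carry no weight and are dropped.
    q = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    scored = sorted(((cosine(q, v), i) for i, v in zip(ids, vectors)), reverse=True)
    return [(i, round(s, 3)) for s, i in scored[:top_k] if s > 0]
```

For the curator-alert use case, the query could be the concatenated text of papers already curated. For finding reviewers, the ranked papers' authors would be the candidates.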
April 22, 2021
LinkML hackathon
- Need to consider who works on what and how to coordinate
- Need to follow good Git practices
- Merge main branch into local branch before merging back into main branch to make sure everything works
- How will we best handle AceDB hash structures? Likely with something like what Mark QT demonstrated
- Do we have any/many hash-within-hash structures? #Molecular_change is used as a hash and tags within that model all reference the #Evidence hash
- GO annotation extensions offer an interesting challenge
IWM workshop
- Need to submit a workshop schedule (who speaks about what and when) by next Thursday, April 29th
- An initial idea was to promote data in ACEDB that may be underutilized or that many users may be unaware of
- An example might be transcription factor data: the ?Transcription_factor class and the modENCODE TF data
- Single cell data and tools? CeNGEN, Eduardo's single cell tools
- RNA-Seq FPKM values for genes and related data; Wen will write a script to pull out FPKM values from SRA data and send them to Magdalena
- In addition to WB data types, we will cover Alliance, AFP, and community curation
- Google doc for workshop here: https://docs.google.com/document/d/1H9ARhBRMKBNuOhjyxVQ_1o6cysvpppI7uA-TJrO_UZ4/edit?usp=sharing
WB Progress Report
- Due April 30th
- There will be two documents: progress and plans
- Place text in the appropriate places (don't write as a single integrated unit)
- Paul S will put together a Google doc
- We CAN include Alliance harmonization efforts
- 2020 Progress report: https://docs.google.com/document/d/1f3ettnkvwoKKiaAA4TSrpSQPEF7FmVVn6u2UdflA_So/edit?usp=sharing
- Last year's milestone was WS276; we will compare to WS280
- Google "WormBase Grants" folder: https://drive.google.com/drive/folders/1p8x9tEOfZ4DQvTcPSdNR5-JoPJu--ZAu?usp=sharing
- 2021 Progress Report document here: https://docs.google.com/document/d/13E9k5JvDpUN4kWnrTm4M2iphnAJSTpk02ZiGl8O6bM4/edit?usp=sharing
April 29, 2021
IWM Workshop Schedule
- Schedule format due today (April 29th)
- Tentative schedule here
- Format proposal is four 15-minute talks followed by 30 minutes of open discussion/Q&A
- Still need someone to speak (~15 minutes) about the Alliance
WB Progress Report
- 2021 documents in this Google Drive folder
- Note: there is one 2021 "Progress" document and a second (separate) "Future Plans" document
- Existing future plans text has been moved to the "Future Plans" document
Open Biosystems RNAi clone IDs
- User looking to map Open Biosystems RNAi clone names to WB clone names
- We may need to get a mapping file from Open Biosystems
FPKM data
- Wen has produced a CSV file of FPKM values; it could be generated as part of the SPELL pipeline
- May be better to generate at Hinxton
OA Dumpers
- Daniela and Juancarlos have been working on the Picture OA and Expr OA dumpers
- Inconsistencies have accumulated across all OA dumpers because each was written separately
- Juancarlos is working on a generalized, modular way to handle dumping
- Should we handle historical genes in the same way across OAs?
- Sure, but we need the "Historical_gene" tag in the respective ACEDB model
- Decision: we will continue to only dump historical genes for specific OAs, with a plan to maybe make consistent across OAs in the future
- Could we retroactively deal with paper-gene connections? We could possibly look in Postgres history tables to see which genes had been replaced previously (by Kimberly)
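A generalized, modular dumper could replace per-OA code with per-OA configuration. The sketch below is hypothetical and is not Juancarlos's design: each OA declares which Postgres tables map to which ACE tags, and one shared function renders the ACE block, assuming `\"` and `\\` escaping inside quoted ACE values. The class, table, and tag names are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FieldSpec:
    table: str  # Postgres table holding the value (illustrative names only)
    tag: str    # ACE tag the value is written under


@dataclass
class DumperConfig:
    ace_class: str
    specs: List[FieldSpec] = field(default_factory=list)


def ace_quote(value: str) -> str:
    """Quote a value for an .ace file, escaping backslashes and double quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def dump_object(config: DumperConfig, obj_id: str, row: Dict[str, List[str]]) -> str:
    """Render one object as an ACE block; `row` maps table name -> stored values."""
    lines = [f"{config.ace_class} : {ace_quote(obj_id)}"]
    for spec in config.specs:
        for value in row.get(spec.table, []):
            value = value.strip()
            if value:  # skip empty cells rather than emitting empty tags
                lines.append(f"{spec.tag}\t{ace_quote(value)}")
    return "\n".join(lines) + "\n"
```

Joining several `dump_object` results with "\n" leaves the blank line that separates ACE objects. Per-OA choices, such as whether to dump historical genes, would then become configuration entries rather than separate code paths.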
Gene name ambiguities
- Jae noticed that some gene names associated with multiple WBGene IDs (e.g. one public name is the same as another gene's other name) have the same references attached
- May require updating the paper-gene connections for some of these
- One example is the cep-1 gene: the name is associated with 3 different WBGene IDs, which share papers in the reference widget
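Finding every name shared between genes, and the papers those genes have in common, is an index-building task. This minimal sketch assumes gene records shaped as `{gene_id: {"public": name, "other": [names]}}` and a gene-to-papers mapping; the record shape, function names, and IDs in any example data are hypothetical. The shared papers are the candidates to review when updating paper-gene connections.

```python
from collections import defaultdict


def ambiguous_names(genes):
    """Map each name used by more than one gene to its (gene_id, role) uses.

    `genes` maps a gene ID to {"public": str, "other": [str, ...]}.
    """
    index = defaultdict(set)
    for gene_id, names in genes.items():
        index[names["public"]].add((gene_id, "public"))
        for other in names.get("other", []):
            index[other].add((gene_id, "other"))
    return {
        name: sorted(uses)
        for name, uses in index.items()
        if len({gene_id for gene_id, _ in uses}) > 1
    }


def shared_papers(name, ambiguous, papers_by_gene):
    """Papers attached to every gene that shares `name`."""
    gene_ids = {gene_id for gene_id, _ in ambiguous[name]}
    return set.intersection(*(set(papers_by_gene.get(g, ())) for g in gene_ids))
```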
NIH Supplement for AI readiness
- Could we set up curation for neural circuits using a knowledge graph (e.g. GO-CAM)?
- Maybe we could convert the anatomy function model to LinkML -> OWL statements?
- Maybe set up a graphical curation interface?
- Transcriptional regulation
- Would be good to establish a common model (for the Alliance?)
- The CeNGEN project produced many predictions of TF binding sites based on single-cell expression data; Eduardo: these models should be regenerated each time new data sets are published, but this requires greater integration into a central, sustainable resource
- Paul S can send a link for the supplement
Variant First Pass Pipeline
- Valerio: Are there any existing pipelines to make allele-paper and/or strain-paper associations?
- Not sure; we should ask Karen