Difference between revisions of "WormBase-Caltech Weekly Calls"

From WormBaseWiki
== April 1, 2021 ==
=== Antibodies ===
* Alignment of the antibody class to Alliance:
** Propose to move possible_pseudonym (192) and Other_animal (37) to remarks. Those tags are not currently used for curation.
*** Other animal is sometimes used for older annotations, e.g. when authors say that the antibodies were raised in both rats and rabbits. Standard practice would be to create 2 records, one for the rat antibody and one for the rabbit.
*** Possible pseudonym was used when a curator was not able to unambiguously assign a previous antibody to a record (we have an Other name -synonym- tag to capture unambiguous ones). When moving to remarks we can keep a controlled vocabulary for easy future parsing, e.g. “possible_pseudonym:”
** Antigen field: currently separated into Protein, peptide, and other_antigen (e.g.: homogenate of early C. elegans embryos, sperm). Propose to use just one antigen field to capture antigen info.
All changes proposed above were approved by the group.
=== textpresso-dev clean up ===
* Michael has asked curators to assess what they have on textpresso-dev as it will not be around forever :-(
* Is it okay to transfer data and files we want to keep to tazendra, and then to our own individual machines?
* Direct access may be possible via Caltech VPN
* Do we want to move content to AWS? May be complicated; it is still easy and cheap to maintain local file systems/machines
=== Braun servers ===
* 3 servers stored in Braun server room; is there a new contact person for accessing these servers?
* Mike Miranda replacement just getting settled; Paul will find out who is managing the server room and let Raymond know
=== Citace upload ===
* Next Friday, April 9th, by end of the day
* Wen will contact Paul Davis for the frozen WS280 models file
== April 8, 2021 ==
=== Braun server outage ===
* Raymond fixed; now Spica, wobr and wobr2 are back up
=== Textpresso API ===
* Was down yesterday affecting WormiCloud; Michael has fixed
* Valerio will learn how to manage the API for the future
=== Grant opportunities ===
* Possibilities to apply for supplements
* May 15th deadline
* Druggable genome project
** Pharos: https://pharos.nih.gov/
** could we contribute?
* Visualization, tools, etc.
* Automated person descriptions?
* Automated descriptions for proteins, ion channels, druggable targets, etc.?
=== New WS280 ONTOLOGY FTP directory ===
* Changes requested here: https://github.com/WormBase/website/issues/7900
* Here's the FTP URL: ftp://ftp.wormbase.org/pub/wormbase/releases/WS280/ONTOLOGY/
* Known issues (Chris will report):
** Ontology files are provided as ".gaf" in addition to ".obo"; we need to remove the ".gaf" OBO files
** Some files are duplicated and/or have inappropriate file extensions
=== Odd characters in Postgres ===
* Daniela and Juancarlos discovered some errors with respect to special characters pasted into the OA
* Daniela would like to automatically pull in micropublication text (e.g. figure captions) into Postgres
* We would need an automated way to convert special characters, such as the degree symbol (°), into HTML entities (e.g. &deg;)
* Juancarlos and Valerio will look into possibly switching from a Perl module to a Python module to handle special characters
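To illustrate the kind of conversion discussed above, here is a minimal Python sketch, not the actual OA code. It assumes the text is already a decoded Unicode string and uses only the standard library's html.entities table; the function name to_html_entities is hypothetical. Characters with a named HTML entity get that name, and anything else falls back to a numeric character reference.

```python
from html.entities import codepoint2name


def to_html_entities(text: str) -> str:
    """Replace every non-ASCII character with an HTML entity.

    Hypothetical helper for illustration only; ASCII characters
    (including &, < and >) are passed through unchanged.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)  # plain ASCII stays as-is
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")  # named entity, e.g. &deg;
        else:
            out.append(f"&#{cp};")  # numeric fallback, e.g. for Chinese characters
    return "".join(out)


if __name__ == "__main__":
    print(to_html_entities("grown at 25°C"))
```

Whether a numeric fallback is acceptable in the ACE files is an open question that this sketch does not answer.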
== April 15, 2021 ==
=== Special characters in Postgres/OA ===
* Juancarlos is working on/proposing a plan to store UTF-8 characters in Postgres and the OA, which would then get converted, at dumping, to HTML entities (e.g. &alpha; for α) for the ACE files
* There is still a bit of cleanup needed to fix or remove special characters (not necessarily UTF-8) that apparently got munged upon copy/pasting into the OA in the past
* Note: copy/paste from a PDF often works fine, but sometimes does not work as expected so manual intervention would be needed (e.g. entering Greek characters by hand in UTF-8 format)
* Would copy/pasting from HTML be better than PDF?
* For Person curation it would be good to be able to faithfully store and display appropriate foreign characters (e.g. Chinese characters, Danish characters, etc.)
* Mangolassi script called "get_summary_characters.pl" located here: /home/postgres/work/pgpopulation/grg_generegulation/20200618_summary_characters
** Juancarlos will modify script to take a data type code as an argument on the command line and return all Postgres tables (and their respective PGIDs) that have special characters, e.g.
*** $ ./get_summary_characters.pl exp
*** $ ./get_summary_characters.pl int
*** $ ./get_summary_characters.pl grg
** or could pass just the datatype + field (postgres table). e.g.
*** $ ./get_summary_characters.pl pic_description
** Juancarlos will email everyone once it's ready. Update: it's ready and the email has been sent. The script is at /home/postgres/work/pgpopulation/oa_general/20210411_unicode_html/get_summary_characters.pl. Symlink it into your own directory and run it from there; it creates its output files in the directory you run it from.
* Action items:
** Juancarlos will update the "get_summary_characters.pl" script as described above
** Curators should use the "get_summary_characters.pl" to look for (potentially) bad characters in their OAs/Postgres tables
** Need to perform bulk (automated) replacement of existing HTML entities into corresponding UTF-8 characters
** Curators will need to work with Juancarlos for each OA to modify the dumper
** Juancarlos will write (or append to existing) Postgres/OA dumping scripts to:
*** 1) Convert UTF-8 characters to HTML entities in ACE files
*** 2) Convert special quote and hyphen characters into simple versions that don't need special handling
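As a rough illustration of two of the steps above (finding entries with special characters, and simplifying special quotes and hyphens), here is a minimal Python sketch. It is not the Perl script or the dumper itself: it assumes the rows have already been fetched from Postgres as (PGID, text) pairs, and the function names and the punctuation table are hypothetical.

```python
# Hypothetical table of "fancy" punctuation and its plain ASCII replacement.
SIMPLE_PUNCTUATION = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2010": "-", "\u2011": "-",   # hyphen, non-breaking hyphen
    "\u2013": "-", "\u2014": "-",   # en dash, em dash
}
_TRANSLATION = str.maketrans(SIMPLE_PUNCTUATION)


def simplify_punctuation(text):
    """Swap special quotes and hyphens for plain ASCII equivalents."""
    return text.translate(_TRANSLATION)


def find_special_characters(rows):
    """Return {pgid: sorted list of non-ASCII characters found in the text}.

    `rows` is assumed to be an iterable of (pgid, text) pairs; rows made up
    entirely of ASCII characters are left out of the result.
    """
    hits = {}
    for pgid, text in rows:
        special = sorted({ch for ch in text if ord(ch) > 127})
        if special:
            hits[pgid] = special
    return hits


if __name__ == "__main__":
    sample = [(1, "plain text"), (2, "grown at 20 \u00b0C with \u03b1-amanitin")]
    print(find_special_characters(sample))
    print(simplify_punctuation("\u201cunc\u201154\u201d"))
```

Running simplify_punctuation before the check would leave only the characters that genuinely need an entity or manual review.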
=== CeNGEN pictures ===
* Model change went in to accommodate images from the CeNGEN project
* Want gene page images for CeNGEN data; have the specifications for such images been worked out? Maybe not yet
* Raymond and Daniela will work with data producers to acquire images when ready
=== Supplement opportunities ===
* Money available for software development to "harden" existing software
* Might be possible to make Eduardo's single cell analysis tools more sustainable
* Could make WormiCloud adapted to Alliance?
* Put Noctua on more stable production footing? (GO cannot apply as they are in final year of existing grant)
=== Student project for Textpresso ===
* Create tool to allow user to submit text and return a list of similar papers
* Use cases:
** curator wants an alert to find papers similar to what they've curated
** look for potential reviewers of a paper based on similar text content
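One simple way such a tool could work is to rank papers by TF-IDF cosine similarity to the submitted text. The Python sketch below is only an assumption about a possible approach, not a description of Textpresso; the function names, the tokenizer and the toy paper texts are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_similar(query, papers):
    """Rank papers by TF-IDF cosine similarity to the query text.

    `papers` maps a paper ID to its text; returns (paper_id, score) pairs,
    most similar first.
    """
    docs = {pid: Counter(tokenize(text)) for pid, text in papers.items()}
    n = len(docs)
    doc_freq = Counter()
    for counts in docs.values():
        doc_freq.update(counts.keys())  # count each term once per paper

    def weigh(counts):
        # Smoothed inverse document frequency, so unseen terms are still defined.
        return {t: c * (math.log((1 + n) / (1 + doc_freq[t])) + 1)
                for t, c in counts.items()}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    query_vec = weigh(Counter(tokenize(query)))
    scored = [(pid, cosine(query_vec, weigh(counts))) for pid, counts in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    toy_papers = {  # made-up texts, for illustration only
        "paper_A": "daf-16 regulates lifespan and dauer formation",
        "paper_B": "unc-54 encodes a muscle myosin heavy chain",
        "paper_C": "lifespan extension requires daf-16 in the intestine",
    }
    for pid, score in rank_similar("daf-16 lifespan", toy_papers):
        print(pid, round(score, 3))
```

For the reviewer use case, the same ranking could be followed by a lookup of the authors of the top-scoring papers.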
== April 22, 2021 ==
=== LinkML hackathon ===
* Need to consider who works on what and how to coordinate
* Need to follow good Git practice
** Merge main branch into local branch before merging back into main branch to make sure everything works
* How will we best handle AceDB hash structures? Likely use something like what Mark QT demonstrated
** Do we have any/many hash-within-hash structures? #Molecular_change is used as a hash and tags within that model all reference the #Evidence hash
* GO annotation extensions offer an interesting challenge
=== IWM workshop ===
* Need to submit a workshop schedule (who speaks about what and when) by next Thursday April 29th
* An initial idea was to promote data in ACEDB that may be underutilized or many users may be unaware of
** An example might be transcription factor data: the ?Transcription_factor class and the modENCODE TF data
** Single cell data and tools? CeNGEN, Eduardo's single cell tools
** RNA-Seq FPKM values for genes and related data; Wen will write script to pull out FPKM values from SRA data and send to Magdalena
* In addition to WB data types, we will cover Alliance, AFP, and community curation
* Google doc for workshop here: https://docs.google.com/document/d/1H9ARhBRMKBNuOhjyxVQ_1o6cysvpppI7uA-TJrO_UZ4/edit?usp=sharing
=== WB Progress Report ===
* Due April 30th
* There will be two documents: progress and plans
* Place text in the appropriate places (don't write as a single integrated unit)
* Paul S will put together a Google doc
* We CAN include Alliance harmonization efforts
* 2020 Progress report: https://docs.google.com/document/d/1f3ettnkvwoKKiaAA4TSrpSQPEF7FmVVn6u2UdflA_So/edit?usp=sharing
* Last year's milestone was WS276; we will compare to WS280
* Google "WormBase Grants" folder: https://drive.google.com/drive/folders/1p8x9tEOfZ4DQvTcPSdNR5-JoPJu--ZAu?usp=sharing
* 2021 Progress Report document here: https://docs.google.com/document/d/13E9k5JvDpUN4kWnrTm4M2iphnAJSTpk02ZiGl8O6bM4/edit?usp=sharing
== April 29, 2021 ==
=== IWM Workshop Schedule ===
* Schedule format due today (April 29th)
* [https://docs.google.com/document/d/1H9ARhBRMKBNuOhjyxVQ_1o6cysvpppI7uA-TJrO_UZ4/edit#bookmark=id.jrjo4xhfnh7b Tentative schedule here]
* Format proposal is four 15-minute talks followed by 30 minutes of open discussion / Q&A
* Still need someone to speak (~15 minutes) about the Alliance
=== WB Progress Report ===
* 2021 documents in [https://drive.google.com/drive/folders/1p8x9tEOfZ4DQvTcPSdNR5-JoPJu--ZAu?usp=sharing this Google Drive folder]
* Note: there is one [https://docs.google.com/document/d/13E9k5JvDpUN4kWnrTm4M2iphnAJSTpk02ZiGl8O6bM4/edit?usp=sharing 2021 "Progress" document] and a second (separate) [https://docs.google.com/document/d/1j0HkCwuimK6DD-ui1tAkYMNpLRhxR9xb1FdSDZXFXCI/edit?usp=sharing "Future Plans" document]
* Existing future plans text has been moved to the "Future Plans" document
=== OpenBiosystems RNAi clone IDs ===
* User looking to map Open Biosystems RNAi clone names to WB clone names
* We may need to get a mapping file from Open Biosystems
=== FPKM data ===
* Wen has produced a csv file of FPKM values; can generate as part of the SPELL pipeline
* May be better to generate at Hinxton
=== OA Dumpers ===
* Daniela and Juancarlos have been working on the Picture OA and Expr OA dumpers
* Inconsistencies have accumulated for all OA dumpers as each has been made separately
* Juancarlos is working on a generalized, modular way to handle dumping
* Should we handle historical genes in the same way across OAs?
** Sure, but we need the "Historical_gene" tag in the respective ACEDB model
** Decision: we will continue to only dump historical genes for specific OAs, with a plan to maybe make consistent across OAs in the future
* Could we retroactively deal with paper-gene connections? We could possibly look in Postgres history tables to see which genes had been replaced previously (by Kimberly)
=== Gene name ambiguities ===
* Jae noticed that some gene names associated with multiple WBGene IDs (e.g. one public name is the same as another gene's other name) have the same references attached
* May require updating the paper-gene connections for some of these
* One example is the cep-1 gene: the name is associated with 3 different WBGene IDs, which share papers in the reference widget.
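A check for this kind of ambiguity could look something like the Python sketch below. It assumes gene names are available as an in-memory mapping from gene ID to a public name and a list of other names; that data layout, the function name and the placeholder IDs and names are hypothetical and not taken from WormBase.

```python
from collections import defaultdict


def find_ambiguous_names(genes):
    """Return {name: sorted gene IDs} for names attached to more than one gene.

    `genes` is assumed to map a gene ID to a dict with a "public_name"
    string and an optional "other_names" list.
    """
    ids_by_name = defaultdict(set)
    for gene_id, names in genes.items():
        ids_by_name[names["public_name"]].add(gene_id)
        for other in names.get("other_names", []):
            ids_by_name[other].add(gene_id)
    return {name: sorted(ids) for name, ids in ids_by_name.items() if len(ids) > 1}


if __name__ == "__main__":
    toy_genes = {  # placeholder IDs and names, for illustration only
        "GENE_1": {"public_name": "abc-1", "other_names": []},
        "GENE_2": {"public_name": "xyz-2", "other_names": ["abc-1"]},
        "GENE_3": {"public_name": "def-3", "other_names": ["old-3"]},
    }
    print(find_ambiguous_names(toy_genes))
```

The names it reports would be the starting point for reviewing which paper-gene connections need updating.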
=== NIH Supplement for AI readiness ===
* Could we set up curation for neural circuits using a knowledge graph (e.g. GO-CAM)?
** Maybe we could convert the anatomy function model to LinkML -> OWL statements?
** Maybe set up a graphical curation interface?
* Transcriptional regulation
** Would be good to establish a common model (for the Alliance?)
** CeNGEN project produced lots of predictions of TF binding sites based on single-cell expression data; Eduardo: these models should be able to be regenerated each time new data sets are published, but this requires greater integration in a central, sustainable resource
* Paul S can send a link for the supplement
=== Variant First Pass Pipeline ===
* Valerio: Are there any existing pipelines to make allele-paper and/or strain-paper associations?
* Not sure, should ask Karen

Revision as of 17:55, 13 May 2021