Difference between revisions of "WormBase-Caltech Weekly Calls"

From WormBaseWiki
 
[[WormBase-Caltech_Weekly_Calls_March_2021|March]]
 
  
 
[[WormBase-Caltech_Weekly_Calls_April_2021|April]]
== April 1, 2021 ==
 
 
 
=== Antibodies ===
 
* Alignment of the antibody class to Alliance:
 
** Propose to move possible_pseudonym (192) and Other_animal (37) to remarks. Those tags are not currently used for curation.
 
*** Other animal is sometimes used for older annotations, e.g. when authors say that the antibodies were raised in both rats and rabbits. Standard practice would be to create two records, one for the rat antibody and one for the rabbit antibody.
 
*** Possible pseudonym was used when a curator was not able to unambiguously assign a previous antibody to a record (we have an Other name (synonym) tag to capture unambiguous ones). When moving these to remarks, we can keep a controlled vocabulary for easy future parsing, e.g. “possible_pseudonym:”
 
** Antigen field: currently separated into Protein, peptide, and other_antigen (e.g. homogenate of early C. elegans embryos, sperm). Propose to use just one antigen field to capture antigen info.
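The proposed changes could be sketched roughly as below. This is only an illustration: the record layout (a dict of lists) and the lowercase field names are assumptions made for the example, not the actual Postgres/OA or ACEDB structures. The "other_animal:" remark prefix is likewise assumed by analogy with “possible_pseudonym:”.

```python
# Hypothetical field names, for illustration only.
TAGS_TO_REMARKS = ("possible_pseudonym", "other_animal")
ANTIGEN_FIELDS = ("protein", "peptide", "other_antigen")


def migrate_antibody(record):
    """Return a copy of `record` with deprecated tags moved into remarks
    (with a parseable prefix) and the antigen fields merged into one."""
    dropped = set(TAGS_TO_REMARKS) | set(ANTIGEN_FIELDS)
    migrated = {k: v for k, v in record.items() if k not in dropped}

    # Keep existing remarks, then append one prefixed remark per old value.
    remarks = list(record.get("remarks", []))
    for tag in TAGS_TO_REMARKS:
        for value in record.get(tag, []):
            remarks.append(f"{tag}: {value}")
    migrated["remarks"] = remarks

    # Collapse the three antigen fields into a single list.
    antigens = []
    for field in ANTIGEN_FIELDS:
        antigens.extend(record.get(field, []))
    migrated["antigen"] = antigens
    return migrated


example = {
    "name": "example antibody",
    "possible_pseudonym": ["anti-X"],
    "other_animal": ["rat"],
    "protein": ["protein X"],
    "other_antigen": ["sperm"],
    "remarks": ["existing remark"],
}
migrated_example = migrate_antibody(example)
```

Keeping the tag name as a prefix in the remark means the original values could still be recovered later with a simple text search.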
 
 
 
All changes proposed above were approved by the group
 
 
 
=== textpresso-dev clean up ===
 
* Michael has asked curators to assess what they have on textpresso-dev as it will not be around forever :-(
 
* Is it okay to transfer data and files we want to keep to tazendra, and then to our own individual machines?
 
* Direct access may be possible via Caltech VPN
 
* Do we want to move content to AWS? May be complicated; it is still easy and cheap to maintain local file systems/machines
 
 
 
=== Braun servers ===
 
* 3 servers stored in Braun server room; is there a new contact person for accessing these servers?
 
* Mike Miranda replacement just getting settled; Paul will find out who is managing the server room and let Raymond know
 
 
 
=== Citace upload ===
 
* Next Friday, April 9th, by end of the day
 
* Wen will contact Paul Davis for the frozen WS280 models file
 
 
 
 
 
== April 8, 2021 ==
 
 
 
=== Braun server outage ===
 
* Raymond fixed; now Spica, wobr and wobr2 are back up
 
 
 
=== Textpresso API ===
 
* Was down yesterday affecting WormiCloud; Michael has fixed
 
* Valerio will learn how to manage the API for the future
 
 
 
=== Grant opportunities ===
 
* Possibilities to apply for supplements
 
* May 15th deadline
 
* Druggable genome project
 
** Pharos: https://pharos.nih.gov/
 
** could we contribute?
 
* Visualization, tools, etc.
 
* Automated person descriptions?
 
* Automated descriptions for proteins, ion channels, druggable targets, etc.?
 
 
 
=== New WS280 ONTOLOGY FTP directory ===
 
* Changes requested here: https://github.com/WormBase/website/issues/7900
 
* Here's the FTP URL: ftp://ftp.wormbase.org/pub/wormbase/releases/WS280/ONTOLOGY/
 
* Known issues (Chris will report):
 
** Ontology (OBO) files are provided with a ".gaf" extension in addition to ".obo"; the ".gaf" copies of the OBO files need to be removed
 
** Some files are duplicated and/or have inappropriate file extensions
 
 
 
=== Odd characters in Postgres ===
 
* Daniela and Juancarlos discovered some errors with respect to special characters pasted into the OA
 
* Daniela would like to automatically pull in micropublication text (e.g. figure captions) into Postgres
 
* We would need an automated way to convert special characters, such as the degree symbol °, into HTML entities (e.g. &deg;)
 
* Juancarlos and Valerio will look into possibly switching from a Perl module to a Python module to handle special characters
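As a rough idea of what a Python-based conversion could look like, the sketch below uses only the standard library `html.entities` table. The function name `to_html_entities` is made up for this example; it is not the existing Perl code or any agreed design.

```python
from html.entities import codepoint2name


def to_html_entities(text: str) -> str:
    """Replace each non-ASCII character with a named HTML entity when one
    exists, otherwise with a numeric entity. ASCII is left untouched
    (so existing '&', '<' and '>' are NOT escaped here)."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")
        else:
            out.append(f"&#{cp};")
    return "".join(out)


print(to_html_entities("Worms were grown at 20°C"))
# prints: Worms were grown at 20&deg;C
```

Characters without a named entity (e.g. most Chinese characters) fall back to numeric entities, so nothing is silently dropped.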
 
 
 
 
 
== April 15, 2021 ==
 
 
 
=== Special characters in Postgres/OA ===
 
* Juancarlos is working on/proposing a plan to store UTF-8 characters in Postgres and the OA, which would then be converted, at dumping, to HTML entities (e.g. &alpha; for α) for the ACE files
 
* There is still a bit of cleanup needed to fix or remove special characters (not necessarily UTF-8) that apparently got munged upon copy/pasting into the OA in the past
 
* Note: copy/paste from a PDF often works fine, but sometimes does not work as expected so manual intervention would be needed (e.g. entering Greek characters by hand in UTF-8 format)
 
* Would copy/pasting from HTML be better than PDF?
 
* For Person curation it would be good to be able to faithfully store and display appropriate foreign characters (e.g. Chinese characters, Danish characters, etc.)
 
* Mangolassi script called "get_summary_characters.pl" located here: /home/postgres/work/pgpopulation/grg_generegulation/20200618_summary_characters
 
** Juancarlos will modify script to take a data type code as an argument on the command line and return all Postgres tables (and their respective PGIDs) that have special characters, e.g.
 
*** $ ./get_summary_characters.pl exp
 
*** $ ./get_summary_characters.pl int
 
*** $ ./get_summary_characters.pl grg
 
** or could pass just the datatype + field (postgres table). e.g.
 
*** $ ./get_summary_characters.pl pic_description
 
** Update: the script is ready and Juancarlos has emailed everyone. It is at /home/postgres/work/pgpopulation/oa_general/20210411_unicode_html/get_summary_characters.pl. Symlink it into your own directory and run it from there; it creates its output files in the directory you run it from.
 
* Action items:
 
** Juancarlos will update the "get_summary_characters.pl" script as described above
 
** Curators should use the "get_summary_characters.pl" script to look for (potentially) bad characters in their OAs/Postgres tables
 
** Need to perform bulk (automated) conversion of existing HTML entities into the corresponding UTF-8 characters
 
** Curators will need to work with Juancarlos for each OA to modify the dumper
 
** Juancarlos will write (or append to existing) Postgres/OA dumping scripts to:
 
*** 1) Convert UTF-8 characters to HTML entities in ACE files
 
*** 2) Convert special quote and hyphen characters into simple versions that don't need special handling
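Some of the action items above could be prototyped along the following lines. This is a minimal Python sketch, not the "get_summary_characters.pl" script or the actual dumper code: the function names, the `(pgid, text)` row format, and the particular set of quote and hyphen characters are all assumptions for illustration.

```python
import html

# Assumed mapping of "special" quotes and hyphens/dashes to plain ASCII.
SIMPLE_PUNCTUATION = str.maketrans({
    "\u2018": "'", "\u2019": "'",                # curly single quotes
    "\u201c": '"', "\u201d": '"',                # curly double quotes
    "\u2010": "-", "\u2013": "-", "\u2014": "-", # hyphen, en dash, em dash
})


def entities_to_utf8(text):
    """Bulk-convert HTML entities (e.g. &alpha;) to UTF-8 characters."""
    return html.unescape(text)


def simplify_punctuation(text):
    """Replace curly quotes and dashes with plain ASCII equivalents."""
    return text.translate(SIMPLE_PUNCTUATION)


def find_non_ascii(rows):
    """Given (pgid, text) pairs, return {pgid: [non-ASCII chars found]}."""
    hits = {}
    for pgid, text in rows:
        chars = sorted({ch for ch in text if ord(ch) > 127})
        if chars:
            hits[pgid] = chars
    return hits


print(entities_to_utf8("&alpha;-tubulin at 20&deg;C"))
print(simplify_punctuation("\u201cwild\u2013type\u201d"))
print(find_non_ascii([(1, "plain text"), (2, "20\u00b0C \u03b1")]))
```

Running the punctuation step before reporting non-ASCII characters would keep harmless curly quotes out of the list that curators need to review by hand.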
 
 
 
=== CeNGEN pictures ===
 
* Model change went in to accommodate images from the CeNGEN project
 
* Want gene page images for CeNGEN data; have the specifications for such images been worked out? Maybe not yet
 
* Raymond and Daniela will work with data producers to acquire images when ready
 
 
 
=== Supplement opportunities ===
 
* Money available for software development to "harden" existing software
 
* Might be possible to make Eduardo's single cell analysis tools more sustainable
 
* Could make WormiCloud adapted to Alliance?
 
* Put Noctua on more stable production footing? (GO cannot apply as they are in final year of existing grant)
 
 
 
=== Student project for Textpresso ===
 
* Create tool to allow user to submit text and return a list of similar papers
 
* Use cases:
 
** curator wants an alert to find papers similar to what they've curated
 
** look for potential reviewers of a paper based on similar text content
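One simple way such a tool could rank papers is TF-IDF weighting with cosine similarity. The sketch below is a self-contained illustration in plain Python, not a description of how Textpresso works or how the student project would be built; the paper IDs and texts are invented.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (smoothed IDF)."""
    counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    df = Counter()
    for c in counts:
        df.update(c.keys())
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_similar(query, papers):
    """Rank papers ({paper_id: text}) by similarity to the query text."""
    ids = list(papers)
    vectors = tfidf_vectors([papers[i] for i in ids] + [query])
    query_vec = vectors[-1]
    scored = [(i, cosine(query_vec, v)) for i, v in zip(ids, vectors)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented example data.
papers = {
    "paper_A": "daf-16 regulates lifespan and dauer formation",
    "paper_B": "neuronal expression of unc-86 in touch receptor neurons",
}
ranked = rank_similar("lifespan regulation by daf-16", papers)
```

For the reviewer use case, the same ranking could be followed by a lookup of the authors of the top-scoring papers.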
 
 
 
 
 
== April 22, 2021 ==
 
 
 
=== LinkML hackathon ===
 
* Need to consider who works on what and how to coordinate
 
* Need to follow good Git practices
 
** Merge main branch into local branch before merging back into main branch to make sure everything works
 
* How will we best handle AceDB hash structures? Likely use something like what Mark QT demonstrated
 
** Do we have any/many hash-within-hash structures? #Molecular_change is used as a hash and tags within that model all reference the #Evidence hash
 
* GO annotation extensions offer an interesting challenge
 
 
 
=== IWM workshop ===
 
* Need to submit a workshop schedule (who speaks about what and when) by next Thursday April 29th
 
* An initial idea was to promote data in ACEDB that may be underutilized or that many users may be unaware of
 
** An example might be transcription factor data: the ?Transcription_factor class and the modENCODE TF data
 
** Single cell data and tools? CeNGEN, Eduardo's single cell tools
 
** RNA-Seq FPKM values for genes and related data; Wen will write script to pull out FPKM values from SRA data and send to Magdalena
 
* In addition to WB data types, we will cover Alliance, AFP, and community curation
 
* Google doc for workshop here: https://docs.google.com/document/d/1H9ARhBRMKBNuOhjyxVQ_1o6cysvpppI7uA-TJrO_UZ4/edit?usp=sharing
 
 
 
=== WB Progress Report ===
 
* Due April 30th
 
* There will be two documents: progress and plans
 
* Place text in the appropriate places (don't write as a single integrated unit)
 
* Paul S will put together a Google doc
 
* We CAN include Alliance harmonization efforts
 
* 2020 Progress report: https://docs.google.com/document/d/1f3ettnkvwoKKiaAA4TSrpSQPEF7FmVVn6u2UdflA_So/edit?usp=sharing
 
* Last year's milestone was WS276; we will compare to WS280
 
* Google "WormBase Grants" folder: https://drive.google.com/drive/folders/1p8x9tEOfZ4DQvTcPSdNR5-JoPJu--ZAu?usp=sharing
 
* 2021 Progress Report document here: https://docs.google.com/document/d/13E9k5JvDpUN4kWnrTm4M2iphnAJSTpk02ZiGl8O6bM4/edit?usp=sharing
 
 
 
 
 
== April 29, 2021 ==
 
 
 
=== IWM Workshop Schedule ===
 
* Schedule format due today (April 29th)
 
* [https://docs.google.com/document/d/1H9ARhBRMKBNuOhjyxVQ_1o6cysvpppI7uA-TJrO_UZ4/edit#bookmark=id.jrjo4xhfnh7b Tentative schedule here]
 
* Format proposal is four 15-minute talks followed by 30 minutes of open discussion / Q&A
 
* Still need someone to speak (~15 minutes) about the Alliance
 
 
 
=== WB Progress Report ===
 
* 2021 documents in [https://drive.google.com/drive/folders/1p8x9tEOfZ4DQvTcPSdNR5-JoPJu--ZAu?usp=sharing this Google Drive folder]
 
* Note: there is one [https://docs.google.com/document/d/13E9k5JvDpUN4kWnrTm4M2iphnAJSTpk02ZiGl8O6bM4/edit?usp=sharing 2021 "Progress" document] and a second (separate) [https://docs.google.com/document/d/1j0HkCwuimK6DD-ui1tAkYMNpLRhxR9xb1FdSDZXFXCI/edit?usp=sharing "Future Plans" document]
 
* Existing future plans text has been moved to the "Future Plans" document
 
 
 
=== OpenBiosystems RNAi clone IDs ===
 
* User looking to map Open Biosystems RNAi clone names to WB clone names
 
* We may need to get a mapping file from Open Biosystems
 
 
 
=== FPKM data ===
 
* Wen has produced a csv file of FPKM values; can generate as part of the SPELL pipeline
 
* May be better to generate at Hinxton
 
 
 
=== OA Dumpers ===
 
* Daniela and Juancarlos have been working on the Picture OA and Expr OA dumpers
 
* Inconsistencies have accumulated for all OA dumpers as each has been made separately
 
* Juancarlos is working on a generalized, modular way to handle dumping
 
* Should we handle historical genes in the same way across OAs?
 
** Sure, but we need the "Historical_gene" tag in the respective ACEDB model
 
** Decision: we will continue to only dump historical genes for specific OAs, with a plan to maybe make consistent across OAs in the future
 
* Could we retroactively deal with paper-gene connections? We could possibly look in Postgres history tables to see which genes had been replaced previously (by Kimberly)
 
 
 
=== Gene name ambiguities ===
 
* Jae noticed that some gene names associated with multiple WBGene IDs (e.g. one public name is the same as another gene's other name) have the same references attached
 
* May require updating the paper-gene connections for some of these
 
* One example is the cep-1 gene: the name is associated with three different WBGene IDs, which share papers in the reference widget.
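A check for this kind of ambiguity could be sketched as follows. The data structures and gene/paper identifiers here are placeholders invented for the example, not real WBGene or WBPaper IDs and not the actual Postgres tables.

```python
from collections import defaultdict


def ambiguous_names(genes):
    """genes: {gene_id: {"public_name": str, "other_names": [str, ...]}}
    Return {name: [gene_ids]} for names attached to more than one gene."""
    name_to_genes = defaultdict(set)
    for gene_id, names in genes.items():
        name_to_genes[names["public_name"]].add(gene_id)
        for other in names.get("other_names", []):
            name_to_genes[other].add(gene_id)
    return {n: sorted(ids) for n, ids in name_to_genes.items() if len(ids) > 1}


def shared_papers(gene_ids, gene_papers):
    """Return the papers attached to every gene in gene_ids."""
    sets = [set(gene_papers.get(g, ())) for g in gene_ids]
    return set.intersection(*sets) if sets else set()


# Placeholder identifiers, for illustration only.
genes = {
    "GENE_1": {"public_name": "abc-1", "other_names": []},
    "GENE_2": {"public_name": "xyz-2", "other_names": ["abc-1"]},
    "GENE_3": {"public_name": "def-3", "other_names": []},
}
gene_papers = {
    "GENE_1": ["PAPER_1", "PAPER_2"],
    "GENE_2": ["PAPER_2", "PAPER_3"],
}
for name, ids in ambiguous_names(genes).items():
    print(name, ids, sorted(shared_papers(ids, gene_papers)))
```

Papers that show up in the shared set are the candidates whose paper-gene connections may need manual review.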
 
 
 
=== NIH Supplement for AI readiness ===
 
* Could we set up curation for neural circuits using a knowledge graph (e.g. GO-CAM)?
 
** Maybe we could convert the anatomy function model to LinkML -> OWL statements?
 
** Maybe set up a graphical curation interface?
 
* Transcriptional regulation
 
** Would be good to establish a common model (for the Alliance?)
 
** CeNGEN project produced lots of predictions of TF binding sites based on single-cell expression data; Eduardo: these models should be able to be regenerated each time new data sets are published, but this requires greater integration in a central, sustainable resource
 
* Paul S can send a link for the supplement
 
 
 
=== Variant First Pass Pipeline ===
 
* Valerio: Are there any existing pipelines to make allele-paper and/or strain-paper associations?
 
* Not sure, should ask Karen
 

Revision as of 17:55, 13 May 2021