[[WormBase-Caltech_Weekly_Calls_March_2021|March]]

[[WormBase-Caltech_Weekly_Calls_April_2021|April]]

== May 13, 2021 ==

=== Textpresso supplement ===
* Due Monday
* Michael working with Paul S

=== AWS credits ===
* Michael and Valerio were awarded AWS credits, more than they can use
* Maybe they can be repurposed
* Valerio will experiment with AWS to determine the best/cheapest configuration before migrating to the Alliance

=== Automated gene descriptions ===
* Will the Alliance ever accommodate non-elegans worm species? Can we port the computed/derived descriptions for non-elegans species over to the Alliance?
* Maybe have clade-specific descriptions based on the primary model organism (e.g. worm descriptions based on C. elegans); these could be provided on MOD portal page(s)
* May be the focus of an Alliance supplement
* We want a flexible pipeline that can be configured depending on the availability of data (e.g. protein domains)

=== IWM 2021 WB Workshop ===
* Scheduled for June 22, 2021
* Session begins at 8:30am Pacific / 11:30am Eastern / 4:30pm UK
* Workshop runs for 90 minutes: four 15-minute talks followed by a 30-minute Q&A session
* Here is the submitted workshop schedule:
** 11:30 am (EDT) Magdalena Zarowiecki, EMBL-EBI, A whistle-stop tour of all the types of data you can find in WormBase
** 11:45 am (EDT) Chris Grove, California Institute of Technology, Researching transcriptional regulation using WormBase transcription factors, TF binding sites and the modENCODE data
** 12:00 pm (EDT) Ranjana Kishore, California Institute of Technology, Comparative genomics and disease research using Alliance of Genome Resources
** 12:15 pm (EDT) Daniela Raciti, California Institute of Technology, How can you contribute? Community curation and tools, and the author-first-pass (AFP) pipeline
** 12:30 pm (EDT) Chris Grove, California Institute of Technology, Open Discussion / Q & A


== April 1, 2021 ==

=== Antibodies ===
* Alignment of the antibody class to Alliance:
** Propose to move possible_pseudonym (192) and Other_animal (37) to remarks. Those tags are not currently used for curation.
*** Other_animal is sometimes used for older annotations, e.g. when authors say that the antibodies were raised in both rats and rabbits. Standard practice would be to create 2 records, one for the rat antibody and one for the rabbit antibody.
*** Possible_pseudonym was used when a curator was not able to unambiguously assign a previous antibody to a record (we have an Other name (synonym) tag to capture unambiguous ones). When moving these to remarks, we can keep a controlled vocabulary for easy future parsing, e.g. "possible_pseudonym:"
** Antigen field: currently separated into Protein, Peptide, and Other_antigen (e.g. homogenate of early C. elegans embryos, sperm). Propose to use a single antigen field to capture antigen info.

All changes proposed above were approved by the group.

=== textpresso-dev clean up ===
* Michael has asked curators to assess what they have on textpresso-dev, as it will not be around forever :-(
* Is it okay to transfer data and files we want to keep to tazendra, and then to our own individual machines?
* Direct access may be possible via Caltech VPN
* Do we want to move content to AWS? May be complicated; it is still easy and cheap to maintain local file systems/machines

=== Braun servers ===
* 3 servers stored in Braun server room; is there a new contact person for accessing these servers?
* Mike Miranda's replacement is just getting settled; Paul will find out who is managing the server room and let Raymond know

=== Citace upload ===
* Next Friday, April 9th, by end of the day
* Wen will contact Paul Davis for the frozen WS280 models file


== April 8, 2021 ==

=== Braun server outage ===
* Raymond fixed it; Spica, wobr and wobr2 are now back up

=== Textpresso API ===
* Was down yesterday, affecting WormiCloud; Michael has fixed it
* Valerio will learn how to manage the API for the future

=== Grant opportunities ===
* Possibilities to apply for supplements
* May 15th deadline
* Druggable genome project
** Pharos: https://pharos.nih.gov/
** Could we contribute?
* Visualization, tools, etc.
* Automated person descriptions?
* Automated descriptions for proteins, ion channels, druggable targets, etc.?

=== New WS280 ONTOLOGY FTP directory ===
* Changes requested here: https://github.com/WormBase/website/issues/7900
* Here's the FTP URL: ftp://ftp.wormbase.org/pub/wormbase/releases/WS280/ONTOLOGY/
* Known issues (Chris will report):
** Ontology files are provided with a ".gaf" extension in addition to ".obo"; we need to remove the OBO-format files that carry the ".gaf" extension
** Some files are duplicated and/or have inappropriate file extensions

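Both known issues can be checked mechanically. Below is a minimal Python sketch, assuming a local, uncompressed copy of the ONTOLOGY directory. It reads each file's first non-empty line (OBO files conventionally open with a `format-version:` header tag, GAF files with a `!gaf-version:` line), flags files whose extension disagrees with their content, and groups byte-identical duplicates by SHA-256. The function names are made up for illustration and are not part of the WormBase release scripts.

```python
import hashlib
from pathlib import Path


def sniff_format(path: Path) -> str:
    """Guess a file's format from its first non-empty line."""
    with path.open(encoding="utf-8", errors="replace") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith("format-version:"):
                return "obo"
            if line.startswith("!gaf-version:"):
                return "gaf"
            return "unknown"
    return "empty"


def audit_directory(directory: Path) -> dict:
    """Report extension/content mismatches and byte-identical duplicates."""
    mismatched = []
    by_digest = {}
    for path in sorted(p for p in directory.iterdir() if p.is_file()):
        fmt = sniff_format(path)
        extension = path.suffix.lstrip(".").lower()
        # e.g. OBO content saved under a ".gaf" name
        if fmt in ("obo", "gaf") and extension != fmt:
            mismatched.append((path.name, fmt))
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        by_digest.setdefault(digest, []).append(path.name)
    duplicates = [names for names in by_digest.values() if len(names) > 1]
    return {"mismatched": mismatched, "duplicates": duplicates}
```

Content sniffing is used instead of trusting file names because the problem being reported is precisely that the names are unreliable.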
=== Odd characters in Postgres ===
* Daniela and Juancarlos discovered some errors involving special characters pasted into the OA
* Daniela would like to automatically pull micropublication text (e.g. figure captions) into Postgres
* We would need an automated way to convert special characters, such as the degree symbol °, into HTML entities (e.g. <nowiki>&deg;</nowiki>)
* Juancarlos and Valerio will look into possibly switching from a Perl module to a Python module to handle special characters


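This is one way the conversion could look in Python, which would be relevant if the Perl module is replaced. It is a sketch, not the current OA code. It uses the standard library's `html.entities.codepoint2name` table so that characters with an HTML 4 entity name become named entities (° becomes <nowiki>&deg;</nowiki>). Everything else, such as Chinese characters, falls back to numeric character references. `to_html_entities` is a hypothetical name.

```python
from html.entities import codepoint2name


def to_html_entities(text: str) -> str:
    """Replace every non-ASCII character with an HTML entity.

    Named entities (e.g. &deg;, &alpha;) are used when HTML 4 defines one;
    otherwise a numeric character reference (&#NNNN;) is emitted.
    """
    out = []
    for char in text:
        code = ord(char)
        if code < 128:
            out.append(char)  # plain ASCII passes through unchanged
        elif code in codepoint2name:
            out.append(f"&{codepoint2name[code]};")
        else:
            out.append(f"&#{code};")
    return "".join(out)
```

`html.unescape` from the same standard-library package reverses the mapping, so round-trip checks on existing Postgres text are straightforward.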
== April 15, 2021 ==

=== Special characters in Postgres/OA ===
* Juancarlos is working on/proposing a plan to store UTF-8 characters in Postgres and the OA, which would then be converted at dump time to HTML entities (e.g. <nowiki>&alpha;</nowiki>) for the ACE files
* There is still a bit of cleanup needed to fix or remove special characters (not necessarily UTF-8) that apparently got munged when copied/pasted into the OA in the past
* Note: copy/paste from a PDF often works fine, but sometimes does not work as expected, so manual intervention would be needed (e.g. entering Greek characters by hand in UTF-8 format)
* Would copy/pasting from HTML be better than from PDF?
* For Person curation it would be good to be able to faithfully store and display foreign characters (e.g. Chinese characters, Danish characters, etc.)
* Mangolassi script called "get_summary_characters.pl" located here: /home/postgres/work/pgpopulation/grg_generegulation/20200618_summary_characters
** Juancarlos will modify the script to take a data type code as a command-line argument and return all Postgres tables (and their respective PGIDs) that have special characters, e.g.
*** $ ./get_summary_characters.pl exp
*** $ ./get_summary_characters.pl int
*** $ ./get_summary_characters.pl grg
** or could pass just the datatype + field (Postgres table), e.g.
*** $ ./get_summary_characters.pl pic_description
** Juancarlos will email everyone once it's ready. Update: it's ready and the email has been sent. The script is at /home/postgres/work/pgpopulation/oa_general/20210411_unicode_html/get_summary_characters.pl. Symlink it into your directory and run it from there; it will create files in the directory you are in when running it.
* Action items:
** Juancarlos will update the "get_summary_characters.pl" script as described above
** Curators should use "get_summary_characters.pl" to look for (potentially) bad characters in their OAs/Postgres tables
** Need to perform a bulk (automated) replacement of existing HTML entities with the corresponding UTF-8 characters
** Curators will need to work with Juancarlos on each OA to modify the dumper
** Juancarlos will write (or append to existing) Postgres/OA dumping scripts to:
*** 1) Convert UTF-8 characters to HTML entities in ACE files
*** 2) Convert special quote and hyphen characters into simple versions that don't need special handling

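The action items above map onto three small text operations. The sketch below illustrates them under stated assumptions and is not Juancarlos's dumper. Rows are assumed to arrive as (PGID, text) pairs from one Postgres table, the quote/hyphen table is an example rather than an agreed list, and all function names are hypothetical. The sketch does three things:
* It reports rows containing non-ASCII characters, similar in spirit to get_summary_characters.pl.
* It converts stored HTML entities back to UTF-8.
* At dump time, it flattens special quotes and hyphens and then encodes any remaining non-ASCII characters as numeric HTML entities.

```python
import html

# Example (hypothetical) table of punctuation the dumper flattens to ASCII.
SIMPLE_PUNCTUATION = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2010": "-",  # hyphen
    "\u2011": "-",  # non-breaking hyphen
    "\u2013": "-",  # en dash
    "\u2014": "-",  # em dash
})


def find_special_characters(rows):
    """Yield (pgid, sorted non-ASCII characters) for rows that contain any.

    `rows` is an iterable of (pgid, text) pairs from one Postgres table.
    """
    for pgid, text in rows:
        special = sorted({c for c in text if ord(c) > 127})
        if special:
            yield pgid, special


def entities_to_utf8(text: str) -> str:
    """Bulk cleanup: turn stored HTML entities back into UTF-8 characters."""
    return html.unescape(text)


def dump_value(text: str) -> str:
    """Prepare a stored UTF-8 value for an ACE file.

    Flatten special quotes/hyphens first (item 2), then encode the remaining
    non-ASCII characters as numeric HTML entities (item 1).
    """
    flattened = text.translate(SIMPLE_PUNCTUATION)
    return "".join(c if ord(c) < 128 else f"&#{ord(c)};" for c in flattened)
```

`html.unescape` follows HTML5 rules, which also decode some legacy entity names written without a trailing semicolon. Bulk-converted text is therefore worth spot-checking.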
=== CeNGEN pictures ===
* A model change went in to accommodate images from the CeNGEN project
* We want gene page images for CeNGEN data; have the specifications for such images been worked out? Maybe not yet
* Raymond and Daniela will work with data producers to acquire images when ready

=== Supplement opportunities ===
* Money available for software development to "harden" existing software
* Might be possible to make Eduardo's single cell analysis tools more sustainable
* Could WormiCloud be adapted to the Alliance?
* Put Noctua on a more stable production footing? (GO cannot apply, as they are in the final year of their existing grant)

=== Student project for Textpresso ===
* Create a tool that allows a user to submit text and returns a list of similar papers
* Use cases:
** Curator wants an alert to find papers similar to what they've curated
** Look for potential reviewers of a paper based on similar text content


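A common baseline for "find similar papers" is TF-IDF weighting plus cosine similarity. The sketch below shows that generic technique in plain Python. It is not an existing Textpresso feature. The paper IDs and texts are placeholders, and a real tool would index full text and drop stop words.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case alphanumeric tokens (keeps gene-style names such as cep-1)."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def weigh(counts, idf):
    """TF-IDF weights; terms never seen in the corpus are ignored."""
    return {t: tf * idf[t] for t, tf in counts.items() if t in idf}


def build_index(papers):
    """Return (idf, vectors) for a {paper_id: text} corpus."""
    counts = {pid: Counter(tokenize(text)) for pid, text in papers.items()}
    n = len(papers)
    df = Counter(term for c in counts.values() for term in c)
    # Smoothed IDF: rarer terms weigh more, and no term gets zero weight.
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    vectors = {pid: weigh(c, idf) for pid, c in counts.items()}
    return idf, vectors


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def similar_papers(query, idf, vectors, top_n=5):
    """Rank indexed papers by cosine similarity to a submitted text."""
    q = weigh(Counter(tokenize(query)), idf)
    scored = [(pid, cosine(q, vec)) for pid, vec in vectors.items()]
    scored = [s for s in scored if s[1] > 0]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:top_n]
```

The same index could serve both use cases. For the curator alert, a curator's already-curated papers become the query text. For reviewer suggestions, a manuscript's abstract is the query, and the authors of the top hits are the candidates.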
== April 22, 2021 ==

=== LinkML hackathon ===
* Need to consider who works on what and how to coordinate
* Need to follow good Git practice
** Merge the main branch into your local branch before merging back into main, to make sure everything works
* How will we best handle AceDB hash structures? Likely use something like what Mark QT demonstrated
** Do we have any/many hash-within-hash structures? #Molecular_change is used as a hash, and the tags within that model all reference the #Evidence hash
* GO annotation extensions offer an interesting challenge

=== IWM workshop ===
* Need to submit a workshop schedule (who speaks about what and when) by next Thursday, April 29th
* An initial idea was to promote data in ACEDB that may be underutilized or that many users may be unaware of
** An example might be transcription factor data: the ?Transcription_factor class and the modENCODE TF data
** Single cell data and tools? CeNGEN, Eduardo's single cell tools
** RNA-Seq FPKM values for genes and related data; Wen will write a script to pull out FPKM values from SRA data and send them to Magdalena
* In addition to WB data types, we will cover the Alliance, AFP, and community curation
* Google doc for workshop here: https://docs.google.com/document/d/1H9ARhBRMKBNuOhjyxVQ_1o6cysvpppI7uA-TJrO_UZ4/edit?usp=sharing

=== WB Progress Report ===
* Due April 30th
* There will be two documents: progress and plans
* Place text in the appropriate places (don't write as a single integrated unit)
* Paul S will put together a Google doc
* We CAN include Alliance harmonization efforts
* 2020 Progress report: https://docs.google.com/document/d/1f3ettnkvwoKKiaAA4TSrpSQPEF7FmVVn6u2UdflA_So/edit?usp=sharing
* Last year's milestone was WS276; we will compare to WS280
* Google "WormBase Grants" folder: https://drive.google.com/drive/folders/1p8x9tEOfZ4DQvTcPSdNR5-JoPJu--ZAu?usp=sharing
* 2021 Progress Report document here: https://docs.google.com/document/d/13E9k5JvDpUN4kWnrTm4M2iphnAJSTpk02ZiGl8O6bM4/edit?usp=sharing


== April 29, 2021 ==

=== IWM Workshop Schedule ===
* Schedule format due today (April 29th)
* [https://docs.google.com/document/d/1H9ARhBRMKBNuOhjyxVQ_1o6cysvpppI7uA-TJrO_UZ4/edit#bookmark=id.jrjo4xhfnh7b Tentative schedule here]
* Format proposal is four 15-minute talks followed by 30 minutes of open discussion / Q&A
* Still need someone to speak (~15 minutes) about the Alliance

=== WB Progress Report ===
* 2021 documents in [https://drive.google.com/drive/folders/1p8x9tEOfZ4DQvTcPSdNR5-JoPJu--ZAu?usp=sharing this Google Drive folder]
* Note: there is one [https://docs.google.com/document/d/13E9k5JvDpUN4kWnrTm4M2iphnAJSTpk02ZiGl8O6bM4/edit?usp=sharing 2021 "Progress" document] and a second (separate) [https://docs.google.com/document/d/1j0HkCwuimK6DD-ui1tAkYMNpLRhxR9xb1FdSDZXFXCI/edit?usp=sharing "Future Plans" document]
* Existing future plans text has been moved to the "Future Plans" document

=== OpenBiosystems RNAi clone IDs ===
* User looking to map Open Biosystems RNAi clone names to WB clone names
* We may need to get a mapping file from Open Biosystems

=== FPKM data ===
* Wen has produced a CSV file of FPKM values; these can be generated as part of the SPELL pipeline
* May be better to generate them at Hinxton

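For reference, FPKM (fragments per kilobase of transcript per million mapped fragments) is the fragment count for a gene divided by the gene length in kb and by the library size in millions. Below is a minimal Python sketch of that standard formula plus CSV output. The column names and gene IDs are hypothetical, and this is not Wen's script or the SPELL pipeline.

```python
import csv
import io


def fpkm(fragment_count, gene_length_bp, total_mapped_fragments):
    """FPKM = count * 1e9 / (gene length in bp * total mapped fragments)."""
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("gene length and library size must be positive")
    return fragment_count * 1e9 / (gene_length_bp * total_mapped_fragments)


def fpkm_table(counts, lengths, total_mapped_fragments):
    """Return CSV text with one FPKM value per gene (hypothetical columns)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["gene_id", "fpkm"])
    for gene_id, count in sorted(counts.items()):
        value = fpkm(count, lengths[gene_id], total_mapped_fragments)
        writer.writerow([gene_id, f"{value:.2f}"])
    return buffer.getvalue()
```

For single-end data the same arithmetic applied to reads is usually called RPKM.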
=== OA Dumpers ===
* Daniela and Juancarlos have been working on the Picture OA and Expr OA dumpers
* Inconsistencies have accumulated across the OA dumpers, as each was written separately
* Juancarlos is working on a generalized, modular way to handle dumping
* Should we handle historical genes in the same way across OAs?
** Sure, but we need the "Historical_gene" tag in the respective ACEDB model
** Decision: we will continue to dump historical genes only for specific OAs, with a plan to possibly make this consistent across OAs in the future
* Could we retroactively deal with paper-gene connections? We could possibly look in Postgres history tables to see which genes had been replaced previously (by Kimberly)

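Below is a minimal sketch of what a config-driven dumper could look like. It assumes each OA can be described by three things: its ACE class, a mapping from Postgres fields to ACE tags, and a flag for historical genes. The flag encodes the decision above. All names are illustrative, as is the assumption that genes no longer live are written under a `Historical_gene` tag. This is not Juancarlos's implementation. Values are written as quoted ACE strings with backslash-escaped quotes.

```python
from dataclasses import dataclass


@dataclass
class OAConfig:
    """Per-OA dumping settings (illustrative names, not the real OA config)."""
    ace_class: str
    tag_map: dict                   # Postgres field -> ACE tag
    gene_field: str = "gene"
    dump_historical_genes: bool = False


def ace_quote(value):
    """Quote a value for an .ace file, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def dump_object(obj_id, row, config, live_genes):
    """Render one object; `row` maps Postgres fields to lists of values."""
    lines = [f"{config.ace_class} : {ace_quote(obj_id)}"]
    for pg_field, tag in config.tag_map.items():
        for value in row.get(pg_field, []):
            lines.append(f"{tag} {ace_quote(value)}")
    for gene in row.get(config.gene_field, []):
        if gene in live_genes:
            lines.append(f"Gene {ace_quote(gene)}")
        elif config.dump_historical_genes:
            # Only OAs that opt in keep references to non-live genes.
            lines.append(f"Historical_gene {ace_quote(gene)}")
    return "\n".join(lines) + "\n"


def dump_oa(rows, config, live_genes):
    """Render all objects, separated by blank lines as in .ace files."""
    return "\n".join(dump_object(obj_id, row, config, live_genes)
                     for obj_id, row in sorted(rows.items()))
```

Keeping the per-OA differences in data rather than in separate scripts is the main point of the design. Making historical genes consistent later would then mean flipping one flag per OA.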
=== Gene name ambiguities ===
* Jae noticed that some gene names associated with multiple WBGene IDs (e.g. one gene's public name is the same as another gene's other name) have the same references attached
* May require updating the paper-gene connections for some of these
* One example is the cep-1 gene: the name is associated with 3 different WBGene IDs, which share papers in the reference widget.

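Finding the affected cases amounts to building an inverted index from every name (public and other) to the WBGene IDs that carry it. Below is a sketch in Python. The gene IDs, names and paper IDs are made up, and the input dictionaries stand in for whatever Postgres or AceDB query would supply them.

```python
from collections import defaultdict


def ambiguous_names(genes):
    """Return {name: sorted gene IDs} for names used by more than one gene.

    `genes` maps a gene ID to (public_name, [other names]).
    """
    index = defaultdict(set)
    for gene_id, (public_name, other_names) in genes.items():
        index[public_name].add(gene_id)
        for name in other_names:
            index[name].add(gene_id)
    return {name: sorted(ids) for name, ids in index.items() if len(ids) > 1}


def papers_shared_by(gene_ids, gene_papers):
    """Papers attached to every gene behind an ambiguous name (worth review)."""
    paper_sets = [set(gene_papers.get(g, ())) for g in gene_ids]
    return sorted(set.intersection(*paper_sets)) if paper_sets else []
```

Papers returned by `papers_shared_by` are candidates for the paper-gene connection review rather than confirmed errors.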
=== NIH Supplement for AI readiness ===
* Could we set up curation for neural circuits using a knowledge graph (e.g. GO-CAM)?
** Maybe we could convert the anatomy function model to LinkML -> OWL statements?
** Maybe set up a graphical curation interface?
* Transcriptional regulation
** Would be good to establish a common model (for the Alliance?)
** The CeNGEN project produced lots of predictions of TF binding sites based on single-cell expression data. Eduardo: it should be possible to regenerate these models each time new data sets are published, but this requires greater integration in a central, sustainable resource
* Paul S can send a link for the supplement

=== Variant First Pass Pipeline ===
* Valerio: Are there any existing pipelines to make allele-paper and/or strain-paper associations?
* Not sure; should ask Karen
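Karen would know what already exists. The following only illustrates one simple approach, and the patterns, function names and inputs are assumptions, not an existing WormBase pipeline. Regular expressions that roughly follow C. elegans naming conventions propose candidates: alleles as lowercase lab letters plus a number (e.g. e1370), strains as uppercase letters plus a number (e.g. CB1370 or N2). Only candidates that appear in a list of known allele/strain names are kept.

```python
import re

# Rough C. elegans-style patterns (assumed): alleles are 1-3 lowercase letters
# plus digits (e.g. e1370); strains are 1-3 uppercase letters plus digits
# (e.g. N2, CB1370). Suffixed alleles such as "...ts" are not handled.
ALLELE_RE = re.compile(r"\b[a-z]{1,3}\d+\b")
STRAIN_RE = re.compile(r"\b[A-Z]{1,3}\d+\b")


def find_mentions(text, known_alleles, known_strains):
    """Return (alleles, strains) mentioned in text that match a known name.

    The regexes only propose candidates; checking them against known names
    filters out look-alikes such as protein names or figure labels.
    """
    alleles = sorted(set(ALLELE_RE.findall(text)) & known_alleles)
    strains = sorted(set(STRAIN_RE.findall(text)) & known_strains)
    return alleles, strains


def associate(papers, known_alleles, known_strains):
    """Build {paper_id: (alleles, strains)} for papers with any mention."""
    results = {}
    for paper_id, text in papers.items():
        alleles, strains = find_mentions(text, known_alleles, known_strains)
        if alleles or strains:
            results[paper_id] = (alleles, strains)
    return results
```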