Difference between revisions of "WormBase-Caltech Weekly Calls"

From WormBaseWiki
 
Line 7: Line 7:
 
[[WormBase-Caltech_Weekly_Calls_2012|2012 Meetings]]
 
[[WormBase-Caltech_Weekly_Calls_2012|2012 Meetings]]
  
 +
[[WormBase-Caltech_Weekly_Calls_2013|2013 Meetings]]
  
= 2013 Meetings =
+
[[WormBase-Caltech_Weekly_Calls_2014|2014 Meetings]]
  
 +
[[WormBase-Caltech_Weekly_Calls_2015|2015 Meetings]]
  
[[WormBase-Caltech_Weekly_Calls_January_2013|January]]
+
[[WormBase-Caltech_Weekly_Calls_2016|2016 Meetings]]
  
[[WormBase-Caltech_Weekly_Calls_February_2013|February]]
+
[[WormBase-Caltech_Weekly_Calls_2017|2017 Meetings]]
  
[[WormBase-Caltech_Weekly_Calls_March_2013|March]]
+
[[WormBase-Caltech_Weekly_Calls_2018|2018 Meetings]]
  
[[WormBase-Caltech_Weekly_Calls_April_2013|April]]
+
[[WormBase-Caltech_Weekly_Calls_2019|2019 Meetings]]
  
[[WormBase-Caltech_Weekly_Calls_May_2013|May]]
+
[[WormBase-Caltech_Weekly_Calls_2020|2020 Meetings]]
  
[[WormBase-Caltech_Weekly_Calls_June_2013|June]]
+
[[WormBase-Caltech_Weekly_Calls_2021|2021 Meetings]]
  
[[WormBase-Caltech_Weekly_Calls_July_2013|July]]
+
[[WormBase-Caltech_Weekly_Calls_2022|2022 Meetings]]
  
[[WormBase-Caltech_Weekly_Calls_August_2013|August]]
+
[[WormBase-Caltech_Weekly_Calls_2023|2023 Meetings]]
  
[[WormBase-Caltech_Weekly_Calls_September_2013|September]]
+
==April 18th, 2024==
 +
*NNC pipeline being switched off locally and moving into the Alliance ABC.
  
 +
==April 11th, 2024==
 +
*Caltech WS293 ace files ready for the upload
  
 +
==April 4th, 2024==
 +
* Continued discussion on sustainability
 +
* CZI, single cell RNAseq for Alliance -> anything happening will be a few months down the road
 +
** Data is still going to SPELL and enrichment analysis
 +
** Peter Roy asked about taking the expression profile of a condition and finding similar expression profiles (a SPELL-like analysis), but SPELL cannot currently deal with scRNAseq data. Wen says it is possible (treating each cell group as an experiment). We can try loading them into SPELL. Does it improve the function of SPELL? Only 5-10 datasets. These data are a bit different from bulk RNAseq.
 +
* Textpresso: good to have a presentation for other MODs to show Textpresso capabilities? Yes. Maybe during sprint review
 +
* Michael's presentation on LLMs - Named Entity Recognition (NER)
  
== October 3, 2013 ==
+
==March 14, 2024==
  
 +
=== TAGC debrief ===
  
=== Development environment for Raymond ===
+
==February 22, 2024==
*Todd is working on it; will do today
 
  
 +
===NER with LLMs===
  
=== NAR Manuscript ===
+
* Wrote scripts and configured an LLM for Named Entity Recognition. Trained an LLM on gene names and diseases. Works well so far (F1 ~ 98%, Accuracy ~ 99.9%)
*Todd will send around for comments today
 
  
 +
* Is this similar to the FlyBase system? Recording of presentation  https://drive.google.com/drive/folders/1S4kZidL7gvBH6SjF4IQujyReVVRf2cOK
  
=== WormBase Ontology Browser (WOBr) ===
+
* Textpresso server is kaput. Services need to be transferred onto Alliance servers.
*We have a stable server for the information the browser queries: gene ontology associations, gene association files
 
*Raymond was trying to keep up with the AMIGO 2 development upgrades
 
*Raymond will soon be getting other ontology info into the browser
 
*Takes about an hour to load data into the browser
 
  
 +
* There are features on Textpresso, such as link to PDF, that are desirable to curators but should be blocked from public access.
  
=== Movie objects stored as URLs ===
+
* Alliance curation status form development needs use cases. ref https://wiki.wormbase.org/index.php/WormBase-Caltech_Weekly_Calls#February_15.2C_2024
*There are 209 RNAi objects referencing 513 movies via URLs
 
*The URLs are for RNAi.org pages that have the movies; we don't have the movie files themselves
 
*Daniela will add a Database tag to the ?Movie class
 
*URL suffixes/endings will act as the RNAi.org Accession number that will go into the respective Movie objects
 
  
  
  
=== Movie curation ===  
+
==February 15, 2024==
*Movie files are stored on Canopus
 
*Anyone importing new movies will need to get their movies into the appropriate directories on Canopus
 
*Currently movie files are all from three large scale movie datasets, so names are currently unique
 
*Daniela will now sort movies according to WBPaper ID or WBPerson ID (depending on the source) and place them in directories accordingly
 
*Movie files will then be organized according to these directories as ?Picture objects are already
 
  
 +
=== Literature Migration to the Alliance ABC ===
 +
==== Use Cases for Searches and Validation in the ABC (or, what are your common actions in the curation status form)? ====
 +
===== Find papers with a high confidence NN classification for a given topic that have also been flagged positive by an author in a community curation pipeline and that haven’t been curated yet for that topic =====
 +
*Facet for topic
 +
*Facet for automatic assertion
 +
**neural network method
 +
*Facet for confidence level
 +
**High
 +
*Facet for manual assertion
 +
**author assertion
 +
***ACKnowledge method
 +
**professional biocurator assertion
 +
***curation tools method - NULL
  
=== Changing "Biological Process" to "Biological Topic" ===  
+
===== Manually validate paper - topic flags without curating =====
*Changing the class name in the web code is laborious and not good practice
+
*Facet for topic
*We will keep the ?WBProcess class name, but only refer to them as "Biological Topics" on the website
+
*Facet for manual assertion
*A comment will be added to the models.wrm file explaining the discrepancy
+
**professional biocurator assertion
 +
***ABC - no data
  
 +
===== View all topic and entity flags for a given paper and validate, if needed =====
 +
* Search ABC with paper identifier
 +
* Migrate to Topic and Entity Editor
 +
* View all associated data
 +
* Manually validate flags, if needed
  
=== Database OA ===  
+
=== PDF Storage ===
*Karen and Juancarlos will construct a ?Database class OA
+
* At the Alliance, PDFs will be stored in Amazon S3
*Curators can enter new databases or edit existing databases there
+
* We are not planning to formally store back-up copies elsewhere
*There is a static file for the web code that provides URL constructors for links out to those databases and their objects
+
* Is this okay with everyone?
*In the past, the file has been manually updated to accommodate new databases or updates to URLs, for example
 
*There needs to be a mechanism in place to edit or replace the file to correspond to the Database OA data
 
  
 +
==February 8, 2024==
 +
* TAGC
 +
** Prominent announcement on the Alliance home page?
  
=== Unfolded Protein Response curation - Pilot topic curation ===
+
* Fixed login on dockerized system (dev). Can everybody test their forms?
*Already an ER UPR Stress WikiPathway and a Mitochondrial UPR WikiPathway
 
*Do we create one parent pathway/topic, or do we keep them separate?
 
*As we work on a topic, we will assess how much material there is to cover and whether or not we need to be more specific (or more broad) in topic
 
  
 +
==February 1, 2024==
 +
* Paul will ask Natalia to take care of pending reimbursements
 +
* Dockerized system slow pages (OA and FPKMMine). Will monitor these pages in the future. Will look for timeouts in the nginx logs.
  
 +
==January 25, 2024==
  
== October 10, 2013 ==
+
=== Curator Info on Curation Forms ===
 +
* Saving curator info using cookies in dockerized forms. Can we deploy to prod?
  
 +
=== ACKnowledge Author Request - WBPaper00066091 ===
 +
* I am more than willing to assist; however, the task exceeds the capabilities of the normal flagging process.
  
=== ?Expr_pattern and ?Movie model changes ===
+
* The paper conducts an analysis of natural variations within 48 wild isolates. To enhance the reliability of the variant set, I utilized the latest variant calling methods along with a custom filtering approach. The resulting dataset comprises 1,957,683 unique variants identified using Clair3. Additionally, Sniffles2 was used to identify indels of >30 bp, which numbered in the thousands to tens of thousands for most wild isolates. It is worth noting that variants identified with Sniffles2 have less reliable nucleotide positions in the genome.  
*Daniela was proposing changes to the models
 
*Daniela can/should coordinate with Wen for microarray/tiling array data/quantitative expression data
 
* For now Daniela will request only Tiling Array and Microarray tags. Wen will put in the additional model changes later on.
 
*Does the text entry need to be indexed (?Text vs Text) for DB_info?
 
  
 +
* I am reaching out to inquire whether WormBase would be interested in incorporating this dataset. An argument in favor is the higher quality of my data. However, I am mindful of the potential substantial effort involved for WormBase, and it is unclear whether this aligns with your priorities.
  
=== Cataloging controlled vocabularies used in OA and models ===
+
* Should WormBase decide to use my variant data set, I am more than willing to offer my assistance.
*Karen generated Wiki page: http://wiki.wormbase.org/index.php/OA_forms,_tables,_scripts,_etc#OA_form_dropdown_lists
 
*Tables list Postgres tables and OA forms and the controlled vocabularies (drop down values that dump to Text fields in the .ACE file)
 
*Paul Davis wanted a list of all of the controlled vocabularies we use during curation
 
*We will want to create a Wiki page that clearly outlines a model-change protocol that people should follow when changing models and refer to this page
 
  
 +
=== Update on NN Classification via the Alliance ===
 +
* Use of primary/not primary/not designated flag to filter papers
 +
* Secondary filter on papers with at least C. elegans as species
 +
* Finalize sources (i.e. evidence) for entity and topic tags on papers
 +
* Next NN classification scheduled for ~March
  
=== WikiPathways ===
+
* We decided to process all papers (even non-elegans species) and have filters on species after processing.
*WormBase approval process needs to be fleshed out
+
* NNC html pages will show NNC values together with species.  
*We can annotate edges in a pathway with text: references/citations, text from article, WormBase object references, etc.
+
* Show all C. elegans papers first and other species in a separate bin.
*We want to link all relevant WormBase objects to the pathway where possible
 
*We want to establish a graded spectrum of confidence for each pathway component/annotation/edge
 
*Confidence or rigor of annotation can span across evidence types: Review reference only (low score), review figure/table, primary research evidence article, primary research evidence figure/table within article, WormBase data object (e.g. interaction, etc.) (high score)
 
  
 +
=== Travel Reimbursements ===
 +
* Still waiting on October travel reimbursement (Kimberly)
 +
* Still waiting on September and October travel reimbursements (Wen)
  
=== Topic curation: Unfolded Protein Response ===
+
=== UniProt ===
*Mitochondrial UPR pathway an official WormBase topic
+
* Jae found some genes without UniProt IDs; the genes are present in UniProt but lack WBGene IDs.
*WikiPathway curation would be best while curating a paper; update on WikiPathways site for other curators to see
+
* Wen reached out to Stavros and Chris to investigate the WormBase and AGR angles.
*We can add annotations (papers, figures, tables, WB data objects) to edges in pathways
+
* Stavros escalated the issue at the Hinxton standup.
*While curating a data type, there are many more aspects of the biology from the paper that inform the curator of the pathway
+
* Mark checked the Build scripts and WS291 results. After that, he contacted UniProt and is working with them to figure this out.
*Regular editing and updating of WikiPathways will keep other curators updated on what still needs adding to the pathway
 
*We should continue to focus on how this pipeline can be improved for future curation
 
  
 +
==January 18, 2024==
 +
* OA showing different names highlighted when logging in to the OA; now fixed on staging
  
  
== October 17, 2013 ==
+
==January 11, 2024==
 +
* Duplicate function in OA was not working when using special characters. Valerio debugged it and it is now fixed.
 +
** Curators should make sure that, when pasting special characters, the duplicate function works
 +
* OA showing different names highlighted when logging in to the OA; Valerio will debug and check what IP address he sees
 +
** If you want to bookmark an OA url for your datatype and user, log on once, and bookmark that page (separately for prod and dev)
 +
* Chris tested the phenotype form on staging and production, and the data are still going to tazendra
 +
** Chris will check with Paulo. Once it is resolved we need to take everything that is on tazendra and put it on the cloud with different PGIDs
 +
** Raymond: simply set up forwarding at our end?
 +
* AI working group: Valerio is setting up a new OpenAI account (paid membership for ChatGPT4). We can also use Microsoft Edge copilot (temporary?)
 +
* Chris is getting ready to deploy the 7.0.0 public release on February 7th. Carol wanted to push out monthly releases. This release will include WS291; the next several releases will also use WS291 until WS292 is available.
 +
* Valerio would like to use an alliancegenome.org email address for the OpenAI account
 +
* New alliance drive: https://drive.google.com/drive/folders/0AFkMHZOEQxolUk9PVA
 +
** Note: please move shared files that you own to the new Alliance Google Drive. For instructions, see the video and SOP that Chris Mungall sent: https://agr-jira.atlassian.net/browse/SCRUM-925?focusedCommentId=40674
 +
* Alliance logo and 50 word description for TAGC: Wen will talk to the outreach WG
 +
* Name server. Manuel working on this, Daniela and Karen will reach out to him and let him know that down the road micropublication would like to use the name server API to generate IDs in bulk
 +
* Karen asking about some erroneous IDs used in the name server. Stavros says that this is not a big deal because the "reason" is not populating the name server
 +
* It would be good to be able to have a form to capture additional fields for strains and alleles (see meeting minutes August 31st 2023. https://wiki.wormbase.org/index.php/WormBase-Caltech_Weekly_Calls_2023#August_31st.2C_2023). This may happen after Manuel is done with the authentication.
 +
* Michael: primary flag with Alliance. Kimberly talked about this with the blue team. They will start bringing that over for all papers and fix the remaining 271 items later.
  
 
+
==January 4, 2024==
=== Variation Nameserver & GeneACE ===
+
* ACKnowledge pipeline help desk question:
* Caltech will get JSON dump of the entire database/nameserver from Hinxton?
+
** Help Desk: Question about Author Curation to Knowledgebase (Zeng Wanxin) [Thu 12/14/2023 5:48 AM]
* There are issues with the synchronization of the Variation Nameserver and GeneACE. Karen is asking MaryAnn about it, and we'll decide whether our daily script should look at both or just GeneAce
+
* Citace upload, current deadline: Tuesday January 9th
* Frequency of dump: daily if possible (it should be)
+
** All processes (dumps, etc.) will happen on the cloud machine
 
+
** Curators need to deposit their files in the appropriate locations for Wen
 
+
* Micropublication pipeline
=== WormMine ===
+
** Ticketing system confusion
* JD has built a new version for WS239 including FASTA sequences (excluding CDS sequences) and RNAi objects
+
** Karen and Kimberly paper ID pipeline; may need sorting out of logistics
* JD has asked for curators to review the changes before pushing it to the live site
 
* The Intermine grant is up for renewal; they requested testimonials from curators; we can send them to JD
 
 
 
 
 
=== Topic Curation ===
 
* Still working on Unfolded Protein Response (UPR) for WS241
 
* Chris will send summary of UPR curation status (including gene list) to Hinxton
 
* Hinxton will work on touching up/curating gene models for UPR-relevant genes in ''C. elegans'' and homologs in other nematode species
 
* Hinxton suggested covering ncRNAs for WS242; we probably want to focus on a particular subtopic (one type of ncRNA with biogenesis and mechanism of action)
 
 
 
 
 
=== WikiPathways ===
 
* We need to establish a good way to annotate indirect associations (e.g. RNAi leads to gene expression change)
 
* SBML and SBGN exist but may not be entirely compatible with the WikiPathway arrows/components
 
 
 
 
 
=== WormBase Ontology Browser ===
 
* Raymond is waiting for development environment update from Todd
 
* Raymond is working on incorporating WormBase-specific ontologies into the browser
 
* Taxon-specific tag in model, could be useful for species-specific life stages
 
 
 
 
 
=== Release Schedule Details Wiki Page ===
 
* Would be good to establish a time during the cycle in which the web team is receptive to requests for changes
 
* Google Docs page [https://docs.google.com/a/wormbase.org/drawings/d/10wG5sicEj3SITlNompQgMBrwtUG7HJyiwhOepLfWdKk/edit here]
 
 
 
 
 
=== Database OA ===
 
* Current mechanism of updating Database URLs/URIs/constructors involves manually editing a Database flat file (on GitHub) read by the website https://github.com/WormBase/website/blob/staging/root/templates/config/external_urls
 
* Karen considering the development of a Database OA
 
* Issue is with denormalization of ACEDB to flat file
 
* If within the release cycle, the Database OA would need to be dumped and the info overwrite the flat file, but then it needs to be committed on GitHub and pushed to production
 
* Also a problem is coordinating with the Hinxton Database info
 
* Although laborious, may still be better to manually edit the flat file
 

Latest revision as of 16:04, 18 April 2024

Previous Years

2009 Meetings

2011 Meetings

2012 Meetings

2013 Meetings

2014 Meetings

2015 Meetings

2016 Meetings

2017 Meetings

2018 Meetings

2019 Meetings

2020 Meetings

2021 Meetings

2022 Meetings

2023 Meetings

April 18th, 2024

  • NNC pipeline being switched off locally and moving into the Alliance ABC.

April 11th, 2024

  • Caltech WS293 ace files ready for the upload

April 4th, 2024

  • Continued discussion on sustainability
  • CZI, single cell RNAseq for Alliance -> anything happening will be a few months down the road
    • Data is still going to SPELL and enrichment analysis
    • Peter Roy asked about taking the expression profile of a condition and finding similar expression profiles (a SPELL-like analysis), but SPELL cannot currently deal with scRNAseq data. Wen says it is possible (treating each cell group as an experiment). We can try loading them into SPELL. Does it improve the function of SPELL? Only 5-10 datasets. These data are a bit different from bulk RNAseq.
  • Textpresso: good to have a presentation for other MODs to show Textpresso capabilities? Yes. Maybe during sprint review
  • Michael's presentation on LLMs - Named Entity Recognition (NER)

March 14, 2024

TAGC debrief

February 22, 2024

NER with LLMs

  • Wrote scripts and configured an LLM for Named Entity Recognition. Trained an LLM on gene names and diseases. Works well so far (F1 ~ 98%, Accuracy ~ 99.9%)
  • Textpresso server is kaput. Services need to be transferred onto Alliance servers.
  • There are features on Textpresso, such as link to PDF, that are desirable to curators but should be blocked from public access.


February 15, 2024

Literature Migration to the Alliance ABC

Use Cases for Searches and Validation in the ABC (or, what are your common actions in the curation status form)?

Find papers with a high confidence NN classification for a given topic that have also been flagged positive by an author in a community curation pipeline and that haven’t been curated yet for that topic
  • Facet for topic
  • Facet for automatic assertion
    • neural network method
  • Facet for confidence level
    • High
  • Facet for manual assertion
    • author assertion
      • ACKnowledge method
    • professional biocurator assertion
      • curation tools method - NULL
Manually validate paper - topic flags without curating
  • Facet for topic
  • Facet for manual assertion
    • professional biocurator assertion
      • ABC - no data
View all topic and entity flags for a given paper and validate, if needed
  • Search ABC with paper identifier
  • Migrate to Topic and Entity Editor
  • View all associated data
  • Manually validate flags, if needed
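The first use case above combines several facets into one query. A minimal Python sketch of that filter logic follows; it is not the ABC's actual data model or API, and the record fields (`nn_confidence`, `author_positive`, `curator_assertion`), the topic strings, and the paper IDs are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TopicFlag:
    """One paper-topic record. All field names are hypothetical."""
    paper_id: str
    topic: str
    nn_confidence: Optional[str]      # automatic assertion (neural network method)
    author_positive: bool             # manual assertion by author (ACKnowledge method)
    curator_assertion: Optional[str]  # professional biocurator assertion; None means NULL


def uncurated_high_confidence(flags, topic):
    """Papers with a high-confidence NN flag and a positive author flag
    that have no curator assertion yet for the given topic."""
    return [
        f.paper_id
        for f in flags
        if f.topic == topic
        and f.nn_confidence == "High"
        and f.author_positive
        and f.curator_assertion is None
    ]


# Made-up example records, one per way of failing (or passing) the filter.
flags = [
    TopicFlag("paper_A", "RNAi phenotype", "High", True, None),
    TopicFlag("paper_B", "RNAi phenotype", "High", True, "curated"),
    TopicFlag("paper_C", "RNAi phenotype", "Low", True, None),
    TopicFlag("paper_D", "RNAi phenotype", "High", False, None),
    TopicFlag("paper_E", "expression", "High", True, None),
]

print(uncurated_high_confidence(flags, "RNAi phenotype"))
```

Only `paper_A` satisfies every facet at once; each of the other records fails exactly one condition, which makes it easy to check the facets one by one.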

PDF Storage

  • At the Alliance, PDFs will be stored in Amazon S3
  • We are not planning to formally store back-up copies elsewhere
  • Is this okay with everyone?

February 8, 2024

  • TAGC
    • Prominent announcement on the Alliance home page?
  • Fixed login on dockerized system (dev). Can everybody test their forms?

February 1, 2024

  • Paul will ask Natalia to take care of pending reimbursements
  • Dockerized system slow pages (OA and FPKMMine). Will monitor these pages in the future. Will look for timeouts in the nginx logs.
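One possible way to look for timeouts in the nginx logs is sketched below. It assumes the standard nginx error-log wording "upstream timed out" and the usual `request: "METHOD /path ..."` field; the sample lines, host names, and request paths are invented for illustration and are not taken from the real servers.

```python
import re
from collections import Counter

# Captures the path from the request field of an nginx error-log line.
REQUEST_RE = re.compile(r'request: "(?:[A-Z]+) (\S+)')


def count_timeouts(lines):
    """Count upstream timeouts per request path, ignoring query strings."""
    counts = Counter()
    for line in lines:
        if "upstream timed out" not in line:
            continue
        match = REQUEST_RE.search(line)
        if match:
            path = match.group(1).split("?", 1)[0]
            counts[path] += 1
    return counts


# Invented sample lines in the nginx error-log layout.
sample = [
    '2024/02/01 10:15:32 [error] 12#12: *1 upstream timed out (110: Connection timed out) '
    'while reading response header from upstream, client: 10.0.0.1, server: example.org, '
    'request: "GET /oa/search?datatype=x HTTP/1.1", upstream: "http://127.0.0.1:8080/oa/search", host: "example.org"',
    '2024/02/01 10:20:05 [error] 12#12: *2 upstream timed out (110: Connection timed out) '
    'while reading response header from upstream, client: 10.0.0.2, server: example.org, '
    'request: "POST /oa/search HTTP/1.1", upstream: "http://127.0.0.1:8080/oa/search", host: "example.org"',
    '2024/02/01 10:21:40 [error] 12#12: *3 connect() failed (111: Connection refused) '
    'while connecting to upstream, client: 10.0.0.3, server: example.org, '
    'request: "GET /oa/form HTTP/1.1", upstream: "http://127.0.0.1:8080/oa/form", host: "example.org"',
    '2024/02/01 10:30:11 [error] 12#12: *4 upstream timed out (110: Connection timed out) '
    'while reading response header from upstream, client: 10.0.0.1, server: example.org, '
    'request: "GET /fpkmmine/run HTTP/1.1", upstream: "http://127.0.0.1:8081/fpkmmine/run", host: "example.org"',
]

for path, n in count_timeouts(sample).most_common():
    print(n, path)
```

In practice the lines would come from the error log file rather than a list, for example `count_timeouts(open(path_to_error_log))`, where the log location depends on the deployment.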

January 25, 2024

Curator Info on Curation Forms

  • Saving curator info using cookies in dockerized forms. Can we deploy to prod?

ACKnowledge Author Request - WBPaper00066091

  • I am more than willing to assist; however, the task exceeds the capabilities of the normal flagging process.
  • The paper conducts an analysis of natural variations within 48 wild isolates. To enhance the reliability of the variant set, I utilized the latest variant calling methods along with a custom filtering approach. The resulting dataset comprises 1,957,683 unique variants identified using Clair3. Additionally, Sniffles2 was used to identify indels of >30 bp, which numbered in the thousands to tens of thousands for most wild isolates. It is worth noting that variants identified with Sniffles2 have less reliable nucleotide positions in the genome.
  • I am reaching out to inquire whether WormBase would be interested in incorporating this dataset. An argument in favor is the higher quality of my data. However, I am mindful of the potential substantial effort involved for WormBase, and it is unclear whether this aligns with your priorities.
  • Should WormBase decide to use my variant data set, I am more than willing to offer my assistance.

Update on NN Classification via the Alliance

  • Use of primary/not primary/not designated flag to filter papers
  • Secondary filter on papers with at least C. elegans as species
  • Finalize sources (i.e. evidence) for entity and topic tags on papers
  • Next NN classification scheduled for ~March
  • We decided to process all papers (even non-elegans species) and have filters on species after processing.
  • NNC html pages will show NNC values together with species.
  • Show all C. elegans papers first and other species in a separate bin.
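The decision to process every paper and filter on species afterwards could look roughly like the Python sketch below. The record layout (`id`, `species`, `nnc`) and the example values are assumptions made for illustration, not the format of the actual NNC output.

```python
def bin_by_species(papers, primary="Caenorhabditis elegans"):
    """Split already-classified papers into a primary-species bin and a
    separate bin for everything else. A paper counts as primary when the
    primary species is at least one of its species."""
    elegans, other = [], []
    for paper in papers:
        target = elegans if primary in paper["species"] else other
        target.append(paper)
    return elegans, other


# Invented records: NNC values are kept alongside the species so both
# can be shown together on a results page.
papers = [
    {"id": "paper_1", "species": ["Caenorhabditis elegans"], "nnc": "high"},
    {"id": "paper_2", "species": ["Caenorhabditis briggsae"], "nnc": "high"},
    {"id": "paper_3", "species": ["Caenorhabditis elegans", "Homo sapiens"], "nnc": "low"},
    {"id": "paper_4", "species": [], "nnc": "medium"},
]

elegans, other = bin_by_species(papers)
for label, group in (("C. elegans", elegans), ("Other species", other)):
    print(label)
    for paper in group:
        print("  ", paper["id"], paper["nnc"], ", ".join(paper["species"]) or "no species")
```

Because the binning happens after classification, no paper is dropped: a paper with no species assigned simply lands in the second bin.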

Travel Reimbursements

  • Still waiting on October travel reimbursement (Kimberly)
  • Still waiting on September and October travel reimbursements (Wen)

UniProt

  • Jae found some genes without UniProt IDs; the genes are present in UniProt but lack WBGene IDs.
  • Wen reached out to Stavros and Chris to investigate the WormBase and AGR angles.
  • Stavros escalated the issue at the Hinxton standup.
  • Mark checked the Build scripts and WS291 results. After that, he contacted UniProt and is working with them to figure this out.

January 18, 2024

  • OA showing different names highlighted when logging in to the OA; now fixed on staging


January 11, 2024

  • Duplicate function in OA was not working when using special characters. Valerio debugged it and it is now fixed.
    • Curators should make sure that, when pasting special characters, the duplicate function works
  • OA showing different names highlighted when logging in to the OA; Valerio will debug and check what IP address he sees
    • If you want to bookmark an OA url for your datatype and user, log on once, and bookmark that page (separately for prod and dev)
  • Chris tested the phenotype form on staging and production, and the data are still going to tazendra
    • Chris will check with Paulo. Once it is resolved we need to take everything that is on tazendra and put it on the cloud with different PGIDs
    • Raymond: simply set up forwarding at our end?
  • AI working group: Valerio is setting up a new OpenAI account (paid membership for ChatGPT4). We can also use Microsoft Edge copilot (temporary?)
  • Chris is getting ready to deploy the 7.0.0 public release on February 7th. Carol wanted to push out monthly releases. This release will include WS291; the next several releases will also use WS291 until WS292 is available.
  • Valerio would like to use an alliancegenome.org email address for the OpenAI account
  • New alliance drive: https://drive.google.com/drive/folders/0AFkMHZOEQxolUk9PVA
  • Alliance logo and 50 word description for TAGC: Wen will talk to the outreach WG
  • Name server. Manuel working on this, Daniela and Karen will reach out to him and let him know that down the road micropublication would like to use the name server API to generate IDs in bulk
  • Karen asking about some erroneous IDs used in the name server. Stavros says that this is not a big deal because the "reason" is not populating the name server
  • It would be good to be able to have a form to capture additional fields for strains and alleles (see meeting minutes August 31st 2023. https://wiki.wormbase.org/index.php/WormBase-Caltech_Weekly_Calls_2023#August_31st.2C_2023). This may happen after Manuel is done with the authentication.
  • Michael: primary flag with Alliance. Kimberly talked about this with the blue team. They will start bringing that over for all papers and fix the remaining 271 items later.

January 4, 2024

  • ACKnowledge pipeline help desk question:
    • Help Desk: Question about Author Curation to Knowledgebase (Zeng Wanxin) [Thu 12/14/2023 5:48 AM]
  • Citace upload, current deadline: Tuesday January 9th
    • All processes (dumps, etc.) will happen on the cloud machine
    • Curators need to deposit their files in the appropriate locations for Wen
  • Micropublication pipeline
    • Ticketing system confusion
    • Karen and Kimberly paper ID pipeline; may need sorting out of logistics