Difference between revisions of "WormBase-Caltech Weekly Calls"

From WormBaseWiki
 
 
[[WormBase-Caltech_Weekly_Calls_2012|2012 Meetings]]
 
  
 +
[[WormBase-Caltech_Weekly_Calls_2013|2013 Meetings]]
  
= 2013 Meetings =
+
[[WormBase-Caltech_Weekly_Calls_2014|2014 Meetings]]
  
 +
[[WormBase-Caltech_Weekly_Calls_2015|2015 Meetings]]
  
[[WormBase-Caltech_Weekly_Calls_January_2013|January]]
+
[[WormBase-Caltech_Weekly_Calls_2016|2016 Meetings]]
  
[[WormBase-Caltech_Weekly_Calls_February_2013|February]]
+
[[WormBase-Caltech_Weekly_Calls_2017|2017 Meetings]]
  
[[WormBase-Caltech_Weekly_Calls_March_2013|March]]
+
[[WormBase-Caltech_Weekly_Calls_2018|2018 Meetings]]
  
[[WormBase-Caltech_Weekly_Calls_April_2013|April]]
+
[[WormBase-Caltech_Weekly_Calls_2019|2019 Meetings]]
  
[[WormBase-Caltech_Weekly_Calls_May_2013|May]]
+
[[WormBase-Caltech_Weekly_Calls_2020|2020 Meetings]]
  
[[WormBase-Caltech_Weekly_Calls_June_2013|June]]
+
[[WormBase-Caltech_Weekly_Calls_2021|2021 Meetings]]
  
[[WormBase-Caltech_Weekly_Calls_July_2013|July]]
+
[[WormBase-Caltech_Weekly_Calls_2022|2022 Meetings]]
  
[[WormBase-Caltech_Weekly_Calls_August_2013|August]]
+
[[WormBase-Caltech_Weekly_Calls_2023|2023 Meetings]]
  
[[WormBase-Caltech_Weekly_Calls_September_2013|September]]
 
  
 +
==March 14, 2024==
  
 +
=== TAGC debrief ===
  
== October 3, 2013 ==
+
==February 22, 2024==
  
 +
===NER with LLMs===
  
=== Development environment for Raymond ===
+
* Wrote scripts and configured an LLM for Named Entity Recognition. Trained an LLM on gene names and diseases. Works well so far (F1 ~ 98%, Accuracy ~ 99.9%)
*Todd is working on it; will do today
 
  
 +
* Is this similar to the FlyBase system? Recording of presentation  https://drive.google.com/drive/folders/1S4kZidL7gvBH6SjF4IQujyReVVRf2cOK
  
=== NAR Manuscript ===
+
* Textpresso server is kaput. Services need to be transferred onto Alliance servers.
*Todd will send around for comments today
 
  
 +
* There are features on Textpresso, such as link to PDF, that are desirable to curators but should be blocked from public access.
  
=== WormBase Ontology Browser (WOBr) ===
+
* Alliance curation status form development needs use cases. ref https://wiki.wormbase.org/index.php/WormBase-Caltech_Weekly_Calls#February_15.2C_2024
*We have a stable server for the information the browser queries: gene ontology associations, gene association files
 
*Raymond was trying to keep up with the AMIGO 2 development upgrades
 
*Raymond will soon be getting other ontology info into the browser
 
*Takes about an hour to load data into the browser
 
  
  
=== Movie objects stored as URLs ===
 
*There are 209 RNAi objects referencing 513 movies via URLs
 
*The URLs are for RNAi.org pages that have the movies; we don't have the movie files themselves
 
*Daniela will add a Database tag to the ?Movie class
 
*URL suffixes/endings will act as the RNAi.org Accession number that will go into the respective Movie objects
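
A minimal sketch of how a URL ending could be turned into an accession string, assuming the accession is simply the last path segment of the movie page URL. The function name and the example URL are hypothetical; the real RNAi.org URL layout is not given in these notes.

```python
from urllib.parse import urlparse


def accession_from_url(url):
    """Return the last non-empty path segment of a URL.

    Assumption: the movie page URL ends in the identifier that should
    be used as the accession number.
    """
    path = urlparse(url).path.rstrip("/")
    return path.rsplit("/", 1)[-1]


# Hypothetical URL, for illustration only.
example = "http://example.org/movies/ABC123/"
print(accession_from_url(example))
```

If the identifier turns out to live in a query parameter instead of the path, `urllib.parse.parse_qs` on the query string would be the place to look instead.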
 
  
 +
==February 15, 2024==
  
 +
=== Literature Migration to the Alliance ABC ===
 +
==== Use Cases for Searches and Validation in the ABC (or, what are your common actions in the curation status form)? ====
 +
===== Find papers with a high confidence NN classification for a given topic that have also been flagged positive by an author in a community curation pipeline and that haven’t been curated yet for that topic =====
 +
*Facet for topic
 +
*Facet for automatic assertion
 +
**neural network method
 +
*Facet for confidence level
 +
**High
 +
*Facet for manual assertion
 +
**author assertion
 +
***ACKnowledge method
 +
**professional biocurator assertion
 +
***curation tools method - NULL
  
=== Movie curation ===  
+
===== Manually validate paper - topic flags without curating =====
*Movie files are stored on Canopus
+
*Facet for topic
*Anyone importing new movies will need to get their movies into the appropriate directories on Canopus
+
*Facet for manual assertion
*Currently movie files are all from three large scale movie datasets, so names are currently unique
+
**professional biocurator assertion
*Daniela will now sort movies according to WBPaper ID or WBPerson ID (depending on the source) and place them in directories accordingly
+
***ABC - no data
*Movie files will then be organized according to these directories as ?Picture objects are already
 
  
 +
===== View all topic and entity flags for a given paper and validate, if needed =====
 +
* Search ABC with paper identifier
 +
* Migrate to Topic and Entity Editor
 +
* View all associated data
 +
* Manually validate flags, if needed
  
=== Changing "Biological Process" to "Biological Topic" ===  
+
=== PDF Storage ===
*Changing the class name in the web code is laborious and not good practice
+
* At the Alliance, PDFs will be stored in Amazon S3
*We will keep the ?WBProcess class name, but only refer to them as "Biological Topics" on the website
+
* We are not planning to formally store back-up copies elsewhere
*A comment will be added to the models.wrm file explaining the discrepancy
+
* Is this okay with everyone?
  
 +
==February 8, 2024==
 +
* TAGC
 +
** Prominent announcement on the Alliance home page?
  
=== Database OA ===
+
* Fixed login on dockerized system (dev). Can everybody test their forms?
*Karen and Juancarlos will construct a ?Database class OA
 
*Curators can enter new databases or edit existing databases there
 
*There is a static file for the web code that provides URL constructors for links out to those databases and their objects
 
*In the past, the file has been manually updated to accommodate new databases or updates to URLs, for example
 
*There needs to be a mechanism in place to edit or replace the file to correspond to the Database OA data
 
  
 +
==February 1, 2024==
 +
* Paul will ask Natalia to take care of pending reimbursements
 +
* Dockerized system slow pages (OA and FPKMMine). Will monitor these pages in the future. Will look for timeouts in the nginx logs.
  
=== Unfolded Protein Response curation - Pilot topic curation ===
+
==January 25, 2024==
*Already an ER UPR Stress WikiPathway and a Mitochondrial UPR WikiPathway
 
*Do we create one parent pathway/topic, or do we keep them separate?
 
*As we work on a topic, we will assess how much material there is to cover and whether or not we need to be more specific (or more broad) in topic
 
  
 +
=== Curator Info on Curation Forms ===
 +
* Saving curator info using cookies in dockerized forms. Can we deploy to prod?
  
 +
=== ACKnowledge Author Request - WBPaper00066091 ===
 +
* I am more than willing to assist; however, the task exceeds the capabilities of the normal flagging process.
  
== October 10, 2013 ==
+
* The paper conducts an analysis of natural variations within 48 wild isolates. To enhance the reliability of the variant set, I utilized the latest variant calling methods along with a custom filtering approach. The resulting dataset comprises 1,957,683 unique variants identified using Clair3. Additionally, Sniffles2 was used to identify indels of >30 bp, which numbered in the thousands to tens of thousands for most wild isolates. It is worth noting that variants identified with Sniffles2 have less reliable nucleotide positions in the genome.
  
 +
* I am reaching out to inquire whether WormBase would be interested in incorporating this dataset. An argument in favor is the higher quality of my data. However, I am mindful of the potential substantial effort involved for WormBase, and it is unclear whether this aligns with your priorities.
  
=== ?Expr_pattern and ?Movie model changes ===
+
* Should WormBase decide to use my variant data set, I am more than willing to offer my assistance.
*Daniela was proposing changes to the models
 
*Daniela can/should coordinate with Wen for microarray/tiling array data/quantitative expression data
 
* For now Daniela will request only Tiling Array and Microarray tags. Wen will put in the additional model changes later on.
 
*Does the text entry need to be indexed (?Text vs Text) for DB_info?
 
  
 +
=== Update on NN Classification via the Alliance ===
 +
* Use of primary/not primary/not designated flag to filter papers
 +
* Secondary filter on papers with at least C. elegans as species
 +
* Finalize sources (i.e. evidence) for entity and topic tags on papers
 +
* Next NN classification scheduled for ~March
  
=== Cataloging controlled vocabularies used in OA and models ===
+
* We decided to process all papers (even non-elegans species) and have filters on species after processing.  
*Karen generated Wiki page: http://wiki.wormbase.org/index.php/OA_forms,_tables,_scripts,_etc#OA_form_dropdown_lists
+
* NNC html pages will show NNC values together with species.  
*Tables list Postgres tables and OA forms and the controlled vocabularies (drop down values that dump to Text fields in the .ACE file)
+
* Show all C. elegans papers first and other species in a separate bin.
*Paul Davis wanted a list of all of the controlled vocabularies we use during curation
 
*We will want to create a Wiki page that clearly outlines a model-change protocol that people should follow when changing models and refer to this page
 
  
 +
=== Travel Reimbursements ===
 +
* Still waiting on October travel reimbursement (Kimberly)
 +
* Still waiting on September and October travel reimbursements (Wen)
  
=== WikiPathways ===
+
=== UniProt ===
*WormBase approval process needs to be fleshed out
+
* Jae found some genes without UniProt IDs; the genes are present in UniProt but lack WBGene IDs.
*We can annotate edges in a pathway with text: references/citations, text from article, WormBase object references, etc.
+
* Wen reached out to Stavros and Chris to investigate the WormBase and AGR angles.
*We want to link all relevant WormBase objects to the pathway where possible
+
* Stavros escalated the issue at the Hinxton standup.
*We want to establish a graded spectrum of confidence for each pathway component/annotation/edge
+
* Mark checked the build scripts and WS291 results, then contacted UniProt and is working with them to figure this out.
*Confidence or rigor of annotation can span across evidence types: Review reference only (low score), review figure/table, primary research evidence article, primary research evidence figure/table within article, WormBase data object (e.g. interaction, etc.) (high score)
 
  
 +
==January 18, 2024==
 +
* The OA was highlighting different names when logging in; now fixed on staging
  
=== Topic curation: Unfolded Protein Response ===
 
*Mitochondrial UPR pathway an official WormBase topic
 
*WikiPathway curation would be best while curating a paper; update on WikiPathways site for other curators to see
 
*We can add annotations (papers, figures, tables, WB data objects) to edges in pathways
 
*While curating a data type, there are many more aspects of the biology from the paper that inform the curator of the pathway
 
*Regular editing and updating of WikiPathways will keep other curators updated on what still needs adding to the pathway
 
*We should continue to focus on how this pipeline can be improved for future curation
 
  
 +
==January 11, 2024==
 +
* The duplicate function in the OA was not working with special characters. Valerio debugged it and it is now fixed.
 +
** Curators should make sure that, when pasting special characters, the duplicate function works
 +
* The OA highlights different names when logging in; Valerio will debug and check what IP address he sees
 +
** If you want to bookmark an OA url for your datatype and user, log on once, and bookmark that page (separately for prod and dev)
 +
* Chris tested the phenotype form on staging and production; the data are still going to tazendra
 +
** Chris will check with Paulo. Once it is resolved we need to take everything that is on tazendra and put it on the cloud with different PGIDs
 +
** Raymond: simply set up forwarding at our end?
 +
* AI working group: Valerio is setting up a new OpenAI account (paid membership for ChatGPT4). We can also use Microsoft Edge Copilot (temporary?)
 +
* Chris is getting ready to deploy a 7.0.0 public release on February 7th. Carol wanted to push out monthly releases. This release will include WS291; the next several releases will also use WS291 until WS292 is available.
 +
* Valerio would like to use an alliancegenome.org email address for the OpenAI account
 +
* New alliance drive: https://drive.google.com/drive/folders/0AFkMHZOEQxolUk9PVA
 +
** Note: please move shared files that you own to the new Alliance Google Drive. Chris Mungall sent a link with more instructions; see the video and SOP here: https://agr-jira.atlassian.net/browse/SCRUM-925?focusedCommentId=40674
 +
* Alliance logo and 50-word description for TAGC: Wen will talk to the outreach WG
 +
* Name server. Manuel working on this, Daniela and Karen will reach out to him and let him know that down the road micropublication would like to use the name server API to generate IDs in bulk
 +
* Karen asking about some erroneous IDs used in the name server. Stavros says that this is not a big deal because the "reason" is not populating the name server
 +
* It would be good to be able to have a form to capture additional fields for strains and alleles (see meeting minutes August 31st 2023. https://wiki.wormbase.org/index.php/WormBase-Caltech_Weekly_Calls_2023#August_31st.2C_2023). This may happen after Manuel is done with the authentication.
 +
* Michael: primary flag with Alliance. Kimberly talked about this with the blue team. They will start bringing that over for all papers and fix the remaining 271 items later.
  
 
+
==January 4, 2024==
== October 17, 2013 ==
+
* ACKnowledge pipeline help desk question:
 
+
** Help Desk: Question about Author Curation to Knowledgebase (Zeng Wanxin) [Thu 12/14/2023 5:48 AM]
 
+
* Citace upload, current deadline: Tuesday January 9th
=== Variation Nameserver & GeneACE ===
+
** All processes (dumps, etc.) will happen on the cloud machine
* Caltech will get JSON dump of the entire database/nameserver from Hinxton?
+
** Curators need to deposit their files in the appropriate locations for Wen
* There are issues with the synchronization of the Variation Nameserver and GeneACE. Karen is asking MaryAnn about it, and we'll decide whether our daily script should look at both or just GeneACE
+
* Micropublication pipeline
* Frequency of dump: daily if possible (it should be)
+
** Ticketing system confusion
 
+
** Karen and Kimberly paper ID pipeline; may need sorting out of logistics
 
 
=== WormMine ===
 
* JD has built a new version for WS239 including FASTA sequences (excluding CDS sequences) and RNAi objects
 
* JD has asked for curators to review the changes before pushing it to the live site
 
* The Intermine grant is up for renewal; they requested testimonials from curators; we can send them to JD
 
 
 
 
 
=== Topic Curation ===
 
* Still working on Unfolded Protein Response (UPR) for WS241
 
* Chris will send summary of UPR curation status (including gene list) to Hinxton
 
* Hinxton will work on touching up/curating gene models for UPR-relevant genes in ''C. elegans'' and homologs in other nematode species
 
* Hinxton suggested covering ncRNAs for WS242; we probably want to focus on a particular subtopic (one type of ncRNA with biogenesis and mechanism of action)
 
 
 
 
 
=== WikiPathways ===
 
* We need to establish a good way to annotate indirect associations (e.g. RNAi leads to gene expression change)
 
* SBML and SBGN exist but may not be entirely compatible with the WikiPathway arrows/components
 
 
 
 
 
=== WormBase Ontology Browser ===
 
* Raymond is waiting for development environment update from Todd
 
* Raymond is working on incorporating WormBase-specific ontologies into the browser
 
* Taxon-specific tag in model, could be useful for species-specific life stages
 
 
 
 
 
=== Release Schedule Details Wiki Page ===
 
* Would be good to establish a time during the cycle in which the web team is receptive to requests for changes
 
* Google Docs page [https://docs.google.com/a/wormbase.org/drawings/d/10wG5sicEj3SITlNompQgMBrwtUG7HJyiwhOepLfWdKk/edit here]
 
 
 
 
 
=== Database OA ===
 
* Current mechanism of updating Database URLs/URIs/constructors involves manually editing a Database flat file (on GitHub) read by the website
 
* Karen considering the development of a Database OA
 
* Issue is with denormalization of ACEDB to flat file
 
* If within the release cycle, the Database OA would need to be dumped and the info overwrite the flat file, but then it needs to be committed on GitHub and pushed to production
 
* Also a problem is coordinating with the Hinxton Database info
 
* Although laborious, may still be better to manually edit the flat file
 

Latest revision as of 18:18, 14 March 2024

Previous Years

2009 Meetings

2011 Meetings

2012 Meetings

2013 Meetings

2014 Meetings

2015 Meetings

2016 Meetings

2017 Meetings

2018 Meetings

2019 Meetings

2020 Meetings

2021 Meetings

2022 Meetings

2023 Meetings


March 14, 2024

TAGC debrief

February 22, 2024

NER with LLMs

  • Wrote scripts and configured an LLM for Named Entity Recognition. Trained an LLM on gene names and diseases. Works well so far (F1 ~ 98%, Accuracy ~ 99.9%)
  • Textpresso server is kaput. Services need to be transferred onto Alliance servers.
  • There are features on Textpresso, such as link to PDF, that are desirable to curators but should be blocked from public access.


February 15, 2024

Literature Migration to the Alliance ABC

Use Cases for Searches and Validation in the ABC (or, what are your common actions in the curation status form)?

Find papers with a high confidence NN classification for a given topic that have also been flagged positive by an author in a community curation pipeline and that haven’t been curated yet for that topic
  • Facet for topic
  • Facet for automatic assertion
    • neural network method
  • Facet for confidence level
    • High
  • Facet for manual assertion
    • author assertion
      • ACKnowledge method
    • professional biocurator assertion
      • curation tools method - NULL
Manually validate paper - topic flags without curating
  • Facet for topic
  • Facet for manual assertion
    • professional biocurator assertion
      • ABC - no data
View all topic and entity flags for a given paper and validate, if needed
  • Search ABC with paper identifier
  • Migrate to Topic and Entity Editor
  • View all associated data
  • Manually validate flags, if needed
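
The first use case above (high-confidence NN flag, positive author flag, not yet curated) can be sketched as a filter over per-paper topic flags. This is only an illustration under stated assumptions: the `TopicFlag` and `Paper` classes, the field names, and the string values are hypothetical and do not reflect the actual ABC data model or API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TopicFlag:
    # All field names and values are hypothetical, for illustration only.
    topic: str
    source: str                 # "neural_network", "ACKnowledge", "curation_tools"
    assertion: str              # "automatic", "author", or "biocurator"
    confidence: Optional[str] = None   # e.g. "High" for automatic assertions
    positive: bool = True


@dataclass
class Paper:
    paper_id: str
    flags: List[TopicFlag] = field(default_factory=list)


def needs_curation(paper, topic):
    """True if the paper matches every facet in the first use case."""
    flags = [f for f in paper.flags if f.topic == topic]
    nn_high = any(
        f.assertion == "automatic" and f.source == "neural_network"
        and f.confidence == "High" and f.positive
        for f in flags
    )
    author_positive = any(
        f.assertion == "author" and f.source == "ACKnowledge" and f.positive
        for f in flags
    )
    # "curation tools method - NULL": no biocurator assertion exists yet.
    curated = any(
        f.assertion == "biocurator" and f.source == "curation_tools"
        for f in flags
    )
    return nn_high and author_positive and not curated


def find_uncurated(papers, topic):
    """Return the IDs of papers that still need curation for the topic."""
    return [p.paper_id for p in papers if needs_curation(p, topic)]
```

Each boolean in `needs_curation` corresponds to one facet in the list, so dropping or adding a facet means changing a single condition.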

PDF Storage

  • At the Alliance, PDFs will be stored in Amazon S3
  • We are not planning to formally store back-up copies elsewhere
  • Is this okay with everyone?

February 8, 2024

  • TAGC
    • Prominent announcement on the Alliance home page?
  • Fixed login on dockerized system (dev). Can everybody test their forms?

February 1, 2024

  • Paul will ask Natalia to take care of pending reimbursements
  • Dockerized system slow pages (OA and FPKMMine). Will monitor these pages in the future. Will look for timeouts in the nginx logs.
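
One way to look for timeouts in the nginx logs is to count "upstream timed out" entries in the error log per request path. The sketch below assumes the default nginx error log line layout (with a `request: "METHOD /path HTTP/x"` field); the function name and any paths are hypothetical.

```python
import re
from collections import Counter

# Matches nginx error-log lines such as:
#   ... upstream timed out (110: Connection timed out) while reading
#   response header from upstream, ..., request: "GET /some/page HTTP/1.1", ...
TIMEOUT_RE = re.compile(
    r'upstream timed out .*?request: "[A-Z]+ (?P<path>\S+)'
)


def count_timeouts(lines):
    """Count upstream timeouts per request path, ignoring query strings."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group("path").split("?")[0]] += 1
    return counts
```

Usage would be along the lines of `count_timeouts(open("/var/log/nginx/error.log"))`, where the log location depends on how the container is configured.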

January 25, 2024

Curator Info on Curation Forms

  • Saving curator info using cookies in dockerized forms. Can we deploy to prod?
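
A minimal sketch of saving and reading curator info via a cookie, using Python's standard `http.cookies` module. The cookie name `curator_id`, the 30-day lifetime, and the helper names are assumptions for illustration; the notes do not say how the dockerized forms implement this.

```python
from http.cookies import SimpleCookie

COOKIE_NAME = "curator_id"  # hypothetical cookie name


def build_set_cookie_value(curator_id, max_age_days=30):
    """Build the value of a Set-Cookie header that remembers the curator."""
    cookie = SimpleCookie()
    cookie[COOKIE_NAME] = curator_id
    cookie[COOKIE_NAME]["path"] = "/"
    cookie[COOKIE_NAME]["max-age"] = max_age_days * 24 * 3600
    return cookie[COOKIE_NAME].OutputString()


def read_curator(cookie_header):
    """Return the stored curator ID from a Cookie request header, or None."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    morsel = cookie.get(COOKIE_NAME)
    return morsel.value if morsel is not None else None
```

On a later request the form can call `read_curator` on the incoming `Cookie` header and preselect that curator, falling back to the normal login prompt when it returns `None`.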

ACKnowledge Author Request - WBPaper00066091

  • I am more than willing to assist; however, the task exceeds the capabilities of the normal flagging process.
  • The paper conducts an analysis of natural variations within 48 wild isolates. To enhance the reliability of the variant set, I utilized the latest variant calling methods along with a custom filtering approach. The resulting dataset comprises 1,957,683 unique variants identified using Clair3. Additionally, Sniffles2 was used to identify indels of >30 bp, which numbered in the thousands to tens of thousands for most wild isolates. It is worth noting that variants identified with Sniffles2 have less reliable nucleotide positions in the genome.
  • I am reaching out to inquire whether WormBase would be interested in incorporating this dataset. An argument in favor is the higher quality of my data. However, I am mindful of the potential substantial effort involved for WormBase, and it is unclear whether this aligns with your priorities.
  • Should WormBase decide to use my variant data set, I am more than willing to offer my assistance.

Update on NN Classification via the Alliance

  • Use of primary/not primary/not designated flag to filter papers
  • Secondary filter on papers with at least C. elegans as species
  • Finalize sources (i.e. evidence) for entity and topic tags on papers
  • Next NN classification scheduled for ~March
  • We decided to process all papers (even non-elegans species) and have filters on species after processing.
  • NNC html pages will show NNC values together with species.
  • Show all C. elegans papers first and other species in a separate bin.
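
The decision to classify every paper and filter on species afterwards can be sketched as a simple partition step applied to the classifier output. The record layout (dicts with `paper_id`, `species`, and `nnc_score` keys) is a hypothetical stand-in, not the actual Alliance schema.

```python
def bin_by_species(papers, primary_species="Caenorhabditis elegans"):
    """Split classified papers into a primary-species bin and an 'other' bin.

    Each paper is assumed to be a dict with a 'species' list; input order
    is preserved inside each bin.
    """
    primary, other = [], []
    for paper in papers:
        if primary_species in paper["species"]:
            primary.append(paper)
        else:
            other.append(paper)
    return primary, other


def render_rows(papers):
    """Format each paper as 'id<TAB>score<TAB>species' for a listing page."""
    return [
        f"{p['paper_id']}\t{p['nnc_score']}\t{', '.join(p['species'])}"
        for p in papers
    ]
```

Because the species filter runs after classification, a paper tagged with both C. elegans and another species lands in the first bin, matching the "at least C. elegans" rule in the notes.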

Travel Reimbursements

  • Still waiting on October travel reimbursement (Kimberly)
  • Still waiting on September and October travel reimbursements (Wen)

UniProt

  • Jae found some genes without UniProt IDs; the genes are present in UniProt but lack WBGene IDs.
  • Wen reached out to Stavros and Chris to investigate the WormBase and AGR angles.
  • Stavros escalated the issue at the Hinxton standup.
  • Mark checked the build scripts and WS291 results, then contacted UniProt and is working with them to figure this out.

January 18, 2024

  • The OA was highlighting different names when logging in; now fixed on staging


January 11, 2024

  • The duplicate function in the OA was not working with special characters. Valerio debugged it and it is now fixed.
    • Curators should make sure that, when pasting special characters, the duplicate function works
  • The OA highlights different names when logging in; Valerio will debug and check what IP address he sees
    • If you want to bookmark an OA url for your datatype and user, log on once, and bookmark that page (separately for prod and dev)
  • Chris tested the phenotype form on staging and production; the data are still going to tazendra
    • Chris will check with Paulo. Once it is resolved we need to take everything that is on tazendra and put it on the cloud with different PGIDs
    • Raymond: simply set up forwarding at our end?
  • AI working group: Valerio is setting up a new OpenAI account (paid membership for ChatGPT4). We can also use Microsoft Edge Copilot (temporary?)
  • Chris is getting ready to deploy a 7.0.0 public release on February 7th. Carol wanted to push out monthly releases. This release will include WS291; the next several releases will also use WS291 until WS292 is available.
  • Valerio would like to use an alliancegenome.org email address for the OpenAI account
  • New alliance drive: https://drive.google.com/drive/folders/0AFkMHZOEQxolUk9PVA
  • Alliance logo and 50-word description for TAGC: Wen will talk to the outreach WG
  • Name server. Manuel working on this, Daniela and Karen will reach out to him and let him know that down the road micropublication would like to use the name server API to generate IDs in bulk
  • Karen asking about some erroneous IDs used in the name server. Stavros says that this is not a big deal because the "reason" is not populating the name server
  • It would be good to be able to have a form to capture additional fields for strains and alleles (see meeting minutes August 31st 2023. https://wiki.wormbase.org/index.php/WormBase-Caltech_Weekly_Calls_2023#August_31st.2C_2023). This may happen after Manuel is done with the authentication.
  • Michael: primary flag with Alliance. Kimberly talked about this with the blue team. They will start bringing that over for all papers and fix the remaining 271 items later.

January 4, 2024

  • ACKnowledge pipeline help desk question:
    • Help Desk: Question about Author Curation to Knowledgebase (Zeng Wanxin) [Thu 12/14/2023 5:48 AM]
  • Citace upload, current deadline: Tuesday January 9th
    • All processes (dumps, etc.) will happen on the cloud machine
    • Curators need to deposit their files in the appropriate locations for Wen
  • Micropublication pipeline
    • Ticketing system confusion
    • Karen and Kimberly paper ID pipeline; may need sorting out of logistics