Difference between revisions of "WormBase-Caltech Weekly Calls"

From WormBaseWiki
 
Latest revision as of 16:04, 18 April 2024

Previous Years

2009 Meetings

2011 Meetings

2012 Meetings

2013 Meetings

2014 Meetings

2015 Meetings

2016 Meetings

2017 Meetings

2018 Meetings

2019 Meetings

2020 Meetings

2021 Meetings

2022 Meetings

2023 Meetings

April 18th, 2024

  • NNC pipeline being switched off locally and moving into the Alliance ABC.

April 11th, 2024

  • Caltech WS293 .ace files are ready for upload

April 4th, 2024

  • Continued discussion on sustainability
  • CZI single-cell RNA-seq for the Alliance: anything happening will be a few months down the road
    • Data is still going to SPELL and enrichment analysis
    • Peter Roy asked about taking the expression profile of a condition and finding similar expression profiles (a SPELL-like analysis), but SPELL cannot currently handle scRNA-seq data. Wen says it is possible (treating each cell group as an experiment). Can try loading the data into SPELL. Does it improve the function of SPELL? Only 5-10 datasets. These data are a bit different from bulk RNA-seq.
  • Textpresso: would it be good to have a presentation for other MODs showing Textpresso capabilities? Yes. Maybe during a sprint review
  • Michael's presentation on LLMs - Named Entity Recognition (NER)
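Wen's suggestion of treating each cell group as an experiment can be sketched as a pseudo-bulk aggregation: single-cell counts are summed per cell group, giving one expression profile per group, and a query profile is then ranked against those groups by correlation. This is a minimal sketch under stated assumptions: the nested-dict layout, the gene and group names and the use of Pearson correlation are all hypothetical choices for illustration, not SPELL's actual input format or scoring method.

```python
import math
from collections import defaultdict


def pseudobulk(counts, cell_groups):
    """Sum per-cell counts into one expression profile per cell group.

    counts: {cell_id: {gene_id: count}}   (hypothetical layout)
    cell_groups: {cell_id: group_label}   (e.g. cluster or cell type)
    """
    profiles = defaultdict(lambda: defaultdict(int))
    for cell, gene_counts in counts.items():
        group = cell_groups[cell]
        for gene, n in gene_counts.items():
            profiles[group][gene] += n
    return {group: dict(genes) for group, genes in profiles.items()}


def pearson(query, profile, genes):
    """Pearson correlation of two profiles over a shared gene list (missing genes = 0)."""
    x = [query.get(g, 0) for g in genes]
    y = [profile.get(g, 0) for g in genes]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def rank_groups(query, profiles):
    """Rank cell groups by similarity to a query profile, most similar first."""
    genes = sorted(set(query).union(*profiles.values()))
    scores = {group: pearson(query, p, genes) for group, p in profiles.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: four cells in two groups, three genes
counts = {
    "cell1": {"geneA": 5, "geneC": 1},
    "cell2": {"geneA": 3, "geneB": 1},
    "cell3": {"geneB": 6, "geneC": 2},
    "cell4": {"geneB": 4, "geneC": 3},
}
groups = {"cell1": "neurons", "cell2": "neurons", "cell3": "muscle", "cell4": "muscle"}
profiles = pseudobulk(counts, groups)
query = {"geneA": 10, "geneB": 2, "geneC": 1}
print(rank_groups(query, profiles))
```

Raw summed counts are used here for brevity; real pseudo-bulk profiles would normally be normalized (for example, to counts per million) before any comparison.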

March 14, 2024

TAGC debrief

February 22, 2024

NER with LLMs

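  • Wrote scripts and configured an LLM for Named Entity Recognition (NER). Trained it on gene names and diseases. Works well so far (F1 ~98%, accuracy ~99.9%).
  • Is this similar to the FlyBase system? Recording of presentation: https://drive.google.com/drive/folders/1S4kZidL7gvBH6SjF4IQujyReVVRf2cOK
  • Textpresso server is kaput. Services need to be transferred onto Alliance servers.
  • There are features on Textpresso, such as links to PDFs, that are desirable to curators but should be blocked from public access.
  • Alliance curation status form development needs use cases. Ref: https://wiki.wormbase.org/index.php/WormBase-Caltech_Weekly_Calls#February_15.2C_2024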
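A gap between accuracy and F1 like the one reported is common in NER: most tokens sit outside any entity, so per-token accuracy is dominated by easy "O" tokens, while entity-level F1 only counts whole predicted and gold entities. This is a minimal sketch, assuming BIO-tagged tokens, accuracy measured per token and exact-match entity scoring; the notes do not say how the metrics were actually computed, and the tag sequence below is hypothetical.

```python
def extract_entities(tags):
    """Return the set of (start, end, type) entity spans in a BIO tag sequence."""
    entities = set()
    start, etype = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" closes a final entity
        if tag.startswith("I-") and start is not None and tag[2:] == etype:
            continue  # still inside the current entity
        if start is not None:
            entities.add((start, i, etype))
        # "B-x" (or a stray "I-x") opens a new entity; "O" closes everything
        start, etype = (None, None) if tag == "O" else (i, tag[2:])
    return entities


def token_accuracy(gold, pred):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def entity_f1(gold, pred):
    """F1 over entities; a prediction counts only if span and type match exactly."""
    g, p = extract_entities(gold), extract_entities(pred)
    tp = len(g & p)
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


# Hypothetical 20-token sentence with one gene and one two-token disease mention
gold = ["O"] * 20
gold[2], gold[10], gold[11] = "B-GENE", "B-DISEASE", "I-DISEASE"
pred = list(gold)
pred[11] = "O"  # the model truncates the disease mention by one token

print(token_accuracy(gold, pred), entity_f1(gold, pred))
```

In this toy case a single truncated mention costs only one token of accuracy (0.95) but halves entity-level F1 (0.5); on real text, with many more "O" tokens per entity, the same effect can keep accuracy near 100% while F1 is noticeably lower.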


February 15, 2024

Literature Migration to the Alliance ABC

Use Cases for Searches and Validation in the ABC (or, what are your common actions in the curation status form)?

Find papers with a high confidence NN classification for a given topic that have also been flagged positive by an author in a community curation pipeline and that haven’t been curated yet for that topic
  • Facet for topic
  • Facet for automatic assertion
    • neural network method
  • Facet for confidence level
    • High
  • Facet for manual assertion
    • author assertion
      • ACKnowledge method
    • professional biocurator assertion
      • curation tools method - NULL
Manually validate paper - topic flags without curating
  • Facet for topic
  • Facet for manual assertion
    • professional biocurator assertion
      • ABC - no data
View all topic and entity flags for a given paper and validate, if needed
  • Search ABC with paper identifier
  • Migrate to Topic and Entity Editor
  • View all associated data
  • Manually validate flags, if needed
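The first use case combines its facets with AND logic, where the "NULL" value for the curation tools method means no biocurator assertion should exist yet; the third use case is a lookup by paper identifier. This is a minimal sketch over a hypothetical in-memory list of topic-tag records: the field names (`paper`, `topic`, `source`, `method`, `confidence`), the topic and the paper IDs are illustrative assumptions, not the ABC schema or its search API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TopicTag:
    # Hypothetical record layout, not the ABC schema
    paper: str
    topic: str
    source: str                       # "automatic", "author" or "professional biocurator"
    method: str                       # e.g. "neural network", "ACKnowledge", "curation tools"
    confidence: Optional[str] = None  # only set for automatic assertions


def papers_to_curate(tags, topic):
    """Use case 1: high-confidence NN flag AND author flag AND not yet curated."""
    by_paper = {}
    for t in tags:
        if t.topic == topic:  # facet: topic
            by_paper.setdefault(t.paper, []).append(t)
    hits = []
    for paper, ts in by_paper.items():
        nn_high = any(t.source == "automatic" and t.method == "neural network"
                      and t.confidence == "High" for t in ts)
        author = any(t.source == "author" and t.method == "ACKnowledge" for t in ts)
        # "curation tools method - NULL": no biocurator assertion may exist yet
        curated = any(t.source == "professional biocurator"
                      and t.method == "curation tools" for t in ts)
        if nn_high and author and not curated:
            hits.append(paper)
    return sorted(hits)


def tags_for_paper(tags, paper):
    """Use case 3: every topic/entity flag attached to one paper identifier."""
    return [t for t in tags if t.paper == paper]


# Hypothetical records for illustration only
tags = [
    TopicTag("paper1", "expression", "automatic", "neural network", "High"),
    TopicTag("paper1", "expression", "author", "ACKnowledge"),
    TopicTag("paper2", "expression", "automatic", "neural network", "High"),
    TopicTag("paper2", "expression", "author", "ACKnowledge"),
    TopicTag("paper2", "expression", "professional biocurator", "curation tools"),
    TopicTag("paper3", "expression", "automatic", "neural network", "Low"),
    TopicTag("paper3", "expression", "author", "ACKnowledge"),
]
print(papers_to_curate(tags, "expression"))
```

Only paper1 qualifies: paper2 has already been curated and paper3's NN classification is low confidence.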

PDF Storage

  • At the Alliance, PDFs will be stored in Amazon S3
  • We are not planning to formally store back-up copies elsewhere
  • Is this okay with everyone?

February 8, 2024

  • TAGC
    • Prominent announcement on the Alliance home page?
  • Fixed login on dockerized system (dev). Can everybody test their forms?

February 1, 2024

  • Paul will ask Natalia to take care of pending reimbursements
  • Dockerized system has slow pages (OA and FPKMMine). Will monitor these pages and look for timeouts in the nginx logs.
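Looking for timeouts in the nginx logs can be partly automated by counting "upstream timed out" errors per request path, which shows which pages hit the proxy timeout most often. This is a minimal sketch, assuming nginx's usual error-log wording and its `request: "METHOD URI HTTP/x"` field; the sample lines are abbreviated and the paths are hypothetical, not taken from the dockerized system.

```python
import re
from collections import Counter

# Pulls the method and URI out of the request field of an nginx error-log entry
REQUEST_RE = re.compile(r'request: "(?P<method>[A-Z]+) (?P<uri>\S+)')


def count_timeouts(lines):
    """Count 'upstream timed out' errors per request path (query string dropped)."""
    counts = Counter()
    for line in lines:
        if "upstream timed out" not in line:
            continue
        m = REQUEST_RE.search(line)
        path = m.group("uri").split("?", 1)[0] if m else "<unknown>"
        counts[path] += 1
    return counts


# Hypothetical, abbreviated log lines; in practice iterate over the open error log
sample = [
    '2024/02/01 10:15:02 [error] 31#31: *812 upstream timed out (110: Connection timed out) '
    'while reading response header from upstream, request: "GET /oa/search?q=abc HTTP/1.1"',
    '2024/02/01 10:20:40 [error] 31#31: *907 upstream timed out (110: Connection timed out) '
    'while reading response header from upstream, request: "GET /oa/search HTTP/1.1"',
    '2024/02/01 11:02:13 [error] 31#31: *955 upstream timed out (110: Connection timed out) '
    'while reading response header from upstream, request: "POST /fpkmmine/query HTTP/1.1"',
    '2024/02/01 11:05:00 [error] 31#31: *960 open() "/usr/share/nginx/html/favicon.ico" failed '
    '(2: No such file or directory), request: "GET /favicon.ico HTTP/1.1"',
]
for path, n in count_timeouts(sample).most_common():
    print(n, path)
```

Against a real log, the same function can take the file handle directly, e.g. `with open(log_path) as fh: count_timeouts(fh)`, where `log_path` depends on how the containers mount their logs.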

January 25, 2024

Curator Info on Curation Forms

  • Saving curator info using cookies in dockerized forms. Can we deploy to prod?
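Remembering the curator in a cookie amounts to sending a `Set-Cookie` header once and reading the `Cookie` header on later requests. This is a minimal sketch in Python's standard `http.cookies` module for illustration only; the actual forms may be written in another language, and the cookie name, lifetime and curator ID are hypothetical.

```python
from http.cookies import SimpleCookie

COOKIE_NAME = "curator_id"  # hypothetical cookie name


def set_curator_cookie(curator_id, max_age_days=30):
    """Build a Set-Cookie header value that remembers the curator."""
    cookie = SimpleCookie()
    cookie[COOKIE_NAME] = curator_id
    morsel = cookie[COOKIE_NAME]
    morsel["path"] = "/"
    morsel["max-age"] = max_age_days * 24 * 3600
    morsel["secure"] = True     # only sent over HTTPS
    morsel["httponly"] = True   # not readable from page JavaScript
    morsel["samesite"] = "Lax"
    return morsel.OutputString()


def get_curator(cookie_header):
    """Read the curator ID back from an incoming Cookie header, if present."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    morsel = cookie.get(COOKIE_NAME)
    return morsel.value if morsel else None


header = set_curator_cookie("WBPerson0000")  # hypothetical curator ID
print("Set-Cookie:", header)
print(get_curator("curator_id=WBPerson0000; other=1"))
```

If a form prefills the curator field from JavaScript rather than on the server, the `httponly` flag would have to be dropped; `secure` also means the cookie is not sent on plain-HTTP dev instances.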

ACKnowledge Author Request - WBPaper00066091

  • I am more than willing to assist; however, the task exceeds the capabilities of the normal flagging process.
  • The paper conducts an analysis of natural variations within 48 wild isolates. To enhance the reliability of the variant set, I utilized the latest variant calling methods along with a custom filtering approach. The resulting dataset comprises 1,957,683 unique variants identified using Clair3. Additionally, Sniffles2 was used to identify indels of >30 bp, which numbered in the thousands to tens of thousands for most wild isolates. It is worth noting that variants identified with Sniffles2 have less reliable nucleotide positions in the genome.
  • I am reaching out to inquire whether WormBase would be interested in incorporating this dataset. An argument in favor is the higher quality of my data. However, I am mindful of the potential substantial effort involved for WormBase, and it is unclear whether this aligns with your priorities.
  • Should WormBase decide to use my variant data set, I am more than willing to offer my assistance.

Update on NN Classification via the Alliance

  • Use of primary/not primary/not designated flag to filter papers
  • Secondary filter on papers that have at least C. elegans as a species
  • Finalize sources (i.e. evidence) for entity and topic tags on papers
  • Next NN classification scheduled for ~March
  • We decided to process all papers (even non-elegans species) and have filters on species after processing.
  • NNC html pages will show NNC values together with species.
  • Show all C. elegans papers first and other species in a separate bin.
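The agreed order of operations (classify every paper, then filter on the primary flag and on species afterwards, listing C. elegans papers first and other species in a separate bin) can be sketched as follows. This is a minimal sketch: the record fields, flag strings, species labels and paper IDs are hypothetical, and keeping "not designated" papers is an assumption the notes do not settle.

```python
from dataclasses import dataclass, field
from operator import attrgetter

ELEGANS = "Caenorhabditis elegans"


@dataclass
class NNCResult:
    # Hypothetical record for one paper's NN classification on one topic
    paper: str
    score: float
    primary: str                      # "primary", "not primary" or "not designated"
    species: set = field(default_factory=set)


def bin_results(results, keep=("primary", "not designated")):
    """Filter on the primary flag after classification, then split by species.

    Keeping "not designated" papers by default is an assumption.
    """
    kept = [r for r in results if r.primary in keep]
    elegans = [r for r in kept if ELEGANS in r.species]
    others = [r for r in kept if ELEGANS not in r.species]
    by_score = attrgetter("score")
    # Highest classifier values first within each bin
    return (sorted(elegans, key=by_score, reverse=True),
            sorted(others, key=by_score, reverse=True))


results = [
    NNCResult("paper1", 0.91, "primary", {ELEGANS}),
    NNCResult("paper2", 0.97, "primary", {"Caenorhabditis briggsae"}),
    NNCResult("paper3", 0.88, "not primary", {ELEGANS}),
    NNCResult("paper4", 0.95, "not designated", {ELEGANS, "Pristionchus pacificus"}),
]
elegans_bin, other_bin = bin_results(results)
print([r.paper for r in elegans_bin], [r.paper for r in other_bin])
```

A paper that mentions C. elegans alongside other species (paper4) lands in the C. elegans bin, matching the "at least C. elegans" filter.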

Travel Reimbursements

  • Still waiting on October travel reimbursement (Kimberly)
  • Still waiting on September and October travel reimbursements (Wen)

UniProt

  • Jae found some genes without UniProt IDs, although the genes are present in UniProt (but without WBGene IDs).
  • Wen reached out to Stavros and Chris to investigate the WormBase and AGR angles.
  • Stavros escalated the issue at the Hinxton stand-up.
  • Mark checked the build scripts and WS291 results, then contacted UniProt and is working with them to figure this out.
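The mismatch Jae found can be surfaced by checking cross-references in both directions: WormBase genes with no UniProt ID, paired with UniProt entries that carry no WBGene ID. This is a minimal sketch, assuming two mappings already extracted from a WormBase build and a UniProt download; the data layout, the gene-name join key and all identifiers are hypothetical, and it does not reflect how the build scripts actually assign UniProt IDs.

```python
def candidate_missing_links(wb_genes, uniprot_entries):
    """Pair WormBase genes lacking UniProt IDs with UniProt entries lacking WBGene IDs.

    wb_genes: {wbgene_id: {"name": str, "uniprot": set of accessions}}
    uniprot_entries: {accession: {"gene_name": str, "wbgene": str or None}}
    Matching on gene name is an assumption; a real check might use sequence names.
    """
    unlinked = {}
    for acc, entry in uniprot_entries.items():
        if not entry.get("wbgene"):
            unlinked.setdefault(entry["gene_name"], []).append(acc)
    pairs = []
    for wbgene, gene in wb_genes.items():
        if gene["uniprot"]:
            continue  # already cross-referenced in WormBase
        pairs.extend((wbgene, acc) for acc in unlinked.get(gene["name"], []))
    return sorted(pairs)


# Hypothetical identifiers, for illustration only
wb_genes = {
    "WBGeneX1": {"name": "gene-x1", "uniprot": set()},
    "WBGeneX2": {"name": "gene-x2", "uniprot": {"UP_X2"}},
    "WBGeneX3": {"name": "gene-x3", "uniprot": set()},
}
uniprot_entries = {
    "UP_X1": {"gene_name": "gene-x1", "wbgene": None},
    "UP_X2": {"gene_name": "gene-x2", "wbgene": "WBGeneX2"},
}
print(candidate_missing_links(wb_genes, uniprot_entries))
```

Pairs found this way are only candidates for review; a gene such as WBGeneX3 with no matching UniProt entry at all is a separate case.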

January 18, 2024

  • OA showing different names highlighted when logging in to the OA; now fixed on staging


January 11, 2024

  • Duplicate function in OA was not working when using special characters. Valerio debugged it and it is now fixed.
    • Curators should make sure that the duplicate function works when pasting special characters
  • OA showing different names highlighted when logging in to the OA; Valerio will debug and check what IP address he sees
    • If you want to bookmark an OA URL for your datatype and user, log on once and bookmark that page (separately for prod and dev)
  • Chris tested the phenotype form on staging and production, and the data are still going to tazendra
    • Chris will check with Paulo. Once it is resolved, we need to take everything that is on tazendra and put it on the cloud with different PGIDs
    • Raymond: simply set up forwarding at our end?
  • AI working group: Valerio is setting up a new OpenAI account (paid ChatGPT membership for GPT-4). We can also use Microsoft Edge Copilot (temporary?)
  • Chris is getting ready to deploy the 7.0.0 public release on February 7th. Carol wanted to push out monthly releases. This release will include WS291; the next several releases will also be based on WS291 until WS292 is available.
  • Valerio would like to use an alliancegenome.org email address for the OpenAI account
  • New Alliance drive: https://drive.google.com/drive/folders/0AFkMHZOEQxolUk9PVA
    • Note: please move shared files that you own to the new Alliance Google Drive. Chris Mungall sent instructions; see the video and SOP here: https://agr-jira.atlassian.net/browse/SCRUM-925?focusedCommentId=40674
  • Alliance logo and 50-word description for TAGC: Wen will talk to the outreach WG
  • Name server: Manuel is working on this. Daniela and Karen will reach out to him and let him know that, down the road, micropublication would like to use the name server API to generate IDs in bulk
  • Karen asked about some erroneous IDs used in the name server. Stavros says that this is not a big deal because the "reason" is not populating the name server
  • It would be good to have a form to capture additional fields for strains and alleles (see meeting minutes of August 31st, 2023: https://wiki.wormbase.org/index.php/WormBase-Caltech_Weekly_Calls_2023#August_31st.2C_2023). This may happen after Manuel is done with the authentication.
  • Michael: primary flag with the Alliance. Kimberly talked about this with the blue team. They will start by bringing that over for all papers and fix the remaining 271 items later.

January 4, 2024

  • ACKnowledge pipeline help desk question:
    • Help Desk: Question about Author Curation to Knowledgebase (Zeng Wanxin) [Thu 12/14/2023 5:48 AM]
  • Citace upload, current deadline: Tuesday January 9th
    • All processes (dumps, etc.) will happen on the cloud machine
    • Curators need to deposit their files in the appropriate locations for Wen
  • Micropublication pipeline
    • Ticketing system confusion
    • Karen and Kimberly paper ID pipeline; may need sorting out of logistics