Difference between revisions of "WormBase-Caltech Weekly Calls"

From WormBaseWiki
[[WormBase-Caltech_Weekly_Calls_2009|2009 Meetings]]
 
  
==2011 Meetings==
 
 
[[WormBase-Caltech_Weekly_Calls_February_2011|February]]
 
 
 
[[WormBase-Caltech_Weekly_Calls_March_2011|March]]
 
 
 
  
  
==April 7, 2011==
  
  
Transgene Model
*On Wiki
 
*Sent out to people
 
*Have a look; report any concerns
 
*Can follow on BitBucket; search for transgene; link to Wiki
 
*No objections at Caltech; Karen will send to Paul Davis
 
*Changes to ACE dumping script; Karen will talk to Juancarlos
 
*Changes needed in OA (softer deadline than dump)
 
  
  
Interactions
*Murky genetic interaction curation?
 
*Err on the side of generality/trusting author statements
 
*When in doubt, curate as "genetic interaction"
 
*Chris is working on decision tree/pipeline for curation
 
*Kimberly working on Physical Interaction model
 
  
  
BioGRID meeting at Princeton in May
*Call in
 
*What will Rose propose?
 
  
  
Expression Pattern Curation (Daniela/Wen)
*Daniela sent out picture page for review
 
*Expr Pattern OA wiki is in place:
 
**http://wiki.wormbase.org/index.php/Expression_Pattern
 
*As soon as Juancarlos is done with the modularization, he will start working on the code.
 
*In the meanwhile Daniela will curate expression pattern writing .ace files
 
*Expr_pattern OA should be ready by the next upload (May 26th). (I really doubt this: parsing in data, writing dumpers, and checking it all take a long time. Picture and Interaction each probably took longer than 2 months, and we're not starting Expr until May at the earliest -- Juancarlos)
 
  
  
Patch file/Interbuild (Raymond)
 
*Developed good patch file
 
*Tested patch file to update WS224 to WS225 - seems OK
 
*Less than 5 minutes for upload
 
*Testing now should be done by Todd/OICR team
 
  
  
Uma started
*Working on concise descriptions of gene classes
 
*Karen has reviewed with Uma; Uma is reading papers
 
*Discussing details of descriptions
 
*Inconsistencies/discrepancies of gene class names
 
*>2400 gene classes
 
*Can work on generating formula for this curation
 
*Arun can help with automation
 
*May need to get Uma an interface to enter data into postgres
 
*Adapt concise description CGI for her? (probably write a whole new interface depending on goal -- Juancarlos)
 
*Gene class name and a text field
 
*Using Textpresso/WormMart output; sentence saver?
 
  
  
eggNOG data into citace?
*Who's going to handle the data? curate?
 
*Michael? OK
 
  
  
 +
* Is this similar to the FlyBase system? Recording of presentation  https://drive.google.com/drive/folders/1S4kZidL7gvBH6SjF4IQujyReVVRf2cOK
  
==April 14, 2011==
  
Gene Class Descriptions
*Concerns about maintenance and redundancy
 
*Uma here for ~ 3 months
 
*How many gene classes have alleles?
 
*How many are named by phenotype rather than just molecular data?
 
*How is this different from gene concise descriptions?
 
*Should it be a summary of all gene concise descriptions of the class?
 
*Things currently focused on:
 
**using WormMart to look at genes in a class
 
**pulls out all concise descriptions
 
**look at similarities
 
**interesting things to highlight
 
*Gene concise descriptions vs class descriptions
 
**Gene-centric vs Class-centric
 
**Consolidating/pooling all concise descriptions from individual genes?
 
*Going for maintenance-free statements
 
*Potentially building an interface
 
*Richard Durbin: development vs behavior?
 
*Prioritization?
 
*Focus on phenotype-based classes like UNC?
 
*Factors for prioritization:
 
**Numbers of genes curated
 
**molecular vs phenotype-based
 
**Amount of info currently available?
 
**Historical points
 
**Most actively worked currently? (most mentioned in last year's publications?)
 
*Uma and Karen could communicate with Kimberly and Ranjana about
 
*What is most efficient for Uma to focus on?
 
*Uma can check whether the gene class descriptions make sense
 
*Skip gene classes for which only one gene exists
 
*GO term stats on each class?
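The rule to skip single-gene classes and the gene-count prioritization factor above could be prototyped from a plain list of gene names, such as a WormMart export. The following is a minimal sketch. It assumes standard C. elegans names of the form class-number (unc-22). `gene_class` and `classes_to_describe` are hypothetical helper names, not existing WormBase code.

```python
import re
from collections import Counter
from typing import List, Optional, Tuple

# Assumption: gene names follow the C. elegans "class-number" convention
# (unc-22, dpy-5); sequence names such as Y47H9C.1 have no class.
GENE_NAME = re.compile(r"^([a-z]{3,4})-\d+$")


def gene_class(name: str) -> Optional[str]:
    """Return the gene class for a standard gene name, or None."""
    match = GENE_NAME.match(name)
    return match.group(1) if match else None


def classes_to_describe(gene_names: List[str]) -> List[Tuple[str, int]]:
    """Count genes per class, skip single-gene classes, largest class first."""
    counts = Counter(c for c in map(gene_class, gene_names) if c is not None)
    multi_gene = [(cls, n) for cls, n in counts.items() if n > 1]
    return sorted(multi_gene, key=lambda item: (-item[1], item[0]))


print(classes_to_describe(["unc-22", "unc-54", "unc-119", "dpy-5", "lin-12", "lin-14"]))
# [('unc', 3), ('lin', 2)]
```

This sketch does not model the other factors in the list, such as molecular vs phenotype-based naming or recent publication counts. Those factors would need additional data sources.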
 
  
 +
* Alliance curation status form development needs use cases. ref https://wiki.wormbase.org/index.php/WormBase-Caltech_Weekly_Calls#February_15.2C_2024
  
Papers missing from Textpresso
 
*Issue: Genetics papers for GSA markup are missing from SVM analysis
 
*Juancarlos' file on caprica
 
*Discrepancy between papers on Textpresso and those gone through SVM
 
*SVM doesn't pick up GSA papers
 
*Generate a filter to detect which papers have been missed by SVM
 
*Michael looking into reasons why the pipeline isn't working
 
*Tazendra vs Textpresso discrepancies?
 
*Ruihua will process 56 missing papers retroactively
 
*Still working on how to avoid this in the future
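The filtering step above amounts to a set difference between the papers Textpresso holds and the papers the SVM pipeline actually scored. The following is a minimal sketch. It assumes both lists can be exported as WBPaper IDs and that journal names are available for the optional Genetics filter. The function name and its inputs are hypothetical.

```python
from typing import Iterable, List, Mapping, Optional


def papers_missed_by_svm(
    textpresso_ids: Iterable[str],
    svm_ids: Iterable[str],
    journals: Optional[Mapping[str, str]] = None,
    journal: Optional[str] = None,
) -> List[str]:
    """Return paper IDs present in Textpresso but never processed by the SVM.

    If both `journals` (paper ID -> journal name) and `journal` are given,
    only missed papers from that journal are returned.
    """
    missed = set(textpresso_ids) - set(svm_ids)
    if journals is not None and journal is not None:
        missed = {pid for pid in missed if journals.get(pid) == journal}
    return sorted(missed)


# Placeholder paper IDs, for illustration only.
missed = papers_missed_by_svm(
    ["WBPaper00000001", "WBPaper00000002", "WBPaper00000003"],
    ["WBPaper00000001"],
    journals={"WBPaper00000002": "Genetics", "WBPaper00000003": "Development"},
    journal="Genetics",
)
print(missed)  # ['WBPaper00000002']
```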
 
  
  
  
==April 21, 2011==
  
WormMart
*Stable for IWM?
*WormMart presentation for WormBase workshop
**Query examples
**Data content
**Features
 
**Plans for the next year; what data to be made available
 
**Discuss stability?
 
  
  
Igor's machine: elbrus
*Can have problems
*Input info has changed for WS225
*RNAi script stopped working
 
*Migration issues
 
**Decide on priorities
 
**Who will maintain what?
 
**Migrate things over to newer machines, locally
 
*On elbrus now:
 
**RNAi scripts (sequence mapping); Migrating to EBI
 
**Anatomy ontology browser (optional)
 
**Microarray tool (Wen will take over)
 
**Get-Sequence CGI (how many people use it?)
 
*Meet with Norie at IWM?
 
*We should write everything down first
 
  
  
BioGRID-WormBase data exchange:
*Questions about how to best do this
*How to handle/organize genetic interactions
*BioGRID meeting in May; will not be sufficient to solve the genetic interaction issues
 
*Still working on genetic interaction organization scheme
 
  
  
SPELL/Microarray
*Discuss at IWM?
*Patching changes into WS225
 
  
  
ModENCODE Meeting
*Gary Williams should go
 
  
  
Picture Page
*New developments
 
*Elsevier publishing; how to handle them? ($17.5 per figure!!!)
 
  
  
Informatics Resources Assessment
*Can we develop plan to determine informatics resources available
*Long-term plans for sustainable informatics resources
  
==April 28, 2011==
  
Cecilia - Person report
 
*"AKA" ("Also Known As") added manually
 
**"also publishes as" created automatically based on verified author-person connections
 
**Over-populating AKAs manually is redundant

**Populating unique or new AKAs manually is still necessary
 
*Verified names happen automatically
 
**Author-Person connections happen with a script that Cecilia runs weekly manually
 
**Creating connections if an author name matches to exactly 1 person's name / aka (if 0 or 2+ matches then no connection occurs)
 
**Connection verifications happen weekly with a script that Cecilia also runs manually based on other verified authors in the paper sharing a Lineage or Laboratory
 
**Manual verification requests are emailed monthly to the Persons concerned, who then verify them.
 
*Connection to person in GSA markup pipeline
 
**Automatic if script unambiguously identifies one individual person
 
**Karen (/ Daniela ?) will keep manual touch with Cecilia to create people for GSA because people links are necessary for the URLs.
 
*Extracting person info from lab web pages, papers, worm meeting registration?
 
**This takes varying amounts of time for questionable benefit
 
**Consensus at the meeting was that lab website + PI website + papers + meetings are good sources.
 
*WormBase policy? Should unverified person info be included?
 
**Raymond had question about whether it's a good idea in general to create people without explicit contact with the person via email.
 
**Juancarlos agrees that it's better data if they explicitly contact, but it's not necessarily better in a practical sense and anyone can spoof an email.
 
*Address? Remove details irrelevant to mailing address? Institute name?
 
**Leave it as is
 
*Prioritize name and e-mail verification for new paper/person connections?
 
**Juancarlos + Cecilia + Raymond will talk to Paul about priorities
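The weekly author-person connection rule above connects an author only when the author name matches exactly one person's name or AKA. It can be sketched as follows. This is a minimal illustration with hypothetical data structures, not Cecilia's script. The sketch does not include the lineage/laboratory verification step.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def normalize(name: str) -> str:
    """Case-fold and collapse whitespace so trivial variants compare equal."""
    return " ".join(name.lower().split())


def build_name_index(persons: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each normalized name or AKA to the set of person IDs that carry it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for person_id, names in persons.items():
        for name in names:
            index[normalize(name)].add(person_id)
    return index


def connect_authors(authors: Iterable[str], index: Dict[str, Set[str]]) -> Dict[str, str]:
    """Connect an author to a person only on exactly one match (0 or 2+ matches: skip)."""
    connections = {}
    for author in authors:
        matches = index.get(normalize(author), set())
        if len(matches) == 1:
            connections[author] = next(iter(matches))
    return connections


# Hypothetical person records: person ID -> names and AKAs.
persons = {
    "WBPerson1": ["Jane Doe", "J Doe"],
    "WBPerson2": ["John Smith"],
    "WBPerson3": ["John  smith"],  # normalizes to the same name as WBPerson2
}
print(connect_authors(["Jane Doe", "John Smith", "A Nobody"], build_name_index(persons)))
# {'Jane Doe': 'WBPerson1'}
```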
 
  
 +
** note: please move shared files that you own to new Alliance Google Drive.  Here is the link to the information that Chris Mungall sent:  For more instructions see the video and SOP here:https://agr-jira.atlassian.net/browse/SCRUM-925?focusedCommentId=40674
  
Karen - Abby, Aldrin Montana - Google Summer of Code
*Aldrin didn't get accepted
*asks if he could work with us anyway?
*11 weeks during summer, full time
*would work remotely
*any ideas? modENCODE, microarray, pre-canned queries, Reactome (assigning confidence values to pathways)
*we can collect more ideas
*has coding experience (js, Perl, CSS), just not much with bioinformatics, he wants to learn

Revision as of 18:18, 14 March 2024

Previous Years

2009 Meetings

2011 Meetings

2012 Meetings

2013 Meetings

2014 Meetings

2015 Meetings

2016 Meetings

2017 Meetings

2018 Meetings

2019 Meetings

2020 Meetings

2021 Meetings

2022 Meetings

2023 Meetings


March 14, 2024

TAGC debrief

February 22, 2024

NER with LLMs

  • Wrote scripts and configured an LLM for Named Entity Recognition. Trained an LLM on gene names and diseases. Works well so far (F1 ~ 98%, Accuracy ~ 99.9%)
  • Textpresso server is kaput. Services need to be transferred onto Alliance servers.
  • There are features on Textpresso, such as link to PDF, that are desirable to curators but should be blocked from public access.
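Token accuracy and entity-level F1 are computed differently, which may explain why the accuracy quoted above (~99.9%) is higher than F1 (~98%). Most tokens sit outside any entity and count as correct "O" predictions. The following is a minimal sketch that assumes BIO-tagged output and exact-span matching. It is an illustration, not the evaluation code behind the reported numbers.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start token, end token exclusive, entity type)


def bio_spans(tags: List[str]) -> Set[Span]:
    """Extract entity spans from a BIO tag sequence (e.g. B-GENE, I-GENE, O)."""
    spans, start, kind = set(), None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final entity
        continues = tag.startswith("I-") and tag[2:] == kind
        if not continues:
            if start is not None:
                spans.add((start, i, kind))
            start, kind = (i, tag[2:]) if tag != "O" else (None, None)
    return spans


def token_accuracy(gold: List[str], pred: List[str]) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag ("O" included)."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def entity_f1(gold: List[str], pred: List[str]) -> float:
    """Entity-level F1 with exact span and type matching."""
    gold_spans, pred_spans = bio_spans(gold), bio_spans(pred)
    true_pos = len(gold_spans & pred_spans)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_spans)
    recall = true_pos / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


gold = ["B-GENE"] + ["O"] * 7 + ["B-DISEASE", "I-DISEASE"]
pred = ["B-GENE"] + ["O"] * 7 + ["B-DISEASE", "O"]
print(token_accuracy(gold, pred), entity_f1(gold, pred))  # 0.9 0.5
```

In this example, one truncated disease span costs only 1 of 10 tokens but half of the entities, so the two metrics diverge.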


February 15, 2024

Literature Migration to the Alliance ABC

Use Cases for Searches and Validation in the ABC (or, what are your common actions in the curation status form)?

Find papers with a high confidence NN classification for a given topic that have also been flagged positive by an author in a community curation pipeline and that haven’t been curated yet for that topic
  • Facet for topic
  • Facet for automatic assertion
    • neural network method
  • Facet for confidence level
    • High
  • Facet for manual assertion
    • author assertion
      • ACKnowledge method
    • professional biocurator assertion
      • curation tools method - NULL
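The first use case combines four facets into one query. The following is a minimal sketch of that filter over in-memory records. It assumes hypothetical field names (`assertion`, `method`, `confidence`) and does not use the real ABC data model or API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TopicTag:
    topic: str
    assertion: str        # hypothetical labels: "automatic", "author", "curator"
    method: str           # e.g. "neural network", "ACKnowledge", "curation tools"
    confidence: str = ""  # e.g. "High" for neural network classifications


@dataclass
class Paper:
    paper_id: str
    tags: List[TopicTag] = field(default_factory=list)


def needs_curation(paper: Paper, topic: str) -> bool:
    """High-confidence NN flag + ACKnowledge author flag + no curation-tools assertion."""
    tags = [t for t in paper.tags if t.topic == topic]
    nn_high = any(t.assertion == "automatic" and t.method == "neural network"
                  and t.confidence == "High" for t in tags)
    author_flagged = any(t.assertion == "author" and t.method == "ACKnowledge" for t in tags)
    curated = any(t.assertion == "curator" and t.method == "curation tools" for t in tags)
    return nn_high and author_flagged and not curated


def curation_queue(papers: List[Paper], topic: str) -> List[str]:
    return [p.paper_id for p in papers if needs_curation(p, topic)]


# Placeholder papers and topic, for illustration only.
papers = [
    Paper("WBPaper00000001", [TopicTag("RNAi", "automatic", "neural network", "High"),
                              TopicTag("RNAi", "author", "ACKnowledge")]),
    Paper("WBPaper00000002", [TopicTag("RNAi", "automatic", "neural network", "High"),
                              TopicTag("RNAi", "author", "ACKnowledge"),
                              TopicTag("RNAi", "curator", "curation tools")]),
]
print(curation_queue(papers, "RNAi"))  # ['WBPaper00000001']
```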
Manually validate paper - topic flags without curating
  • Facet for topic
  • Facet for manual assertion
    • professional biocurator assertion
      • ABC - no data
View all topic and entity flags for a given paper and validate, if needed
  • Search ABC with paper identifier
  • Migrate to Topic and Entity Editor
  • View all associated data
  • Manually validate flags, if needed

PDF Storage

  • At the Alliance, PDFs will be stored in Amazon S3
  • We are not planning to formally store back-up copies elsewhere
  • Is this okay with everyone?

February 8, 2024

  • TAGC
    • Prominent announcement on the Alliance home page?
  • Fixed login on dockerized system (dev). Can everybody test their forms?

February 1, 2024

  • Paul will ask Natalia to take care of pending reimbursements
  • Dockerized system slow pages (OA and FPKMMine). Will monitor these pages in the future. Will look for timeouts in the nginx logs.
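One way to start looking for timeouts in the nginx logs is to count "upstream timed out" errors per request path. The following is a minimal sketch. It assumes nginx's default error-log wording for upstream timeouts. The sample line and log path are illustrative and are not taken from the dockerized system.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed format: nginx logs upstream timeouts as "upstream timed out (110: ...)",
# followed later on the same line by 'request: "METHOD /path HTTP/x"'.
TIMEOUT = re.compile(r'upstream timed out.*?request: "[A-Z]+ (\S+)')


def timeouts_by_path(lines: Iterable[str]) -> Counter:
    """Count upstream timeouts per request path, ignoring query strings."""
    counts: Counter = Counter()
    for line in lines:
        match = TIMEOUT.search(line)
        if match:
            counts[match.group(1).split("?", 1)[0]] += 1
    return counts


sample = ('2024/02/01 10:00:00 [error] 31#31: *5 upstream timed out '
          '(110: Connection timed out) while reading response header from upstream, '
          'client: 10.0.0.1, server: _, request: "GET /oa/index.cgi?action=load HTTP/1.1", '
          'upstream: "http://127.0.0.1:8080/oa/index.cgi", host: "example.org"')
print(timeouts_by_path([sample]))  # Counter({'/oa/index.cgi': 1})

# On a real host (the log path is an assumption):
# with open("/var/log/nginx/error.log") as log:
#     print(timeouts_by_path(log).most_common(10))
```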

January 25, 2024

Curator Info on Curation Forms

  • Saving curator info using cookies in dockerized forms. Can we deploy to prod?

ACKnowledge Author Request - WBPaper00066091

  • I am more than willing to assist; however, the task exceeds the capabilities of the normal flagging process.
  • The paper conducts an analysis of natural variations within 48 wild isolates. To enhance the reliability of the variant set, I utilized the latest variant calling methods along with a custom filtering approach. The resulting dataset comprises 1,957,683 unique variants identified using Clair3. Additionally, Sniffles2 was used to identify indels of >30 bp, which numbered in the thousands to tens of thousands for most wild isolates. It is worth noting that variants identified with Sniffles2 have less reliable nucleotide positions in the genome.
  • I am reaching out to inquire whether WormBase would be interested in incorporating this dataset. An argument in favor is the higher quality of my data. However, I am mindful of the potential substantial effort involved for WormBase, and it is unclear whether this aligns with your priorities.
  • Should WormBase decide to use my variant data set, I am more than willing to offer my assistance.

Update on NN Classification via the Alliance

  • Use of primary/not primary/not designated flag to filter papers
  • Secondary filter on papers with at least C. elegans as species
  • Finalize sources (i.e. evidence) for entity and topic tags on papers
  • Next NN classification scheduled for ~March
  • We decided to process all papers (even non-elegans species) and to filter on species after processing.
  • NNC HTML pages will show NNC values together with species.
  • Show all C. elegans papers first and other species in a separate bin.
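The decision to classify every paper and filter on species afterwards can be sketched as a post-processing step. This step puts C. elegans papers first, places everything else in a separate bin, and keeps the NN score alongside each paper. The following is a minimal sketch. The (paper ID, species set, score) record layout is an assumption.

```python
from typing import Iterable, List, Set, Tuple

Result = Tuple[str, Set[str], float]  # (paper ID, species on the paper, NN score)


def bin_by_species(
    results: Iterable[Result], focus: str = "Caenorhabditis elegans"
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Split NN results into a focus-species bin and an other-species bin.

    A paper goes in the first bin if the focus species is among its species
    (other species may also be present); each bin is sorted by descending score.
    """
    focus_bin, other_bin = [], []
    for paper_id, species, score in results:
        target = focus_bin if focus in species else other_bin
        target.append((paper_id, score))
    return (
        sorted(focus_bin, key=lambda row: -row[1]),
        sorted(other_bin, key=lambda row: -row[1]),
    )


# Placeholder results, for illustration only.
results = [
    ("WBPaper00000001", {"Caenorhabditis elegans"}, 0.70),
    ("WBPaper00000002", {"Caenorhabditis briggsae"}, 0.90),
    ("WBPaper00000003", {"Caenorhabditis elegans", "Homo sapiens"}, 0.95),
]
elegans, other = bin_by_species(results)
print(elegans)  # [('WBPaper00000003', 0.95), ('WBPaper00000001', 0.7)]
print(other)    # [('WBPaper00000002', 0.9)]
```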

Travel Reimbursements

  • Still waiting on October travel reimbursement (Kimberly)
  • Still waiting on September and October travel reimbursements (Wen)

UniProt

  • Jae found some genes without UniProt IDs, even though the genes are present in UniProt (but without WBGene IDs).
  • Wen reached out to Stavros and Chris to investigate the WormBase and AGR angles.
  • Stavros escalated the issue at the Hinxton standup.
  • Mark checked the Build scripts and WS291 results, then contacted UniProt and is working with them to figure this out.
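Jae found a gap in which a gene has no UniProt ID and its UniProt entry lacks a WBGene ID. One way to look for such gaps is to join the two sides on a shared key. The following is a minimal sketch. It assumes both sides can be exported with the gene's sequence name as that key. The inputs and function name are hypothetical, and this is not the Build script.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical export layouts, keyed by each side's own identifier:
#   WormBase: WBGene ID -> (sequence name, UniProt accession or None)
#   UniProt:  accession -> (sequence/ORF name, WBGene ID or None)
WormBaseRows = Dict[str, Tuple[str, Optional[str]]]
UniProtRows = Dict[str, Tuple[str, Optional[str]]]


def missing_cross_references(wormbase: WormBaseRows, uniprot: UniProtRows) -> List[Tuple[str, str]]:
    """Pair WormBase genes lacking a UniProt ID with UniProt entries lacking a WBGene ID.

    The two sides are joined on sequence name, which is assumed to be shared.
    """
    unlinked = {seq: acc for acc, (seq, wbgene) in uniprot.items() if wbgene is None}
    return sorted(
        (wbgene, unlinked[seq])
        for wbgene, (seq, acc) in wormbase.items()
        if acc is None and seq in unlinked
    )


# Placeholder identifiers, for illustration only.
wormbase = {"WBGene00000001": ("ABC1.1", None), "WBGene00000002": ("ABC1.2", "P00001")}
uniprot = {"Q00001": ("ABC1.1", None), "P00001": ("ABC1.2", "WBGene00000002")}
print(missing_cross_references(wormbase, uniprot))  # [('WBGene00000001', 'Q00001')]
```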

January 18, 2024

  • Issue with the OA highlighting different names when logging in: now fixed on staging


January 11, 2024

  • The duplicate function in the OA was not working with special characters. Valerio debugged it, and it is now fixed.
    • Curators should check that the duplicate function works when pasting special characters
  • OA highlighting different names when logging in: Valerio will debug and check what IP address he sees
    • If you want to bookmark an OA URL for your datatype and user, log on once and bookmark that page (separately for prod and dev)
  • Chris tested the phenotype form on staging and production, and the data are still going to tazendra
    • Chris will check with Paulo. Once it is resolved, we need to take everything that is on tazendra and put it on the cloud with different PGIDs
    • Raymond: simply set up forwarding at our end?
  • AI working group: Valerio is setting up a new OpenAI account (paid ChatGPT4 membership). We can also use Microsoft Edge Copilot (temporary?)
  • Chris is getting ready to deploy the 7.0.0 public release on February 7th. Carol wanted to push out monthly releases. This release will include WS291, and subsequent releases will stay on WS291 until WS292 is available.
  • Valerio would like to use an alliancegenome.org email address for the OpenAI account
  • New Alliance drive: https://drive.google.com/drive/folders/0AFkMHZOEQxolUk9PVA
  • Alliance logo and 50-word description for TAGC: Wen will talk to the outreach WG
  • Name server: Manuel is working on this. Daniela and Karen will reach out to him to let him know that, down the road, micropublication would like to use the name server API to generate IDs in bulk
  • Karen asked about some erroneous IDs used in the name server. Stavros says this is not a big deal because the "reason" is not populating the name server
  • It would be good to have a form to capture additional fields for strains and alleles (see meeting minutes, August 31st 2023: https://wiki.wormbase.org/index.php/WormBase-Caltech_Weekly_Calls_2023#August_31st.2C_2023). This may happen after Manuel is done with the authentication.
  • Michael: primary flag with Alliance. Kimberly talked about this with the blue team. They will start by bringing it over for all papers and fix the remaining 271 items later.

January 4, 2024

  • ACKnowledge pipeline help desk question:
    • Help Desk: Question about Author Curation to Knowledgebase (Zeng Wanxin) [Thu 12/14/2023 5:48 AM]
  • Citace upload, current deadline: Tuesday January 9th
    • All processes (dumps, etc.) will happen on the cloud machine
    • Curators need to deposit their files in the appropriate locations for Wen
  • Micropublication pipeline
    • Ticketing system confusion
    • Karen and Kimberly paper ID pipeline; may need sorting out of logistics