Difference between revisions of "WormBase-Caltech Weekly Calls"

From WormBaseWiki
 
 +
= Previous Years =
 +
 
[[WormBase-Caltech_Weekly_Calls_2009|2009 Meetings]]
 
  
 
[[WormBase-Caltech_Weekly_Calls_2011|2011 Meetings]]
 
  
 +
[[WormBase-Caltech_Weekly_Calls_2012|2012 Meetings]]
  
 +
[[WormBase-Caltech_Weekly_Calls_2013|2013 Meetings]]
  
==2012 Meetings==
+
[[WormBase-Caltech_Weekly_Calls_2014|2014 Meetings]]
 
 
 
 
[[WormBase-Caltech_Weekly_Calls_January_2012|January]]
 
 
 
[[WormBase-Caltech_Weekly_Calls_February_2012|February]]
 
 
 
[[WormBase-Caltech_Weekly_Calls_March_2012|March]]
 
 
 
[[WormBase-Caltech_Weekly_Calls_April_2012|April]]
 
 
 
[[WormBase-Caltech_Weekly_Calls_May_2012|May]]
 
 
 
[[WormBase-Caltech_Weekly_Calls_June_2012|June]]
 
 
 
[[WormBase-Caltech_Weekly_Calls_July_2012|July]]
 
 
 
[[WormBase-Caltech_Weekly_Calls_August_2012|August]]
 
 
 
 
 
 
 
== August 2, 2012 ==
 
 
 
Grant updates
 
*Topics
 
**Diseases
 
**Worm Phenotype ontology; attempt applying to other species? Some has been done already (e.g. C. briggsae)
 
***Benefit of curating phenotypes in other species? Particularly useful for genes not in C. elegans, for example
 
**Textpresso for nematodes
 
***>5000 papers now
 
***Complete set of papers by ~Labor Day
 
**Anatomy ontology
 
***Anatomy page is hub for data
 
****We should strive for user friendly display of information
 
****Cell functions, 10 or 100 highest/preferentially expressed genes, cell connections, cell signals
 
***Upcoming challenge - male/female/hermaphrodite divergence
 
***Anatomy with respect to life stage (e.g. life-stage-specific cells)
 
***Multiple species
 
***[http://www.obofoundry.org/wiki/index.php/UBERON:Main_Page Uberon] framework can be adapted for multiple (nematode) species
 
**Gene Function
 
***RNAi, Allele-phenotype, Transgene-phenotype etc.
 
**Interactions
 
***Integrated interaction model, genetic interaction ontology
 
***Can we estimate how many interactions are left to curate? Can use OA/Postgres, SVM, and first-pass author forms etc. to estimate
 
**Gene expression and Pictures
 
***Need to update Gene expression model to accommodate Epic dataset (John Murray), 3D movies (Bill Mohler), and single-molecule FISH (van Oudenaarden et al.)
 
***Itai Yanai dataset (embryo expression across several nematode species)
 
***Expression SVM won't catch isolated tissue/cells expression analysis or microarray data
 
***Incorporating the [http://caltech.wormbase.org/virtualworm/ virtual worm] and [http://browser.openworm.org/ browser]
 
**Microarray and SPELL
 
***Will incorporate microarray and RNA-Seq data sets for other species
 
***Should let users download search results more easily (for single genes, for example)
 
***Need to change SPELL database to incorporate new species
 
***Users should be able to run clustering on data
 
***Co-expression correlation; should recalculate each build (with flexible significance thresholds)
 
***Provide Cytoscape view of genes connected by co-expression
 
**Pathways and Processes
 
***Plan to work with Wikipathways
 
***Vocabularies and annotation schemes like Systems Biology Graphical Notation (SBGN)
 
***Trying to get data into BioPAX (Biological PAthway eXchange) format
 
***BioPAX too detail oriented? Very biochemical?
 
***Some databases dump BioPAX format, but won't read it in
 
**Paper and curation pipeline
 
**Concise description progress; coverage
 
***Re-annotation efforts?
 
***New concise description curation interface for easier writing and updating
 
**Annotating genes in the more expressive GO format
 
***How would our data models need to change to curate with the new expressive GO
 
**SVMs
 
**Include collaborations?
 
***GSA markup; encouraging other journals to adopt?
 
***Web page links; electronic text books, etc.
 
***Can automate linking, but can't support manual QC without more (financial) support
 
**Supporting links to WormBase (in general)
 
***WormAtlas, for example
 
**Google-like entity info (e.g. George Washington) displayed on side of search results page
 
***We provide short write up of genes to Google?
 
***Google-funded? Google.org
 
**Transcriptional regulatory networks (TRNs)
 
***Gene regulation curation
 
***Limited amount of data for position weight matrices (PWMs) and TF-binding sites
 
***Consolidation of TF-binding/target-gene data into one place (ChIP-Seq/modENCODE data, PWMs, Gene regulation interactions)
 
***How to best visualize the available data?
 
***We can design a new visual scheme for TRNs
 
***We will curate enhancers?
 
*Suggestions for future
 
**Better integration across data types
 
**How the OA can evolve and what it can be used for?
 
 
 
 
 
== August 9, 2012 ==
 
 
 
 
 
[http://mangolassi.caltech.edu/~postgres/cgi-bin/svm_results.cgi SVM Analysis Form]
 
*Sandbox version available for testing
 
*Data stored on Postgres
 
*Daniela can show how to use
 
*SVM flags main papers and supplemental documents; should they be grouped into a single document or kept separate?
 
**Depends on curator
 
**Should have a direct (unambiguous) link to supplemental documents
 
*Can flag false positive papers
 
*Can query for papers on a batch-per-batch (by SVM analysis date) basis
 
*False negatives are automatically annotated as such when an SVM-negative paper is curated for the respective data type in the OA
 
*Curators CAN check SVM-negatives if they want to, but are not required to
 
*Can query if a specific paper (or papers) has been flagged (by SVM) for certain data types
 
*Proposed OA field to capture what supplemental document the data came from, if from supplement
 
  
 +
[[WormBase-Caltech_Weekly_Calls_2015|2015 Meetings]]
  
Grant
+
[[WormBase-Caltech_Weekly_Calls_2016|2016 Meetings]]
*People can add new ideas/visions for future development of WormBase
 
**Visualization, integration, graphs, etc.
 
**How do we visualize complex information?
 
**Do we need to group data types for visualization? E.G. Transcriptional regulatory networks vs genome browsing
 
*Scaling?
 
*Dependency on ACEDB?
 
  
 +
[[WormBase-Caltech_Weekly_Calls_2017|2017 Meetings]]
  
 +
[[WormBase-Caltech_Weekly_Calls_2018|2018 Meetings]]
  
== August 16, 2012 ==
+
[[WormBase-Caltech_Weekly_Calls_2019|2019 Meetings]]
  
 +
[[WormBase-Caltech_Weekly_Calls_2020|2020 Meetings]]
  
Helpdesk
+
[[WormBase-Caltech_Weekly_Calls_2021|2021 Meetings]]
*GitHub link to e-mail?
 
*When an issue is submitted via the website, GitHub generates a unique e-mail which can be replied to. That e-mail thread is then included in the GitHub issue
 
*Who should close an issue? Ultimately, an issue/ticket should be closed by whichever WormBase staff addresses or resolves the issue
 
**If the issue is not closed by the one who resolves it, the help desk officer should follow up to check
 
*When is an issue resolved? May depend on the nature of the issue and on what has to be done
 
*Need project management tools? Redmine? Something else?
 
  
 +
[[WormBase-Caltech_Weekly_Calls_2022|2022 Meetings]]
  
Large scale projects
+
[[WormBase-Caltech_Weekly_Calls_2023|2023 Meetings]]
*Scripts and documentation used to deal with/handle large scale data should be put into GitHub
 
*Example, Itai Yanai expression data; just another microarray paper, but with new oligo sets?
 
*Need to store enough info to reproduce curation
 
*Store all large scale data sets and scripts, documentation, etc. on a single computer with regular backup
 
  
 +
==April 18th, 2024==
 +
*NNC pipeline being switched off locally and moving into the Alliance ABC.
  
 +
==April 11th, 2024==
 +
*Caltech WS293 ace files ready for the upload
  
== August 23, 2012 ==
+
==April 4th, 2024==
 +
* Continued discussion on sustainability
 +
* CZI, single cell RNAseq for Alliance -> anything happening will be a few months down the road
 +
** Data is still going to SPELL and enrichment analysis
 +
** Peter Roy asked about taking the expression profile of a condition and finding similar expression profiles (SPELL-like analysis), but SPELL cannot currently deal with scRNAseq data. Wen says it is possible (by regarding each cell group as an experiment). Can try loading them into SPELL. Does it improve the function of SPELL? Only 5-10 datasets. These data are a bit different from bulk RNAseq.
 +
* Textpresso: good to have a presentation for other MODs to show Textpresso capabilities? Yes. Maybe during sprint review
 +
* Michael's presentation on LLMs - Named Entity Recognition (NER)
  
 +
==March 14, 2024==
  
Transgene tables
+
=== TAGC debrief ===
*Not available on new site
 
*Broken on legacy site
 
*Integrated transgenes' location (which chromosome?)
 
*Static page was proposed for new site
 
*We (Caltech) could possibly put on the new WormBase Support section (which we have write-access to)
 
  
 +
==February 22, 2024==
  
Finding all labs in a region (user request)
+
===NER with LLMs===
*How to best identify all labs in, for example, England, Asia, South America, Canada, etc.
 
*Can search using patterns for e-mail address (not optimal)
 
*Maybe better search with physical mailing address, but need all country codes, and country-continent affiliations
 
*We can possibly create a script to generate a table every release
 
*Change data model to have a "Country" tag? And then programmatically assign continents based on Country tag
 
*Juancarlos can extract PI address info from Postgres and setup a CGI for future search
 
** Form at http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/generic.cgi?action=ContinentPIs
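The "Country" tag idea above can be sketched as follows. This is a minimal illustration, not the script or CGI that was actually written: the `COUNTRY_TO_CONTINENT` table, the `labs_by_continent` function, and the sample PI names are all hypothetical, and a real table would need to cover every country found in the PI address data.

```python
# Hypothetical, deliberately incomplete country -> continent lookup.
COUNTRY_TO_CONTINENT = {
    "United Kingdom": "Europe",
    "Japan": "Asia",
    "Brazil": "South America",
    "Canada": "North America",
}


def labs_by_continent(labs):
    """Group (pi_name, country) pairs by continent.

    Countries missing from the lookup table are collected under
    "Unassigned" so they can be reviewed by hand.
    """
    table = {}
    for pi_name, country in labs:
        continent = COUNTRY_TO_CONTINENT.get(country, "Unassigned")
        table.setdefault(continent, []).append(pi_name)
    # Sort continents and PI names so the table is stable between releases.
    return {c: sorted(names) for c, names in sorted(table.items())}


if __name__ == "__main__":
    example = [("PI A", "Japan"), ("PI B", "Canada"), ("PI C", "Atlantis")]
    for continent, names in labs_by_continent(example).items():
        print(continent, names)
```

Keeping the continent out of the data model and deriving it from the country at report time means only one tag has to be curated per lab.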
 
  
 +
* Wrote scripts and configured an LLM for Named Entity Recognition. Trained an LLM on gene names and diseases. Works well so far (F1 ~ 98%, Accuracy ~ 99.9%)
  
SVM Tool and Data Type Statistics
+
* Is this similar to the FlyBase system? Recording of presentation  https://drive.google.com/drive/folders/1S4kZidL7gvBH6SjF4IQujyReVVRf2cOK
*Chris would like to get Interaction numbers (how complete are we with curation)
 
*SVM Tool needs some tweaks:
 
**All papers are broken up into several documents, if there is supplementary material
 
**If a curator searches for all "Positive" papers, all papers that have at least ONE "Positive" document will be returned
 
**Conversely, if a curator searches for all "Negative" papers, all papers that have at least ONE "Negative" document will be returned
 
**This is a problem, since we would want all papers in which, ALL documents are "Negative"
 
*What are the benefits/drawbacks of separating a paper into multiple documents?
 
**Kimberly likes to have separate documents to reduce amount of data type searching to be done once we have SVM results
 
**Keeping multiple documents may complicate the search procedure, unless we can change the query process
 
*Juancarlos will create a filter step in the SVM tool such that the user/curator can specify if they would like to search on the basis of "whole" paper versus individual document
 
*In this way, searching for "whole" "Positive" papers will only return and display "Positive" whole papers for which at least one document in the paper is "Positive"; searching for "Negative" papers will return all papers for which ALL documents are "Negative"
 
*Searching on the basis of individual documents will work as it does now: displaying all papers as individual documents with their respective SVM results
 
*Juancarlos will also add a filter for "Primary" vs "Not Primary" vs "Not Designated"
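The "whole paper" rule described above (a paper is "Positive" if at least one of its documents is "Positive", and "Negative" only if all of its documents are "Negative") can be sketched like this. It is an illustration only, not the SVM tool's code: the function name, the tuple layout, and the paper IDs are assumptions.

```python
from collections import defaultdict


def paper_level_results(doc_results):
    """Collapse per-document SVM labels into one label per paper.

    doc_results: iterable of (paper_id, document_name, label) tuples,
    where label is "Positive" or "Negative".
    """
    labels_by_paper = defaultdict(list)
    for paper_id, _document_name, label in doc_results:
        labels_by_paper[paper_id].append(label)

    # One positive document is enough to make the whole paper positive;
    # a paper is negative only when every document is negative.
    return {
        paper_id: "Positive" if "Positive" in labels else "Negative"
        for paper_id, labels in labels_by_paper.items()
    }


if __name__ == "__main__":
    docs = [
        ("paperA", "main", "Negative"),
        ("paperA", "supplement_1", "Positive"),
        ("paperB", "main", "Negative"),
        ("paperB", "supplement_1", "Negative"),
    ]
    print(paper_level_results(docs))
```

Searching by individual document would skip this aggregation step and report each tuple as-is.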
 
  
 +
* Textpresso server is kaput. Services need to be transferred onto Alliance servers.
  
== August 30, 2012 ==
+
* There are features on Textpresso, such as link to PDF, that are desirable to curators but should be blocked from public access.
  
Transgenes
+
* Alliance curation status form development needs use cases. ref https://wiki.wormbase.org/index.php/WormBase-Caltech_Weekly_Calls#February_15.2C_2024
*Transgenes now have transgene IDs
 
**each curator should check dumpers for next upload (in 2 months) on mangolassi and see that everything looks fine.
 
**Each curator should also make sure that all the transgene objects they have in their curation pipeline are converted into IDs. E.g. Kimberly had a bunch of transgenes that were not converted.
 
**implement transgene ontology in GO?
 
  
  
Dead genes
 
*How do we deal with dead genes?
 
*Juancarlos mentioned that on Tazendra if dead genes are mapped, then those maps are reflected; there is no mapping otherwise.
 
*Wen normally replaces dead genes with new ones. She will double check the scripts and see what happens when a dead gene is found.
 
  
 +
==February 15, 2024==
  
Grant
+
=== Literature Migration to the Alliance ABC ===
*By tomorrow evening everyone should finish writing up their part in the Google Doc, as Paul will remove it from there.
+
==== Use Cases for Searches and Validation in the ABC (or, what are your common actions in the curation status form)? ====
*Each curator should estimate how much time their own data type is going to take; this is important for budgeting.
+
===== Find papers with a high confidence NN classification for a given topic that have also been flagged positive by an author in a community curation pipeline and that haven’t been curated yet for that topic =====
*everyone should add real references at the bottom of the document. In the text only 'Author, Date'
+
*Facet for topic
 +
*Facet for automatic assertion
 +
**neural network method
 +
*Facet for confidence level
 +
**High
 +
*Facet for manual assertion
 +
**author assertion
 +
***ACKnowledge method
 +
**professional biocurator assertion
 +
***curation tools method - NULL
  
 +
===== Manually validate paper - topic flags without curating =====
 +
*Facet for topic
 +
*Facet for manual assertion
 +
**professional biocurator assertion
 +
***ABC - no data
  
SVM on other nematodes
+
===== View all topic and entity flags for a given paper and validate, if needed =====
*Daniela and Yuling are trying SVM on other nematodes (Pacificus as first trial) to estimate how many papers could be positive for otherexpr and to see if SVM could be used for triage for other species.
+
* Search ABC with paper identifier
 +
* Migrate to Topic and Entity Editor
 +
* View all associated data
 +
* Manually validate flags, if needed
  
 +
=== PDF Storage ===
 +
* At the Alliance PDFs will be stored in Amazon s3
 +
* We are not planning to formally store back-up copies elsewhere
 +
* Is this okay with everyone?
  
Papers with no PMIDs
+
==February 8, 2024==
*James pointed out that many papers (for other nematodes) did not have PMIDs
+
* TAGC
*Paul suggested to check in agricultural databases e.g. Agricola
+
** Prominent announcement on the Alliance home page?
  
 +
* Fixed login on dockerized system (dev). Can everybody test their forms?
  
Canopus died -R.I.P.
+
==February 1, 2024==
*Canopus did not have any backup but everyone knew
+
* Paul will ask Natalia to take care of pending reimbursements
*The new Canopus will be used as server 
+
* Dockerized system slow pages (OA and FPKMMine). Will monitor these pages in the future. Will look for timeouts in the nginx logs.
*Raymond bought a new HD
 
  
 +
==January 25, 2024==
  
Nikhil Bhatla joined the meeting from MIT with Xiaodong
+
=== Curator Info on Curation Forms ===
*Nikhil started curating neural connectivity in wormweb.org
+
* Saving curator info using cookies in dockerized forms. Can we deploy to prod?
*he shared some ideas to implement some data types in Wormbase
 
**expression -> nikhil started curating images of expression in the neuralnet portal -e.g. http://wormweb.org/neuralnet#c=AVA&m=1. Daniela will get in touch with him and see how we can set up a crosstalk with pictures in Wormbase
 
**loss of function studies. Raymond will talk to Nikhil and see how he can contribute to curation. We have part of loss of function curation in anatomy function -site of action, direct manipulation of anatomical part, ablation.
 
**gain of function-> we have some gain of function in overexpression. Karen mentioned she annotates overexpression and phenotypes and when possible she associates with cell.
 
**physiology -> stimulus dependent response. e.g. to a given stimulus there is a Ca2+ increase. We should be able to capture that info, it is worth thinking about it.
 
Raymond mentioned that some of this information is already present in phenotype; Xiaodong said some is also in gene regulation. Paul suggested that we need to tie everything together and have a more unified view, with more data integration of what we already have.
 
**Correlation physiology? How to implement correlation physiology data (see Hendricks et al, 2012 http://www.ncbi.nlm.nih.gov/pubmed/22722842)
 
**Nikhil mentioned it would be useful for instance to be able to have a query such as: display all cells that respond to a certain stimulus, e.g. increase of Temperature.
 
  
 +
=== ACKnowledge Author Request - WBPaper00066091 ===
 +
* I am more than willing to assist; however, the task exceeds the capabilities of the normal flagging process.
  
Flybase
+
* The paper conducts an analysis of natural variations within 48 wild isolates. To enhance the reliability of the variant set, I utilized the latest variant calling methods along with a custom filtering approach. The resulting dataset comprises 1,957,683 unique variants identified using Clair3. Additionally, Sniffles2 was used to identify indels of >30 bp, which numbered in the thousands to tens of thousands for most wild isolates. It is worth noting that variants identified with Sniffles2 have less reliable nucleotide positions in the genome.  
*Susan contacted Ranjana about our disease pipeline. They would like to use the pipeline for Flybase. They were talking about having GO-style curation with evidence codes for human disease relevance tag.
 
  
 +
* I am reaching out to inquire whether WormBase would be interested in incorporating this dataset. An argument in favor is the higher quality of my data. However, I am mindful of the potential substantial effort involved for WormBase, and it is unclear whether this aligns with your priorities.
  
QCFast for GO
+
* Should WormBase decide to use my variant data set, I am more than willing to offer my assistance.
*James will give a demo next week at our regular group meeting. The QCFast for GO is coming along very well. The QCFast for GO will be the precursor of the next generation curation interface.
 
  
 +
=== Update on NN Classification via the Alliance ===
 +
* Use of primary/not primary/not designated flag to filter papers
 +
* Secondary filter on papers with at least C. elegans as species
 +
* Finalize sources (i.e. evidence) for entity and topic tags on papers
 +
* Next NN classification scheduled for ~March
  
Database paper
+
* We decided to process all papers (even non-elegans species) and have filters on species after processing.
*Kimberly is revising the Database paper. She is including author accuracy and identification of data types. It can vary from 70% (Raymond) to 98% (Juancarlos, Gary, Daniela) (not Juancarlos, I don't have afp_ data -- J)
+
* NNC html pages will show NNC values together with species.  
 +
* Show all C. elegans papers first and other species in a separate bin.
  
 +
=== Travel Reimbursements ===
 +
* Still waiting on October travel reimbursement (Kimberly)
 +
* Still waiting on September and October travel reimbursements (Wen)
  
 +
=== UniProt ===
 +
* Jae found some genes without UniProt IDs; the genes are present in UniProt but lack WBGene IDs.
 +
* Wen reached out to Stavros and Chris to investigate the WormBase and AGR angles.
 +
* Stavros escalated the issue at the Hinxton standup.
 +
* Mark checked the Build scripts and WS291 results. After that, he contacted UniProt and is working with them to figure this out.
  
== September 6, 2012 ==
+
==January 18, 2024==
 +
* OA showing different names highlighted when logging in the OA, now fixed on staging
  
Canopus
 
*Will become a server (heavy outside access)
 
*Raymond will take precautions to restore properly; should take a week or less
 
*Used for:
 
**Picture curation
 
**Virtual worm web page & FTP
 
  
 +
==January 11, 2024==
 +
* Duplicate function in OA was not working when using special characters. Valerio debugged it and it is now fixed.
 +
** Curators should make sure that, when pasting special characters, the duplicate function works
 +
* OA showing different names highlighted when logging in the OA, Valerio will debug and check what IP address he sees
 +
** If you want to bookmark an OA url for your datatype and user, log on once, and bookmark that page (separately for prod and dev)
 +
* Chris tested on staging and production the phenotype form and the data are still going to tazendra
 +
** Chris will check with Paulo. Once it is resolved we need to take everything that is on tazendra and put it on the cloud with different PGIDs
 +
** Raymond: simply set up forwarding at our end?
 +
* AI working group: Valerio is setting up a new account for open AI -paid membership for ChatGPT4. We can also use Microsoft Edge copilot (temporary?)
 +
* Chris is getting ready to deploy the 7.0.0 public release on February 7th. Carol wanted to push out monthly releases. This release will include WS291; subsequent releases will stay on WS291 until WS292 is available.
 +
* Valerio would like to use an alliancegenome.org email address for the openAI account
 +
* New alliance drive: https://drive.google.com/drive/folders/0AFkMHZOEQxolUk9PVA
 +
** note: please move shared files that you own to new Alliance Google Drive.  Here is the link to the information that Chris Mungall sent:  For more instructions see the video and SOP here:https://agr-jira.atlassian.net/browse/SCRUM-925?focusedCommentId=40674
 +
* Alliance logo and 50 word description for TAGC> Wen will talk to the outreach WG
 +
* Name server. Manuel working on this, Daniela and Karen will reach out to him and let him know that down the road micropublication would like to use the name server API to generate IDs in bulk
 +
* Karen asking about some erroneous IDs used in the name server. Stavros says that this is not a big deal because the "reason" is not populating the name server
 +
* It would be good to be able to have a form to capture additional fields for strains and alleles (see meeting minutes August 31st 2023. https://wiki.wormbase.org/index.php/WormBase-Caltech_Weekly_Calls_2023#August_31st.2C_2023). This may happen after Manuel is done with the authentication.
 +
* Michael: primary flag with Alliance. Kimberly talked about this with the blue team. They will start bringing that over all papers and fix the remaining 271 items later.
  
Grant Writing
+
==January 4, 2024==
*Approach & future plans - do we need to discuss and add more to proposal?
+
* ACKnowledge pipeline help desk question:
*Virtual worm access to data
+
** Help Desk: Question about Author Curation to Knowledgebase (Zeng Wanxin) [Thu 12/14/2023 5:48 AM]
*Browsing data
+
* Citace upload, current deadline: Tuesday January 9th
**Browse genes/proteins by class/GO annotation
+
** All processes (dumps, etc.) will happen on the cloud machine
*Intermine queries
+
** Curators need to deposit their files in the appropriate locations for Wen
**Queries can be saved and used again later (available to other users)
+
* Micropublication pipeline
**Results can be stored and displayed on web pages
+
** Ticketing system confusion
*Displaying large scale expression (microarray) data (from SPELL)
+
** Karen and Kimberly paper ID pipeline; may need sorting out of logistics
**SPELL data currently stored in relational (MySQL?) form
 
**We could extend the WormBase web app to interact with SPELL data
 
**RESTful-compliant data serving would be best
 

Latest revision as of 16:04, 18 April 2024

Previous Years

2009 Meetings

2011 Meetings

2012 Meetings

2013 Meetings

2014 Meetings

2015 Meetings

2016 Meetings

2017 Meetings

2018 Meetings

2019 Meetings

2020 Meetings

2021 Meetings

2022 Meetings

2023 Meetings

April 18th, 2024

  • NNC pipeline being switched off locally and moving into the Alliance ABC.

April 11th, 2024

  • Caltech WS293 ace files ready for the upload

April 4th, 2024

  • Continued discussion on sustainability
  • CZI, single cell RNAseq for Alliance -> anything happening will be a few months down the road
    • Data is still going to SPELL and enrichment analysis
    • Peter Roy asked about taking the expression profile of a condition and finding similar expression profiles (SPELL-like analysis), but SPELL cannot currently deal with scRNAseq data. Wen says it is possible (by regarding each cell group as an experiment). Can try loading them into SPELL. Does it improve the function of SPELL? Only 5-10 datasets. These data are a bit different from bulk RNAseq.
  • Textpresso: good to have a presentation for other MODs to show Textpresso capabilities? Yes. Maybe during sprint review
  • Michael's presentation on LLMs - Named Entity Recognition (NER)

March 14, 2024

TAGC debrief

February 22, 2024

NER with LLMs

  • Wrote scripts and configured an LLM for Named Entity Recognition. Trained an LLM on gene names and diseases. Works well so far (F1 ~ 98%, Accuracy ~ 99.9%)
  • Textpresso server is kaput. Services need to be transferred onto Alliance servers.
  • There are features on Textpresso, such as link to PDF, that are desirable to curators but should be blocked from public access.


February 15, 2024

Literature Migration to the Alliance ABC

Use Cases for Searches and Validation in the ABC (or, what are your common actions in the curation status form)?

Find papers with a high confidence NN classification for a given topic that have also been flagged positive by an author in a community curation pipeline and that haven’t been curated yet for that topic
  • Facet for topic
  • Facet for automatic assertion
    • neural network method
  • Facet for confidence level
    • High
  • Facet for manual assertion
    • author assertion
      • ACKnowledge method
    • professional biocurator assertion
      • curation tools method - NULL
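The facet combination for this use case can be read as a filter over a paper's topic tags. The sketch below only illustrates that logic under assumed names: the dictionary fields (`topic`, `source`, `method`, `confidence`) and the function `needs_curation` are hypothetical and are not the ABC's actual schema or API.

```python
def needs_curation(paper, topic):
    """Return True if a paper matches the use case for the given topic.

    A match requires all of:
      - a high-confidence neural network assertion for the topic,
      - a positive author assertion made via ACKnowledge,
      - no professional biocurator assertion from curation tools.
    """
    tags = [t for t in paper["tags"] if t["topic"] == topic]

    nn_high = any(
        t["source"] == "neural network" and t.get("confidence") == "High"
        for t in tags
    )
    author_positive = any(
        t["source"] == "author" and t.get("method") == "ACKnowledge"
        for t in tags
    )
    already_curated = any(
        t["source"] == "biocurator" and t.get("method") == "curation tools"
        for t in tags
    )
    return nn_high and author_positive and not already_curated


if __name__ == "__main__":
    paper = {
        "id": "example-paper",
        "tags": [
            {"topic": "expression", "source": "neural network", "confidence": "High"},
            {"topic": "expression", "source": "author", "method": "ACKnowledge"},
        ],
    }
    print(needs_curation(paper, "expression"))
```

The "curation tools method - NULL" facet corresponds to the `not already_curated` condition: the paper qualifies only while no such assertion exists.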
Manually validate paper - topic flags without curating
  • Facet for topic
  • Facet for manual assertion
    • professional biocurator assertion
      • ABC - no data
View all topic and entity flags for a given paper and validate, if needed
  • Search ABC with paper identifier
  • Migrate to Topic and Entity Editor
  • View all associated data
  • Manually validate flags, if needed

PDF Storage

  • At the Alliance PDFs will be stored in Amazon s3
  • We are not planning to formally store back-up copies elsewhere
  • Is this okay with everyone?

February 8, 2024

  • TAGC
    • Prominent announcement on the Alliance home page?
  • Fixed login on dockerized system (dev). Can everybody test their forms?

February 1, 2024

  • Paul will ask Natalia to take care of pending reimbursements
  • Dockerized system slow pages (OA and FPKMMine). Will monitor these pages in the future. Will look for timeouts in the nginx logs.
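One possible way to look for timeouts is to count "upstream timed out" entries per request path in the nginx error log. This is a rough sketch that assumes the default nginx error log wording and a `request: "METHOD /path ..."` field on each entry; the function name and sample paths are hypothetical, and the actual log location and format on the dockerized system may differ.

```python
import re
from collections import Counter

# Matches the request field nginx appends to error log entries,
# e.g.  request: "GET /some/page?x=1 HTTP/1.1"
REQUEST_RE = re.compile(r'request: "[A-Z]+ (\S+)')


def count_timeouts(log_lines):
    """Count upstream timeout entries per request path (query string dropped)."""
    counts = Counter()
    for line in log_lines:
        if "upstream timed out" not in line:
            continue
        match = REQUEST_RE.search(line)
        path = match.group(1).split("?")[0] if match else "(unknown)"
        counts[path] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '2024/02/01 10:00:00 [error] 12#12: *1 upstream timed out '
        '(110: Connection timed out) while reading response header from upstream, '
        'client: 10.0.0.1, server: example.org, '
        'request: "GET /example/slow_form.cgi?x=1 HTTP/1.1", host: "example.org"',
    ]
    for path, n in count_timeouts(sample).most_common():
        print(n, path)
```

Running this on each day's log would show whether the slow pages are the ones actually hitting the proxy timeout.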

January 25, 2024

Curator Info on Curation Forms

  • Saving curator info using cookies in dockerized forms. Can we deploy to prod?

ACKnowledge Author Request - WBPaper00066091

  • I am more than willing to assist; however, the task exceeds the capabilities of the normal flagging process.
  • The paper conducts an analysis of natural variations within 48 wild isolates. To enhance the reliability of the variant set, I utilized the latest variant calling methods along with a custom filtering approach. The resulting dataset comprises 1,957,683 unique variants identified using Clair3. Additionally, Sniffles2 was used to identify indels of >30 bp, which numbered in the thousands to tens of thousands for most wild isolates. It is worth noting that variants identified with Sniffles2 have less reliable nucleotide positions in the genome.
  • I am reaching out to inquire whether WormBase would be interested in incorporating this dataset. An argument in favor is the higher quality of my data. However, I am mindful of the potential substantial effort involved for WormBase, and it is unclear whether this aligns with your priorities.
  • Should WormBase decide to use my variant data set, I am more than willing to offer my assistance.

Update on NN Classification via the Alliance

  • Use of primary/not primary/not designated flag to filter papers
  • Secondary filter on papers with at least C. elegans as species
  • Finalize sources (i.e. evidence) for entity and topic tags on papers
  • Next NN classification scheduled for ~March
  • We decided to process all papers (even non-elegans species) and have filters on species after processing.
  • NNC html pages will show NNC values together with species.
  • Show all C. elegans papers first and other species in a separate bin.
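The decision to process every paper and filter on species afterwards can be illustrated with a small post-processing step. This is a sketch under assumptions, not the NNC pipeline's code: the `species` and `score` fields and the function name `bin_by_species` are hypothetical.

```python
def bin_by_species(papers, primary_species="Caenorhabditis elegans"):
    """Split classified papers into a primary-species bin and an "other" bin.

    A paper goes into the first bin if the primary species is among its
    species (it may list other species as well). Each bin is sorted by
    classifier score, highest first.
    """
    primary, other = [], []
    for paper in papers:
        target = primary if primary_species in paper["species"] else other
        target.append(paper)

    def by_score(paper):
        return paper["score"]

    return (
        sorted(primary, key=by_score, reverse=True),
        sorted(other, key=by_score, reverse=True),
    )


if __name__ == "__main__":
    papers = [
        {"id": "p1", "species": ["Pristionchus pacificus"], "score": 0.9},
        {"id": "p2", "species": ["Caenorhabditis elegans"], "score": 0.7},
    ]
    elegans_bin, other_bin = bin_by_species(papers)
    print([p["id"] for p in elegans_bin], [p["id"] for p in other_bin])
```

Because filtering happens after classification, papers on other species keep their scores and can be shown in the separate bin without reprocessing.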

Travel Reimbursements

  • Still waiting on October travel reimbursement (Kimberly)
  • Still waiting on September and October travel reimbursements (Wen)

UniProt

  • Jae found some genes without UniProt IDs; the genes are present in UniProt but lack WBGene IDs.
  • Wen reached out to Stavros and Chris to investigate the WormBase and AGR angles.
  • Stavros escalated the issue at the Hinxton standup.
  • Mark checked the Build scripts and WS291 results. After that, he contacted UniProt and is working with them to figure this out.

January 18, 2024

  • OA showing different names highlighted when logging in the OA, now fixed on staging


January 11, 2024

  • Duplicate function in OA was not working when using special characters. Valerio debugged it and it is now fixed.
    • Curators should make sure that, when pasting special characters, the duplicate function works
  • OA showing different names highlighted when logging in the OA, Valerio will debug and check what IP address he sees
    • If you want to bookmark an OA url for your datatype and user, log on once, and bookmark that page (separately for prod and dev)
  • Chris tested on staging and production the phenotype form and the data are still going to tazendra
    • Chris will check with Paulo. Once it is resolved we need to take everything that is on tazendra and put it on the cloud with different PGIDs
    • Raymond: simply set up forwarding at our end?
  • AI working group: Valerio is setting up a new account for open AI -paid membership for ChatGPT4. We can also use Microsoft Edge copilot (temporary?)
  • Chris is getting ready to deploy the 7.0.0 public release on February 7th. Carol wanted to push out monthly releases. This release will include WS291; subsequent releases will stay on WS291 until WS292 is available.
  • Valerio would like to use an alliancegenome.org email address for the openAI account
  • New alliance drive: https://drive.google.com/drive/folders/0AFkMHZOEQxolUk9PVA
  • Alliance logo and 50 word description for TAGC> Wen will talk to the outreach WG
  • Name server. Manuel working on this, Daniela and Karen will reach out to him and let him know that down the road micropublication would like to use the name server API to generate IDs in bulk
  • Karen asking about some erroneous IDs used in the name server. Stavros says that this is not a big deal because the "reason" is not populating the name server
  • It would be good to be able to have a form to capture additional fields for strains and alleles (see meeting minutes August 31st 2023. https://wiki.wormbase.org/index.php/WormBase-Caltech_Weekly_Calls_2023#August_31st.2C_2023). This may happen after Manuel is done with the authentication.
  • Michael: primary flag with Alliance. Kimberly talked about this with the blue team. They will start bringing that over all papers and fix the remaining 271 items later.

January 4, 2024

  • ACKnowledge pipeline help desk question:
    • Help Desk: Question about Author Curation to Knowledgebase (Zeng Wanxin) [Thu 12/14/2023 5:48 AM]
  • Citace upload, current deadline: Tuesday January 9th
    • All processes (dumps, etc.) will happen on the cloud machine
    • Curators need to deposit their files in the appropriate locations for Wen
  • Micropublication pipeline
    • Ticketing system confusion
    • Karen and Kimberly paper ID pipeline; may need sorting out of logistics