Latest revision as of 16:04, 18 April 2024

Previous Years

April 18th, 2024

NNC pipeline being switched off locally and moving into the Alliance ABC.

April 11th, 2024

Caltech WS293 ace files are ready for upload

April 4th, 2024

Continued discussion on sustainability
CZI single-cell RNAseq for the Alliance: anything happening will be a few months down the road
- Data are still going to SPELL and enrichment analysis
- Peter Roy is asking about taking the expression profile of a condition and finding similar expression profiles (SPELL-like analysis), but SPELL cannot currently deal with scRNAseq data. Wen says it is possible (treating each cell group as an experiment). We can try loading the data into SPELL. Does it improve the function of SPELL? There are only 5-10 datasets, and these data are a bit different from bulk RNAseq.
Textpresso: would it be good to have a presentation showing Textpresso capabilities to the other MODs? Yes, maybe during a sprint review.
Michael's presentation on LLMs - Named Entity Recognition (NER)

March 14, 2024

TAGC debrief

February 22, 2024

NER with LLMs

Wrote scripts and configured an LLM for Named Entity Recognition, and trained an LLM on gene names and diseases. It works well so far (F1 ~98%, accuracy ~99.9%).
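The gap between the two reported numbers is expected when NER is scored per token: most tokens lie outside any entity and are easy to label correctly, so accuracy stays high even when entity recall drops. A minimal sketch, assuming token-level tags with "O" for non-entity tokens (the notes do not say how the scores were computed) and made-up data:

```python
from typing import List, Tuple


def token_metrics(gold: List[str], pred: List[str]) -> Tuple[float, float, float, float]:
    """Token-level precision, recall, F1 and accuracy for NER tag sequences.

    Assumes one tag per token, with "O" marking tokens outside any entity.
    Precision, recall and F1 only look at entity tokens; accuracy counts every
    token, including the (usually very many) "O" tokens.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and predicted tag lists must be the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g != "O" and g == p)
    fp = sum(1 for g, p in zip(gold, pred) if p != "O" and g != p)
    fn = sum(1 for g, p in zip(gold, pred) if g != "O" and g != p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold) if gold else 0.0
    return precision, recall, f1, accuracy


# Illustrative data: 1,000 tokens, 20 of them entity tokens, one gene token missed.
gold = ["B-GENE"] * 10 + ["B-DISEASE"] * 10 + ["O"] * 980
pred = ["O"] + ["B-GENE"] * 9 + ["B-DISEASE"] * 10 + ["O"] * 980
precision, recall, f1, accuracy = token_metrics(gold, pred)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f} accuracy={accuracy:.3f}")
```

Here one missed entity token out of 20 brings F1 down to about 0.974 while accuracy stays at 0.999, which is why F1 is usually the more informative number for NER.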

Is this similar to the FlyBase system? Recording of the presentation: https://drive.google.com/drive/folders/1S4kZidL7gvBH6SjF4IQujyReVVRf2cOK

Textpresso server is kaput. Services need to be transferred onto Alliance servers.

There are features on Textpresso, such as link to PDF, that are desirable to curators but should be blocked from public access.

Alliance curation status form development needs use cases. See https://wiki.wormbase.org/index.php/WormBase-Caltech_Weekly_Calls#February_15.2C_2024

February 15, 2024

Literature Migration to the Alliance ABC

Use Cases for Searches and Validation in the ABC (or, what are your common actions in the curation status form)?

Find papers with a high confidence NN classification for a given topic that have also been flagged positive by an author in a community curation pipeline and that haven’t been curated yet for that topic

Facet for topic
Facet for automatic assertion
- neural network method
Facet for confidence level
- High
Facet for manual assertion
- author assertion
  - ACKnowledge method
- professional biocurator assertion
  - curation tools method - NULL
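This use case is a conjunction of the facets above. A minimal sketch, assuming a hypothetical in-memory record layout (`TopicFlag`, `Paper` and the string values are illustrative, not the ABC data model or API), where "curation tools method - NULL" is read as "no curator assertion from the curation tools yet":

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TopicFlag:
    """One topic assertion on a paper (hypothetical layout, not the ABC schema)."""
    topic: str
    assertion: str                    # "automatic", "author" or "curator"
    method: str                       # e.g. "neural network", "ACKnowledge", "curation tools"
    confidence: Optional[str] = None  # e.g. "High" for NN classifications


@dataclass
class Paper:
    paper_id: str
    flags: List[TopicFlag] = field(default_factory=list)


def needs_curation(paper: Paper, topic: str) -> bool:
    """True if the paper has a high-confidence NN flag and a positive author
    (ACKnowledge) flag for `topic`, but no curator assertion from the
    curation tools yet."""
    flags = [f for f in paper.flags if f.topic == topic]
    nn_high = any(f.assertion == "automatic" and f.method == "neural network"
                  and f.confidence == "High" for f in flags)
    author_positive = any(f.assertion == "author" and f.method == "ACKnowledge"
                          for f in flags)
    curated = any(f.assertion == "curator" and f.method == "curation tools"
                  for f in flags)
    return nn_high and author_positive and not curated


def find_papers(papers: List[Paper], topic: str) -> List[str]:
    """Apply all facets and return the matching paper IDs."""
    return [p.paper_id for p in papers if needs_curation(p, topic)]


# Hypothetical papers and topic name, for illustration only.
papers = [
    Paper("paper-1", [TopicFlag("expression", "automatic", "neural network", "High"),
                      TopicFlag("expression", "author", "ACKnowledge")]),
    Paper("paper-2", [TopicFlag("expression", "automatic", "neural network", "High"),
                      TopicFlag("expression", "author", "ACKnowledge"),
                      TopicFlag("expression", "curator", "curation tools")]),
    Paper("paper-3", [TopicFlag("expression", "automatic", "neural network", "Low"),
                      TopicFlag("expression", "author", "ACKnowledge")]),
]
print(find_papers(papers, "expression"))
```

Only paper-1 passes: paper-2 is already curated and paper-3 lacks a high-confidence NN flag.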

Manually validate paper - topic flags without curating

Facet for topic
Facet for manual assertion
- professional biocurator assertion
  - ABC - no data

View all topic and entity flags for a given paper and validate, if needed

Search ABC with paper identifier
Migrate to Topic and Entity Editor
View all associated data
Manually validate flags, if needed

PDF Storage

At the Alliance, PDFs will be stored in Amazon S3
We are not planning to formally store back-up copies elsewhere
Is this okay with everyone?

February 8, 2024

TAGC
- Prominent announcement on the Alliance home page?

Fixed login on dockerized system (dev). Can everybody test their forms?

February 1, 2024

Paul will ask Natalia to take care of pending reimbursements
Some pages on the dockerized system are slow (OA and FPKMMine). We will monitor these pages and look for timeouts in the nginx logs.
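When nginx gives up waiting on an upstream it answers with status 504 (Gateway Timeout), so counting 504s per path in the access log is one quick way to see which pages time out. A minimal sketch, assuming nginx's default `combined` log format; the paths in the sample lines are hypothetical:

```python
import re
from collections import Counter
from typing import Iterable

# nginx "combined" log format:
# $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"
LINE_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) '
)


def count_status_by_path(lines: Iterable[str], status: str = "504") -> Counter:
    """Count requests per path (query string stripped) that returned `status`.

    Lines that do not match the combined format are skipped.
    """
    counts: Counter = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("status") == status:
            counts[match.group("path").split("?", 1)[0]] += 1
    return counts


# Hypothetical sample lines; in practice iterate over the access-log file.
sample = [
    '10.0.0.1 - - [01/Feb/2024:10:00:00 -0800] "GET /oa/form?x=1 HTTP/1.1" 504 160 "-" "Mozilla/5.0"',
    '10.0.0.2 - - [01/Feb/2024:10:00:05 -0800] "GET /oa/form HTTP/1.1" 504 160 "-" "Mozilla/5.0"',
    '10.0.0.3 - - [01/Feb/2024:10:00:09 -0800] "GET /fpkmmine/ HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]
print(count_status_by_path(sample).most_common())
```

In practice the function can be passed an open log file, e.g. `/var/log/nginx/access.log` (a common default location; the dockerized setup may log elsewhere). The nginx error log's "upstream timed out" messages are another place to look.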

January 25, 2024

Curator Info on Curation Forms

Saving curator info using cookies in dockerized forms. Can we deploy to prod?
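A minimal sketch of the cookie approach, assuming CGI-style forms that receive the browser's `Cookie` header (in CGI, the `HTTP_COOKIE` environment variable) and send back a `Set-Cookie` header. The cookie name `curator_id`, the one-year lifetime and the example ID are hypothetical:

```python
from http.cookies import SimpleCookie
from typing import Optional

COOKIE_NAME = "curator_id"        # hypothetical cookie name
MAX_AGE = 60 * 60 * 24 * 365      # keep the cookie for one year (seconds)


def remember_curator(curator_id: str) -> str:
    """Return a Set-Cookie header line storing the curator's ID."""
    cookie = SimpleCookie()
    cookie[COOKIE_NAME] = curator_id
    cookie[COOKIE_NAME]["path"] = "/"
    cookie[COOKIE_NAME]["max-age"] = MAX_AGE
    cookie[COOKIE_NAME]["httponly"] = True   # not readable from page JavaScript
    cookie[COOKIE_NAME]["samesite"] = "Lax"
    return cookie.output()


def read_curator(cookie_header: Optional[str]) -> Optional[str]:
    """Return the curator ID from a Cookie request header, or None if absent."""
    if not cookie_header:
        return None
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get(COOKIE_NAME)
    return morsel.value if morsel is not None else None


print(remember_curator("WBPerson12345"))
print(read_curator("curator_id=WBPerson12345; theme=dark"))
```

A form would call `read_curator(os.environ.get("HTTP_COOKIE"))` to preselect the curator, and emit `remember_curator(...)` among its response headers after a curator is chosen.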

ACKnowledge Author Request - WBPaper00066091

I am more than willing to assist; however, the task exceeds the capabilities of the normal flagging process.

The paper conducts an analysis of natural variations within 48 wild isolates. To enhance the reliability of the variant set, I utilized the latest variant calling methods along with a custom filtering approach. The resulting dataset comprises 1,957,683 unique variants identified using Clair3. Additionally, Sniffles2 was used to identify indels of >30 bp, which numbered in the thousands to tens of thousands for most wild isolates. It is worth noting that variants identified with Sniffles2 have less reliable nucleotide positions in the genome.

I am reaching out to inquire whether WormBase would be interested in incorporating this dataset. An argument in favor is the higher quality of my data. However, I am mindful of the potential substantial effort involved for WormBase, and it is unclear whether this aligns with your priorities.

Should WormBase decide to use my variant data set, I am more than willing to offer my assistance.
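The >30 bp cutoff mentioned in the request can be illustrated as a filter over VCF data lines. A minimal sketch, not the author's pipeline, assuming standard VCF columns and that symbolic `<DEL>`/`<INS>` alleles carry their length in the `SVLEN` INFO field (as structural-variant callers commonly report); the records are made up:

```python
from typing import Optional

MIN_INDEL_BP = 30  # cutoff quoted in the request: indels of >30 bp


def indel_length(vcf_line: str) -> Optional[int]:
    """Length of the first ALT allele's insertion or deletion, or None.

    Symbolic <DEL>/<INS> alleles are sized from the SVLEN INFO field;
    sequence alleles are sized by the REF/ALT length difference.
    """
    cols = vcf_line.rstrip("\n").split("\t")
    ref, alt, info = cols[3], cols[4].split(",")[0], cols[7]
    if alt.startswith("<"):
        if alt not in ("<DEL>", "<INS>"):
            return None
        for entry in info.split(";"):
            if entry.startswith("SVLEN="):
                return abs(int(entry.split("=", 1)[1].split(",")[0]))
        return None
    return abs(len(alt) - len(ref)) or None  # 0 (SNV/MNV) -> None


def is_large_indel(vcf_line: str, min_bp: int = MIN_INDEL_BP) -> bool:
    """True for indels strictly longer than `min_bp`."""
    length = indel_length(vcf_line)
    return length is not None and length > min_bp


# Hypothetical VCF data lines (CHROM POS ID REF ALT QUAL FILTER INFO).
records = [
    "I\t1000\tsv1\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;SVLEN=-45",
    "I\t2000\tv2\tA\tAT\t.\tPASS\t.",
    "II\t500\tsv3\tN\t<INS>\t.\tPASS\tSVTYPE=INS;SVLEN=30",
    "II\t900\tv4\tA\tA" + "C" * 40 + "\t.\tPASS\t.",
]
print([r.split("\t")[2] for r in records if is_large_indel(r)])
```

For a real file, header lines starting with `#` should be skipped before calling `is_large_indel`.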

Update on NN Classification via the Alliance

Use of primary/not primary/not designated flag to filter papers
Secondary filter on papers with at least C. elegans as species
Finalize sources (i.e. evidence) for entity and topic tags on papers
Next NN classification scheduled for ~March

We decided to process all papers (including non-elegans species) and filter on species after processing.
NNC html pages will show NNC values together with species.
Show all C. elegans papers first and other species in a separate bin.
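The agreed order (classify every paper first, then filter and bin by species) can be sketched as below. The record fields, the example species sets, and which primary-flag values to keep are assumptions; the notes only say that the primary/not primary/not designated flag will be used as a filter:

```python
from dataclasses import dataclass
from operator import attrgetter
from typing import Iterable, List, Set, Tuple

ELEGANS = "Caenorhabditis elegans"


@dataclass
class ClassifiedPaper:
    paper_id: str
    nnc_score: float   # NN classifier value for one topic
    species: Set[str]  # species associated with the paper
    primary: str       # "primary", "not primary" or "not designated"


def bin_by_species(papers: Iterable[ClassifiedPaper],
                   keep_primary: Set[str]) -> Tuple[List[ClassifiedPaper], List[ClassifiedPaper]]:
    """Filter classified papers by primary flag, then split them into a
    C. elegans bin and an other-species bin, each sorted by NNC score."""
    elegans: List[ClassifiedPaper] = []
    other: List[ClassifiedPaper] = []
    for paper in papers:
        if paper.primary not in keep_primary:
            continue
        # "At least C. elegans": any paper whose species include C. elegans.
        (elegans if ELEGANS in paper.species else other).append(paper)
    by_score = attrgetter("nnc_score")
    return (sorted(elegans, key=by_score, reverse=True),
            sorted(other, key=by_score, reverse=True))


# Hypothetical classifier output for one topic.
results = [
    ClassifiedPaper("paper-1", 0.91, {ELEGANS}, "primary"),
    ClassifiedPaper("paper-2", 0.97, {"Caenorhabditis briggsae"}, "primary"),
    ClassifiedPaper("paper-3", 0.88, {ELEGANS, "Pristionchus pacificus"}, "primary"),
    ClassifiedPaper("paper-4", 0.99, {ELEGANS}, "not primary"),
]
elegans_bin, other_bin = bin_by_species(results, keep_primary={"primary"})
print([p.paper_id for p in elegans_bin], [p.paper_id for p in other_bin])
```

Because classification runs on every paper, changing the species rule or the kept primary-flag values only requires re-binning, not re-classifying.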

Travel Reimbursements

Still waiting on October travel reimbursement (Kimberly)
Still waiting on September and October travel reimbursements (Wen)

UniProt

Jae found some genes without UniProt IDs, although the genes are present in UniProt (the UniProt entries lack WBGene IDs).
Wen reached out to Stavros and Chris to investigate the WormBase and AGR angles.
Stavros escalated the issue at the Hinxton standup.
Mark checked the build scripts and the WS291 results, then contacted UniProt and is working with them to figure this out.
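One way to surface cases like the ones Jae found is to join the two resources on a shared name and report pairs where neither side cross-references the other. A minimal sketch, assuming (hypothetically) that WormBase sequence names match UniProt ORF names; the record types and identifiers are placeholders:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class WBGeneRecord:
    wbgene_id: str
    sequence_name: str
    uniprot_ids: Set[str] = field(default_factory=set)


@dataclass
class UniProtRecord:
    accession: str
    orf_names: Set[str]
    wbgene_ids: Set[str] = field(default_factory=set)


def unlinked_pairs(genes: List[WBGeneRecord],
                   entries: List[UniProtRecord]) -> List[Tuple[str, str]]:
    """(WBGene ID, UniProt accession) pairs where neither side cross-references
    the other, matched on WormBase sequence name == UniProt ORF name."""
    # Index UniProt entries that lack a WBGene cross-reference by ORF name.
    by_orf: Dict[str, List[str]] = {}
    for entry in entries:
        if not entry.wbgene_ids:
            for name in entry.orf_names:
                by_orf.setdefault(name, []).append(entry.accession)
    # Look up WormBase genes that have no UniProt ID.
    pairs = [(gene.wbgene_id, acc)
             for gene in genes if not gene.uniprot_ids
             for acc in by_orf.get(gene.sequence_name, [])]
    return sorted(pairs)


# Placeholder identifiers for illustration only.
genes = [WBGeneRecord("WBGene_A", "ABC1.1"),
         WBGeneRecord("WBGene_B", "ABC1.2", {"P_B"})]
entries = [UniProtRecord("Q_A", {"ABC1.1"}),
           UniProtRecord("P_B", {"ABC1.2"}, {"WBGene_B"})]
print(unlinked_pairs(genes, entries))
```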

January 18, 2024

The OA was highlighting different names when logging in; now fixed on staging

January 11, 2024

Duplicate function in OA was not working when using special characters. Valerio debugged and is now fixed.
- Curators should make sure that, when pasting special characters, the duplicate function works
The OA highlights different names when logging in; Valerio will debug and check which IP address he sees
- If you want to bookmark an OA url for your datatype and user, log on once, and bookmark that page (separately for prod and dev)
Chris tested the phenotype form on staging and production, and the data are still going to tazendra
- Chris will check with Paulo. Once it is resolved we need to take everything that is on tazendra and put it on the cloud with different PGIDs
- Raymond: simply set up forwarding at our end?
AI working group: Valerio is setting up a new OpenAI account (paid ChatGPT-4 membership). We can also use Microsoft Edge Copilot (temporary?)
Chris is getting ready to deploy the 7.0.0 public release on February 7th. Carol wanted to push out monthly releases. This release will include WS291; the next several releases will also use WS291 until WS292 is available.
Valerio would like to use an alliancegenome.org email address for the OpenAI account
New alliance drive: https://drive.google.com/drive/folders/0AFkMHZOEQxolUk9PVA
- note: please move shared files that you own to the new Alliance Google Drive. For instructions, see the video and SOP that Chris Mungall sent: https://agr-jira.atlassian.net/browse/SCRUM-925?focusedCommentId=40674
Alliance logo and 50-word description for TAGC: Wen will talk to the outreach WG
Name server: Manuel is working on this. Daniela and Karen will reach out to him and let him know that, down the road, micropublication would like to use the name server API to generate IDs in bulk
Karen asking about some erroneous IDs used in the name server. Stavros says that this is not a big deal because the "reason" is not populating the name server
It would be good to be able to have a form to capture additional fields for strains and alleles (see meeting minutes August 31st 2023. https://wiki.wormbase.org/index.php/WormBase-Caltech_Weekly_Calls_2023#August_31st.2C_2023). This may happen after Manuel is done with the authentication.
Michael: primary flag with the Alliance. Kimberly talked about this with the blue team. They will start bringing the flag over for all papers and fix the remaining 271 items later.

January 4, 2024

ACKnowledge pipeline help desk question:
- Help Desk: Question about Author Curation to Knowledgebase (Zeng Wanxin) [Thu 12/14/2023 5:48 AM]
Citace upload, current deadline: Tuesday January 9th
- All processes (dumps, etc.) will happen on the cloud machine
- Curators need to deposit their files in the appropriate locations for Wen
Micropublication pipeline
- Ticketing system confusion
- Karen and Kimberly paper ID pipeline; may need sorting out of logistics

Difference between revisions of "WormBase-Caltech Weekly Calls"

@@ Line 11: / Line 11: @@
 [[WormBase-Caltech_Weekly_Calls_2014|2014 Meetings]]
+[[WormBase-Caltech_Weekly_Calls_2015|2015 Meetings]]
-= 2015 Meetings =
+[[WormBase-Caltech_Weekly_Calls_2016|2016 Meetings]]
-[[WormBase-Caltech_Weekly_Calls_January_2015|January]]
+[[WormBase-Caltech_Weekly_Calls_2017|2017 Meetings]]
-[[WormBase-Caltech_Weekly_Calls_February_2015|February]]
+[[WormBase-Caltech_Weekly_Calls_2018|2018 Meetings]]
-[[WormBase-Caltech_Weekly_Calls_March_2015|March]]
+[[WormBase-Caltech_Weekly_Calls_2019|2019 Meetings]]
-[[WormBase-Caltech_Weekly_Calls_April_2015|April]]
+[[WormBase-Caltech_Weekly_Calls_2020|2020 Meetings]]
-[[WormBase-Caltech_Weekly_Calls_May_2015|May]]
+[[WormBase-Caltech_Weekly_Calls_2021|2021 Meetings]]
-[[WormBase-Caltech_Weekly_Calls_June_2015|June]]
+[[WormBase-Caltech_Weekly_Calls_2022|2022 Meetings]]
+[[WormBase-Caltech_Weekly_Calls_2023|2023 Meetings]]
-== July 2, 2015 ==
+==April 18th, 2024==
+*NNC pipeline being switched off locally and moving into the Alliance ABC.
-=== Discussion Topics from IWM ===
+==April 11th, 2024==
-* Explaining job posting options via the forum in new Worm Breeder's Gazette article
+*Caltech WS293 ace files ready for the upload
-* Display of CRISPR data
-** Alleles with multiple lesions (one name, many mutations), need to be curated and mapped
-* Ontology term enrichment analysis, using ontologies other than gene ontology
-** Discussed on GO call yesterday; we can/should follow up with Paul Thomas
-** Would be good to have a single central tool/resource for enrichment analysis
-** PantherDB vs DAVID
-=== WormBook Chapters ===
+==April 4th, 2024==
-* Paul S will review over next week and will provide feedback
+* Continued discussion on sustainability
+* CZI, single cell RNAseq for Alliance -> anything happening will be few months down the road
+** Data is still going to SPELL and enrichment analysis
+** Peter Roy asking about expression profile of a condition and find similar expression profiles (SPELL like analysis) but SPELL cannot currently deal with scRNAseq data. Wen says it is possible (regarding each cell group as an experiment). Can try loading the data into SPELL. Does it improve the function of SPELL? Only 5-10 datasets. These data are a bit different from bulk RNAseq.
+* Textpresso: good to have a presentation for other MODs to show Textpresso capabilities? Yes. Maybe during sprint review
+* Michael's presentation on LLMs - Named Entity Recognition (NER)
-=== Outreach ===
+==March 14, 2024==
-* Sending out e-mails to all labs/PIs reminding about new data forms
-* Could also do more personalized outreach to a smaller subset of PI's/labs
-** Could focus on PIs not at the IWM
-=== Anatomy ===
+=== TAGC debrief ===
-* Embryonic development, cell division timing
-** Sulston timing
-** Waterston datasets
-** Zhao cell lineage timing datasets
-** Bao lab?
-* New EM reconstructions from David Hall, Scott Emmons, etc.
-* Neuronal connectivity, collaborative database with Scott Emmons and colleagues
-=== Citace upload ===
+==February 22, 2024==
-* Curators submit data to Wen on Tuesday, July 28th
-=== Taking over Gene Orienteer ===
+===NER with LLMs===
-* Xiaodong and Sibyl working on
-=== RNASeq data ===
+* Wrote scripts and configured an LLM for Named Entity Recognition. Trained an LLM on gene names and diseases. Works well so far (F1 ~ 98%, Accuracy ~ 99.9%)
-* Gary Williams only using high quality data, taking care of all curation (including meta data)
-* Public archive of rejected datasets?
-=== WormGuides ===
+* Is this similar to the FlyBase system? Recording of presentation  https://drive.google.com/drive/folders/1S4kZidL7gvBH6SjF4IQujyReVVRf2cOK
-* Bill Mohler et al working on desktop application
+* Textpresso server is kaput. Services need to be transferred onto Alliance servers.
-== July 9, 2015 ==
+* There are features on Textpresso, such as link to PDF, that are desirable to curators but should be blocked from public access.
-=== Expression Pattern ===
+* Alliance curation status form development needs use cases. ref https://wiki.wormbase.org/index.php/WormBase-Caltech_Weekly_Calls#February_15.2C_2024
-* Certain/uncertain qualifiers not annotated before some date
-* ~3,000 ?Expr_pattern objects without that annotation/tag
-* Daniela work on bringing up to date, hopefully won't take long
-=== Expression Clusters to Anatomy & Life Stage annotations ===
-* Many large scale datasets with tissue-specific expression data
-* Much of what is in SPELL is not annotated to ?Anatomy or ?Life_stage terms
-* A goal: make expression data queryable via ?Anatomy terms/pages
-* Wen will make the model change proposal
-* We may not want to show explicitly in widget
-* There is a need for a condensed display of expression data (per gene)
-* Some datasets, like the EPIC data, explicitly mention each embryonic cell name
-* Need for a condensed ontology browser per gene/anatomy and gene/life stage
-=== Proteomic analysis ===
-* Encyclopedia of Proteomic Dynamics, contacted Wen to share data
-* Wen will meet/discuss with group soon to determine what the goals are
-* It isn't clear what format the data has
-* Should include Gary Williams on discussions as he already processes Mass Spectrometry data
-=== External Databases ===
+==February 15, 2024==
-* To what extent can we take care of the data and display of other lab/publication databases
-* Many authors want to share and make links to their database/website via WormBase
-* What is the best way to handle large scale dataset sharing requests that don't necessarily (for the time being) fit our data model
-* We can take advantage of the "External Links" display on WBPaper pages to link out to the external databases affiliated with the paper, including a link to our FTP site with shared data files, maybe?
-* At least a stop gap measure until we can properly model the data
-=== Cis-regulatory site nomenclature? ===
+=== Literature Migration to the Alliance ABC ===
-* Barbara Meyer's lab published many "rex" (Recruitment Elements on X) sites, numbered sequentially
+==== Use Cases for Searches and Validation in the ABC (or, what are your common actions in the curation status form)? ====
-* Tim Schedl wondering about others' thoughts/opinions on how to, possibly, standardize the names of cis-regulatory elements
+===== Find papers with a high confidence NN classification for a given topic that have also been flagged positive by an author in a community curation pipeline and that haven’t been curated yet for that topic =====
-* Could be like gene names, without dash, e.g. "rex1", "rex2"
+*Facet for topic
-* We may want to try "WBsf-" prefix, on all element names like "WBsf-rex1", although may be only used in-house
+*Facet for automatic assertion
+**neural network method
+*Facet for confidence level
+**High
+*Facet for manual assertion
+**author assertion
+***ACKnowledge method
+**professional biocurator assertion
+***curation tools method - NULL
-=== Phenotypes ===
+===== Manually validate paper - topic flags without curating =====
-* Were there any conclusions about phenotype lookup from the Allele-Phenotype form?
+*Facet for topic
-* Chris spoke with Harald Hutter and others at the meeting about how to improve the lookup for phenotypes
+*Facet for manual assertion
-* Would be good to provide an explicit option to see phenotypes of related (or allele-affiliated) genes, perhaps by shared GO-term annotation
+**professional biocurator assertion
-* Need to think more on how to best compress display of phenotypes on gene pages as well
+***ABC - no data
-* We do already provide links to the Variation and Gene pages (with Phenotypes displayed) in the term information box of the form
+===== View all topic and entity flags for a given paper and validate, if needed =====
+* Search ABC with paper identifier
+* Migrate to Topic and Entity Editor
+* View all associated data
+* Manually validate flags, if needed
-== July 16, 2015 ==
+=== PDF Storage ===
+* At the Alliance PDFs will be stored in Amazon s3
+* We are not planning to formally store back-up copies elsewhere
+* Is this okay with everyone?
-=== Anatomy term page expression ===
+==February 8, 2024==
-* Raymond and Juancarlos are working on a display of genes that may be exclusively expressed in that anatomy object
+* TAGC
+** Prominent announcement on the Alliance home page?
-=== Construct/Transgene curation ===
+* Fixed login on dockerized system (dev). Can everybody test their forms?
-* Karen trying to make the curation of constructs & transgenes easier
-* May consider merging the transgene and construct OA's
-* Possibly add a construct/transgene request functionality in other OA's
-** Would those need multiple input fields?
-** Karen would take care of the details
-=== Molecule model ===
+==February 1, 2024==
-* Exogenous/endogenous tags issue
+* Paul will ask Natalia to take care of pending reimbursements
-* Scraping data from external chemical databases versus adding biologically relevant data from papers
+* Dockerized system slow pages (OA and FPKMMine). Will monitor these pages in the future. Will look for timeouts in the nginx logs.
-* We pull data from, e.g. CHEBI, but not all molecules fall under their purview, e.g. proteins
-=== Micropublication ===
+==January 25, 2024==
-* Promotion and outreach
-* Micropublications discoverable in PubMed?
-* Publisher = WormBase? Caltech?
-* Minimal standards for publication?
+=== Curator Info on Curation Forms ===
+* Saving curator info using cookies in dockerized forms. Can we deploy to prod?
-== July 23, 2015 ==
+=== ACKnowledge Author Request - WBPaper00066091 ===
+* I am more than willing to assist; however, the task exceeds the capabilities of the normal flagging process.
-=== Worm model for autism ===
+* The paper conducts an analysis of natural variations within 48 wild isolates. To enhance the reliability of the variant set, I utilized the latest variant calling methods along with a custom filtering approach. The resulting dataset comprises 1,957,683 unique variants identified using Clair3. Additionally, Sniffles2 was used to identify indels of >30 bp, which numbered in the thousands to tens of thousands for most wild isolates. It is worth noting that variants identified with Sniffles2 have less reliable nucleotide positions in the genome.
-* Would want to take human variations implicated in autism; look for orthologous genes in C. elegans/nematodes and find/make synonymous mutations
-* Prioritize based on worm phenotypes
-* Generally applies to human disease variants
-=== Database Migration ===
+* I am reaching out to inquire whether WormBase would be interested in incorporating this dataset. An argument in favor is the higher quality of my data. However, I am mindful of the potential substantial effort involved for WormBase, and it is unclear whether this aligns with your priorities.
-* Thomas Down leaving WormBase in September
-* Moving ahead with Datomic
-* Good starting use-case for Datomic is querying Datomic-version of GeneACE
-* Need to make sure documentation for migration to Datomic is available and comprehensible
-* Point-people at each site: Sibyl @ OICR, Juancarlos @ Caltech
-* Now need to work out the mechanics of curating into Datomic
-=== WormBase ParaSite ===
+* Should WormBase decide to use my variant data set, I am more than willing to offer my assistance.
-* Reciprocal searches (WB <-> PS) are working well
-=== Microarray datasets & modSeek ===
+=== Update on NN Classification via the Alliance ===
-* Some earlier datasets were re-processed (log-transformed, or re-annotated into original replicates instead of averaged results)
+* Use of primary/not primary/not designated flag to filter papers
-* Need to try out different methods of processing raw-data (WB usually only takes in processed data)
+* Secondary filter on papers with at least C. elegans as species
-* One pipeline can feed data into SPELL and modSeek
+* Finalize sources (i.e. evidence) for entity and topic tags on papers
-* It's difficult to establish/determine gold standards for assessing process performance
+* Next NN classification scheduled for ~March
-=== WormBook chapter reviewers ===
+* We decided to process all papers (even non-elegans species) and have filters on species after processing.
-* Send reviewer suggestions to Paul ASAP
+* NNC html pages will show NNC values together with species.
+* Show all C. elegans papers first and other species in a separate bin.
-=== C. elegans proteome in UniProt ===
+=== Travel Reimbursements ===
-* Not a complete correspondence between WormBase and UniProt
+* Still waiting on October travel reimbursement (Kimberly)
-* Cases: UniProt has entry for a protein that differs by one or two amino acids from WormBase
+* Still waiting on September and October travel reimbursements (Wen)
-** Made from translations of what cDNAs etc. have been submitted
-** Partial data, e.g. partial cDNAs translated
-* Anything we can do to achieve greater consistency?
-* Protein data sets are important
-* Hinxton can use discrepancies as a flag to check on the gene/protein models
-* Would be good to have more reciprocal linkage between UniProt and WormBase
-* AVR-15: UniProt has two additional entries compared to WormBase, differing in only 1 or 2 amino acids
-* Should we pick up different entries from UniProt and store/display the data; how to reconcile?
-* Possible use case: enter a UniProt ID into the BLAST/BLAT tool to identify WormBase matches
-=== Gene Orienteer Data ===
+=== UniProt ===
-* Sibyl and Xiaodong looking at data and scripts from Gene Orienteer
+* Jae found some genes without uniProt IDs, but the genes are there on uniProt but without WBGene IDs.
+* Wen reached to Stavros and Chris to investigate WormBase and AGR angles.
+* Stavros escalates the issue on Hinxton Standup.
+* Mark checks Build scripts and WS291 results. After that, he contacted UniProt and he's working with them to figure this out.
-=== Precanned queries for exclusive expression ===
+==January 18, 2024==
-* Raymond & Juancarlos working on final details
+* OA showing different names highlighted when logging in the OA, now fixed on staging
-* Intent is to display genes that may be specifically/exclusively expressed in e.g. an anatomy term
-=== Embryonic developmental timing ===
-* Sulston, Murray timing data sets for wild type embryonic cell division timing
-* Mutant data sets are coming in as well
-=== Genetic Interaction Ontology (GIO) ===
+==January 11, 2024==
-* Latest version of the GIO complete
+* Duplicate function in OA was not working when using special characters. Valerio debugged and is now fixed.
-* Juancarlos and Chris built a "genetic interaction calculator" to determine interaction types from quantitative phenotype inequalities
+** Curators should make sure that, when pasting special characters, the duplicate function works
-** http://mangolassi.caltech.edu/~azurebrd/cgi-bin/forms/gi_calculator.cgi
+* OA showing different names highlighted when logging in the OA, Valerio will debug and check what IP address he sees
-* Sending out to other MODs, etc.
+** If you want to bookmark an OA url for your datatype and user, log on once, and bookmark that page (separately for prod and dev)
-* Seems that although there is buy in conceptually, most curators can't afford the time for such detailed curation
+* Chris tested on staging and production the phenotype form and the data are still going to tazendra
+** Chris will check with Paulo. Once it is resolved we need to take everything that is on tazendra and put it on the cloud with different PGIDs
+** Raymond: simply set up forwarding at our end?
+* AI working group: Valerio is setting up a new account for open AI -paid membership for ChatGPT4. We can also use Microsoft Edge copilot (temporary?)
+* Chris getting ready to deploy a 7.0.0. public release - February 7th. Carol wanted to push out monthly releases. This will include WS291. For subsequent releases the next several releases will be WS 291 until WS292 is available.
+* Valerio would like to use an alliancegenome.org email address for the openAI account
+* New alliance drive: https://drive.google.com/drive/folders/0AFkMHZOEQxolUk9PVA
+** note: please move shared files that you own to new Alliance Google Drive.  Here is the link to the information that Chris Mungall sent:  For more instructions see the video and SOP here:https://agr-jira.atlassian.net/browse/SCRUM-925?focusedCommentId=40674
+* Alliance logo and 50 word description for TAGC> Wen will talk to the outreach WG
+* Name server. Manuel working on this, Daniela and Karen will reach out to him and let him know that down the road micropublication would like to use the name server API to generate IDs in bulk
+* Karen asking about some erroneous IDs used in the name server. Stavros says that this is not a big deal because the "reason" is not populating the name server
+* It would be good to be able to have a form to capture additional fields for strains and alleles (see meeting minutes August 31st 2023. https://wiki.wormbase.org/index.php/WormBase-Caltech_Weekly_Calls_2023#August_31st.2C_2023). This may happen after Manuel is done with the authentication.
+* Michael: primary flag with Alliance. Kimberly talked about this with the blue team. They will start bringing that over all papers and fix the remaining 271 items later.
-=== Phenotype (ontology) display ===
+==January 4, 2024==
-* Problems with display of phenotypes (and other annotations) on WormBase, as pointed out by several people at the IWM
+* ACKnowledge pipeline help desk question:
-* Karen would like to start creating allele concise descriptions
+** Help Desk: Question about Author Curation to Knowledgebase (Zeng Wanxin) [Thu 12/14/2023 5:48 AM]
-* We need compact, intelligently ordered annotation lists, not just alphabetical lists of ontology annotations
+* Citace upload, current deadline: Tuesday January 9th
-* It would be good to show ancestors for relatedness and order
+** All processes (dumps, etc.) will happen on the cloud machine
-* Chris working on Python script to display all annotations in the context of the entire ontology
+** Curators need to deposit their files in the appropriate locations for Wen
-* We will need to see if this approach is feasible/beneficial
+* Micropublication pipeline
+** Ticketing system confusion
-=== PATO-style EQ (Entity-Quality) phenotype annotations ===
+** Karen and Kimberly paper ID pipeline; may need sorting out of logistics
-* It is clear that some phenotype annotations require details, e.g. "drug sensitivity" annotations should have the drug involved
-** This drug/molecule annotation should be present in the details if not directly in the term itself
-* Raises the issue of a number of cases where we need PATO-style EQ annotations, not just explicit phenotype terms for all possible scenarios
-* This would be helpful in annotating embryonic timing and identity phenotype datasets
-== July 30, 2015 ==
-=== Wen Chen helped Wen Chen ===
-* Wen Chen (lab) has list of genes to analyze
-* Wen Chen (WB) helped process the list
-* Would be good to have a simple CGI to process a list of genes in a variety of ways
-** http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/fraqmine.cgi
-** For GeneTissueLifeStage and GeneConciseDescription more datatypes easily slotted in if curator makes a file
-* Is this redundant with WormMine?
-** Not for data that doesn't exist (in WormMine) yet; more agile: could be up and running within a matter of days
-=== Interconnections between WormBase and FlyBase ===
-* We could create more inter-connectivity between the two databases
-* Sharing concise descriptions of genes
-* Would be good for FlyBase and WB curators (Xiaodong?) to talk about where the links should exist at each site
-== August 6, 2015 ==
-=== WormMine ===
-*prioritize new data types into WormMine
-**RNAi phenotype, interactions, human disease...
-**WormMine wiki page: http://wiki.wormbase.org/index.php/WormMine
-=== WormMart machine ===
-*Wen wants to use the machine when WormMart retires
-=== UniProt/wormbase gene class ===
-*need to talk to UniProt C.elegans curator
-=== Raymond, Chris and Juancarlos are working on phenotype viewer ===
-=== James: list of genes, enrich in what tissues ===
-*python code
-*biotype ontology, tissue expression from postgres as input
-== August 13, 2015 ==
-=== Phenotype term annotation summary graph ===
-Goal: Provides an ontology-relationship-aware summary view of a gene's phenotype annotations.
-Prototype link
-aex-3 (fewer phenotypes)
-existing phenotype widget <http://www.wormbase.org/species/c_elegans/gene/WBGene00000086#-b-3>
-summary graph <http://131.215.12.204/~azurebrd/cgi-bin/amigo.cgi?action=annotSummaryGraph&focusTermId=WBGene00000086>
-daf-2 (lots of phenotypes)
-phenotype widget <http://www.wormbase.org/species/c_elegans/gene/WBGene00000898#-b-3>
-summary <http://131.215.12.204/~azurebrd/cgi-bin/amigo.cgi?action=annotSummaryGraph&focusTermId=WBGene00000898>
- Proposed development procedure:
-* standalone prototyping, commenting and improvements within the group.
-* implementation as a widget on dev site (juancarlos.wormbase.org), more testing and soliciting comments from selected end users.
-* committing to main site for general use.
- Outline of graph processing:
-To gather information:
-* WOBr query to collect all phenotypes annotated to the gene of interest.
-* WOBr query to collect all transitive relationships of the phenotypes from (1) towards the ontology root.
-To simplify and to control graph size:
-* Remove all nodes (phenotype terms) that are neither directly annotated nor at branching points where two branches of annotations merge (LCA, lowest common ancestor, if you will).
-* Scale node size according to annotation count (includes inferred annotations).
-* Limit appearance of label to nodes above a given size (roughly big enough to hold term name).
-* Show annotation counts in mouse-over bubble, add hyperlink to term pages to each node
-=== International Biocuration Conference ===
-Propose to submit paper on Community Curation
-* Mary Ann happy to lead.
-* Daniela on board.
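The graph-processing outline from the August 13, 2015 entry (collect annotated phenotypes and their ancestors, keep directly annotated terms and branching points, size nodes by annotation count including inferred annotations) can be sketched on a toy ontology. This is a minimal sketch, not the WOBr/amigo.cgi implementation; the parent map and term names are hypothetical, and "branching point" is read as an ancestor reached from two or more distinct child terms on annotated paths:

```python
from collections import Counter
from typing import Dict, List, Set


def summary_nodes(direct: Dict[str, int],
                  parents: Dict[str, List[str]]) -> Dict[str, int]:
    """Simplified phenotype graph: {term: annotation count}.

    `direct` maps directly annotated terms to annotation counts; `parents`
    maps each term to its parent terms. Counts include inferred annotations
    (each annotation also counts toward every ancestor). Kept nodes are the
    directly annotated terms plus branching points.
    """
    counts: Counter = Counter()
    children_on_paths: Dict[str, Set[str]] = {}
    for term, n in direct.items():
        counts[term] += n
        reached: Set[str] = set()
        stack = [term]
        while stack:  # walk every path from the annotated term towards the root
            child = stack.pop()
            for parent in parents.get(child, []):
                children_on_paths.setdefault(parent, set()).add(child)
                if parent not in reached:
                    reached.add(parent)
                    stack.append(parent)
        for ancestor in reached:  # add this term's count once to each distinct ancestor
            counts[ancestor] += n
    branch_points = {t for t, kids in children_on_paths.items() if len(kids) >= 2}
    return {t: counts[t] for t in sorted(set(direct) | branch_points)}


# Toy ontology with abstract term names (not real WBPhenotype terms).
parents = {"A": ["root"], "B": ["root"], "A1": ["A"], "A2": ["A"],
           "B1": ["B"], "B1a": ["B1"]}
direct = {"A1": 2, "A2": 1, "B1a": 4}
print(summary_nodes(direct, parents))
```

B and B1 drop out because they sit on a single unbranched chain above B1a; the returned counts could then drive node size and label visibility, as in the remaining outline steps.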