WormBase-Caltech Weekly Calls October 2020
From WormBaseWiki
October 1, 2020
Gene association file formats on FTP
- For example, current production release ONTOLOGY directory: ftp://ftp.wormbase.org/pub/wormbase/releases/current-production-release/ONTOLOGY/
- Our association files use the extension "*.wb"; is this useful or necessary?
- Other than referring to GAF in the header, it isn't clear to users what the columns refer to or what the column headers should be
- We could add a README file and/or convert to the new GAF 2.2 format which would have a more expressive file header and possibly column headers(?)
- File headers could possibly link to the format specification page
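A minimal sketch of what a more expressive file header could look like, assuming we move to GAF 2.2. Lines starting with "!" are comment lines in GAF. The column names below follow the GAF spec as we understand it and should be checked against the specification page before use; the function name and the spec URL passed in the example are illustrative assumptions, not an existing WormBase script.

```python
# Hypothetical helper: build a commented GAF-style header that names
# every column and links to the format specification.
# Column names are taken from our reading of the GAF spec (assumption:
# verify against the official specification page before adopting).
GAF_COLUMNS = [
    "DB",
    "DB Object ID",
    "DB Object Symbol",
    "Qualifier",
    "GO ID",
    "DB:Reference",
    "Evidence Code",
    "With (or) From",
    "Aspect",
    "DB Object Name",
    "DB Object Synonym",
    "DB Object Type",
    "Taxon",
    "Date",
    "Assigned By",
    "Annotation Extension",
    "Gene Product Form ID",
]


def build_gaf_header(generated_by, date_generated, spec_url):
    """Return header text in which every line is a '!' comment line."""
    lines = [
        "!gaf-version: 2.2",
        f"!generated-by: {generated_by}",
        f"!date-generated: {date_generated}",
        f"!Format specification: {spec_url}",
        "!Columns:",
    ]
    # One comment line per column, numbered from 1 as in the spec.
    for number, name in enumerate(GAF_COLUMNS, start=1):
        lines.append(f"!  {number}: {name}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The URL here is an assumed location of the spec page; confirm it.
    print(build_gaf_header(
        "WormBase",
        "2020-10-01",
        "http://geneontology.org/docs/go-annotation-file-gaf-format-2.2/",
    ))
```

Because every header line starts with "!", existing parsers that already skip comment lines should be unaffected by the extra column listing.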
Phenotype association file idiosyncrasy
- As we've discussed previously, the phenotype association file we provide is inconsistent in how it lists (or doesn't list) references
- According to the GAF spec, column 6 is for reference and is required, whereas column 8 is "With (or) From" and is optional
- When we have a reference, the WBPaper ID is provided in column 6 and the WBVar ID or RNAi ID is provided in column 8
- However, when we have no reference (personal communication, e.g. from NBP allele submissions), the WBVar ID is instead put in column 6 (because we need something there), and column 8 is blank.
- This results in (1) column 6 having a mix of paper/reference IDs (good) and WBVar IDs (not good) and (2) WBVar IDs split between columns 6 and 8, which makes this file tedious to parse
- Proposed solution: Can we come up with some type of reference object ID to associate to the personal communications (or any annotations currently lacking a formal reference)?
- With the proposed solution, we can always have a reference ID in column 6 (the intended purpose of the column) and WBVar IDs for alleles can always remain consistently in column 8
- Proposal is to put WBPerson IDs in column 6 for personal communications. Chris & Karen will check if this will work.
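To illustrate the parsing burden described above, here is a rough sketch of the workaround a user currently needs: look in both column 6 and column 8 and sort IDs by type. It assumes tab-delimited rows and that IDs can be recognized by the substrings WBPaper, WBPerson, WBVar and WBRNAi followed by digits; any database prefix in front of them (e.g. "WB:") is ignored. The function names and the sample rows in the usage note are made up for illustration.

```python
import re

# Recognize the ID types discussed above, regardless of any prefix.
ID_PATTERN = re.compile(r"(?:WBPaper|WBPerson|WBVar|WBRNAi)\d+")


def extract_ids(field):
    """Return all recognized WormBase IDs found in one column value."""
    return ID_PATTERN.findall(field)


def parse_annotation(line):
    """Split one association row into reference IDs and allele/RNAi IDs.

    Column 6 (index 5) should hold references and column 8 (index 7)
    should hold WBVar/WBRNAi IDs, but WBVar IDs can currently appear
    in column 6 when there is no paper, so both columns are checked.
    """
    cols = line.rstrip("\n").split("\t")
    col6 = extract_ids(cols[5]) if len(cols) > 5 else []
    col8 = extract_ids(cols[7]) if len(cols) > 7 else []
    references = [i for i in col6 if i.startswith(("WBPaper", "WBPerson"))]
    entities = [i for i in col6 + col8 if i.startswith(("WBVar", "WBRNAi"))]
    return {"references": references, "entities": entities}


def iter_annotations(lines):
    """Yield parsed rows, skipping '!' comment lines and blank lines."""
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue
        yield parse_annotation(line)
```

For a made-up row with WBPaper00000001 in column 6 and WBVar00000001 in column 8, parse_annotation returns one reference and one entity; for a made-up personal-communication row with WBVar00000002 in column 6 and a blank column 8, it returns no references and one entity. If WBPerson IDs go into column 6 as proposed, the second case would gain a reference, and column 6 would no longer need to be scanned for WBVar IDs.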
Server space in Chen Building
- It looks like we will not have a dedicated space for server computers.
October 8, 2020
Webinar Announcement
- Here is the live registration site: http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/webinar.cgi
- Caltech Zoom allows up to 300 attendees.
Descriptions from GO-CAM models
- One suggestion for the Alliance is to create a description based on a GO-CAM model
- Could also micropublish some descriptions (semi-automated?)
- Can make curators authors of micropublications for GO-CAM models/pathways
Transcription Factors in WormBase
- WormBase has a ?Transcription_factor class that is currently underutilized
- Chris spoke with Gary Williams about the status as he has done much of the work on the class
- Because transcription factors can often be complexes, it was decided to create the ?Transcription_factor class rather than simply an extension of tags to the existing ?Gene class
- The class seems reasonably complete; it's important to note that some TFs are general transcription factors, not necessarily gene-specific or sequence-specific DNA-binding TFs; it will be good to make that distinction clear to users
- Chris has compiled a Google sheet to assess the class before Gary W. leaves WB in the next couple of weeks
- The Google sheet has several tabs/worksheets, including one for the ACEDB data model (and notes about usage of tags), a summary table of associated genes, bound sequence features, existence of other protein-DNA binding data, etc.
- It would be good to make TF binding info (per gene and globally) more accessible to our users, maybe via a new widget on gene pages (e.g. list incoming, regulating TFs and, for TF genes themselves, list potential target genes)
October 15, 2020
BioGRID data sharing
- Rose from BioGRID proposed that BioGRID curate high-throughput C. elegans interaction datasets, capture confidence scores when available, and make those annotations available to WormBase for regular ingest
- Will need to consider a few points:
- BioGRID doesn't curate protein-DNA interactions
- We don't yet know the turn-around timeline for BioGRID curation of worm datasets; WB may be able to curate them much sooner
- Chris and Jae will work with Rose et al. to coordinate HTP curation
Enriched genes
- Some genes are considered "enriched" for an expression cluster data set even if the enrichment was in comparison to another cell or tissue (not whole animal)
- We should reconsider the ?Expression_cluster model to make sure we can appropriately model and communicate enrichment or subtypes thereof
October 22, 2020
CHEBI
- Karen spoke to CHEBI personnel on Tuesday
- CHEBI only has ~2 curators to create new entities
- CHEBI had submitted a proposal to establish pipelines to process requests from MODs
- Chemical Translation Service (CTS)
- OxO = https://www.ebi.ac.uk/spot/oxo/search
Training Webinar
- Scheduled for tomorrow at 1pm Pacific/4pm Eastern
October 29, 2020
Overview Webinar debriefing
- What's Good
- What needs improvement
- Participant requests:
- A place to look for worm methods (a public, moderated wiki page?)
New alleles extraction pipeline
- current pipeline (on textpresso-dev) is sending data to Sanger RT system, which is being retired
- the plan is to build a new pipeline to send AFP-like alerts with new entities
- current pipeline reads allele data from GSA and gene lists from Sanger, but I (Valerio) need help from curators to understand how to get these data