WBConfCall 2018.07.19-Agenda and Minutes
Agenda and Minutes
SAB
To Do List:
- PomBase may be interested in using SObA.
From draft SAB report (to parse out as potential action items):
Q1: What new data types (and data) are coming soon?
It would be great to focus on data types that are unique and powerful in worm, including single cell transcriptomics/sequence analysis, neural circuits, metabolomics, proteomics, ncRNA orthologs.
- Single cell transcriptomics is likely to be a major growth area in the future. A major challenge in this field is to understand the relationship between cell transcriptome and cell type/anatomy/developmental stage. This is currently done using gene expression markers of known cell types, but there is no current database of this information. It would be great to organize this information such that it could be used by the single cell transcriptomics community. For human, the Human Cell Atlas community is developing many tools and databases in this area, such as the Cell Ontology (Richard H. Scheuermann), so it would be good to be aware of developments in the wider area.
- Reliable cell marker gene curation for single cell transcriptome data/FACS-transcriptome datasets. A gold-standard set of markers needs to be implemented.
- Design additional strategies to combine gene expression data with anatomy (visual presentation?).
- Neural circuit datasets need a host site that will be able to accommodate an increasing number of datasets, including those generated in various mutant backgrounds. One suggestion is to revisit possible strategies for working more closely with WormAtlas on this issue. [see also Q4]
- Metabolomics is an increasing area of concentration in the worm community and elsewhere, and a focus on how to treat this type of data will help provide a blueprint for the AGR.
- Proteomics: new types of datasets will need to be incorporated (BioID, APEX, cofractionation), perhaps as part of an interaction viewer.
- The interaction browser should have a link to Cytoscape/GeneMANIA for deeper interrogation of networks.
- Organized information on software/algorithms for dataset analyses.
- Promoter elements: Build better visualization of regulatory elements in promoters. Make it interactive, so that clicking on the cartoon of a TF binding site (for example, in a schematic of the promoter on the current gene model) gives access to the sequences of the elements, including exact genome coordinates.
Q2: What data are important to have complete?
- GO process terms for all C. elegans genes (currently only 9,000 proteins have process annotations; annotate those without data as ND to get an idea of the current status of data completion). [See also Q6 below, revisiting orthologue and gene annotation bullets].
- Tool ‘datasets’: e.g. a comprehensive promoter/driver line list for specific cell/neuron types.
- Update gene models/splicing variation with the large amount of RNA-seq data available. Explore methods for visual representation of transcript abundance.
Q3: Tradeoff of more curation versus more tools for analysis/display?
There was a perceived need for both more curation and more tools. Streamlining curation steps and strongly promoting self-reporting [Q7, below] could help relieve the burden of curation.
- Consider steps to enhance curation efficiency (data structure-based? Clusters of data-type-based curation, e.g. expression/anatomy vs GO/pathway by different curators for multiple papers).
- Improve and extend curation data submission forms to reduce the rate of false self-identification, e.g. for Diseases: provide specific categories to select from instead of just a click of ‘relevant to disease’.
Q4: How can we better serve our communities?
- Development of a platform for submission of custom data sets (e.g. neural network, metabolomics) from the community, to promote fast and efficient sharing of data. Define a data format for submission types and facilitate accompanying micropublications/macropublications. This could be implemented as part of the AGR, as there would presumably be wider interest from other MODs. It could provide a DOI, support a predefined set of data types (e.g. matrix, network, genome track) that would each have an associated web-based visualization tool, and include standard metadata for the community submission. All the submissions could be claimed as community-contributed annotations.
- Create a collaboration information center built on top of the WormBase people database, using social network analysis. For instance, you can (ideally automatically) identify communities working on the same disease, phenotype, or gene and make these available on the website, facilitating interactions between C. elegans labs and human geneticists/clinicians.
- Consider whether there are data types/sets that are old and should be archived to clean up the website and save researcher effort. Old assemblies? Early microarray or modENCODE datasets whose quality has been surpassed by more recent efforts?
- Conduct a survey of which datasets might be appropriate for ‘archive’ and ‘curation’ (data cleaning).
- It would be nice to have a tool for converting old gene identifiers to current IDs, so that old data sets can be used to mine new, updated data.
- Try to streamline curation activities, with more co-curation of disease and phenotype, regulation, interaction and GO, for example.
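The old-to-current gene identifier conversion suggested above could be sketched as follows. This is a minimal illustration only: the minutes do not describe how WormBase stores identifier history, so the `history` mapping (retired ID to its successor), the `live_ids` set, and all identifiers shown are hypothetical placeholders.

```python
def resolve_current_id(old_id, history, live_ids):
    """Follow successor links from old_id until a live identifier is reached.

    history  -- dict mapping a retired ID to the ID that replaced it (hypothetical format)
    live_ids -- set of identifiers that are currently valid
    Returns the live ID, or None if the chain dead-ends or loops.
    """
    seen = set()
    current = old_id
    while current not in live_ids:
        # Stop on an unknown ID or on a cycle in the history table.
        if current in seen or current not in history:
            return None
        seen.add(current)
        current = history[current]
    return current


# Placeholder data, not real WormBase identifiers.
example_history = {
    "OLD_A": "OLD_B",    # OLD_A was merged into OLD_B ...
    "OLD_B": "LIVE_1",   # ... which was later renamed LIVE_1
    "LOOP_X": "LOOP_Y",
    "LOOP_Y": "LOOP_X",  # malformed history: a cycle
}
example_live = {"LIVE_1", "LIVE_2"}

if __name__ == "__main__":
    for gene_id in ["OLD_A", "LIVE_2", "LOOP_X", "UNKNOWN"]:
        print(gene_id, "->", resolve_current_id(gene_id, example_history, example_live))
```

Following the chain, rather than doing a single lookup, covers identifiers that were merged or renamed more than once. A real tool would also need to handle split genes, where one old ID maps to several current ones.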
Q5: How can we do better outreach?
- The introduction of in-person tutorials led by WormBase personnel at regional meetings and combined group meetings is viewed as a key innovation in outreach. Use of these tutorial site visits should be continued (cost should be covered by the host institution), with 1-2 WormBase members providing training.
- Prominently advertise availability of and logistics for tutorial site visits on WormBase.
- Prepare more “how to” videos to accompany tools (especially for WormMine) on the website, possibly including one for paper curation. Consider placing links to these videos on the relevant page in addition to the collective “tools” page.
- Consider presentations at non-C. elegans meetings to increase use by human/vertebrate biologists.
Q6: How can we better relate C. elegans knowledge to human genetics and health?
- Emphasize the importance of ‘gene function/pathway’ models for diseases rather than over-focusing on the precise phenotype match.
- Better curation of gene information (many genes annotated as nematode-specific may not be nematode-specific).
- Revisit the ‘orthologue’ category. Reannotate genes based on homology searches using more powerful tools, such as Hidden Markov Models threaded to structural databases (e.g. Phyre2), to search for homology based on structural features rather than simply using multiple sequence alignment. Orthologues have been found using this method that were missed using PSI-BLAST, and more examples are likely to be buried in the genome.
- Use publications on disease-associated genes to prioritize WormBase ‘human disease’ curation.
- Provide curated lists of C. elegans human disease models, including genes engineered to carry disease-associated alleles identified in human patients.
Q7: How can we improve our community input to curation?
Strategies are needed to improve the response rate for curation requests.
- Stress the reasons/needs/benefits of community curation in outreach sessions at lab visits and regional meetings.
- Research what drives users to contribute, e.g. highlight the benefits that are driving contributions (use in dissemination for grants, giving recognition).
- Promote high-value community-curated data prominently.
- Prepopulate curation forms as much as possible, including the submitting author's name.
- Encourage authors to highlight/summarize the key information for curation.
- Ask/allow senior authors to approve/edit curation prior to its publication at WormBase.
- Consider contacting authors prior to publication for papers already deposited in bioRxiv.
- Provide a table for authors to fill in before submitting a paper (example: FlyBase, https://wiki.flybase.org/wiki/FlyBase:Author_Reagent_Table_(ART)), so the information will be collected in advance.
- Continue to apply pressure on journal publishers to adopt mandatory curation submission policies.
Q8: How to best take advantage of current flux at NIH with respect to data science and funding?
- Develop a novel, unifying vision of how human disease research can take advantage of MOD data, e.g. how would a future human MOD work and how would it make use of MOD data? What types of research queries would this support? How would specific types of disease researchers, e.g. in genetic risk of disease (human genetics), cancer genomics, or clinical genetics, use and benefit from such a future resource?
- Advocacy at the Alliance for all model systems/demonstrate leadership to push the field forward.
- Education on how best to use the databases for patient-oriented users. Consider presentations at NIH.
Q9: How should we proceed with our backend database migration, given the Alliance?
- Continue with the database migration. So far it has really been seamless from the user perspective, and other AGR members might come on board in future and adopt Datomic.
- Ensure as much as practical that duplication of work is reduced, maybe by including external review of development priorities and software design decisions (e.g. could another MOD developer in the AGR review?).
- Take it as an opportunity to consolidate/standardize data format.
Q10: How do we ensure that the C. elegans and broader worm communities are served as the Alliance proceeds?
- Revamp the WormBase front page. Use it to highlight the basic system and recent advances in the field, for advocacy to the broader scientific community. It could include news on recent papers, science, and videos, as a frequently updated home page (see examples such as FlyBase).
- Identify community-specific needs and make sure they continue to be added to the site (gene models, etc.).
- Continuously highlight the importance of the C. elegans model.
- Where possible (e.g. for different splice forms) incorporate confidence measures with data.
Help Desk
- Is it possible to programmatically (e.g. using the REST API) fetch gene sequences with the case preserved for exons (upper) and introns (lower) as per the sequence pop-up widget? [1]
- July 5 email from Dong Tian asking about daf-16 isoforms
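For the help desk question about exon/intron case, one possible workaround, if an API returns only the plain genomic sequence plus exon coordinates, is to re-apply the casing on the client side. The sketch below assumes exon coordinates are 1-based, inclusive, and relative to the start of the supplied sequence. The function name and the toy sequence are illustrative only, and no particular WormBase endpoint is implied.

```python
def case_by_exon(genomic_seq, exons):
    """Return genomic_seq with exon bases in upper case and all other bases in lower case.

    genomic_seq -- unspliced sequence as a string
    exons       -- iterable of (start, end) pairs, 1-based inclusive,
                   relative to the first base of genomic_seq (assumed convention)
    """
    # Start with everything lower case (introns, UTR handling is out of scope here).
    bases = list(genomic_seq.lower())
    for start, end in exons:
        if not (1 <= start <= end <= len(genomic_seq)):
            raise ValueError(f"exon ({start}, {end}) lies outside the sequence")
        # Convert 1-based inclusive coordinates to a Python slice and upper-case it.
        bases[start - 1:end] = genomic_seq[start - 1:end].upper()
    return "".join(bases)


if __name__ == "__main__":
    # Toy 19-base sequence: exon 1 = bases 1-6, intron = 7-16, exon 2 = 17-19.
    toy_seq = "atgcccgtaagtttagGGA"
    print(case_by_exon(toy_seq, [(1, 6), (17, 19)]))
```

For genes on the minus strand, the coordinates would need to be converted to the orientation of the returned sequence before calling the function.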