|
|
Line 25: |
Line 25: |
| [[WormBase-Caltech_Weekly_Calls_January_2018|January]] | | [[WormBase-Caltech_Weekly_Calls_January_2018|January]] |
| | | |
| + | [[WormBase-Caltech_Weekly_Calls_February_2018|February]] |
| | | |
− | == February 1, 2018 ==
| |
| | | |
− | === Automated gene descriptions - orthology ===
| |
− | * Some genes have human orthology mentioned in automated descriptions, even though DIOPT has not made the orthology call
| |
− | * WormBase uses EnsemblCompara and other methods (not an aggregate method like DIOPT)
| |
− | * Orthology synchrony is a challenge; WormBase and FlyBase may need to pay special attention to orthology calls and discrepancies
| |
− | * DIOPT is purely automated and does not consider other information about orthology evidence
| |
− | * We should be clear about how the orthology calls are made
| |
| | | |
− | === Next upload === | + | == March 1, 2018 == |
− | * Exact date unclear
| |
− | * Probably end of March
| |
− | | |
− | === SimpleMine issue ===
| |
− | * Redundant genes in input list are merged
| |
− | * Should SimpleMine provide an option to keep redundancies?
| |
− | ** Give option up front? Provide submission step to point out redundancies? Ask for choice?
| |
− | * We can default to show row-by-row correspondence, and display the number of redundant entries
| |
− | * Conclusion: Make an option for users to indicate if they want row-by-row correspondence or a merged list
| |
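The conclusion above (let users choose between row-by-row correspondence and a merged list, and report the number of redundant entries) can be sketched as follows. This is a minimal illustration only, not SimpleMine's actual code: the function name `prepare_gene_list` and the `keep_redundant` flag are hypothetical.

```python
from collections import Counter


def prepare_gene_list(genes, keep_redundant=True):
    """Return (rows, n_redundant) for a user-submitted gene list.

    keep_redundant=True  -> one output row per input row (row-by-row)
    keep_redundant=False -> duplicates merged, first-occurrence order kept
    Hypothetical helper; names and defaults are assumptions.
    """
    counts = Counter(genes)
    # Number of input rows that repeat an earlier entry.
    n_redundant = sum(c - 1 for c in counts.values())
    if keep_redundant:
        rows = list(genes)
    else:
        rows = list(dict.fromkeys(genes))  # order-preserving de-duplication
    return rows, n_redundant


submitted = ["lin-12", "unc-54", "lin-12", "daf-16", "unc-54"]
row_by_row, n_dup = prepare_gene_list(submitted, keep_redundant=True)
merged, _ = prepare_gene_list(submitted, keep_redundant=False)
```

Defaulting `keep_redundant` to `True` mirrors the suggestion to show row-by-row correspondence by default while still reporting how many entries were redundant.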
− | | |
− | === Cell type expression ===
| |
− | * Waterston paper
| |
− | * 40,000 random cells, clusters sequenced individually to a depth of 20,000 reads; ~1000 genes per cell; cluster data; make judgement call as to what cell types they likely are
| |
− | * For now, we can do a simple annotation: significantly expressed genes for each cell type
| |
− | * Supplemental table S5 for neurons
| |
− | * Maybe just ignore the hybrid calls like AQM/PVM, etc.
| |
− | * It may be good to isolate the single cell data from other expression data
| |
− | * We should annotate/capture the expression clusters
| |
− | * Would be good to be able to do enrichment analysis on the clusters; compare data sets
| |
− | * Data has not been placed in SPELL yet; Gary considered the data a work in progress
| |
− | * We can communicate with Waterston group; are they collecting more data?
| |
− | * Wen will take another look at the data
| |
− | * Gary W. concerned about the reported/assumed/inferred identity of the cells in the paper
| |
− | * Probably cannot curate to individual cells, but we can annotate to a higher level term
| |
− | * We want to annotate and display expression enrichment as well as presence/absence calls
| |
− | | |
− | | |
− | == February 8, 2018 ==
| |
− | | |
− | === Release schedule ===
| |
− | * Wen will ask Hinxton to update the published release schedule (for next data upload)
| |
− | | |
− | === New York Worm Meeting ===
| |
− | * Wen and Kimberly will present a WormBase tutorial on March 24
| |
− | * Wen communicated with Oliver Hobert; suggested topics:
| |
− | ** Multi-gene (batch) search tools
| |
− | ** How does literature info get into WormBase? Curation process?
| |
− | ** Should we discuss completeness?
| |
− | | |
− | === GO curation ===
| |
− | * New simple input form for Noctua, being developed at USC
| |
− | * Not very much GO curation happening at WB right now
| |
− | * Protein-2-GO pipeline
| |
− | * Do we have a good Phenotype-2-GO(Process) mapping pipeline? We have our old mappings; not very reliable; would need to spend more time expanding the worm phenotype ontology and GO to improve
| |
− | * Cellular component curation will come in from WB expression curation
| |
− | * Don't have pipeline for Interactions-2-GO
| |
− | * Textpresso Molecular Functions pipeline?
| |
− | * geneprod and catalyticact data types for molecular function pipeline
| |
− | * Textpresso can send molecular function annotations to Noctua
| |
− | * For high-level pathway curation, we should probably read WormBook chapters (or other reviews) and develop pathways (using non-experimental evidence codes)
| |
− | * We could potentially seed Noctua models from Reactome
| |
− | * We would like to have complete curation for major pathways for gene enrichment analysis
| |
− | * Roles of small molecules in Noctua models still being worked out
| |
− | | |
− | === Phenotype curation ===
| |
− | * Chris has had community curation pipeline on back burner while updating Wiki and dealing with AGR, WormMine, etc.
| |
− | * Will get back to it soon; will resend email requests for newer papers sent over a year ago
| |
− | | |
− | === Expression curation ===
| |
− | * Daniela getting back to expression curation after Micropublication stuff has quieted down
| |
− | | |
− | === Gene regulation curation ===
| |
− | * April came across dataset involving regulation of siRNAs that don't seem to have gene objects in WB
| |
− | * May need to instantiate genes for these?
| |
− | | |
− | === Physical interaction curation ===
| |
− | * SVM classification: do we flag as negative a paper that has protein interactions but none for C. elegans?
| |
− | * Can we generate a good SVM that only identifies WB-curatable papers?
| |
− | | |
− | === Disease curation ===
| |
− | * Now curating the specific genetic entities involved in a disease model
| |
− | * Will also capture environmental conditions, treatments (e.g. ameliorates, exacerbates)
| |
− | * Curation in line with AGR standards
| |
− | * Evidence code needed for assertions that an animal is a model of disease when the assertion is based on background knowledge and experimental evidence together
| |
− | * Evidence Code Ontology (ECO) is developing a new term to accommodate this
| |
− | * Disease curators can use new evidence code as well as any existing codes
| |
− | * Is there a definition of a "disease model"?
| |
− | * What are the minimal criteria for considering something a disease model?
| |
− | * WB and FB curators focus on cellular phenotype and relation to the disease
| |
− | | |
− | === Expression cluster curation ===
| |
− | * 27 papers in pipeline
| |
− | * Will then work on "single-cell" RNAseq
| |
− | ** Wen, Raymond, and David should discuss
| |
− | | |
− | === April and May Worm Meetings ===
| |
− | * Midwest and Colorado meetings
| |
− | * Wen submitting abstracts
| |
− | * Wen and Kimberly can write up abstract template for New York meeting and send around to be modified for future meetings
| |
− | | |
− | === WormBook ===
| |
− | * Published last version for legacy site
| |
− | | |
− | === Papers ===
| |
− | * Daniel requested 13 (older) papers from Caltech library through inter-library process
| |
− | * Received more than half as images; would need optical character recognition (OCR) for Textpresso purposes
| |
− | * What is the state of the art of OCR now? How good is it? Can we ask Caltech library for the service?
| |
− | * Are these high priority papers? Need to check to see if worth processing
| |
− | | |
− | === AGR ===
| |
− | * Disease working group setting up a face-to-face meeting
| |
− | * Variant working group may need a face-to-face meeting as well
| |
− | * Expression working group working out initial AGR site data display mockups
| |
− | * Interaction working group; we will want to incorporate miRNA/target interactions (RNA-RNA interactions); will look at miRBase
| |
− | | |
− | | |
− | == February 15, 2018 ==
| |
− | | |
− | === Model changes ===
| |
− | * Models freeze March 2nd
| |
− | * Will need to get model changes proposed and tested by then
| |
− | | |
− | === Sys admin of Tazendra/Mangolassi ===
| |
− | * Raymond will discuss with Juancarlos to centralize
| |
− | * Need good documentation for forms, tools, etc.
| |
− | * Will be a push to put all code for tools and forms on GitHub
| |
− | | |
− | === Tazendra forms, tools bug this week ===
| |
− | * There is a dependency on Mangolassi for some tools
| |
− | * Mangolassi went down and caused problems
| |
− | * Would be good to decouple the two machines
| |
− | | |
− | === AGR ===
| |
− | * May not get an AGR all-hands face-to-face meeting before the summer
| |
− | * Working groups can decide to have face-to-face meeting
| |
− | * People should speak up if they have interest in visiting other MODs/sites; can be arranged
| |
− | * Consider what grant proposals could come out of such meetings/visits
| |
− | * Currently no ontology working group, no anatomy working group
| |
− | * Could establish a preliminary working group; reach out to relevant people
| |
− | * Anatomy working group issues may come up in expression working group
| |
− | ** Daniela will keep Raymond updated on relevant issues that come up with the expression group
| |
− | | |
− | === Ontology Browser gene lists ===
| |
− | * Chris requesting change to gene list display from WOBr
| |
− | * https://github.com/WormBase/website/issues/6190
| |
− | * Should provide WBGene IDs, not just gene public names
| |
− | ** That was the original intent, but using WBGene IDs was, for some reason, causing issues when developing the tool; will need to revisit that issue to get WBGene IDs displayed
| |
− | | |
− | | |
− | == February 22, 2018 ==
| |
− | | |
− | === Making MOD data publicly available in a central location ===
| |
− | * Meets the NIH mandate for WB as a publicly-funded project and helps researchers get their data highlighted faster than waiting for the db build
| |
− | ** Would put in filters to avoid releasing sensitive data or incomplete curation annotations
| |
− | ** Would be good for the journal hyperlinking project, since it needs access to up-to-date data (see the hyperlinking project goals below)
| |
− | ** Central data repository for all data (MOD) files would be helpful to developers (and users)
| |
− | ** Does Caltech have an FTP site that could be used?
| |
− | | |
− | === Journal Hyperlinking project goals ===
| |
− | * The hyperlinking project, in production since 2009, links bioentities in worm, yeast, and fly research articles to relevant databases. It requires the latest data from WormBase, SGD, and FlyBase, and could pull entities (names, IDs, synonyms) from a central repository
| |
− | ** MOD curators check link accuracy and check for missed links (needs ongoing FTE support)
| |
− | ** Supporting this project is not in the remit of any of the MODs, and the project is not sustainable without outside funding; hence the search for funding outside of WB
| |
− | ** Since inception, the project has linked bioentities in GSA papers (Genetics and G3)
| |
− | ** C. elegans genes, alleles, rearrangements, strains, clones, short phenotype names, and transgenes are linked in these papers
| |
− | ** Karen's (InSilico) grant goals are to
| |
− | *** expand the pipeline to other journals, specifically eLife (then to PLoS)
| |
− | *** expand to all AGR member MODs in addition to SGD and FlyBase; also bring in PomBase
| |
− | *** not planning on expanding linking beyond simple text recognition of known lexica and entities that follow a regex
| |
− | *** SBIR commercialization plan is to extend entity identification to commercial reagents and collect data for subscription-based access from biotech suppliers
| |
− | *** need data from Postgres that is not available in the geneace dump; could possibly just dump all Postgres data into one place, and Karen's developers could write scripts to process that data
| |
− | **** Juancarlos will set up a URL that can be used to access the data; will set up a cronjob to run every day at 8pm
| |
− | *** InSilico hyperlinks go through an InSilico page to embedded iframes of the MOD entity page
| |
− | **** allows trackability of link access; stats will be given to the MODs
| |
− | **** allows monitoring and resolution of links that go dead
| |
− | **** allows splash pages for silent links
| |
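The "simple text recognition of known lexica and entities that follow a regex" mentioned in the grant goals could look roughly like the sketch below. Everything here is an assumption for illustration: the lexicon contents, the placeholder identifiers, the allele pattern, and the function name `find_entities` are hypothetical and are not taken from the hyperlinking pipeline itself.

```python
import re

# Hypothetical lexicon: surface form -> (entity type, placeholder identifier).
# A real run would load names, IDs, and synonyms from the MOD data dumps.
LEXICON = {
    "daf-2": ("gene", "PLACEHOLDER-ID-1"),
    "unc-54": ("gene", "PLACEHOLDER-ID-2"),
}

# Illustrative pattern for allele-like tokens (1-3 lowercase letters + digits).
# This is an assumed pattern, not the project's actual regex.
ALLELE_PATTERN = re.compile(r"\b[a-z]{1,3}\d+\b")


def find_entities(text):
    """Return (start, matched_text, entity_type, identifier) tuples in text order."""
    hits = []
    for name, (kind, ident) in LEXICON.items():
        # Require that the name is not embedded in a longer word or identifier.
        pattern = r"(?<![\w-])" + re.escape(name) + r"(?![\w-])"
        for m in re.finditer(pattern, text):
            hits.append((m.start(), m.group(), kind, ident))
    for m in ALLELE_PATTERN.finditer(text):
        hits.append((m.start(), m.group(), "allele", None))
    return sorted(hits, key=lambda h: h[0])


sample = "daf-2(e1370) enhances the unc-54 phenotype."
entities = find_entities(sample)
```

Lexicon entries carry an identifier that a link can point to, while regex-only hits have none and would still need to be resolved against a database before linking.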
− | | |
− | === Alliance SAB meeting ===
| |
− | * SAB critical of:
| |
− | ** not being unified
| |
− | ** not being organized
| |
− | * Now everyone has committed
| |
− | * Concern still exists about autonomy of MODs
| |
− | ** Will each user community still be served effectively by the Alliance?
| |
− | * Organization is easier when all are committed
| |
− | * Maybe bring in a professional organizer/project manager (long term)
| |
− | * New aggressive timeline for progress
| |
− | ** April 23rd meeting; need to give material a week earlier
| |
− | ** Need a year-and-a-half plan; each working group will provide details
| |
− | * Only 2 full time Alliance staff; may need more on project; difficult for individuals to split time/effort
| |
− | * SAB member: curation involves expert decision making/analysis on issues, not just straightforward data acquisition
| |
− | * Maybe we would have better curation consistency if individual curators focused on particular topics; became experts for certain subject matter
| |
− | * Possibility to have Alliance all-hands call in Fall
| |
− | * Working groups can have face-to-face meetings; the travel/meeting budget is available until July 31st, when it resets
| |
− | | |
− | === Automated gene descriptions ===
| |
− | * Difficult to handle genes with high information content; many ontology term annotations
| |
− | * How do we simplify descriptions? Using higher-level terms, slim terms? Gets tricky
| |
− | | |
− | === Micropublications ===
| |
− | * Michael Elowitz tweeted out micropublication stuff
| |
− | * Received feedback on Twitter; worth looking at threads, comments
| |
− | | |
− | === Community curation plan ===
| |
− | * Alliance need for community curation
| |
− | * Micropublications are a bit of a pilot for community curation
| |
− | * Need shared curation/submission forms; fit shared data models?
| |
− | | |
− | === RNAi secondary targets ===
| |
− | * WormMine and WormBase gene/RNAi pages include secondary RNAi targets but WOBr and SObA do not
| |
− | * Should we include or exclude secondary targets? We want consistency across data sets
| |
− | * No gold standard RNAi target prediction algorithm
| |
− | * We should be transparent about primary/secondary status wherever we include secondary targets in display
| |
− | * Could be addressed with phenotype display proposal
| |
− | * Should probably remove secondary targets from bulk data sets
| |