Difference between revisions of "WormBase-Caltech Weekly Calls"

[[WormBase-Caltech_Weekly_Calls_January_2018|January]]
 
  
[[WormBase-Caltech_Weekly_Calls_February_2018|February]]
  
== February 1, 2018 ==
 
  
=== Automated gene descriptions - orthology ===
 
* Some genes have human orthology mentioned in their automated descriptions even though DIOPT does not make that orthology call

* WormBase uses EnsemblCompara and other methods (not an aggregate method like DIOPT)
 
* Orthology synchrony is a challenge; WormBase and FlyBase may need to pay special attention to orthology calls and discrepancies
 
* DIOPT is purely automated, does not consider other information about orthology evidence
 
* We should be clear about how the orthology calls are made
 
  
=== Next upload ===
* Exact date is unclear
 
* Probably end of March
 
 
 
=== SimpleMine issue ===
 
* Redundant genes in input list are merged
 
* Should SimpleMine provide an option to keep redundancies?
 
** Give option up front? Provide submission step to point out redundancies? Ask for choice?
 
* We can default to show row-by-row correspondence, and display the number of redundant entries
 
* Conclusion: Make an option for users to indicate if they want row-by-row correspondence or a merged list
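
The two output modes discussed above can be sketched roughly as follows. This is only an illustration, not SimpleMine's actual code: the function name <code>simplemine_rows</code>, the <code>mode</code> values, and the annotation strings are all hypothetical.

```python
from collections import Counter


def simplemine_rows(gene_list, annotations, mode="row_by_row"):
    """Return (rows, n_redundant) for an input gene list.

    mode="row_by_row": one output row per input row, duplicates kept.
    mode="merged": one output row per distinct gene, in first-seen order.
    n_redundant counts the input entries that repeat an earlier entry.
    """
    if mode not in ("row_by_row", "merged"):
        raise ValueError(f"unknown mode: {mode}")
    counts = Counter(gene_list)
    n_redundant = sum(c - 1 for c in counts.values())
    if mode == "merged":
        genes = list(dict.fromkeys(gene_list))  # de-duplicate, keep order
    else:
        genes = list(gene_list)
    rows = [(g, annotations.get(g, "not found")) for g in genes]
    return rows, n_redundant


# Hypothetical input: the annotation values are placeholders.
genes = ["daf-16", "unc-22", "daf-16"]
annotations = {"daf-16": "annotation A", "unc-22": "annotation B"}

rows, n_redundant = simplemine_rows(genes, annotations, mode="row_by_row")
merged_rows, _ = simplemine_rows(genes, annotations, mode="merged")
```

Reporting <code>n_redundant</code> in both modes would let the user see how many entries were redundant regardless of which display they chose.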
 
 
 
=== Cell type expression ===
 
* Waterston paper
 
* 40,000 random cells, clusters sequenced individually to a depth of 20,000 reads; ~1000 genes per cell; cluster data; make judgement call as to what cell types they likely are
 
* For now, we can do a simple annotation: significantly expressed genes for each cell type
 
* Supplemental table S5 for neurons
 
* Maybe just ignore the hybrid calls like AQM/PVM, etc.
 
* It may be good to isolate the single cell data from other expression data
 
* We should annotate/capture the expression clusters
 
* Would be good to be able to do enrichment analysis on the clusters; compare data sets
 
* Data has not been placed in SPELL yet; Gary considered the data a work in progress
 
* We can communicate with Waterston group; are they collecting more data?
 
* Wen will take another look at the data
 
* Gary W. concerned about the reported/assumed/inferred identity of the cells in the paper
 
* Probably cannot curate to individual cells, but we can annotate to a higher level term
 
* We want to annotate and display expression enrichment as well as presence/absence calls
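
One common way to do enrichment analysis on a cluster is a one-sided hypergeometric test. The sketch below is an assumption about how that could look, not a description of any existing WormBase tool; the function name and the example numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_annotated, n_cluster, n_overlap):
    """P(X >= n_overlap) when n_cluster genes are drawn without replacement
    from n_background genes, of which n_annotated carry the annotation."""
    upper = min(n_annotated, n_cluster)
    total = comb(n_background, n_cluster)
    tail = sum(
        comb(n_annotated, k) * comb(n_background - n_annotated, n_cluster - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy numbers: 10 background genes, 3 annotated to a cell type,
# a cluster of 2 genes, both of which are annotated.
p_value = hypergeom_enrichment_p(10, 3, 2, 2)
```

In practice the p-values would need multiple-testing correction across all annotation terms tested, which this sketch leaves out.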
 
 
 
 
 
== February 8, 2018 ==
 
 
 
=== Release schedule ===
 
* Wen will ask Hinxton to update the published release schedule (for next data upload)
 
 
 
=== New York Worm Meeting ===
 
* Wen and Kimberly will present a WormBase tutorial on March 24
 
* Wen communicated to Oliver Hobert; suggested topics:
 
** Multi-gene (batch) search tools
 
** How literature info gets into WormBase? Curation process?
 
** Should we discuss completeness?
 
 
 
=== GO curation ===
 
* New simple input form for Noctua, being developed at USC
 
* Not very much GO curation happening at WB right now
 
* Protein-2-GO pipeline
 
* Do we have a good Phenotype-2-GO(Process) mapping pipeline? We have our old mappings; not very reliable; would need to spend more time expanding the worm phenotype ontology and GO to improve
 
* Cellular component curation will come in from WB expression curation
 
* Don't have pipeline for Interactions-2-GO
 
* Textpresso Molecular Functions pipeline?
 
* geneprod and catalyticact data types for molecular function pipeline
 
* Textpresso can send molecular function annotations to Noctua
 
* For high-level pathway curation, we should probably read WormBook chapters (or other reviews) and develop pathways (using non-experimental evidence codes)
 
* We could potentially seed Noctua models from Reactome
 
* We would like to have complete curation for major pathways for gene enrichment analysis
 
* Roles of small molecules in Noctua models still being worked out
 
 
 
=== Phenotype curation ===
 
* Chris has had community curation pipeline on back burner while updating Wiki and dealing with AGR, WormMine, etc.
 
* Will get back to it soon; will resend the email requests for newer papers that were sent over a year ago
 
 
 
=== Expression curation ===
 
* Daniela getting back to expression curation after Micropublication stuff has quieted down
 
 
 
=== Gene regulation curation ===
 
* April came across dataset involving regulation of siRNAs that don't seem to have gene objects in WB
 
* May need to instantiate genes for these?
 
 
 
=== Physical interaction curation ===
 
* SVM classification: do we flag a paper as negative if it has protein interactions but none for C. elegans?
 
* Can we generate a good SVM that only identifies WB-curatable papers?
 
 
 
=== Disease curation ===
 
* Now curating the specific genetic entities involved in a disease model
 
* Will also capture environmental conditions, treatments (e.g. ameliorates, exacerbates)
 
* Curation in-line with AGR standards
 
* An evidence code is needed for assertions that an animal is a model of disease when the assertion is based on background knowledge and experimental evidence together

* Evidence Code Ontology (ECO) is developing a new term to accommodate this
 
* Disease curators can use new evidence code as well as any existing codes
 
* Is there a definition of a "disease model"?
 
* What are the minimal criteria for considering something a disease model?
 
* WB and FB curators focus on cellular phenotype and relation to the disease
 
 
 
=== Expression cluster curation ===
 
* 27 papers in pipeline
 
* Will then work on "single-cell" RNAseq
 
** Wen, Raymond, and David should discuss
 
 
 
=== April and May Worm Meetings ===
 
* Midwest and Colorado meetings
 
* Wen submitting abstracts
 
* Wen and Kimberly can write up abstract template for New York meeting and send around to be modified for future meetings
 
 
 
=== WormBook ===
 
* Published last version for legacy site
 
 
 
=== Papers ===
 
* Daniel requested 13 (older) papers from Caltech library through inter-library process
 
* Received more than half as images; would need optical character recognition (OCR) for Textpresso purposes
 
* What is the state of the art of OCR now? How good is it? Can we ask Caltech library for the service?
 
* Are these high priority papers? Need to check to see if worth processing
 
 
 
=== AGR ===
 
* Disease working group setting up a face-to-face meeting
 
* Variant working group may need a face-to-face meeting as well
 
* Expression working group working out initial AGR site data display mockups
 
* Interaction working group; we will want to incorporate miRNA/target interactions (RNA-RNA interactions); will look at miRBase
 
 
 
 
 
== February 15, 2018 ==
 
 
 
=== Model changes ===
 
* Models freeze March 2nd
 
* Will need to get model changes proposed and tested by then
 
 
 
=== Sys admin of Tazendra/Mangolassi ===
 
* Raymond will discuss with Juancarlos to centralize
 
* Need good documentation for forms, tools, etc.
 
* Will be a push to put all code for tools and forms on GitHub
 
 
 
=== Tazendra forms, tools bug this week ===
 
* There is a dependency on Mangolassi for some tools
 
* Mangolassi went down and caused problems
 
* Would be good to decouple the two machines
 
 
 
=== AGR ===
 
* May not get an AGR all-hands face-to-face meeting before the summer
 
* Working groups can decide to have face-to-face meeting
 
* People should speak up if they have interest in visiting other MODs/sites; can be arranged
 
* Consider what grant proposals could come out of such meetings/visits
 
* Currently no ontology working group, no anatomy working group
 
* Could establish a preliminary working group; reach out to relevant people
 
* Anatomy working group issues may come up in expression working group
 
** Daniela will keep Raymond updated on relevant issues that come up with the expression group
 
 
 
=== Ontology Browser gene lists ===
 
* Chris requesting change to gene list display from WOBr
 
* https://github.com/WormBase/website/issues/6190
 
* Should provide WBGene IDs, not just gene public names
 
** That was the original intent, but using WBGene IDs was, for some reason, causing issues when developing the tool; will need to revisit that issue to get WBGene IDs displayed
 
 
 
 
 
== February 22, 2018 ==
 
 
 
=== Making MOD data publicly available in a central location ===
 
* Meets the NIH mandate for WB as a publicly-funded project and helps researchers get their data highlighted faster than waiting for the db build
 
** Would put in filters to avoid releasing sensitive data or incomplete curation annotations

** Would be good for the journal hyperlinking project, since it needs access to up-to-date data (see more about the project below)
 
** Central data repository for all data (MOD) files would be helpful to developers (and users)
 
** Does Caltech have an FTP site that could be used?
 
 
 
=== Journal Hyperlinking project goals ===
 
* The hyperlinking project has been in production since 2009. It links bioentities in worm, yeast, and fly research articles to the relevant databases, and it requires the latest data from WormBase, SGD, and FlyBase; it could pull entities (names, IDs, synonyms) from a central repository
 
** MOD curators check link accuracy and check for missed links (needs ongoing FTE support)

** Supporting this project is not in the remit of any of the MODs, and the project is not sustainable without outside funding, hence the search for funding outside of WB

** Since its inception, the project has linked bioentities in GSA papers (Genetics and G3)

** C. elegans genes, alleles, rearrangements, strains, clones, short phenotype names, and transgenes are linked in these papers
 
** Karen's (InSilico) grant goals are to:

*** Expand the pipeline to other journals, specifically eLife (then PLoS)

*** Expand to all AGR member MODs in addition to SGD and FlyBase; also bring in PomBase

*** Not planning to expand linking beyond simple text recognition of known lexica and entities that follow a regex

*** SBIR commercialization plan is to extend entity identification to commercial reagents and collect data for subscription-based access from biotech suppliers

*** Need data from Postgres that is not available in the geneace dump. Could possibly just dump all Postgres data into one place; Karen's developers could write scripts to process that data

**** Juancarlos will set up a URL that can be used to access the data; will set it up on a cron job that runs every day at 8pm
 
*** InSilico hyperlinks go through an InSilico page to embedded iframes of the MOD entity page

**** Allows tracking of link access; stats will be given to the MODs

**** Allows monitoring and resolution of links that go dead

**** Allows splash pages for silent links
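
The "known lexica plus regex" recognition mentioned above could look roughly like the sketch below. It is not the InSilico pipeline: the lexicon contents and identifiers are hypothetical placeholders, and the <code>WBGene</code> pattern (prefix followed by eight digits) is an assumption about the identifier format.

```python
import re

# Hypothetical lexicon: surface form -> database identifier (placeholders).
LEXICON = {"daf-16": "GENE:0001", "unc-22": "GENE:0002"}

# Assumed identifier pattern: "WBGene" followed by exactly eight digits.
WBGENE_RE = re.compile(r"\bWBGene\d{8}\b")


def find_entities(text):
    """Return (start, end, surface, identifier) tuples sorted by position."""
    hits = []
    for name, ident in LEXICON.items():
        # Refuse matches embedded in longer tokens such as "daf-160".
        pattern = re.compile(r"(?<![\w-])" + re.escape(name) + r"(?![\w-])")
        for m in pattern.finditer(text):
            hits.append((m.start(), m.end(), m.group(), ident))
    for m in WBGENE_RE.finditer(text):
        hits.append((m.start(), m.end(), m.group(), m.group()))
    return sorted(hits)


sample = "Loss of daf-16 (WBGene00000912) suppresses daf-2; daf-160 is not linked."
entities = find_entities(sample)
```

The lookarounds keep a lexicon entry from matching inside a longer name, which is one of the link-accuracy problems curators would otherwise have to catch by hand.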
 
 
 
=== Alliance SAB meeting ===
 
* SAB critical of:
 
** not being unified
 
** not being organized
 
* Now everyone has committed
 
* Concern still exists about autonomy of MODs
 
** Will each user community still be served effectively by the Alliance?
 
* Organization is easier when all are committed
 
* Maybe bring in a professional organizer/project manager (long term)
 
* New aggressive timeline for progress
 
** April 23rd meeting; need to give material a week earlier
 
** Need year and a half plan; each working group will provide details
 
* Only 2 full time Alliance staff; may need more on project; difficult for individuals to split time/effort
 
* SAB member: curation involves expert decision making/analysis on issues, not just straight-forward data acquisition
 
* Maybe we would have better curation consistency if individual curators focused on particular topics; became experts for certain subject matter
 
* Possibility to have Alliance all-hands call in Fall
 
* Working groups can have face-to-face meetings; have travel/meeting budget until July 31st then resets
 
 
 
=== Automated gene descriptions ===
 
* Difficult to handle genes with high information content; many ontology term annotations
 
* How do we simplify descriptions? Using higher-level terms, slim terms? Gets tricky
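
One possible reading of "using higher-level terms, slim terms" is to roll each annotation up to the first slim term reached along each path to the root. The sketch below assumes that approach; the toy hierarchy, the slim set, and the function names are hypothetical and do not reflect the actual ontology structure or the description pipeline.

```python
def roll_up_to_slim(term, parents, slim_terms):
    """Return the first slim term reached along each upward path from term
    (or the term itself if it is already a slim term)."""
    found, seen, frontier = set(), set(), [term]
    while frontier:
        current = frontier.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in slim_terms:
            found.add(current)  # stop climbing along this path
        else:
            frontier.extend(parents.get(current, []))
    return found


def summarize(annotations, parents, slim_terms):
    """Collapse a list of specific terms into a sorted list of slim terms."""
    slim = set()
    for term in annotations:
        slim |= roll_up_to_slim(term, parents, slim_terms)
    return sorted(slim)


# Toy hierarchy (child -> parents); labels are illustrative only.
PARENTS = {
    "dauer larval development": ["larval development"],
    "larval development": ["development"],
    "insulin receptor signaling": ["signaling"],
    "determination of adult lifespan": ["aging"],
}
SLIM = {"development", "signaling", "aging"}

summary = summarize(
    ["dauer larval development", "insulin receptor signaling",
     "determination of adult lifespan"],
    PARENTS,
    SLIM,
)
```

The tricky part noted above remains: rolling up loses specificity, so a gene with many annotations may end up with a description that is short but uninformative.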
 
 
 
=== Micropublications ===
 
* Michael Elowitz tweeted out micropublication stuff
 
* Received feedback on Twitter; worth looking at threads, comments
 
 
 
=== Community curation plan ===
 
* Alliance need for community curation
 
* Micropublications are a bit of a pilot for community curation
 
* Need shared curation/submission forms; fit shared data models?
 
 
 
=== RNAi secondary targets ===
 
* WormMine and WormBase gene/RNAi pages include secondary RNAi targets but WOBr and SObA do not
 
* Should we include or exclude secondary targets? We want consistency across data sets
 
* No gold standard RNAi target prediction algorithm
 
* We should be transparent about primary/secondary status wherever we include secondary targets in display
 
* Could be addressed with phenotype display proposal
 
* Should probably remove secondary targets from bulk data sets
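
The proposal above (label primary/secondary status in displays, drop secondary targets from bulk data sets) can be sketched as follows. The record layout, field names, and identifiers are hypothetical and are not taken from the WormBase data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RnaiTarget:
    rnai_id: str   # placeholder RNAi experiment identifier
    gene: str      # gene public name
    status: str    # "primary" or "secondary"


def for_bulk_export(targets):
    """Keep only primary targets for bulk data sets."""
    return [t for t in targets if t.status == "primary"]


def for_display(targets):
    """Keep every target but state its primary/secondary status."""
    return [f"{t.gene} ({t.status} target of {t.rnai_id})" for t in targets]


targets = [
    RnaiTarget("RNAi-0001", "gene-1", "primary"),
    RnaiTarget("RNAi-0001", "gene-2", "secondary"),
]
bulk = for_bulk_export(targets)
labels = for_display(targets)
```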
 

Revision as of 16:33, 1 March 2018