WormBase-Caltech Weekly Calls
From WormBaseWiki
Revision as of 21:13, 30 August 2012
2012 Meetings
August 2, 2012
Grant updates
- Topics
- Diseases
- Worm Phenotype ontology: attempt to apply it to other species? Some of this has already been done (e.g. C. briggsae)
- What is the benefit of curating phenotypes in other species? Particularly useful for genes not found in C. elegans, for example
- Textpresso for nematodes
- >5000 papers now
- Complete set of papers by ~Labor Day
- Anatomy ontology
- Anatomy page is hub for data
- We should strive for user friendly display of information
- Cell functions, 10 or 100 highest/preferentially expressed genes, cell connections, cell signals
- Upcoming challenge - male/female/hermaphrodite divergence
- Anatomy with respect to life stage (e.g. life-stage-specific cells)
- Multiple species
- Uberon framework can be adapted for multiple (nematode) species
- Gene Function
- RNAi, Allele-phenotype, Transgene-phenotype etc.
- Interactions
- Integrated interaction model, genetic interaction ontology
- Can we estimate how many interactions are left to curate? Can use OA/Postgres, SVM, and first-pass author forms etc. to estimate
- Gene expression and Pictures
- Need to update Gene expression model to accommodate Epic dataset (John Murray), 3D movies (Bill Mohler), and single-molecule FISH (van Oudenaarden et al.)
- Itai Yanai dataset (embryo expression across several nematode species)
- Expression SVM won't catch isolated tissue/cells expression analysis or microarray data
- Incorporating the virtual worm and browser
- Microarray and SPELL
- Will incorporate microarray and RNA-Seq data sets for other species
- Should let users download search results more easily (for single genes, for example)
- Need to change SPELL database to incorporate new species
- Users should be able to run clustering on data
- Co-expression correlation; should recalculate each build (with flexible significance thresholds)
- Provide Cytoscape view of genes connected by co-expression
- Pathways and Processes
- Plan to work with Wikipathways
- Vocabularies and annotation schemes like Systems Biology Graphical Notation (SBGN)
- Trying to get data into BioPAX (Biological PAthway eXchange) format
- BioPAX too detail oriented? Very biochemical?
- Some databases dump BioPAX format, but won't read it in
- Paper and curation pipeline
- Concise description progress; coverage
- Re-annotation efforts?
- New concise description curation interface for easier writing and updating
- Annotating genes in the more expressive GO format
- How would our data models need to change to curate with the new expressive GO
- SVMs
- Include collaborations?
- GSA markup; encouraging other journals to adopt?
- Web page links; electronic text books, etc.
- Can automate linking, but can't support manual QC without more (financial) support
- Supporting links to WormBase (in general)
- WormAtlas, for example
- Google-like entity info (e.g. George Washington) displayed on side of search results page
- Could we provide short write-ups of genes to Google?
- Google-funded? Google.org
- Transcriptional regulatory networks (TRNs)
- Gene regulation curation
- Limited amount of data for position weight matrices (PWMs) and TF-binding sites
- Consolidation of TF-binding/target-gene data into one place (ChIP-Seq/modENCODE data, PWMs, Gene regulation interactions)
- How to best visualize the available data?
- We can design a new visual scheme for TRNs
- Will we curate enhancers?
- Suggestions for future
- Better integration across data types
- How can the OA evolve, and what can it be used for?
August 9, 2012
- Sandbox version available for testing
- Data stored on Postgres
- Daniela can show how to use
- SVM flags main papers and supplemental documents; should they be grouped into a single document or kept separate?
- Depends on curator
- Should have a direct (unambiguous) link to supplemental documents
- Can flag false positive papers
- Can query for papers batch by batch (by SVM analysis date)
- False negatives are automatically annotated as such when an SVM-negative paper is curated for the respective data type in the OA
- Curators CAN check SVM-negatives if they want to, but are not required to
- Can query if a specific paper (or papers) has been flagged (by SVM) for certain data types
- Proposed OA field to capture what supplemental document the data came from, if from supplement
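The automatic false-negative rule above can be sketched in a few lines. This is only an illustration, not the SVM tool's actual code: the function name, the paper IDs, and the two-label scheme ("Positive"/"Negative") are assumptions made for the example.

```python
def false_negatives(svm_labels, curated_papers):
    """Return papers the SVM called "Negative" that were nevertheless curated.

    svm_labels: dict mapping a (hypothetical) paper ID to its SVM label
                for ONE data type.
    curated_papers: set of paper IDs curated for that data type in the OA.
    """
    return sorted(
        paper for paper in curated_papers
        if svm_labels.get(paper) == "Negative"
    )


# Hypothetical data: paper2 was SVM-negative but a curator found data in it.
svm_labels = {"paper1": "Positive", "paper2": "Negative", "paper3": "Negative"}
curated = {"paper1", "paper2"}
print(false_negatives(svm_labels, curated))
```

Papers without an SVM result are not counted here, since they were never predicted negative.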
Grant
- People can add new ideas/visions for future development of WormBase
- Visualization, integration, graphs, etc.
- How do we visualize complex information?
- Do we need to group data types for visualization? E.g. transcriptional regulatory networks vs genome browsing
- Scaling?
- Dependency on ACEDB?
August 16, 2012
Helpdesk
- GitHub link to e-mail?
- When an issue is submitted via the website, GitHub generates a unique e-mail which can be replied to. That e-mail thread is then included in the GitHub issue
- Who should close an issue? Ultimately, an issue/ticket should be closed by whichever WormBase staff member addresses or resolves it
- If the issue is not closed by the one who resolves it, the help desk officer should follow up to check
- When is an issue resolved? May depend on the nature of the issue and on what has to be done
- Need project management tools? Redmine? Something else?
Large scale projects
- Scripts and documentation used to deal with/handle large scale data should be put into GitHub
- For example, the Itai Yanai expression data: just another microarray paper, but with new oligo sets?
- Need to store enough info to reproduce curation
- Store all large scale data sets and scripts, documentation, etc. on a single computer with regular backup
August 23, 2012
Transgene tables
- Not available on new site
- Broken on legacy site
- Integrated transgenes' location (which chromosome?)
- Static page was proposed for new site
- We (Caltech) could possibly put on the new WormBase Support section (which we have write-access to)
Finding all labs in a region (user request)
- How to best identify all labs in, for example, England, Asia, South America, Canada, etc.
- Can search using patterns for e-mail address (not optimal)
- Searching by physical mailing address may be better, but this needs all country codes and country-continent affiliations
- We can possibly create a script to generate a table every release
- Change data model to have a "Country" tag? And then programmatically assign continents based on Country tag
- Juancarlos can extract PI address info from Postgres and set up a CGI for future searches
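The "Country" tag idea can be illustrated with a minimal sketch. Everything here is an assumption for the example: the lab names, the deliberately incomplete country-to-continent table, and the function name. It does not reflect the actual data model or Postgres tables.

```python
from collections import defaultdict

# Hypothetical, deliberately incomplete lookup; a real table would need
# every country that appears in the lab addresses.
CONTINENT_BY_COUNTRY = {
    "United Kingdom": "Europe",
    "Canada": "North America",
    "Brazil": "South America",
    "Japan": "Asia",
}


def labs_by_region(labs):
    """Group labs by country and by continent.

    labs: iterable of (lab_name, country) pairs, where country is the value
          of the proposed "Country" tag.
    Returns (by_country, by_continent) as plain dicts of lists.
    """
    by_country = defaultdict(list)
    by_continent = defaultdict(list)
    for lab_name, country in labs:
        by_country[country].append(lab_name)
        # Countries missing from the lookup are kept visible, not dropped.
        continent = CONTINENT_BY_COUNTRY.get(country, "Unassigned")
        by_continent[continent].append(lab_name)
    return dict(by_country), dict(by_continent)


# Hypothetical lab entries.
labs = [("Lab A", "United Kingdom"), ("Lab B", "Canada"), ("Lab C", "Japan"),
        ("Lab D", "United Kingdom"), ("Lab E", "Atlantis")]
by_country, by_continent = labs_by_region(labs)
print(by_continent["Europe"])
```

A table like this could be regenerated each release; routing unknown countries to "Unassigned" makes gaps in the lookup easy to spot.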
SVM Tool and Data Type Statistics
- Chris would like to get Interaction numbers (how complete are we with curation)
- SVM Tool needs some tweaks:
- A paper is broken up into several documents if it has supplementary material
- If a curator searches for all "Positive" papers, all papers that have at least ONE "Positive" document will be returned
- Conversely, if a curator searches for all "Negative" papers, all papers that have at least ONE "Negative" document will be returned
- This is a problem, since for a "Negative" search we would want only papers in which ALL documents are "Negative"
- What are the benefits/drawbacks of separating a paper into multiple documents?
- Kimberly likes to have separate documents to reduce amount of data type searching to be done once we have SVM results
- Keeping multiple documents may complicate the search procedure, unless we can change the query process
- Juancarlos will create a filter step in the SVM tool such that the user/curator can specify if they would like to search on the basis of "whole" paper versus individual document
- In this way, searching for "whole" "Positive" papers will return only papers in which at least one document is "Positive"; searching for "whole" "Negative" papers will return only papers in which ALL documents are "Negative"
- Searching on the basis of individual documents will work as it does now: displaying all papers as individual documents with their respective SVM results
- Juancarlos will also add a filter for "Primary" vs "Not Primary" vs "Not Designated"
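The whole-paper versus individual-document search described above can be sketched as follows. This is an illustration under stated assumptions, not the SVM tool's implementation: the function names, the paper and document IDs, and the restriction to two labels ("Positive"/"Negative") are all hypothetical.

```python
from collections import defaultdict


def paper_level_results(doc_results):
    """Collapse per-document SVM labels into one label per paper.

    A paper is "Positive" if at least one of its documents is "Positive",
    and "Negative" only if ALL of its documents are "Negative".
    doc_results: list of (paper_id, document_id, label) tuples.
    """
    labels_by_paper = defaultdict(list)
    for paper_id, _document_id, label in doc_results:
        labels_by_paper[paper_id].append(label)
    return {
        paper_id: "Positive" if "Positive" in labels else "Negative"
        for paper_id, labels in labels_by_paper.items()
    }


def search(doc_results, wanted, whole_paper=True):
    """Return matching paper IDs, or (paper, document) pairs per document."""
    if whole_paper:
        merged = paper_level_results(doc_results)
        return sorted(p for p, label in merged.items() if label == wanted)
    return sorted((p, d) for p, d, label in doc_results if label == wanted)


# Hypothetical results: paperA has a positive supplement, paperB has none.
docs = [
    ("paperA", "main", "Negative"),
    ("paperA", "supp1", "Positive"),
    ("paperB", "main", "Negative"),
    ("paperB", "supp1", "Negative"),
]
print(search(docs, "Positive"))
print(search(docs, "Negative"))
print(search(docs, "Negative", whole_paper=False))
```

With the whole-paper filter, paperA no longer appears in a "Negative" search even though its main document is negative, which is the problem the filter is meant to solve.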
August 30, 2012
Transgenes
- Transgenes now have transgene IDs
- Each curator should check the dumpers for the next upload (in 2 months) on mangolassi and confirm that everything looks fine.
- Each curator should also make sure that all the transgene objects they have in their curation pipeline are converted into IDs. E.g. Kimberly had a bunch of transgenes that were not converted.