ISB2014 Session2 Systems Biology
From WormBaseWiki
Session 2 - Systems Biology
http://biocuration2014.events.oicr.on.ca/agenda-5
Link to all ISB2014 notes: http://etherpad.wikimedia.org/p/isb2014
Help me take collaborative notes on this session! Fill out your name in the box in the top right-hand corner. Add yourself to the editors list, then edit away! I make a lot of typos. Sorry.
Editors (name / affiliation / Twitter): Abigail Cabunoc / OICR / @abbycabs

13:00 - 14:30 Session 2 - Systems Biology, chaired by Henning Hermjakob and Fritz Roth
The Great Hall, Hart House
--------------------------------------------------------------------------
13:00 - 13:20 Reactome Knowledgebase of reactions, pathways and biological processes
Robin Haw, OICR, Canada

Abstract
The Reactome Knowledgebase is an open access, curated and peer-reviewed database of human biological pathways and processes that can be freely used and distributed by all members of the biological research community. Geneticists, genomics and proteomics researchers, clinicians, molecular biologists, bioinformaticians and systems biologists use Reactome to interpret high-throughput experimental datasets, to develop novel algorithms for data mining and visualization, and to build predictive models of normal and abnormal pathways. The Reactome curation system draws upon the expertise of independent researchers who author precise machine-readable descriptions of human pathways under the guidance of a team of curators. Pathway modules are extensively checked to ensure factual accuracy and compliance with the data model, and a system of evidence tracking ensures that all assertions are backed by the primary literature. Recent extensions of our data model accommodate the annotation of disease processes, allowing us to represent the altered biological behaviour of mutant variants frequently found in cancer, and to describe the mode of action and specificity of anti-cancer therapeutics.
Reactome pathways currently cover a third of the translated portion of the genome, and are available on our web site for browsing, downloading, and manipulation by in-house and third-party online analysis tools. Pathway data can be exported in several formats including SBML, BioPAX and derived interactions. To increase protein coverage and associated annotations, we offer a network of "functional interactions" (FIs) predicted by a conservative machine-learning approach. We offer several analytical tools built upon the Reactome FI network and have begun to demonstrate the network's usefulness for the analysis of genome-scale datasets in human disease research.

Notes
Reactome Knowledgebase: storage framework for systems biology modelling
Reaction graph: a series of reactions and their relationships
Reaction graphs help organize biological pathways for data visualization, analysis, integration and systems biology
Reactome: open-source, open-access database, 15K human pathways
Curation system relies on expert knowledge; human- and machine-readable, drawn from the literature; link-outs to other databases
Goal: provide different pathway-based tools and ways to explore datasets
Reaction network database:
- node: biomolecule (proteins, macromolecular complexes, small molecules, disease variants, etc.)
- edge: conversion of one biomolecule into another (a reaction)
Reactome data model: flexible, many different types
To improve visual representation: Systems Biology Graphical Notation (SBGN); molecules have particular shapes/icons
Google Maps-style pathway diagrams: panel with pathway diagram (zoom in/out, navigate around, click on entities, overlay experimental datasets)
Click on an element: info displayed in the panel below
Details: link out to BioModels (linking pathways to known models), context-sensitive information, structural info (PDB)
PSICQUIC: interaction data from STRING... actually it was IntAct (changed the slide, not the title... oops)
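The notes describe Reactome's reaction network as a graph whose nodes are biomolecules and whose edges are reactions converting one biomolecule into another. The Python sketch below illustrates that idea under stated assumptions: the class names (`Entity`, `Reaction`, `ReactionGraph`) and the toy ligand/receptor entities are hypothetical and are not Reactome's actual schema or API. It indexes each reaction by its inputs and walks the graph breadth-first to list everything downstream of a starting molecule.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Node: a protein, complex, small molecule, disease variant, etc. (hypothetical)."""
    name: str
    kind: str


@dataclass(frozen=True)
class Reaction:
    """Edge: converts a set of input entities into a set of output entities."""
    name: str
    inputs: frozenset
    outputs: frozenset


class ReactionGraph:
    """Toy reaction graph indexed by the entities each reaction consumes."""

    def __init__(self):
        self.reactions = []
        self.consumed_by = {}  # entity -> list of reactions that use it as an input

    def add(self, reaction):
        self.reactions.append(reaction)
        for entity in reaction.inputs:
            self.consumed_by.setdefault(entity, []).append(reaction)

    def downstream(self, start):
        """Breadth-first search over reactions reachable from `start`.

        A reaction is followed as soon as any one of its inputs is reached;
        this simple rule does not check that all inputs are present.
        """
        seen = {start}
        queue = deque([start])
        while queue:
            entity = queue.popleft()
            for reaction in self.consumed_by.get(entity, []):
                for product in reaction.outputs:
                    if product not in seen:
                        seen.add(product)
                        queue.append(product)
        seen.discard(start)
        return seen


# Toy example: ligand + receptor -> complex -> active receptor.
ligand = Entity("ligand", "small molecule")
receptor = Entity("receptor", "protein")
bound = Entity("ligand:receptor", "complex")
active = Entity("active receptor", "protein")

graph = ReactionGraph()
graph.add(Reaction("binding", frozenset({ligand, receptor}), frozenset({bound})))
graph.add(Reaction("activation", frozenset({bound}), frozenset({active})))

print(sorted(e.name for e in graph.downstream(ligand)))
# ['active receptor', 'ligand:receptor']
```

Keeping reactions as objects with input and output sets, rather than plain molecule-to-molecule edges, preserves which molecules are converted together. A flattened molecule-to-molecule graph would lose that information.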
Pathway Browser: framework for data integration and analysis on the website
- compare pathways between human and model organisms
- pathway mapping and enrichment analysis
Open data standards: if you don't want to use the website, download the pathway in open-access interchange formats
- SBML: represents models of biochemical pathway reactions and networks
- BioPAX: exchange language; e.g. upload to Cytoscape
- click on download in the Pathway Browser to get the different open standards; you lose a little bit of data (layout)
- SBGN: can download and edit the layout
Network module-based analysis of disease omics datasets
- Functional Interaction (FI) plugin: helps with high-throughput datasets
- construct a sub-network based on a set of genes; query the source of interactions
- functional enrichment analysis to annotate these modules
- large datasets: filter down to a handful of mutated pathways and generate biological hypotheses
Future work:
- Reactome: increase the number of curated proteins
- supplement normal pathways with variant reactions (disease states)
- improve annotation consistency
- work with the biocuration community on ontology support
- SBGN, SBML, BioPAX, PSI
- enhance web resources
- service framework: we don't curate models; integrate pathways into resources
- all data and software are open to the public

Q & A
Q: Disease networks: which diseases turned out to be easy? Can you actually cover the disease states?
A: The flexibility of the model helps us capture the disease. There are still challenges with holding info long term and with visualization; clinicians want a different view than biologists. So far, we haven't met too many difficult challenges. We capture mutations and capture relationships.
Q: How many pieces of evidence do you need for a step to be included in a pathway? Do you need multiple pieces to support it, or can it be a single study?
A: It would have to be multiple references. Each reaction should have a literature citation, and the pathway itself should have multiple references. This applies to both pathways and disease states.
Q: Which organisms does Reactome cover? Are there tools for comparative pathways?
A: Main focus: human biological pathways. Other groups are curating Arabidopsis manually. Computationally, we provide pathways for 19 model organisms (mouse, rat, etc.). We also provide access to bacteria, pathway visualization tools, image analysis tools and species comparison tools that show a model organism pathway side by side with the human one.
--------------------------------------------------------------------------
13:20 - 13:40 Verification of Systems Biology Research in the Age of Collaborative Competition
Sam Ansari, PMI, Switzerland

Abstract
sbv IMPROVER (systems biology verification - Industrial Methodology for PROcess VErification in Research) is a challenge-based program with a specific focus on the verification of industrial research processes related to systems biology. The first challenge (Diagnostic Signature) was designed to determine to what extent transcriptomic data can be used for phenotype prediction and to identify best-performing computational methods. The second challenge (Species Translation) was designed to address the extent to which biological effects of stimulus-induced perturbations in rats translate to those in humans. In the current challenge (Network Verification) we provide the community with network models of molecular events contributing to Chronic Obstructive Pulmonary Disease (COPD). These models of key biological processes include access to underlying scientific literature citations that have been expertly curated to provide mechanistic substantiation for each molecular relationship represented. The scientific community will be encouraged to review the relationships between molecular entities and to improve the represented biology covering fundamental processes involved in respiratory disease. Web-based graphical interfaces are used to visualize the biological relationships. Crowdsourcing principles enable participants to annotate these relationships based on literature evidence.
A text analytics web service can be used by participants to assist with the creation of OpenBEL-compliant knowledge statements given evidence lines from references. Best performers in the crowd-verification phase will be invited to a 3-day event to resolve controversies with subject matter experts, finalizing and publishing the network models. The resulting models will represent the current status of biological knowledge within the defined boundaries. For some period following conclusion of the challenge, the published models will remain available for continuous use and expansion by the scientific community. This work resulted from a scientific collaboration between Philip Morris International (PMI) and IBM's Thomas J. Watson Research Center on a project funded by PMI.

Notes
https://www.sbvimprover.com/
Project driven by scientists: IBM Watson team & another team...
Testable blocks
Watson team: DREAM challenges
Crowdsourcing: backbone of systems biology
Why do we need IMPROVER?
- we face proteomics and more omics methods at the same time; a vivid area in the literature where new knowledge keeps coming in (curation, high throughput)
- methodology: many different processes, and a lot of noise in this area when trying to come up with the best approach
- 2011 study: how do we verify methodology/data? Most publications self-assess, which is logical since each study has a special context, but unfortunately the methodology does not apply to other cases
- looking for independent verification
3 challenges (crowdsourcing, IBM Watson team):
1. Diagnostic signature challenge
- processing of gene expression data to predict the phenotype
- prepare a test dataset so participants can train and create classifiers
- test set created with known outcome phenotypes (e.g. diseases such as lung cancer)
- 54 scientific teams
2. Translatability: species translation challenge
- better understanding of which processes are translatable between organisms
- which networks translate well? Orthologous databases?
- outcome: in 2013, 28 teams gathered to come up with a publication on which models worked and which verifiably translate between species
3. Network verification challenge
- 4 years of activity to understand disease mechanisms
- come up with network models that help us find suitable biomarkers for product testing
- focused on respiratory/cardiovascular diseases
- used unpublished knowledge
- used networks as the starting point for this challenge: verifying what is displayed in these networks
5-phase approach
1. Prepare data: select articles and datasets, combine knowledge, curate, feed into networks (4-5 years); represented in the network database CausalBioNet
2. Crowdsource: an interface website allows us to review the networks
- modify/vote on networks; space to exchange/discuss
- important: an easy way for participants to review the network/code
- prepared training material with 'game rules' and easy-to-understand language
- based on gaming rules: reputation, leader board
3. Wrapping up after the challenge, once contributions stop
- where do we have clear understanding? Lots of people voted; the network is frozen
- contradictions are the subject of /another/ meeting, face to face (nice location, a reward!)
- review all the feedback on the networks and select the sub-networks where we need to meet face to face to come to a conclusion; experts come together
- once we have the meeting: come to a conclusion, update the networks we have, then start from scratch again (iterations)
Boundary conditions of networks
- focus on lung and cardiovascular tissue
- canonical mechanisms included
- human data and edges prioritized
- initiators of and responses to biological signaling included
- 50 network models selected: cell fate, cell proliferation, cell stress, tissue repair and angiogenesis, inflammation
- makes it easier to find a network where you have expertise
How do we code this knowledge?
BEL: communication standard for coding the knowledge
- full-text articles are manually curated and assembled into a network (BEL)
- BEL: semantic triples; allows subject, object, functions and nested functions; flexible ontology tool; fixed nomenclature for the types of relationships
- evidence-based: collect a number of evidences and translate the evidence text into this coding
- comes close to written language: both human-readable and machine-readable
- our preferred coding language; released for open access, hoping others will see the benefits
Prepared a whole interface where participants register and vote
- self-moderated reputation system
- scoring-based: if you work at a node or edge with a high score and the truth is verified, you score much higher
- participants can focus their energies on approving or rejecting
- if my contribution led to an event on this edge, my score is higher
- leader board: shows how high scores are earned (tips), so people aren't discouraged by high scores
- keeps the work up and running
- prepared a video: training material for users to get a quick start

Q & A
Q: The generated model has value because of the evidence, but it's static. Have you thought about reaching a point where this is a live model?
A: There is a general RDF and BEL converter. You can export and load into Cytoscape; you lose some level of information (evidence tags), but there is this possibility. We intend to share it through this verification platform so users can contribute and download.
--------------------------------------------------------------------------
13:40 - 14:00 Linking tissues to phenotypes using gene expression profiles
Anika Oellrich, Wellcome Trust Sanger Institute, UK

Abstract
Despite great biological and computational efforts to determine the genetic causes underlying human heritable diseases, approximately half (3,500) of these diseases are still without identified genetic cause. Model organism studies allow the targeted modification of the genome and can help with the identification of genetic causes for human disease. Targeted modifications led to a vast amount of model organism data.
However, these data are scattered across different databases, preventing an integrated view and missing out on context information. Only once we are able to combine all the existing resources will we be able to fully understand the causes underlying a disease, and how species differ. Here, we present an integrated data resource combining tissue expression with phenotypes in mouse lines, bringing us one step closer to consequence chains from a molecular level to a resulting phenotype. Mutations in genes often manifest in phenotypes in the same tissue that the gene is expressed in. However, in other cases a systems-level approach is required to understand how perturbations to gene networks connecting multiple tissues lead to a phenotype. Automated evaluation of the predicted tissue-phenotype associations reveals that 72-76% of the phenotypes are associated with disruption of genes expressed in the affected tissue. However, 55-64% of the individual phenotype-tissue associations show spatially separated gene expression and phenotype manifestation. For example, we see a correlation between total body fat abnormalities and genes expressed in the brain, which fits recent discoveries linking genes expressed in the hypothalamus to obesity. Finally, we demonstrate that our predicted tissue-phenotype associations can improve the detection of a known disease-gene association when combined with a disease gene candidate prediction tool. For example, JAK2, the known gene associated with Familial Erythrocytosis 1, rises from the 7th best candidate to the top hit when the associated tissues are taken into consideration.
Data accessible via: http://www.sanger.ac.uk/resources/databases/phenodigm/phenotype/list
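The talk names association rules and a hypergeometric distribution as the two ways tissues were linked to phenotypes. Below is a minimal sketch of the hypergeometric variant. It assumes the question is whether genes expressed in a tissue are over-represented among genes whose disruption causes a phenotype. The function names, the 20-gene universe and the gene identifiers are hypothetical, and the exact setup used in this work may differ.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K marked items, n draws)."""
    upper = min(K, n)
    # math.comb returns 0 when the lower argument exceeds the upper one.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def tissue_phenotype_pvalue(universe, tissue_genes, phenotype_genes):
    """One-sided over-representation p-value of tissue-expressed genes among
    genes whose disruption gives the phenotype (all sets restricted to universe)."""
    tissue = tissue_genes & universe
    phenotype = phenotype_genes & universe
    overlap = len(tissue & phenotype)
    return hypergeom_sf(overlap, len(universe), len(tissue), len(phenotype))


# Toy data with hypothetical gene identifiers.
universe = {f"g{i}" for i in range(20)}
tissue_genes = {"g0", "g1", "g2", "g3", "g4"}
phenotype_genes = {"g0", "g1", "g2", "g3", "g10"}

p = tissue_phenotype_pvalue(universe, tissue_genes, phenotype_genes)
print(round(p, 4))  # 0.0049
```

If thousands of tissue-phenotype pairs are tested this way, the resulting p-values would still need a multiple-testing correction before any threshold is applied.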
Notes
Starting point: despite numerous efforts, there are still 3.5K human heritable diseases without an identified cause
A number of obstacles to identifying disease genes:
- how often a disease occurs: for rarely occurring diseases it is hard to establish the signs and symptoms and the genetics
- connection between gene and phenotype: multiplicity, one gene with multiple phenotypes, and the other way around is also true
- several layers involved: an individual has several organs, organs have several tissues, tissues have several cells
- we try to do this automatically, but there is not always a spatial connection between genotype and phenotype
Data resources:
- expression and phenotype data: Sanger Mouse Genetics
- phenotype data: Mouse Genome Database
- gene expression barcode
- intersection of all the data resources: 21 adult tissues, 5K genes
Connecting tissue to phenotype:
1. data download
2. data harmonization
3. association between tissue and phenotype: (a) association rules, (b) hypergeometric distribution (paper; poster #8 here)
4. evaluation (automated & manual)
5. PhenoDigm
3K tissue-phenotype associations
- a phenotype is not necessarily associated with one particular tissue
- a gene expressed in different tissues can give the same phenotype, or a combination of different genes can
- so we don't limit to only one tissue per phenotype
Evaluation:
- automated: 75% covered by associations
- needed a curated set: Entity-Quality statements available for mouse phenotypes contain tissue info
- manual evaluation of 'no tissue match' cases: genes in brain linked to total fat abnormalities; genes in spleen linked to liver hypoplasia
- also saw cases where we don't obtain an observed tissue that should have been there
Could the results be useful to other studies? Tissue-phenotype associations improve disease-gene discovery
PhenoDigm associations available online: http://www.sanger.ac.uk/resources/databases/phenodigm/phenotype/list

Q & A
Q: In measuring association, there's the significance and the strength of the association. What about p-values?
A: With the software we applied, it's hard to get p-values; we don't get the entire range of p-values you would require for testing. We export the associations, say anything above a certain threshold is okay, and do the evaluation afterwards. A better assessment in future would establish what the p-value should be. This was just an experiment to see if we can get tissue-phenotype associations.
--------------------------------------------------------------------------
14:00 - 14:10 UbiGRID: a database resource for protein and genetic interactions of the ubiquitin-proteasome system
Andrew Chatr-Aryamontri, IRIC - University of Montreal, Canada

Abstract
The covalent attachment of ubiquitin to substrate proteins controls the stability, interactions, activity and/or localization of much of the proteome. The canonical ubiquitination cascade proceeds by a three-step process: activation of ubiquitin as a thioester by an E1 enzyme, transfer to an E2 enzyme as a thioester intermediate, and conjugation to a lysine residue (or N-terminal amino group) on the substrate, or to lysine residues of a previously conjugated ubiquitin moiety to build ubiquitin chains. The fate of the ubiquitinated substrate is determined by interactions with a variety of ubiquitin binding domains, which can direct the substrate for degradation by the 26S proteasome or alter substrate localization, interactions or activity. Finally, the extent of substrate ubiquitination is dynamically controlled by a host of deubiquitinating enzymes. Despite the fundamental role played by the ubiquitin-proteasome system (UPS) in the regulation of many cellular processes and in human diseases, its therapeutic potential has been largely underexplored. This is due to insufficient understanding of the interdependence and redundancy between the different system components and an incomplete mapping of substrate-UPS system interactions.
Nonetheless, the development and commercialization of Bortezomib (Velcade), a non-reversible proteasome inhibitor for the treatment of refractory multiple myeloma, has demonstrated that the UPS can be successfully modulated for therapeutic benefit. Given the broad effects of the UPS on the proteome, and its consequent connections to many disease states, the UPS is currently under intense investigation in the biotechnology and pharmaceutical sectors. To facilitate understanding of the UPS and its links to disease, the UbiGRID curation project aims to comprehensively curate the biomedical literature for genetic and protein interactions of all UPS genes/proteins in humans, budding yeast and other model species. UbiGRID will serve as a centralized resource for three types of data: (i) an annotated reference set of UPS components organized into functional classes; (ii) comprehensive curation of genetic and protein interactions for all UPS genes; (iii) the annotation of ubiquitinated residues derived from mass spectrometry datasets. We have recently completed curation of the entire biomedical literature for interactions of the UPS in human (82,218 interactions derived from 8,377 publications) and budding yeast (42,672 interactions derived from 2,472 publications). An updated analysis of the network properties and disease connections of the UPS will be presented. UbiGRID datasets are freely accessible through a dedicated web interface and available for download in tabular or PSI-MI formats.

Notes
Ubiquitin-proteasome system (UPS)
- multistep process; controls the stability/activity of most of the proteome
- highly hierarchical; weak interactions
Disease: linking the system to human diseases
- can visualize how mutations affect the system
- over-degradation of a protein leads to loss of function, e.g. parkin
Consequence: small molecule inhibitors
/// SORRY LOST WIFI FOR A BIT
UbiGRID curation project
- annotate ubiquitination sites; drafting the members of the UPS (gene list)
- human: 1K genes; for more than 50% of them, no functional annotations
- conservation of the UPS: the closer to the interface between the system and the proteome, the more conservation
- core of the work: annotating interactions
- 20K publications annotated for interactions, resulting in more than 80K interactions (human)
- 20K physical/genetic interactions
- ubiquitylome: ubiquitination profiles
UPS network features
- close relevance to other groups
- knowledge base for the human UPS
- disease biology: better understanding of the relationships between different genes
- microarray data, proteomics data
Future plans: dedicated web interface; UbiGRID updates on a monthly basis, and more

Q & A
Q: Do you feel you have achieved complete coverage of the domain?
A: It took a while; we were always moving back and forth between the list of genes and the annotation. There were many more genes than expected, and we were surprised by changes in the gene list. We're hoping this was the last iteration and it's mostly maintenance curation from here. But never say never.
--------------------------------------------------------------------------
14:10 - 14:20 Yeast – why it simply has a lot to say about human disease
Selina Dwight, Stanford University, USA

Abstract
Science has long approached a complex system by exploring a simpler system that exhibits similar functionality. What happens when the complex system itself becomes more penetrable? Human gene products have been known for decades to have counterparts in S. cerevisiae, and detailed knowledge about these S. cerevisiae genes has provided clues to basic cellular functions in humans and other higher organisms. In the past decade, however, technological advances in genomic sequencing and other methods have allowed more direct studies in human cells.
Contrary to what might be expected, the result is not that budding yeast is less consequential as a model organism, but that increasing information about human genes verifies and expands the connections between the two organisms, allowing human research to more readily leverage yeast knowledge. For instance, identification of regions in human genes with similarity to yeast prion-like domains led to the discovery of a mutation in an evolutionarily conserved residue in a gene that co-segregates with ALS. Conversely, increasing knowledge about human biological processes also suggests areas for further study in yeast. As an example, mutation of mitochondrial DNA associated with aging, disease, and anticancer agents has prompted studies of pathways in yeast that sense the nutritional state of the cell and affect mitochondrial function. Lastly, while yeast may be a relatively simple system, yeast research has witnessed its own advances in technology that serve to further its relevance as a model organism, such as chemogenomic profiling of systematic deletion strains to identify potential drug targets and the mechanisms of action by which compounds exert their effects on disease. As researchers continue to elucidate functional homology between yeast and humans, and to employ yeast as a tool for discovery, it becomes increasingly important to highlight and connect this information. The Saccharomyces Genome Database (SGD, www.yeastgenome.org) is building a new class of information called "Species Connections" towards this goal through the incorporation of OMIM homologs and phenotypes, phenologs, drug interactions, and functional homology studies.
Notes
Yeast: it simply has a lot to say about human disease
- yeast provides us with beer, wine and bread; influential in the office too
- article in the news: kids who can't cry; symptoms + genomic sequencing across 8 different cases found mutations in the same gene
- that identifies the main character in the disease, but a story is far less interesting without the plot: a gene with no biological context is less informative
- yeast can provide the missing context: the richness of data associated with it tells a multi-dimensional story
- main observation: representation of /something/
- stitching together 6 different datatypes of yeast genomic representation: correlations between gene variation, gene expression, metabolite providers
- a gene in context gives clues to the missing story
SGD provides several ways to connect human genes, yeast genes and the richness of yeast data:
- orthologues
- drug-related phenotypes
- literature: associate yeast genes with papers that describe their human counterparts; full text searchable by keywords
- collect info on the ability of a human gene to substitute for a yeast gene in the cell
- drug phenotypes: search for shared phenotypes in orthologs
- searchable connections: search for a human gene to identify human diseases and orthologs; search the other way, too
Many genes in yeast can help tell the human story: nearly 50% of all yeast genes have one or more human orthologs
Maybe we can begin to tell a more complete story about human disease
SGD: powering connections

Q & A
Q: It's 50 or 60% of human genes that have at least domain-level homology.
A: Go yeast!
Q: Does SGD have plans to curate signalling pathways in yeast?
A: We do have some pathway info, and we are looking to incorporate more. It's on the horizon.
Q: Pooling all this data: yeast coming soon? A lot of times similar types are in the same pathway.
A: We plan to get some mammalian phenotype data in, and to link to OMIM disease data.
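The SGD notes describe connecting human and yeast genes through orthology in both directions and searching for phenotypes shared between orthologs. Below is a minimal Python sketch of such a bidirectional index. It assumes ortholog pairs arrive as (yeast gene, human gene) tuples. The class, the gene identifiers (YG1, HG1, ...) and the phenotype labels are hypothetical and do not reflect SGD's actual data or services.

```python
from collections import defaultdict


class OrthologIndex:
    """Bidirectional yeast <-> human ortholog lookup (hypothetical data layout)."""

    def __init__(self, pairs):
        self.human_of = defaultdict(set)  # yeast gene -> human orthologs
        self.yeast_of = defaultdict(set)  # human gene -> yeast orthologs
        for yeast_gene, human_gene in pairs:
            self.human_of[yeast_gene].add(human_gene)
            self.yeast_of[human_gene].add(yeast_gene)

    def fraction_with_human_ortholog(self, yeast_genes):
        """Share of the given yeast genes with at least one human ortholog."""
        genes = set(yeast_genes)
        hits = sum(1 for g in genes if self.human_of.get(g))
        return hits / len(genes) if genes else 0.0

    def shared_phenotypes(self, human_gene, yeast_phenotypes):
        """Union of phenotypes recorded for any yeast ortholog of a human gene."""
        found = set()
        for yeast_gene in self.yeast_of.get(human_gene, ()):
            found |= yeast_phenotypes.get(yeast_gene, set())
        return found


pairs = [("YG1", "HG1"), ("YG2", "HG1"), ("YG3", "HG2")]
idx = OrthologIndex(pairs)
yeast_genome = ["YG1", "YG2", "YG3", "YG4"]
phenotypes = {"YG1": {"drug X sensitive"}, "YG2": {"slow growth"}}

print(idx.fraction_with_human_ortholog(yeast_genome))  # 0.75
print(sorted(idx.shared_phenotypes("HG1", phenotypes)))  # ['drug X sensitive', 'slow growth']
```

The index uses two dictionaries, so a query in either direction is a single lookup. Many-to-many orthology, such as two yeast paralogs mapping to one human gene, falls out naturally from storing sets.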
--------------------------------------------------------------------------
14:20 - 14:30 Visualization and analysis of data using Atlas of Cancer Signalling Networks (ACSN) and NaviCell tools for integrative systems biology of cancer
Inna Kuperstein, Institut Curie, France

Abstract
Poster 11 - further discussion
https://acsn.curie.fr/
Studying reciprocal regulations between cancer-related pathways is essential for understanding signalling rewiring during cancer evolution and in response to treatments. To allow systematic analysis of cancer signalling, the knowledge about cell mechanisms dispersed in the scientific literature can be collectively represented in the form of comprehensive maps of signalling networks amenable to computational analytical methods. The Atlas of Cancer Signalling Networks (ACSN, http://acsn.curie.fr) is a resource of cancer signalling tools with an interactive web-based environment for navigation, curation and data visualisation, supported by the user-friendly Google Maps-based tool NaviCell (http://navicell.curie.fr). Construction and update of ACSN involve manual mining of the molecular biology literature and participation of experts in the corresponding fields. ACSN covers major mechanisms involved in cancer progression, systematically represented in the form of comprehensive interconnected maps. Cell signalling mechanisms are depicted on the maps in great detail, together creating a seamless map of molecular interactions in cancer. The content of ACSN is visually presented in the form of a global 'geographic-like' molecular map browsable using the Google Maps engine and semantic zooming. The associated blog provides a forum for commenting on and curating the content of the ACSN maps. ACSN and NaviCell are systems biology tools for integrating and visualizing cancer molecular profiles generated by high-throughput techniques such as genome, transcriptome and proteome profiling, or for analysing results from drug screenings or synthetic interaction studies.
Integration and analysis of these data in the context of ACSN helps in interpreting them, understanding their biological significance and rationalising scientific hypotheses. This network-based approach is a framework for deciphering complex molecular characteristics of cancers, improving patient stratification, predicting responses and resistance to cancer drugs and proposing new treatment strategies.

Notes
Cancer: a complicated disease
- multiple signals and regulation
- pathways in different cancers: those frequently mutated in cancer are presented together
Atlas of Cancer Signalling Networks
- manually curated by experts; the layout of the maps is curated by more experts
- global map of processes, similar to a geographic map
- divided into modular maps (processes); the Atlas is huge
Standards and tools: SBGN syntax (visual syntax); integration with other groups; community effort
Open source; accessed via the website
Currently a huge, detailed map; ongoing project; everything is interconnected
Created a map navigation tool, NaviCell: Google Maps engine + blog (WordPress)
- very user friendly: friends at the hospital can use it
- associated with the blog: each entity is annotated and posted on the blog
Mutation data visualization using ACSN
- high-throughput ('omics) data in cancer research
- typical and simple thing to visualize: a list of genes
- mutation data, e.g. the list of oncogenes most frequently mutated across cancers
- expression data: every gene may have multiple functions in the cell
- ACSN 'staining' for module activity
- visualization of phosphoproteome data in ACSN
Since the networks are interconnected, signalling networks can be used for intervention strategy design
- synthetic lethality: the case of negative genetic interactions
- OCSANA: a tool that calculates intervention sets based on structural analysis and data
Conclusions: ACSN is a resource of cancer signalling knowledge

Q & A
Q: Do you provide your data in BioPAX?
A: Each map and the global map are downloadable in several formats, including BioPAX.
Q: License?
A: Open source.
Q: When mapping phosphorylation onto proteins, what about multiple sites?
A: They are mapped together: each entity is represented multiple times, and every entity has its corresponding phosphorylation data.
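The notes mention ACSN 'staining' to show module activity on the map. The talk does not describe how ACSN and NaviCell compute this. The sketch below therefore assumes the simplest possible rule: average a per-gene value, for example a log fold change, over the genes in each module. The module-to-gene assignments, gene names and values are hypothetical toy data.

```python
from statistics import mean


def stain_modules(module_genes, values):
    """Average a per-gene value over each map module; modules with no data are skipped."""
    staining = {}
    for module, genes in module_genes.items():
        scores = [values[g] for g in genes if g in values]
        if scores:
            staining[module] = mean(scores)
    return staining


# Hypothetical modules and per-gene log fold changes (toy data).
modules = {
    "cell cycle": ["G1", "G2", "G3"],
    "apoptosis": ["G4", "G5"],
    "DNA repair": ["G6"],
}
log_fold_change = {"G1": 2.0, "G2": 1.0, "G3": 0.0, "G4": -1.5, "G5": -0.5}

print(stain_modules(modules, log_fold_change))
# {'cell cycle': 1.0, 'apoptosis': -1.0}
```

Skipping modules with no measured genes avoids painting them as "neutral" when there is simply no data. A real tool might instead show such modules in a distinct colour.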