ISB2014 Session2 Systems Biology

Session 2 - Systems Biology
http://biocuration2014.events.oicr.on.ca/agenda-5
link to all ISB2014 notes:
    http://etherpad.wikimedia.org/p/isb2014
Help me take collaborative notes on this session! Fill out your name in the box in the top right-hand corner. Add yourself to the editors list, then edit away!
I make a lot of typos. Sorry.
Editors

    name / affiliation / twitter

    Abigail Cabunoc / OICR / @abbycabs

13:00 - 14:30
Session 2 - Systems Biology, Chaired by Henning Hermjakob and Fritz Roth
The Great Hall, Hart House
--------------------------------------------------------------------------
13:00 - 13:20
Reactome Knowledgebase of reactions, pathways and biological processes
Robin Haw
OICR, Canada
Abstract
The Reactome Knowledgebase is an open access, curated and peer-reviewed database of human biological pathways and processes that can be freely used and distributed by all members of the biological research community. Geneticists, genomics and proteomics researchers, clinicians, molecular biologists, bioinformaticians and systems biologists use Reactome to interpret high-throughput experimental datasets, to develop novel algorithms for data mining and visualization, and to build predictive models of normal and abnormal pathways. The Reactome curation system draws upon the expertise of independent researchers who author precise machine-readable descriptions of human pathways under the guidance of a team of curators. Pathway modules are extensively checked to ensure factual accuracy and compliance with the data model, and a system of evidence tracking ensures that all assertions are backed by the primary literature. Recent extensions of our data model accommodate the annotation of disease processes, allowing us to represent the altered biological behaviour of mutant variants frequently found in cancer, and to describe the mode of action and specificity of anti-cancer therapeutics. Reactome pathways currently cover a third of the translated portion of the genome, and are available on our web site for browsing, downloading, and manipulation by in-house and third party online analysis tools. Pathway data can be exported in several formats including SBML, BioPAX and derived interactions. To increase protein coverage and associated annotations, we have extended our protein coverage by offering a network of "functional interactions" (FIs) predicted by a conservative machine-learning approach. We offer several analytical tools built upon the Reactome FI network and have begun to demonstrate the network's usefulness for the analysis of genome-scale datasets in human disease research.
Notes
Reactome Knowledgebase - storage framework for system bio modelling

    Reaction graph 

    series of reactions & relationships

    reaction graphs

    have come a long way in organizing biological pathways

    data viz, analysis, integration, systems bio

    Reactome

    open source, open access database

    15K human pathways

    curation system - relies on expert knowledge

    human & machine readable from literature

    link outs to other databases

    Goal: Provide diff pathway based tools and ways to explore datasets

    reaction network database

    node: bio molecule (proteins, macro mol complexes, small molecules, disease variant, etc...)

    edge: conversion of one bio molecule - reaction

    reactome data model: flexible

    many different types 

    To improve viz representation - systems bio graph representation - SBGN

    molecules have particular shapes/icons

    google maps style pathway diagrams

    panel with pathway diagram (zoom in/out, navigate around, click on entities, overlay experimental datasets)

    click on element - info displayed in panel below

    details - link out to BioModels (linking pathways to known models)

    context sensitive information

    structural info (pdb)

    PSICQUIC - interaction data from STRING....actually it was IntAct - changed the slide not the title..oops.

    Pathway browser - framework for data integration and analysis in the website

    compare pathways between human and model orgs

    pathway mapping and enrichment analysis

    Open data standards

    if you don't want to use the website - download the pathway

    open access interchange format (SBML)

    represents models of biochem pathways reactions and networks

    biopax - language - exchange

    upload to cytoscape

    e.g. click on download from pathway browser - get different open standards

    lose a little bit of data (layout)

    SBGN - can download and edit layout

    Network Module based analysis of disease OMICS datasets

    functional interaction plugin

    aid with hi-throughput datasets

    construct sub-network based on set of genes

    query source of interactions

    functional enrichment analysis to annotate these models

    large datasets: filter down to handful of mutated pathways

    generate biological hypothesis

    Future work:

    Reactome - increase number of curated proteins

    supplement normal pathways with variant reactions - disease states

    improve annotation consistency

    work with biocuration community for ontology support

    SBGN, SBML, BioPAX, PSI

    Enhance web resources

    Service framework - don't curate models. Integrate pathways into resources

    all data and software are open to the public
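
The node/edge description in the notes above (nodes are biomolecules, edges are the reactions that convert one into another) can be illustrated with a minimal sketch. This is not Reactome's actual schema or API: the class names `Entity` and `Reaction`, the `edges` helper, and the reference placeholder are all assumptions made for illustration. The example reaction is the textbook hexokinase step, chosen only because its inputs and outputs are well known.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    """A node: any biomolecule (protein, complex, small molecule, ...)."""
    name: str
    kind: str


@dataclass
class Reaction:
    """An edge source: converts input entities into output entities."""
    name: str
    inputs: list
    outputs: list
    references: list = field(default_factory=list)  # literature citations


def edges(reactions):
    """Flatten reactions into (input, output, reaction) edges."""
    result = []
    for reaction in reactions:
        for source in reaction.inputs:
            for target in reaction.outputs:
                result.append((source.name, target.name, reaction.name))
    return result


# Hypothetical content, for illustration only.
glucose = Entity("glucose", "small molecule")
atp = Entity("ATP", "small molecule")
g6p = Entity("glucose 6-phosphate", "small molecule")
adp = Entity("ADP", "small molecule")

hexokinase = Reaction(
    name="hexokinase reaction",
    inputs=[glucose, atp],
    outputs=[g6p, adp],
    references=["<literature citation placeholder>"],
)

edge_list = edges([hexokinase])
for edge in edge_list:
    print(edge)
```

Keeping the citation list on the reaction rather than on the entities mirrors the point made in the Q & A that each reaction should carry its own literature support.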

Q & A
Q: Disease networks - which disease turned out to be easy? Can you actually cover the disease states?
A: The flexibility of the model helps us capture the disease. Still challenges with holding info long term and visualizing. Clinicians want a different view than biologists. So far, we haven't met too many difficult challenges. Capture mutations - capture relationships
Q: How many pieces of evidence do you need for a step to be included in a pathway? Do you need multiple pieces to support that or can it be a single study?
A: It would have to be multiple references. Each reaction should have a literature citation. The pathway itself should have multiple references. Pathway and disease state.
Q: Which organisms does Reactome cover? Tools for comparative pathways?
A: Main focus: human bio pathways. Other groups are manually curating Arabidopsis. Computationally: provide for 19 model orgs (mouse, rat, etc). We do provide access to bacteria, pathways vis tools, image analysis tools, species comparison tools. Show model org pathway side by side.
--------------------------------------------------------------------------
13:20 - 13:40
Verification of Systems Biology Research in the Age of Collaborative Competition
Sam Ansari
PMI, Switzerland
Abstract
sbv IMPROVER (systems biology verification - Industrial Methodology for PROcess VErification in Research) is a challenge-based program with a specific focus on the verification of industrial research processes related to systems biology. The first challenge (Diagnostic Signature) was designed to determine to what extent transcriptomic data can be used for phenotype prediction and to identify best-performing computational methods. The second challenge (Species Translation) was designed to address the extent to which biological effects of stimulus-induced perturbations in rats translate to those in humans. In the current challenge (Network Verification) we provide the community with network models of molecular events contributing to Chronic Obstructive Pulmonary Disease (COPD). These models of key biological processes include access to underlying scientific literature citations that have been expertly curated to provide mechanistic substantiation for each molecular relationship represented. The scientific community will be encouraged to review the relationships between molecular entities and to make improvements on the represented biology covering fundamental processes involved in respiratory disease. Web-based graphical interfaces are used to visualize the biological relationships. Crowdsourcing principles enable participants to annotate these relationships based on literature evidence. A text analytics web service can be used by participants to assist with the creation of OpenBEL compliant knowledge statements given evidence lines from references. Best performers in the crowd-verification phase will be invited to a 3-day event to resolve controversies with subject matter experts, finalizing and publishing the network models. The resulting models will represent the current status of biological knowledge within the defined boundaries. For some period following conclusion of the challenge, the published models will remain available for continuous use and expansion by the scientific community. This work resulted from a scientific collaboration between Philip Morris International (PMI) and IBM's Thomas J. Watson Research Center on a project funded by PMI.
Notes
https://www.sbvimprover.com/
project driven by scientists - IBM Watson team & another team...

    Testable blocks

    Watson Team - DREAM challenges

    Crowdsourcing - backbone of system bio

    Why do we need IMPROVER

    facing proteomics and more omics methods

    same time, vivid area in literature where new knowledge coming in (curation, hi throughput)

    methodology - many different processes.

    a lot of noise in this area to come up with the best approach

    2011 study - how do we verify methodology/data?

    most publications self-assess

    logical - each study has a special context

    unfortunately methodology does not apply to other cases

    looking for independent verification

    3 challenges: crowdsourcing, IBM watson team

    1. diagnostic signature challenge

    processing of gene expression data to predict the phenotype

    prepare test dataset so participants can train and create classifiers

    test set created with known outcome phenotypes (e.g. diseases - lung cancer)

    54 scientific teams 

    2. translatability - species translation challenge

    better understanding of what processes are translatable between orgs.

     which networks translate well

    orthologous databases?

    outcome: 2013, 28 teams gathering to come up with publication - which models worked and were verified to translate between species

    3. Network verification challenge

    4 years activity to understand disease mechanisms

    come up with network models - help us to find suitable biomarkers for product testing

    focused on respiratory/cardiovascular diseases

    used unpublished knowledge

    used networks as starting point for this challenge

    verifying what is displayed in these networks

    5 phase approach

    1. prepare data

    select articles, datasets, combine knowledge, curate, feed to networks (4-5 years)

    represented in network - causalbionet

    2. crowdsource - interface

    website 

    allow us to review the networks

    modify/vote for networks

    space to exchange/discuss

    important: easy way for participants to review the network/code

    prepare training material with 'game rules' and language easy to understand

    based on gaming rules - reputation, leader board

    3. wrapping up after challenge and contributions

    stop point

    where do we have clear understandings?

    lots of ppl voted - frozen

    contradiction

    subject for /another/ meeting - face to face (nice location -reward!)

    review all the feedback from networks - select sub-networks we need face to face

    come to conclusion

    experts come together

    once we have meeting: come to conclusion, update networks we have. start from scratch again

    iterations

    Boundary conditions of networks

    focus on lung and cardio tissue

    canonical mechanisms included

    human data and edges prioritized

    initiators of and responses to biological signaling included

    50 network models selected

    cell fate, cell proliferation, cell stress, tissue repair, angiogenesis, inflammation

    make it easier to find network where you have expertise

    how do we code this knowledge?

    BEL - communication standards

    coding of the knowledge

    full text articles - manually curated, assembled in network (BEL)

    BEL - semantic triple

    allows subject, object, function, nested functions, flexible

    ontology tool

    fixed nomenclature - type of relationships

    evidence based

    collect # of evidences - translate evidence text into this coding

    comes close to written language - human readable & machine readable

    preferred coding language

    released for open access - hope others will see the benefits

    Prepare whole interface

    where participants register and vote

    self-moderated reputation system

    scoring based - at a node or edge with high score - if truth verified, then much higher scoring

    can focus energies on approving or rejecting

    if my contribution led to an event in this edge - my score is higher

    leader board - see how high score

    tricks so not discouraged by high scores

    keep work up and running

    prepared video: training material for users to get a quick start
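
The notes above describe BEL as a semantic triple (subject, relationship, object) that allows nested functions and is backed by evidence text. A minimal sketch of that shape is given below. It is not the OpenBEL tooling and does not validate anything: the `Term` and `Statement` classes are hypothetical, `GENE_A` and `GENE_B` are placeholder names rather than real gene symbols, and the rendered string only imitates the general look of a BEL statement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """A function applied to a namespaced name or to another Term (nesting)."""
    function: str     # e.g. "p" for protein abundance, "kin" for kinase activity
    argument: object  # a string such as "HGNC:GENE_A", or a nested Term

    def render(self):
        if isinstance(self.argument, Term):
            inner = self.argument.render()
        else:
            inner = self.argument
        return f"{self.function}({inner})"


@dataclass(frozen=True)
class Statement:
    """Subject - relation - object, plus the evidence that supports it."""
    subject: Term
    relation: str
    obj: Term
    evidence: str  # evidence line quoted from the cited article
    citation: str  # reference identifier

    def render(self):
        return f"{self.subject.render()} {self.relation} {self.obj.render()}"


# Placeholder content, for illustration only.
stmt = Statement(
    subject=Term("kin", Term("p", "HGNC:GENE_A")),
    relation="increases",
    obj=Term("p", "HGNC:GENE_B"),
    evidence="<evidence sentence from the article>",
    citation="<citation placeholder>",
)

print(stmt.render())
```

Storing the evidence and citation next to the triple reflects the "evidence based" point in the notes: a reviewer voting on an edge needs the supporting text alongside the coded relationship.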

Q & A:
Q: Generate the model - it has value bc evidence. Static. Have you thought about reaching a place where you want this to be a live model?
A: General RDF and BEL converter. Export and load to Cytoscape - lose some level of information (evidence tags), but there is this possibility. We intend to share it through this verification platform so users can contribute and download.
--------------------------------------------------------------------------
13:40 - 14:00
Linking tissues to phenotypes using gene expression profiles
Anika Oellrich
Wellcome Trust Sanger Institute, UK
Abstract
Despite great biological and computational efforts to determine the genetic causes underlying human heritable diseases, approximately half (3,500) of these diseases are still without identified genetic cause. Model organism studies allow the targeted modification of the genome and can help with the identification of genetic causes for human disease. Targeted modifications led to a vast amount of model organism data. However, this data is scattered across different databases, preventing an integrated view and missing out on context information. Only once we are able to combine all the existing resources will we be able to fully understand the causes underlying a disease, and how species differ. Here, we present an integrated data resource combining tissue expression with phenotypes in mouse lines, and bringing us one step closer to consequence chains from a molecular level to a resulting phenotype. Mutations in genes often manifest in phenotypes in the same tissue that the gene is expressed in. However, in other cases a systems level approach is required to understand how perturbations to gene-networks connecting multiple tissues lead to a phenotype. Automated evaluation of the predicted tissue-phenotype associations reveals that 72-76% of the phenotypes are associated with disruption of genes expressed in the affected tissue. However, 55-64% of the individual phenotype-tissue associations show spatially separated gene expression and phenotype manifestation. For example, we see a correlation between total body fat abnormalities and genes expressed in the brain which fits recent discoveries linking genes expressed in the hypothalamus to obesity. Finally, we demonstrate that the use of our predicted tissue-phenotype associations can improve the detection of a known disease-gene association when combined with a disease gene candidate prediction tool. For example, JAK2, the known gene associated with Familial Erythrocytosis 1 rises from the 7th best candidate to the top hit when the associated tissues are taken into consideration. Data accessible via: http://www.sanger.ac.uk/resources/databases/phenodigm/phenotype/list.
Notes

    Started looking into this - human heritable disease without identified cause

    still 3.5K diseases

    numerous efforts

    number of obstacles - identifying disease genes

    number of times disease occurs

    rarely occurring - hard to establish signs and symptoms and genetics

    connection between gene and phenotype

    multiplicity - one gene mult. phenotypes

    other way around also true

    several layers involved 

    individual several organs

    organs, several tissues

    tissues, several cells

    try to do this automatically, but not always spatial connection between genotype and phenotype

    data resources:

    expression and phenotype data: sanger mouse genetics

    phenotype data - mouse genome db

    barcode gene

    intersection of all the data resources

    21 adult tissues, 5K genes

    connect tissue to phenotype

    1. data download

    2. data harmonization

    3. association - tissue and phenotype

    1. association rules

    2. hypergeometric distribution

    Paper - poster here #8

    4. evaluation - (automated & manual)

    5. phenodigm

    3K tissue-phenotype associations

    phenotype not associated with one particular tissue

    gene expressed in different tissues - same phenotype

    or combination of different genes

    don't limit to only one tissue per phenotype

    Evaluation: automated

    75% cover associated

    needed curated set 

    Entity-Quality statements available for mouse phenotypes contain tissue info 

    manual evaluation of 'no tissue match' cases

    genes in brain - total fat abnormalities

    genes in spleen - liver hypoplasia

    also saw - don't obtain observed tissue which should have been there

    could the results be useful to any other studies?

    tissue-phenotype associations improve disease-gene discovery

    phenodigm

    associations available online - sanger.ac.uk/resources/databases/phenodigm/phenotype/list

    http://www.sanger.ac.uk/resources/databases/phenodigm/phenotype/list
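
The association step in the notes above lists a hypergeometric distribution as one way to link a tissue to a phenotype. A minimal sketch of such a test follows, assuming the question being asked is: given all genes in the dataset, do genes expressed in a tissue overlap with genes annotated to a phenotype more often than chance would predict? The function name, parameter names, and example counts are assumptions (only the rough total of 5K genes comes from the notes), and the speakers' actual software and thresholds are not reproduced here.

```python
from math import comb


def hypergeom_pvalue(total_genes, tissue_genes, phenotype_genes, overlap):
    """Upper-tail probability P(X >= overlap).

    X is the number of tissue-expressed genes found among
    `phenotype_genes` genes drawn without replacement from `total_genes`.
    """
    denominator = comb(total_genes, phenotype_genes)
    upper = min(tissue_genes, phenotype_genes)
    tail = sum(
        comb(tissue_genes, i)
        * comb(total_genes - tissue_genes, phenotype_genes - i)
        for i in range(overlap, upper + 1)
    )
    return tail / denominator


# Hypothetical counts, for illustration only.
p = hypergeom_pvalue(
    total_genes=5000,    # genes in the intersected dataset
    tissue_genes=400,    # genes expressed in the tissue
    phenotype_genes=50,  # genes annotated with the phenotype
    overlap=12,          # genes in both sets
)
print(f"p = {p:.3g}")
```

A small p-value suggests the overlap is larger than expected by chance. Because many tissue-phenotype pairs would be tested, a real analysis would also need a multiple-testing correction, which ties in with the threshold question raised in the Q & A.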

Q & A
Q: in measuring association, there's significance and strength of the association. p-value?
A: software applied - hard to get p-values. Don't get entire range of p-values which you would require for testing. If we export the associations and say anything above a certain threshold is okay, we do evaluation afterwards. Better assessment in future: what should p-value be. Just experiment to see if we can get tissue-phenotype association.
--------------------------------------------------------------------------
14:00 - 14:10
UbiGRID: a database resource for protein and genetic interactions of the ubiquitin-proteasome system
Andrew Chatr-Aryamontri
IRIC - University of Montreal, Canada
Abstract
The covalent attachment of ubiquitin to substrate proteins controls the stability, interactions, activity and/or localization of much of the proteome. The canonical ubiquitination cascade proceeds by a three-step process: activation of ubiquitin as a thioester by an E1 enzyme, transfer to an E2 enzyme as a thioester intermediate, and conjugation to a lysine residue (or N-terminal amino group) on the substrate, or to lysine residues of a previously conjugated ubiquitin moiety to build ubiquitin chains. The fate of the ubiquitinated substrate is determined by interactions with a variety of ubiquitin binding domains, which can direct the substrate for degradation by the 26S proteasome or alter substrate localization, interactions or activity. Finally, the extent of substrate ubiquitination is dynamically controlled by a host of deubiquitinating enzymes. Despite the fundamental role played by the ubiquitin-proteasome system (UPS) in the regulation of many cellular processes and in human diseases, its therapeutic potential has been largely underexplored. This is due to insufficient understanding of the interdependence and redundancy between the different system components and an incomplete mapping of substrate-UPS system interactions. Nonetheless, the development and commercialization of Bortezomib (Velcade), a non-reversible proteasome inhibitor for the treatment of refractory multiple myeloma, has demonstrated that the UPS can be successfully modulated for therapeutic benefit. Given the broad effects of the UPS on the proteome, and its consequent connections to many disease states, the UPS is currently under intense investigation in the biotechnology and pharmaceutical sectors. To facilitate understanding of the UPS and its links to disease, the UbiGRID curation project aims to comprehensively curate the biomedical literature for genetic and protein interactions of all UPS genes/proteins in humans, budding yeast and other model species. UbiGRID will serve as a centralized resource for three types of data: (i) an annotated reference set of UPS components organized into functional classes; (ii) comprehensive curation of genetic and protein interactions for all UPS genes; (iii) the annotation of ubiquitinated residues derived from mass spectrometry datasets. We have recently completed curation of the entire biomedical literature for interactions of the UPS in human (82,218 interactions derived from 8,377 publications) and budding yeast (42,672 interactions derived from 2,472 publications). An updated analysis of the network properties and disease connections of the UPS will be presented. UbiGRID datasets are freely accessible through a dedicated web interface and available for download in tabular or PSI-MI formats.
Notes
Ubiquitin-proteasome system (UPS)

    multistep process 

    controls stability/activity of most proteome

    highly hierarchical

    weak interactions

    Disease - link system to human diseases

    can visualize how the mutations affect the system

    over degradation of protein - loss of function

    e.g. parkin 

    Consequence- small molecule inhibitors

    /// SORRY LOST WIFI FOR A BIT

    UbiGRID curation project

    annotate ubiquitination sites, 

    drafting members of UPS (gene list)

    human - 1K genes

    for more than 50% of genes, no functional annotations

    conservation of UPS

    the closing interface, system and proteome, more conservation

    core of work: annotating interactions

    20K publications - annotating interactions

    resulted in more than 80K interactions (human)

    20K physical/genetic interactions

    ubiquitylome - ubiquitination profiles 

    UPS network features

    close relevance to other groups

    knowledge base for human UPS disease biology - 

    better understanding of relationship between different genes

    microarray data, proteomics data

    Future plans:

    dedicated web interface

    UbiGRID on a monthly basis

    and more

Q & A
Q: do you feel you have achieved complete coverage of the domain?
A: Took a while - always moving back and forth - list of genes and annotation. Many more genes than expected. Surprised by changes in list of genes. Hoping last iteration - mostly just maintenance curation. But never say never.
    
    
--------------------------------------------------------------------------
14:10 - 14:20
Yeast – why it simply has a lot to say about human disease
Selina Dwight
Stanford University, USA
Abstract
Science has long approached a complex system by exploring a simpler system that exhibits similar functionality. What happens when the complex system itself becomes more penetrable? Human gene products have been known for decades to have counterparts in S. cerevisiae, and detailed knowledge about these S. cerevisiae genes has provided clues to basic cellular functions in humans and other higher organisms. In the past decade, however, technological advances in genomic sequencing and other methods have allowed more direct studies in human cells. Contrary to what might be expected, the result is not that budding yeast is less consequential as a model organism, but that increasing information about human genes verifies and expands the connections between the two organisms, allowing human research to more readily leverage yeast knowledge. For instance, identification of regions in human genes with similarity to yeast prion-like domains led to the discovery of a mutation in an evolutionarily conserved residue in a gene that co-segregates with ALS. Conversely, increasing knowledge about human biological processes also suggests areas for further study in yeast. As an example, mutation of mitochondrial DNA associated with aging, disease, and anticancer agents has prompted studies of pathways in yeast that sense the nutritional state of the cell and affect mitochondrial function. Lastly, while yeast may be a relatively simple system, yeast research has witnessed its own advances in technology that serve to further its relevance as a model organism - such as chemogenomic profiling of systematic deletion strains to identify potential drug targets and the mechanisms of action by which compounds exert their effects on disease. As researchers continue to elucidate functional homology between yeast and humans, and to employ yeast as a tool for discovery, it becomes increasingly important to highlight and connect this information. The Saccharomyces Genome Database (SGD, www.yeastgenome.org) is building a new class of information called "Species Connections" towards this goal through the incorporation of OMIM homologs and phenotypes, phenologs, drug interactions, and functional homology studies.
Notes
Yeast - it simply has a lot to say about human disease
Yeast

    provide with beer and wine and bread

    influential in the office too - a lot to say about human disease

    article in news:

    kids who can't cry

    symptom + genomic sequencing 

    8 different cases, mutation in same gene

    identify main character in the disease

    far less interesting without plot

    gene with no bio context, less informative

    Yeast can provide missing context

    richness of data associated with it - multi dimensional story

    main observation: representation of /something/

    stitching together 6 different datatypes of yeast genomic representation

    correlations between gene variation, gene expression, metabolite providers

    gene in context - clues to missing story

    SGD

    provides several ways to connect human genes, yeast genes and richness of yeast data

    orthologues

    drug related phenotypes

    literature: associate yeast genes with papers which describe human counterparts

    full text searchable by keywords

    collect info: ability for human gene to substitute yeast gene in cell

    drug phenotypes - search for shared phenotypes in orthologs

    searchable connections

    search for human gene - identify human diseases and orthologs

    search other way, too

    many genes in yeast can help tell human story

    nearly 50% of all yeast genes have one or more human orthologs and function in yeast

    maybe we can begin to tell a more complete story about human disease

    SGD Powering connections

Q & A
Q: it's 50 or 60% of human genes that have at least domain level homology
A: go yeast!
Q: Wondering if SGD has plans to curate signalling pathways in yeast?
A: We do have some pathway info. We are looking to incorporate more. On horizon.
Q: Pooling all this data - yeast coming soon. A lot of times similar types are in the same pathway
A: plans to get some mammalian phenotype data in. Link to OMIM disease data.
--------------------------------------------------------------------------
14:20 - 14:30
Visualization and analysis of data using Atlas of Cancer Signalling Networks (ACSN) and NaviCell tools for integrative systems biology of cancer
Inna Kuperstein
Institut Curie, France
Abstract
Poster 11 - further discussion
https://acsn.curie.fr/
Studying reciprocal regulations between cancer-related pathways is essential for understanding signalling rewiring during cancer evolution and in response to treatments. To allow systematic analysis of cancer signalling, the knowledge about cell mechanisms dispersed in scientific literature can be collectively represented in the form of comprehensive maps of signalling networks amenable for computational analytical methods. The Atlas of Cancer Signalling Networks (ACSN, http://acsn.curie.fr) is a resource of cancer signalling tools with interactive web-based environment for navigation, curation and data visualisation supported by a user friendly Google Maps-based tool NaviCell (http://navicell.curie.fr). Construction and update of ACSN involves manual mining of molecular biology literature and participation of the experts in the corresponding fields. ACSN covers major mechanisms involved in cancer progression systematically represented in the form of comprehensive interconnected maps. Cell signalling mechanisms are depicted on the maps in great detail, together creating a seamless map of molecular interactions in cancer. The content of ACSN is visually presented in the form of a global 'geographic-like' molecular map browsable using the Google Maps engine and semantic zooming. The associated blog provides a forum for commenting and curating the ACSN maps content. ACSN and NaviCell are a systems biology tool for integration and visualization of cancer molecular profiles generated by high-throughput techniques such as genome, transcriptome, proteome or analysing results from drug screenings or synthetic interaction studies. Integration and analysis of these data in the context of ACSN helps in interpretation, understanding the biological significance of the data and rationalising the scientific hypothesis. This network-based approach is a framework for deciphering complex molecular characteristics of cancers, improving patient stratification, predicting responses and resistance to cancer drugs and proposing new treatment strategies.
Notes

    Cancer - complicated disease

    multiple signals

    regulation pathways in different cancers - frequently mutated in cancer presented together

    atlas of cancer signalling network

    manually curated by expert

    layout of maps, curated by more experts

    global map of processes - similar to global map

    divided to modular maps - processes

    Atlas is huge

    standards and tools - SBGN syntax - visual syntax

    integrate with other groups

    community effort

    Open source - accessed via website

    currently huge map with detailed

    ongoing project 

    all interconnected map

    created navigation tool NaviCell: google map engine + blog (wordpress)

    very user friendly - friends at hospital can use it

    associated with blog - each entity annotated and posted on blog

    mutation data vis using ACSN

    high throughput data - cancer research

    'omics

    typical and simple thing to vis: list of genes

    mutation data, e.g. list of oncogenes most frequently mutated across cancers

    Expression data - every gene may have multiple functions in cell

    ACSN 'staining' for module activity

    visualization of phosphoproteome data - ACSN

    Since networks interconnected - signalling networks for intervention strategy design

    synthetic lethality - case of negative genetic interactions

    OCSANA - tool that calculates based on structural analysis and data

    Conclusions

    ACSN - resource of cancer signalling knowledge

Q & A:
Q: do you provide your data in BioPAX? 
A: Each map and global map downloadable in several formats. including biopax
Q: License?
A: Open source
Q: When mapping phosphorylation onto proteins - multiple sites?
A: map together - each entity represented multiple times, every entity with corresponding phosphorylation data