Comparative Analysis of Normalization Methods for Network Propagation.

Published on Jan 22, 2019 in Frontiers in Genetics
DOI: 10.3389/FGENE.2019.00004
Hadas Biran (TAU: Tel Aviv University), estimated H-index: 3
Martin Kupiec (TAU: Tel Aviv University), estimated H-index: 58
Roded Sharan (TAU: Tel Aviv University), estimated H-index: 67
Abstract
Network propagation is a central tool in biological research. While a number of variants and normalizations have been proposed for this method, each has its own shortcomings, and no large-scale assessment of those variants is available. Here we propose a novel normalization method for network propagation that is based on evaluating the propagation results against those obtained on randomized networks that preserve node degrees. In this way, our method overcomes potential biases of previous methods. We evaluate its performance on multiple large-scale datasets and find that it compares favorably to previous approaches in diverse gene prioritization tasks. We further demonstrate its utility on a focused dataset of telomere length maintenance in yeast. The normalization method will be made available at http://anat.cs.tau.ac.il/WebPropagate upon acceptance of the paper.
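
The idea in the abstract, scoring each node by how its propagation result compares with results on degree-preserving randomized networks, can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not give the propagation formula, so the sketch assumes a common variant: iterating p <- alpha * W p + (1 - alpha) * prior with the symmetric normalization W = D^-1/2 A D^-1/2 on an unweighted, undirected network. The randomization uses double edge swaps, and the empirical p-value is one plausible way to compare against the random networks. All function names, `alpha`, `n_random`, and the swap count are hypothetical choices made for illustration.

```python
import numpy as np


def propagate(adj, prior, alpha=0.8, tol=1e-9, max_iter=1000):
    """Assumed variant: iterate p <- alpha * W p + (1 - alpha) * prior,
    where W = D^-1/2 A D^-1/2 (symmetric degree normalization)."""
    deg = adj.sum(axis=1)
    inv_sqrt = np.zeros(len(deg))
    nonzero = deg > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(deg[nonzero])
    w = adj * inv_sqrt[:, None] * inv_sqrt[None, :]
    p = prior.astype(float).copy()
    for _ in range(max_iter):
        nxt = alpha * (w @ p) + (1 - alpha) * prior
        if np.abs(nxt - p).sum() < tol:
            return nxt
        p = nxt
    return p


def degree_preserving_shuffle(adj, n_swaps, rng):
    """Randomize an undirected 0/1 adjacency matrix with double edge swaps:
    edges u-v and x-y become u-x and v-y, so every node keeps its degree."""
    a = adj.copy()
    edges = list(zip(*np.nonzero(np.triu(a, 1))))
    done, tries = 0, 0
    while done < n_swaps and tries < 100 * n_swaps:
        tries += 1
        i1, i2 = rng.choice(len(edges), size=2, replace=False)
        (u, v), (x, y) = edges[i1], edges[i2]
        if rng.random() < 0.5:  # pick one of the two possible rewirings
            x, y = y, x
        # Skip swaps that would create a self-loop or a duplicate edge.
        if len({u, v, x, y}) < 4 or a[u, x] or a[v, y]:
            continue
        a[u, v] = a[v, u] = a[x, y] = a[y, x] = 0
        a[u, x] = a[x, u] = a[v, y] = a[y, v] = 1
        edges[i1], edges[i2] = (u, x), (v, y)
        done += 1
    return a


def normalized_scores(adj, prior, n_random=100, alpha=0.8, seed=0):
    """Return raw propagation scores and an empirical p-value per node:
    the fraction of randomized networks scoring at least as high."""
    rng = np.random.default_rng(seed)
    observed = propagate(adj, prior, alpha)
    n_edges = int(np.triu(adj, 1).sum())
    exceed = np.zeros(len(prior))
    for _ in range(n_random):
        rand_adj = degree_preserving_shuffle(adj, 10 * n_edges, rng)
        exceed += propagate(rand_adj, prior, alpha) >= observed
    return observed, (exceed + 1) / (n_random + 1)
```

Under these assumptions, a node with a small p-value scores higher on the real network than its degree alone would explain, which is the kind of degree bias the normalization is meant to remove. Ranking genes by p-value rather than by raw score is one way such a normalization could be used for prioritization.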