diffuStats: an R package to compute diffusion-based scores on biological networks.

Published on Feb 1, 2018 in Bioinformatics
DOI: 10.1093/bioinformatics/btx632
Sergio Picart-Armada (UPC: Polytechnic University of Catalonia), Estimated H-index: 5
Wesley K. Thompson (UCSD: University of California, San Diego), Estimated H-index: 68
Alfonso Buil
Alexandre Perera-Lluna (UPC: Polytechnic University of Catalonia), Estimated H-index: 12
Abstract
This is a pre-copyedited, author-produced version of an article accepted for publication in Bioinformatics following peer review. The version of record Sergio Picart-Armada, Wesley K Thompson, Alfonso Buil, Alexandre Perera-Lluna; diffuStats: an R package to compute diffusion-based scores on biological networks, Bioinformatics, Volume 34, Issue 3, 1 February 2018, Pages 533–534 is available online at: https://doi.org/10.1093/bioinformatics/btx632.
References (11)
#1 Matteo Bersanelli (UNIBO: University of Bologna), H-Index: 5
#2 Ettore Mosca (National Research Council), H-Index: 13
Last. Luciano Milanesi (National Research Council), H-Index: 32
Network diffusion-based analysis of high-throughput data for the detection of differentially enriched modules
26 Citations
#1 Giorgio Valentini (University of Milan), H-Index: 30
#2 Giuliano Armano (University of Cagliari), H-Index: 17
Last. Matteo Re (University of Milan), H-Index: 17
Summary: RANKS is a flexible software package that can be easily applied to any bioinformatics task formalizable as ranking of nodes with respect to a property given as a label, such as automated protein function prediction, gene disease prioritization and drug repositioning. To this end RANKS provides an efficient and easy-to-use implementation of kernelized score functions, a semi-supervised algorithmic scheme embedding both local and global learning strategies for the analysis of biomolecular...
24 Citations
#1 Olga Zoidi (A.U.Th.: Aristotle University of Thessaloniki), H-Index: 8
#2 Eftychia Fotiadou (A.U.Th.: Aristotle University of Thessaloniki), H-Index: 2
Last. Ioannis Pitas (A.U.Th.: Aristotle University of Thessaloniki), H-Index: 87
The expansion of the Internet over the last decade and the proliferation of online social communities, such as Facebook, Google+, and Twitter, as well as multimedia sharing sites, such as YouTube, Flickr, and Picasa, has led to a vast increase of available information to the user. In the case of multimedia data, such as images and videos, fast querying and processing of the available information requires the annotation of the multimedia data with semantic descriptors, that is, labels. However, o...
20 Citations
#1 Zaid Harchaoui (INRIA: French Institute for Research in Computer Science and Automation), H-Index: 47
#2 Francis Bach (INRIA: French Institute for Research in Computer Science and Automation), H-Index: 101
Last. Eric Moulines (ENST: Télécom ParisTech), H-Index: 55
Kernel-based methods provide a rich and elegant framework for developing nonparametric detection procedures for signal processing. Several recently proposed procedures can be simply described using basic concepts of reproducing kernel Hilbert space (RKHS) embeddings of probability distributions, mainly mean elements and covariance operators. We propose a unified view of these tools and draw relationships with information divergences between distributions.
44 Citations
#1 Insuk Lee (Yonsei University), H-Index: 32
#2 U. Martin Blom (University of Texas at Austin), H-Index: 1
Last. Edward M. Marcotte, H-Index: 84
Network “guilt by association” (GBA) is a proven approach for identifying novel disease genes based on the observation that similar mutational phenotypes arise from functionally related genes. In principle, this approach could account even for nonadditive genetic interactions, which underlie the synergistic combinations of mutations often linked to complex diseases. Here, we analyze a large-scale, human gene functional interaction network (dubbed HumanNet). We show that candidate disease genes c...
553 Citations
#1 Sara Mostafavi (U of T: University of Toronto), H-Index: 36
#2 Debajyoti Ray, H-Index: 11
Last. Quaid Morris (U of T: University of Toronto), H-Index: 62
Background: Most successful computational approaches for protein function prediction integrate multiple genomics and proteomics data sources to make inferences about the function of unknown proteins. The most accurate of these algorithms have long running times, making them unsuitable for real-time protein function prediction in large genomes. As a result, the predictions of these algorithms are stored in static databases that can easily become outdated. We propose a new algorithm, GeneMANIA, th...
672 Citations
May 22, 2007 in KDD (Knowledge Discovery and Data Mining)
#1 Luh Yen (UCL: Université catholique de Louvain), H-Index: 12
#2 François Fouss (UCL: Université catholique de Louvain), H-Index: 17
Last. Marco Saerens (UCL: Université catholique de Louvain), H-Index: 28
This work presents a kernel method for clustering the nodes of a weighted, undirected graph. The algorithm is based on a two-step procedure. First, the sigmoid commute-time kernel (KCT), providing a similarity measure between any pair of nodes by taking the indirect links into account, is computed from the adjacency matrix of the graph. Then, the nodes of the graph are clustered by performing a kernel k-means or fuzzy k-means on this CT kernel matrix. For this purpose, a new, simple, version o...
57 Citations
#1 Koji Tsuda (MPG: Max Planck Society), H-Index: 49
#2 Hyunjung Shin (MPG: Max Planck Society), H-Index: 25
Last. Bernhard Schölkopf (MPG: Max Planck Society), H-Index: 151
Motivation: Support vector machines (SVMs) have been successfully used to classify proteins into functional categories. Recently, to integrate multiple data sources, a semidefinite programming (SDP) based SVM method was introduced. In SDP/SVM, multiple kernel matrices corresponding to each of data sources are combined with weights obtained by solving an SDP. However, when trying to apply SDP/SVM to large problems, the computational cost can become prohibitive, since both converting the data to a...
190 Citations
Jan 1, 2003 in COLT (Conference on Learning Theory)
#1 Alexander J. Smola (ANU: Australian National University), H-Index: 126
#2 Risi Kondor (Columbia University), H-Index: 24
We introduce a family of kernels on graphs based on the notion of regularization operators. This generalizes in a natural way the notion of regularization and Green's functions, as commonly used for real-valued functions, to graphs. It turns out that diffusion kernels can be found as a special case of our reasoning. We show that the class of positive, monotonically decreasing functions on the unit interval leads to kernels and corresponding regularization operators.
744 Citations
#2 Roland Krause, H-Index: 23
Last. Peer Bork, H-Index: 222
Comprehensive protein–protein interaction maps promise to reveal many aspects of the complex regulatory network underlying cellular function. Recently, large-scale approaches have predicted many new protein interactions in yeast. To measure their accuracy and potential as well as to identify biases, strengths and weaknesses, we compare the methods with each other and with a reference set of previously reported protein interactions.
2,085 Citations
Cited By (18)
#1 Seung Min Jung (Catholic University of Korea), H-Index: 16
#2 Kyung-Su Park (Catholic University of Korea), H-Index: 20
Last. Ki-Jo Kim (Catholic University of Korea), H-Index: 12
Objectives: Interstitial lung disease is a significant comorbidity and the leading cause of mortality in patients with systemic sclerosis. Transcriptomic data of systemic sclerosis-associated interstitial lung disease (SSc-ILD) were analysed to evaluate the salient molecular and cellular signatures in comparison with those in related pulmonary diseases and to identify the key driver genes and target molecules in the disease module. Methods: A transcriptomic dataset of lung tissues f...
#2 Pol Solà-Santos (UPC: Polytechnic University of Catalonia)
Last. Alexandre Perera-Lluna (UPC: Polytechnic University of Catalonia), H-Index: 12
Untargeted metabolomics using liquid chromatography coupled to mass spectrometry (LC-MS) allows the detection of thousands of metabolites in biological samples. However, LC-MS data annotation is still considered a major bottleneck in the metabolomics pipeline since only a small fraction of the metabolites present in the sample can be annotated with the required confidence level. Here, we introduce mWISE (metabolomics wise inference of speck entities), an R package for context-based annotation of...
#1 Niklas Gebauer, H-Index: 11
#2 Axel Künstner (University of Lübeck), H-Index: 21
Last. Alfred C. Feller, H-Index: 36
Epstein-Barr virus (EBV)-associated diffuse large B-cell lymphoma not otherwise specified (DLBCL NOS) constitutes a distinct clinicopathological entity in the current World Health Organization (WHO) classification. However, its genomic features remain sparsely characterized. Here, we combine whole-genome sequencing (WGS), targeted amplicon sequencing (tNGS), and fluorescence in situ hybridization (FISH) from 47 EBV+ DLBCL (NOS) cases to delineate the genomic landscape of this rare disease. Integ...
#1 Sergio Picart-Armada (UPC: Polytechnic University of Catalonia), H-Index: 5
#2 Wesley K. Thompson, H-Index: 68
Last. Alexandre Perera-Lluna (UPC: Polytechnic University of Catalonia), H-Index: 12
Motivation: Network diffusion and label propagation are fundamental tools in computational biology, with applications like gene-disease association, protein function prediction and module discovery. More recently, several publications have introduced a permutation analysis after the propagation process, due to concerns that network topology can bias diffusion scores. This opens the question of the statistical properties and the presence of bias of such diffusion processes in each of its applicati...
1 Citation
#1 Jamie Soul (Newcastle University), H-Index: 7
#2 Matt J. Barter (Newcastle University), H-Index: 20
Last. David Young (Newcastle University), H-Index: 84
Objectives: To collate the genes experimentally modulated in animal models of osteoarthritis (OA) and compare these data with OA transcriptomics data to identify potential therapeutic targets. Methods: PubMed searches were conducted to identify publications describing gene modulations in animal models. Analysed gene expression data were retrieved from the SkeletalVis database of analysed skeletal microarray and RNA-Seq expression data. A network diffusion approach was used to predict new genes ass...
4 Citations
#2 Apichat Suratanee (King Mongkut's University of Technology North Bangkok), H-Index: 7
Disease-related gene prioritization is one of the most well-established pharmaceutical techniques used to identify genes that are important to a biological process relevant to a disease. In identifying these essential genes, the network diffusion (ND) approach is a widely used technique applied in gene prioritization. However, there is still a large number of candidate genes that need to be evaluated experimentally. Therefore, it would be of great value to develop a new strategy to improve the p...
#1 Jiang Xie (SHU: Shanghai University), H-Index: 7
#2 Yiting Yin (SHU: Shanghai University), H-Index: 1
Last. Jiao Wang (SHU: Shanghai University), H-Index: 4
Deciphering regulatory patterns of neural stem cell (NSC) differentiation with multiple stages is essential to understand NSC differentiation mechanisms. Recent single-cell transcriptome datasets became available at individual differentiation. However, a systematic and integrative analysis of multiple datasets at multiple temporal stages of NSC differentiation is lacking. In this study, we propose a new method integrating prior information to construct three gene regulatory networks at pair-wise...
1 Citation
#1 Josep Marín-Llaó (Fraunhofer Society), H-Index: 5
#2 Sarah Mubeen (Fraunhofer Society), H-Index: 5
Last. Daniel Domingo-Fernández (Fraunhofer Society), H-Index: 9
High-throughput screening yields vast amounts of biological data which can be highly challenging to interpret. In response, knowledge-driven approaches emerged as possible solutions to analyze large datasets by leveraging prior knowledge of biomolecular interactions represented in the form of biological networks. Nonetheless, given their size and complexity, their manual investigation quickly becomes impractical. Thus, computational approaches, such as diffusion algorithms, are often employed to...
#1 Noopur Sinha (CSIR: Council of Scientific and Industrial Research), H-Index: 2
#2 Saikat Chowdhury (CSIR: Council of Scientific and Industrial Research), H-Index: 5
Last. Ram Rup Sarkar (CSIR: Council of Scientific and Industrial Research), H-Index: 7
The Smoothened (SMO) antagonist Vismodegib effectively inhibits the Hedgehog pathway in proliferating cancer cells. In the early stage of treatment, Vismodegib showed promising outcomes in regressing tumor cells, but tumors ultimately relapsed due to drug-resistance mutations in SMO, mostly occurring before (primary mutations G497W) or after (acquired mutations D473H/Y) anti-SMO therapy. This study investigates the unprecedented insights of structural and functional mechanism hindering the binding of Vi...
3 Citations
#1 Jessica Gliozzo (University of Milan), H-Index: 3
Last. Giorgio Valentini (University of Milan), H-Index: 30
Methods for phenotype and outcome prediction are largely based on inductive supervised models that use selected biomarkers to make predictions, without explicitly considering the functional relationships between individuals. We introduce a novel network-based approach named Patient-Net (P-Net) in which biomolecular profiles of patients are modeled in a graph-structured space that represents gene expression relationships between patients. Then a kernel-based semi-supervised transductive algorithm...
2 Citations