Principal component analysis based unsupervised feature extraction applied to budding yeast temporally periodic gene expression

Published on Jun 29, 2016 in BioData Mining (2.522)
DOI: 10.1186/S13040-016-0101-9
Y-h. Taguchi
Estimated H-index: 23
(Chu-Dai: Chuo University)
Abstract
Background: The recently proposed principal component analysis (PCA)-based unsupervised feature extraction (FE) has been applied successfully to various bioinformatics problems, ranging from biomarker identification to the screening of disease-causing genes using gene expression/epigenetic profiles. However, the conditions required for its successful use and the mechanisms by which it outperforms supervised methods are unknown, because PCA-based unsupervised FE has so far been applied only to challenging (i.e., not well known) problems.
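The abstract names the method but does not spell out its steps. As a rough illustration only, the sketch below shows one common way a PCA-based unsupervised selection of periodically expressed genes can be set up. Everything in it is an assumption made for the example rather than the paper's actual procedure: the synthetic data, the helper names (`make_synthetic`, `pca_unsupervised_fe`, `benjamini_hochberg`), the use of exactly two principal components, the chi-squared outlier test and the 0.01 threshold.

```python
import numpy as np


def benjamini_hochberg(p):
    """Return Benjamini-Hochberg adjusted P-values for a 1-D array."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, starting from the largest P-value.
    adjusted = np.minimum.accumulate(ranked[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.minimum(adjusted, 1.0)
    return out


def pca_unsupervised_fe(X, alpha=0.01):
    """Select outlier genes from a genes x samples matrix (illustrative).

    Genes are treated as the points to embed; no sample labels are used.
    Assumption: the first two PCs carry the signal of interest.
    """
    # Center each sample (column) across genes.
    Xc = X - X.mean(axis=0, keepdims=True)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :2] * S[:2]            # gene scores on PC1 and PC2
    z = scores / scores.std(axis=0)      # standardize each PC
    chi2 = (z ** 2).sum(axis=1)
    # Survival function of a chi-squared variable with 2 degrees of freedom.
    p = np.exp(-chi2 / 2.0)
    q = benjamini_hochberg(p)
    return np.flatnonzero(q < alpha), scores, Vt[:2]


def make_synthetic(n_genes=1000, n_periodic=20, n_time=20,
                   period=10, amplitude=5.0, seed=0):
    """Hypothetical data: Gaussian noise plus a few periodic genes."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_time)
    X = rng.normal(size=(n_genes, n_time))
    phase = rng.uniform(0.0, 2.0 * np.pi, size=n_periodic)
    # The first n_periodic rows oscillate with random phases.
    X[:n_periodic] += amplitude * np.cos(2.0 * np.pi * t / period
                                         + phase[:, None])
    return X


if __name__ == "__main__":
    X = make_synthetic()
    selected, scores, loadings = pca_unsupervised_fe(X)
    print("number of selected genes:", len(selected))
```

In this toy setting the two leading sample loadings (`loadings`) tend to resemble a sine and a cosine over the time points, so genes with periodic profiles end up far from the origin in the PC1/PC2 plane regardless of their phase, which is why an outlier test on the gene scores can pick them out without any class labels.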