Variants of combinations of additive and multiplicative updates for GRU neural networks

Published on May 2, 2018
DOI: 10.1109/SIU.2018.8404457
Ali H. Mirza
Estimated H-index: 3
(Bilkent University)
In this paper, we formulate several variants that mix additive and multiplicative updates, based on the stochastic gradient descent (SGD) and exponentiated gradient (EG) algorithms, respectively. We apply these updates to gated recurrent unit (GRU) networks and derive the gradient-based updates for the GRU parameters. We propose four update variants for the GRU network: mean, minimum, even-odd, and balanced. Through an extensive set of experiments, we demonstrate that these variants perform better than plain SGD and EG updates. Overall, the GRU-Mean update achieved the lowest cumulative and steady-state error. We also ran the same set of experiments on long short-term memory (LSTM) networks.
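To make the distinction between additive and multiplicative updates concrete, here is a minimal sketch on a toy linear model rather than a GRU. The abstract does not define the four variants, so `mean_step` below is only one plausible reading of the "mean" update (an element-wise average of the SGD and EG candidates) and should not be taken as the paper's exact formulation. The unnormalized form of EG, the toy data, and all function names are assumptions made for illustration.

```python
import math

def sgd_step(w, grad, lr):
    """Additive update: w_i <- w_i - lr * g_i."""
    return [wi - lr * gi for wi, gi in zip(w, grad)]

def eg_step(w, grad, lr):
    """Multiplicative update (unnormalized EG, an assumption):
    w_i <- w_i * exp(-lr * g_i)."""
    return [wi * math.exp(-lr * gi) for wi, gi in zip(w, grad)]

def mean_step(w, grad, lr):
    """Hypothetical 'mean' variant: average the two candidate updates."""
    additive = sgd_step(w, grad, lr)
    multiplicative = eg_step(w, grad, lr)
    return [(a + m) / 2.0 for a, m in zip(additive, multiplicative)]

def squared_error_grad(w, x, y):
    """Gradient of 0.5 * (w.x - y)^2 with respect to w."""
    err = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [err * xi for xi in x]

def total_loss(w, data):
    """Sum of 0.5 * squared error over all (x, y) pairs."""
    return sum(0.5 * (sum(wi * xi for wi, xi in zip(w, x)) - y) ** 2
               for x, y in data)

def online_fit(step, w, data, lr, epochs):
    """Apply one update per sample, cycling through the data."""
    for _ in range(epochs):
        for x, y in data:
            w = step(w, squared_error_grad(w, x, y), lr)
    return w

# Toy data generated by the (made-up) target weights [2.0, 0.5].
data = [([1.0, 0.0], 2.0), ([0.0, 1.0], 0.5), ([1.0, 1.0], 2.5)]
w0 = [1.0, 1.0]
for name, step in [("SGD", sgd_step), ("EG", eg_step), ("Mean", mean_step)]:
    w = online_fit(step, w0, data, lr=0.1, epochs=200)
    print(name, [round(v, 3) for v in w], round(total_loss(w, data), 6))
```

Note that the multiplicative step in this sketch preserves the sign of each weight and leaves a zero weight at zero, which is why the toy problem uses positive initial and target weights. Averaging with the additive step removes that restriction.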
Papers frequently viewed together
2019 AAAI: National Conference on Artificial Intelligence
#1 Junyoung Chung, H-Index: 17
#2 Caglar Gulcehre, H-Index: 41
Last. Yoshua Bengio (École Polytechnique de Montréal), H-Index: 205
In this paper we compare different types of recurrent units in recurrent neural networks (RNNs). Especially, we focus on more sophisticated units that implement a gating mechanism, such as a long short-term memory (LSTM) unit and a recently proposed gated recurrent unit (GRU). We evaluate these recurrent units on the tasks of polyphonic music modeling and speech signal modeling. Our experiments revealed that these advanced recurrent units are indeed better than more traditional recurrent units s...
Jun 16, 2013 in ICML (International Conference on Machine Learning)
#1 Razvan Pascanu (UdeM: Université de Montréal), H-Index: 65
#2 Tomas Mikolov (Brno University of Technology), H-Index: 47
Last. Yoshua Bengio (UdeM: Université de Montréal), H-Index: 205
There are two widely known issues with properly training recurrent neural networks, the vanishing and the exploding gradient problems detailed in Bengio et al. (1994). In this paper we attempt to improve the understanding of the underlying issues by exploring these problems from an analytical, a geometric and a dynamical systems perspective. Our analysis is used to justify a simple yet effective solution. We propose a gradient norm clipping strategy to deal with exploding gradients and a soft co...
#1 Shai Shalev-Shwartz (HUJI: Hebrew University of Jerusalem), H-Index: 57
Online learning is a well-established learning paradigm which has both theoretical and practical appeals. The goal of online learning is to make a sequence of accurate predictions given knowledge of the correct answer to previous prediction tasks and possibly additional available information. Online learning has been studied in several research fields including game theory, information theory, and machine learning. It also became of great interest to practitioners due to the recent emergence of lar...
#1 Jesús Alcalá-Fdez (UGR: University of Granada), H-Index: 22
#2 Alberto Fernández (University of Jaén), H-Index: 55
Last. Francisco Herrera (UGR: University of Granada), H-Index: 158
(Knowledge Extraction based on Evolutionary Learning) tool, an open source software that supports data management and a designer of experiments. KEEL pays special attention to the implementation of evolutionary learning and soft computing based techniques for Data Mining problems including regression, classification, clustering, pattern mining and so on. The aim of this paper is to present three new aspects of KEEL: KEEL-dataset, a data set repository which includes the data set partitions in the KEEL fo...
#1 Léon Bottou, H-Index: 70
During the last decade, the data sizes have grown faster than the speed of processors. In this context, the capabilities of statistical machine learning methods are limited by the computing time rather than the sample size. A more precise analysis uncovers qualitatively different tradeoffs for the case of small-scale and large-scale learning problems. The large-scale case involves the computational complexity of the underlying optimization algorithm in non-trivial ways. Unlikely optimization algo...
#1 John C. Duchi (University of California, Berkeley), H-Index: 61
#2 Yoram Singer, H-Index: 73
We describe, analyze, and experiment with a framework for empirical loss minimization with regularization. Our algorithmic framework alternates between two phases. On each iteration we first perform an unconstrained gradient descent step. We then cast and solve an instantaneous optimization problem that trades off minimization of a regularization term while keeping close proximity to the result of the first phase. This view yields a simple yet effective algorithm that can be used for batch penal...
#1 Terry Anderson, H-Index: 60
Neither an academic tome nor a prescriptive 'how to' guide, "The Theory and Practice of Online Learning" is an illuminating collection of essays by practitioners and scholars active in the complex field of distance education. Distance education has evolved significantly in its 150 years of existence. For most of this time, it was an individual pursuit defined by infrequent postal communication. But recently, three more developmental generations have emerged, supported by television and radio, tel...
Providing travel time information to travelers on available route alternatives in traffic networks is widely believed to yield positive effects on individual drive behavior and (route/departure time) choice behavior, as well as on collective traffic operations in terms of, for example, overall time savings and, if nothing else, on the reliability of travel times. As such, there is an increasing need for fast and reliable online travel time prediction models. Previous research showed that data-driv...
#1 Song-Chun Zhu (UCLA: University of California, Los Angeles), H-Index: 78
#2 David Mumford (Brown University), H-Index: 73
This exploratory paper quests for a stochastic and context sensitive grammar of images. The grammar should achieve the following four objectives and thus serves as a unified framework of representation, learning, and recognition for a large number of object categories. (i) The grammar represents both the hierarchical decompositions from scenes, to objects, parts, primitives and pixels by terminal and nonterminal nodes and the contexts for spatial and functional relations by horizontal links betw...
Cited By: 1
This study uses deep learning to model the discharge characteristic curve of the lithium-ion battery. The battery measurement instrument was used to charge and discharge the battery to establish the discharge characteristic curve. The parameter method tries to find the discharge characteristic curve and was improved by MLP (multilayer perceptron), RNN (recurrent neural network), LSTM (long short-term memory), and GRU (gated recurrent unit). The results obtained by these methods were graphs. We u...