Contextual Policy Search for Micro-Data Robot Motion Learning through Covariate Gaussian Process Latent Variable Models

Published on Oct 24, 2020 in IROS (Intelligent Robots and Systems)
DOI: 10.1109/IROS45743.2020.9340709
Juan Antonio Delgado-Guerrero (CSIC: Spanish National Research Council),
Adrià Colomé (CSIC: Spanish National Research Council),
Carme Torras (CSIC: Spanish National Research Council)
Abstract
In the coming years, the number and variety of context-aware robotic manipulator applications are expected to grow significantly, especially in household environments. In such settings, programming by demonstration will let non-expert users teach robots specific tasks, and adapting those tasks to the environment is essential for both effectiveness and user safety. Robot motion learning encodes such tasks with parameterized trajectory generators, usually a Movement Primitive (MP) conditioned on contextual variables. However, solutions sampled naively from these MPs are generally suboptimal or inefficient with respect to a given reward function. Policy Search (PS) algorithms therefore use the rewards experienced during execution to improve the robot's performance over successive executions, even for new context configurations. Given the complexity of these tasks, PS methods must explore high-dimensional parameter search spaces. In this work, we present a solution that combines Bayesian Optimization, a data-efficient PS algorithm, with covariate Gaussian Process Latent Variable Models (GPLVMs), a recent dimensionality reduction technique. The combination reduces dimensionality and exploits prior demonstrations to converge in few iterations, while remaining compliant with context requirements. Contextual variables are included in the latent search space, over which a surrogate model of the reward function is built. Samples are then generated in this low-dimensional latent space and mapped to context-dependent trajectories. With the covariate GPLVM, the search space shrinks drastically, e.g. from 105 to 2 parameters plus a few contextual features. Experiments in two different scenarios demonstrate the data efficiency and the dimensionality-reduction power of our approach.
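The pipeline described in the abstract (a latent search space that includes the contextual variables, a Gaussian-process surrogate of the reward built over that space, and latent samples decoded into context-dependent trajectories) can be illustrated with a small NumPy sketch. It is not the authors' implementation. Assumptions: the trained covariate GPLVM is replaced by a hypothetical fixed linear decoder `decode(z, c)`, the reward is a made-up quadratic with a context-dependent optimum, there is a single contextual feature, the initial rollouts use random latent points instead of latent points obtained from demonstrations, and the acquisition function is upper confidence bound (UCB) maximised over random candidates. All function names (`decode`, `reward`, `rbf`, `gp_posterior`, `ucb_next_latent`, `run_bo`) are hypothetical; only the 105-to-2 reduction comes from the abstract.

```python
import numpy as np

LATENT_DIM = 2    # latent search dimension (the abstract's example: 105 -> 2)
CONTEXT_DIM = 1   # assumption: a single contextual feature
TRAJ_DIM = 105    # number of movement-primitive parameters in the example

rng = np.random.default_rng(0)

# Hypothetical stand-in for a trained covariate GPLVM: a fixed linear map from
# (latent point z, context c) to MP parameters. A real model is learned from
# demonstrations and is nonlinear.
W_z = rng.normal(size=(TRAJ_DIM, LATENT_DIM))
W_c = rng.normal(size=(TRAJ_DIM, CONTEXT_DIM))


def decode(z, c):
    """Map a latent point and a context to a full trajectory parameter vector."""
    return W_z @ z + W_c @ c


def reward(z, c):
    """Hypothetical reward: 0 at a context-dependent optimum, negative elsewhere."""
    z_star = np.array([0.5 * c[0], -0.3])
    diff = decode(z, c) - decode(z_star, c)
    return -np.sum(diff ** 2) / TRAJ_DIM


def rbf(A, B, length=0.5):
    """Squared-exponential kernel with unit signal variance."""
    d2 = np.sum(A ** 2, 1)[:, None] + np.sum(B ** 2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * np.maximum(d2, 0.0) / length ** 2)


def gp_posterior(X, y, Xs, noise=1e-4):
    """GP surrogate of the reward over [z, c]; returns posterior mean and std."""
    y_mean, y_std = y.mean(), y.std() + 1e-9  # standardise rewards
    K = rbf(X, X) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, (y - y_mean) / y_std))
    Ks = rbf(X, Xs)
    v = np.linalg.solve(L, Ks)
    mean = y_mean + y_std * (Ks.T @ alpha)
    var = np.clip(1.0 - np.sum(v ** 2, axis=0), 1e-12, None)
    return mean, y_std * np.sqrt(var)


def ucb_next_latent(X, y, c, beta=2.0, n_candidates=2000):
    """Pick the latent point maximising UCB with the observed context held fixed."""
    Z = rng.uniform(-1.0, 1.0, size=(n_candidates, LATENT_DIM))
    Xs = np.hstack([Z, np.tile(c, (n_candidates, 1))])
    mean, std = gp_posterior(X, y, Xs)
    return Z[np.argmax(mean + beta * std)]


def run_bo(n_init=5, n_episodes=30):
    """Contextual BO loop: observe c, choose z, execute decode(z, c), record reward."""
    X, y = [], []
    # Initial rollouts at random latent points (a stand-in for demonstrations).
    for _ in range(n_init):
        z = rng.uniform(-1.0, 1.0, LATENT_DIM)
        c = rng.uniform(-1.0, 1.0, CONTEXT_DIM)
        X.append(np.concatenate([z, c]))
        y.append(reward(z, c))
    for _ in range(n_episodes):
        c = rng.uniform(-1.0, 1.0, CONTEXT_DIM)  # context observed before execution
        z = ucb_next_latent(np.array(X), np.array(y), c)
        X.append(np.concatenate([z, c]))
        y.append(reward(z, c))
    return np.array(X), np.array(y)
```

For instance, `X, y = run_bo()` runs 30 episodes after five initial rollouts, and `ucb_next_latent(X, y, np.array([0.4]), beta=0.0)` returns the latent point with the highest posterior-mean reward for a context that may not have been seen, i.e. a purely exploitative guess. In this sketch, searching over 2 latent dimensions rather than 105 MP parameters is what keeps a random-candidate acquisition search and a GP surrogate practical with only a few dozen rollouts.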