A Theory of Cheap Control in Embodied Systems

Abstract

We present a framework for designing cheap control architectures of embodied agents. Our derivation is guided by the classical problem of universal approximation, whereby we explore the possibility of exploiting the agent’s embodiment for a new and more efficient universal approximation of behaviors generated by sensorimotor control. This embodied universal approximation is compared with the classical non-embodied universal approximation. To exemplify our approach, we present a detailed quantitative case study for policy models defined in terms of conditional restricted Boltzmann machines. In contrast to non-embodied universal approximation, which requires an exponential number of parameters, in the embodied setting we are able to generate all possible behaviors with a drastically smaller model, thus obtaining cheap universal approximation. We test and corroborate the theory experimentally with a six-legged walking machine. The experiments indicate that the controller complexity predicted by our theory is close to the minimal sufficient value, which means that the theory has direct practical implications.

http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004427

Reference

• G. Montúfar, K. Ghazi-Zahedi, and N. Ay, “A theory of cheap control in embodied systems,” PLoS Comput Biol, vol. 11, no. 9, p. e1004427, 2015.
@article{Montufar2015aA-Theory,
Author = {Mont{\'u}far, Guido AND Ghazi-Zahedi, Keyan AND Ay, Nihat},
Journal = {PLoS Comput Biol},
Month = {09},
Number = {9},
Pages = {e1004427},
Pdf = {http://dx.doi.org/10.1371%2Fjournal.pcbi.1004427},
Publisher = {Public Library of Science},
Title = {A Theory of Cheap Control in Embodied Systems},
Volume = {11},
Year = {2015}}

In a nutshell

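We model the behaviour of a system as the conditional probability over sequences of its world states $w_1, \ldots, w_T$ given the initial world state $w_0$. Here $\beta$ is the sensor map, $\pi$ is the policy, $\alpha$ is the world dynamics, and the sums run over all sensor and action sequences:

$\mathbb{P}^\pi(w_1, w_2, \ldots, w_T \mid w_0) := \sum_{\mathcal{S}} \sum_{\mathcal{A}} \prod_{t=0}^{T-1} \beta(s_t \mid w_t)\, \pi(a_t \mid s_t)\, \alpha(w_{t+1} \mid w_t, a_t)$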
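As a minimal sketch of this definition, the following Python code evaluates the sum literally for a small finite system. The array layout (`beta[w, s]`, `pi[s, a]`, `alpha[w, a, w_next]`), the function name `behaviour_probability`, and all numerical values are assumptions made for illustration; they are not taken from the paper.

```python
import itertools
import numpy as np

def behaviour_probability(beta, pi, alpha, w0, trajectory):
    """P^pi(w_1, ..., w_T | w_0), summing over all sensor and action sequences.

    beta[w, s]     = beta(s | w)       sensor map (illustrative layout)
    pi[s, a]       = pi(a | s)         policy
    alpha[w, a, v] = alpha(v | w, a)   world dynamics
    """
    T = len(trajectory)
    worlds = (w0,) + tuple(trajectory)
    n_s, n_a = pi.shape
    total = 0.0
    for sensors in itertools.product(range(n_s), repeat=T):
        for actions in itertools.product(range(n_a), repeat=T):
            p = 1.0
            for t in range(T):
                w, s, a, w_next = worlds[t], sensors[t], actions[t], worlds[t + 1]
                p *= beta[w, s] * pi[s, a] * alpha[w, a, w_next]
            total += p
    return total

# Toy system with 2 world states, 2 sensor states and 2 actions (made-up numbers).
beta = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([[0.5, 0.5], [0.1, 0.9]])
alpha = np.array([[[1.0, 0.0], [0.0, 1.0]],
                  [[0.5, 0.5], [0.0, 1.0]]])

# Distribution over all world trajectories of length T = 2, starting in w_0 = 0.
dist = {traj: behaviour_probability(beta, pi, alpha, 0, traj)
        for traj in itertools.product(range(2), repeat=2)}
```

The brute-force sum grows exponentially in $T$ and is only meant to mirror the formula. Because the product factorises over time steps, the same values can be obtained by multiplying one-step kernels $\sum_{s,a} \beta(s \mid w)\, \pi(a \mid s)\, \alpha(w' \mid w, a)$.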

We can now define a policy-behaviour map, which assigns a behaviour to each policy:

$\psi_\infty\colon \Delta_\mathcal{A}^\mathcal{S} \longrightarrow \Delta_{\mathcal{W}^\infty}^\mathcal{W}, \qquad \pi \mapsto \mathbb{P}^\pi(w_1, w_2, \ldots \mid w_0)$

For a given morphology, different policies can generate the same behaviour, that is, there are policies $\pi_1 \neq \pi_2$ with

$\psi_\infty(\pi_1) = \psi_\infty(\pi_2)$
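A simple way to see this is a body in which two actions have the same effect on the world. The sketch below builds such a toy system and checks that two different policies induce the same one-step kernel over world states, and hence the same behaviour, since the behaviour is a product of these kernels. The array layout, the helper name `one_step_kernel`, and all numbers are illustrative assumptions.

```python
import numpy as np

def one_step_kernel(beta, pi, alpha):
    """P(w' | w) = sum_{s,a} beta(s | w) * pi(a | s) * alpha(w' | w, a)."""
    return np.einsum("ws,sa,wav->wv", beta, pi, alpha)

# Toy system: 2 world states, perfect sensing, 3 actions (made-up numbers).
beta = np.eye(2)
alpha = np.zeros((2, 3, 2))
alpha[:, 0] = [[0.8, 0.2], [0.3, 0.7]]
alpha[:, 1] = alpha[:, 0]               # action 1 acts exactly like action 0
alpha[:, 2] = [[0.1, 0.9], [0.6, 0.4]]

# Two policies that split probability differently between actions 0 and 1.
pi_1 = np.array([[0.7, 0.0, 0.3], [0.2, 0.3, 0.5]])
pi_2 = np.array([[0.0, 0.7, 0.3], [0.5, 0.0, 0.5]])

policies_differ = not np.allclose(pi_1, pi_2)
same_behaviour = np.allclose(one_step_kernel(beta, pi_1, alpha),
                             one_step_kernel(beta, pi_2, alpha))
```

Here `policies_differ` and `same_behaviour` are both true: only the total probability assigned to actions 0 and 1 matters for the behaviour.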

This raises the question of which minimal set of policies generates all possible behaviours. We look for a minimal model $\overline{\mathcal{M}}$ that parameterises such a set:

$\psi_\infty(\overline{\mathcal{M}}) = \psi_\infty(\Delta^\mathcal{S}_\mathcal{A})$

Finally, we define the embodiment behaviour dimension as the dimension of the set of behaviours that all policies together can generate:

$d = \dim\bigl(\psi_\infty(\Delta^\mathcal{S}_\mathcal{A})\bigr)$
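To make the role of dimension concrete, the sketch below contrasts the dimension of the policy set with the dimension of the set of behaviours it generates in a small finite system. It assumes that a behaviour is determined by its one-step kernel over world states (which holds when the product above factorises over time steps) and that this kernel depends linearly on the policy, so the dimension of the set of kernels is the rank of that linear map along directions that keep policies normalised. The function name `behaviour_dimension`, the array layout, and the numbers are illustrative assumptions, not the paper's procedure.

```python
import numpy as np

def behaviour_dimension(beta, alpha):
    """Dimension of the set of one-step kernels reachable by all policies.

    beta[w, s]     = beta(s | w),     shape (W, S)
    alpha[w, a, v] = alpha(v | w, a), shape (W, A, W), with A >= 2
    """
    n_w, n_s = beta.shape
    n_a = alpha.shape[1]
    # Linear map pi -> K with K[w, v] = sum_{s,a} beta[w, s] * alpha[w, a, v] * pi[s, a].
    M = np.einsum("ws,wav->wvsa", beta, alpha).reshape(n_w * n_w, n_s * n_a)
    # Directions inside the policy set: shift mass from the last action to action a.
    directions = []
    for s in range(n_s):
        for a in range(n_a - 1):
            d = np.zeros((n_s, n_a))
            d[s, a], d[s, n_a - 1] = 1.0, -1.0
            directions.append(d.ravel())
    B = np.array(directions).T  # shape (S * A, S * (A - 1))
    return int(np.linalg.matrix_rank(M @ B))

# Toy system: perfect sensing, 3 actions, two of which have the same effect.
beta = np.eye(2)
alpha = np.zeros((2, 3, 2))
alpha[:, 0] = [[0.8, 0.2], [0.3, 0.7]]
alpha[:, 1] = alpha[:, 0]
alpha[:, 2] = [[0.1, 0.9], [0.6, 0.4]]

policy_dim = beta.shape[1] * (alpha.shape[1] - 1)  # |S| * (|A| - 1) = 4
behaviour_dim = behaviour_dimension(beta, alpha)   # 2 for this toy system
```

In this toy system the redundant action halves the number of directions that matter: four policy parameters collapse onto a two-dimensional set of behaviours.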

We apply this theory to estimate the number of hidden units $m$ that a CRBM requires to reproduce a target behaviour:

$m \,\geq\, |\mathcal{S}_\beta| + d_{\alpha,\beta} - 1,$

where $|\mathcal{S}_\beta|$ is approximately the number of different sensor values that the agent receives, and $d_{\alpha,\beta}$ is the corresponding embodiment behaviour dimension.
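As a small worked example of the arithmetic, the bound can be written as a one-line helper. The function name and the input values are hypothetical; they are not the quantities measured for the hexapod, which are given in the paper.

```python
def min_hidden_units(num_sensor_values: int, behaviour_dim: int) -> int:
    """Smallest m satisfying m >= |S_beta| + d_{alpha,beta} - 1."""
    return num_sensor_values + behaviour_dim - 1

# Hypothetical inputs: 8 distinct sensor values, embodiment behaviour dimension 3.
m = min_hidden_units(8, 3)  # 8 + 3 - 1 = 10
```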

We used this to estimate that a tripod walking behaviour required 65 hidden units. The details are presented in the paper. The videos can be found below.

Videos

[yt4wp-video video_id=”6wgo6tSDODg”]

This video shows the behaviour of the hexapod with a limited brain of only 35 hidden units. The hexapod is able to move but is far from reproducing the target behaviour (shown as a transparent hexapod). The left-hand side of the video shows how the sensor values are translated into binary input units and how the output of the CRBM is transformed to control the robot.

[yt4wp-video video_id=”vzEqNDHyuug”]

As predicted, the CRBM is able to reproduce the target behaviour with 65 hidden units.

The full list of videos can be found here.
