Higher Coordination with Less Control

Abstract

This work presents a novel learning method in the context of embodied artificial intelligence and self-organization, which has as few assumptions and restrictions as possible about the world and the underlying model. The learning rule is derived from the principle of maximizing the predictive information in the sensorimotor loop. It is evaluated on robot chains of varying length with individually controlled, non-communicating segments. The comparison of the results shows that maximizing the predictive information per wheel leads to more highly coordinated behavior of the physically connected robots compared to a maximization per robot. Another focus of this paper is the analysis of the effect of the robot chain length on the overall behavior of the robots. It will be shown that longer chains with less capable controllers outperform those of shorter length and more complex controllers. The reason is found and discussed in the information-geometric interpretation of the learning process.

http://journals.sagepub.com/doi/10.1177/1059712310375314

Primary Question asked in this paper

How can a system with no prior knowledge about itself or the environment gather information so that it is able to perform a task? This is the underlying question of this work.

Reference

• K. Zahedi, N. Ay, and R. Der, “Higher coordination with less control – a result of information maximization in the sensori-motor loop,” Adaptive Behavior, vol. 18, iss. 3–4, pp. 338–355, 2010.
[Bibtex]
@article{Zahedi2010aHigher,
Author = {Zahedi, Keyan and Ay, Nihat and Der, Ralf},
Journal = {Adaptive Behavior},
Number = {3--4},
Pages = {338--355},
Title = {Higher coordination with less control -- A result of information maximization in the sensori-motor loop},
Volume = {18},
Year = {2010}}

The paper in a nutshell

Predictive information is the mutual information between the past and the future of a random variable. We applied it to the sensor values $S$ of an autonomous agent. In this case, it can be written in the following form

$I(\stackrel{\leftarrow}{S};\stackrel{\rightarrow}{S}) = H(\stackrel{\rightarrow}{S}) - H(\stackrel{\rightarrow}{S}|\stackrel{\leftarrow}{S}),$

where $\stackrel{\leftarrow}{S}$ is the past and $\stackrel{\rightarrow}{S}$ is the future of the sensor values. The following image is a depiction of the PI.

We use a one-step approximation, which is given by

$I(S_t;S_{t+1}) = H(S_{t+1}) - H(S_{t+1}|S_t).$
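The one-step approximation above is straightforward to compute from the joint distribution of consecutive sensor values. A minimal sketch (the function name and array layout are illustrative, not from the paper):

```python
import numpy as np

def one_step_pi(p_joint):
    """One-step predictive information I(S_t; S_{t+1}) in bits.

    p_joint[s, s1] = p(S_t = s, S_{t+1} = s1).
    Uses I = H(S_{t+1}) - H(S_{t+1}|S_t) with
    H(S_{t+1}|S_t) = H(S_t, S_{t+1}) - H(S_t).
    """
    def h(p):                         # Shannon entropy, ignoring zero entries
        p = p[p > 0]
        return -np.sum(p * np.log2(p))
    h_next = h(p_joint.sum(axis=0))   # H(S_{t+1})
    h_joint = h(p_joint.ravel())      # H(S_t, S_{t+1})
    h_past = h(p_joint.sum(axis=1))   # H(S_t)
    return h_next - (h_joint - h_past)
```

For a perfectly predictable binary sensor (diagonal joint distribution) this yields 1 bit; for independent past and future it yields 0.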

The predictive information can be written in the following form

$I(S_t;S_{t+1}) = \sum_{s,s'\in\mathcal{S}} p(s',s) \log_2\frac{p(s'|s)}{p(s')}.$

In this form, maximising predictive information would require information that is not intrinsically available to the agent. Hence, we rewrite it in the following form:

$I(S_t;S_{t+1}) = \sum_{s,s'\in\mathcal{S}} \sum_{a\in\mathcal{A}} p(s',s,a) \log_2\frac{\sum_{a'\in\mathcal{A}} p(s',s,a')}{p(s) \sum_{s''\in\mathcal{S},\,a'\in\mathcal{A}} p(s',s'',a')}\\ \hspace*{2.4cm} = \sum_{s',s,a} p(s'|s,a)\,p(a|s)\,p(s) \log_2\frac{\sum_{a'} p(s'|s,a')\,p(a'|s)\,p(s)}{p(s) \sum_{s'',a'} p(s'|s'',a')\,p(a'|s'')\,p(s'')},$

where $p(s'|s,a)$ is the intrinsic world model, $p(a|s)$ is the policy, and $p(s)$ is the input distribution.
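This decomposition makes the predictive information computable from quantities the agent can estimate intrinsically. A sketch of that computation (function name and array conventions are assumptions for illustration):

```python
import numpy as np

def pi_from_model(p_s, policy, world):
    """Predictive information from the agent's intrinsic quantities.

    p_s[s]           input distribution p(s)
    policy[s, a]     pi(a|s)
    world[s, a, s1]  world model p(s'|s, a)
    """
    # marginalize out the action: p(s'|s) = sum_a pi(a|s) p(s'|s,a)
    p_next_given_s = np.einsum("sa,sat->st", policy, world)
    p_joint = p_s[:, None] * p_next_given_s          # p(s, s')
    p_next = p_joint.sum(axis=0)                     # p(s')
    with np.errstate(divide="ignore", invalid="ignore"):
        log_ratio = np.log2(p_next_given_s / p_next[None, :])
    # terms with p(s, s') = 0 contribute nothing to the sum
    return np.sum(np.where(p_joint > 0, p_joint * log_ratio, 0.0))
```

With a deterministic world model (identity transitions) and a uniform input distribution over two states, this recovers the maximal 1 bit of predictive information.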

Applying Amari’s natural gradient method, we obtain a policy gradient that maximises the predictive information.

The input distribution is updated through sampling

$p^{(0)}(s) = \frac{1}{|\mathcal{S}|}\\ p^{(n+1)}(s) = \left\{\begin{array}{cl} \displaystyle\frac{n}{n+1}\,p^{(n)}(s)+\frac{1}{n+1} & \text{if } S_{n+1} = s\\[2ex] \displaystyle\frac{n}{n+1}\,p^{(n)}(s) & \text{if } S_{n+1} \neq s \end{array}\right.$
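This update is an online frequency count: after $n$ samples the estimate equals the empirical distribution of the observed sensor states. A small sketch (names are illustrative):

```python
import numpy as np

def update_input_distribution(p, s, n):
    """n-th sampling update: shift weight toward the observed sensor state s."""
    p = (n / (n + 1)) * p   # shrink all old mass by n/(n+1)
    p[s] += 1.0 / (n + 1)   # add the new sample's weight to the observed state
    return p

# usage: start uniform over 4 states, then fold in each observation
p = np.full(4, 1 / 4)
for n, s in enumerate([0, 0, 1, 2]):
    p = update_input_distribution(p, s, n)
# p is now the empirical frequency of the observed states
```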

just as the world model is:

$p^{(0)}(s'|s,a) = \frac{1}{|\mathcal{S}|}\\ p^{(n_a^s+1)}(s'|s,a) = \left\{\begin{array}{ll} \displaystyle\frac{n_a^s}{n_a^s+1}\,p^{(n_a^s)}(s'|s,a)+\frac{1}{n_a^s+1} & \text{if } S_{n+1} = s',\, S_n = s,\, A_n = a\\[2ex] \displaystyle\frac{n_a^s}{n_a^s+1}\,p^{(n_a^s)}(s'|s,a) & \text{if } S_{n+1} \neq s',\, S_n = s,\, A_n = a\\[2ex] p^{(n_a^s)}(s'|s,a) & \text{if } S_n \neq s \text{ or } A_n \neq a \end{array}\right.$

where $n_a^s$ counts how often the state-action pair $(s,a)$ has been observed.
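The world model update is the same frequency counting, but kept separately for each state-action pair; rows that were not visited keep their previous estimate. A sketch (function name and array layout are assumptions):

```python
import numpy as np

def update_world_model(world, counts, s, a, s_next):
    """Fold the observed transition (s, a) -> s_next into p(s'|s, a).

    world[s, a, s1]  current estimate of p(s'|s, a)
    counts[s, a]     visit count n_a^s for the pair (s, a)
    """
    n = counts[s, a]
    world[s, a] *= n / (n + 1)          # shrink the visited row only
    world[s, a, s_next] += 1.0 / (n + 1)
    counts[s, a] = n + 1
    return world, counts
```

All other conditional rows are untouched, matching the third case of the update rule.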

The policy is updated in the following way

$\pi^{(0)}(a|s) := \frac{1}{|\mathcal{A}|}\\ \pi^{(n)}(a|s) = \pi^{(n-1)}(a|s) + \frac{1}{n+1}\,\pi^{(n-1)}(a|s)\left(F(s,a) - \sum_{a'}\pi^{(n-1)}(a'|s)\,F(s,a')\right)\\ F(s,a) := p^{(n)}(s)\sum_{s'} p^{(n)}(s'|s,a)\, \log_2\frac{\sum_{a'}\pi^{(n-1)}(a'|s)\,p^{(n)}(s'|s,a')}{\sum_{s''} p^{(n)}(s'')\sum_{a'}\pi^{(n-1)}(a'|s'')\,p^{(n)}(s'|s'',a')}$
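The update can be read as a natural-gradient step on the probability simplex: each action's probability grows in proportion to how much its score $F(s,a)$ exceeds the policy-weighted average. A sketch of one step, under the assumption of strictly positive policies (names are illustrative):

```python
import numpy as np

def policy_step(policy, p_s, world, n):
    """One natural-gradient step on pi(a|s) toward higher predictive information.

    policy[s, a]     pi(a|s);  p_s[s] input distribution;  world[s, a, s1] p(s'|s, a)
    """
    # p(s'|s) = sum_a pi(a|s) p(s'|s,a)  and the marginal p(s')
    p_next_given_s = np.einsum("sa,sat->st", policy, world)
    p_next = p_s @ p_next_given_s
    with np.errstate(divide="ignore", invalid="ignore"):
        log_ratio = np.log2(p_next_given_s[:, None, :] / p_next[None, None, :])
    log_ratio = np.where(world > 0, log_ratio, 0.0)   # zero-probability terms drop out
    # F(s, a) = p(s) sum_{s'} p(s'|s,a) log2( p(s'|s) / p(s') )
    F = p_s[:, None] * np.einsum("sat,sat->sa", world, log_ratio)
    baseline = np.sum(policy * F, axis=1, keepdims=True)  # sum_a pi(a|s) F(s,a)
    policy = policy + (1.0 / (n + 1)) * policy * (F - baseline)
    return policy / policy.sum(axis=1, keepdims=True)     # guard against numerical drift
```

For instance, if one action yields deterministic transitions and another yields uniform noise, the step increases the probability of the deterministic (more predictable) action.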

Experiments

We conducted our experiments with the YARS simulator, which is freely available [here]. Installation instructions can be found [here], and examples [here].

We simulated a two-wheeled, differential drive robot, loosely inspired by the Khepera robot, which we also passively couple into a chain of robots, as the following image shows:

Shown below are robot chains of length one, three, and five. For each of these robot chains, we evaluated two different control strategies, to which we refer as combined and split control:

The results are shown below. For a discussion, please read the paper.

Videos

Five robots with split control
