Part V. Agent-based models vs ODE models
V-3. Mean Dynamics
1. Introduction
Every dynamic that can be run with the NetLogo model we have implemented in the previous chapter (nxn-games-in-well-mixed-populations.nlogo) can be usefully seen as a time-homogeneous Markov chain on the space of possible strategy distributions. This means that the fraction of agents that are using each strategy, i.e., the population state, contains all the information we need to –probabilistically– predict the evolution of the agent-based dynamic as accurately as possible.
Let $x_i$ denote the fraction of agents using strategy $i$. In games with $n$ strategies, population states $x = (x_1, x_2, \dots, x_n)$ are elements of the simplex $X = \{x \in \mathbb{R}^n_{\geq 0} : \sum_{i=1}^n x_i = 1\}$. More specifically, if there are $N$ agents, population states are elements of the finite grid $X_N = X \cap \tfrac{1}{N}\mathbb{Z}^n$.
A fully parameterized model defines a discrete-time Markov chain $\{X_k\}_{k = 0, 1, 2, \dots}$ on state space $X_N$, where $k$ is the index for the ticks. Our goal in this chapter is to approximate the transient dynamics of this (stochastic) process with its corresponding (deterministic) mean dynamic.
In the next section, we explain how to derive the mean dynamic in general, and in section 3 we present the mean dynamic for every possible parameterization of NetLogo model nxn-games-in-well-mixed-populations.nlogo. Then, in section 4 we show how we can numerically solve the mean dynamic using the Euler method within NetLogo. In section 5, we present some representative simulations together with their corresponding mean dynamics solved at runtime within NetLogo. In this way, we will be able to appreciate how useful the mean dynamic can be. We conclude the chapter (and the book) by emphasizing how sensitive agent-based evolutionary dynamics can be to seemingly unimportant details.
2. The mean dynamic
The mean dynamic (Benaïm & Weibull, 2003; Sandholm, 2010a, chapter 10; Roth & Sandholm, 2013) is a system of ODEs that provides a good deterministic approximation to the dynamics of the stochastic evolutionary process
- over finite time spans,
- when the number of agents is sufficiently large, and
- when the revision probability is sufficiently low.
To derive the mean dynamic, we consider the behavior of the process over the next $dt$ time units, departing from population state $x$. For convenience, we define one unit of clock time as 1/prob-revision ticks, i.e., the number of ticks over which every agent is expected to receive exactly one revision opportunity.[1] Thus, over the time interval $dt$, the number of agents who are expected to receive a revision opportunity is $N\,dt$.
Of the $N\,dt$ agents who revise their strategies over the time interval $dt$, $x_j\,N\,dt$ are expected to be $j$-strategists. Let $p^N_{ji}(x)$ be the conditional probability that a reviser using strategy $j$ adopts strategy $i$ (so $p^N_{ii}(x)$ denotes the probability that a reviser using strategy $i$ keeps using strategy $i$ after revision). Using this notation, the expected number of $j$-revisers who adopt strategy $i$ over the time interval $dt$ is $x_j\,p^N_{ji}(x)\,N\,dt$. Hence, the expected change in the number of agents that are using strategy $i$ over the time interval $dt$ equals:
- the number of revisers who adopt strategy $i$, i.e. the inflow, which is $\sum_j x_j\,p^N_{ji}(x)\,N\,dt$,
- minus the number of $i$-revisers, i.e. the outflow, which is $x_i\,N\,dt$.
Therefore, the expected change in the fraction of agents using strategy $i$, i.e. the expected change in variable $x_i$ at population state $x$, is:

$dx_i = \left( \sum_j x_j\,p^N_{ji}(x) - x_i \right) dt$
Note that the conditional probabilities $p^N_{ji}(x)$ may depend on the population size $N$. This does not represent a problem as long as this dependency vanishes as $N$ gets large. In that case, to deal with that dependency, we take the limit of $p^N_{ji}(x)$ as $N$ goes to infinity since, after all, the mean dynamic approximation is only valid for large $N$. Thus, defining

$p_{ji}(x) = \lim_{N \to \infty} p^N_{ji}(x),$
we arrive at the mean dynamic equation for strategy $i$:

$\dot{x}_i = \sum_j x_j\,p_{ji}(x) - x_i \qquad (1)$
An equivalent formulation that is sometimes more convenient to use is the following:

$\dot{x}_i = \sum_{j \neq i} x_j\,p_{ji}(x) - x_i \sum_{j \neq i} p_{ij}(x) \qquad (2)$
The two formulations above only differ in how they treat $i$-revisers who keep strategy $i$. In the first formulation (1), these agents are included both in the inflow and in the outflow, while in the second formulation (2) they are excluded from both flows. In any case, it is clear that all we need to derive the mean dynamic of a model is the set of conditional probabilities $p_{ji}(x)$.
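To see that the two formulations coincide, it suffices to use the fact that the conditional probabilities of a reviser sum to one, $\sum_j p_{ij}(x) = 1$:

$\sum_j x_j\,p_{ji}(x) - x_i = \sum_{j \neq i} x_j\,p_{ji}(x) + x_i\,p_{ii}(x) - x_i = \sum_{j \neq i} x_j\,p_{ji}(x) - x_i\left(1 - p_{ii}(x)\right) = \sum_{j \neq i} x_j\,p_{ji}(x) - x_i \sum_{j \neq i} p_{ij}(x).$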
Finally, introducing noise in the mean dynamic is not difficult. If noise $= \varepsilon$, i.e., if revisers choose one of the $n$ strategies uniformly at random with probability $\varepsilon$, and with probability $(1 - \varepsilon)$ they adopt strategies with conditional probabilities $p_{ji}(x)$, the first formulation of the general mean dynamic (1) would be:

$\dot{x}_i = (1 - \varepsilon)\sum_j x_j\,p_{ji}(x) + \frac{\varepsilon}{n} - x_i$

And the second formulation would be adapted as follows:

$\dot{x}_i = (1 - \varepsilon)\left( \sum_{j \neq i} x_j\,p_{ji}(x) - x_i \sum_{j \neq i} p_{ij}(x) \right) + \varepsilon\left( \frac{1}{n} - x_i \right)$
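As a quick sanity check on these two expressions, note that with full noise the conditional probabilities $p_{ji}(x)$ play no role and both formulations reduce to the same linear flow towards the barycenter of the simplex:

$\varepsilon = 1 \;\Rightarrow\; \dot{x}_i = \frac{1}{n} - x_i,$

i.e., under pure random choice every strategy share converges to $1/n$, as one would expect.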
3. Derivation of the mean dynamic for different stochastic processes
In this section we derive the mean dynamic for every possible parameterization of the NetLogo model we developed in the previous chapter (nxn-games-in-well-mixed-populations.nlogo). In particular, we analyze every possible combination of values for parameters decision-rule and payoff-to-use. For each combination of these two parameter values, we provide a formula for the mean dynamic that is valid for any game (payoffs) and any initial condition (n-of-players-for-each-strategy). The impact of noise can be easily taken into account as explained above. In what follows, $\pi_{ij}$ denotes the payoff obtained by a player using strategy $i$ against a co-player using strategy $j$, $\pi_i(x) = \sum_k x_k\,\pi_{ik}$ denotes the expected payoff of strategy $i$ at population state $x$, and $\bar{\pi}(x) = \sum_i x_i\,\pi_i(x)$ denotes the population's average expected payoff. The mean dynamic will be a good approximation for the transient dynamics of the corresponding agent-based model when prob-revision is low and the population size $N$ is large.
3.1. Imitate if better
payoff-to-use = “play-with-one-rd-agent”

The probability that a reviser is a $j$-player with payoff $\pi_{jk}$ (i.e. a $j$-player whose co-player used strategy $k$) is $x_j x_k$. The probability that the observed agent is an $i$-player with payoff $\pi_{il}$ is $x_i x_l$.

The probability that a $j$-player with payoff $\pi_{jk}$ observes an $i$-player with payoff $\pi_{il}$ is therefore $x_j x_k\,x_i x_l$.

Considering that the revising $j$-player adopts the strategy of the $i$-player if $\pi_{il} > \pi_{jk}$, the total of $j$-players who switch to strategy $i$ (adding over every $j \neq i$ and every $k, l$) is

$\sum_{j \neq i} \sum_{k,l} x_j x_k\,x_i x_l\,[\![\,\pi_{il} > \pi_{jk}\,]\!]$

where $[\![\,P\,]\!]$ takes the value 1 if statement $P$ is true and 0 otherwise, and the total of $i$-strategists who switch to some other strategy is

$x_i \sum_{j \neq i} \sum_{k,l} x_k\,x_j x_l\,[\![\,\pi_{jl} > \pi_{ik}\,]\!]$

The mean dynamic is then

$\dot{x}_i = x_i \sum_{j \neq i} \sum_{k,l} x_j\,x_k\,x_l \left( [\![\,\pi_{il} > \pi_{jk}\,]\!] - [\![\,\pi_{jl} > \pi_{ik}\,]\!] \right)$
This rule has been studied by Izquierdo and Izquierdo (2013) and Loginov (2021). Loginov (2021) calls this rule “imitate-the-better-realization”.
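A small worked example may help to see how these expressions are used. In the 1-2 coordination game that we will use in sections 4 and 5 (Figure 1; payoffs $\pi_{AA} = 1$, $\pi_{AB} = \pi_{BA} = 0$, $\pi_{BB} = 2$), writing $x$ for the fraction of B-strategists, a B-player is imitated only if she earned $\pi_{BB} = 2$, and a B-reviser abandons B only if she earned $\pi_{BA} = 0$ and observes an A-player who earned $\pi_{AA} = 1$. The mean dynamic for $x$ therefore reduces to

$\dot{x} = \underbrace{(1-x)\cdot x \cdot x}_{\text{inflow into B}} \;-\; \underbrace{x(1-x)\cdot(1-x)^2}_{\text{outflow from B}} = x(1-x)\left( x - (1-x)^2 \right),$

which vanishes at $x = 0$, $x = 1$ and at the interior rest point $x^* = (3 - \sqrt{5})/2 \approx 0.38$.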
payoff-to-use = “use-strategy-expected-payoff”

The probability that a reviser is a $j$-player (whose payoff is $\pi_j(x)$) who observes an $i$-player (whose payoff is $\pi_i(x)$) is $x_j\,x_i$.

Considering that the revising $j$-player adopts the strategy of the $i$-player iff $\pi_i(x) > \pi_j(x)$, the total of $j$-players who switch to strategy $i$ (adding over every $j \neq i$) is

$\sum_{j \neq i} x_j\,x_i\,[\![\,\pi_i(x) > \pi_j(x)\,]\!]$

and the total of $i$-strategists who switch to some other strategy is

$x_i \sum_{j \neq i} x_j\,[\![\,\pi_j(x) > \pi_i(x)\,]\!]$

The mean dynamic is then

$\dot{x}_i = x_i \sum_{j \neq i} x_j \left( [\![\,\pi_i(x) > \pi_j(x)\,]\!] - [\![\,\pi_j(x) > \pi_i(x)\,]\!] \right)$
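In the same 1-2 coordination game, expected payoffs are $\pi_A(x) = 1 - x$ and $\pi_B(x) = 2x$, so B is strictly better than A precisely when $x > 1/3$ and the mean dynamic above becomes

$\dot{x} = \begin{cases} -\,x(1-x) & \text{if } x < 1/3 \\ 0 & \text{if } x = 1/3 \\ x(1-x) & \text{if } x > 1/3 \end{cases}$

i.e., the population heads for the monomorphic state favored by its initial condition, at a speed that depends on the strategy shares but not on the size of the payoff difference.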
3.2. Imitative pairwise-difference
payoff-to-use = “play-with-one-rd-agent”

The probability that a reviser is a $j$-player with payoff $\pi_{jk}$ is $x_j x_k$. The probability that the observed agent is an $i$-player with payoff $\pi_{il}$ is $x_i x_l$. The probability that a $j$-player with payoff $\pi_{jk}$ observes an $i$-player with payoff $\pi_{il}$ is therefore $x_j x_k\,x_i x_l$.

Considering that the revising $j$-player adopts the strategy of the $i$-player with probability $\frac{[\,\pi_{il} - \pi_{jk}\,]_+}{\pi^{\max} - \pi^{\min}}$, where $[a]_+ = \max(a, 0)$ and $\pi^{\max}$ and $\pi^{\min}$ denote the maximum and minimum possible payoffs in the game, the total of $j$-players who switch to strategy $i$ (adding over every $j \neq i$ and every $k, l$) is[2]

$\sum_{j \neq i} \sum_{k,l} x_j x_k\,x_i x_l\,\frac{[\,\pi_{il} - \pi_{jk}\,]_+}{\pi^{\max} - \pi^{\min}}$

and the total of $i$-strategists who switch to some other strategy is

$x_i \sum_{j \neq i} \sum_{k,l} x_k\,x_j x_l\,\frac{[\,\pi_{jl} - \pi_{ik}\,]_+}{\pi^{\max} - \pi^{\min}}$

The mean dynamic is then

$\dot{x}_i = \frac{x_i}{\pi^{\max} - \pi^{\min}} \sum_{j \neq i} \sum_{k,l} x_j\,x_k\,x_l \left( [\,\pi_{il} - \pi_{jk}\,]_+ - [\,\pi_{jl} - \pi_{ik}\,]_+ \right)$
This ODE is the replicator dynamics with a constant speed factor $\frac{1}{\pi^{\max} - \pi^{\min}}$. The orbits of the mean dynamic do not change with the speed factor. However, if we are interested in the relationship between the solution of the mean dynamic at a given time and the stochastic process after a certain number of ticks, then the speed factor must be considered.
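To see why the expression above simplifies to the replicator dynamics, note that the two co-players' strategies $k$ and $l$ are independent draws from the same distribution, so their labels can be exchanged in the outflow term:

$\sum_{k,l} x_k x_l\,[\,\pi_{jl} - \pi_{ik}\,]_+ = \sum_{k,l} x_k x_l\,[\,\pi_{jk} - \pi_{il}\,]_+$

and therefore, using $[a]_+ - [-a]_+ = a$,

$\dot{x}_i = \frac{x_i}{\pi^{\max} - \pi^{\min}} \sum_{j \neq i} x_j \sum_{k,l} x_k x_l \left( \pi_{il} - \pi_{jk} \right) = \frac{x_i}{\pi^{\max} - \pi^{\min}} \sum_{j \neq i} x_j \left( \pi_i(x) - \pi_j(x) \right) = \frac{x_i \left( \pi_i(x) - \bar{\pi}(x) \right)}{\pi^{\max} - \pi^{\min}},$

where the last equality holds because the $j = i$ term would contribute nothing to the sum.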
payoff-to-use = “use-strategy-expected-payoff”

The probability that a revising $j$-player (with payoff $\pi_j(x)$) observes an $i$-player (with payoff $\pi_i(x)$) is $x_j\,x_i$.

Considering that the revising $j$-player adopts the strategy of the $i$-player with probability $\frac{[\,\pi_i(x) - \pi_j(x)\,]_+}{\pi^{\max} - \pi^{\min}}$, the total of $j$-players who switch to strategy $i$ (adding over every $j \neq i$) is

$\sum_{j \neq i} x_j\,x_i\,\frac{[\,\pi_i(x) - \pi_j(x)\,]_+}{\pi^{\max} - \pi^{\min}}$

and the total of $i$-strategists who switch to some other strategy is

$x_i \sum_{j \neq i} x_j\,\frac{[\,\pi_j(x) - \pi_i(x)\,]_+}{\pi^{\max} - \pi^{\min}}$

The mean dynamic is then (using $[a]_+ - [-a]_+ = a$)

$\dot{x}_i = \frac{x_i}{\pi^{\max} - \pi^{\min}} \sum_{j \neq i} x_j \left( \pi_i(x) - \pi_j(x) \right) = \frac{x_i \left( \pi_i(x) - \bar{\pi}(x) \right)}{\pi^{\max} - \pi^{\min}}$
This ODE is the replicator dynamics with constant speed factor $\frac{1}{\pi^{\max} - \pi^{\min}}$.
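For concreteness, in the 1-2 coordination game of Figure 1 (section 4) we have $\pi_A(x) = 1 - x$, $\pi_B(x) = 2x$ and, with the normalization used above, $\pi^{\max} - \pi^{\min} = 2$. Writing $x$ for the fraction of B-strategists,

$\dot{x} = \frac{x\left(\pi_B(x) - \bar{\pi}(x)\right)}{2} = \frac{x(1-x)\left(\pi_B(x) - \pi_A(x)\right)}{2} = \frac{1}{2}\,x(1-x)(3x - 1),$

which has the same rest points ($0$, $1/3$ and $1$) as the imitate-if-better dynamic with expected payoffs, but now the speed of motion is proportional to the payoff difference.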
3.3. Imitative linear attraction
payoff-to-use = “play-with-one-rd-agent”

The probability that a revising $j$-player observes an $i$-player with payoff $\pi_{il}$ is $x_j\,x_i x_l$ (under this rule the reviser's own payoff plays no role).

Considering that the revising $j$-player adopts the strategy of the $i$-player with probability $\frac{\pi_{il} - \pi^{\min}}{\pi^{\max} - \pi^{\min}}$, the total of $j$-players who switch to strategy $i$ (adding over every $j \neq i$ and every $l$) is

$\sum_{j \neq i} \sum_{l} x_j\,x_i x_l\,\frac{\pi_{il} - \pi^{\min}}{\pi^{\max} - \pi^{\min}} = (1 - x_i)\,x_i\,\frac{\pi_i(x) - \pi^{\min}}{\pi^{\max} - \pi^{\min}}$

and the total of $i$-strategists who switch to some other strategy is

$x_i \sum_{j \neq i} \sum_{l} x_j x_l\,\frac{\pi_{jl} - \pi^{\min}}{\pi^{\max} - \pi^{\min}} = x_i \sum_{j \neq i} x_j\,\frac{\pi_j(x) - \pi^{\min}}{\pi^{\max} - \pi^{\min}}$

The mean dynamic is then

$\dot{x}_i = \frac{x_i}{\pi^{\max} - \pi^{\min}} \left( (1 - x_i)\left(\pi_i(x) - \pi^{\min}\right) - \sum_{j \neq i} x_j \left(\pi_j(x) - \pi^{\min}\right) \right) = \frac{x_i \left( \pi_i(x) - \bar{\pi}(x) \right)}{\pi^{\max} - \pi^{\min}}$
This ODE is the replicator dynamics with constant speed factor $\frac{1}{\pi^{\max} - \pi^{\min}}$.
payoff-to-use = “use-strategy-expected-payoff”

The probability that a revising $j$-player observes an $i$-player (whose payoff is $\pi_i(x)$) is $x_j\,x_i$.

Considering that the revising $j$-player adopts the strategy of the $i$-player with probability $\frac{\pi_i(x) - \pi^{\min}}{\pi^{\max} - \pi^{\min}}$, the total of $j$-players who switch to strategy $i$ (adding over every $j \neq i$) is

$\sum_{j \neq i} x_j\,x_i\,\frac{\pi_i(x) - \pi^{\min}}{\pi^{\max} - \pi^{\min}} = (1 - x_i)\,x_i\,\frac{\pi_i(x) - \pi^{\min}}{\pi^{\max} - \pi^{\min}}$

and the total of $i$-strategists who switch to some other strategy is

$x_i \sum_{j \neq i} x_j\,\frac{\pi_j(x) - \pi^{\min}}{\pi^{\max} - \pi^{\min}}$

The mean dynamic is then

$\dot{x}_i = \frac{x_i}{\pi^{\max} - \pi^{\min}} \left( (1 - x_i)\left(\pi_i(x) - \pi^{\min}\right) - \sum_{j \neq i} x_j \left(\pi_j(x) - \pi^{\min}\right) \right) = \frac{x_i \left( \pi_i(x) - \bar{\pi}(x) \right)}{\pi^{\max} - \pi^{\min}}$
This ODE is the replicator dynamics with constant speed factor $\frac{1}{\pi^{\max} - \pi^{\min}}$.
3.4. Imitative linear dissatisfaction
payoff-to-use = “play-with-one-rd-agent”

The probability that a revising $j$-player with payoff $\pi_{jk}$ observes an $i$-player is $x_j x_k\,x_i$ (under this rule only the reviser's own payoff matters).

Considering that the revising $j$-player adopts the strategy of the $i$-player with probability $\frac{\pi^{\max} - \pi_{jk}}{\pi^{\max} - \pi^{\min}}$, the total of $j$-players who switch to strategy $i$ (adding over every $j \neq i$ and every $k$) is

$\sum_{j \neq i} \sum_{k} x_j x_k\,x_i\,\frac{\pi^{\max} - \pi_{jk}}{\pi^{\max} - \pi^{\min}} = x_i \sum_{j \neq i} x_j\,\frac{\pi^{\max} - \pi_j(x)}{\pi^{\max} - \pi^{\min}}$

and the total of $i$-strategists who switch to some other strategy is

$\sum_{j \neq i} \sum_{k} x_i x_k\,x_j\,\frac{\pi^{\max} - \pi_{ik}}{\pi^{\max} - \pi^{\min}} = x_i\,(1 - x_i)\,\frac{\pi^{\max} - \pi_i(x)}{\pi^{\max} - \pi^{\min}}$

The mean dynamic is then

$\dot{x}_i = \frac{x_i}{\pi^{\max} - \pi^{\min}} \left( \sum_{j \neq i} x_j \left(\pi^{\max} - \pi_j(x)\right) - (1 - x_i)\left(\pi^{\max} - \pi_i(x)\right) \right) = \frac{x_i \left( \pi_i(x) - \bar{\pi}(x) \right)}{\pi^{\max} - \pi^{\min}}$
Again, this ODE is the replicator dynamics with constant speed factor $\frac{1}{\pi^{\max} - \pi^{\min}}$.
payoff-to-use = “use-strategy-expected-payoff”

The probability that a revising $j$-player (whose payoff is $\pi_j(x)$) observes an $i$-player is $x_j\,x_i$.

Considering that the revising $j$-player adopts the strategy of the $i$-player with probability $\frac{\pi^{\max} - \pi_j(x)}{\pi^{\max} - \pi^{\min}}$, the total of $j$-players who switch to strategy $i$ (adding over every $j \neq i$) is

$\sum_{j \neq i} x_j\,x_i\,\frac{\pi^{\max} - \pi_j(x)}{\pi^{\max} - \pi^{\min}}$

and the total of $i$-strategists who switch to some other strategy is

$x_i \sum_{j \neq i} x_j\,\frac{\pi^{\max} - \pi_i(x)}{\pi^{\max} - \pi^{\min}} = x_i\,(1 - x_i)\,\frac{\pi^{\max} - \pi_i(x)}{\pi^{\max} - \pi^{\min}}$

The mean dynamic is then

$\dot{x}_i = \frac{x_i}{\pi^{\max} - \pi^{\min}} \left( \sum_{j \neq i} x_j \left(\pi^{\max} - \pi_j(x)\right) - (1 - x_i)\left(\pi^{\max} - \pi_i(x)\right) \right) = \frac{x_i \left( \pi_i(x) - \bar{\pi}(x) \right)}{\pi^{\max} - \pi^{\min}}$
Again, this ODE is the replicator dynamics with constant speed factor $\frac{1}{\pi^{\max} - \pi^{\min}}$.
3.5. Direct best
payoff-to-use = “play-with-one-rd-agent”

Under this process, a revising agent tests each of the $n$ possible strategies once (with each play of each strategy being against a newly drawn opponent) and selects one of the strategies that obtained the greatest experienced payoff in the test. Ties are resolved uniformly at random, with equal probability for each of the maximum-payoff strategies.
Note that which strategy is selected depends on what strategy is sampled (i.e., which strategy is used by the sampled co-player) when testing each of the $n$ strategies, and we will need some notation to refer to the strategy of the agent that is sampled when testing strategy $i$. Consider the following notation:
- Sample list $\omega$. Considering a sample of $n$ agents (one for each tested strategy), let a sample list $\omega = (\omega_1, \omega_2, \dots, \omega_n)$ be a list whose $i$-th component $\omega_i$ indicates the strategy of the agent that was sampled when testing strategy $i$. Let $\Omega = S^n$ be the set of all such lists of strategies of length $n$, where $S = \{1, 2, \dots, n\}$ is the set of strategies. For instance, with two strategies, $\Omega = \{(1,1), (1,2), (2,1), (2,2)\}$, with e.g. sample list $(2,1)$ indicating that when testing strategy 1 (first position) the sampled agent (co-player) played strategy 2 (element in first position), and when testing strategy 2 (second position) the sampled agent played strategy 1 (element in position 2).
- $P_x(\omega)$: probability of obtaining sample list $\omega$ at state $x$. This probability is $P_x(\omega) = \prod_{k=1}^{n} x_{\omega_k}$.
- Experienced payoff vector $\pi(\omega)$. This is the vector of experienced payoffs obtained by testing each of the $n$ strategies, when the strategies of the sampled co-players are those in $\omega$. The $i$-th component of this payoff vector, $\pi_{i\omega_i}$, is the payoff to an agent testing strategy $i$ whose co-player uses strategy $\omega_i$.
- Best experienced payoff indicator for strategy $i$, $B_i(\pi(\omega))$. Given an experienced payoff vector $\pi(\omega)$, this function returns the value 1 if strategy $i$ obtains the maximum payoff in $\pi(\omega)$, i.e., if $\pi_{i\omega_i} = \max_k \pi_{k\omega_k}$, and the value 0 otherwise.

With the previous notation, adding the probabilities of all the events that lead a revising agent to choose strategy $i$ (and recalling that ties are resolved uniformly at random), we have

$p_i(x) = \sum_{\omega \in \Omega} P_x(\omega)\,\frac{B_i(\pi(\omega))}{\sum_{k} B_k(\pi(\omega))}$

Note that this probability is the same for every reviser, regardless of her current strategy. The outflow of $i$-strategists is given by the fraction of revisers that are $i$-players, i.e., $x_i$.

The mean dynamic is then

$\dot{x}_i = \sum_{\omega \in \Omega} P_x(\omega)\,\frac{B_i(\pi(\omega))}{\sum_{k} B_k(\pi(\omega))} - x_i$
This is a best-experienced-payoff dynamic analyzed by Osborne and Rubinstein (1998), Sethi (2000, 2021) and Sandholm et al. (2019, 2020).
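As a small worked example (using the notation just introduced, and anticipating the 1-2 coordination game of Figure 1 in section 4, with $x$ denoting the fraction of B-strategists), there are only four sample lists:

$\begin{aligned} \omega = (A,A):\;& \pi(\omega) = (1,0), \text{ A best}, & P_x(\omega) &= (1-x)^2 \\ \omega = (A,B):\;& \pi(\omega) = (1,2), \text{ B best}, & P_x(\omega) &= (1-x)\,x \\ \omega = (B,A):\;& \pi(\omega) = (0,0), \text{ tie}, & P_x(\omega) &= x\,(1-x) \\ \omega = (B,B):\;& \pi(\omega) = (0,2), \text{ B best}, & P_x(\omega) &= x^2 \end{aligned}$

so $p_B(x) = (1-x)\,x + x^2 + \tfrac{1}{2}\,x(1-x) = \tfrac{3}{2}x - \tfrac{1}{2}x^2$, and the mean dynamic is $\dot{x} = p_B(x) - x = \tfrac{1}{2}\,x(1-x)$: under this rule the population moves towards the payoff-efficient state B from any interior initial condition.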
payoff-to-use = “use-strategy-expected-payoff”

Under this process, a revising agent adopts one of the strategies with the greatest expected payoff in the population. This is a best-response dynamic. The greatest expected payoff obtained by (one or several) strategies at state $x$ is $\pi^*(x) = \max_k \pi_k(x)$. Let $S^*(x) = \{i : \pi_i(x) = \pi^*(x)\}$ be the set of strategies that obtain the greatest expected payoff at state $x$ and let $s^*(x)$ be the cardinality of that set, i.e., the number of strategies that obtain the maximum expected payoff at state $x$.

The probability that a reviser adopts strategy $i$ is then

$p_i(x) = \begin{cases} \dfrac{1}{s^*(x)} & \text{if } i \in S^*(x) \\ 0 & \text{otherwise} \end{cases}$

The outflow of $i$-strategists is given by the fraction of revisers that are $i$-players, i.e., $x_i$. Thus, the mean dynamic is

$\dot{x}_i = p_i(x) - x_i$
Gilboa and Matsui (1991), Hofbauer (1995b) and Kandori and Rob (1995) are pioneering references on best-response dynamics.
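For instance, in the 1-2 coordination game of Figure 1 (section 4), B is the unique best response whenever $2x > 1 - x$, i.e. whenever $x > 1/3$, so the mean dynamic above becomes

$\dot{x} = \begin{cases} -\,x & \text{if } x < 1/3 \\ \tfrac{1}{2} - x & \text{if } x = 1/3 \\ 1 - x & \text{if } x > 1/3 \end{cases}$

so, away from the mixed equilibrium, the fraction of B-strategists decays or grows exponentially towards the nearest monomorphic state.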
3.6. Direct pairwise-difference
payoff-to-use = “play-with-one-rd-agent”

The probability that the revising agent is a $j$-player with payoff $\pi_{jk}$ who selects candidate strategy $i$, with a payoff $\pi_{il}$ obtained for strategy $i$, is $x_j x_k\,\frac{1}{n-1}\,x_l$ (the candidate strategy being one of the other $n-1$ strategies, chosen at random).

Considering that the revising $j$-player adopts strategy $i$ with probability $\frac{[\,\pi_{il} - \pi_{jk}\,]_+}{\pi^{\max} - \pi^{\min}}$, the total of $j$-players who switch to strategy $i$ (adding over every $j \neq i$ and every $k, l$) is

$\frac{1}{n-1} \sum_{j \neq i} \sum_{k,l} x_j\,x_k\,x_l\,\frac{[\,\pi_{il} - \pi_{jk}\,]_+}{\pi^{\max} - \pi^{\min}}$

and the total of $i$-strategists who switch to some other strategy is

$\frac{x_i}{n-1} \sum_{j \neq i} \sum_{k,l} x_k\,x_l\,\frac{[\,\pi_{jl} - \pi_{ik}\,]_+}{\pi^{\max} - \pi^{\min}}$

The mean dynamic is then

$\dot{x}_i = \frac{1}{(n-1)\left(\pi^{\max} - \pi^{\min}\right)} \left( \sum_{j \neq i} \sum_{k,l} x_j\,x_k\,x_l\,[\,\pi_{il} - \pi_{jk}\,]_+ \;-\; x_i \sum_{j \neq i} \sum_{k,l} x_k\,x_l\,[\,\pi_{jl} - \pi_{ik}\,]_+ \right)$
payoff-to-use = “use-strategy-expected-payoff”

The probability that the revising agent is a $j$-player (with payoff $\pi_j(x)$) who selects candidate strategy $i$ (with payoff $\pi_i(x)$) is $x_j\,\frac{1}{n-1}$.

Considering that the revising $j$-player adopts strategy $i$ with probability $\frac{[\,\pi_i(x) - \pi_j(x)\,]_+}{\pi^{\max} - \pi^{\min}}$, the total of $j$-players who switch to strategy $i$ (adding over every $j \neq i$) is

$\frac{1}{n-1} \sum_{j \neq i} x_j\,\frac{[\,\pi_i(x) - \pi_j(x)\,]_+}{\pi^{\max} - \pi^{\min}}$

and the total of $i$-strategists who switch to some other strategy is

$\frac{x_i}{n-1} \sum_{j \neq i} \frac{[\,\pi_j(x) - \pi_i(x)\,]_+}{\pi^{\max} - \pi^{\min}}$

The mean dynamic is then

$\dot{x}_i = \frac{1}{(n-1)\left(\pi^{\max} - \pi^{\min}\right)} \left( \sum_{j \neq i} x_j\,[\,\pi_i(x) - \pi_j(x)\,]_+ \;-\; x_i \sum_{j \neq i} [\,\pi_j(x) - \pi_i(x)\,]_+ \right)$
This is the Smith dynamic (Smith, 1984; Sandholm, 2010b) with constant speed factor $\frac{1}{(n-1)\left(\pi^{\max} - \pi^{\min}\right)}$.
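Under the candidate-selection and normalization conventions used above (one random alternative strategy; payoff differences divided by $\pi^{\max} - \pi^{\min}$), in the 1-2 coordination game of Figure 1 ($n = 2$, $\pi^{\max} - \pi^{\min} = 2$), with $x$ the fraction of B-strategists, this dynamic reads

$\dot{x} = \frac{1}{2}\left( (1-x)\,[\,3x - 1\,]_+ - x\,[\,1 - 3x\,]_+ \right) = \begin{cases} -\tfrac{1}{2}\,x\,(1 - 3x) & \text{if } x \le 1/3 \\ \tfrac{1}{2}\,(1-x)(3x - 1) & \text{if } x \ge 1/3 \end{cases}$

so, like the best-response dynamic, it moves towards B exactly when B is the better strategy, but at a speed proportional to B's payoff advantage.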
3.7. Direct positive proportional
payoff-to-use = “play-with-one-rd-agent”

Under this process, using the notation introduced for decision rule “direct-best”, the probability that a reviser selects strategy $i$ is

$p_i(x) = \sum_{\omega \in \Omega} P_x(\omega)\,\frac{\left(\pi_{i\omega_i}\right)^{m}}{\sum_{k} \left(\pi_{k\omega_k}\right)^{m}}$

The outflow of $i$-strategists is given by the fraction of revisers that are $i$-players, i.e., $x_i$.

The mean dynamic is then

$\dot{x}_i = \sum_{\omega \in \Omega} P_x(\omega)\,\frac{\left(\pi_{i\omega_i}\right)^{m}}{\sum_{k} \left(\pi_{k\omega_k}\right)^{m}} - x_i$
payoff-to-use = “use-strategy-expected-payoff”

Under this process, the probability that a reviser selects strategy $i$ is

$p_i(x) = \frac{\left(\pi_i(x)\right)^{m}}{\sum_{k} \left(\pi_k(x)\right)^{m}}$

The outflow of $i$-strategists is given by the fraction of revisers that are $i$-players, i.e., $x_i$.

The mean dynamic is then

$\dot{x}_i = \frac{\left(\pi_i(x)\right)^{m}}{\sum_{k} \left(\pi_k(x)\right)^{m}} - x_i$
4. Running an agent-based model and solving its mean dynamic at runtime
In this section, we explain the Euler method to solve ODEs and we show how to implement it in NetLogo. For illustration purposes we use the 1-2 coordination game, which is the 2-player 2-strategy single-optimum coordination game that we introduced in chapter I-2 and analyzed in Part IV. Its payoff matrix is shown in Figure 1 below:
|  | Player 2 chooses A | Player 2 chooses B |
|---|---|---|
| Player 1 chooses A | 1 , 1 | 0 , 0 |
| Player 1 chooses B | 0 , 0 | 2 , 2 |

Figure 1. Payoff matrix of the 1-2 coordination game.
4.1. The Euler method to numerically solve ODEs
Since there are only two strategies in the 1-2 coordination game, the mean dynamic can be expressed using one single variable. Let $x$ denote the fraction of B-strategists.
To solve the mean dynamic, we use the Euler method, which is one of the simplest numerical procedures for solving ODEs with an initial condition. Consider the ODE $\dot{x}(t) = f(x(t))$ with initial condition $x(0) = x_0$. Our goal is to compute a series of values $x_0, x_1, x_2, \dots$ which will approximate the solution trajectory that departs from initial value $x_0$. Note that in this section $x_k$ denotes a value in the series (not the fraction of $k$-strategists).
We start by setting $x_0 = x(0)$, i.e. the first point in our series is the initial condition. To compute the following values we proceed recursively as follows:

$x_{k+1} = x_k + h\,f(x_k)$

where $h$ is a step size chosen by us.
Geometrically, note that $f(x_k)$ can be interpreted as the slope of the tangent line to the solution curve at $x_k$. So basically, departing from $x_k$, we are approximating the value of the solution trajectory after $h$ time units using the tangent line at $x_k$, which is the best linear approximation to the curve at that point (see Figure 2 below).
In this way, by using the Euler method, we are approximating the solution curve with initial value $x_0$ by the series of line segments that connect the consecutive points $x_0, x_1, x_2, \dots$ The value $x_k$ is the approximation to the solution curve with initial value $x_0$ after $k\,h$ time units.
How should we choose the value of step size $h$? In general, the lower the value of $h$, the better the approximation. Here, since we are going to run a discrete-time simulation and solve its mean dynamic at the same time, a natural candidate is to choose the time that corresponds to one tick in the NetLogo model. And how long is that? Recall that, in section 2 above, we derived the mean dynamic defining one unit of clock time as 1/prob-revision ticks. Therefore, one tick corresponds to prob-revision time units, so we choose $h =$ prob-revision.
4.2. Solving an ODE numerically within NetLogo
As an example, consider the agent-based dynamic with decision-rule = “direct-positive-proportional-m”, m = 1, and payoff-to-use = “play-with-one-rd-agent”. The mean dynamic for this setting in the 1-2 coordination game (Fig. 1) is:

$\dot{x} = \frac{x\,(1 - x)}{6}$

where $x$ is the fraction of B-strategists.
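A sketch of where this expression comes from (assuming that, when both tested strategies happen to yield payoff zero, the reviser picks one of them at random): the reviser tests A against one random co-player (payoff 1 with probability $1-x$, payoff 0 otherwise), tests B against another random co-player (payoff 2 with probability $x$, payoff 0 otherwise), and then adopts each strategy with probability proportional to its experienced payoff (since m = 1). Adding up the cases in which B is adopted:

$p_B(x) = \underbrace{(1-x)\,x\cdot\tfrac{2}{3}}_{\text{payoffs }(1,2)} + \underbrace{x^2\cdot 1}_{\text{payoffs }(0,2)} + \underbrace{x(1-x)\cdot\tfrac{1}{2}}_{\text{payoffs }(0,0)} = \tfrac{7}{6}\,x - \tfrac{1}{6}\,x^2, \qquad \dot{x} = p_B(x) - x = \frac{x(1-x)}{6}.$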
Therefore, using the Euler method we compute the value $x_{k+1}$ from $x_k$ as follows:

$x_{k+1} = x_k + \textit{prob-revision} \cdot \frac{x_k\,(1 - x_k)}{6}$
In NetLogo, we can define a global variable x and we can update its value every tick as follows:
set x x + prob-revision * x * (1 - x) / 6
For simplicity, here we have discussed a model where the mean dynamic has one single independent variable, but the Euler method can be used (and implemented within NetLogo) with any number of variables.[3]
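A minimal sketch of how this could look follows. It is not code from the model: the procedure names, the reporters f1 and f2 and their right-hand sides are hypothetical placeholders, and prob-revision is assumed to be a parameter defined in the interface, as in our models. The idea is simply to keep one global variable per independent variable of the mean dynamic and update them all within the same tick:

globals [x1 x2 h]   ;; x1, x2: the two independent variables of the mean dynamic

to setup-mean-dynamic
  set x1 0.2              ;; hypothetical initial condition
  set x2 0.5
  set h prob-revision     ;; step size = clock time elapsed in one tick
end

to-report f1 [xa xb]      ;; placeholder right-hand side for the first variable
  report xa * (1 - xa - xb)
end

to-report f2 [xa xb]      ;; placeholder right-hand side for the second variable
  report xb * (xa - xb)
end

to update-mean-dynamic    ;; to be run once per tick
  let dx1 (f1 x1 x2)      ;; evaluate both right-hand sides first, so the
  let dx2 (f2 x1 x2)      ;; update uses the values at the beginning of the tick
  set x1 x1 + h * dx1     ;; one Euler step for each variable
  set x2 x2 + h * dx2
end

Note that dx1 and dx2 are computed before either variable is updated, so both right-hand sides are evaluated at the same point; updating x1 first and then using its new value to update x2 would not be a correct Euler step.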
4.3. The influence of population size
Figure 3 below shows some representative simulations of the parameterization considered in the previous section for different population sizes. The numerical solution of the mean dynamic is shown as a black line in every plot. All these graphs have been created with NetLogo model nxn-games-in-well-mixed-populations-md.nlogo, which numerically solves the mean dynamic for any setting in the 1-2 coordination game, at runtime.
As you can see in Figure 3, the greater the population size $N$, the better the mean dynamic approximates the stochastic dynamics of our agent-based model. As $N$ grows, the stochastic fluctuations of the agent-based model (in strategy proportions) diminish, and simulations tend to get closer and closer to the –deterministic– solution of the mean dynamic.
5. Representative simulations together with their mean dynamics
In this section we explore the influence of parameters decision-rule and payoff-to-use in our model. Table 1 below shows representative simulations for different values of these parameters in the 1-2 coordination game, including the numerical solution of the corresponding mean dynamic as a black line. All these graphs can be replicated using NetLogo model nxn-games-in-well-mixed-populations-md.nlogo. The equation displayed under each plot is the corresponding mean dynamic (where $x$ denotes the fraction of B-strategists, which is shown in green in the plots).
The results shown in Table 1 allow us to draw a number of observations. The most obvious one is that the mean dynamic provides a very good approximation for the dynamics of all the parameterizations shown in the table, for the population size used in those simulations. The approximation would become even better for greater population sizes (and would degrade for lower population sizes).
Looking at Table 1, it is also clear that we can make the population go to one monomorphic state or the other simply by changing the value of parameter decision-rule and, in some cases, the value of payoff-to-use. These two parameters affect the dynamics in ways that do not seem easy to understand without the help of the mean dynamic.
Finally, in terms of mean dynamics, note that only one of the rows in Table 1 corresponds to the replicator dynamics (i.e., the row for decision-rule = “imitative-pairwise-difference”).[4] The other decision rules in Table 1 have mean dynamics that are different from the replicator dynamics; and, in many cases, these other dynamics lead to the opposite end of the state space. For this reason, it seems clear that we should not make general statements about evolutionary dynamics if such statements are based only on the behavior of the replicator dynamics (or any other particular model, for that matter). There are many other evolutionary dynamics that lead to results completely different from those obtained with the replicator dynamics.
To be clear, in our view, all the decision rules in Table 1 could be defended as sensible models of evolution in the sense that, under all of them, agents preferentially switch to strategies that provide greater payoffs. Nonetheless, they lead to completely different dynamics. It is true that the replicator dynamics appears as the mean dynamic of many different agent-based models (see chapter V-1), but it is also true that there are a myriad of other sensible models of evolution with properties completely different from those of the replicator dynamics.
6. Details matter
In previous chapters of this book, we learned that evolutionary dynamics on grids (and, more generally, on networks) can be significantly affected by small, seemingly unimportant details. In this final chapter, we have seen that this sensitivity to apparently innocuous details can also be present in the simpler setting of well-mixed populations. In general, evolutionary game dynamics in well-mixed populations can also depend on factors that may seem insignificant at first sight. Because of this, if we had to summarize this book in one sentence, this sentence could well be: “Details matter”.
This sensitivity to small details suggests that a simple general theory on evolutionary game dynamics cannot be derived. It also highlights the usefulness of the skills you have acquired with this book. We hope that you have learned new ways to explore and tame the complexity and the beauty of evolutionary game dynamics, using both computer simulation and mathematical analysis.
7. Exercises
Exercise 1. Derive the mean dynamic for a decision rule that you can call “imitate-the-best” using expected payoffs (Hofbauer, 1995a; Vega-Redondo, 1997). Under this rule, the revising agent copies the strategy of the individual with the greatest expected payoff. Resolve ties uniformly at random.
Exercise 2. Derive the mean dynamic for a decision rule that you can call “imitative-positive-proportional” assuming agents obtain their payoff by playing with one random agent, i.e., payoff-to-use = “play-with-one-rd-agent“. Under this decision rule, the revising agent selects one individual at random, with probabilities proportional to payoffs, and copies her strategy. Negative payoffs are not allowed when using this rule.
Exercise 3. Derive the mean dynamic for a decision rule that you can call “imitative-logit-m” using expected payoffs (Weibull, 1995, example 4.5). Under this decision rule, the revising agent selects one individual at random, with probabilities proportional to $e^{m\,\pi_i}$ (where $\pi_i$ denotes agent $i$'s expected payoff), and copies her strategy.
Exercise 4. Derive the mean dynamic for a decision rule that you can call “Fermi-m”, assuming agents obtain their payoff by playing with one random agent, i.e., payoff-to-use = “play-with-one-rd-agent” (Woelfing and Traulsen, 2009). Under this rule, the revising agent looks at another individual at random and copies her strategy with probability $\frac{1}{1 + e^{-m\,(\pi_o - \pi_r)}}$, where $\pi_o$ denotes the observed agent's payoff, $\pi_r$ the reviser's payoff, and $m \geq 0$.
Exercise 5. Derive the mean dynamic for a decision rule that you can call “direct-logit-m” using expected payoffs (Blume, 1997; Fudenberg and Levine, 1998; Hofbauer and Sandholm, 2007). Under this rule, the revising agent selects one strategy at random, with probabilities proportional to $e^{m\,\pi_i(x)}$ (where $\pi_i(x)$ denotes strategy $i$'s expected payoff), and adopts it.
Exercise 6. Derive the mean dynamic for a decision rule where the revising agent looks at a random agent in the population and adopts her strategy.
- This choice of time scale is convenient, but any other definition for one unit of clock time would also be fine. ↵
- . ↵
- Izquierdo et al. (2014) use the Euler method to numerically solve a mean dynamic with 35 independent variables within NetLogo. ↵
- The replicator dynamics is also obtained if decision-rule = "imitative-linear-attraction" and if decision-rule = "imitative-linear-dissatisfaction". ↵