
Petri Dish Neural Cellular Automata

tl;dr
Petri Dish Neural Cellular Automata (PD-NCA) is a new ALife simulation substrate that replaces the fixed, non-adaptive morphogenesis of conventional NCA—where model parameters remain constant during development—with multi-agent open-ended growth, trained via continual backpropagation throughout the entire simulation.

Abstract

While neural cellular automata (NCA) have proven effective for modeling morphogenesis and self-organizing processes, they are typically governed by a fixed, non-adaptive update rule shared across all cells. Each cell applies the same learned local transition function throughout its lifetime, resulting in static developmental dynamics once training is complete. We introduce Petri Dish Neural Cellular Automata (PD-NCA), a differentiable Artificial Life substrate that removes this constraint by allowing multiple, independent NCA agents to coexist, compete, and adapt within a shared environment. Unlike conventional NCA, each agent in PD-NCA continually updates its parameters via gradient descent during the simulation itself, enabling within-lifetime learning and open-ended behavioral change. This continual, multi-agent learning process transforms morphogenesis from a fixed developmental program into a dynamic ecosystem of interacting, adaptive entities. Through these interactions, PD-NCA exhibits emergent behaviors such as cyclic dynamics, cooperation, and persistent complexity growth, providing a promising new framework for studying open-endedness in differentiable systems.


Introduction

Neural Cellular Automata (NCA) are a powerful tool for building and exploring the process of morphogenesis. They have been a topic of interest in the ALife community since their introduction. While they have demonstrated the ability to evolve decentralized capabilities, most NCA experiments are intentionally limited to a single agent operating on a grid, so as to explore the idea of "growing organisms" from scratch.

ALife is a broad and multidisciplinary domain concerned with understanding the systems, computation, processes, evolution, and simulation of life. It spans various orders of magnitude in terms of simulation, from computational symbiogenesis to the simulation and analysis of complex organisms in order to aid our understanding of the natural world, the challenges that impinge upon life, and the incredible mechanisms that emerge to overcome these challenges.

We propose Petri Dish Neural Cellular Automata (PD-NCA) as an ALife simulation, wherein multiple NCA agents compete with the singular goal of self-replication. PD-NCA differs substantially from the standard NCA setup: instead of a single, fixed model operating on a grid with immutable parameters, PD-NCA introduces a population of distinct, continuously learning NCA, each maintaining its own neural parameters and adapting through ongoing gradient-based optimization during the simulation. These agents share a common spatial substrate, a "petri dish", where they interact through competitive and cooperative dynamics mediated by differentiable attack and defense channels. In contrast to conventional NCA, where morphogenesis unfolds deterministically according to pre-trained rules, PD-NCA’s learning-in-the-loop design enables open-ended adaptation and emergent complexity within a single differentiable simulation.


Methods

Overview: Petri Dish NCA continuously compete for space in a constrained 2D grid. Each different color represents an individual model. The simulation proceeds through 4 distinct phases: (1) processing, (2) competition, (3) normalization, and (4) a state update.

Our simulation operates on a discrete spatial grid $\mathcal{G} \in \mathbb{R}^{W \times H \times C}$, where $W$ and $H$ denote the spatial dimensions, and $C$ represents channel dimensionality. At any position $(x, y)$ and time $t$, grid state is characterized by a feature vector:

$$\mathbf{s}_{x, y}^t=\left[\mathbf{a}_{x, y}^t, \mathbf{d}_{x, y}^t, \mathbf{h}_{x, y}^t\right] \in \mathbb{R}^C$$

where $\mathbf{a}^t \in \mathbb{R}^{C_a}$ denotes attack channels, $\mathbf{d}^t \in \mathbb{R}^{C_d}$ denotes defense channels, and $\mathbf{h}^t \in \mathbb{R}^{C_h}$ denotes hidden state information, with $C=C_a+C_d+C_h$. At each timestep, the simulation proceeds through four phases: processing, competition, normalization, and state update.

Processing

Each NCA agent $i \in\{1, \ldots, N\}$ is parameterized by a convolutional function $f_{\theta_i}$ that produces local state updates. For a given position $(x, y)$, the agent observes neighborhood $\mathcal{N}_{x, y}$ and generates update proposals:

$$\Delta \mathbf{s}_{x, y}^{t, i}=f_{\theta_i}\left(\mathcal{N}_{x, y}\right) \in \mathbb{R}^C$$

where $\mathcal{N}_{x, y}$ represents the Moore neighborhood of radius $r$. An NCA's ability to propose updates is gated by an aliveness mask $\mathbf{A}^t \in \mathbb{R}^{W \times H \times N}$ where $A_i^t(x, y)$ indicates agent $i$'s aliveness at position $(x, y)$. Each NCA can only propose updates to cells where it is currently alive or cells adjacent to its living territory.
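The gating rule can be illustrated with a small NumPy sketch. It is not the released implementation: it assumes a radius of 1 for "adjacent", zero padding at the grid border, and the helper names `proposal_mask` and `gate_proposal` are hypothetical.

```python
import numpy as np

def proposal_mask(alive: np.ndarray) -> np.ndarray:
    """Boolean (W, H) mask of cells where one agent may propose updates.

    `alive` is that agent's (W, H) aliveness map. A cell is writable if the
    agent is alive there or in any of its 8 Moore neighbours (radius 1).
    Assumption: zero padding at the border, i.e. no wrap-around.
    """
    occupied = alive > 0
    padded = np.pad(occupied, 1, mode="constant", constant_values=False)
    w, h = occupied.shape
    mask = np.zeros_like(occupied)
    # OR together the 9 shifted copies of the occupancy map (a 3x3 dilation).
    for dx in range(3):
        for dy in range(3):
            mask |= padded[dx:dx + w, dy:dy + h]
    return mask

def gate_proposal(delta: np.ndarray, alive: np.ndarray) -> np.ndarray:
    """Zero out a (W, H, C) proposal outside the agent's writable region."""
    return delta * proposal_mask(alive)[..., None]

# A single living cell in the middle of a 5x5 grid can write to its 3x3 block.
alive = np.zeros((5, 5))
alive[2, 2] = 1.0
gated = gate_proposal(np.ones((5, 5, 4)), alive)
```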

Background environment

To ensure consistent competitive dynamics in regions with only $1$ NCA alive, we introduce a static environment tensor $\mathbf{E} \in \mathbb{R}^{W \times H \times C}$ that acts as a constant background competitor. This tensor is initialized once at the beginning of the simulation with random noise and then normalized.

$$\tilde{\mathbf{E}}_{x, y}=\left[\tilde{\mathbf{e}}_{x, y}^a, \tilde{\mathbf{e}}_{x, y}^d, \tilde{\mathbf{e}}_{x, y}^h\right], \quad \text{where } \tilde{\mathbf{e}}_{x, y}^{\{a, d, h\}} \sim \mathcal{U}(-1,1),$$

$$\mathbf{E}_{x, y}=\frac{\tilde{\mathbf{E}}_{x, y}}{\left\|\tilde{\mathbf{E}}_{x, y}\right\|}.$$

The environment participates in competition by contributing its own update proposal:

$$\Delta \mathbf{s}_{x, y}^{t, \mathrm{env}}=\mathbf{E}_{x, y}$$

This ensures that agents must maintain active attack and defense even in territories they control, preventing stagnation and encouraging continuous adaptation.
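As a minimal sketch of this initialization, assuming the norm is the Euclidean norm over each cell's full channel vector (the text does not name the norm) and using a hypothetical helper `make_environment`:

```python
import numpy as np

def make_environment(width: int, height: int, channels: int, seed: int = 0) -> np.ndarray:
    """Sample a static (W, H, C) background tensor and normalize each cell.

    Every entry is drawn from U(-1, 1); each cell's channel vector is then
    divided by its Euclidean norm (assumed here) so all cells have unit length.
    """
    rng = np.random.default_rng(seed)
    raw = rng.uniform(-1.0, 1.0, size=(width, height, channels))
    norm = np.linalg.norm(raw, axis=-1, keepdims=True)
    return raw / np.maximum(norm, 1e-8)  # guard against a zero vector

env = make_environment(8, 8, 6)
```

Because the tensor is created once and never updated, the same array can be reused as the environment's proposal at every timestep.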

Competition

The resolution of competing update proposals follows a strength-based arbitration mechanism. For each pair of entities $(i, j)$ (including the environment) proposing updates at position $(x, y)$, we define the pairwise interaction strength:

$$\phi_{i j}(x, y)=\left\langle\mathbf{a}_{x, y}^{t, i}, \mathbf{d}_{x, y}^{t, j}\right\rangle-\left\langle\mathbf{d}_{x, y}^{t, i}, \mathbf{a}_{x, y}^{t, j}\right\rangle$$

where $\langle\cdot, \cdot\rangle$ denotes cosine similarity. The total competitive strength for agent $i$ at position $(x, y)$ includes both agent-agent and agent-environment interactions.

$$\Psi_i(x, y)=\sum_{j \neq i} \phi_{i j}(x, y)+\phi_{i, \mathrm{env}}(x, y)$$

The environment's competitive strength is similarly computed:

$$\Psi_{\mathrm{env}}(x, y)=\sum_{j=1}^N \phi_{\mathrm{env}, j}(x, y)$$
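The following NumPy sketch computes these strengths for a stack of entities. It makes several assumptions not stated above: the environment is stored as the last entity in the stack, attack and defense vectors have the same length (so that their cosine similarity is defined), and every entity proposes at every cell (aliveness gating is left out). The function names are hypothetical.

```python
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Cosine similarity along the last axis."""
    num = np.sum(u * v, axis=-1)
    den = np.linalg.norm(u, axis=-1) * np.linalg.norm(v, axis=-1)
    return num / np.maximum(den, eps)

def competitive_strengths(attack: np.ndarray, defense: np.ndarray) -> np.ndarray:
    """Return Psi with shape (M, W, H) for M entities (N agents + environment).

    attack, defense: (M, W, H, K) proposal channels, same K for both (assumed).
    Psi[i] sums phi_ij over every other entity j, where
    phi_ij = cos(a_i, d_j) - cos(d_i, a_j).
    """
    m = attack.shape[0]
    psi = np.zeros(attack.shape[:3])
    for i in range(m):
        for j in range(m):
            if i == j:
                continue
            psi[i] += cosine(attack[i], defense[j]) - cosine(defense[i], attack[j])
    return psi

rng = np.random.default_rng(0)
attack = rng.uniform(-1, 1, size=(3, 4, 4, 2))   # 2 agents + environment
defense = rng.uniform(-1, 1, size=(3, 4, 4, 2))
psi = competitive_strengths(attack, defense)
```

One property follows directly from the definition: $\phi_{ij} = -\phi_{ji}$, so the strengths of all entities at a cell sum to zero and one entity's gain is always another's loss.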

Normalization

To enforce resource constraints within the environment, we apply a softmax normalization which transforms agent and environment interaction strengths into contribution weights:

$$w_i(x, y)=\frac{\exp \left(\Psi_i(x, y) / \tau\right)}{\exp \left(\Psi_{\mathrm{env}}(x, y) / \tau\right)+\sum_{k=1}^N \exp \left(\Psi_k(x, y) / \tau\right)}$$

where $\tau$ is a temperature parameter controlling competition sharpness. The environment weight $w_{\mathrm{env}}(x, y)$ is given by the same expression with $\Psi_{\mathrm{env}}$ in the numerator. Contribution weights determine each proposal's relative contribution to the final state update.
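A minimal NumPy sketch of this normalization, assuming the strengths of all entities (agents and environment) are stacked along the first axis; `contribution_weights` is a hypothetical name, and the max-subtraction is a standard numerical-stability step that does not change the result:

```python
import numpy as np

def contribution_weights(psi: np.ndarray, tau: float = 1.0) -> np.ndarray:
    """Softmax over entities for each cell.

    psi: (M, W, H) competitive strengths, environment included.
    Returns weights of the same shape that sum to 1 over axis 0.
    """
    z = psi / tau
    z = z - z.max(axis=0, keepdims=True)  # stability only; softmax is shift-invariant
    e = np.exp(z)
    return e / e.sum(axis=0, keepdims=True)

psi = np.array([[[1.0]], [[0.0]], [[-1.0]]])   # three entities, one cell
soft = contribution_weights(psi, tau=1.0)
sharp = contribution_weights(psi, tau=0.1)     # lower temperature, sharper winner
```

Lowering `tau` pushes the weights towards a one-hot assignment, which is what "competition sharpness" refers to.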

State update

The final grid state delta is the weighted aggregation of all proposals.

$$\mathbf{s}_{x, y}^{t+1}=\operatorname{clip}\left(\mathbf{s}_{x, y}^t+w_{\mathrm{env}}(x, y) \cdot \Delta \mathbf{s}_{x, y}^{t, \mathrm{env}}+\sum_{i=1}^N w_i(x, y) \cdot \Delta \mathbf{s}_{x, y}^{t, i}\right)$$

where the clipping operation ensures state boundedness. For our experiments, we clamp to $[-1, 1]$. The aliveness distribution is updated to reflect the normalized competitive strengths:

$$A^{t+1}_i(x,y) = \begin{cases} w_i(x,y) & \text{if } w_i(x,y) > \alpha \\ 0 & \text{otherwise} \end{cases},$$

where $\alpha$ represents a minimum viability threshold. Aliveness from non-viable agents is redistributed among surviving agents.
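The update and thresholding steps can be sketched as follows. The sketch assumes entities are stacked along the first axis and uses hypothetical function names. It implements only the thresholding written in the equation; the redistribution of aliveness among survivors is left out because its exact scheme is not specified here.

```python
import numpy as np

def update_state(state, proposals, weights, lo=-1.0, hi=1.0):
    """Weighted sum of proposals added to the state, then clipped.

    state: (W, H, C); proposals: (M, W, H, C); weights: (M, W, H).
    """
    delta = np.sum(weights[..., None] * proposals, axis=0)
    return np.clip(state + delta, lo, hi)

def update_aliveness(agent_weights, alpha=0.4):
    """Keep an agent's weight as its aliveness only where it exceeds alpha.

    agent_weights: (N, W, H), environment excluded.
    """
    return np.where(agent_weights > alpha, agent_weights, 0.0)

# One cell, two agents plus the environment (last entity).
state = np.array([[[0.5, 0.0]]])
proposals = np.array([[[[1.0, -1.0]]], [[[1.0, 1.0]]], [[[0.0, 0.0]]]])
weights = np.array([[[0.5]], [[0.45]], [[0.05]]])

new_state = update_state(state, proposals, weights)
alive = update_aliveness(weights[:2], alpha=0.4)
```

In this example the first channel would reach 1.45 and is clipped to 1, while both agents remain alive because each weight exceeds 0.4.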

A note on the state and our visualizations

In our experiments, we use an aliveness threshold of $0.4$, which allows up to 2 NCAs to survive in each cell. At first this seems counterintuitive, as one would expect only 1 NCA per cell. However, our initial experiments demonstrated that threshold values above $0.5$ resulted in uninteresting simulations, typically with individuals expanding until they meet, but remaining fixed thereafter. We were inspired by Mixture-of-Experts models, which commonly pick the Top-2 experts within a router to ensure some minimal amount of exploration. A threshold value of $0.4$ allows for competitive dynamics between NCA while still enabling singular occupancy in a given cell.
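A short worked bound shows why this threshold caps occupancy at two. It relies only on the softmax normalization: the contribution weights (agents plus environment) are non-negative and sum to one in each cell. Suppose $k$ agents exceed the threshold $\alpha$ in the same cell. Then

$$k\,\alpha < \sum_{i:\, w_i > \alpha} w_i \le 1 \quad\Longrightarrow\quad k < \frac{1}{\alpha}.$$

For $\alpha = 0.4$ this gives $k < 2.5$, so at most two agents can be alive in a cell. For any $\alpha \ge 0.5$ it gives $k < 2$, so at most one.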

The visualizations are set up for maximum clarity, but the state updates themselves happen in continuous (albeit clipped) space, meaning that each cell's state during the simulation is usually a composition or superposition of multiple competing updates. We visualize the winning NCA per cell in all videos, where each color represents an individual NCA and is selected using an argmax of the aliveness.

Optimization objective

Each agent $i$ optimizes for territorial expansion by maximizing its total aliveness across the spatial domain. We formulate this as minimizing the negative log-aliveness:

$$\mathcal{L}_i=-\log \left(\sum_{x, y} A_i(x, y)\right)$$

The logarithmic transformation ensures stable gradient flow across multiple orders of magnitude of aliveness during backpropagation. This formulation naturally induces emergent behaviors where agents must balance offensive expansion strategies (optimizing attack channels) with defensive territory maintenance (optimizing defense channels) against both other agents and the persistent environmental pressure, leading to complex multi-agent dynamics reminiscent of biological competition systems.
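To make the effect of the logarithm concrete, here is a small NumPy sketch with the gradient written out by hand. It is illustrative only: the function names are hypothetical, and the small `eps` guarding against $\log 0$ is an assumption.

```python
import numpy as np

def territory_loss(alive: np.ndarray, eps: float = 1e-8) -> float:
    """Negative log of an agent's total aliveness over the grid."""
    return float(-np.log(alive.sum() + eps))

def territory_loss_grad(alive: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Gradient of the loss with respect to each cell's aliveness.

    d/dA(x, y) of -log(sum A) is -1 / sum A, the same value for every cell.
    """
    return np.full_like(alive, -1.0 / (alive.sum() + eps))

small = np.full((2, 2), 0.125)     # total aliveness 0.5
large = np.full((20, 20), 0.25)    # total aliveness 100
g_small = territory_loss_grad(small)   # about -2 per cell
g_large = territory_loss_grad(large)   # about -0.01 per cell
```

The per-cell gradient scales as the inverse of total aliveness, so a fixed relative gain in territory reduces the loss by the same amount whether the agent is small or large.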


Experiments

We ran many experiments to explore PD-NCA, namely:

  1. Analyzing the complex dynamics of PD-NCA over time by measuring information.
  2. Inspecting the impact of learning to understand its role in ALife simulation.
  3. Hyperparameter searches for noteworthy simulations, using video compression and the open-endedness score from the ASAL framework.
  4. Exploration of connection to hypercycles.

Dynamics

One of our first findings when exploring PD-NCA was that scaling the grid size and number of NCA consistently led to richer collective behavior. This suggests that an avenue of exploration must involve engineering PD-NCA to run on much larger grids, support more NCA, and potentially run on many GPUs simultaneously. In order to measure the notion of 'richness' or 'interesting behavior', we cannot rely on subjective assessment alone, as this precludes scaling hyperparameter searches. To this end, we propose to measure the amount of information stored on the grid as a proxy for complexity. Since NCA model size is fixed, the complexity of the simulation's behavior is fully explained by the complexity of the input (the grid) and the learned parameters of the individual NCA.

Left: A simulation with a single, non-competing NCA collapses, storing no information on the grid.
Right: Competition pushes multiple NCA to store more information on the grid over time.

We see that in simulations with just 1 NCA, information collapses to 0 as the NCA pushes each channel towards uniformity. With multiple NCA competing on the same grid, information increases over time. While bits/channel would be maximized with fully random channels, there is an optimization force against such an outcome: an NCA yielding pseudo-random updates can be trivially overwritten by an NCA with a strong attack vector.
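One way to obtain a bits/channel figure is sketched below. It is an assumption-laden illustration rather than the measurement used for the plots: it discretizes each channel into a fixed number of equal-width bins over $[-1, 1]$ and reports the Shannon entropy of the resulting histogram, averaged over channels. The bin count and the function name are hypothetical.

```python
import numpy as np

def bits_per_channel(state: np.ndarray, bins: int = 16, lo: float = -1.0, hi: float = 1.0) -> float:
    """Mean Shannon entropy (in bits) of per-channel value histograms.

    state: (W, H, C) grid with values in [lo, hi].
    A channel that is uniform across the grid scores 0 bits; the maximum
    possible score is log2(bins).
    """
    entropies = []
    for c in range(state.shape[-1]):
        counts, _ = np.histogram(state[..., c], bins=bins, range=(lo, hi))
        p = counts[counts > 0] / counts.sum()
        entropies.append(-np.sum(p * np.log2(p)))
    return float(np.mean(entropies))

rng = np.random.default_rng(0)
collapsed = np.zeros((32, 32, 4))                 # every channel uniform
noisy = rng.uniform(-1, 1, size=(32, 32, 4))      # close to the 4-bit maximum
```

Under this measure a collapsed grid scores zero and a fully random grid approaches the maximum, matching the two extremes described above.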

Grid size impact

Results from grid sizes of 16 x 16 to 196 x 196

Here we show the difference in behavior as we scale from $16 \times 16$ grids to $196 \times 196$. Qualitatively, we see that cycles only emerge at the largest grid size, which also supports the survival of more NCAs.

Grid size impact on entropy
As size scales up, we see greater amounts of information being stored on the grid.

While entropy has an initial spike because each NCA's first location is seeded with random noise, this smooths out over time so that the grid only retains information that is useful for NCA computation. As grid size increases, we see an increase in the amount of information stored on the grid.

The impact of learning

The videos below explore whether learning has a notable impact on the PD-NCA simulation. Without learning, the system eventually settles into a steady state with only minor fluctuation. With learning, however, we often observe interesting cyclic behavior and progression through various 'states of interaction'. These demonstrations suggest that a sufficient number of NCA, a sufficiently large grid, and learning are all necessary for the complex simulations that PD-NCA can yield.

Does Backprop Matter?

Can interesting dynamics emerge without backpropagation or learning? Left simulation has no backprop; right simulation is a normal run with backpropagation.

Hyperparameter search

The PD-NCA framework is essentially a new and fully differentiable ALife substrate. We wish to search the hyperparameter space of the substrate. The goal of this experiment was to search through hyperparameters for signs of open-endedness.

We searched the space of model configurations up to 15 layers, 128 channels (≈500K parameters). We ran up to a maximum of 15 NCA on grids up to 256 × 256 in size. We also included hyperparameters that control the ratio of steps with and without backpropagation as we found this to impact the simulation outcome. To guide this search we leveraged both a classical video compression score and neural score (i.e., the open-endedness score used in ASAL).

Compression Score Search

We use video compressibility as a proxy for complexity to search for interesting behavior. Even though these are all high-scoring simulations, they explore different parts of the parameter space.

For our classical approach, we followed Aaronson et al. (2014) in using compression of a blurred video as our search metric. Rather than taking the compression ratio for the raw video, we used average pooling to prevent rewarding overly noisy visuals. For each set of parameters, we ran 20,000 epochs before computing the video compression ratio as a performance metric. An epoch in our simulation is some number of steps without gradients, steps with gradients, and then running optimization on the NCA. The video frames are the grid state after each optimization. Intuitively, the less compressible the video, the better-performing the parameters. By this metric, the best run would be maximally noisy. However, we observed that the optimization process pushes towards order, which means that measuring compression is a 'good enough' approximation for complexity.
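The pooling-then-compression idea can be sketched as follows. This is a stand-in, not the search metric itself: it uses Python's `zlib` on the raw bytes of the pooled frames instead of a video codec, assumes grayscale frames with values in $[0, 1]$, and the pooling size `k` and function names are hypothetical.

```python
import zlib
import numpy as np

def average_pool(frame: np.ndarray, k: int = 4) -> np.ndarray:
    """Average-pool an (H, W) frame with a k x k window (H, W divisible by k)."""
    h, w = frame.shape
    return frame.reshape(h // k, k, w // k, k).mean(axis=(1, 3))

def compression_ratio(frames: np.ndarray, k: int = 4) -> float:
    """Compressed size divided by raw size for a blurred (T, H, W) clip.

    Values near 0 mean the clip is highly compressible (static or regular);
    larger values mean it is harder to compress.
    """
    pooled = np.stack([average_pool(f, k) for f in frames])
    raw = np.round(pooled * 255).astype(np.uint8).tobytes()
    return len(zlib.compress(raw, 9)) / len(raw)

rng = np.random.default_rng(0)
static = np.zeros((8, 32, 32))
noisy = rng.random((8, 32, 32))
ratio_static = compression_ratio(static)
ratio_noisy = compression_ratio(noisy)
```

Pooling before compressing averages away pixel-level noise, so a clip scores highly only if its variation survives at the coarser scale.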

Consider the videos above. The top left, bottom middle, and bottom right simulations all run for many steps without learning followed by few steps with learning. This is reflected in their behavior where they have stable 'rock-paper-scissors' dynamics before backpropagation changes the balance. The top middle simulation shows how stable cycles can emerge and compete, hinting towards symbiogenesis.

Open-endedness score

The other method we used for scoring simulations was the 'open-endedness' score from ASAL. This metric evaluates how consistently a system generates novel states as measured through a vision-language model (VLM)'s embedding space.

Given a simulation trajectory of $T=20{,}000$ epochs, we extract frames $\left\{\mathbf{F}_t\right\}_{t=1}^T$ at regular intervals. Each frame $\mathbf{F}_t \in \mathbb{R}^{W \times H \times C}$ is encoded through the VLM to produce a high-dimensional embedding:

$$\mathbf{z}_t=\Phi_{\mathrm{VLM}}\left(\mathbf{F}_t\right) \in \mathbb{R}^D,$$

where $D$ is the embedding dimensionality. For each timestep $t$, we compute the novelty score $\nu_t$ as the maximum cosine similarity to all previously observed embeddings:

$$\nu_t=\max _{i \in\{1, \ldots, t-1\}}\left\langle\mathbf{z}_t, \mathbf{z}_i\right\rangle$$

The overall open-endedness score $\mathcal{O}$ for the simulation is then computed as the temporal average of novelty scores.

$$\mathcal{O}=\frac{1}{T} \sum_{t=1}^T \nu_t$$
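Given precomputed embeddings, the two formulas above reduce to a few lines of NumPy. The sketch makes one assumption the text leaves open: the first frame has no history, so its $\nu_1$ is set to 0. The VLM encoder itself is not modeled, and `open_endedness` is a hypothetical name.

```python
import numpy as np

def open_endedness(embeddings: np.ndarray):
    """Return (nu, score) for a (T, D) array of frame embeddings.

    nu[t] is the maximum cosine similarity between frame t and any earlier
    frame; score is the mean of nu. nu[0] is set to 0 by assumption, since
    the first frame has nothing to be compared against.
    """
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = z @ z.T
    nu = np.zeros(len(z))
    for t in range(1, len(z)):
        nu[t] = sims[t, :t].max()
    return nu, float(nu.mean())

repeated = np.tile([1.0, 0.0], (4, 1))   # the same embedding four times
distinct = np.eye(4)                      # mutually orthogonal embeddings
_, score_repeated = open_endedness(repeated)
_, score_distinct = open_endedness(distinct)
```

Under this definition a lower $\nu_t$ means frame $t$ is less similar to everything seen before it, so a trajectory that keeps revisiting old states scores close to 1 and one that keeps moving to unseen states scores close to 0.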

Open-endedness Score Search

For an alternative approach, we use ASAL's open-endedness score. We believe this aligns better with the human notion of open-endedness.

This formulation rewards simulations that continuously explore new regions of the VLM's embedding space rather than converging to static patterns. In contrast to video compression metrics that capture fine-grained pixel-level changes, the ASAL score measures semantic diversity which biases towards larger structures that change over time. From the videos above, the bottom middle run shows a common attractor for the ASAL score with two NCAs forming large dynamic structures.

We see indications that PD-NCA as an ALife substrate can result in new types of dynamics that are amenable to hyperparameter search. In both the compression-based search and when using ASAL's open-endedness score, we see multiple instances of NCA forming larger structures and competing against one another. We also see simulations with oscillatory dynamics between 3 NCA. Because our loss is purely competitive, these dynamics suggest that cooperation can emerge from a purely competitive environment, which we believe to be a profound insight. The two metrics also exhibit complementary biases: video compression tends to maintain higher NCA counts with rapid local changes while ASAL can yield dynamic systems with as few as 2 NCA. For future work, we expect that balancing both approaches might yield even more interesting behaviors.

Measuring ASAL supervised alignment

Result: Here we show alignment with the prompt "chemical waves". The peak alignment score occurs early, at a point that is not visually optimal. Later points in the simulation, which appear visually more aligned with the prompt, receive a lower score.

Since our substrate is fully differentiable, this also means that we are able to use other neural network models' gradients to direct the learning of parts of the simulation. For instance, in addition to using the open-endedness score after simulation, we could use it during simulation in order to match a text-based prompt (e.g., "chemical waves").

As a proof-of-concept, we measured whether it was possible to use a VLM to this end. How well could a VLM identify whether our substrate was exhibiting complex or cyclical behavior? While this supervised method worked well for the original ASAL paper, where the search was for dynamics which appear like some sequence of targets, our more abstract targets did not lead to notable results. Even when the simulation was visually interesting, subjectively matching the prompts, the cosine similarity in the VLM's embedding space was near zero.

We suspect that this poor performance is owing to two reasons: (1) the VLM is not attuned to such abstract visual input and (2) color sensitivity. The PD-NCA simulation is effectively out of distribution, meaning that fine-tuning the VLM model might be necessary. Further, when visualizing, we map each NCA to a consistent, arbitrarily chosen color. Since colors matter for visual models, it might be necessary to do an inner search over color assignments to maximize the utility of the VLM.

Another challenge is related to temporally meaningful prompts, such as "cyclic behavior". One can only truly observe such behavior over time, but VLMs process static images. For future work we hope to explore whether a video model could provide a better learning signal for temporal concepts.

Hypercycles

From our initial search for interesting dynamics, we often observed persistent cycles, where 2-3 NCA would compete against one another in larger groups, sometimes in a hierarchical, or nested, arrangement. Was it possible to explicitly encourage this cyclic behavior to get higher-order emergence of cycles of much longer lengths?

These dynamics bear striking resemblance to hypercycles from theoretical chemistry: self-replicating molecular systems where each species catalyzes the formation of another in a closed cycle, enabling survival against parasites and increasing complexity. While our NCAs are continuously learning rather than strictly self-replicating, they exhibit analogous cyclical interdependencies.

To explicitly encourage hypercycle formation of length $N$, we modify the optimization objective to reward each NCA $i$ not only for its own territorial expansion but also for the success of NCA $(i+1) \bmod N$ in the cycle. This cooperative reward is spatially localized, meaning NCA $i$ only benefits from agent $(i+1)$'s aliveness in cells where NCA $i$ is alive or to which it is directly adjacent.

Modified Loss

Our hypercycle-augmented loss function becomes:

$$\mathcal{L}_i=-\log \left(\sum_{x, y} A_i(x, y)+\gamma \cdot \sum_{x, y} V_i(x, y) \cdot A_{(i+1) \bmod N}(x, y)\right)$$

where $\gamma$ weights the cooperative term and $V_i(x,y)$ is $1$ if NCA $i$ is alive at $(x,y)$ or adjacent to it, and $0$ otherwise. This ensures NCA $i$ only receives cooperative reward for NCA $(i+1)$'s presence within its local perceptual field.
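A NumPy sketch of this loss for a single agent is given below. It assumes "adjacent" means the radius-1 Moore neighborhood with zero padding at the border, adds a small `eps` inside the logarithm, and uses an illustrative `gamma=0.5`; none of these values or names come from the experiments.

```python
import numpy as np

def vicinity(alive: np.ndarray) -> np.ndarray:
    """V_i: 1 where the agent is alive or Moore-adjacent (radius 1), else 0."""
    occupied = alive > 0
    padded = np.pad(occupied, 1, mode="constant", constant_values=False)
    w, h = occupied.shape
    mask = np.zeros_like(occupied)
    for dx in range(3):
        for dy in range(3):
            mask |= padded[dx:dx + w, dy:dy + h]
    return mask.astype(alive.dtype)

def hypercycle_loss(aliveness: np.ndarray, i: int, gamma: float = 0.5, eps: float = 1e-8) -> float:
    """Loss for agent i given an (N, W, H) stack of aliveness maps.

    Agent i is rewarded for its own total aliveness plus gamma times the
    aliveness of agent (i + 1) mod N inside agent i's vicinity.
    """
    n = aliveness.shape[0]
    successor = (i + 1) % n
    own = aliveness[i].sum()
    coop = (vicinity(aliveness[i]) * aliveness[successor]).sum()
    return float(-np.log(own + gamma * coop + eps))

aliveness = np.zeros((3, 5, 5))
aliveness[0, 2, 2] = 1.0   # agent 0 occupies the centre
aliveness[1, 2, 3] = 1.0   # agent 1 next to agent 0: counts towards agent 0's reward
aliveness[1, 0, 0] = 1.0   # agent 1 far from agent 0: ignored for agent 0
loss_0 = hypercycle_loss(aliveness, 0)   # -log(1 + 0.5 * 1)
```

Setting `gamma=0` recovers the purely competitive objective, which makes the cooperative term easy to ablate.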

Hypercycle loss

From 3 to 6 NCA, we test whether we can encourage hypercycles of longer lengths.

Despite the modified loss function, we observe that full-length hypercycles rarely stabilize. Instead, we see $N$-cycles collapse to shorter 2-3 NCA cycles, parasitic behavior, and defection cascades. In our $N = 6$ case on the far right of the figure, agent $i_{\text{yellow}}$ receives reward for $i_{\text{green}}$'s expansion, yet $i_{\text{green}}$ aggressively invades $i_{\text{yellow}}$'s territory. We also see a truncated $2$-cycle formed between yellow and blue where blue defends against the parasitic green agent.

This suggests that stable hypercycles of length $N > 3$ might require either additional hyperparameter search or mechanisms beyond localized cooperative rewards, which we have not yet found.


Main Extensions

We invite the community to explore PD-NCA, for which we provide an open-source implementation. We envision three main extensions to this work: integrating evolution, scaling up the simulations, and improving the complexity of the environment in which the PD-NCA are optimized.

Evolution. Our system is end-to-end differentiable, which allows us to use gradient descent to train our NCA. It is also possible to interlace evolution with learning. One way to approach this is to implement a conceptual 'cell division' mechanic, where we measure distinct NCA colonies of the same type, splitting these off such that each fragment becomes its own independent NCA (but with a shared lineage). Another approach is to interweave a neural network evolution strategy, such as NEAT, with gradient-based learning. Doing so would effectively lift the constraint on the number of alive NCA at any given time, thus enabling both gradient-based adaptation and evolutionary dynamics to work in tandem. Such a hybrid approach is designed to leverage the complementary strengths of gradient descent’s efficiency when training CNNs, and evolution’s capacity for open-ended exploration. To do so would require improving the scalability of PD-NCA.

Increasing scale. Our current simulations run on a single GPU. This system is amenable to parallelization across a larger number of GPUs. While we have seen more interesting behavior emerge as we scale up the number of NCA on a 2D grid, this also scales the computational complexity. Scaling these simulations up requires an engineering effort, but we hope to see the complexity of PD-NCA increase as we scale.

Environment improvements. The current PD-NCA simulation runs on a homogeneous 2D grid. For future work we hope to explore richer environments by adding walls to create niches, bringing various compatible simulations together after some pre-training stage, or driving the background environment vector to implement daily, or even seasonal, cycles.

Minor Extensions

Cooperation. We are able to qualitatively identify that NCA are cooperating through shared space and coupled growth, but we do not yet have a way to measure such cooperation. Quantifying cooperation would enable driving the hyperparameter search towards cooperation. What is the underlying principle that allows a purely competitive PD-NCA simulation to transition into cooperation? And what does that tell us about life itself? What conditions allow for cooperation to arise and how can you ensure its stability? These are all questions we, or others, can unpack in future work.

Compression metrics. Is complexity directly related to the amount of information an NCA is able to access on the grid? So far we have used simple measurements of Shannon entropy, but it would be valuable to rethink and improve this strategy.

Pretraining. Standard NCA have an explicit training phase. While our experiments continually backprop throughout the entire simulation, it could be worth exploring the impact of an explicit pretraining phase. In our early experiments, where we had a more traditional train/test phase, we observed an overfitting-like situation: the simulation converged on a stable structure with only minor fluctuation.

Discretization and softmax temperature. When neural networks require discrete outputs, it is common to implement an annealing strategy. Our current simulation includes a threshold system and a standard softmax with a temperature set to 1. We aim to explore these factors, particularly the role of annealed discretization or low-temperature softmax, and how they impact learning dynamics and emergent complexity.


Related Works

Differentiable Artificial Life

One major strand of differentiable ALife focuses on morphogenetic processes using neural cellular automata (NCA) and related differentiable models. Mordvintsev et al. (2020) pioneered this approach by training an NCA via backpropagation to “grow” a target 2D pattern from a single seed cell. Following this foundational work, researchers have extended differentiable morphogenetic systems in several directions. For example, Sudhakaran et al. (2021) introduced a 3D differentiable cellular automata model by using 3D convolutional neural updates, enabling NCA to construct 3D Minecraft structures.

The success of these models has spurred a “differentiable morphogenesis” thread in the ALife community, suggesting new ways to study development, and even raising hypotheses that techniques from deep learning might help uncover principles of biological self-organization.

Multi-Agent Differentiable ALife Systems

A second important category of differentiable ALife involves multi-agent ecosystems, simulations with many interacting agents, in which some or all components are differentiable and learned. Traditional ALife worlds like Polyworld or Avida demonstrated that rich behaviors (predation, cooperation, etc.) can emerge in populations of evolving agents, but those systems relied on evolutionary algorithms or fixed rules. In contrast, differentiable multi-agent ALife systems incorporate gradient-based learning or differentiable models to drive adaptation within or between agents, often aiming for open-ended complexity and co-evolution in a continuous state space.

One conceptual blueprint in this area is the work of Gregor and Besse (2021) on self-organizing intelligent matter. They proposed an artificial life framework with no explicit agent boundary at the start: the world consists of many atomic elements (like particles or cells), each governed by neural network rules, and all interactions are local and physics-like. Through an evolutionary process, these distributed elements can self-assemble into persistent multi-element organisms that compete and evolve. While largely a theoretical roadmap, this work set the stage for thinking about differentiable ALife as matter that organizes itself under selection, combining neural computation with evolutionary dynamics at the ecosystem level.

Building on such ideas, researchers have implemented practical multi-agent ALife platforms that leverage differentiable models. A notable example is Biomaker CA by Randazzo and Mordvintsev (2023), which is a differentiable biome simulator inspired by plant ecosystems. Biomaker CA uses evolutionary algorithms to drive open-ended adaptation, but it is implemented in a fully differentiable manner in JAX to allow efficient simulation and potential gradient-based fine-tuning. A key insight from their work is the importance of morphogenesis for evolving complexity; unlike many earlier digital ecosystems, where agents’ forms were simple or fixed, Biomaker CA makes developmental plasticity a first-class feature. Indeed, they hypothesize that incorporating a rich developmental process is essential for achieving unbounded evolutionary innovation in ALife, a hypothesis their platform is designed to test.

Another line of work brings multi-agent reinforcement learning ideas into ALife. JaxLife is an example of a large-scale open-ended agent simulation implemented in a differentiable framework. In JaxLife, a population of embodied agents (each controlled by a deep neural network policy) evolves under natural selection in an environment that includes programmable tools and resources. The simulation is written end-to-end in JAX, allowing it to run on accelerators and, in principle, to propagate gradients, though evolution rather than backpropagation is used for the outer loop of adaptation.

Evolution and Learning

In earlier ALife and evolutionary robotics, researchers often combined evolution with learning rules (like Hebbian plasticity) to enable lifetime adaptation. The genome would encode parameters of a local update rule (e.g., how synapses change with activity), rather than fixed weights, so that an individual agent can adjust its neural connections during its lifetime. These approaches show that evolving plasticity rules can confer adaptability: agents can learn from experience within one generation, guided by simple local rules. However, the lifetime learning in these cases is typically constrained to those pre-evolved Hebbian rules; the agent’s adaptation is local and unsupervised, without an explicit error feedback to drive learning in a particular direction.

In contrast, our PD-NCA demonstrates a different approach: NCA continually learn via gradient descent during their lifetime. Here the entire simulation is made differentiable, enabling backpropagation of an objective through the agents’ interactions. Thus, instead of using fixed Hebbian updates, each agent’s internal parameters are continuously adjusted by online gradient-based optimization as the simulation runs. This means the agents receive direct feedback on their performance and can tune themselves in a goal-directed manner, blending developmental learning with real-time optimization.

Potential advantages of the PD-NCA approach are more rapid and targeted within-lifetime learning and a capacity for open-ended adaptation. Agents can swiftly adjust to new conditions or even engage in coevolutionary dynamics within one generation. Ultimately, combining evolutionary search with continuous online learning (via backpropagation) could yield ALife systems that are both highly adaptive in the short term and open-ended in the long term, going beyond the limits of classical evolving-but-static or Hebbian-plastic agents.


Conclusion

Summary. We take the first step in seeking to understand whether end-to-end differentiable ALife systems can show signs of open-endedness. In our preliminary experiments, we see signs of cooperation, chemical waves, and the emergence of higher-order structure.

If you would like to discuss any issues or give feedback, please visit the GitHub repository of this page for more information.

Vision Icon by artist Kylie Whittaker on Noun Project.  Microorganism icon by artist Kylie Whittaker.

Citation

For attribution in academic contexts, please cite this work as

Ivy Zhang and Sebastian Risi and Luke Darlow, "Petri Dish NCA", 2025.

BibTeX citation

@article{zhang2025pdnca,
  title = {Petri Dish NCA},
  author = {Ivy Zhang and Sebastian Risi and Luke Darlow},
  year = {2025},
  url = {https://pub.sakana.ai/pdnca}
}

Open Source Code

We release our code for this project here.