While neural cellular automata (NCA) have proven effective for modeling morphogenesis and self-organizing processes, they are typically governed by a fixed, non-adaptive update rule shared across all cells. Each cell applies the same learned local transition function throughout its lifetime, resulting in static developmental dynamics once training is complete. We introduce Petri Dish Neural Cellular Automata (PD-NCA), a differentiable Artificial Life substrate that removes this constraint by allowing multiple, independent NCA agents to coexist, compete, and adapt within a shared environment. Unlike conventional NCA, each agent in PD-NCA continually updates its parameters via gradient descent during the simulation itself, enabling within-lifetime learning and open-ended behavioral change. This continual, multi-agent learning process transforms morphogenesis from a fixed developmental program into a dynamic ecosystem of interacting, adaptive entities. Through these interactions, PD-NCA exhibits emergent behaviors such as cyclic dynamics, cooperation, and persistent complexity growth, providing a promising new framework for studying open-endedness in differentiable systems.
ALife is a broad and multidisciplinary domain concerned with understanding the systems, computation, processes, evolution, and simulation of life. It spans many orders of magnitude of simulation, from computational symbiogenesis at the molecular scale up to ecosystems of many interacting agents.
We propose Petri Dish Neural Cellular Automata (PD-NCA) as an ALife simulation, wherein multiple NCA agents compete with the singular goal of self-replication. PD-NCA differs substantially from the standard NCA setup: instead of a single, fixed model operating on a grid with immutable parameters, PD-NCA introduces a population of distinct, continuously learning NCA, each maintaining its own neural parameters and adapting through ongoing gradient-based optimization during the simulation. These agents share a common spatial substrate, a "petri dish", where they interact through competitive and cooperative dynamics mediated by differentiable attack and defense channels. In contrast to conventional NCA, where morphogenesis unfolds deterministically according to pre-trained rules, PD-NCA's learning-in-the-loop design enables open-ended adaptation and emergent complexity within a single differentiable simulation.
Our simulation operates on a discrete spatial grid $\mathcal{G} \in \mathbb{R}^{H \times W \times C}$, where $H$ and $W$ denote the spatial dimensions and $C$ represents channel dimensionality. At any position $(x, y)$ and time $t$, the grid state is characterized by a feature vector:
$$s_{x,y}^{t} = \big[\, a_{x,y}^{t};\; d_{x,y}^{t};\; h_{x,y}^{t} \,\big],$$
where $a_{x,y}^{t} \in \mathbb{R}^{C_a}$ denotes attack channels, $d_{x,y}^{t} \in \mathbb{R}^{C_d}$ denotes defense channels, and $h_{x,y}^{t} \in \mathbb{R}^{C_h}$ denotes hidden state information, with $C = C_a + C_d + C_h$. At each timestep, the simulation proceeds through four phases: processing, competition, normalization, and state update.
Each NCA agent $i$ is parameterized by a convolutional function $f_{\theta_i}$ that produces local state updates. For a given position $(x, y)$, the agent observes the neighborhood $\mathcal{N}(x, y)$ and generates update proposals:
$$\Delta s_{x,y}^{(i)} = f_{\theta_i}\big(s_{\mathcal{N}(x,y)}^{t}\big),$$
where $\mathcal{N}(x, y)$ represents the Moore neighborhood of radius $r$. An NCA's ability to propose updates is gated by an aliveness mask $\alpha^{(i)}$, where $\alpha_{x,y}^{(i)}$ indicates agent $i$'s aliveness at position $(x, y)$. Each NCA can only propose updates to cells where it is currently alive or cells adjacent to its living territory.
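To make the mechanics concrete, here is a minimal sketch of a single agent's update step in PyTorch. The channel count, hidden width, layer layout, and radius-1 (3 × 3) perception window are illustrative assumptions rather than the exact architecture of our experiments; only the dilated aliveness gating follows the description above.

```python
# Hypothetical sketch of one PD-NCA agent's update proposal.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NCAAgent(nn.Module):
    def __init__(self, channels=16, hidden=64):
        super().__init__()
        # A 3x3 convolution observes a radius-1 Moore neighborhood.
        self.perceive = nn.Conv2d(channels, hidden, kernel_size=3, padding=1)
        self.update = nn.Conv2d(hidden, channels, kernel_size=1)

    def forward(self, grid, alive):
        # grid: (1, C, H, W); alive: (1, 1, H, W) aliveness mask in [0, 1].
        delta = self.update(F.relu(self.perceive(grid)))
        # Dilating the mask lets the agent also propose updates in cells
        # adjacent to its living territory, as described above.
        reach = F.max_pool2d(alive, kernel_size=3, stride=1, padding=1)
        return delta * reach

agent = NCAAgent()
grid = torch.randn(1, 16, 32, 32)
alive = torch.zeros(1, 1, 32, 32)
alive[..., 16, 16] = 1.0          # a single seeded cell
proposal = agent(grid, alive)     # nonzero only at and around the seed
```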
To ensure consistent competitive dynamics in regions where only a single NCA is alive, we introduce a static environment tensor $E \in \mathbb{R}^{H \times W \times C}$ that acts as a constant background competitor. This tensor is initialized once at the beginning of the simulation with random noise and then normalized.
The environment participates in competition by contributing its own update proposal:
$$\Delta s_{x,y}^{(\mathrm{env})} = E_{x,y}.$$
This ensures that agents must maintain active attack and defense even in territories they control, preventing stagnation and encouraging continuous adaptation.
The resolution of competing update proposals follows a strength-based arbitration mechanism. For each ordered pair of entities $(i, j)$ (including the environment) proposing updates at position $(x, y)$, we define the pairwise interaction strength:
$$w_{x,y}^{(i,j)} = \cos\!\big(a_{x,y}^{(i)},\, d_{x,y}^{(j)}\big),$$
where $\cos(\cdot, \cdot)$ denotes cosine similarity, here between agent $i$'s proposed attack channels and agent $j$'s proposed defense channels. The total competitive strength for agent $i$ at position $(x, y)$ includes both agent-agent and agent-environment interactions:
$$S_{x,y}^{(i)} = \sum_{j \neq i} w_{x,y}^{(i,j)} + w_{x,y}^{(i,\mathrm{env})}.$$
The environment's competitive strength is similarly computed:
$$S_{x,y}^{(\mathrm{env})} = \sum_{i} w_{x,y}^{(\mathrm{env}, i)}.$$
To enforce resource constraints within the environment, we apply a softmax normalization that transforms agent and environment interaction strengths into contribution weights:
$$c_{x,y}^{(i)} = \frac{\exp\!\big(S_{x,y}^{(i)} / \tau\big)}{\sum_{k} \exp\!\big(S_{x,y}^{(k)} / \tau\big)},$$
where the sum runs over all entities (agents and the environment) and $\tau$ is a temperature parameter controlling competition sharpness. Contribution weights determine each proposal's relative contribution to the final state update.
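Putting the arbitration and normalization together, a minimal sketch might look as follows. The pairing of one entity's attack channels against another's defense channels is our reading of the scheme above; function and variable names are hypothetical.

```python
# Sketch of strength-based arbitration over N entities (agents + environment).
import torch
import torch.nn.functional as F

def contribution_weights(attack, defense, tau=0.1):
    # attack, defense: (N, Ck, H, W); assumes equal attack/defense widths.
    n = attack.shape[0]
    strength = torch.zeros(n, attack.shape[2], attack.shape[3])
    for i in range(n):
        for j in range(n):
            if i != j:
                # Per-cell cosine similarity along the channel dimension.
                strength[i] += F.cosine_similarity(attack[i], defense[j], dim=0)
    # Temperature-scaled softmax over entities yields contribution weights;
    # lower tau sharpens competition towards winner-take-all.
    return torch.softmax(strength / tau, dim=0)
```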
The final grid state is the weighted aggregation of all proposals:
$$s_{x,y}^{t+1} = \mathrm{clip}\Big(s_{x,y}^{t} + \sum_{i} c_{x,y}^{(i)}\, \Delta s_{x,y}^{(i)}\Big),$$
where the clipping operation ensures state boundedness; in our experiments, we clamp each state value to a fixed range. The aliveness distribution is updated to reflect the normalized competitive strengths:
$$\alpha_{x,y}^{(i)} = \begin{cases} c_{x,y}^{(i)} & \text{if } c_{x,y}^{(i)} \geq \epsilon, \\ 0 & \text{otherwise}, \end{cases}$$
where $\epsilon$ represents a minimum viability threshold. Aliveness from non-viable agents is redistributed among surviving agents.
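A sketch of the resulting state and aliveness update is below; the clamp bounds and the viability threshold `eps` are placeholder values, since the exact constants are not restated here.

```python
# Sketch of the weighted state update, clipping, and aliveness thresholding.
import torch

def step_state(grid, deltas, weights, eps=0.4, lo=-1.0, hi=1.0):
    # grid: (C, H, W); deltas: (N, C, H, W); weights: (N, H, W).
    update = (weights.unsqueeze(1) * deltas).sum(dim=0)
    grid = torch.clamp(grid + update, lo, hi)  # bounded states
    # Zero out non-viable entities, then renormalize per cell so their
    # aliveness is redistributed among the survivors.
    alive = torch.where(weights >= eps, weights, torch.zeros_like(weights))
    alive = alive / alive.sum(dim=0, keepdim=True).clamp(min=1e-8)
    return grid, alive
```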
In our experiments, we set the aliveness threshold so that up to two NCA can survive in each cell. At first this seems counterintuitive, as one would expect only one NCA per cell. However, our initial experiments demonstrated that higher threshold values resulted in uninteresting simulations, typically with individuals expanding until they met and remaining fixed thereafter. We were inspired by Mixture-of-Experts architectures, in which several experts jointly contribute to each output rather than a single winner taking all.
The visualizations are set up for maximum clarity, but the state updates themselves happen in continuous (albeit clipped) space, meaning that each cell's state during the simulation is usually a composition or superposition of multiple competing updates. In all videos we visualize the winning NCA per cell, selected via an argmax over aliveness; each color represents an individual NCA.
Each agent optimizes for territorial expansion by maximizing its total aliveness across the spatial domain. We formulate this as minimizing the negative log-aliveness:
$$\mathcal{L}_i = -\log \sum_{x,y} \alpha_{x,y}^{(i)}.$$
The logarithmic transformation ensures stable gradient flow across multiple orders of magnitude of aliveness during backpropagation. This formulation naturally induces emergent behaviors where agents must balance offensive expansion strategies (optimizing attack channels) with defensive territory maintenance (optimizing defense channels) against both other agents and the persistent environmental pressure, leading to complex multi-agent dynamics reminiscent of biological competition systems.
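The learning-in-the-loop objective can be sketched as follows. The optimizer choice and the per-step update cadence are illustrative; in practice the ratio of simulation steps with and without backpropagation is itself a hyperparameter, as discussed later.

```python
# Sketch of one within-lifetime learning step for agent i.
import torch

def lifetime_step(optimizer, alive_i, eps=1e-8):
    # alive_i: (H, W) aliveness map for this agent, differentiable through
    # the arbitration and state-update machinery above.
    loss = -torch.log(alive_i.sum() + eps)  # negative log-aliveness
    optimizer.zero_grad()
    loss.backward()   # backpropagate through the simulation steps
    optimizer.step()
    return loss.item()
```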
We ran many experiments to explore PD-NCA, described in the following sections.
One of our first findings when exploring PD-NCA was that scaling the grid size and the number of NCA consistently led to richer collective behavior. This suggests that an avenue of exploration must involve engineering PD-NCA to run on much larger grids, support more NCA, and potentially run on many GPUs simultaneously. To measure the notion of 'richness' or 'interesting behavior', we cannot rely solely on subjective assessment, as this precludes scaling up hyperparameter searches. To this end, we propose to measure the amount of information stored on the grid as a proxy for complexity. Since NCA model size is fixed, the complexity of the simulation's behavior is fully explained by the complexity of the input (the grid) and the learned parameters of the individual NCA.
In simulations with just one NCA, we see information collapse to zero as the NCA pushes each channel towards uniformity. With multiple NCA competing on the same grid, information increases over time. While bits per channel would be maximized by fully random channels, there is an optimization force against such an outcome: an NCA yielding pseudo-random updates can be trivially overwritten by an NCA with a strong attack vector.
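One simple way to compute this proxy is to discretize the bounded channel values and measure per-channel Shannon entropy; the bin count and value range below are arbitrary illustrative choices.

```python
# Sketch of a bits-per-channel information proxy for the grid.
import torch

def grid_entropy_bits(grid, bins=64, lo=-1.0, hi=1.0):
    # grid: (C, H, W) with bounded (clipped) values.
    entropies = []
    for c in range(grid.shape[0]):
        hist = torch.histc(grid[c], bins=bins, min=lo, max=hi)
        p = hist / hist.sum()
        p = p[p > 0]  # drop empty bins before taking logs
        entropies.append(-(p * torch.log2(p)).sum())
    return torch.stack(entropies).mean()  # average bits per channel
```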
Results from grid sizes of 16 × 16 to 196 × 196.
Here we show the difference in behavior as we scale grids from 16 × 16 to 196 × 196. Qualitatively, we see that cycles only emerge at the largest grid size, which also supports the survival of more NCA.
While entropy has an initial spike because each NCA's first location is seeded with random noise, this smooths out over time so that the grid only retains information that is useful for NCA computation. As grid size increases, we see an increase in the amount of information stored on the grid.
The videos below explore whether learning has a notable impact on the PD-NCA simulation. Without learning, the system eventually settles into a steady state with only minor fluctuations. With learning, however, we often observe interesting cyclic behavior and progression through various 'states of interaction'. These demonstrations suggest that a sufficient number of NCA, a large enough grid, and within-lifetime learning are all necessary for the complex simulations that PD-NCA can yield.
Can interesting dynamics emerge without backpropagation or learning? Left simulation has no backprop; right simulation is a normal run with backpropagation.
The PD-NCA framework is, in essence, a new and fully differentiable ALife substrate. The goal of this experiment was to search its hyperparameter space for signs of open-endedness.
We searched the space of model configurations up to 15 layers and 128 channels (≈500K parameters), with up to 15 NCA on grids up to 256 × 256 in size. We also included hyperparameters that control the ratio of steps with and without backpropagation, as we found this to impact the simulation outcome. To guide this search we leveraged both a classical video-compression score and a neural score (i.e., the open-endedness score from ASAL, described below).
We use video compressibility as a proxy for complexity to search for interesting behavior. Even though these are all high-scoring simulations, they explore different parts of the parameter space.
For our classical approach, we followed prior work on compression-based complexity measures: each simulation is rendered to video, and its compressibility serves as the score.
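As a rough illustration of the idea, one can score rendered frames by their compressed size; we use `zlib` here purely as a stand-in, since the exact codec and scoring details are not restated in this section.

```python
# Sketch of a compression-based complexity score over rendered frames.
import zlib
import numpy as np

def compression_score(frames):
    # frames: list of (H, W, 3) uint8 arrays rendered from the simulation.
    raw = np.stack(frames).tobytes()
    compressed = zlib.compress(raw, level=9)
    return len(compressed) / len(raw)  # higher = less compressible
```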
Consider the videos above. The top left, bottom middle, and bottom right simulations all run for many steps without learning followed by a few steps with learning. This is reflected in their behavior: they exhibit stable 'rock-paper-scissors' dynamics before backpropagation changes the balance. The top middle simulation shows how stable cycles can emerge and compete, hinting towards symbiogenesis.
The other method we used for scoring simulations was the 'open-endedness' score from ASAL.
Given a simulation trajectory of $T$ steps, we extract frames at regular intervals. Each frame $I_t$ is encoded through the VLM to produce a high-dimensional embedding:
$$e_t = \mathrm{VLM}(I_t) \in \mathbb{R}^{D},$$
where $D$ is the embedding dimensionality. For each timestep $t$, we compute the novelty score from the maximum cosine similarity to all previously observed embeddings:
$$n_t = 1 - \max_{t' < t} \cos\!\big(e_t,\, e_{t'}\big).$$
The overall open-endedness score for the simulation is then computed as the temporal average of the novelty scores.
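A minimal sketch of this score is below. It assumes an `embed(frame)` function returning a unit-norm VLM embedding (e.g., a CLIP-style image encoder), which is not reproduced here.

```python
# Sketch of the ASAL-style open-endedness score over sampled frames.
import torch

def open_endedness_score(frames, embed):
    history, novelty = [], []
    for frame in frames:
        e = embed(frame)  # (D,), assumed unit-normalized
        if history:
            sim = (torch.stack(history) @ e).max()  # max cosine similarity
            novelty.append(1.0 - sim.item())        # far from history = novel
        history.append(e)
    return sum(novelty) / max(len(novelty), 1)      # temporal average
```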
For an alternative approach, we use ASAL's open-endedness score. We believe this aligns better with the human notion of open-endedness.
This formulation rewards simulations that continuously explore new regions of the VLM's embedding space rather than converging to static patterns. In contrast to video-compression metrics, which capture fine-grained pixel-level changes, the ASAL score measures semantic diversity, which biases towards larger structures that change over time. From the videos above, the bottom middle run shows a common attractor for the ASAL score, with two NCA forming large dynamic structures.
We see indications that PD-NCA as an ALife substrate can result in new types of dynamics that are amenable to hyperparameter search. In both the compression-based search and when using ASAL's open-endedness score, we see multiple instances of NCA forming larger structures and competing against one another. We also see simulations with oscillatory dynamics among three NCA. Even though our loss is purely competitive, this indicates that cooperation can emerge regardless, which we believe to be a profound insight. The two metrics also exhibit complementary biases: video compression tends to maintain higher NCA counts with rapid local changes, while ASAL can yield dynamic systems with as few as two NCA. For future work, we expect that balancing both approaches might yield even more interesting behaviors.
Since our substrate is fully differentiable, this also means that we are able to use other neural network models' gradients to direct the learning of parts of the simulation. For instance, in addition to using the open-endedness score after simulation, we could use it during simulation in order to match a text-based prompt (e.g., "chemical waves").
As a proof-of-concept, we measured whether it was possible to use a VLM to this end. How well could a VLM identify whether our substrate was exhibiting complex or cyclical behavior? While this supervised method worked well for the original ASAL paper, where the search was for dynamics which appear like some sequence of targets, our more abstract targets did not lead to notable results. Even when the simulation was visually interesting, subjectively matching the prompts, the cosine similarity in the VLM's embedding space was near zero.
We suspect this poor performance has two causes: (1) the VLM is not attuned to such abstract visual input, and (2) color sensitivity. The PD-NCA simulation is effectively out of distribution, meaning that fine-tuning the VLM might be necessary. Further, when visualizing, we map each NCA to a consistent, arbitrarily chosen color. Since colors matter to visual models, it might be necessary to run an inner search over color assignments to maximize the utility of the VLM.
Another challenge is related to temporally meaningful prompts, such as "cyclic behavior". One can only truly observe such behavior over time, but VLMs process static images. For future work we hope to explore whether a video model could provide a better learning signal for temporal concepts.
From our initial search for interesting dynamics, we often observed persistent cycles, where 2-3 NCA would compete against one another in larger groups, sometimes in a hierarchical, or nested, arrangement. Was it possible to explicitly encourage this cyclic behavior to get higher-order emergence of cycles of much longer lengths?
These dynamics bear a striking resemblance to hypercycles from theoretical chemistry: self-replicating molecular systems in which each species catalyzes the formation of the next in a closed cycle, enabling survival against parasites and increasing complexity.
To explicitly encourage hypercycle formation of length $k$, we modify the optimization objective to reward each NCA $i$ not only for its own territorial expansion but also for the success of the next NCA, $(i+1) \bmod k$, in the cycle. This cooperative reward is spatially localized, meaning NCA $i$ only benefits from agent $(i+1)$'s aliveness in cells where NCA $i$ is alive or directly adjacent. Our hypercycle-augmented loss function becomes:
$$\mathcal{L}_i = -\log \sum_{x,y} \alpha_{x,y}^{(i)} \;-\; \lambda \log \sum_{x,y} m_{x,y}^{(i)}\, \alpha_{x,y}^{(i+1)},$$
where $\lambda$ weights the cooperative term and $m_{x,y}^{(i)}$ is $1$ if NCA $i$ is alive at $(x, y)$ or adjacent to it, and $0$ otherwise. This ensures NCA $i$ only receives cooperative reward for NCA $(i+1)$'s presence within its local perceptual field.
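A sketch of this loss for agent $i$ is shown below; the cycle indexing and the mask dilation (alive or directly adjacent) follow the description above, while the weighting $\lambda$ is a placeholder value.

```python
# Sketch of the hypercycle-augmented loss for agent i in a k-cycle.
import torch
import torch.nn.functional as F

def hypercycle_loss(alive, i, k, lam=0.5, eps=1e-8):
    # alive: (k, H, W) aliveness maps for the k agents in the cycle.
    own = -torch.log(alive[i].sum() + eps)
    # m = 1 where agent i is alive or directly adjacent to its territory.
    m = F.max_pool2d((alive[i] > 0).float()[None, None], 3, 1, 1)[0, 0]
    partner = (i + 1) % k
    coop = -torch.log((m * alive[partner]).sum() + eps)
    return own + lam * coop
```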
From 3 to 6 NCA, we test whether we can encourage hypercycles of longer lengths.
Despite the modified loss function, we observe that full-length hypercycles rarely stabilize. Instead, we see $k$-cycles collapse into shorter cycles, parasitic behavior, and defection cascades. In the case on the far right of the figure, one agent receives reward for its cycle partner's expansion, yet aggressively invades that partner's territory. We also see a truncated cycle form between the yellow and blue agents, where blue defends against the parasitic green agent.
This suggests that stable hypercycles of full length $k$ might require either additional hyperparameter search or mechanisms beyond the localized cooperative rewards we have explored so far.
We invite the community to explore PD-NCA, for which we provide an open-source implementation. We envision three main extensions to this work: integrating evolution, scaling up the simulations, and improving the complexity of the environment in which the PD-NCA are optimized.
Evolution. Our system is end-to-end differentiable, which allows us to use gradient descent to train our NCA. It is also possible to interlace evolution with learning. One way to approach this is to implement a conceptual 'cell division' mechanic: we detect distinct colonies of the same NCA and split them off so that each fragment becomes its own independent NCA (but with a shared lineage). Another approach is to interweave a neural-network evolution strategy, such as NEAT, with the within-lifetime gradient descent.
Increasing scale. Our current simulations run on a single GPU. This system is amenable to parallelization across a larger number of GPUs. While we have seen more interesting behavior emerge as we scale up the number of NCA on a 2D grid, this also scales the computational complexity. Scaling these simulations up requires an engineering effort, but we hope to see the complexity of PD-NCA increase as we scale.
Environment improvements. The current PD-NCA simulation runs on a homogeneous 2D grid. For future work we hope to explore richer environments by adding walls to create niches, bringing various compatible simulations together after some pre-training stage, or driving the background environment vector to implement daily, or even seasonal, cycles.
Cooperation. We are able to qualitatively identify that NCA are cooperating through shared space and coupled growth, but we do not yet have a way to measure such cooperation. Quantifying cooperation would enable driving the hyperparameter search towards cooperation. What is the underlying principle that allows a purely competitive PD-NCA simulation to transition into cooperation? And what does that tell us about life itself? What conditions allow for cooperation to arise and how can you ensure its stability? These are all questions we, or others, can unpack in future work.
Compression metrics. Is complexity directly related to the amount of information an NCA is able to access on the grid? So far we have used simple measurements of Shannon entropy, but it would be valuable to rethink and improve this strategy.
Pretraining. Standard NCA have an explicit training phase. While our experiments continually backprop throughout the entire simulation, it could be worth exploring the impact of an explicit pretraining phase. In our early experiments, where we had a more traditional train/test phase, we observed an overfitting-like situation: the simulation converged on a stable structure with only minor fluctuation.
Discretization and softmax temperature. When neural networks require discrete outputs, it is common to implement an annealing strategy, gradually lowering the softmax temperature so that soft, differentiable mixtures approach discrete winner-take-all outcomes. Applying such a schedule to the competition softmax is a natural extension.
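A simple example of such a schedule, with illustrative constants, would gradually cool the competition softmax over the course of the simulation:

```python
# Sketch of an exponential annealing schedule for the softmax temperature.
def annealed_tau(step, tau0=1.0, tau_min=0.05, decay=0.999):
    return max(tau_min, tau0 * decay ** step)
```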
One major strand of differentiable ALife focuses on morphogenetic processes using neural cellular automata (NCA) and related differentiable models.
The success of these models has spurred a “differentiable morphogenesis” thread in the ALife community, suggesting new ways to study development, and even raising hypotheses that techniques from deep learning might help uncover principles of biological self-organization.
A second important category of differentiable ALife involves multi-agent ecosystems, simulations with many interacting agents, in which some or all components are differentiable and learned. Traditional ALife worlds pioneered such ecosystems, though with hand-coded or evolved rather than gradient-trained components.
One conceptual blueprint in this area is earlier work proposing end-to-end differentiable multi-agent worlds.
Building on such ideas, researchers have implemented practical multi-agent ALife platforms that leverage differentiable models. A notable example is Biomaker CA by Randazzo and Mordvintsev.
Another line of work brings multi-agent reinforcement learning ideas into ALife.
In earlier ALife and evolutionary robotics, researchers often combined evolution with learning rules (like Hebbian plasticity) to enable lifetime adaptation. The genome would encode parameters of a local update rule (e.g., how synapses change with activity), rather than fixed weights, so that an individual agent can adjust its neural connections during its lifetime.
In contrast, our PD-NCA demonstrates a different approach: NCA continually learn via gradient descent during their lifetime. Here the entire simulation is made differentiable, enabling backpropagation of an objective through the agents’ interactions. Thus, instead of using fixed Hebbian updates, each agent’s internal parameters are continuously adjusted by online gradient-based optimization as the simulation runs. This means the agents receive direct feedback on their performance and can tune themselves in a goal-directed manner, blending developmental learning with real-time optimization.
Potential advantages of the PD-NCA approach are more rapid and targeted within-lifetime learning and a capacity for open-ended adaptation. Agents can swiftly adjust to new conditions or even engage in coevolutionary dynamics within one generation. Ultimately, combining evolutionary search with continuous online learning (via backpropagation) could yield ALife systems that are both highly adaptive in the short term and open-ended in the long term, going beyond the limits of classical evolving-but-static or Hebbian-plastic agents.
Summary. We take the first step in seeking to understand whether end-to-end differentiable ALife systems can show signs of open-endedness. In our preliminary experiments, we see signs of cooperation, chemical waves, and the emergence of higher-order structure.
If you would like to discuss any issues or give feedback, please visit the GitHub repository of this page for more information.
For attribution in academic contexts, please cite this work as
Ivy Zhang and Sebastian Risi and Luke Darlow, "Petri Dish NCA", 2025.
BibTeX citation
@article{zhang2025pdnca,
title = {Petri Dish NCA},
author = {Ivy Zhang and Sebastian Risi and Luke Darlow},
year = {2025},
url = {https://pub.sakana.ai/pdnca}
}
We release our code for this project here.