
To illustrate cell-based architecture, we will pretend that the world is on the brink of a zombie apocalypse, and we are responsible for figuring out how to keep an entire city from becoming zombies!
Our mission is to design a city that can keep its citizens safe from the zombie threat.
This is the same goal we have when designing large-scale distributed systems.

The simple approach to keeping citizens safe from zombies is to create one big isolated city with strong defenses (walls, guards, etc.) to keep zombies out.
But this design has flaws.
If critical services (like hospitals and grocery stores) exist only in specific areas of the city, and those areas become compromised, the entire city is at risk. One failure can cascade across the entire city.
A large protected city is just not zombie-resistant. We must isolate failures so that a single one cannot take down the whole thing, which is exactly how cell-based architecture protects our platforms.

Rather than creating one big isolated city, we can create several isolated neighborhoods, each with its own critical infrastructure and defenses.
Citizens will be distributed across these neighborhoods, ensuring that a single neighborhood isn’t overwhelmed, and reducing the impact of any potential zombie encounters.
In this design, citizens can live, work, and play within their neighborhood without needing to leave. But we need to ensure that everything citizens need (groceries, medical care, etc.) is inside the neighborhood so they never have to travel.
Leaving the neighborhood is dangerous. Inside the neighborhood, everything is close by, readily available, and zombie-free.

If supplies for these critical services run out, citizens will be forced to leave the neighborhood to find what they need. Leaving is risky, and we don’t want them to do it.
So we will have a group of trained professionals responsible for moving supplies to the neighborhoods.
Ensuring supplies are delivered before they run out is an essential element of this design.


As new citizens arrive, they need to be placed into neighborhoods, and we don’t want to overcrowd any single neighborhood.
So, like the supply routes, we’ll need to set up a special group of trained professionals who can safely distribute citizens.
This transportation group can monitor the neighborhood’s services; if any are down, it can transport citizens to another neighborhood.
Traveling with this specialized group isn’t risk-free, but their expertise keeps the danger in check.

The core of our design is isolation and self-sufficiency.
Each neighborhood is isolated from the others and has everything it needs to operate independently.
By breaking down the city into smaller, manageable neighborhoods, we reduce the impact of any single failure.
Failures are measured in neighborhoods lost, not cities.

With the city’s success, more and more citizens will want to live in these safe neighborhoods, which leads to another benefit of this design.

As more capacity is needed, we have a working blueprint: we can create new neighborhoods, establish their supply and transportation routes, and direct new citizens there without disrupting existing infrastructure. Existing neighborhoods remain untouched; only new supply and transportation routes are needed.

Our zombie-resistant city design works well because each neighborhood is isolated and self-sufficient, a failure stays contained to the neighborhood where it happens, and capacity grows by adding neighborhoods rather than reworking existing ones.
While any fan of zombie movies knows that no design is foolproof, this approach maximizes safety, scalability, and resilience: the exact same qualities we want in mission-critical distributed systems.

Our city design followed a cell-based architecture. Each zombie-resistant neighborhood was a cell, and the schools, stores, and hospitals within a neighborhood were our microservices, databases, and caches.


The goal of the neighborhood was to provide everything citizens needed so that they never had to leave its boundaries.
The idea behind cells is the same: each cell can fully service a request within the cell, eliminating calls that leave the cell.
As in our zombie example, crossing cell boundaries is the highest-risk operation. Firewalls, unreliable networks, and failing services can drop a request at any time.
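
To make the "everything stays in the cell" idea concrete, here is a minimal sketch of a cell that answers requests using only its own database and cache. The `Cell` class, the `get_profile` method, and the sample user data are all hypothetical names invented for illustration, not part of any real framework.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    """Hypothetical cell that owns its own service logic, database, and cache."""

    name: str
    database: dict = field(default_factory=dict)  # cell-local data store
    cache: dict = field(default_factory=dict)     # cell-local cache

    def get_profile(self, user_id: str) -> str:
        # Every lookup stays inside the cell; nothing here calls another cell.
        if user_id in self.cache:
            return self.cache[user_id]
        profile = self.database.get(user_id)
        if profile is None:
            return "unknown user"
        self.cache[user_id] = profile  # warm the local cache for next time
        return profile


cell_a = Cell("cell-a", database={"u1": "Alice"})
cell_b = Cell("cell-b", database={"u2": "Bob"})

print(cell_a.get_profile("u1"))  # served entirely by cell-a
print(cell_b.get_profile("u1"))  # cell-b does not reach into cell-a for the answer
```

Notice that `cell_b` simply doesn't know about `u1`. In this sketch, getting a user to the right cell is someone else's job, not the cell's.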
Like our neighborhoods, cells need supplies.
However, a cell’s supply route doesn’t provide food or medical supplies; it supplies the data used during processing.

Data replication might occur between cells, from a central hub, or both. The goal is to ensure that neighborhoods and cells have what they need before they need it.
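
As a rough sketch of the "central hub" flavor of replication, the example below pushes each update into every cell's local store ahead of time, so a cell reads from its own copy. The `Hub` class, its `register` and `publish` methods, and the `price:apple` key are assumptions made up for this illustration; a real system would replicate over a network and deal with ordering, retries, and lag.

```python
class Hub:
    """Hypothetical central hub that pushes data to every registered cell store."""

    def __init__(self) -> None:
        self.cell_stores = []  # one plain dict per cell stands in for its local store

    def register(self, store: dict) -> None:
        self.cell_stores.append(store)

    def publish(self, key: str, value: str) -> None:
        # Push the update to each cell ahead of time so reads never leave the cell.
        for store in self.cell_stores:
            store[key] = value


cell_a_store: dict = {}
cell_b_store: dict = {}

hub = Hub()
hub.register(cell_a_store)
hub.register(cell_b_store)
hub.publish("price:apple", "0.50")

# Each cell now answers from its own copy of the data.
local_price = cell_a_store["price:apple"]
```

The supplies arrive before anyone needs them, which is the whole point: at request time, no cell has to go looking outside its walls.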
The transportation group is the infamous “cell router.”

A cell router is responsible for detecting failure within a cell and rerouting requests to other cells.
While leaving a cell is always dangerous, there are ways to mitigate those dangers (e.g., retries, circuit breakers, priority load balancing). The cell router is where you place an extra emphasis on these techniques. Cells should be simple; cell routers take on complexity.
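
Here is one way a cell router could look, as a simplified sketch rather than a production design. Every name in it (`Cell`, `CellRouter`, `CellUnavailable`, `failure_threshold`) is hypothetical. It assumes each user has a "home" cell chosen by a stable hash, retries the request on the next cell when the home cell fails, and uses a very basic circuit breaker that stops sending traffic to a cell after repeated failures. For brevity the circuit never closes again; a real router would probe the cell after a cool-down.

```python
class CellUnavailable(Exception):
    """Raised when a cell (or every cell) cannot serve a request."""


class Cell:
    def __init__(self, name: str, healthy: bool = True) -> None:
        self.name = name
        self.healthy = healthy

    def handle(self, request: str) -> str:
        if not self.healthy:
            raise CellUnavailable(self.name)
        return f"{self.name} handled {request}"


class CellRouter:
    def __init__(self, cells: list, failure_threshold: int = 2) -> None:
        self.cells = cells
        self.failure_threshold = failure_threshold
        self.failures = {cell.name: 0 for cell in cells}

    def _home_index(self, user_id: str) -> int:
        # Stable assignment: the same user always maps to the same home cell.
        return sum(user_id.encode()) % len(self.cells)

    def route(self, user_id: str, request: str) -> str:
        start = self._home_index(user_id)
        # Try the home cell first, then fall back to the other cells in order.
        for offset in range(len(self.cells)):
            cell = self.cells[(start + offset) % len(self.cells)]
            if self.failures[cell.name] >= self.failure_threshold:
                continue  # circuit is open: skip this cell without calling it
            try:
                result = cell.handle(request)
            except CellUnavailable:
                self.failures[cell.name] += 1
                continue  # retry the request on the next cell
            self.failures[cell.name] = 0  # a success resets the failure count
            return result
        raise CellUnavailable("no healthy cell available")


cells = [Cell("cell-a"), Cell("cell-b")]
router = CellRouter(cells)
home = cells[router._home_index("u1")]
first = router.route("u1", "GET /cart")   # served by the user's home cell

home.healthy = False                       # the home cell is overrun
second = router.route("u1", "GET /cart")  # rerouted to the other cell
```

All of the retry and circuit-breaking logic lives in the router. The cells themselves stay simple: they either handle the request or they don't.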

Don’t build one big system with shared dependencies and single points of failure.
Build isolated cells that can fully serve requests on their own. This limits blast radius, keeps traffic local, and allows the system to scale safely.