Cell-based Architecture Explained with Zombies

What do cell-based architectures and zombie apocalypses have in common? Everything.

by Benjamin Cane

The Scenario

A City Under Siege by Zombies

To illustrate cell-based architecture, we will pretend that the world is on the brink of a zombie apocalypse, and we are responsible for figuring out how to keep an entire city from becoming zombies!

Our mission is to design a city that can keep its citizens safe from the zombie threat.

Our Goals

  • Reduce potential exposure to zombies.
  • Limit the impact when zombie encounters occur.
  • Keep everyday citizens from dealing with zombies directly.
  • Create an approach that can scale as more citizens arrive.
  • Ensure the city is prosperous and operates smoothly.

These are the same goals we have when designing large-scale distributed systems.

One Big City

The Problem with a Single Isolated City

The simple approach to keeping citizens safe from zombies is to create one big isolated city with strong defenses (walls, guards, etc.) to keep zombies out.

But this design has flaws.

If critical services (like hospitals and grocery stores) exist only in specific areas of the city, and those areas become compromised, the entire city could be at risk. One failure can cascade across the whole city.

A large protected city is just not zombie-resistant. We must isolate failures so that a single one can’t take down the whole city (just as cell-based architecture protects our platforms).

The Solution

Isolated Neighborhoods

Rather than creating one big isolated city, we can create several isolated neighborhoods, each with its own critical infrastructure and defenses.

Citizens will be distributed across these neighborhoods, ensuring that a single neighborhood isn’t overwhelmed, and reducing the impact of any potential zombie encounters.

In this design, citizens can live, work, and play within their neighborhood without needing to leave. But to avoid travel, we need to ensure that everything citizens need (groceries, medical care, etc.) is available within the neighborhood.

Leaving the neighborhood is dangerous. Inside the neighborhood, everything is close by, readily available, and zombie-free.

Supply Routes

Keeping Neighborhoods Stocked

If supplies for these critical services run out, citizens will be forced to leave the neighborhood to find what they need. Leaving is risky; we don’t want them to leave.

So, to keep them inside, we will have a group of trained professionals responsible for moving supplies to the neighborhoods.

Ensuring supplies are delivered before they run out is an essential element of this design.

Safe and Resilient Routing

Controlled Travel

As new citizens arrive, they need to be placed into neighborhoods, and we don’t want to overcrowd any single neighborhood.

So, like the supply routes, we’ll need to set up a special group of trained professionals who can safely distribute citizens.

This transportation group can monitor each neighborhood’s services; if any are down, it can transport citizens to another neighborhood.

Traveling via this unique group isn’t risk-free, but their specialization keeps the danger in check.

Resilience Through Isolation

The core of our design is isolation and self-sufficiency. Each neighborhood is isolated from the others and has everything it needs to operate independently. By breaking down the city into smaller, manageable neighborhoods, we reduce the impact of any single failure. Failures are measured in neighborhoods lost, not cities.

Scaling Our City

With the city’s success, more and more citizens will want to live in these safe neighborhoods, which leads to another benefit of this design.

As more capacity is needed, we have a working blueprint: we can build new neighborhoods, establish their supply and transportation routes, and direct new citizens there. Existing neighborhoods remain untouched; only the new supply and transportation routes need to be added.
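In system terms, placing new citizens and adding neighborhoods maps to assigning customers to cells. Here is a minimal Python sketch, assuming a hypothetical `PlacementRegistry` that pins each customer to a home cell and sends newcomers to the least-loaded cell with room. The names and the capacity rule are illustrative assumptions, not a prescribed design.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PlacementRegistry:
    """Hypothetical registry that pins each customer to exactly one home cell."""

    capacity_per_cell: int
    cells: dict[str, set[str]] = field(default_factory=dict)  # cell id -> customers
    placement: dict[str, str] = field(default_factory=dict)   # customer -> cell id

    def add_cell(self, cell_id: str) -> None:
        # Scaling out: a new cell starts empty and existing placements are untouched.
        self.cells.setdefault(cell_id, set())

    def place(self, customer: str) -> str:
        # Customers who already have a home cell always stay there.
        if customer in self.placement:
            return self.placement[customer]
        # Newcomers go to the least-loaded cell that still has room.
        open_cells = [cid for cid, members in self.cells.items()
                      if len(members) < self.capacity_per_cell]
        if not open_cells:
            raise RuntimeError("every cell is full; add a new cell")
        cell_id = min(open_cells, key=lambda cid: len(self.cells[cid]))
        self.cells[cell_id].add(customer)
        self.placement[customer] = cell_id
        return cell_id


registry = PlacementRegistry(capacity_per_cell=2)
registry.add_cell("cell-a")
registry.place("alice")   # -> "cell-a"
registry.place("bob")     # -> "cell-a" (cell-a is now full)
registry.add_cell("cell-b")
registry.place("carol")   # -> "cell-b"
registry.place("alice")   # -> still "cell-a"
```

Because the registry never rebalances existing customers, adding `cell-b` only changes where newcomers land, which mirrors "existing neighborhoods remain untouched."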

Why Does This Design Work?

Our zombie-resistant city design works well because:

  • When failures occur (and they will), they are contained within a single neighborhood, preventing widespread disruption.
  • Each neighborhood is self-sufficient, reducing the need for citizens to leave.
  • The design can easily scale by adding new neighborhoods as needed.
  • Supply routes ensure neighborhoods remain stocked with essentials.
  • Transportation routes allow safe travel between neighborhoods when necessary.

While any fan of zombie movies knows that no design is foolproof, this approach maximizes safety, scalability, and resilience, the exact same qualities we want in mission-critical distributed systems.

Act 2:

Cell-based Architecture Explained

From Zombies to Distributed Systems

Our city design followed a cell-based architecture. Each zombie-resistant neighborhood was a cell, and the schools, stores, and hospitals within a neighborhood were our microservices, databases, and caches.

Keep Traffic Local

The goal of the neighborhood was to provide everything citizens needed and avoid citizens leaving the boundaries.

The idea behind cells is that each cell can fully service a request on its own, without making calls outside the cell.

This matters because, as in our zombie example, crossing cell boundaries is the highest-risk operation. Firewalls, unreliable networks, and failing services can drop a request at any time.
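To make "keep traffic local" concrete, here is a minimal Python sketch of a hypothetical `Cell` whose cache and database both live inside the cell, so serving a request never crosses the cell boundary. The in-memory dicts stand in for real cell-local services and are assumptions for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cell:
    """Hypothetical cell: every dependency a request needs lives inside it."""

    cell_id: str
    database: dict[str, dict] = field(default_factory=dict)  # cell-local data store
    cache: dict[str, dict] = field(default_factory=dict)     # cell-local cache

    def handle(self, account_id: str) -> dict:
        # 1. Check the cell-local cache first.
        if account_id in self.cache:
            return {"cell": self.cell_id, "source": "cache", **self.cache[account_id]}
        # 2. Fall back to the cell-local database. Nothing here calls another cell.
        record = self.database.get(account_id)
        if record is None:
            # Missing data means the cell was not supplied in advance; it is
            # not a reason to reach across the cell boundary mid-request.
            raise KeyError(f"{account_id} not found in {self.cell_id}")
        self.cache[account_id] = record
        return {"cell": self.cell_id, "source": "database", **record}


cell = Cell("cell-a", database={"acct-1": {"balance": 100}})
cell.handle("acct-1")  # -> {"cell": "cell-a", "source": "database", "balance": 100}
cell.handle("acct-1")  # -> {"cell": "cell-a", "source": "cache", "balance": 100}
```

The deliberate choice here is to fail with `KeyError` rather than fetch from another cell: a missing record is treated as a supply problem to fix ahead of time, not something to paper over with a cross-cell call.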

Data Population

Like our neighborhoods, cells need supplies. However, a cell’s supply route doesn’t provide food or medical supplies; it supplies the data used during processing.

Data replication might occur between cells, from a central hub, or both. The goal is to ensure that neighborhoods and cells have what they need before they need it.
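A minimal Python sketch of the "central hub" variant of data population, assuming hypothetical `CentralHub` and `CellStore` classes: the hub bumps a version on every change and pushes a full snapshot to each registered cell, and each cell ignores stale or duplicate deliveries. Real systems often ship deltas or replicate cell-to-cell instead; full snapshots just keep the sketch short.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CellStore:
    """Hypothetical cell-local copy of replicated data."""

    version: int = 0
    data: dict[str, str] = field(default_factory=dict)

    def apply(self, version: int, snapshot: dict[str, str]) -> bool:
        # Ignore stale or duplicate deliveries so replays are harmless.
        if version <= self.version:
            return False
        self.data = dict(snapshot)  # copy, so the hub's dict is never shared
        self.version = version
        return True


@dataclass
class CentralHub:
    """Hypothetical hub that pushes versioned snapshots to every registered cell."""

    version: int = 0
    data: dict[str, str] = field(default_factory=dict)
    cells: dict[str, CellStore] = field(default_factory=dict)

    def update(self, key: str, value: str) -> None:
        self.data[key] = value
        self.version += 1
        self.publish()

    def publish(self) -> None:
        # Push proactively so cells have the data before a request needs it.
        for store in self.cells.values():
            store.apply(self.version, self.data)


hub = CentralHub()
hub.cells["cell-a"] = CellStore()
hub.cells["cell-b"] = CellStore()
hub.update("config:max-retries", "3")  # both cells now hold version 1
```

A newly added cell starts empty, so a call to `publish()` after registering it brings it up to date before it takes traffic.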

Traffic Controls

The transportation group is the infamous “cell router.”

A cell router is responsible for detecting failures within a cell and rerouting requests to other cells.
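Here is a minimal Python sketch of that routing decision, assuming a hypothetical `CellRouter` that knows each customer's home cell and the latest health-check result for every cell. Requests stay in the home cell while it is healthy and move to another healthy cell when it is not.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CellRouter:
    """Hypothetical cell router: home cell first, another healthy cell on failure."""

    home_cell: dict[str, str]                               # customer -> home cell id
    healthy: dict[str, bool] = field(default_factory=dict)  # cell id -> last health check

    def report_health(self, cell_id: str, is_healthy: bool) -> None:
        # Assumed to be fed by periodic health checks against each cell.
        self.healthy[cell_id] = is_healthy

    def route(self, customer: str) -> str:
        home = self.home_cell[customer]
        if self.healthy.get(home, False):
            return home  # normal case: stay in the home cell
        # Home cell is failing: reroute to the first healthy cell (sorted for determinism).
        for cell_id in sorted(self.healthy):
            if self.healthy[cell_id]:
                return cell_id
        raise RuntimeError("no healthy cells available")


router = CellRouter(home_cell={"alice": "cell-a"})
router.report_health("cell-a", True)
router.report_health("cell-b", True)
router.route("alice")  # -> "cell-a"
router.report_health("cell-a", False)
router.route("alice")  # -> "cell-b"
```

Rerouting only helps if the fallback cell already has the customer's data, which is one more reason the supply routes matter.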

While leaving a cell is always dangerous, there are ways to mitigate those dangers (e.g., retries, circuit breakers, priority load balancing). The cell router is where you place extra emphasis on these techniques. Cells should be simple; cell routers take on the complexity.
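Two of those router-side techniques, retries and circuit breakers, can be sketched in Python as follows. The `CircuitBreaker` class, `call_with_retries` helper, and thresholds are hypothetical; priority load balancing is not shown. The breaker opens after a run of consecutive failures so the router stops hammering a sick cell and can fail over instead.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CircuitBreaker:
    """Minimal circuit breaker: opens after N consecutive failures (thresholds are hypothetical)."""

    failure_threshold: int = 3
    reset_after_s: float = 30.0
    failures: int = 0
    opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True  # closed: traffic flows normally
        # Open: block calls until the cool-down passes, then let a trial call through.
        return time.monotonic() - self.opened_at >= self.reset_after_s

    def record(self, ok: bool) -> None:
        if ok:
            self.failures, self.opened_at = 0, None  # success closes the breaker
        else:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()  # (re)open the breaker


def call_with_retries(send: Callable[[], str], breaker: CircuitBreaker,
                      attempts: int = 3, base_delay_s: float = 0.05) -> str:
    """Call a cell via `send`, retrying with exponential backoff behind a breaker."""
    for attempt in range(attempts):
        if not breaker.allow():
            raise RuntimeError("circuit open; fail over to another cell")
        try:
            result = send()
        except ConnectionError:
            breaker.record(ok=False)
            if attempt < attempts - 1:
                time.sleep(base_delay_s * 2 ** attempt)  # no jitter, for brevity
            continue
        breaker.record(ok=True)
        return result
    raise RuntimeError("all retries failed")
```

Keeping this logic in the router, rather than in every service inside every cell, is what lets the cells themselves stay simple.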

TL;DR

Don’t build one big system with shared dependencies and single points of failure.

Build isolated cells that can fully serve requests on their own. This limits blast radius, keeps traffic local, and allows the system to scale safely.

EOF