Fallacies of distributed computing - RICH LAST
I've always loved how enterprise application platforms such as Spring, Java EE, and, to some extent, PeopleSoft simplify many aspects of application development. The key focus of enterprise development is building the business logic, and it's great when your platform hides the complexities of dealing with things like concurrency and thread safety.
Spring does that through component scopes and dependency injection. Java EE handles it through Enterprise JavaBeans, which can manage concurrency and isolation for the developer, among other things. PeopleSoft tries to accommodate it by bridging WebLogic with Tuxedo (which stands for Transactions for Unix, Extended for Distributed Operations, an AT&T technology from the 80s).
As we move into distributed space, beyond a single server or a cluster, we lose a lot of the convenience of a monolithic enterprise platform. What used to be taken for granted now needs to be taken care of explicitly. The fallacies of distributed computing summarize these extra concerns very well. They were formulated by L Peter Deutsch and colleagues at Sun Microsystems in the 1990s, with James Gosling credited for adding the eighth. An easy way to remember them is the RICH LAST acronym:
- R - the network is Reliable - as we rely on the network for service communication, it's important to consider lost packets and missed messages. Idempotent requests are easy to retry, but what about commands that change state? Can we retry them automatically? Can we use the Circuit Breaker pattern to help deal with catastrophic failures?
- I - bandwidth is Infinite - as more and more calls are made between services, the network slows down, impacting latency and reliability. It becomes increasingly important to manage the size of the messages and return just the right amount of data (avoiding stamp coupling). Would GraphQL or the CQRS pattern be a good solution?
- C - transport Cost is zero - we often assume that calls come for free, but try amortizing the cost of the hardware, networking, serialization, etc.! It quickly adds up, even with such monsters as AWS.
- H - the network is Homogeneous - a network is usually made up of a range of devices from different manufacturers that may not always work well together. This can affect solution reliability and performance.
- L - Latency is zero - local calls are orders of magnitude faster than distributed calls (see codinghorrors.com on that). What happens when a call that completes in 10ms in dev starts taking 500ms in prod? It's important to track not only average latency but also the 95th-99th percentiles, as tail latency can make or break a distributed system.
- A - there is only one Administrator - there are always multiple people in charge of an enterprise solution, so coordination efforts increase significantly compared to monolithic apps.
- S - the network is Secure - can we throw in a VPN and consider it secure? Or do we need to secure each endpoint in a layered approach? Regardless of the choice, we still add communication overhead and slow things down.
- T - the Topology never changes - network architecture constantly changes, and it's mission-critical to monitor those changes and estimate the impact on the solution performance. Do we need to run a Chaos Monkey?
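
To make the questions under R a bit more concrete, here is a minimal Python sketch of retrying an idempotent call with exponential backoff, combined with a very simple circuit breaker. Everything in it (the CircuitBreaker class, retry_idempotent, the thresholds and timeouts) is a hypothetical illustration and not the API of any particular library; on the Java side you would more likely reach for an existing resilience library than roll your own.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker refuses to call the remote service."""


class CircuitBreaker:
    """Hypothetical minimal breaker: opens after N consecutive failures."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout  # seconds to stay open
        self.clock = clock                  # injectable for testing
        self.failures = 0
        self.opened_at = None               # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Fail fast instead of hammering a service that is down.
                raise CircuitOpenError("circuit open, failing fast")
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result


def retry_idempotent(func, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Retry an idempotent call, doubling the delay after each failure."""
    for attempt in range(attempts):
        try:
            return func()
        except CircuitOpenError:
            raise  # no point retrying while the breaker is open
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts, surface the last error
            sleep(base_delay * 2 ** attempt)
```

A caller would wrap the remote call in both, for example `retry_idempotent(lambda: breaker.call(fetch_order, order_id))`, where `fetch_order` stands in for whatever safe-to-repeat request you are making. Commands that change state should not go through `retry_idempotent` unless the receiving service can deduplicate them.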
To learn more about the fallacies of distributed computing and architecture in general, take a look at Fundamentals of Software Architecture by Mark Richards and Neal Ford.