Metastability and Distributed Systems

one year ago · brooker.co.za · kiyanwang
There's no more time-honored way to get things working again, from toasters to global-scale distributed systems, than turning them off and on again. Metastable failures occur in open systems with an uncontrolled source of load, where a trigger causes the system to enter a bad state that persists even after the trigger is removed. That's a good list, but it only scratches the surface of the possible causes of these 'down but up' states in distributed systems.

This is the start of a line of thinking that treats large-scale distributed systems as control systems, which lets us apply the mathematical techniques of control theory and dynamical systems theory. It's interesting to think about what would be different in the way we teach CS, and in the way we design and build systems, if we had instead chosen the mathematics of control systems and dynamical systems as the foundation.

Overall, Metastable Failures in Distributed Systems is an important part of a conversation that doesn't get nearly the attention it deserves in the academic or industrial literature. Part of the thinking in that talk came from my own experience, and from discussions of the topic in books like Designing Distributed Control Systems.
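To make the definition of a metastable failure concrete, here is a minimal sketch of one commonly discussed mechanism: a retry storm. It is a hypothetical toy model, not taken from the paper or the post. It assumes a single FIFO server with a fixed per-tick capacity, a steady open-loop load below that capacity, clients that retry exactly once when the estimated queuing delay exceeds a timeout, and illustrative parameter values. The function `simulate` and all of its parameters are made up for this illustration.

```python
from typing import List, Optional


def simulate(
    ticks: int,
    capacity: int,
    base_load: int,
    spike_load: int,
    spike_start: int,
    spike_end: int,
    timeout: int,
    restart_at: Optional[int] = None,
) -> List[int]:
    """Return the backlog (queued requests) at the end of each tick.

    Toy discrete-time fluid model; every parameter is hypothetical.
    """
    backlog = 0
    history: List[int] = []
    for t in range(ticks):
        if restart_at is not None and t == restart_at:
            backlog = 0  # "off and on again": all queued work is dropped
        arrivals = base_load  # steady open-loop load, below capacity
        if spike_start <= t < spike_end:
            arrivals += spike_load  # the trigger: a temporary load spike
        # Estimated queuing delay is backlog / capacity ticks; compare in integers.
        if backlog > timeout * capacity:
            # Clients time out and resend this tick's requests once. The
            # abandoned copies stay queued, so the server does the work twice.
            arrivals += base_load
        backlog += arrivals
        backlog -= min(backlog, capacity)  # serve up to `capacity` per tick
        history.append(backlog)
    return history


if __name__ == "__main__":
    params = dict(ticks=30, capacity=10, base_load=7, spike_load=10, timeout=2)
    short_spike = simulate(spike_start=5, spike_end=7, **params)
    long_spike = simulate(spike_start=5, spike_end=10, **params)
    restarted = simulate(spike_start=5, spike_end=10, restart_at=20, **params)
    print("short spike, final backlog:", short_spike[-1])  # 0: recovers
    print("long spike, final backlog:", long_spike[-1])  # 129: still growing
    print("long spike + restart, final backlog:", restarted[-1])  # 0
```

With these numbers the toy model is bistable, which is the dynamical-systems view of the same idea: below the retry threshold the backlog drains by `capacity - base_load` = 3 requests per tick, and above it the backlog grows by `2 * base_load - capacity` = 4 requests per tick. A two-tick spike drains away on its own. A five-tick spike pushes the backlog past the threshold, and it keeps growing long after the spike ends, because the retries now sustain the overload. Only clearing the backlog (the restart) returns the system to its good state.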