Benjamin Hindman argues that the data center needs an operating system, and I agree. Benjamin is one of the creators of Apache Mesos and the chief architect at Mesosphere, so I was curious about his thoughts.
Large-scale, distributed systems: Benjamin’s starting point is that modern applications no longer fit on a single server. Instead, they run as large-scale distributed systems on frameworks such as Apache Hadoop and Apache Spark, message brokers like Apache Kafka, and key-value stores like Apache Cassandra.
The right unit of abstraction: As a consequence, applications, not servers, should be the unit of abstraction in the data center. When machines are the only available level of abstraction, developers have to deal with IP addresses and local storage, which makes moving and resizing applications difficult.
Labor-intensive and inefficient: Operators who deploy applications must anticipate machine loss and often contain complexity by deploying one application per machine, which is clearly inefficient. As a result, data centers run at only 8-15% efficiency, and running applications is too labor-intensive. This problem will become more pronounced as companies replace monolithic architectures with service-oriented architectures and build more software based on microservices.
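To make the inefficiency concrete, here is a rough sketch with made-up numbers. The application demands, the 16-core machine size, and the function names are my own assumptions for illustration, not figures or code from Benjamin’s post. It compares one application per machine with a simple first-fit packing of applications onto shared machines, and it ignores memory, isolation, and failure headroom.

```python
def utilization_one_per_machine(demands, capacity):
    """Average utilization when every application gets its own machine."""
    return sum(demands) / (len(demands) * capacity)


def first_fit(demands, capacity):
    """Pack applications onto shared machines with the first-fit heuristic.

    Returns the remaining free capacity of each machine that was used.
    """
    free = []  # free capacity per machine, in cores
    for demand in demands:
        if demand > capacity:
            raise ValueError("application does not fit on any machine")
        for i, room in enumerate(free):
            if demand <= room:
                free[i] = room - demand
                break
        else:
            # No existing machine has room, so start a new one.
            free.append(capacity - demand)
    return free


def utilization_packed(demands, capacity):
    """Average utilization when applications share machines."""
    machines = first_fit(demands, capacity)
    return sum(demands) / (len(machines) * capacity)


# Hypothetical CPU demands (in cores) for 15 applications on 16-core machines.
demands = [2, 1, 2, 2, 1, 2, 1, 2, 2, 1, 2, 1, 2, 2, 1]
print(f"{utilization_one_per_machine(demands, 16):.0%}")  # 10%
print(f"{utilization_packed(demands, 16):.0%}")           # 75%
```

With these assumed numbers, sharing machines needs 2 servers instead of 15. Real schedulers have to weigh more than CPU, so the gap in practice depends on the workload.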
Data center operating system: Applications should run on any available resources from any machine, even if there are other applications already running on those machines. The data center operating system should allocate applications to machines, providing resource management and process isolation. An API for the data center would allocate and de-allocate resources; launch, monitor, and destroy processes; and support service discovery and coordination.
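As a thought experiment, the resource and process parts of such an API could look roughly like the toy below. Everything in it is hypothetical: `ToyDataCenter`, `Machine`, `Task`, and the method names are my own inventions, not the Mesos API, and nothing is actually executed. It only tracks which machine a task would land on and what resources it holds.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Machine:
    name: str
    free_cpus: int
    free_mem_mb: int


@dataclass
class Task:
    task_id: int
    command: str
    cpus: int
    mem_mb: int
    machine: str
    state: str = "RUNNING"


class ToyDataCenter:
    """Hypothetical in-memory stand-in for a data center API."""

    def __init__(self, machines):
        self._machines = {m.name: m for m in machines}
        self._tasks = {}
        self._ids = count(1)

    def launch(self, command, cpus, mem_mb):
        """Allocate resources on the first machine with room and record the task."""
        for m in self._machines.values():
            if m.free_cpus >= cpus and m.free_mem_mb >= mem_mb:
                m.free_cpus -= cpus
                m.free_mem_mb -= mem_mb
                task = Task(next(self._ids), command, cpus, mem_mb, m.name)
                self._tasks[task.task_id] = task
                return task
        raise RuntimeError("no machine has enough free resources")

    def status(self, task_id):
        """Monitor a task by reporting its current state."""
        return self._tasks[task_id].state

    def destroy(self, task_id):
        """Stop a task and hand its resources back to its machine."""
        task = self._tasks[task_id]
        if task.state == "RUNNING":
            m = self._machines[task.machine]
            m.free_cpus += task.cpus
            m.free_mem_mb += task.mem_mb
            task.state = "KILLED"


dc = ToyDataCenter([Machine("node-1", 4, 8192), Machine("node-2", 8, 16384)])
web = dc.launch("./web-server", cpus=2, mem_mb=2048)
batch = dc.launch("./batch-job", cpus=4, mem_mb=4096)
print(web.machine, batch.machine)  # node-1 node-2
dc.destroy(web.task_id)
print(dc.status(web.task_id))      # KILLED
```

The point of the sketch is that the caller never names a machine: it asks for resources and a command, and placement is the system’s job.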
Service discovery and coordination: Most distributed applications achieve high availability and fault tolerance through some means of coordination, such as consensus, which is notoriously hard to implement correctly and efficiently. Existing tools for service discovery and coordination include Apache ZooKeeper and CoreOS’ etcd. It would be attractive to offer these services centrally as part of a data center operating system instead. Developers would then launch applications via a CLI or GUI, and the applications would execute using the data center operating system’s API.
Apache Mesos, the distributed systems kernel: The open source Apache Mesos project, of which Benjamin Hindman is a co-creator and the project chair, is a step in that direction. Apache Mesos aims to be a distributed systems kernel that provides a portable API upon which distributed applications can be built and run. Distributed systems that can leverage Mesos include Apache Spark, Apache Aurora, Airbnb’s Chronos, Mesosphere’s Marathon, Apache Hadoop, Apache Storm, and Google’s Kubernetes.
For these reasons, Apache Mesos is popular in the industry. For example, Chronos, a distributed system that provides highly available and fault-tolerant cron, was built on top of Mesos in a few thousand lines of code, without explicit socket programming for network communication. Twitter and Airbnb use Mesos to run their data centers, and many others use it as well.
For more details, see Benjamin Hindman’s post.