...massive parallelism and fault tolerance to embedded systems, and
...predictable execution and resource consumption to cloud-scale systems
Modularity based on abstraction is the way things get done. -- Barbara Liskov
The ability to master complexity is not the same as the ability to extract simplicity. -- Scott Shenker (describing Don Norman)
Predictability means that all operations have a bounded, known execution time. Systems that control the physical world (cars, planes, ...) require predictable control. Composite focuses on full-system, end-to-end predictability, from the smallest embedded systems to massively multi-core and cloud systems.
Composite provides pervasive parallelism. FJOS is an OS implemented on Composite that provides a fork-join OpenMP environment that is predictable and significantly faster than Linux. FJOS enables fine-grained parallelism that scales across sockets up to 40 cores.
One of the most successful component-based systems is the UNIX command line. Similarly, Composite emphasizes the composition of all system functionality from fine-grained components. Abstracting resource management enables virtual environments with little overhead.
Computer systems are increasingly required to manage physical environments where a failure can compromise safety or lead to loss of life. Composite can recover from faults in low-level services such as the scheduler by using pervasive fault isolation, recovery based on µ-rebooting, and intelligent state reconstitution.
Composite is a research operating system out of the George Washington University. It is a clean-slate approach to system development given the new realities of the need for high-confidence, secure, parallel, and predictable systems. See the current contributors, and see the links in the menu for the mailing list, source, and publications.
For a quick, high-level overview of Composite, see the project report.
Composite is an Operating System (OS) with a focus on reliability, predictability, security, and flexibility/configurability. System policies, abstractions, and mechanisms are defined and implemented as user-level components. This leaves the kernel free of any hard-coded resource management policies or intelligence. For example, taking the idea of a minimal kernel further than µ-kernels, Composite moves even the scheduler and the physical memory tracking/mapping components out of the kernel. This produces a simple, stupid kernel, with configurable/replaceable components defining the important aspects of the system.
Components are hardware-isolated from each other and execute at user-level, which increases both system reliability and security. High-confidence, or "must-be-secure", systems can customize/minimize their execution environment, while advanced functionality can be added by simply composing together many components. UNIX pipes are one of the most successful methods for functionality composition in systems, and Composite empowers the composition of not only small text-processing processes, but also resource management and low-level system components. In a sense, you can think of it as pipes that can enable the composition of, for example, even high-performance web-servers. The component graph to the right is a component-based web-server.
Just as a crash cart is used to resuscitate patients having an unplanned emergency, C3 provides the ability for a system to recover from the dire emergency of one of its system-level components failing. C3 can survive faults in components such as the main system scheduler, the file-system, and the physical memory mapper. It does this without significantly changing the code of the component (it is mostly transparent), and with low overhead when no errors occur. When an error is detected, the system µ-reboots the offending component, and rebuilds its state using the interface-specific semantics of communication between components.
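The recovery idea can be illustrated with a small simulation. This is only a sketch under stated assumptions, not C3's implementation: the `server`, `stub`, and all function names are hypothetical, the "fault" is a flag set by hand, and the "interface semantics" are reduced to replaying `open` and `seek` calls from state that the client-side stub tracked.

```c
#include <assert.h>
#include <string.h>

#define MAX_DESC 8

/* Hypothetical server component: holds a per-descriptor offset. */
struct server {
    long offset[MAX_DESC];
    int  in_use[MAX_DESC];
    int  faulted;            /* simulated fault flag (assumption) */
};

/* Hypothetical client-side stub: remembers what it asked the server to do. */
struct stub {
    long offset[MAX_DESC];
    int  in_use[MAX_DESC];
};

/* Booting (or micro-rebooting) puts the server back into its initial state. */
static void server_boot(struct server *s) { memset(s, 0, sizeof *s); }

static int server_open(struct server *s, int d)
{
    if (s->faulted) return -1;
    s->in_use[d] = 1;
    s->offset[d] = 0;
    return 0;
}

static int server_seek(struct server *s, int d, long off)
{
    if (s->faulted || !s->in_use[d]) return -1;
    s->offset[d] = off;
    return 0;
}

static long server_tell(const struct server *s, int d)
{
    if (s->faulted || !s->in_use[d]) return -1;
    return s->offset[d];
}

/* Reboot the server, then rebuild its state by replaying interface calls. */
static void stub_recover(const struct stub *c, struct server *s)
{
    int d;
    server_boot(s);
    for (d = 0; d < MAX_DESC; d++) {
        if (!c->in_use[d]) continue;
        server_open(s, d);
        server_seek(s, d, c->offset[d]);
    }
}

static int stub_open(struct stub *c, struct server *s, int d)
{
    assert(d >= 0 && d < MAX_DESC);
    if (s->faulted) stub_recover(c, s);
    if (server_open(s, d) != 0) return -1;
    c->in_use[d] = 1;
    c->offset[d] = 0;
    return 0;
}

static int stub_seek(struct stub *c, struct server *s, int d, long off)
{
    assert(d >= 0 && d < MAX_DESC);
    if (s->faulted) stub_recover(c, s);
    if (server_seek(s, d, off) != 0) return -1;
    c->offset[d] = off;      /* track state needed for later recovery */
    return 0;
}

static long stub_tell(const struct stub *c, struct server *s, int d)
{
    assert(d >= 0 && d < MAX_DESC);
    if (s->faulted) stub_recover(c, s);
    return server_tell(s, d);
}
```

In this sketch the client code only ever calls the `stub_*` functions, so a fault between a `stub_seek` and a later `stub_tell` is invisible to it: the stub reboots the server and replays the tracked state before retrying the call.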
FJOS is implemented as a set of libraries and components that implement the run-time of the popular OpenMP programming environment for fork-join parallelism. OpenMP enables programs to harness the parallelism of modern multi-core processors by enabling different iterations of loops to execute on different cores. FJOS uses wait-free data-structures for shared-memory coordination, and low-level access to Inter-Processor Interrupts (IPIs), both decoupled from the management of parallelism for efficient, predictable execution on many cores. FJOS is up to a factor of 2 faster than Linux for spin-based implementations, and up to a factor of 3 faster for block-based implementations. The system offers significantly lower worst-case costs for basic fork-join operations, leading to control loops that can execute on up to 40 cores with an overhead of less than 10 microseconds.
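For readers unfamiliar with the fork-join model, the following is a minimal sketch of an OpenMP loop in standard C. It uses only the standard `parallel for` and `reduction` directives and nothing FJOS-specific; the function name `sum_of_squares` is purely illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Sum of squares of a[0..n-1].
 * Fork: the pragma asks the OpenMP run-time to split the loop iterations
 * across threads. Join: the reduction clause combines each thread's
 * partial sum into total once all iterations are done.
 * If the compiler has no OpenMP support enabled, the pragma is ignored
 * and the loop runs sequentially with the same result. */
static long sum_of_squares(const int *a, size_t n)
{
    long total = 0;
    long i;

    assert(a != NULL || n == 0);

    #pragma omp parallel for reduction(+:total)
    for (i = 0; i < (long)n; i++) {
        total += (long)a[i] * a[i];
    }
    return total;
}
```

With GCC or Clang, OpenMP is typically enabled with the `-fopenmp` flag. The cost of the implicit fork at the start of the loop and the join at its end is the overhead that matters for short control loops.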
CPU, memory, and I/O resources are managed hierarchically in Composite. This system for Hierarchical Resource management, called HiRes, enables individual subsystems to customize fundamental resource management policies efficiently and predictably while also isolating different subsystems from each other. A web-server might customize its scheduling and I/O management policies, while a database uses specialized memory management policies. Fine-grained resource control is maintained, while ensuring isolation between subsystems. As opposed to Exokernels that place all resource management in processes, and monolithic kernels that place management in the single kernel, HiRes enables components to define the appropriate hierarchy for the system's goals. Unlike virtual machines, HiRes enables the resource management hierarchy to be application and system specific, rather than prescribed by a hardware-driven design, and enables low-overhead management of resources even in child subsystems (e.g. VMs).
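One way to picture hierarchical management is as a tree in which each manager delegates part of its own budget to its children. The sketch below is an assumption-laden illustration, not the HiRes interface: `res_node`, `res_delegate`, and `res_revoke` are hypothetical names, and a single integer stands in for a resource such as memory pages.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node in a resource management hierarchy. */
struct res_node {
    struct res_node *parent;  /* NULL for the root manager */
    unsigned long    budget;  /* units held and not delegated further */
};

/* Move amount units from parent to child. Fails (returns -1) if the parent
 * does not hold that much, so a subsystem can never hand out more than it
 * was itself given. */
static int res_delegate(struct res_node *parent, struct res_node *child,
                        unsigned long amount)
{
    assert(parent != NULL && child != NULL);
    if (amount > parent->budget) return -1;
    parent->budget -= amount;
    child->budget  += amount;
    child->parent   = parent;
    return 0;
}

/* Return amount units from child to its parent. Fails if the child is the
 * root or does not currently hold that many undelegated units. */
static int res_revoke(struct res_node *child, unsigned long amount)
{
    assert(child != NULL);
    if (child->parent == NULL || amount > child->budget) return -1;
    child->budget         -= amount;
    child->parent->budget += amount;
    return 0;
}
```

Because every delegation is bounded by what the delegating node holds, a misbehaving subsystem can exhaust only its own subtree's budget, which is the isolation property the paragraph above describes.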
The kernel includes no policies; they are instead implemented in user-level components. Thus system policies, including those for low-level resource management, are redefinable and isolated. The Composite kernel does not have a scheduler. Scheduling policies and synchronization protocols are defined by user-level schedulers. They can be replaced and redefined for specific systems and applications -- a useful capability in real-time and embedded systems. Additionally, faults in other parts of the system, or in the scheduler itself, will not propagate to the rest of the system. To enable efficient and predictable component-based scheduling, special care is taken to ensure accurate accountability and low overhead for interrupt scheduling.
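To make "replaceable scheduling policy" concrete, here is a minimal sketch of a scheduler whose policy is a function that can be swapped. All names (`sched_thread`, `sched_pick_next`, and the two policies) are hypothetical and are not Composite's scheduler interface; the sketch only shows that the selection mechanism can stay fixed while the policy changes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical thread descriptor; not a Composite data structure. */
struct sched_thread {
    int           id;
    int           priority;  /* lower value = more urgent */
    unsigned long deadline;  /* absolute deadline, in ticks */
    int           runnable;  /* nonzero if ready to run */
};

/* A policy returns nonzero if thread a should run before thread b. */
typedef int (*sched_before_fn)(const struct sched_thread *a,
                               const struct sched_thread *b);

/* Fixed-priority policy. */
static int fixed_priority_before(const struct sched_thread *a,
                                 const struct sched_thread *b)
{
    return a->priority < b->priority;
}

/* Earliest-deadline-first policy. */
static int edf_before(const struct sched_thread *a,
                      const struct sched_thread *b)
{
    return a->deadline < b->deadline;
}

/* Mechanism: scan the runnable threads and return the id of the one the
 * policy ranks first, or -1 if no thread is runnable. */
static int sched_pick_next(const struct sched_thread *ts, size_t n,
                           sched_before_fn before)
{
    const struct sched_thread *best = NULL;
    size_t i;

    assert(before != NULL);
    for (i = 0; i < n; i++) {
        if (!ts[i].runnable) continue;
        if (best == NULL || before(&ts[i], best)) best = &ts[i];
    }
    return best ? best->id : -1;
}
```

Calling `sched_pick_next` with `fixed_priority_before` or with `edf_before` on the same thread set can select different threads, which is the kind of per-system customization a user-level scheduler allows.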
End-to-end execution in an embedded system starts when a sensor reading is taken, and continues through to the commands sent out to the actuators. The path of execution between these points is complex and often relies on communication between multiple domains and threads. It also requires the requisition of memory for processing and for passing arguments between domains. Even in a server-based application such as a web-server, the execution between packet reception and send is comparably complex. Composite uses thread migration, bounded-time shared buffer and execution context (stack) allocation, and unified priority-inheritance over all contended resources to provide predictable end-to-end communication.
We are investigating a form of memory in Composite called transient memory. This is dynamically allocated memory that has a bounded interval between allocation and deallocation. Transient memory enables memory to be dynamically added to and removed from specific applications and scheduled throughout the system, while meeting application end-to-end constraints (i.e., deadlines). By intelligently shuffling memory around the system, we show that -- without swapping -- you can significantly increase the amount of usable system memory even in embedded systems.
To maintain high isolation, invocations between components require IPC, which causes some overhead. The web-server makes between 50 and 70 IPCs to serve a webpage. Composite supports Mutable Protection Domains (MPD), which enable the system to dynamically construct and remove protection domain boundaries between components in response to where communication (IPC) overheads exist. Using MPD with the web-server, we increase performance by 40% while removing only 15% of the protection boundaries. MPD enables the system to trade off fault isolation for performance.
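The intuition behind removing only a few boundaries can be sketched with a simple greedy heuristic: drop the boundaries that carry the most invocations first. This is an illustrative assumption, not the policy Composite uses; `boundary` and `mpd_merge_hottest` are hypothetical names, and the invocation counts are supplied by the caller rather than measured.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record for one protection boundary between two components. */
struct boundary {
    unsigned long invocations;  /* IPCs observed crossing this boundary */
    int           removed;      /* nonzero once the boundary is removed */
};

/* Greedily remove up to max_removed boundaries, hottest first.
 * Returns the number of invocations that no longer cross a boundary. */
static unsigned long mpd_merge_hottest(struct boundary *b, size_t n,
                                       size_t max_removed)
{
    unsigned long saved = 0;
    size_t k, i;

    assert(b != NULL || n == 0);
    for (k = 0; k < max_removed; k++) {
        size_t best = n;  /* n means "no candidate found yet" */
        for (i = 0; i < n; i++) {
            if (b[i].removed) continue;
            if (best == n || b[i].invocations > b[best].invocations) best = i;
        }
        if (best == n) break;  /* every boundary is already removed */
        b[best].removed = 1;
        saved += b[best].invocations;
    }
    return saved;
}
```

When invocation counts are skewed toward a few boundaries, removing a small fraction of them eliminates most of the IPC cost while the remaining boundaries keep their fault isolation.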
Composite was used in the verifiable election based on Scantegrity for Takoma Park, MD. It provided a secure webpage for verifying ballots after the election. The system is based on the premise that the policies, abstractions, and mechanisms of the system, if tailored for the very specific task of providing a secure bulletin board (BB) system, can be streamlined to produce a very specialized, secure, and reliable system. This is particularly important for verifiable voting systems that make the strong assumption that the BB is secure.
We'd like to sincerely thank our sponsors. The Composite Component-Based OS development effort has been supported by grants from the National Science Foundation (NSF) under awards CNS 1137973, CNS 1149675, and CNS 1117243.