Microsoft unveils Azure Cosmos DB, its globally-distributed, multi-model database service

Microsoft announced Wednesday the general availability of Azure Cosmos DB, its globally distributed data service that lets users elastically scale throughput and storage across any number of geographical regions while guaranteeing low latency, high availability and consistency, all backed by comprehensive SLAs. Azure Cosmos DB is built to power today’s IoT and mobile apps, and tomorrow’s AI-hungry future.

The cloud database natively supports a multitude of data models and query APIs. It is built on a novel database engine capable of ingesting sustained volumes of data and providing blazing-fast queries, without users having to deal with schema or index management. As the first cloud database to offer five well-defined consistency models, it lets users choose just the right one for their app. To create these five consistency levels, and to build many of the capabilities within Azure Cosmos DB, Microsoft integrated distributed systems and database research with world-class engineering rigor.

Azure Cosmos DB contains a write-optimized, resource-governed, schema-agnostic database engine that natively supports multiple data models: key-value, documents, graphs, and columnar. It also supports many APIs for accessing data, including MongoDB, DocumentDB SQL, Gremlin (preview), and Azure Tables (preview), in an extensible manner.

Azure Cosmos DB started in late 2010 to address developer pain points faced by large-scale applications inside Microsoft. Since building globally distributed applications is not a problem unique to Microsoft, the service was made available externally to all Azure developers in the form of Azure DocumentDB. Azure Cosmos DB is the next big leap in the evolution of DocumentDB, and it is now available to users.

Azure Cosmos DB started as “Project Florence” in 2010 to address the developer pain points faced by large-scale applications inside Microsoft. In 2015 Microsoft made the first generation of this technology available to Azure developers in the form of Azure DocumentDB. Since that time, the company has added new features and introduced significant new capabilities. Azure Cosmos DB is the result. It is the next big leap in globally distributed, at-scale cloud databases. As a part of this release of Azure Cosmos DB, DocumentDB customers, with their data, are automatically Azure Cosmos DB customers. The transition is seamless and they now have access to the new breakthrough system and capabilities offered by Azure Cosmos DB.

With Azure Cosmos DB, Microsoft wants to enable developers to build cosmos-scale apps more easily.

Azure Cosmos DB containers are distributed along two dimensions — within a given region, all resources are horizontally partitioned using resource partitions (local distribution); and each resource partition is also replicated across geographical regions (global distribution).
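The two dimensions described above can be pictured with a small Python sketch. It is only an illustration: the region names, the partition count, and the use of a SHA-256 hash to pick a resource partition are assumptions made for the example, not details of how Cosmos DB actually places data.

```python
import hashlib

# Hypothetical values, chosen only for illustration.
REGIONS = ["West US", "North Europe", "Southeast Asia"]
PARTITIONS_PER_REGION = 4


def resource_partition(partition_key: str,
                       partition_count: int = PARTITIONS_PER_REGION) -> int:
    """Local distribution: map a key to one resource partition with a stable hash."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def placements(partition_key: str, regions=REGIONS):
    """Global distribution: the key's resource partition is replicated in every region."""
    partition = resource_partition(partition_key)
    return [(region, partition) for region in regions]


if __name__ == "__main__":
    for region, partition in placements("device-42"):
        print(f"{region}: resource partition {partition}")
```

In this model a key always lands in the same resource partition, and that partition has one replica per region, which is the "horizontal within a region, replicated across regions" layout the paragraph describes.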

As a globally distributed database service, Azure Cosmos DB provides turnkey global distribution, making both the service and its data instantly available to users. It also offers multiple data models and APIs for accessing and querying data: support for key-value, document, graph, and columnar data models; extensible APIs for Node.js, Java, .NET, .NET Core, Python, and MongoDB; and SQL and Gremlin for queries.

The new offering can elastically scale throughput and storage on demand, worldwide. Throughput can be changed at any time, at second and minute granularities, while storage scales transparently and automatically to cover size requirements now and in the future. It also lets users build highly responsive, mission-critical applications that access data with single-digit millisecond latencies at the 99th percentile, anywhere in the world.

Azure Cosmos DB natively supports multiple data models including documents, key-value, graph, and column-family. The core content-model of Cosmos DB’s database engine is based on atom-record-sequence (ARS). Atoms consist of a small set of primitive types like string, bool, and number. Records are structs composed of these types. Sequences are arrays consisting of atoms, records or sequences. The database engine can efficiently translate and project different data models onto the ARS-based data model. The core data model of Cosmos DB is natively accessible from dynamically typed programming languages and can be exposed as-is as JSON.
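As a rough illustration of the atom-record-sequence idea, the Python sketch below models atoms as strings, booleans and numbers, records as dictionaries, and sequences as lists, then projects a key-value pair and a graph edge onto that shape. The helper names and field names (`id`, `value`, `source`, `label`, `target`) are hypothetical, and allowing records to hold nested values is an assumption of the sketch; the engine's real encoding is not described in this article.

```python
import json


def is_ars(value) -> bool:
    """Return True if value is built only from atoms, records and sequences."""
    if isinstance(value, (str, bool, int, float)):   # atom
        return True
    if isinstance(value, dict):                      # record (struct)
        return all(isinstance(k, str) and is_ars(v) for k, v in value.items())
    if isinstance(value, list):                      # sequence (array)
        return all(is_ars(v) for v in value)
    return False


def key_value_to_ars(key: str, value):
    """Project a key-value pair onto a record (hypothetical field names)."""
    return {"id": key, "value": value}


def graph_edge_to_ars(source: str, label: str, target: str):
    """Project a graph edge onto a record (hypothetical field names)."""
    return {"source": source, "label": label, "target": target}


if __name__ == "__main__":
    items = [
        key_value_to_ars("sensor-7", 21.5),
        graph_edge_to_ars("alice", "knows", "bob"),
    ]
    assert is_ars(items)
    # Because the model uses only atoms, records and sequences, it serializes as-is to JSON.
    print(json.dumps(items, indent=2))
```

The point of the sketch is that once every data model is reduced to these three building blocks, exposing it as JSON requires no extra translation step.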

The service also supports popular database APIs for data access and querying. Cosmos DB’s database engine currently supports DocumentDB SQL, MongoDB, Azure Tables, and Gremlin. Users can continue to build applications using popular OSS APIs and get all the benefits of a battle-tested and fully managed, globally distributed database service.

As part of its SLAs, Cosmos DB guarantees end-to-end low latency at the 99th percentile to its customers. For a typical 1-KB item, Cosmos DB guarantees end-to-end latency of reads under 10 ms and indexed writes under 15 ms at the 99th percentile, within the same Azure region. The median latencies are significantly lower (under 5 ms). With an upper bound of request processing on every database transaction, Cosmos DB allows clients to clearly distinguish between transactions with high latency vs. a database being unavailable.
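To make the percentile language concrete, here is a minimal Python sketch that computes a 99th-percentile latency from a list of measurements and compares it with the 10 ms read and 15 ms write figures quoted above. The nearest-rank percentile method and the sample data are assumptions chosen for the example; the article does not say how the SLA is actually measured.

```python
import math

# Thresholds quoted in the article for a typical 1-KB item within one region.
READ_P99_LIMIT_MS = 10.0
WRITE_P99_LIMIT_MS = 15.0


def percentile(samples, pct: int) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_latency_target(samples, limit_ms: float) -> bool:
    """True if the 99th-percentile latency is under the given limit."""
    return percentile(samples, 99) < limit_ms


if __name__ == "__main__":
    # Made-up measurements in milliseconds, for illustration only.
    reads = [3.0] * 98 + [8.0, 40.0]
    writes = [5.0] * 95 + [20.0] * 5
    print("read p99 within target:", meets_latency_target(reads, READ_P99_LIMIT_MS))
    print("write p99 within target:", meets_latency_target(writes, WRITE_P99_LIMIT_MS))
```

Note that the made-up read sample still meets the target even though one request took 40 ms: a 99th-percentile bound tolerates a single outlier in a hundred, which is why it is a stricter statement than a median but looser than a hard maximum.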

Commercial distributed databases fall into two categories: databases that do not offer well-defined, provable consistency choices at all, and databases which offer two extreme programmability choices (strong vs. eventual consistency). The former burden application developers with the minutiae of their replication protocols and expect them to make difficult tradeoffs between consistency, availability, latency, and throughput. The latter pressure developers to choose one of the two extremes.

Despite the abundance of research and proposals for more than 50 consistency models, the distributed database community has not been able to commercialize consistency levels beyond strong and eventual consistency. Cosmos DB allows users to choose between five defined consistency models along the consistency spectrum – strong, bounded staleness, session, consistent prefix, and eventual.
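One way to build intuition for the five levels is a toy model in Python. The sketch assumes writes are numbered 1, 2, 3 and so on, and that every replica applies them in order, so a replica's state is summarized by the highest version it has applied. The function name, its parameters, and the version-lag interpretation of bounded staleness are all assumptions of this sketch and are not taken from Cosmos DB's implementation.

```python
from enum import Enum


class Consistency(Enum):
    STRONG = "strong"
    BOUNDED_STALENESS = "bounded staleness"
    SESSION = "session"
    CONSISTENT_PREFIX = "consistent prefix"
    EVENTUAL = "eventual"


def read_allowed(level: Consistency, replica_version: int, latest_version: int,
                 session_version: int = 0, max_lag: int = 0) -> bool:
    """Decide whether a replica may serve a read under the given level (toy model)."""
    if level is Consistency.STRONG:
        # The replica must have applied every committed write.
        return replica_version == latest_version
    if level is Consistency.BOUNDED_STALENESS:
        # The replica may lag, but by no more than max_lag versions.
        return latest_version - replica_version <= max_lag
    if level is Consistency.SESSION:
        # The client must at least see its own most recent write.
        return replica_version >= session_version
    # Consistent prefix and eventual: replicas apply writes in order in this model,
    # so any replica state is already a prefix and any read is admissible.
    return True


if __name__ == "__main__":
    for level in Consistency:
        ok = read_allowed(level, replica_version=7, latest_version=10,
                          session_version=6, max_lag=5)
        print(f"{level.value}: {'allowed' if ok else 'rejected'}")
```

The model deliberately collapses the difference between consistent prefix and eventual consistency, which only shows up when replicas can apply writes out of order; it is meant to show the spectrum between the two extremes, not to define the levels precisely.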

WWPI – Covering the best in IT since 1980