Distributed PostgreSQL comes to Azure Cosmos DB

Ozgun Erdogan

Umur Cubukcu

Today, we’re excited to announce Azure Cosmos DB for PostgreSQL, a new Generally Available service to build cloud-native relational applications. This service brings developers the latest PostgreSQL features and lets you start with a free trial, then scale out your database as your workload grows.

With this announcement, Azure also becomes the first cloud provider to offer its own single database service that supports both relational and NoSQL workloads. You can now build cloud-native applications for relational and non-relational data using the familiar Azure Cosmos DB database.

This blog post provides a high-level overview of our service, powered by open source Citus and PostgreSQL, and shares some of its key features.

Azure Cosmos DB for PostgreSQL

Azure Cosmos DB for PostgreSQL is the first managed database that brings together a combination of three key properties:

  1. True PostgreSQL, with latest versions: We work with and contribute to open-source PostgreSQL. This way, you don’t get a partial API. You get the full familiarity and benefits of PostgreSQL, within two weeks of each release.
  2. Cloud database: Benefit from a broad range of managed database features – so that you don’t have to worry about your database again. For example, you can create high availability configurations across availability zones, fork or restore your cluster to a particular point in time, or one-click upgrade your PostgreSQL & database extensions.
  3. Start small, scale globally—powered by Citus: Start testing your apps with a Free Trial. As your workload grows, scale it out by enabling distributed tables, powered by the Citus open source extension to PostgreSQL. This way, we’ll take care of relational features at scale – distributed transactions, deadlocks, foreign keys, and more – for you. If you need to go global, enable cross-region replication for lower latency & global availability.

If you’re just starting with Azure Cosmos DB, let’s see how you can benefit from these three properties with a few examples.

Free trial

The Azure Cosmos DB free trial is the easiest way to get started in building your cloud-native app. With this feature, you get all the native capabilities that come with PostgreSQL, including rich JSON support, powerful indexing, extensive datatypes, full text search, and much more. Furthermore, as PostgreSQL releases new versions, we make those versions available to you within two weeks. This way, you can benefit from the latest features in PostgreSQL without delays.

Figure: Use Quick start in the Azure portal to set up Azure Cosmos DB for PostgreSQL

 

Of course, the Free Trial is enough to get started, but you’ll need more as your application matures. When this happens, you can click the Upgrade button to enable a broad range of new features.

Your cloud-native database

Upgrading from the free trial, or creating a new database for PostgreSQL, gives you many new capabilities. Example features include:

  • High Availability across Availability Zones (AZ)
  • Automatic backup/restore & ability to rewind to a particular point-in-time
  • One-click upgrade to latest PostgreSQL & extension versions
  • Scale up/down your CPU and storage resources
  • Encryption at rest and private endpoints
  • Compliance with global and local certifications across 30 Azure regions
  • Global distribution across Azure regions to tolerate regional failures
  • And more

Figure: Configure your account and explore features in the Azure portal

 

With these features, you get managed database capabilities native to the cloud. We also provide cloud integrations so that you have an easier time building on Azure.

Azure cloud integrations

Another key feature of a cloud-native database is how well it integrates with the rest of the cloud. Prior to today, if you had data in Azure Blob Storage, you’d need to download that data to another VM and then upload it to your database. This introduced unnecessary friction when you were building your application.

Starting now, you can directly interface with Azure Blob Storage through a brand-new PostgreSQL extension, pg_azure_storage. After connecting to your PostgreSQL database, you just need to run the following commands:

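-- Enable the Azure Storage extension in the current database
SELECT create_extension('azure_storage');
-- Register the storage account and its access key so the database can read from it
SELECT azure_storage.account_add('mystorageaccount', 'SECRET_ACCESS_KEY');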

CREATE TABLE github_events
(
    event_id bigint,
    event_type text,
    event_public boolean,
    repo_id bigint,
    payload jsonb,
    repo jsonb,
    user_id bigint,
    org jsonb,
    created_at timestamp
);

-- Load the CSV file from Azure Blob Storage directly into the table
COPY github_events FROM 'https://mystorageaccount.blob.core.windows.net/data/github_events.csv' WITH (format 'csv');

With pg_azure_storage, you can also make modifications as you’re ingesting data from Azure Blob Storage using user-defined functions. For a detailed list of all features, you can refer to our documentation here.

With these cloud-native capabilities, you can build your application with ease. And better yet, you can build your applications ready for running at any scale. For this, our service for PostgreSQL has the Citus extension built in and allows you to scale out your database across multiple nodes. Packaged as a fully open-source extension, Citus extends PostgreSQL with the power of distributed tables, enabling distributed query execution and performance at scale. Citus does this while preserving true PostgreSQL at its core, with support for JSONB, geospatial, rich indexing, relational semantics, and more.

Postgres with the power of distributed tables

With our service for PostgreSQL, you can start building your apps on a single node server group, the same way you would with PostgreSQL. As your app’s scalability and performance requirements grow, you can enable distributed tables and seamlessly scale to multiple nodes.

Azure Cosmos DB makes this transition – enabling distributed tables – easy. Previously, if you wanted to use the Citus extension to create a distributed table, you’d first have to pick a sharding key. You’d then have to run a command that would block write operations. With Citus 11.1, creating a distributed table, along with many other operations that previously blocked writes, becomes fully online.

If you want to see how this online operation works, you’ll love this 1-minute video.
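
As a rough sketch of what enabling a distributed table can look like, the statements below create and distribute a small table. The device_events table and the choice of device_id as the distribution column are assumptions made for illustration and do not come from this post. create_distributed_table_concurrently is the function Citus 11.1 added for distributing a table without blocking writes, and citus_shards is a Citus metadata view.

```sql
-- Hypothetical table for illustration; the names are not from the original post.
CREATE TABLE device_events (
    device_id  bigint NOT NULL,
    event_id   bigint NOT NULL,
    payload    jsonb,
    created_at timestamptz DEFAULT now(),
    PRIMARY KEY (device_id, event_id)  -- the key includes the distribution column
);

-- Distribute the table by hashing device_id while writes keep flowing
-- (online variant, Citus 11.1 and later).
SELECT create_distributed_table_concurrently('device_events', 'device_id');

-- The original form, which blocks writes while it runs:
-- SELECT create_distributed_table('device_events', 'device_id');

-- Inspect how the resulting shards are placed across nodes.
SELECT shardid, nodename, shard_size
FROM citus_shards
WHERE table_name = 'device_events'::regclass;
```

On a single-node cluster all shards sit on that one node. They can be spread out later, once worker nodes are added and the shards are rebalanced.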

Once you create a distributed table, Citus takes care of the rest. Example features include:

  • Distributed transactions & distributed deadlock detection
  • Automatic colocation groups that allow you to enforce foreign keys, constraints, and easily join your data without costly repartition operations
  • Distributed query processing, where computations are shipped to the data
  • Distributed utility commands, such as index creation and VACUUM / ANALYZE
  • Ability to read from and write to any one of the nodes in the cluster
  • Online shard rebalancing & isolating noisy tenants / shards
  • And more
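
To make a few of these features concrete, here is a minimal sketch of colocation, a foreign key between distributed tables, a transaction that touches more than one tenant, and a colocated join. The tenants and orders tables, their columns, and the sample rows are hypothetical. The sketch assumes a cluster where the Citus extension is already enabled.

```sql
-- Hypothetical multi-tenant schema; all names and values are illustrative.
CREATE TABLE tenants (
    tenant_id bigint PRIMARY KEY,
    name      text NOT NULL
);

CREATE TABLE orders (
    tenant_id bigint NOT NULL,
    order_id  bigint NOT NULL,
    total     numeric(12,2) NOT NULL,
    PRIMARY KEY (tenant_id, order_id)
);

-- Shard both tables on tenant_id and keep matching shards on the same node.
SELECT create_distributed_table('tenants', 'tenant_id');
SELECT create_distributed_table('orders', 'tenant_id', colocate_with => 'tenants');

-- Because the tables are colocated, a foreign key that includes the
-- distribution column can be checked locally on each node.
ALTER TABLE orders
    ADD CONSTRAINT orders_tenant_fk
    FOREIGN KEY (tenant_id) REFERENCES tenants (tenant_id);

-- A transaction that touches several tenants may span several shards;
-- it still commits or rolls back as a whole.
BEGIN;
INSERT INTO tenants VALUES (1, 'Contoso'), (2, 'Fabrikam');
INSERT INTO orders VALUES (1, 100, 25.00), (2, 200, 40.00);
COMMIT;

-- The join on tenant_id runs shard by shard, without repartitioning data.
SELECT t.name, count(*) AS order_count, sum(o.total) AS revenue
FROM tenants t
JOIN orders o USING (tenant_id)
GROUP BY t.tenant_id, t.name;
```

Putting the tenant identifier into every primary key and foreign key is the design choice that keeps these checks and joins local to a node.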

Using these features, you can scale out many types of applications. Real-world customer applications built this way include multi-tenant SaaS, real-time operational analytics, and high-throughput transactional apps. These apps span various verticals such as sales & marketing automation, healthcare, IoT/telemetry, asset tracking & logistics, finance, and search.

Performance at scale

The primary benefit of scaling is performance. Since users run PostgreSQL across many workloads, we use various benchmarks in testing our service’s performance. For these workloads and their respective benchmarks, we shared a detailed description here.

Among these benchmarks, HammerDB is an open-source one that implements the TPC-C specification. HammerDB also provides benchmark implementations for many different databases, including Citus. This makes it easy to compare results across different database engines.

For our tests, we first thought about running HammerDB against a custom hardware config with the goal of showing high performance results. However, we then decided to test our service’s performance in an easily repeatable way, using the exact same setup and features you’d get in production.

So, we open sourced a benchmarking tool that provisions a production cluster in Azure using our service for PostgreSQL. Once you have the benchmark and an Azure subscription, all you need to do is run this simple command:

# IMPORTANT NOTE: Running this command will provision 4 new Citus clusters
# and 4 times a 64-vCore driver VM in your Azure subscription. So, running
# the following command will cost you (or your employer) money!
azure/bulk-run.sh azure/how-to-benchmark-blog.runs | tee -a results.csv

When we ran HammerDB’s TPC-C implementation on a Citus cluster of 20 nodes, we saw results exceeding 2.0 million NOPM. Even more exciting, this result didn’t come with a custom setup, but rather with our regular managed service and all its available features.

You can read more about our 2M NOPM HammerDB results here.

Globally distributed database

Another key benefit of Azure Cosmos DB is global availability. With Azure Cosmos DB, you can create clusters spanning across regions and have your application query the database across those regions. We aspire to bring you the same benefits with our service for PostgreSQL.

Starting today, you can create read replicas for PostgreSQL in any supported region. You can also promote a replica to an independent server group that is readable and writable. Cross-region read replicas, along with cluster promotion, bring you the following benefits:

  1. Low latency reads: For geo-distributed applications, you can serve reads from the same or nearest region
  2. Disaster recovery: If you’re observing a regional outage that covers multiple Availability Zones, you can fail over to another region by promoting the replica in that region
  3. Migrating to another region: If you want to move to another region, you can create a replica in the new region, wait for the data to catch up, and then promote the replica

Figure: Use the Azure portal to select Azure regions in which to replicate data

Start your journey towards a globally distributed PostgreSQL database 

Today, we’re excited to announce General Availability for Azure Cosmos DB for PostgreSQL. With this service, you can now start your journey in building cloud-native applications using our Free Trial. You can then continue onto using a feature-rich managed database, natively integrate with other Azure cloud services, scale out your database as your workload grows, and globally distribute your database across regions.

Thanks to these features, you can focus on your application and stop worrying about your database.

If this sounds interesting, you can spin up a new instance using our Try Azure Cosmos DB Free trial today. If you’re further along in your journey and need access to all features, you can create a small instance through the Azure Portal instead.

Of course, if you have questions or comments in your journey, we’d be happy to hear from you. Please feel free to reach us anytime.

 

Figure: Microsoft CEO Satya Nadella announces Azure Cosmos DB for PostgreSQL at Ignite 2022

 

Postgres, PostgreSQL and the Slonik Logo are trademarks or registered trademarks of the PostgreSQL Community Association of Canada, and used with their permission.

12 comments


  • Tyler Becker (Microsoft employee) 2

    The link to pg_azure_storage quickstart is broken/non existent.

    • Claire Giordano (Microsoft employee) 1

      Tyler, thanks for letting us know. Umur has removed the pg_azure_storage link temporarily for now—and will get it reinstated soon, ideally in the upcoming days.

  • Thomas Levesque 5

    So, basically, this is just Citus? What does this have to do with Cosmos DB? Or is it just called “Cosmos DB for PostgreSQL” for marketing reasons?

    • Sai Krishna Srirampur (Microsoft employee) 2

      Hello Thomas. Thanks for the question. Yes the product is powered by the Citus extension. It is a managed service offering though, making it easy to deploy and manage postgres with citus on azure. It provides features like geo-replication, auto-HA, latest pg/citus versions with in-place upgrades, in-built pgbouncer, security related features like private end-points, monitoring.

      It is under the Cosmos DB umbrella because of horizontal scale capabilities that it offers. Historically Cosmos DB has been the highly scalable NoSQL offering for Azure. With this addition, goal is to offer the same scale but now also in the Relational/SQL world.

      I do want to call out that, while the implementations of the NoSQL and Postgres (powered by Citus) APIs differ, ideas, algorithms, and IP were reused and will continue to be reused across them. A few examples are reflected in the latest version of Citus: it supports non-blocking create_distributed_table for online scale-out, online shard splitting, etc. (more in this blog: https://www.citusdata.com/blog/2022/09/19/citus-11-1-shards-postgres-tables-without-interruption/ ). Over time, we also plan to provide a consistent managed service experience across all Cosmos DB APIs.

      Aligned with the content in the above blogpost, I had created a short overview of the product – https://www.youtube.com/watch?v=nT64dFSfiUo to capture in a nutshell what it offers. Do check it out when possible 🙂

      • Ruben Garrigos Dominguez 1

        I’m with Thomas, this just seems like a marketing rebranding. Even the link to Azure Database for PostgreSQL – Hyperscale (https://learn.microsoft.com/en-us/azure/postgresql/hyperscale/overview) is pointing now to the “Cosmos DB for PostgreSQL”.

        What will be next? Azure Cosmos DB for SQL Server instead of Azure SQL Database Hyperscale? Azure Cosmos DB for MySQL?

      • David Martinez Rada 0

        But how does this work with the partition model of Cosmos? Because that is the main feature of it.
        Cosmos is a Nosql database as long as you are in a single partition.

        With Postgres, will you still have partitions, with each one being a Postgres db? Or no partitions at all?

        • Sai Krishna Srirampur (Microsoft employee) 0

          Cosmos DB for PostgreSQL also has a concept similar to partitioning. It is called sharding (a.k.a distributing tables).

          Sharding is based on the hash of a column, which is called distribution column. Distributing a table based on a distribution column decomposes the table into shards. Shards are plain postgres tables residing on nodes in the cluster. Each shard holds a subset of data – subset of distribution column values.

          There is inbuilt distributed planning which decides how to execute queries – either route them to a node(s) (common in transactional workloads) or parallelize them across the nodes (common in real-time analytics workloads). More details on distributing tables can be found in the below articles:

          https://learn.microsoft.com/en-us/azure/cosmos-db/postgresql/quickstart-build-scalable-apps-concepts#architectural-overview

          https://www.youtube.com/watch?v=kCCDRRrN1r0

  • Charles Chen 3

    This branding is incredibly misleading and confusing.

    I thought it was Postgres wire compatibility for CosmosDB when in fact it has nothing to do with CosmosDB.

    There’s no way this branding makes any sense.

    • Martin M 1

      I agree.

      This nomenclature makes no sense whatsoever. If it wasn’t for this forum I would walk away completely bamboozled as to what the product is and what it does.
      C’mon Microsoft – call a spade and shovel.

    • Jason Nadrowski 1

      I agree. Very confusing.

  • Chris Harris (PYTHON) (Microsoft employee) 0

    What if I just need the columnar storage features of the Citus extension without the data partitioning. Is that part of “Azure Cosmos DB for PostgreSQL”?

  • Vibeeshan Mahadeva 0

    Does it support “Reserved Pricing”?
