Kubernetes and the Distributed Database

Author: Charleste King | 9 min read | November 18, 2021

The use of containers, orchestrated by Kubernetes, has revolutionized (again) how software is deployed and managed in the vast and frenetic ecosystems of today’s global ‘interweb.’

However, the databases storing the information upon which those containers rely haven’t evolved at the same pace, which complicates the sophisticated scale and deployment efforts of developers the world over. New tools and opportunities are becoming available, though, and rethinking the ‘database’ and database administration as elements of your Kubernetes container pod is no longer just a far-off dream.

Contrasting SQL’s ‘Oil’ With Containers’ ‘Water’

The primary reason why people consider containers and databases to be at odds with each other is that they (traditionally) work so differently from each other:

Databases are, by definition, ‘stateful’ – they retain records of previous transactions so users can return to them to obtain the reliable information they seek. And because they perform each transaction within the context of prior transactions, they use the same server every time they process a user’s query.
Containers, on the other hand, were initially designed to be ‘stateless’ – each transaction is a stand-alone event, with no association to previous activities. The container’s full functionality was directed by its individual parameters. The isolated nature of that single function facilitated the flexibility and portability that allowed its use in virtually any environment. (Eventually, developers added small elements of appropriate ‘stateful’ operations, such as storage capacities, that allowed the individual container to function in a more ‘stateful’ way.)

The fluidity of the container, then, would be constrained by the fixed nature of the traditional relational database.
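
The contrast above can be made concrete with a small sketch. It is only an illustration, not how any particular database or container runtime is implemented: `stateless_handler` and `StatefulLedger` are hypothetical names, and the "restart" is simulated by creating a fresh object, standing in for a container that is rescheduled without persistent storage.

```python
from dataclasses import dataclass, field


def stateless_handler(request: dict) -> dict:
    """Stateless: the response depends only on the request itself."""
    return {"total": request["price"] * request["quantity"]}


@dataclass
class StatefulLedger:
    """Stateful: each transaction is applied on top of earlier ones."""
    balance: int = 0
    history: list = field(default_factory=list)

    def apply(self, amount: int) -> int:
        self.history.append(amount)
        self.balance += amount
        return self.balance


# Any replica of the stateless handler gives the same answer.
replica_a = stateless_handler({"price": 5, "quantity": 3})
replica_b = stateless_handler({"price": 5, "quantity": 3})

# The ledger's answer depends on everything applied before.
ledger = StatefulLedger()
ledger.apply(100)
ledger.apply(-30)

# A ledger that comes back empty has lost the context of prior transactions.
restarted = StatefulLedger()
```

The stateless replicas are interchangeable, which is what makes them easy to move and scale. The ledger is only correct as long as its history survives, which is why a database cannot simply be restarted somewhere else without its data.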

Consequently, many developers weren’t interested in utilizing databases within a container or Kubernetes environment because of those restrictions. They didn’t want their applications to be limited by the single-server deployment required for a traditional database, and they did want the assets promised by Kubernetes-orchestrated containers:

They wanted inter-host communication of containers for instances when they’re running containers on multiple hosts.
They wanted reliable deployment and maintenance of their apps at scale.
They wanted the Kubernetes auto-healing properties at the application, component and infrastructure layers.
They wanted reliable logging of service activities and consistent uptime monitoring of the containers themselves.
They wanted better manageability through modularity, which allows them to adjust container elements without adversely affecting the entire application.

Traditional SQL databases couldn’t provide these services in the container environment.

Gartner predicts that by 2022, 75% of global organizations will be running containers in production. Learn more about our Kubernetes Consulting Services.

Adding New Options to the Mix

The digital world has changed, however, and the emergence of new cloud computing tools facilitates a compromise between the fixed nature of the database and the fluidity of the containerized application. In a distributed database (DDB), developers place two or more DB files in different locations in a network or even on different networks, then spread the processing functions among several database nodes. Add an equally emergent database administration tool, a distributed database management system (DDBMS), which centralizes DB control across networks, and the system acts as though all data were stored on the same computer. The DDBMS automatically synchronizes the data across locations, so it acts as a ‘single’ database.
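
As a rough illustration of the idea, the sketch below models several locations behind one facade that callers treat as a single database. It is a toy under stated assumptions, not the design of any real DDBMS: `Node` and `DistributedDatabase` are hypothetical names, every node holds a full copy of the data, and writes are copied to all reachable nodes in the same call.

```python
from typing import Dict, List


class Node:
    """One database location holding a full copy of the data."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, str] = {}
        self.online = True


class DistributedDatabase:
    """Hypothetical DDBMS facade: callers see one logical database."""

    def __init__(self, nodes: List[Node]) -> None:
        self.nodes = nodes

    def write(self, key: str, value: str) -> None:
        # Synchronize the write to every reachable location.
        for node in self.nodes:
            if node.online:
                node.data[key] = value

    def read(self, key: str) -> str:
        # Serve the read from the first reachable location that has the key.
        for node in self.nodes:
            if node.online and key in node.data:
                return node.data[key]
        raise KeyError(key)


east, west = Node("us-east"), Node("eu-west")
db = DistributedDatabase([east, west])
db.write("order:1", "shipped")

east.online = False            # simulate one site failing
value = db.read("order:1")     # still answered, from the other location
```

The caller never names a node. It reads and writes through `db`, and the facade decides which location serves the request, which is the ‘single’ database behavior described above.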

The DDB offers several benefits:

Unlike a single database file, the distributed database doesn’t bottleneck when multiple users access it – having data spread across the database system means more users have more access at any one time.
The physical proximity of distributed data stores speeds user access times.
As with containers, recovery from replicated copies in other locations minimizes delays and downtime when one site fails.

The development of the distributed database reduces or eliminates developer concerns about mixing their traditional database system with their Kubernetes container system.

Embracing the Kubernetes Distributed Database Option

Developers new to the distributed database concept are rightfully excited about its opportunities and equally curious about how to best apply it to their work. Datavail’s Global Practice Lead & Director, MongoDB Services, Charleste King, answered questions on that very subject during a recent webinar. Her advice to beginners: Answer a few key questions about the app before starting the containerized architecture of its database:

What function will your Kubernetes pod accomplish?
Will it need access to transient data? A caching layer? Or, will it be a local store for nearby containers? The different functions require different programming.
How will the database be configured into the Kubernetes orchestration?
Who has access to the database? Who can modify it?
Will it need failover elections, replication, sharding or cluster management?
What are the tools you’ll use to handle failover and switchover, coordinated node operations, routing, load balancing and connection pooling?

All of these elements are significant to DDB success and must be considered and designed up front, before you begin architecting your distributed database.
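
For the question of how the database is configured into the Kubernetes orchestration, one common starting point for stateful workloads is a StatefulSet, which gives each pod a stable identity and its own persistent volume claim. The sketch below builds such a manifest as a plain Python dictionary and prints it as JSON, a format `kubectl apply -f` also accepts. The field names follow the Kubernetes `apps/v1` API, but the name `demo-db`, the image `example/db:1.0`, the replica count and every size are placeholder assumptions, not recommendations from the webinar.

```python
import json


def build_statefulset(name: str, image: str, replicas: int, storage: str) -> dict:
    """Return a minimal StatefulSet manifest as a dictionary."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            # Assumes a headless Service with this name is created separately.
            "serviceName": name,
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        # Placeholder memory and CPU figures.
                        "resources": {
                            "requests": {"memory": "1Gi", "cpu": "500m"},
                            "limits": {"memory": "2Gi"},
                        },
                        "volumeMounts": [{"name": "data", "mountPath": "/data"}],
                    }],
                },
            },
            # Each pod gets its own claim, so its data survives rescheduling.
            "volumeClaimTemplates": [{
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": storage}},
                },
            }],
        },
    }


manifest = build_statefulset("demo-db", "example/db:1.0", replicas=3, storage="10Gi")
print(json.dumps(manifest, indent=2))
```

A manifest like this only answers the placement and storage questions. Access control, failover elections, sharding and connection routing depend on the database chosen and are typically handled by that database's own tooling or a Kubernetes operator.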

King also suggests becoming very familiar with Kubernetes challenges that will pose barriers to you as you lean into a distributed database for your container pods:

Most notably, the replication of data from one distributed DB pod across the network of pods happens asynchronously, not instantly, so writes that have not yet reached the other pods can be lost if the originating pod fails.
Also, as is the nature of the smaller infrastructure of the containerized system, there are memory and storage constraints in a DDB. Ensuring the full function and capacity of your Kubernetes systems requires factoring in the size and space limitations of the distributed database.
Traditional resiliency practices are not available in the Kubernetes environment because there is simply too much going on within the containerized pod and network for standard recovery programming to manage. Fortunately, Datavail’s professional database administrators are up to speed on the backup tools needed within the Kubernetes system to ensure resiliency in those apps when using a distributed database system.
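
The first challenge in the list, asynchronous replication, is easy to see in a toy simulation. The sketch below is an illustration only and does not reflect any specific database's replication protocol: `Primary`, `Replica` and `replicate` are hypothetical names, and replication lag is modeled by shipping only some of the queued writes before the primary fails.

```python
from collections import deque


class Primary:
    """Accepts writes and acknowledges them before they are replicated."""

    def __init__(self) -> None:
        self.data = {}
        self.pending = deque()  # acknowledged writes not yet shipped

    def write(self, key, value) -> None:
        self.data[key] = value
        self.pending.append((key, value))


class Replica:
    """Receives writes from the primary some time after they happen."""

    def __init__(self) -> None:
        self.data = {}


def replicate(primary: Primary, replica: Replica, max_items: int) -> int:
    """Ship up to max_items queued writes to the replica."""
    shipped = 0
    while primary.pending and shipped < max_items:
        key, value = primary.pending.popleft()
        replica.data[key] = value
        shipped += 1
    return shipped


primary, replica = Primary(), Replica()
for i in range(5):
    primary.write(f"k{i}", i)          # all five writes are acknowledged

replicate(primary, replica, max_items=3)  # replication is lagging behind

# The primary pod fails here and the replica is promoted.
# Anything still in the queue never reached the replica.
lost = [key for key, _ in primary.pending]
```

In this run the replica holds the first three writes and the last two are gone, even though the client was told all five succeeded. Real systems narrow this window with options such as waiting for acknowledgment from more than one node, at the cost of slower writes.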

The emergence of distributed databases that can match the flexibility and fluidity of Kubernetes systems revolutionizes (again) how the world manages its digital workflows and productivity.

To learn more, watch the webinar. Better yet, contact Datavail to speak directly to a Kubernetes distributed database administrator yourself.

Blog Author

Charleste King

MongoDB Practice Lead

Charleste has more than 15 years of experience in the IT industry in a myriad of areas from software development to data analysis, architecture, and administration. She has worked supporting organizations from the very small to enterprise level in aerospace, agriculture, medicine, education, and other industries. She has developed solutions to unique problems for clients, ranging from multi-level upgrades with minimal downtime, compliance conversions, documentation, monitoring, alerting, stabilization, trending, and forecasting problem areas to tuning and performance monitoring.