This article addresses the benefits provided by database virtualization. Before we proceed, however, it is important to explain that database virtualization does NOT mean simply running a DBMS inside a virtual machine.
Database Virtualization: More Than Running a DBMS in a Virtual Machine
While running a DBMS in a VM can provide advantages (and disadvantages), it is NOT database virtualization. A typical database fuses together the data (or I/O) with the processing (CPU utilization) to operate as a single unit. Simply running that single unit in a VM does not provide the benefits detailed below; that is not database virtualization, that is merely server virtualization.
An Example of the Database Virtualization Problem
Say you have a database handling banking and I have $10MM in
the bank (I wish). Now let’s assume that the bank is busy, so it bursts that
database across 3 VM nodes in typical cloud-style. Now each of those 3 nodes gets a command to
wire out the full $10MM. Each node sees its balance at $10MM, so each one wires
out the full amount, for a total wire transfer of $30MM…see the problem? In
order to dynamically burst your database across nodes, you need a distributed
locking mechanism so that all nodes see the same data and can lock other nodes
from altering the same data independently. This sounds easy, but making it
perform well is a massive undertaking. Only two products have solved this problem: Oracle RAC and ScaleDB (for MySQL or MariaDB).
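To see why a distributed locking mechanism matters, here is a minimal, hypothetical Python sketch. A process-local threading.Lock stands in for a cluster-wide distributed lock manager, and the node names, shared account dictionary, and wire_out function are invented for illustration; the point is simply that every node must acquire the same lock before reading and debiting the shared balance.

import threading

# Shared state standing in for the bank's data tier; in a real cluster this
# would live on shared storage, not in one process's memory.
account = {"balance": 10_000_000}

# A process-local lock standing in for a distributed lock manager. In a
# virtualized database, every node must respect this lock before reading
# or changing the same data.
dlm_lock = threading.Lock()

def wire_out(node_name: str, amount: int) -> None:
    """Each 'node' tries to wire out funds against the shared balance."""
    with dlm_lock:  # cluster-wide lock: only one node at a time
        if account["balance"] >= amount:
            account["balance"] -= amount
            print(f"{node_name}: wired {amount}, balance now {account['balance']}")
        else:
            print(f"{node_name}: insufficient funds, wire rejected")

# Three nodes all receive the same $10MM wire request at once.
nodes = [threading.Thread(target=wire_out, args=(f"node-{i}", 10_000_000))
         for i in range(1, 4)]
for t in nodes:
    t.start()
for t in nodes:
    t.join()
# With the lock, exactly one wire succeeds. Without it, all three nodes
# could read the stale $10MM balance and the bank would send $30MM.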
Defining Database Virtualization
- It should enable the application to talk to a single virtual instance of the database, when in fact there are N actual nodes operating on the data.
- It should separate the data processing (CPU) from the data (I/O) so that each can scale on demand, independently of the other.
- It should enable the actual processing of the data to be distributed to the various nodes on the storage tier (function shipping) to achieve maximum performance. Note: in practice, this is similar to MapReduce (see the sketch after this list).
- It should provide tiered caching for performance, while also ensuring cache coherence across the entire cluster.
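To illustrate function shipping, here is a minimal, hypothetical Python sketch; the StorageNode class and its ship method are invented for illustration, not any vendor's API. Each storage node runs the filter and a partial aggregate against its own slice of the data and returns only the partial result, which the compute tier then combines, much like a map step followed by a reduce step.

# Function shipping: instead of pulling every row up to the compute tier,
# each storage node runs the predicate and partial aggregate locally and
# returns only its partial result.

class StorageNode:
    def __init__(self, rows):
        self.rows = rows  # this node's slice of the table

    def ship(self, predicate, partial_aggregate):
        # The function is executed where the data lives.
        return partial_aggregate(r for r in self.rows if predicate(r))

# Three storage-tier nodes, each holding part of an 'orders' table.
storage_tier = [
    StorageNode([{"region": "EU", "total": 120}, {"region": "US", "total": 80}]),
    StorageNode([{"region": "EU", "total": 200}]),
    StorageNode([{"region": "US", "total": 50}, {"region": "EU", "total": 30}]),
]

# "Map": ship the filter and partial sum to every node; "reduce": combine.
partials = [node.ship(lambda r: r["region"] == "EU",
                      lambda rows: sum(r["total"] for r in rows))
            for node in storage_tier]
print("EU order total:", sum(partials))  # 350, computed without moving raw rows

The payoff is that only a few small partial results travel over the network, rather than every raw row.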
Benefits of Database Virtualization
Higher Server Utilization: When the data is fused to the CPU as a single unit, that one node is responsible for handling all usage spikes for its collection of data. This forces you to split the data thinly across many servers (silos), which in turn forces you to run each server at a low utilization rate. Database virtualization decouples the data from the processing so that a spike in usage can be shared across many nodes on the fly. This enables you to run a virtualized database at a very high utilization rate.
Reduced Infrastructure Costs: Database virtualization enables you to use fewer servers, less power, fewer OS, tool, and application licenses, fewer network switches, and less storage, among other things.
Reduced Manpower Costs: Database virtualization simplifies the DBA's job: since it uses only one schema and no sharding, it also simplifies backup processes, enabling the DBA to handle more databases. It simplifies the application developer's job because it eliminates code related to sharding, e.g. database routing, rebuilding relationships between shards (e.g. cross-shard joins), and more. It also simplifies the network admin's job, because there are fewer servers to manage and they are identical.
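For a sense of the developer-side savings, here is a hypothetical sketch of the shard-routing boilerplate that a single virtual database instance makes unnecessary. The connection URLs, hashing scheme, and helper names are invented for illustration; with database virtualization the application simply issues the query and never touches any of this.

import hashlib

SHARDS = [
    "mysql://shard0.example.internal/bank",
    "mysql://shard1.example.internal/bank",
    "mysql://shard2.example.internal/bank",
]

def shard_for(customer_id: str) -> str:
    """Hash the key to pick the shard that owns this customer's rows."""
    digest = hashlib.md5(customer_id.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

def fetch_customer_orders(customer_id: str):
    url = shard_for(customer_id)
    # The developer must also rebuild cross-shard relationships by hand,
    # e.g. joining customers on one shard with orders on another in app code.
    print(f"routing query for {customer_id} to {url}")
    # ... connect, query, and merge results here ...

fetch_customer_orders("customer-42")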
Reduced Complexity: You only have a
single database image, so elastically scaling up/down is simple and fast.
Increased Flexibility: Database
virtualization brings the same flexibility to the database that server
virtualization brings to the application tier. Resources are allocated and
reallocated on the fly. If your usage profile changes, e.g. payroll one day,
benefits the next, a virtual database uses the same shared infrastructure for
any workload, while a traditional database does not.
Quality of Service: Since database images can move on the fly, without downtime, the problem of a noisy neighbor or a noisy network is solved by simply moving the database to another node in your pool.
Availability: Unlike a traditional database, virtualized database nodes see all of the data, so they inherently provide failover for one another, addressing unplanned downtime. As for planned downtime, simply move the process to another server and take down the one that needs service, again without interruption.
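As a rough illustration of that failover behavior, here is a hypothetical Python sketch of a client that simply retries the same query against the next node in the pool, which works only because every node sees all of the data. The node addresses and connect() helper are placeholders, not a real driver API.

NODES = ["db-node-1:3306", "db-node-2:3306", "db-node-3:3306"]

def connect(node: str):
    # Stand-in for a real driver call; pretend the first node is down.
    if node == "db-node-1:3306":
        raise ConnectionError(f"{node} is unreachable")
    return f"connection to {node}"

def run_query(sql: str):
    last_error = None
    for node in NODES:  # every node can serve any query
        try:
            conn = connect(node)
            print(f"executing on {node}: {sql}")
            return conn
        except ConnectionError as err:
            last_error = err  # node down: fail over to the next one
    raise last_error

run_query("SELECT balance FROM accounts WHERE id = 42")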
Improved Performance: Because the pooled cache across the storage tier uses a Least Recently Used (LRU) algorithm, it can dedicate large amounts of that pooled cache to the current workload, enabling near in-memory performance. Also, as mentioned above, the distribution of processing to the storage tier enables high-performance parallel processing.
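For readers unfamiliar with LRU eviction, here is a toy Python sketch (capacities and page names are invented) showing how an LRU-managed cache keeps the current workload's pages hot while pages from an earlier workload age out.

from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.pages = OrderedDict()

    def get(self, page):
        if page in self.pages:
            self.pages.move_to_end(page)  # mark as most recently used
            return self.pages[page]
        return None                       # cache miss: read from disk

    def put(self, page, data):
        self.pages[page] = data
        self.pages.move_to_end(page)
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used page

cache = LRUCache(capacity=3)
for page in ["payroll-1", "payroll-2", "payroll-3"]:
    cache.put(page, f"data for {page}")
cache.put("benefits-1", "data for benefits-1")  # new workload pushes out payroll-1
print(list(cache.pages))  # ['payroll-2', 'payroll-3', 'benefits-1']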
True database virtualization delivers a huge set of
advantages that in many ways mirror the benefits server virtualization provides
to applications. For this reason, we expect database virtualization to be the
next big thing, following in the footsteps of server, storage and network
virtualization.