Head in the Clouds: DBaaS

This article addresses the benefits provided from database virtualization. Before we proceed however, it is important to explain that database virtualization does NOT mean simply running a DBMS inside a virtual machine.

Database Virtualization, More Than Running a DBMS in a Virtual Machine

While running a DBMS in a VM can provide advantages (and disadvantages) it is NOT database virtualization. Typical databases fuse together the data (or I/O) with the processing (CPU utilization) to operate as a single unit. Simply running that single unit in a VM does not provide the benefits detailed below. That is not database virtualization that is merely server virtualization.

An Example of the Database Virtualization Problem

Say you have a database handling banking and I have $10MM in the bank (I wish). Now let’s assume that the bank is busy, so it bursts that database across 3 VM nodes in typical cloud-style. Now each of those 3 nodes gets a command to wire out the full $10MM. Each node sees its balance at $10MM, so each one wires out the full amount, for a total wire transfer of $30MM…see the problem? In order to dynamically burst your database across nodes, you need a distributed locking mechanism so that all nodes see the same data and can lock other nodes from altering the same data independently. This sounds easy, but making it perform well is a massive undertaking. Only two companies have solved this problem: Oracle RAC and ScaleDB (for MySQL or MariaDB).

Defining Database Virtualization

It should enable the application to talk to a single virtual instance of the database, when in fact there are N number of actual nodes acting over the data.
It should separate the data processing (CPU) from the data (I/O) so that each can scale on demand and independently from the other.
For performance it should enable the actual processing of the data to be distributed to the various nodes on the storage tier (function shipping) to achieve maximum performance. Note: in practice, this is similar to MapReduce.
It should provide tiered caching, for performance, but also ensure cache coherence across the entire cluster.

Benefits of Database Virtualization

Higher Server Utilization: When the data is fused to the CPU, as a single unit, that one node is responsible for handling all usage spikes for its collection of data. This forces you to spit the data thinly, across many servers (siloes), forcing you to run each server at a low utilization rate. Database Virtualization decouples the data from the processing so that the spike in usage can be shared across many nodes on the fly. This enables you to run a virtualized database at a very high utilization rate.

Reduced Infrastructure Costs: Database virtualization enables you to use fewer servers, less power, less OS, tools, application licenses, network switches and storage, among other things.

Reduced Manpower Costs: Database virtualization simplifies the DBA’s job, since it uses only one schema and no sharding, it also simplifies backup processes, enabling the DBA to handle more databases. It reduces the application developer’s job because it eliminates code related to sharding: e.g. database routing, rebuilding relationships between shards (e.g. joins), and more. It also simplifies the network admin’s job because he manages fewer servers and they are identical.

Reduced Complexity: You only have a single database image, so elastically scaling up/down is simple and fast.

Increased Flexibility: Database virtualization brings the same flexibility to the database that server virtualization brings to the application tier. Resources are allocated and reallocated on the fly. If your usage profile changes, e.g. payroll one day, benefits the next, a virtual database uses the same shared infrastructure for any workload, while a traditional database does not.

Quality of Service: Since database images can move on the fly, without downtime, a noisy neighbor or noisy network is solved by simply moving the database to another node in your pool.

Availability: Unlike a traditional database, virtualized database nodes see all of the data, so they inherently provide failover for one another, addressing unplanned downtime. In regards to planned downtime, simply move the process to another server and take down the one that needs service, again without interruption.

Improved Performance: Because the pooled cache across the storage tier uses a Least Recently Used (LRU) algorithm, it can free up huge amounts of pooled cache to the then current workload, enabling near in-memory performance. Also, as mentioned above, the distribution of processing to the storage tier enables high-performance parallel processing.

True database virtualization delivers a huge set of advantages that in many ways mirror the benefits server virtualization provides to applications. For this reason, we expect database virtualization to be the next big thing, following in the footsteps of server, storage and network virtualization.

Additional Resources:

ScaleDB Database Virtualization Page

Wikipedia

Everybody loves free. It is the best marketing term one could use. Once you say “FREE” the people come running. Free makes you very popular. Whether you are a politician offering something for free, or a company providing free stuff, you gain instant popularity.

Xeround is shutting down their MySQL Database as a Service (DBaaS) because their free instances, while popular, simply did not convert into sufficient paid instances to support the company. While I am sad to see them fail, because I appreciate the hard work required to deliver database technology, this announcement was not unexpected.

My company was at Percona Live, the MySQL conference, and I had some additional conversations along these same lines. One previously closed source company announced that they were open sourcing their code, it was a very popular announcement. A keynote speaker mentioned it and the crowd clapped excitedly. Was it because they couldn’t wait to edit the code? Probably not. Was it because now the code would evolve faster? Probably not, since it is very low-level and niche oriented, and there will be few committers. No, I think it was the excitement of “free”. The company was excited about a 49X increase in web traffic, but had no idea what the impact would be on actual revenues.

I spoke with another company, also a low-level and niche product, and they have been open source from the start. I asked about their revenues, they are essentially non-existent. Bottom line is that the plan was for them to make money on services…well Percona, Pythian, SkySQL and others have the customer relationships and they scoop up all of the consulting and support revenue while this company makes bupkis. I feel for them.

I had a friend tell me that ScaleDB should open source our code to get more customers. Yes open source gets you a lot of free users…not customers. It is a hard path to sell your first 10...25…50…etc. customers, but the revenue from those customers fuels additional development and makes you a fountain of technology. Open source and free are great for getting big quickly and getting acquired, but it seems that if the acquisition doesn’t happen, then you can quickly run out of money using this model (see Xeround).

I realize that this is an unpopular position. I realize that everybody loves free. I realize that open source has additional advantages (no lock-in, rapid development, etc.), but in my opinion, open source works in only two scenarios: (1) where the absolute volume is huge, creating a funnel for conversion (e.g. Linux); (2) where you need to unseat an entrenched competitor and you have other sources of revenue (e.g. OpenStack).

I look forward to your comments. We also look forward to working with Xeround customers who are looking for another solution.

Head in the Clouds

Friday, July 12, 2013

Why You Should Embrace Database Virtualization

Thursday, May 2, 2013

Thoughts on Xeround and Free!

Search This Blog

My Blog List

About Me

Blog Archive

Followers