Friday, November 27, 2009

Virtual Databases: The Face of the New Cloud Database

Shared-disk databases can be virtualized—making them cloud-friendly—while shared-nothing databases are tied to a specific computer and a specific data set or data partition.

The underlying principle of the shared-nothing RDBMS is that a single master server owns a specific set of data. That data is not shared, hence the name shared-nothing. Because the data cannot be shared, the computing on that data cannot be virtualized either. Instead, the shared-nothing RDBMS ties the data and the computing to a specific computer. This association with a physical machine is then reinforced at the application level. Applications that use a shared-nothing database partitioned across more than one server rely on routing code, which simply directs each database request to the server that owns the data being requested. In other words, the application must know which server owns which piece of data. This further reinforces the mismatch between shared-nothing databases and virtualization.
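To make the routing-code idea concrete, here is a minimal sketch of what such code often looks like, assuming hash partitioning on a customer ID. The server names, `owning_server`, and `route_query` are hypothetical illustrations, not part of any particular product.

```python
import hashlib

# Hypothetical partition map: each shared-nothing server owns one slice of the data.
SERVERS = ["db-node-0", "db-node-1", "db-node-2"]


def owning_server(customer_id: str) -> str:
    """Return the (hypothetical) server that owns the rows for customer_id.

    A stable hash (MD5) is used so every application instance computes the
    same owner; Python's built-in hash() is randomized per process.
    """
    digest = hashlib.md5(customer_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(SERVERS)
    return SERVERS[index]


def route_query(customer_id: str, sql: str) -> tuple:
    """Pair a query with the only server that can answer it."""
    return owning_server(customer_id), sql
```

Note how the server list is baked into the application: with modulo hashing, changing `len(SERVERS)` remaps most keys to different servers, so adding a node means moving data and redeploying routing logic. That is the physical coupling described above.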

This is not to say that it is impossible to virtualize a shared-nothing database. As any software architect will tell you, “You can do anything in software…” The second part of that statement is “…but it may not perform or scale well, and it may make maintenance very painful.” That second part is exactly what you will find with any effort to virtualize a shared-nothing database. Attempts to insert layers of indirection add complexity that makes maintenance a nightmare. Finding bugs, tuning performance, and recovering from failure all become far harder when you introduce layers of indirection into a shared-nothing database.

Performance, and hence scalability, are also undermined in this model. To support dynamic virtualization, you must mediate the requests from the application before they hit the database. This requires a piece of middleware that sniffs each database request and routes it to the appropriate server. What happens when a database request spans multiple servers? Suffice it to say it isn’t pretty, and it doesn’t perform well: such a request forces the system to ship large amounts of data between servers and perform joins across them. The bottom line is that partitioning your database for performance, scalability, and maintainability is a black art; all attempts to automate this process have failed.
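A toy illustration of why a cross-partition request is expensive: the sketch below, assuming three hypothetical in-memory partitions that each hold a disjoint slice of an `orders` table and a `products` table, joins orders to product names in the middleware. Because matching rows live on different servers, every row has to be shipped to the middleware before the join can run.

```python
# Hypothetical in-memory stand-ins for three shared-nothing partitions.
# Each partition owns a disjoint subset of the orders and products tables.
PARTITIONS = {
    "db-node-0": {"orders": [("o1", "c1", "p2")], "products": [("p1", "Widget")]},
    "db-node-1": {"orders": [("o2", "c2", "p1")], "products": [("p2", "Gadget")]},
    "db-node-2": {"orders": [("o3", "c3", "p2")], "products": []},
}


def scatter_gather_join():
    """Join orders to product names when matching rows live on different servers.

    Returns the joined rows and a count of rows shipped to the middleware.
    """
    shipped_rows = 0
    # Scatter: pull product rows from every partition into the middleware.
    product_names = {}
    for tables in PARTITIONS.values():
        for product_id, name in tables["products"]:
            product_names[product_id] = name
            shipped_rows += 1
    # Gather: pull order rows from every partition, since any of them may match.
    result = []
    for tables in PARTITIONS.values():
        for order_id, _customer_id, product_id in tables["orders"]:
            shipped_rows += 1
            result.append((order_id, product_names[product_id]))
    return sorted(result), shipped_rows
```

Here every row in both tables crosses the network to answer a single query; with real table sizes, that data shipping dominates the cost.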

Compare this to the shared-disk DBMS. Shared-disk separates the compute from the storage. The data is stored in one big trough, while you can have any number of compute instances feeding on the entirety of that data. Because each node has access to all of the data, you don't need any middleware to route the database requests to specific servers. Furthermore, each of the compute nodes is identical, making them virtualization-friendly. If one node fails, the others recover the transactions, while the application continues uninterrupted. You can also add nodes on the fly, again without interrupting the application. For these reasons, the shared-disk RDBMS is ideal for virtualization, while the shared-nothing RDBMS is anathema to virtualization.
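For contrast, a client of a shared-disk cluster needs no key-to-server mapping at all. The sketch below assumes a hypothetical `SharedDiskCluster` client that simply round-robins over identical nodes, skips failed ones, and can take on a new node at any time. It only models request distribution; the recovery of in-flight transactions by surviving nodes happens inside the DBMS and is not shown.

```python
class SharedDiskCluster:
    """Hypothetical client for a shared-disk cluster of identical compute nodes."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.failed = set()
        self._next = 0

    def add_node(self, node):
        # A new node can serve any request immediately: no data has to move.
        self.nodes.append(node)

    def mark_failed(self, node):
        self.failed.add(node)

    def pick_node(self):
        """Round-robin over healthy nodes; any node can answer any query."""
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next % len(self.nodes)]
            self._next += 1
            if node not in self.failed:
                return node
        raise RuntimeError("no healthy nodes available")
```

Compare `pick_node` with the routing sketch for shared-nothing: nothing here depends on which data the query touches, which is what makes the compute nodes interchangeable and therefore easy to virtualize.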

This is an excerpt from a white paper I'm writing that addresses virtualized cloud databases.

Friday, November 20, 2009

Who owns the customer in the cloud?

In the world of technology, customer ownership has always been a huge issue. The company that owns the relationship is able to influence purchasing decisions that surround its product(s). For example, if the customer is tied into a specific application, that application can influence purchases down the stack (database, operating system, etc.). In these cases, specific products (e.g., vertical applications) had an inherent advantage over more generic or interchangeable ones (e.g., databases).

Then companies began to standardize on certain infrastructure elements. For example, a company might say “We are a Windows shop” or “We are an Oracle shop” and unless you had a REALLY compelling reason, you had to run on that infrastructure.

Cloud computing introduces a new dynamic. For example, you might be an HP equipment shop, but if you use Amazon AWS, what equipment does Amazon run? They won’t tell you. What storage equipment is used in S3 and EBS? Sorry, can’t say. You see, those pieces are commodities.

In time, the process of commoditization will move right up the stack, and the cloud vendor strongly influences this process. Sure, you can run various databases on AWS; just create your own AMI. But if you want the vertically integrated “out-of-the-box” database, that would be RDS running MySQL.

So, who owns that MySQL customer? Amazon, not MySQL/Sun/Oracle. The Amazon package bundles EBS, automation and more, and only Amazon can support it properly.

Who owns the customer when an application like SugarCRM is run on Amazon? That depends on who sold the customer. If SugarCRM sold the customer and offered it as SaaS, then SugarCRM owns the customer. If the customer goes directly to Amazon, then Amazon owns the customer. If Amazon takes it one step further by integrating various other services, tools, etc. and then brands the resulting package as Amazon CRM, then Amazon will increasingly own the customers of the future.

In fact, I wouldn’t be surprised to see Amazon offer a suite of fully integrated business applications based upon open source applications and the LAMP infrastructure, sort of a back-office suite. Then you don’t have to worry about all those messy integration issues. Your CRM application will work with your document management application, etc. Sure, you could assemble your own solution in a piecemeal fashion, but why would you?

At that point, the question is not which CRM you use, but rather which cloud suite you use and whether it is compatible with my cloud suite. In other words, does the Amazon suite interoperate with the Cisco, EMC/VMware, HP, IBM, and Rackspace suites?

The starting point for all of these solutions will be LAMP and open source applications. But each will focus initially on automation to simplify the effort, like RDS. Then they will deliver interoperability. Finally, they will innovate in proprietary ways to deliver a better experience. And with each step they will further establish their ownership of the customer. This is why every major technology company, with the exception of Oracle, is assembling its own cloud solution: none of them wants someone else owning its customers.