Monday, December 21, 2009

Oracle/Sun vs. The Cloud

Larry Ellison makes it very clear that Oracle believes in a back-to-the-future model where software and hardware meld together into “systems”: purpose-built, integrated solutions. In other words, you won’t buy an Oracle database and a server and configure them to run a data warehouse; instead, you’ll buy the “Oracle Data Warehouse Server.” The first such system is Exadata, which, according to Ellison, is doing quite well.

This is a classic bundling strategy, although some may call it tying. Microsoft, seeing that they couldn’t win each office productivity segment individually (word processing, spreadsheets and presentations), decided to play to their strength and bundle them into a solution that no individual company could compete with. That is bundling. Tying is what Microsoft did when they used their dominance in the operating system to tie the browser to the OS, thereby owning the browser market. In the case of Oracle, one could make a case for either bundling or tying. I’m making neither a value judgment nor a legal judgment about Oracle’s strategy; I am just providing historical context.

Ellison points to Cisco and IBM, under T.J. Watson Jr., as examples of successful systems companies. But my question is simple: will this back-to-the-future strategy work against the cloud? Assembling solutions from pre-packaged systems is certainly easier than starting with more granular components like hardware and software. But does it really stack up against today’s benchmark, the cloud?

Let me use a transportation analogy:

Assembling all of the components (hardware, software, etc.): Like building a car piece by piece

Assembling systems (à la Oracle's Exadata and Cisco): Like building a car from large-grained components, such as the chassis, wheels and engine

Using the cloud: Like buying a pre-built car off the lot

SaaS Applications: Like riding the subway

Most people are perfectly happy either buying a car or riding the subway. For really high-end performance, some may want to build their own car with components or by hand, but it’s a relatively small market.

I don’t expect any public cloud offerings to satisfy high-end enterprise demands…yet. But I have to admit, the cloud is evolving quite rapidly. Just look at Amazon and their introduction of Virtual Private Cloud, Elastic Block Store (a SAN in the sky), Boot from EBS, etc. I can launch an entire cluster with a mouse click, without talking to IT. How can you beat that? Historical precedent is also on the side of commodity technologies, like the cloud, growing up to cannibalize the high end. The PC cannibalized the workstation, which cannibalized the minicomputer, which cannibalized the mainframe. From the cloud’s perspective, the trend is their friend.

The cloud won’t seriously threaten large enterprise systems anytime soon, but I believe it is only a matter of time. Oracle can certainly ride a strong wave of current demand for systems. I expect that in time they will also provide a compelling suite of solutions in the cloud. But if I were a bettin’ man, I’d have to bet on the cloud; it has simplicity and history on its side. On the other hand, it is hard to bet against Ellison.

Monday, December 14, 2009

We Really Need TPC Benchmarks for the Cloud

TPC database benchmarks, which database vendors tune specifically for, are a useful objective comparison for buyers of databases. Unfortunately, there is no such comparison for the cloud, and the current price/performance approach used by TPC doesn’t fit the cloud.

Here are the problems:

1. TPC doesn’t include costs that are included in the cloud: Public cloud services bundle the costs of everything into their pricing. TPC pricing leaves out things like electricity, network connectivity, people to run the service, networking equipment (e.g. switches, cables, internet connectivity, etc.), load balancers, modems, Ethernet cards, etc. Public cloud pricing really reflects the total cost of ownership, while TPC pricing does not. So any cost/performance comparison between on-site and cloud solutions compares apples to oranges.

2. TPC assumes that the costs it does include are paid in advance for three full years. Public clouds use a pay-as-you-go model. To compare apples to apples here, you would need to do a net present value (NPV) calculation to account for the time value of money.

3. And this is the BIGGEST issue. TPC tests assume full utilization of the computer. Everyone knows that in the real world you (a) use at most 80% of the CPU, to accommodate usage growth and extreme peaks in performance; and (b) within that 80%, between typical peaks and valleys, your average usage is typically 10%-20% of the server’s capacity. The cloud, on the other hand, provides an elastic environment where you only pay for what you use. If you use a cloud-ready database that scales elastically, an average load factor of 10%-20% per server translates into saving 80%-90% of the costs versus a dedicated machine. In other words, an elastic cloud environment should reduce cost per transaction by 80%-90%, relative to a dedicated machine.
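To illustrate point 2, here is a minimal Python sketch of a net present value comparison. The $36,000 three-year cost, the $1,000 monthly charge and the 8% annual discount rate are hypothetical numbers, chosen only so that both payment schedules add up to the same nominal total; they are not TPC or cloud figures.

```python
def npv(cash_flows, annual_rate, periods_per_year=12):
    """Present value of cash flows, where cash_flows[t] is paid at the
    start of period t (so the t = 0 payment is not discounted)."""
    r = annual_rate / periods_per_year
    return sum(cf / (1 + r) ** t for t, cf in enumerate(cash_flows))

MONTHS = 36   # three-year horizon, as in TPC pricing
RATE = 0.08   # hypothetical 8% annual discount rate

# Hypothetical schedules with the same nominal total of $36,000.
upfront = [36_000.0] + [0.0] * (MONTHS - 1)   # everything paid in advance
pay_as_you_go = [1_000.0] * MONTHS            # paid month by month

pv_upfront = npv(upfront, RATE)
pv_cloud = npv(pay_as_you_go, RATE)
print(f"Up front: ${pv_upfront:,.0f}  Pay-as-you-go: ${pv_cloud:,.0f}")
```

Under these assumptions the pay-as-you-go schedule is worth roughly $32,000 in today's dollars versus $36,000 paid up front, so comparing undiscounted totals overstates the cost of the cloud option.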

The cloud provides a normalized cost structure that reflects a more realistic total cost of ownership (TCO). It bundles costs like networking, personnel, electricity, internet access, and more, all in a pay-per-use model. But most importantly, we can measure transactions per instance and then elastically scale the instances as needed. This elastic pricing model gives us a real-world cost scenario, instead of assuming that a single server is utilized at 100% capacity, which never happens in the real world.
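The 80%-90% figure from point 3 follows directly from the utilization numbers. Here is a minimal sketch, assuming an idealized elastic database that adds and removes instances exactly in step with load, and assuming a cloud instance-hour costs the same as a dedicated server-hour. Both are simplifying assumptions, not measured data, and the function names are illustrative.

```python
def cost_per_txn_dedicated(hourly_cost, peak_tps, avg_utilization):
    # The dedicated server is paid for every hour, but on average only
    # avg_utilization of its capacity is doing useful work.
    return hourly_cost / (peak_tps * 3600 * avg_utilization)

def cost_per_txn_elastic(hourly_cost, peak_tps):
    # Idealized elasticity: we only pay for capacity while it is busy,
    # so every paid instance-hour runs at full throughput.
    return hourly_cost / (peak_tps * 3600)

def elastic_savings(avg_utilization, hourly_cost=1.0, peak_tps=1_000):
    dedicated = cost_per_txn_dedicated(hourly_cost, peak_tps, avg_utilization)
    elastic = cost_per_txn_elastic(hourly_cost, peak_tps)
    return 1 - elastic / dedicated

for util in (0.10, 0.20):
    print(f"average load {util:.0%}: elastic saves {elastic_savings(util):.0%}")
```

In this model the savings are simply 1 minus the average utilization, which gives 90% and 80% for the two load factors. Real elastic systems scale in coarse steps and with some lag, so actual savings would likely be somewhat lower.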

For these reasons, I would like to see TPC create a category of benchmarks that measure cost/performance on standard cloud infrastructures.

Here is a link to Nobu, who ran TPC-C on Amazon’s RDS.

Friday, November 27, 2009

Virtual Databases: The Face of the New Cloud Database

Shared-disk databases can be virtualized—making them cloud-friendly—while shared-nothing databases are tied to a specific computer and a specific data set or data partition.

The underlying principle of the shared-nothing RDBMS is that a single master server owns its specific set of data. That data is not shared, hence the name shared-nothing. Because there is no ability to share the data, there is also no ability to virtualize the computing of that data. Instead, the shared-nothing RDBMS ties the data and the computing to a specific computer. This association with a physical machine is then reinforced at the application level. Applications that use a shared-nothing database partitioned across more than one server rely on routing code. Routing code simply directs the various database requests to the servers that own the data being requested. In other words, the application must know which server owns which piece of data. This further reinforces the mismatch between shared-nothing databases and virtualization.
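To make the idea of routing code concrete, here is a minimal Python sketch of the kind of lookup an application performs against a range-partitioned, shared-nothing database. The class name, server names and key boundaries are hypothetical; real deployments differ in how they partition data and where they keep this map.

```python
import bisect

class RangeRouter:
    """Hypothetical routing layer: maps each key to the one server that owns it."""

    def __init__(self, boundaries, servers):
        # servers[0] owns keys below boundaries[0]; servers[i + 1] owns keys
        # from boundaries[i] up to (but not including) boundaries[i + 1].
        if len(servers) != len(boundaries) + 1:
            raise ValueError("need exactly one more server than boundaries")
        self.boundaries = list(boundaries)
        self.servers = list(servers)

    def server_for(self, key):
        # bisect_right counts how many boundaries are <= key.
        return self.servers[bisect.bisect_right(self.boundaries, key)]

router = RangeRouter(boundaries=[1_000, 2_000], servers=["db-a", "db-b", "db-c"])
print(router.server_for(42))     # db-a
print(router.server_for(1_500))  # db-b
print(router.server_for(2_000))  # db-c
```

Because this map lives in the application tier, adding or removing a server means changing the boundaries and physically moving data, which is exactly what ties a shared-nothing database to specific machines.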

This is not to say that it is impossible to virtualize a shared-nothing database. As any software architect will tell you, “You can do anything in software…” The second part of that statement is “…but it may not perform or scale well, and it may make maintenance very painful.” The latter part of that statement is exactly what you will find with any effort to virtualize a shared-nothing database. Attempts to insert layers of indirection will result in added complexity that makes maintenance a nightmare. Finding bugs, tuning performance, recovering from failure, all of these issues are severely compounded when you introduce layers of indirection in a shared-nothing database.

Performance, and hence scalability, is also undermined in this model. In order to support dynamic virtualization, you must mediate the requests from the application before they hit the database. This requires a piece of middleware that sniffs each database request and routes it to the appropriate server. What happens when a database request spans multiple servers? Suffice it to say it isn’t pretty, and it doesn’t perform well. This sort of request results in a lot of data shipping and joins across servers. The bottom line is that partitioning your database to achieve performance, scalability and maintainability is a black art; all attempts to automate this process have failed.
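The cross-server case can be sketched as follows. This hypothetical middleware step groups the keys a request touches by their owning server; whenever more than one server appears in the plan, the request must be fanned out and the partial results shipped back and joined. The modulo ownership rule and the server names are assumptions for illustration only.

```python
from collections import defaultdict

SERVERS = ["db-a", "db-b", "db-c"]  # hypothetical three-server cluster

def owner_of(key):
    # Hypothetical ownership rule: hash partitioning by modulo.
    return SERVERS[key % len(SERVERS)]

def plan_request(keys):
    """Group the keys a request touches by the server that owns them."""
    plan = defaultdict(list)
    for key in keys:
        plan[owner_of(key)].append(key)
    return dict(plan)

local = plan_request([3, 6, 9])     # every key lives on db-a
spanning = plan_request([1, 2, 3])  # keys live on three different servers
print(len(local), len(spanning))    # 1 3
```

Only the single-server plan can run locally; the spanning plan requires contacting all three servers and combining their results before the application sees an answer.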

Compare this to the shared-disk DBMS. Shared-disk separates the compute from the storage. The data is stored in one big trough, while you can have any number of compute instances feeding on the entirety of that data. Because each node has access to all of the data, you don't need any middleware to route the database requests to specific servers. Furthermore, each of the compute nodes is identical, making them virtualization-friendly. If one node fails, the others recover the transactions, while the application continues uninterrupted. You can also add nodes on the fly, again without interrupting the application. For these reasons, the shared-disk RDBMS is ideal for virtualization, while the shared-nothing RDBMS is anathema to virtualization.
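By contrast, a request dispatcher for a shared-disk cluster needs no ownership map at all. The following minimal Python sketch assumes every compute node can read all of the data on shared storage. The class and node names are hypothetical, and the sketch models only request dispatch, not the transaction recovery a real shared-disk database performs when a node fails.

```python
class SharedDiskPool:
    """Hypothetical dispatcher: any healthy node can serve any request."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self._counter = 0

    def add_node(self, node):
        # Scale out on the fly: the new node sees all data immediately,
        # so there is nothing to repartition.
        self.nodes.append(node)

    def remove_node(self, node):
        # A failed node is simply dropped; its peers keep serving requests.
        self.nodes.remove(node)

    def pick_node(self):
        # Simple round-robin; no key-to-server lookup is needed.
        if not self.nodes:
            raise RuntimeError("no compute nodes available")
        node = self.nodes[self._counter % len(self.nodes)]
        self._counter += 1
        return node

pool = SharedDiskPool(["node-1", "node-2"])
pool.add_node("node-3")
pool.remove_node("node-1")  # simulate a node failure
print([pool.pick_node() for _ in range(4)])
# ['node-2', 'node-3', 'node-2', 'node-3']
```

Since the nodes are interchangeable, adding or losing one changes only the pool membership, which is what makes this model a natural fit for virtualization.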

This is an excerpt from a white paper I'm writing that addresses virtualized cloud databases.

Friday, November 20, 2009

Who owns the customer in the cloud?

In the world of technology, customer ownership has always been a huge issue. The company that owns the relationship is able to influence the purchasing decisions that surround its product(s). For example, if the customer is tied into a specific application, that application can influence purchases down the stack (database, operating system, etc.). In these cases, the more specific products (e.g. vertical applications) had an inherent advantage over the more generic or interchangeable ones (e.g. databases).

Then companies began to standardize on certain infrastructure elements. For example, a company might say “We are a Windows shop” or “We are an Oracle shop” and unless you had a REALLY compelling reason, you had to run on that infrastructure.

Cloud computing introduces a new dynamic. For example, you might be an HP shop, but if you use Amazon AWS, whose equipment are you running on? They won’t tell you. What storage equipment is used in S3 and EBS? Sorry, can’t say. You see, those pieces are commodities.

In time, the process of commoditization will move right up the stack, and the cloud vendor strongly influences this process. Sure, you can run various databases on AWS; just create your own AMI. But if you want the vertically integrated, “out-of-the-box” database, that would be RDS running MySQL.

So, who owns that MySQL customer? Amazon, not MySQL/Sun/Oracle. The Amazon package bundles EBS, automation and more, and only Amazon can support it properly.

Who owns the customer when an application like SugarCRM is run on Amazon? That depends on who sold the customer. If SugarCRM sold the customer and offered it as SaaS, then they own the customer. If the customer goes directly to Amazon, then Amazon owns the customer. If Amazon takes it one step further by integrating various other services, tools, etc. and then brands the resulting package as Amazon CRM, then Amazon will increasingly own the customers of the future.

In fact, I wouldn’t be surprised to see Amazon offer a suite of fully integrated business applications based upon open source applications and the LAMP infrastructure, sort of a back-office suite. Then you don’t have to worry about all those messy integration issues. Your CRM application will work with your document management application, etc. Sure, you could assemble your own solution in a piecemeal fashion, but why would you?

At that point, the question is not which CRM you use, but rather which cloud suite you use, and whether it is compatible with mine. In other words, does the Amazon suite interoperate with the Cisco, EMC/VMware, HP, IBM, and Rackspace suites?

The starting point for all of these solutions will be LAMP and open source applications. But each will focus initially on automation to simplify the effort, like RDS. Then they will deliver interoperability. Finally they will innovate in proprietary ways to deliver a better experience. And with each step they will further establish their ownership of the customer. This is why every major technology company, with the exception of Oracle, is assembling their own cloud solution, because they don’t want someone else owning their customers.

Friday, October 30, 2009

The Cloud vs. Open Source

Let’s get ready to rumble! Providing cloud services (a la Amazon AWS) is a business of slim margins. Because of this, cloud vendors are more than happy to exploit open source to keep their costs low. However, what happens when they siphon off support business from the open source vendors themselves? The cloud vendor becomes the single point of contact/support for the entire collection of tools, so who needs a support contract with the individual open source vendors? What revenue crumbs does this leave for the FOSS companies to live on? Not much.

The latest example of this trend is Amazon’s Relational Database Service (RDS). It is essentially a packaging and automation of vanilla MySQL. They automate setup and administration. They also restrict things like slaves and replication, because they are a pain to manage. But they provide a failover solution (basically attaching your data to a fresh machine), which will address some use cases. The out-of-the-box integration with EBS makes it a breeze to work with. RDS makes it quick and painless to get MySQL running, so why roll your own on-premises solution?

As competition in the cloud intensifies, I suspect that this trend will accelerate. Cloud vendors will integrate various tools, provide automation and become the single point of contact for support. This approach lowers the ultimate cost to consumers, simplifies their support process, and creates barriers to exit for customers, while maintaining the cloud vendors’ margins. In short, if you think the cloud is cannibalizing FOSS revenues now, you ain’t seen nuthin yet.

Soon we will see round two in this battle of the titans. Cloud vendors, in an effort to differentiate themselves from one another, will offer proprietary extensions/modifications to open source. It’s just a matter of time. These extensions may be developed in-house, or they may be acquired from third parties. What is the motivation to contribute these extensions back to the open source community? Legally, the cloud vendors are fine, since they don’t redistribute the code. So why give them back, just to have your competitors integrate them into their own cloud services?

How could the FOSS community fight back? The only approach I see is the legal approach. If the FOSS license agreements redefine cloud/SaaS as a form of distribution that requires open sourcing any extensions or modifications, they might have a chance. Maybe this comes in the form of a “hosting for third-party use” clause or something. Otherwise, just like the classic Buggles song “Video Killed the Radio Star”, Cloud just might kill FOSS.

Thursday, October 29, 2009

Open Source Licensing Considerations

The two predominant forms of open source licenses are BSD and GPL. PostgreSQL is licensed under the BSD license, while MySQL is licensed under GPL. While the details are arcane, the business impact is significant, and that is what this post addresses.

The BSD (or BSD-style) License: This license basically says: ‘This code is provided as is, do what you want with it, and include this copyright in your resulting product.’

The GPL License: This license, also known as the copyleft license, essentially says: ‘This is free and distributed as source code, and any addition or extension must also be distributed under these exact terms.’

BSD essentially says: I prefer open source code, so I’m making my source code open and freely available, but what you do with it is your own business. GPL is based upon the belief that all software should be open source, as espoused by Richard Stallman and the Free Software Foundation (FSF). The GPL license acts like a virus, attaching itself to anything that combines with that GPL code. I don’t mean to imply negative intentions; their intention is to ensure that open source does not become perverted through the insertion of proprietary code, which is a very admirable goal.

Companies that want to operate in the ecosystem of a GPL product must agree to forgo the most common and most profitable business model used in software, namely licensing of closed source applications. (Note: This excludes those companies that utilize hybrid licensing, of course.) While the intentions behind GPL are good, there are unintended consequences. Consider the following situations:

1. Inbound License Conflicts: My application might include licensed images, code, linked libraries, etc. that are not under the GPL and whose licenses are incompatible with it. I cannot use these in a GPL environment.

2. Reuse of Code in Other Products: If a GPL product (e.g. MySQL) has a really cool piece of code I want to deploy in another product, whether closed source or open source, that is licensed under a different model, I cannot do so.

3. If I donate my code to a GPL effort, giving them the copyright, I cannot reuse that code in a non-GPL product, unless the GPL product uses a dual license for the copyright (AKA shared copyright).

4. Niche Markets: All companies must make money in order to survive. If your software is free, then you can make money in ancillary ways, such as charging for support or consulting. This is fine if you have a large number of users. Consider that only 1 in 14,000 MySQL users pays for support. Let’s assume that you invest considerable effort into building a niche product that appeals to a total addressable market of 10,000 customers, and over time you get 50% market share, or 5,000 customers. Now, if you charged a $1,000-per-year license subscription, you would make $5,000,000 per year. Even if your user base was 10% of this size as a result of charging, you would still make $500,000 off of 500 customers. But if you charged $1,000 per year for support and 1 in 5,000 users paid (nearly three times the MySQL rate), you would make $1,000 per year. In conclusion, it is very difficult to recoup your investment of time and money if you invest in a niche market and you are prohibited from charging a license fee.
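The arithmetic in the niche-market example can be checked with a few lines of Python. The $1,000 price and the customer counts come from that scenario; the function names are just illustrative.

```python
PRICE = 1_000  # dollars per customer per year, from the scenario above

def license_revenue(customers, price=PRICE):
    # Every customer pays the annual license subscription.
    return customers * price

def support_revenue(users, one_in, price=PRICE):
    # Only one in `one_in` free users buys a support contract.
    return users / one_in * price

print(license_revenue(5_000))                  # 5000000
print(license_revenue(500))                    # 500000
print(support_revenue(5_000, 5_000))           # 1000.0
print(round(support_revenue(5_000, 14_000)))   # 357 at the MySQL-like ratio
```

At the 1-in-14,000 ratio cited for MySQL, the 5,000-user niche product would not even produce one paying support customer.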

These are just a few of the challenges one faces when working within the restrictions of a GPL license, as opposed to other less restrictive open source licenses. GPL makes it more difficult to assemble a thriving ecosystem because it limits the types of applications and business models the ecosystem can use.

GPL extends the terms of its license to cover additions and extensions to GPL products in an effort to ensure that the code remains open source. But is this really necessary? If we look at Postgres, it uses the more permissive BSD-style license. Yet Postgres remains open and supports a thriving ecosystem. One might argue that MySQL is larger, thus validating its licensing model. I believe that MySQL’s size is not tied to its license. The real impact of the license is tested when companies build sustainable ecosystems around their products, and in that realm, the jury is still out.

Being a Platform is Everything

“…when you get the money, you get the power. Then when you get the power, then you get the women.” --Tony “Scarface” Montana

In the world of computing, first you get the users, then you get the applications, then you get the power. What do I mean by power? In a word: “platform.” If the only way for users to get applications is through you, and the only way for application developers to get to users is through you, then you are a platform. If you continue to nurture and grow your platform, your company is immortal, a goose that will continue to lay golden eggs.

To get the users, you need to deliver immediate value. Once you achieve a critical mass of users, the developers will start showing up, whether you want them or not. A good example of this was Myspace. They attracted so many users that developers started providing extensions directly to these users without Myspace’s blessing. But instead of embracing these developers and their applications, and thereby achieving immortality, Myspace took the perspective that these applications were leeches cutting in on their franchise. Facebook, then a distant second, embraced developers, and the rest is history: Facebook is growing, Myspace is shrinking.

Another classic example of the power of the developer is the iPhone. Before the iPhone, the carriers would pick and choose which applications would be “on deck” and thereby available to that carrier’s users. It was a long and expensive process and you had to run separate processes for each carrier.

The iPhone came along and made it easy for users to find and use any number of applications and load them on their phone. This has turned the phone industry on its head. Building a developer-friendly platform is in Apple’s DNA; clearly it wasn’t in the DNA of the mobile carriers, but they are learning.

The challenge in dealing with developers is that if you invite them in the front door, they will then want access to the back door and the side doors. By this I mean that if you provide a base platform with a certain set of functions, the initial wave of applications will build on top of this platform. Then others will find deficiencies in the platform and will want to extend the core platform. This is analogous to going in the back door. Then other developers will want to connect your platform to other applications or services (the side door).

If you only open the front door and barricade the side doors and the back door, you expose yourself to the risk of a more open platform stealing your users and application developers. Apple has historically opened only the front door to developers. They are following this model once again with the iPhone, resulting in jailbreaking efforts. If another device comparable to the iPhone comes along and provides a more open and flexible platform it too could displace the iPhone. The Palm Pre is just being released and they are talking about opening it to all applications. Google’s Android operating system is also a threat.

How does this topic apply to ScaleDB? MySQL has become a platform, but will they continue to nurture and grow the platform, or will they barricade the side and back doors, thus driving developers into the open arms of competitors like PostgreSQL? See my next post…