Friday, September 17, 2010

ScaleDB Cache Accelerator Server (CAS): A Game Changer for Clustered Databases

ScaleDB and Oracle RAC are both clustered databases that use a shared-disk architecture. As I have mentioned previously, they both actually share data via a shared cache, so it might be more appropriate to call them shared-cache databases.

Whether it is called shared-disk or shared-cache, these databases must orchestrate the sharing of a single set of data amongst multiple nodes. This introduces two challenges: the physical sharing of the data and the logical sharing of the data.

Physical Sharing:
Raw storage is meant to work on a 1:1 basis with a single server. In order to share that data amongst multiple servers, you need either a Network File System (NFS), which shares whole files, or a Cluster File System (CFS), which shares data blocks.

Logical Sharing:
This is specific to databases. A database may request a single block of data from the storage and then it may coordinate multiple sequential changes to that block, with only the final results being written back to the storage. The database can also discriminate between reading the data and writing the data, to facilitate parallelizing these actions.

Databases must control the logical sharing of data, in order to ensure that the database doesn’t become corrupted or inconsistent, and to ensure that it provides good performance. Because logical sharing is very specific to the database, it is something that clustered databases must handle themselves. This function is addressed by a lock manager.

Physical sharing of data requires less integration with the database logic. As such, you can use a general-purpose NFS or CFS to provide the physical file-sharing capabilities. This is the approach Oracle RAC takes: it relies upon Oracle Cluster File System 2 (OCFS2) to provide generic physical file sharing. OCFS2 in turn relies upon a SAN or NAS that supports multi-attach, since all of the database nodes must share the same physical files. The SAN or NAS then handles data duplication for high availability, along with other services such as backup.

ScaleDB takes a different approach. ScaleDB not only handles the logical data sharing—with its lock manager—but it also handles the physical data sharing with its Cache Accelerator Server (CAS). CAS connects directly to the storage and handles the sharing of that data among the database nodes. Because CAS is purpose-built for the ScaleDB database it does not need services such as membership management, which create complexity and overhead in a general purpose CFS. Furthermore, ScaleDB is able to tune the CAS, in conjunction with the lock manager, to extract superior performance.

CAS also offers additional benefits. It provides a scalable shared cache that enables the database nodes to share data via the cache, which is much faster than sharing via the disk. Furthermore, since it eliminates the need for an NFS or CFS, it enables you to work with any storage. You can choose to use local storage—inside the CAS—cloud storage, or a SAN or NAS. Many in the MySQL community balk at the high cost of SAN storage, with its Fibre Channel adapters and switches. CAS supports low-cost local storage, while providing a seamless path to high-end storage as needed. Furthermore, CAS servers are deployed in pairs, so the data is mirrored. Because the data is mirrored, you have redundant storage, even when using local storage inside the servers running CAS. Because it can operate on commodity hardware and because it works with any storage, CAS is ideal for cloud computing.

In summary, clustered databases like Oracle RAC and ScaleDB must implement their own lock managers to manage the logical sharing of data amongst the database nodes. Providing a purpose-built solution for the physical sharing of the data, while not required, does provide some significant advantages over using a general purpose NFS or CFS.

Tuesday, August 31, 2010

Cloud Insight: HP, Dell, 3PAR, VMWare & ScaleDB

The bidding war between HP and Dell for 3PAR has created great theater. The rationale is simple: both HP and Dell want a complete set of products to sell into the new cloud space, and 3PAR is the only bite-sized morsel among EMC, IBM and Hitachi that addresses this space. What is the compelling advantage 3PAR offers in storage? Elasticity. 3PAR provides the ability for companies to add/remove storage in thin slices (AKA thin provisioning). How does this relate to ScaleDB? We do the exact same thing for databases in the cloud, and we do it for the most popular database in the cloud: MySQL.

How does VMWare play into this? Their CEO Paul Maritz was on CNBC talking about the hybrid cloud and how companies want to run core cloud capabilities on premise and then use the public cloud providers to handle compute overflow during peak usage. This means that the public cloud's value to corporations, assuming Maritz is correct, is based largely on its ability to provide elasticity. It will no longer be sufficient for public cloud companies to provide reserved servers, because the reserved servers will be run in the company’s data center. The public cloud will add/remove servers to handle peaks in usage. So elasticity is EVERYTHING. ScaleDB is all about elasticity for the database.

It is also interesting to note from the Maritz interview that he sees the next wave of cloud (and hence the next wave of cloud consolidation) coming from the software sector. More specifically, the ability to take existing applications and make them run on the cloud. In other words, to make them elastic. Again, this is exactly what ScaleDB does. We take existing MySQL applications and make them elastic.

It is also interesting to note that HP and Dell have decimated their own R&D and are now looking to acquire that expertise from outside, and they are willing to pay for the expertise.

Another theme playing out in the background makes this situation even more interesting. Oracle has adopted a systems approach, where they combine their hardware and software:

“The heart of the interview focused on Oracle's interest in Sun. By combining Sun's expertise in hardware with Oracle's software, Ellison suggested, the combined company can become a powerful "systems" company that sells solutions to businesses. The competitor that Ellison wants to beat: IBM.”

Summary: Cloud is the next battle ground. It all starts with the hardware/infrastructure (e.g. 3PAR) and then moves upstream to software. Oracle will be focused on selling complete systems, alienating HP & Dell, among others. This is compounded by the fact that HP and Dell have decimated their R&D, so they are forced to partner/acquire. At the same time, if Maritz’s vision of public clouds becoming effectively excess capacity for handling peaks from corporations is realized, then elasticity in the cloud will become critical as well. This obviously plays to ScaleDB’s strengths.

Tuesday, August 10, 2010

Comparing ScaleDB’s Shared Cache Tier vs. NFS and CFS

Prior posts addressed the performance benefits of a shared cache tier (ScaleDB CAS) and also the storage flexibility it enables. This post compares ScaleDB's purpose-built CAS file-sharing system against off-the-shelf solutions like NFS and various cluster file systems (CFS).

When using a clustered database, like ScaleDB, each node has full access to all of the data in the database. This means that the file system (SAN, NAS, Cloud, etc.) must allow multiple nodes to share the data in the file system.

Options include:
1. Network File System (NFS)
2. Cluster File System (CFS)
3. Purpose-built file storage interface

Locking Granularity:
I won’t get deeply into the nuances of CFS (block-level locking) and NFS (file-level locking, though you can address ranges within a file); suffice it to say that, generally speaking, NFS and CFS allow you to operate on blocks of data, which are typically 8KB. Say you want to operate on a 200-byte record within an 8KB block: you are locking 8KB instead of 200 bytes, or roughly 40X more than necessary.
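
The arithmetic behind that 40X figure, as a quick sketch (the 8KB block and 200-byte record sizes are the example figures above, not fixed properties of any particular file system):

```python
# Block-level vs. record-level locking granularity, using the
# example sizes from the text above.

BLOCK_SIZE = 8 * 1024   # bytes locked by a block-level lock (NFS/CFS style)
RECORD_SIZE = 200       # bytes actually being modified

# A block-level lock serializes writers of ALL records in the block;
# a record-level lock only serializes writers of the SAME record.
records_per_block = BLOCK_SIZE // RECORD_SIZE

print(BLOCK_SIZE / RECORD_SIZE)  # ~40x more data locked than necessary
print(records_per_block)         # up to 40 records' writers contending on one lock
```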

ScaleDB’s CAS uses a purpose-built interface to storage that is optimized to leverage insight from the cluster lock manager. This enables it to lock the storage on the record level. In situations where multiple nodes are concurrently accessing data from the same block, this can be a significant performance advantage. This reduces the contention between threads/nodes enabling superior performance and nodal scalability.

Intelligent Control of RAM vs. Disk:
When writing data to storage, you can either flush it directly to disk or you can store it in cache, allowing the disk flushing to occur later, outside of the transaction. Some things, like log writing require the former, while other things work just fine (and faster) with the latter. Unfortunately, generic file systems like NFS and CFS are not privy to this insight, so they must err on the side of caution and flush everything to disk inside the transaction.
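
The write-through vs. write-back distinction described above can be sketched as follows. This is an illustrative model, not ScaleDB's actual implementation: `durable=True` stands in for writes (like log records) that must hit disk inside the transaction, while everything else is deferred to a background flush.

```python
# Illustrative model: a database-aware cache can defer non-critical
# writes (write-back), while a generic file system must assume every
# write needs immediate durability (write-through).

class Cache:
    def __init__(self):
        self.ram = {}       # in-memory copy of pages
        self.disk = {}      # simulated persistent storage
        self.dirty = set()  # pages written to RAM but not yet flushed

    def write(self, page, data, durable=False):
        self.ram[page] = data
        if durable:                 # e.g. a log record: flush inside the txn
            self.disk[page] = data
            self.dirty.discard(page)
        else:                       # ordinary data page: defer the flush
            self.dirty.add(page)

    def background_flush(self):     # runs later, outside any transaction
        for page in list(self.dirty):
            self.disk[page] = self.ram[page]
        self.dirty.clear()

c = Cache()
c.write("log:1", "commit", durable=True)   # hits disk immediately
c.write("data:7", "row v2")                # stays in RAM for now
c.background_flush()                       # flushed off the critical path
```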

ScaleDB’s CAS is privy to the intelligence inside the database. It is therefore able to push more data into cache for improved performance. Furthermore, this optimization can be configured by users, based on their own requirements. The net result is superior performance.

As general-purpose solutions, NFS and CFS cannot benefit from the insight and intelligence available from the internal operation of the database; instead, they must act in a generalized manner. ScaleDB’s Cache Accelerator Server (CAS) leverages insight gleaned from the cluster lock manager, and from user configurations, to optimize its interaction with storage. This makes CAS more efficient and scalable, and it improves performance.

Wednesday, August 4, 2010

Shared Cache Tier & Storage Flexibility

Any time you can get two for the price of one (a “2Fer”), you’re ahead of the game. By implementing our shared cache as a separate tier, you get (1) improved performance and (2) storage flexibility…a 2Fer.

What do I mean by storage flexibility? It means you can use enterprise storage, cloud storage or PC-based storage. Other shared-disk cluster databases require high-end enterprise storage like a NAS or SAN. This requirement was driven by the need for:

1. High-performance storage
2. Highly available storage
3. Multi-attach, or sharing data from a single volume or LUN across multiple nodes in the cluster.

Quite simply, you won’t see other shared-disk clustering databases using cloud storage or PC-based storage. However, the vast majority of MySQL users rely on PC-based storage, and most are not willing to pay the big bucks for high-end storage.

ScaleDB’s Cache Accelerator Server (CAS) enables users to choose the storage solution that fits their needs. See the diagram below:

Because all data is mirrored across paired CAS servers, you get high availability: if one server fails, the other continues running. Built-in recovery completes the HA solution. If you want further reassurance, you can use a third CAS as a hot standby. This means that you can use the internal hard drives on your CAS servers to provide highly available storage.

The next post in this series on CAS will compare ScaleDB CAS, Network File System (NFS) and Cluster File System (CFS).

Monday, August 2, 2010

Database Architectures & Performance II

As described in the prior post, the shared-disk performance dilemma is simple:

1. If each node stores/processes data in memory, versus disk, it is much faster.
2. Each node must expose the most recent data to the other nodes, so those other nodes are not using old data.

In other words, #1 above says flush data to disk VERY INFREQUENTLY for better performance, while #2 says flush everything to disk IMMEDIATELY for data consistency.

Oracle recognized this dilemma when they built Oracle Parallel Server (OPS), the precursor to Oracle Real Application Cluster (RAC). In order to address the problem, Oracle developed Cache Fusion.

Cache Fusion is a peer-based shared cache. Each node works with a certain set of data in its local cache until another node needs that data. When one node needs data from another node, it requests it directly from that node’s cache, bypassing the disk completely. In order to minimize this data swapping between the local caches, RAC applications are optimized for data locality. Data locality means routing certain data requests to certain nodes, thereby enjoying a higher cache-hit ratio and reducing data swapping between caches. Static data locality, built into the application, severely complicates the process of adding nodes to, or removing nodes from, the cluster.
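
A hypothetical sketch of why static data locality complicates adding or removing nodes. The naive modulo routing below is an illustration, not how RAC actually routes requests: the point is that when the node count changes, most keys map to a different node, invalidating each node's carefully warmed cache.

```python
# Hypothetical sketch of static data locality: each key is pinned to
# a node so that node's cache stays hot for that data.
import hashlib

def route(key, num_nodes):
    """Stable hash routing of a key to one of num_nodes nodes."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_nodes

keys = ["cust:%d" % i for i in range(1000, 1020)]

before = {k: route(k, 4) for k in keys}   # 4-node cluster
after = {k: route(k, 5) for k in keys}    # after adding a fifth node

# With naive modulo routing, most keys change owners when the cluster
# grows, so the per-node caches the application worked to keep hot are
# largely invalidated -- static locality fights elasticity.
moved = [k for k in keys if before[k] != after[k]]
```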

ScaleDB encountered the same conflict between performance and consistency (or RAM vs. disk). However, ScaleDB’s shared cache was designed with the cloud in mind. The cloud imposes certain additional design criteria:

1. The number of nodes will increase/decrease in an elastic fashion.
2. A large percentage of MySQL users will require low-cost PC-based storage.

Clearly, some type of shared-cache is imperative. Memcached demonstrates the efficiency of utilizing a separate cache tier above the database, so why not do something very similar beneath the database (between the database nodes and the storage)? The local cache on each database node is the most efficient use of cache, since it avoids a network hop. The shared cache tier, in ScaleDB’s case, the Cache Accelerator Server (CAS), then serves as a fast cache for data swapping.

The Cluster Manager coordinates the interactions between nodes. This includes both database locking and data swapping via the CAS. Nodes maintain the data in their local cache until that data is required by another node. The Cluster Manager then coordinates that data swapping via the CAS. This approach is more dynamic, because it doesn’t rely on a prior knowledge about the location of that data. This enables ScaleDB to support dynamic elasticity of database nodes, which is critical for cloud computing.

The following diagram describes how ScaleDB’s Cache Accelerator Server (CAS) is implemented.

While the diagram above shows a variety of physical servers, these can be virtual servers. An entire cluster, including the lock manager, database nodes and CAS could be implemented on just two physical servers.

Both ScaleDB and Oracle rely upon a shared cache to improve performance, while maintaining data consistency. Both approaches have their relative pros and cons. ScaleDB’s tier-based approach to shared cache is optimized for cloud environments, where dynamic elasticity is important. ScaleDB’s approach also enables some very interesting advantages in the storage tier, which will be enumerated in subsequent posts.

Tuesday, July 20, 2010

Database Architectures & Performance

For decades the debate between shared-disk and shared-nothing databases has raged. The shared-disk camp points to the laundry list of functional benefits such as improved data consistency, high-availability, scalability and elimination of partitioning/replication/promotion. The shared-nothing camp shoots back with superior performance and reduced costs. Both sides have a point.

First, let’s look at the performance issue. RAM (average access time of 200 nanoseconds) is considerably faster than disk (average access time of 12,000,000 nanoseconds). Let me put this 200:12,000,000 ratio into perspective: a task that takes a single minute in RAM would take roughly 41 days on disk. So why do I bring this up?

Shared-Nothing: Since the shared-nothing database has sole ownership of its data—it doesn’t share the data with other nodes—it can operate in the machine’s local RAM, only writing infrequently to disk (flushing the data to disk). This makes shared-nothing databases very fast.

Shared-Disk: Cannot rely on the machine’s local RAM, because every write by one node must be instantly available to the other nodes, to ensure that they don’t use stale data and corrupt the database. So instead of relying on local RAM, all write transactions must be written to disk. This is where the 1 minute to 41 days ratio above comes into play and kills performance of shared-disk databases.
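
Working out the ratio above (the 200 ns and 12 ms figures are the averages quoted earlier, not benchmarks):

```python
# RAM vs. disk access time, using the article's figures.

ram_ns = 200           # average RAM access, nanoseconds
disk_ns = 12_000_000   # average disk access, nanoseconds

ratio = disk_ns / ram_ns        # disk is 60,000x slower per access
minutes_on_disk = 1 * ratio     # one minute of RAM-speed work, done on disk
days = minutes_on_disk / (60 * 24)

print(ratio)          # 60000.0
print(round(days, 1)) # 41.7 -- "a minute in RAM is 41 days on disk"
```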

Let’s look at some of the ways databases can utilize RAM instead of disk to improve performance:

Read Cache: Databases typically use the RAM as a fast read cache. Upon reading data from the disk, this data is stored in the read cache so that subsequent use of that data is satisfied from RAM instead of the disk. For example, upon reading a person’s name from disk, that name is stored in the cache for fast access. The database wouldn’t need to read that name from disk again until that person’s name is changed (rare), or that RAM space is reused for a piece of data that is used more frequently. Read cache can significantly improve database performance.

BOTH shared-disk and shared-nothing databases can exploit read cache. The shared-disk database just needs a system to either invalidate or update the data in read cache when one of the nodes has made a change. This is pretty standard in shared-disk databases.

Background Writing: Writing data to the disk is by far the most time consuming process in a write transaction. During the transaction, that portion of the data is locked, meaning it is unavailable for other functions. So, if you can move the writing of the data outside of the transaction—write the data in the background—you get faster transactions, which means less locking contention, which means faster throughput.

SHARED-NOTHING can exploit this performance enhancement, since each server owns the data in its RAM. However, shared-disk databases cannot do this because they need to share that updated data with the other database nodes in the cluster. Since the local node's cache is not shared in a shared-disk database, the only option is to use the shared disk to share that data across the nodes.

Transactional Cache: The next step in utilizing RAM instead of disk is to use it in a transactional manner. This means that the database can make multiple changes to data in RAM prior to writing the final results to disk. For example, if you have 100 widgets, you can store that inventory count in RAM, and then decrement it with each sale. If you sell 23 widgets, then instead of writing each transaction to disk, you update it in RAM. When you flush this data to disk, it results in a single disk write, writing the inventory number 77, instead of writing each of the 23 transactions individually to disk.

SHARED-NOTHING can perform transactions on data while it is in RAM. Once again, shared-disk databases cannot do this, because you might have multiple nodes updating the inventory. Since they cannot look into each other's local RAM, they must once again write each transaction to disk.
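
The widget example can be sketched as follows. This is an illustrative model of a transactional cache, not actual database code: 23 sales are applied in RAM, and only the final value is written to disk.

```python
# Illustrative transactional cache: mutate the value in RAM, flush once.

class InventoryCache:
    def __init__(self, initial, disk_writer):
        self.count = initial           # authoritative copy lives in RAM
        self.disk_writer = disk_writer
        self.disk_writes = 0

    def sell(self, n=1):
        self.count -= n                # transaction completes in RAM only

    def flush(self):
        self.disk_writer(self.count)   # single write of the final value
        self.disk_writes += 1

disk = []                              # stands in for persistent storage
inv = InventoryCache(100, disk.append)
for _ in range(23):
    inv.sell()                         # 23 sales, zero disk writes so far
inv.flush()

print(inv.count)        # 77
print(inv.disk_writes)  # 1 disk write instead of 23
```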

As you can see, shared-nothing databases have an inherent performance advantage. The next blog post will address how modern shared-disk databases address these performance challenges.

Thursday, April 1, 2010

ScaleDB Introduces Clustered Database Based Upon Water Vapor

ScaleDB is proud to announce the introduction of a database that takes data storage to a new level, and a new altitude. ScaleDB’s patent-pending “molecular-flipping technology” enables low-energy molecular flipping that changes selected water molecules from H2O to HOH, representing positive and negative states that mimic the storage mechanism used on hard drive disks.

“Because we act at the molecular level, we achieve massive storage density with minimal energy consumption, which is critical in today’s data centers, where energy consumption is the primary cost,” said Mike Hogan, ScaleDB CEO. “A single thimble of water vapor provides the same storage capacity as a high-end SAN.”

The technology does have one small challenge: persistence. Clouds are not known for their persistence. ScaleDB relies on the Cumulus formation, since it is far beefier than some of those wimpy cirrus clouds. However, when deployed in the data center, the dry heat can be particularly damaging to cloud maintenance. One of the company’s patents centers around using heavy water, which resists evaporation and is therefore far more persistent than its lighter brethren. The company has already received approval from the IAEA to commercialize this technique.

This new technology considerably improves ScaleDB’s “green cred”. By greatly reducing energy consumption in data centers, it cuts their carbon footprint, leaving little more than a toeprint. Once the cloud storage—which has a 3-year half-life—is worn out, you can release it into the atmosphere. There it mingles with natural clouds, making them denser and more reflective. Leading IPCC climate scientists have modeled the effects of this mingling and the scientific consensus is that it will reduce global temperatures by 5-6 degrees centigrade within 20 years (+/- 10 degrees centigrade). The company is in negotiations with Al Gore to promote this new technology, but they cannot comment on these negotiations because the mere fact that such negotiations are in fact happening is covered by a strict NDA and the even more legally binding pinky promise.

ScaleDB set out to become THE cloud database company and today’s announcement really takes that to a whole new level. The tentative name for this new database is VaporWare.

Friday, February 26, 2010

Will the NoSQL Movement Unseat the Database Behemoths?

With the introduction of each new platform comes the opportunity for new thinking, new applications and new winners. DEC and Oracle were beneficiaries of the move to the minicomputer. Microsoft was the main beneficiary of the move to the PC. Sun rode the workstation to fame. Today’s exciting new platform is the cloud, and one of the upstart contenders is NoSQL.

One might argue that the cloud is merely the hosting of well established platforms such as the PC. Larry Ellison has made this very claim. However, the cloud is very different.

How is the cloud different? Sometimes when you combine things, the combination is very different than the components. For example, Salt (NaCl) is very different from its poisonous individual components. Cloud computing enjoys a similar combinatory effect. Sure it is merely a mixture of PC platforms, virtualization, lots of Linux and low-cost scalable disk arrays. But the combination is more about dynamic on-demand elasticity, elimination of capital expense, instant access to compute resources (versus slow hardware requisitioning), reduced IT headcount hassles, etc. In other words, cloud computing is no longer about the components, it is more about changing how we think about and use computing resources; it is a new paradigm for the consumption of computing resources.

With this new paradigm comes a new mentality. Cloud developers expect all aspects of the cloud to scale dynamically. This is where the shared-nothing SQL database comes up short. It is also where the NoSQL option excels.

We in the SQL world could easily dismiss NoSQL, saying NoSQL = NoEnterprise. How can you build a real application on something that doesn’t offer transactions, data consistency, SQL, etc.? Real database people turn up their noses at those little key-value-pair NoSQL toys. Not so fast.

SimpleDB just fired a shot across the bow of the database big boys with forced consistency. Sure you pay a price for this, and it should only be invoked when it is truly needed, but the point is you CAN do it. The history of technology is littered with the bodies of high-end products that were cannibalized from below, as lighter-weight platforms won the price/volume game. Cloud will definitely win the price/volume game; you simply cannot beat the economics. The question is who will win the cloud database war.

NoSQL databases (e.g. Cassandra, SimpleDB, BigTable, CouchDB, MongoDB, etc.) will continue to nibble away at the rationale for sticking with big SQL databases. As the leading web database, MySQL became the de facto cloud database, since web and Web 2.0 applications were the early adopters of the cloud. But MySQL cannot rest on its laurels. NoSQL solutions are nipping at MySQL’s heels and their dynamic elasticity is quite appealing.

Now enterprise customers are beginning to move to the cloud. At the same time, NoSQL solutions are adding capabilities once reserved for relational databases. This raises a LOT of questions:

1. Will NoSQL undermine its scalability as it adds more enterprise capabilities (Will these extensions bolt on smoothly or will they result in an awkward and ultimately unscalable Frankenstein)?

2. Will the big SQL database vendors continue to dismiss NoSQL as toys, or will they see them for the threat they are becoming (Should we expect the commercial database vendors to start buying NoSQL solutions)?

3. Will MySQL be the first to succumb to the NoSQL onslaught (Did Oracle just buy yesterday’s cloud database leader)?

4. Will a third-party candidate like ScaleDB, with its shared-disk architecture win with a “best of both worlds” approach that scales dynamically and provides enterprise SQL capabilities?

5. Will SQL and NoSQL co-exist as different tools for different problems, or will they evolve into direct competitors across most major segments?

My Thoughts:
At the moment, SQL databases and NoSQL are different tools for different problems. I think this remains the case, but I believe that NoSQL will spread its reach by adding capabilities that begin to eat into traditional relational database segments. I suspect that the large commercial database companies, after ignoring NoSQL for too long, will resort to buying some of them and integrating them into their product portfolios. Companies focused solely on worldwide scalability, like Google, will remain wedded to NoSQL, because any technology that doesn’t scale to 10,000 servers is a non-starter. Enterprises will take a “right tool for the job” approach, employing all of the above.

NoSQL and map-reduce technologies will excel in non-transactional roles like data warehouses, business intelligence (DW/BI). In the OLTP space, SQL databases will remain far more prominent. However, the pain of dynamically scaling shared-nothing databases—and sharding is a pain—will create a need for the dynamically elastic shared-disk databases like ScaleDB. The sweet spot for shared-disk probably peaks at about 80-100 database servers. This level of scaling should be sufficient for all but the largest companies. Beyond that, NoSQL (utilizing little or no scale-limiting constraints like forced consistency) will be the only option.

I would love to hear your thoughts in the comments section below…

Wednesday, February 10, 2010

Cloud Computing: Shared-Disk vs. Shared-Nothing

Anant Jhingran (IBM’s CTO, Information Management, Analytics and Optimization) challenged our assertion that the cloud benefits the shared-disk database architecture. For me to enter into a battle of technical vision with Anant is equivalent to bringing a knife to a gun battle, but I enjoy a good challenge.

1. Cloud storage: Anant argues that (a) SANs won’t beat local disk in costs; (b) many shared-nothing databases use SANs anyway. To quote Inigo Montoya from The Princess Bride: “Let me ‘splain. No, there is too much. Let me sum up.”

Response: (a) While some clouds use traditional SAN or NAS storage, the trend among clouds is to assemble large collections of low-cost disks using a cluster file system to handle disk striping and data redundancy thereby providing SAN-like capabilities. As a result, the economics are quite similar to those of local disk; (b) We play in the MySQL market, where the vast majority of the databases use the local disk, making the comparison quite valid…for us. That said, we find that MySQL also commands a large percentage of the installed base on the cloud, making the comparison valid in general.

My broader point: Historically, the shared-nothing database had many advantages over the shared-disk database, particularly in the area of shared storage. Two major factors were at play: (1) shared storage was very expensive; (2) shared-disk databases split the storage performance across multiple nodes, meaning that a storage system delivering performance “Z” would give each node of a 4-node shared-disk database only Z/4, making it expensive to deliver comparable per-node performance. The cloud minimizes (and is on a trajectory to eliminate) shared-nothing’s historical advantage in these areas, by getting cheaper and faster. By rendering these traditional shared-nothing advantages moot, the two architectures are able to compete on other attributes, where shared-disk excels, such as operational simplicity and dynamic elasticity. These advantages are particularly relevant to the cloud. Shared-disk actually reduces costs in the cloud because it: (i) eliminates the need for redundant slaves (since each node provides fail-over to the other nodes); (ii) provides more evenly balanced load, since nodes are not specialized; (iii) supports dynamic elasticity at the database node level, where you only use/pay for the instances you need at the time.

2. Network Bandwidth: Anant suggests that this point is moot in comparing traditional and cloud computing.

Response: Maybe in the IBM/DB2 world where “many of the shared-nothing implementations of our clients use SANs”, but this is not the case in the MySQL world. Network performance plays a huge part in comparing shared-storage vs. local storage. Again from a historical perspective, back when shared-nothing became all the rage and MySQL took off as the M in LAMP, Ethernet and Fast Ethernet were a serious bottleneck on shared-disk performance. Now, with Gigabit Ethernet, Fiber Channel and Infiniband, there is further leveling of the playing field. This is not cloud specific. Improvements in network performance leveled the playing field between the two database architectures, but the storage costs described above still played a big part. It was after the cloud changed the economics on storage that we began to see a reassessment of the traditional bias for shared-nothing.

3. Virtualization: Scaling stateless CPUs up/down is easier in the shared-disk architecture. But global state (e.g. locks) undermines the independence of the virtualized nodes. In addition, the database typically likes to take control of the entire stack.

Response: ScaleDB does not take control of the entire stack, instead it is VM-friendly. ScaleDB’s implementation of the shared-disk model relies on a centralized lock manager, which also coordinates buffers and recovery among the nodes. It serves to coordinate the independent actions of the nodes, not to control them. They continue to act independently, from the perspective of the application. This combination makes ScaleDB very cloud friendly. You can surely argue that shared-nothing can scale to a larger number of nodes, but (a) most applications can get by with 50 or fewer database nodes; and (b) the process of scaling database nodes and maintaining those nodes in a shared-nothing cluster is quite painful.

If your argument is that shared-nothing has less state, it does, but it imposes more state information on the application, load balancer and the storage than shared-disk, so it is a trade-off. The key is to manage state in a scalable manner as we do in the ScaleDB lock manager.

4. As I understand it, the argument is for duplicate machines and distributed data that are loosely coupled, enabling rapid kill/restart in case of failure. The argument being that this is easier in shared-nothing.

Response: If I understand your point correctly, this would be easier in shared-disk. Shared-nothing introduces complexity in keeping replicates, backups, general database file reorganization, and QOS issues in a multi-tenant environment. By avoiding this pain, shared-disk is easier to maintain than shared-nothing. In short, the kill/redirect model of shared-disk provides faster response to failure than the kill/restart model employed by shared-nothing, and it is far easier to maintain.

Conclusion: In answer to points #1 and #2 above, advances in networking and storage have narrowed the gap between shared-disk and shared-nothing. Cloud economics have then made this powerful shared storage economically compelling. For points #3 and #4, the advantage goes to shared-disk. In addition, the natural synergy between cloud computing and shared-disk databases goes much further:

a. Instead of using a fixed partitioning model like shared-nothing, shared-disk is dynamically elastic. You can add storage capacity and compute capacity on the fly without interruption or additional work. In addition to the flexibility this affords to the developer, it also enables scaling on demand. The static partitioning model of shared-nothing invariably results in reserving over-capacity to accommodate for usage spikes and future growth. Since the cloud enables on-demand allocation of resources on a pay-per-use model, shared-disk is simply more compatible with the cloud.

b. The elimination of the partitioning/sharding of data and the replication, promotion and synching of slaves reduces the burden on the user and on the cloud administrator. Look closely at Amazon’s RDS and you’ll see that these things are disabled because they are a pain to maintain. The simplicity of the shared-disk architecture wins this as well.

c. Economics 1: See the cloud database white paper I wrote on this. Compute instances are more expensive than storage in the cloud. Since shared-disk generally uses fewer compute instances—by eliminating slaves and through better distribution of database requests via cluster-level load balancing—the cost of a shared-disk system will, in most cases, be lower than shared-nothing.

d. Economics 2: Since shared-disk is more dynamic, enabling scaling on the fly, one can replace a large instance used by the more rigid shared-nothing database with a collection of smaller instances. Given the disproportionate pricing of large instances relative to the aggregate performance of less expensive smaller instances, it is more economical to use shared-disk in the cloud. Consider, for example, a 10-node shared-disk cluster at $.85 per hour versus a single Quadruple Extra Large Instance for a shared-nothing database at $2.40 per hour (almost three times as much). Then consider that you could scale down to two nodes in the shared-disk example during slow times, paying only $.17 per hour, instead of continuing to pay $2.40 per hour for the shared-nothing database.
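A quick back-of-the-envelope check of the pricing example above. The per-node price of $0.085/hour is inferred from the quoted figures (ten nodes at $.85/hour, two at $.17/hour); the 2010-era prices themselves are taken from the text, not current rates.

```python
# Assumed/inferred 2010-era hourly prices from the example in the text.
SMALL_NODE_PER_HOUR = 0.085   # inferred per-node price for the shared-disk cluster
QUAD_XL_PER_HOUR = 2.40       # quoted Quadruple Extra Large Instance price

shared_disk_peak = 10 * SMALL_NODE_PER_HOUR   # full 10-node cluster
shared_disk_idle = 2 * SMALL_NODE_PER_HOUR    # scaled down to 2 nodes off-peak

print(round(shared_disk_peak, 2))   # 10-node cluster cost per hour
print(round(shared_disk_idle, 2))   # scaled-down cost per hour
print(round(QUAD_XL_PER_HOUR / shared_disk_peak, 1))  # shared-nothing premium, ~2.8x
```

The ratio works out to roughly 2.8x at peak, matching the “almost three times as much” in the text, and the gap widens further once the cluster scales down during slow periods.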

I maintain my assertion that both network performance and cloud storage have leveled the playing field for the underlying economic and performance comparisons between shared-nothing and shared-disk databases. On such a level technical and economic field, the functionality, availability and operational ease-of-use delivered by shared-disk make it a superior solution for OLTP clustering in the cloud.

Friday, January 29, 2010

What is Cloud Computing? A Brief Answer

At the Oracle-Sun merger coming-out party, Larry Ellison asked “what is cloud computing?” suggesting it is the same old stuff of hardware, software and the Internet. Let me try to answer this question from various perspectives.

Cloud computing is an umbrella term that describes:
• Provisioning of compute services;
• Billing of those compute services.

Provisioning of Compute Services:
Compute services are provisioned from a pool of hardware/networking/power. In other words, you don’t buy or lease individual hardware and accoutrements; you simply use what you need from a pool of such resources.

The above describes the hardware layer; the software layer can also be shared or sandboxed. For example, Google offers a shared software layer: they provide the file system, key-value store, operating system, etc. Each of these is designed for multi-tenancy, and all users run on this same shared software layer. Amazon provides a sandboxed approach: you get your own sandbox with your choice of software (including their options like SimpleDB and RDS, or your own, like loading MySQL).

Billing of Compute Services:
There are two primary ways to pay for cloud computing: utility and subscription. Utility means you only pay for what you use. Subscription means you pay a recurring amount for the right to use something. Amazon provides examples of both. You can pay a subscription fee for an instance of a computer, regardless of how much you utilize it. But when paying for storage, you pay for usage only in a utility model. Private clouds might use chargeback billing, charging the departments that use their services according to utility or subscription models.
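The two billing models above can be sketched in a few lines. This is a minimal illustration of the distinction, not any provider’s actual pricing; the rates used are made up.

```python
def utility_cost(units_used, rate_per_unit):
    """Utility billing: pay only for what you actually use (e.g. GB-months)."""
    return units_used * rate_per_unit

def subscription_cost(periods, fee_per_period):
    """Subscription billing: flat recurring fee regardless of utilization."""
    return periods * fee_per_period

# A lightly used resource is cheaper under utility billing...
assert utility_cost(10, 0.15) < subscription_cost(1, 70.0)
# ...while heavy, steady usage can favor the flat subscription.
assert utility_cost(1000, 0.15) > subscription_cost(1, 70.0)
```

Amazon’s mix of both models, an instance subscription plus metered storage, is a hybrid of exactly these two functions.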

Cloud computing is an umbrella term that addresses the provisioning—ideally on demand—of compute resources, where the hardware layer supports multi-tenancy and the software layer can be shared or sandboxed. Cloud computing is usually billed, or charged back, according to a pay-as-you-go or utility model, a subscription model, or a hybrid of the two.

Definition by Perspective (Consumer):

Cloud consumer: I only pay for what I need at the time in small increments (e.g. hourly or GB transferred) and many annoying things like automated back-ups are automatically handled for me. I have no fixed costs (hardware, software, switches), just variable costs.

Definition by Contrast:
Traditional (dare I say legacy) computing relies on dedicated resources. You might share the networking, but you have a dedicated computer and probably dedicated storage, not to mention dedicated software. Your average utilization of the system might be 10%, with the excess capacity waiting for spikes in usage or allocated for future growth. In other words, you are paying 10X more than you should.

If you have a better definition of cloud computing please provide it in the comments.

Monday, January 25, 2010

Oracle, MySQL, the EU and Wayne Gretzky

“A good hockey player plays where the puck is. A great hockey player plays where the puck is going to be” -- Wayne Gretzky

Technically speaking, the EU did a good job. They recognized that, in its current state, there is little market overlap between MySQL and Oracle products. Sure, there was some overlap, and some Oracle customers would use the competitive threat of MySQL to extract lower pricing from Oracle. But looking at what the current installed bases are doing, they are not too competitive. And as the EU points out, Postgres and Ingres provide open source alternatives to Oracle’s high-end products.

Oracle, on the other hand, did a great job. They saw where the puck was heading—namely that MySQL had their sights set on the Enterprise market—and Oracle intercepted the pass.

The most telling story was what happened at the MySQL partners meeting at the 2009 MySQL conference in April. Oracle had just announced that they were acquiring Sun/MySQL. The partner meeting was kicked off by a presentation on MySQL’s future where every other word was scalable or enterprise. They clearly had their sights set on the enterprise market. Obviously, this presentation was created before the acquisition announcement.

Then came the Q&A period. Of course, the first question was “What does this acquisition mean to MySQL?” The answer went on about how Oracle was a scalable enterprise database and MySQL was really focused on smaller web applications. It was a very telling 180-degree strategic pivot.

Was this a good thing, a bad thing…that question is now moot. It is what it is. The EU did a good job—based upon the current status—while Oracle did a great job of seeing the future direction.

Does Drizzle now skate to where the puck is going in the cloud? Does MariaDB make a run at the Enterprise by itself? Does MySQL drive forward into the enterprise market with Oracle’s support, or in spite of Oracle? Do Postgres and Ingres get a lift from this, as the only viable open source enterprise databases? Will we see the rise of other competitive threats in the enterprise database market? I’m happy to hear your comments, but ultimately time will tell.

Friday, January 22, 2010

Your Opinion Please: Did Oracle Make Concessions to the EU?

Back when the EU started the investigation of the Oracle-Sun deal, I made a bet. The bet hinged on whether Oracle would make concessions to get the EU’s approval. Please review the arguments, pro and con, and help us settle the bet.

Issue #1: The 10-Point Commitment to Customers, Developers & Users of MySQL:
PRO CONCESSIONS: After meeting with the EU, Oracle issued this list of 10 concessions. Oracle prefaces the 10 points with the line: “In order further to reassure the Commission, Oracle hereby publicly commits to the following:” It then goes on to make certain commitments, including #2, the non-assertion policy, where it says “Oracle will change Sun’s current policy” and commits not to assert its copyright against storage engine vendors for 5 years. It continues: “Oracle shall reproduce this commitment in contractual commitments to storage vendors who at present have a commercial license with Sun.” Why would ANY company give up its legal rights without pressure? Clearly they made a concession. The press release includes other commitments and then closes with: “The geographic scope of these commitments shall be worldwide and these commitments shall continue until the fifth anniversary of the closing of the transaction.”

CON CONCESSIONS: This is a press release and nothing more. There is no binding legal agreement. At the bottom of this simple press release it says: “When used in this press release, the words “shall,” “plans,” “commits” and “will” and other similar expressions and any other statements that are not historical facts are intended to identify those assertions as forward-looking statements. Any such statement is subject to a number of potential risks and uncertainties…”

Issue #2: Oracle’s Press Release About EU Approval
CON CONCESSIONS: It is very clear in the title: “European Commission Unconditionally Approves Oracle’s Acquisition of Sun.” It is unconditional, case closed, no conditions.

PRO CONCESSIONS: Of course Oracle will say unconditional. This is tantamount to person #1 suing person #2 for $10M. Instead of taking it to court, they settle, with neither person admitting or denying guilt, but person #2 pays person #1 $5M. Just because it settled out of court, and they “agree” that nobody is guilty, it is pretty clear that if there were no guilt, person #2 wouldn’t have paid $5M. The face-saving way that Oracle presents the approval to the world is meaningless.

Issue #3: The EU’s Press Release About Approval of the Deal
PRO CONCESSIONS: “The Commission also took into account Oracle's public announcement of 14 December 2009 of a series of pledges to customers, users and developers of MySQL concerning issues such as the continued release of future versions of MySQL under the GPL (General Public License) open source license. Oracle has already taken action to implement some of its pledges by making binding offers to third parties who currently have a licensing contract for MySQL with Sun to amend contracts.” The EU took into account “pledges” by Oracle and the fact that Oracle is already changing binding agreements. These steps were clearly a concession and the binding legal agreements that have been fixed are legal and binding proof of these concessions.

CON CONCESSIONS: Oracle did NOT enter into any binding agreement with the EU, therefore they made no concessions to get the deal done. Any flimsy pledges in a press release are not enforceable and therefore, no concessions were made. The fact that they changed individual agreements does not mean that they made a concession to the EU at all.

We have a lunch bet riding on the argument. Did Oracle make concessions under pressure from the EU in order to close the deal to acquire Sun? Please vote in the comments section, leading with YES (Oracle made concessions) or NO (Oracle did not make concessions). Feel free to elaborate on why ;-).

Please vote on the facts, not on your opinion about whether it was sufficient or not ;-)

Thank you for helping us settle this bet.

Wednesday, January 13, 2010

HP Needs a Linux OLTP Database...FAST

Oracle, after dating HP, Dell, Netapp and EMC, has found its mate in Sun. Oracle is now becoming a systems company, unceremoniously dumping these former paramours. This leaves the spurned lovers to find alternate accommodations, especially in the area of the database.

As I have stated previously on this blog, the clear partner of choice on the Windows front is Microsoft. This is demonstrated by today’s partner announcement around MS SQL Server for OLTP. But who is their partner in the Linux segment?

The following are contenders:
* Postgres (HP rolls their own)
* EnterpriseDB (pre-rolled Postgres)
* Ingres or Sybase—Oracle has felled them both in the past, but they are hoping for new life with a big sugar daddy like HP.
* ScaleDB, if HP is going after the cloud and the MySQL market

I don’t see them going for a NoSQL solution because NoSQL = NoEnterprise, making it a non-starter for HP. One way or the other, HP needs a solution for OLTP on Linux and they are on the clock.

For OLAP, HP has NeoView. If they felt the need, there are a number of OLAP solutions out there, such as Greenplum, Netezza, Asterdata, Paraccel, Ingres/Vectorwise and others. That said, I think HP feels that they are holding a good hand in the OLAP space, but Linux-based OLTP just became a gaping hole in their product suite. Today's partnership with Microsoft confirms this problem, but only solves the Windows half, not the Linux half.

Monday, January 4, 2010

VMWare, Zimbra and the Virtualized Software Stack

VMWare appears to be positioning itself to provide the virtualized or cloud-based alternative to Oracle, Microsoft and IBM. This is a very interesting approach, and it will be interesting to see it play out over time. With Oracle and IBM taking a more systems-centric approach, meaning they are both providing the storage, computing and software stacks in the form of a system, this leaves Oracle’s traditional hardware partners out in the cold (HP, Dell, EMC, Netapp, etc.) along with budding potential partner Cisco. VMWare may envision themselves providing the Linux-based alternative to Microsoft in this game of strategic positioning. VMWare’s strategic advantage is that their entire stack is virtualization- and cloud-friendly. That would make sense given Maritz's Microsoft experience.

This diagram compares the various stacks from VMWare's perspective (e.g. they are all on top of VMWare instead of their own respective virtualization offerings). It compares Microsoft (orange), Oracle (red), VMWare (green) and IBM.

If that is the case, there are some open holes and some questions.
1. Does VMWare need their own flavor of Linux (a la Novell’s Suse)?

2. What database does VMWare include? There are open source alternatives such as MySQL, Postgres and the recently wounded Ingres. The problem is that these all employ a shared-nothing architecture, which doesn’t fit the virtualization model. They could look at some of the NoSQL alternatives, but NoSQL = NoEnterprise, and the enterprise is where VMWare makes their money. (Shameless plug:) They could look at using ScaleDB’s shared-disk storage engine for MySQL, which is virtualization friendly.

3. Does VMWare go after higher-level applications like Zoho, SugarCRM, etc.?

4. Does VMWare partner with SAP to provide the applications layer and would that work in a virtualized stack? Certainly the proximity of their Menlo Park campi is convenient.

The opportunity for VMWare to partner with HP, Dell, Cisco, and obviously EMC but probably not Netapp, seems very compelling. Combine this with a built-in cloud play for these potential partners and it makes a lot of sense. Oracle is enamored with Sun and their systems strategy. They are walking away from HP and Dell. It would be interesting to see VMWare walk into those companies with the grand partnering strategy and a complete cloud stack ready to go. It would then increase the stakes for Oracle’s systems play, because it would cut off their fallback position.

It is always interesting to watch the industry giants try to outflank each other.