Tuesday, August 31, 2010

Cloud Insight: HP, Dell, 3PAR, VMware & ScaleDB

The bidding war between HP and Dell for 3PAR has created great theater. The rationale is simple: both HP and Dell want a complete set of products to sell into the new cloud space, and 3PAR is the only bite-sized morsel among EMC, IBM and Hitachi that addresses this space. What is the compelling advantage 3PAR offers in storage? Elasticity. 3PAR lets companies add/remove storage in thin slices (AKA thin provisioning). How does this relate to ScaleDB? We do the exact same thing for databases in the cloud, and we do it for the most popular database in the cloud, MySQL.

How does VMware play into this? Their CEO Paul Maritz was on CNBC talking about the hybrid cloud and how companies want to run core cloud capabilities on premises and then use the public cloud providers to handle compute overflow during peak usage. This means that the value of the public cloud to corporations, assuming Maritz is correct, is based largely on its ability to provide elasticity. It will no longer be sufficient for public cloud companies to provide reserved servers, because the reserved servers will be run in the company’s data center. The public cloud will add/remove servers to handle peaks in usage. So elasticity is EVERYTHING. ScaleDB is all about elasticity for the database.

It is also interesting to note from the Maritz interview that he sees the next wave of cloud (and hence the next wave of cloud consolidation) coming from the software sector. More specifically, he sees it coming from the ability to take existing applications and make them run on the cloud. In other words, to make them elastic. Again, this is exactly what ScaleDB does. We take existing MySQL applications and make them elastic.

It is also interesting to note that HP and Dell have decimated their own R&D and are now looking to acquire that expertise from outside, and they are willing to pay for the expertise.

Another theme playing out in the background makes this situation even more interesting. Oracle has adopted a systems approach, where they combine their hardware and software:

“The heart of the interview focused on Oracle's interest in Sun. By combining Sun's expertise in hardware with Oracle's software, Ellison suggested, the combined company can become a powerful "systems" company that sells solutions to businesses. The competitor that Ellison wants to beat: IBM.”

Summary: Cloud is the next battleground. It all starts with the hardware/infrastructure (e.g. 3PAR) and then moves upstream to software. Oracle will be focused on selling complete systems, alienating HP & Dell, among others. This is compounded by the fact that HP and Dell have decimated their R&D, so they are forced to partner/acquire. At the same time, if Maritz’s vision of public clouds effectively becoming excess capacity for handling corporate peaks is realized, then elasticity in the cloud will become critical as well. This obviously plays to ScaleDB’s strengths.

Tuesday, August 10, 2010

Comparing ScaleDB’s Shared Cache Tier vs. NFS and CFS

Prior posts addressed the performance benefits of a shared cache tier (ScaleDB CAS) and also the storage flexibility it enables. This post compares the ScaleDB CAS purpose-built file storage sharing system against off-the-shelf solutions like NFS and various cluster file systems (CFS).

When using a clustered database, like ScaleDB, each node has full access to all of the data in the database. This means that the file system (SAN, NAS, Cloud, etc.) must allow multiple nodes to share the data in the file system.

Options include:
1. Network File System (NFS)
2. Cluster File System (CFS)
3. Purpose-built file storage interface

Locking Granularity:
I won’t get deeply into the nuances of CFS (block-level) and NFS (file-level, but you can address within the file). Suffice it to say that, generally speaking, NFS and CFS will allow you to operate on blocks of data, which are typically 8KB. Let’s say you want to operate on a record that is 200 bytes within an 8KB block. You are locking 8KB instead of 200 bytes, or roughly 40X more than necessary.

ScaleDB’s CAS uses a purpose-built interface to storage that is optimized to leverage insight from the cluster lock manager. This enables it to lock the storage at the record level. In situations where multiple nodes are concurrently accessing data from the same block, this can be a significant performance advantage. It reduces the contention between threads/nodes, enabling superior performance and nodal scalability.
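To make the granularity point concrete, here is a minimal Python sketch. It is not ScaleDB code: the fixed-size record layout and the names block_of and conflicts are assumptions made purely for illustration, using the 8KB block and 200-byte record from the example above.

```python
BLOCK_SIZE = 8 * 1024   # bytes; the typical block size cited above
RECORD_SIZE = 200       # bytes; the example record size cited above


def block_of(record_id: int) -> int:
    """Return the block holding a record, assuming fixed-size records packed end to end."""
    records_per_block = BLOCK_SIZE // RECORD_SIZE
    return record_id // records_per_block


def conflicts(record_a: int, record_b: int, granularity: str) -> bool:
    """Would two writers touching these records contend for the same lock?"""
    if granularity == "block":
        # Block-level locking: any two records in the same block collide.
        return block_of(record_a) == block_of(record_b)
    if granularity == "record":
        # Record-level locking: only writers to the very same record collide.
        return record_a == record_b
    raise ValueError(f"unknown granularity: {granularity}")


# Bytes locked per byte actually needed under block-level locking.
overlock_ratio = BLOCK_SIZE / RECORD_SIZE

print(conflicts(3, 17, "block"))    # two different records in the same block
print(conflicts(3, 17, "record"))
print(round(overlock_ratio, 2))
```

Under these assumptions, two nodes updating different records in the same block wait on each other with block-level locks but proceed independently with record-level locks.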

Intelligent Control of RAM vs. Disk:
When writing data to storage, you can either flush it directly to disk or you can store it in cache, allowing the disk flushing to occur later, outside of the transaction. Some things, like log writing, require the former, while other things work just fine (and faster) with the latter. Unfortunately, generic file systems like NFS and CFS are not privy to this insight, so they must err on the side of caution and flush everything to disk inside the transaction.

ScaleDB’s CAS is privy to the intelligence inside the database. It is therefore able to push more data into cache for improved performance. Furthermore, this optimization can be configured by users, based on their own requirements. The net result is superior performance.
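The difference between the two write paths can be sketched in a few lines of Python. This is an illustration only, not the CAS implementation: the CachedStore class, its durable flag and the disk_writes counter are hypothetical names, and a dictionary stands in for the disk.

```python
class CachedStore:
    """Toy store that can write through to disk or defer the flush."""

    def __init__(self):
        self.disk = {}        # stands in for durable storage
        self.cache = {}       # in-memory copy of recent writes
        self.dirty = set()    # keys written to cache but not yet on disk
        self.disk_writes = 0  # number of (slow) disk writes performed

    def write(self, key, value, durable):
        self.cache[key] = value
        if durable:
            # e.g. log records: must reach disk inside the transaction
            self._flush_key(key)
        else:
            # e.g. data pages: safe to flush later, outside the transaction
            self.dirty.add(key)

    def _flush_key(self, key):
        self.disk[key] = self.cache[key]
        self.disk_writes += 1
        self.dirty.discard(key)

    def flush(self):
        """Background flush of everything that was deferred."""
        for key in list(self.dirty):
            self._flush_key(key)


store = CachedStore()
store.write("log:1", "commit txn 42", durable=True)
store.write("page:7", "row data v1", durable=False)
store.write("page:7", "row data v2", durable=False)  # overwrites in cache only
print(store.disk_writes)  # disk writes so far, before the deferred flush
store.flush()
print(store.disk_writes)
```

A generic file system that cannot tell the log write from the page write would have to treat every call as durable, paying a disk write each time.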

As general-purpose solutions, NFS and CFS cannot benefit from insight into the internal operation of the database. Instead, NFS and CFS must act in a generalized manner. ScaleDB’s Cache Accelerator Server (CAS) leverages insight gleaned from the cluster lock manager, and from user configurations, to optimize its interaction with storage. This makes CAS more efficient and scalable, and it improves performance.

Wednesday, August 4, 2010

Shared Cache Tier & Storage Flexibility

Any time you can get two for the price of one (a “2Fer”), you’re ahead of the game. By implementing our shared cache as a separate tier, you get (1) improved performance and (2) storage flexibility…a 2Fer.

What do I mean by storage flexibility? It means you can use enterprise storage, cloud storage or PC-based storage. Other shared-disk cluster databases require high-end enterprise storage like a NAS or SAN. This requirement was driven by the need for:

1. High-performance storage
2. Highly available storage
3. Multi-attach, or sharing data from a single volume or LUN across multiple nodes in the cluster.

Quite simply, you won’t see other shared-disk clustering databases using cloud storage or PC-based storage. However, the vast majority of MySQL users rely on PC-based storage, and most are not willing to pay the big bucks for high-end storage.

ScaleDB’s Cache Accelerator Server (CAS) enables users to choose the storage solution that fits their needs. See the diagram below:

Because all data is mirrored across paired CAS servers, the pair delivers high availability: if one fails, the other continues running. Built-in recovery completes the HA solution. If you want further reassurance, you can use a third CAS as a hot standby. This means that you can use the internal hard drives on your CAS servers to provide highly available storage.
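As a rough illustration of the mirrored-pair idea, here is a small Python sketch. It makes no claim about how CAS actually mirrors or recovers data: CasNode, MirroredPair and the recover method are hypothetical names, and recovery is simplified to copying the survivor's data.

```python
class CasNode:
    """Toy stand-in for one CAS server with its own internal storage."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.alive = True


class MirroredPair:
    """Every write goes to both nodes; reads use whichever node is alive."""

    def __init__(self, primary, mirror):
        self.nodes = [primary, mirror]

    def write(self, key, value):
        live = [n for n in self.nodes if n.alive]
        if not live:
            raise RuntimeError("no live CAS node")
        for node in live:
            node.data[key] = value

    def read(self, key):
        for node in self.nodes:
            if node.alive:
                return node.data[key]
        raise RuntimeError("no live CAS node")

    def recover(self, node):
        """Bring a failed node back by copying the survivor's data."""
        survivor = next(n for n in self.nodes if n.alive and n is not node)
        node.data = dict(survivor.data)
        node.alive = True


a, b = CasNode("cas-a"), CasNode("cas-b")
pair = MirroredPair(a, b)
pair.write("page:1", "v1")
a.alive = False                # simulate a failure of one server
print(pair.read("page:1"))     # still served by the mirror
pair.write("page:2", "v2")     # writes continue on the survivor
pair.recover(a)                # failed node catches up and rejoins
print(a.data == b.data)
```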

The next post in this series on CAS will compare ScaleDB CAS, Network File System (NFS) and Cluster File System (CFS).

Monday, August 2, 2010

Database Architectures & Performance II

As described in the prior post, the shared-disk performance dilemma is simple:

1. If each node stores/processes data in memory, versus disk, it is much faster.
2. Each node must expose the most recent data to the other nodes, so those other nodes are not using old data.

In other words, #1 above says flush data to disk VERY INFREQUENTLY for better performance, while #2 says flush everything to disk IMMEDIATELY for data consistency.

Oracle recognized this dilemma when they built Oracle Parallel Server (OPS), the precursor to Oracle Real Application Clusters (RAC). In order to address the problem, Oracle developed Cache Fusion.

Cache Fusion is a peer-based shared cache. Each node works with a certain set of data in its local cache, until another node needs that data. When one node needs data from another node, it requests it directly from that node’s cache, bypassing the disk completely. In order to minimize this data swapping between the local caches, RAC applications are optimized for data locality. Data locality means routing certain data requests to certain nodes, thereby enjoying a higher cache hit ratio and reducing data swapping between caches. Static data locality, built into the application, severely complicates the process of adding/removing nodes to the cluster.
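Why does static locality make resizing painful? A minimal Python sketch shows the effect, assuming the application routes each key to a node with a simple modulo rule. That rule is an assumption chosen for illustration, not a description of how any particular RAC application partitions its data.

```python
def route(key: int, node_count: int) -> int:
    """Static routing rule baked into the application (assumed: key modulo node count)."""
    return key % node_count


def moved_fraction(keys, old_nodes: int, new_nodes: int) -> float:
    """Fraction of keys whose home node changes when the cluster is resized."""
    keys = list(keys)
    moved = sum(1 for k in keys if route(k, old_nodes) != route(k, new_nodes))
    return moved / len(keys)


# Growing a 3-node cluster to 4 nodes under this rule:
print(moved_fraction(range(1200), 3, 4))
```

With this rule, growing from three nodes to four sends three quarters of the keys to a different node, so the caches that were warm for those keys go cold and the routing logic itself has to change.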

ScaleDB encountered the same conflict between performance and consistency (or RAM vs. disk). However, ScaleDB’s shared cache was designed with the cloud in mind. The cloud imposes certain additional design criteria:

1. The number of nodes will increase/decrease in an elastic fashion.
2. A large percentage of MySQL users will require low-cost PC-based storage.

Clearly, some type of shared cache is imperative. Memcached demonstrates the efficiency of utilizing a separate cache tier above the database, so why not do something very similar beneath the database (between the database nodes and the storage)? The local cache on each database node is the most efficient use of cache, since it avoids a network hop. The shared cache tier (in ScaleDB’s case, the Cache Accelerator Server, or CAS) then serves as a fast cache for data swapping.

The Cluster Manager coordinates the interactions between nodes. This includes both database locking and data swapping via the CAS. Nodes maintain the data in their local cache until that data is required by another node. The Cluster Manager then coordinates that data swapping via the CAS. This approach is more dynamic, because it doesn’t rely on a priori knowledge about the location of that data. This enables ScaleDB to support dynamic elasticity of database nodes, which is critical for cloud computing.
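The flow described above can be sketched in Python. This is a simplified model built on assumptions, not ScaleDB’s actual protocol: SharedCache stands in for the CAS, and the ClusterManager.acquire and write methods, the owner map and the page identifiers are all hypothetical.

```python
class SharedCache:
    """Stand-in for the shared cache tier (CAS)."""

    def __init__(self):
        self.pages = {}


class DbNode:
    """A database node with its own local cache."""

    def __init__(self, name):
        self.name = name
        self.local_cache = {}


class ClusterManager:
    """Tracks which node holds the latest copy of each page."""

    def __init__(self, cas):
        self.cas = cas
        self.owner = {}  # page_id -> node currently holding the latest copy

    def acquire(self, node, page_id):
        holder = self.owner.get(page_id)
        if holder is not None and holder is not node:
            # Another node has the latest copy: swap it out via the shared cache.
            self.cas.pages[page_id] = holder.local_cache.pop(page_id)
        if page_id not in node.local_cache:
            node.local_cache[page_id] = self.cas.pages.get(page_id)
        self.owner[page_id] = node
        return node.local_cache[page_id]

    def write(self, node, page_id, value):
        self.acquire(node, page_id)
        node.local_cache[page_id] = value


cas = SharedCache()
manager = ClusterManager(cas)
node_a, node_b = DbNode("a"), DbNode("b")

manager.write(node_a, "page:1", "v1")     # stays in node a's local cache
print(manager.acquire(node_b, "page:1"))  # swapped to node b via the CAS

node_c = DbNode("c")                      # a new node joins; no routing rules change
manager.write(node_c, "page:1", "v2")
print(manager.acquire(node_a, "page:1"))
```

Nothing in the sketch ties a page to a particular node ahead of time, which is why a new node can join and start working without any changes to the application.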

The following diagram describes how ScaleDB’s Cache Accelerator Server (CAS) is implemented.

While the diagram above shows a variety of physical servers, these can be virtual servers. An entire cluster, including the lock manager, database nodes and CAS could be implemented on just two physical servers.

Both ScaleDB and Oracle rely upon a shared cache to improve performance, while maintaining data consistency. Both approaches have their relative pros and cons. ScaleDB’s tier-based approach to shared cache is optimized for cloud environments, where dynamic elasticity is important. ScaleDB’s approach also enables some very interesting advantages in the storage tier, which will be enumerated in subsequent posts.