Monday, August 2, 2010

Database Architectures & Performance II

As described in the prior post, the shared-disk performance dilemma is simple:

1. If each node stores and processes data in memory, rather than on disk, it is much faster.
2. Each node must expose the most recent data to the other nodes, so they are not operating on stale data.

In other words, #1 above says flush data to disk VERY INFREQUENTLY for better performance, while #2 says flush everything to disk IMMEDIATELY for data consistency.
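To make the trade-off concrete, here is a minimal Python sketch of the two flush policies. Everything in it (NodeCache, Page, the dict standing in for the shared disk) is a hypothetical illustration, not code from any real engine:

```python
class Page:
    """A cached copy of one disk page."""
    def __init__(self, data):
        self.data = data
        self.dirty = False   # True when RAM is newer than disk

class NodeCache:
    """Toy per-node buffer cache illustrating rules #1 and #2."""

    def __init__(self, shared_disk, write_through=False):
        self.disk = shared_disk      # dict standing in for the shared disk
        self.pages = {}              # local in-memory cache
        self.write_through = write_through

    def write(self, key, data):
        page = self.pages.setdefault(key, Page(None))
        page.data = data
        if self.write_through:
            # Rule #2: flush IMMEDIATELY so other nodes see fresh data,
            # paying disk I/O on every write.
            self.disk[key] = data
            page.dirty = False
        else:
            # Rule #1: keep it in RAM for speed -- but the disk copy is
            # now stale from every other node's point of view.
            page.dirty = True

    def read(self, key):
        if key in self.pages:            # fast: local RAM hit
            return self.pages[key].data
        return self.disk.get(key)        # slow, and possibly stale
```

Run two NodeCache instances over the same dict: the write-back node is fast, but the other node reads stale data; flipping write_through trades that speed for consistency.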

Oracle recognized this dilemma when it built Oracle Parallel Server (OPS), the precursor to Oracle Real Application Clusters (RAC). To address the problem, Oracle developed Cache Fusion.

Cache Fusion is a peer-based shared cache. Each node works with a certain set of data in its local cache until another node needs that data. When one node needs data held by another, it requests the blocks directly from that node's cache, bypassing the disk completely. To minimize this data swapping between local caches, RAC applications are optimized for data locality: routing certain data requests to certain nodes, which yields a higher cache-hit ratio and reduces swapping between caches. Static data locality, built into the application, severely complicates adding or removing cluster nodes, as the sketch below suggests.
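To illustrate why static locality and elasticity conflict, here is a hypothetical hash-based request router (the route function and node names are invented for the example, not Oracle's mechanism):

```python
import hashlib

def route(key, nodes):
    """Statically pin a key to one node so its data stays hot in that
    node's local cache (higher hit ratio, less cache-to-cache swapping)."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return nodes[h % len(nodes)]

nodes = ["node-a", "node-b", "node-c"]
print(route("customer:42", nodes))            # always the same node...

# ...until the cluster changes size: because len(nodes) is baked into
# the mapping, adding one node remaps most keys and destroys the
# locality the application was tuned for.
print(route("customer:42", nodes + ["node-d"]))
```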

ScaleDB encountered the same conflict between performance and consistency (or RAM vs. disk). However, ScaleDB’s shared cache was designed with the cloud in mind. The cloud imposes certain additional design criteria:

1. The number of nodes will increase and decrease in an elastic fashion.
2. A large percentage of MySQL users will require low-cost, PC-based storage.

Clearly, some type of shared cache is imperative. Memcached demonstrates the efficiency of a separate cache tier above the database, so why not do something very similar beneath the database, between the database nodes and the storage? The local cache on each database node remains the most efficient use of cache, since it avoids a network hop. The shared cache tier, which in ScaleDB's case is the Cache Accelerator Server (CAS), then serves as a fast cache for data swapping.
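A rough sketch of that read path in Python; DatabaseNode and CAS below are invented stand-ins for the real components, not ScaleDB's API:

```python
class CAS:
    """Stand-in for the Cache Accelerator Server: a shared cache tier
    reachable in one network hop, with no disk I/O."""
    def __init__(self):
        self.store = {}
    def get(self, key):
        return self.store.get(key)
    def put(self, key, value):
        self.store[key] = value

class DatabaseNode:
    """Toy read path: local cache first (no network hop), then the
    shared cache tier, and only then the shared disk."""

    def __init__(self, cas, disk):
        self.local = {}      # per-node cache: fastest
        self.cas = cas       # shared cache tier: one hop
        self.disk = disk     # low-cost shared storage: slowest

    def read(self, key):
        if key in self.local:
            return self.local[key]
        value = self.cas.get(key)        # swap in from the shared cache
        if value is None:
            value = self.disk.get(key)   # cold miss: fall back to disk
            self.cas.put(key, value)     # warm the shared tier
        self.local[key] = value
        return value

cas, disk = CAS(), {"row:1": "alice"}
node_a, node_b = DatabaseNode(cas, disk), DatabaseNode(cas, disk)
node_a.read("row:1")   # cold: disk -> CAS -> local cache
node_b.read("row:1")   # warm: served from the CAS, no disk I/O
```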

The Cluster Manager coordinates the interactions between nodes. This includes both database locking and data swapping via the CAS. Nodes maintain data in their local caches until that data is required by another node; the Cluster Manager then coordinates the swap via the CAS. This approach is more dynamic because it does not rely on prior knowledge of where data resides, which enables ScaleDB to support dynamic elasticity of database nodes, a critical requirement for cloud computing.
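Continuing the sketch above, a toy coordinator might look like the following; the acquire/ownership flow is an invented illustration of the idea, not ScaleDB's actual locking protocol:

```python
class ClusterManager:
    """Tracks which node currently owns each page and, when ownership
    moves, pushes the holder's copy through the CAS instead of forcing
    a flush to disk."""

    def __init__(self, cas):
        self.cas = cas
        self.owner = {}                  # page key -> owning node

    def acquire(self, node, key):
        holder = self.owner.get(key)
        if holder is not None and holder is not node:
            # The current holder publishes its cached copy to the
            # shared cache tier, so the requester reads fresh data
            # without any disk I/O.
            page = holder.local.pop(key, None)
            if page is not None:
                self.cas.put(key, page)
        # Ownership is assigned on demand: no prior knowledge of data
        # placement is required, so nodes can join and leave freely.
        self.owner[key] = node
```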

The following diagram describes how ScaleDB's Cache Accelerator Server (CAS) is implemented.

[Diagram: database nodes, the lock manager, and the CAS tier sitting between the nodes and shared storage]

While the diagram above shows a variety of physical servers, these can be virtual servers; an entire cluster, including the lock manager, database nodes, and CAS, could run on just two physical machines.

Both ScaleDB and Oracle rely upon a shared cache to improve performance, while maintaining data consistency. Both approaches have their relative pros and cons. ScaleDB’s tier-based approach to shared cache is optimized for cloud environments, where dynamic elasticity is important. ScaleDB’s approach also enables some very interesting advantages in the storage tier, which will be enumerated in subsequent posts.
