Anant Jhingran (IBM’s CTO, Information Management, Analytics and Optimization) challenged our assertion that the cloud benefits the shared-disk database architecture. For me to enter into a battle of technical vision with Anant is equivalent to bringing a knife to a gun battle, but I enjoy a good challenge.
1. Cloud storage: Anant argues that (a) SANs won’t beat local disk in costs; (b) many shared-nothing databases use SANs anyway. To quote Inigo Montoya from Princess Bride: “Let me ‘splain. No there is too much. Let me sum up
Response: (a) While some clouds use traditional SAN or NAS storage, the trend among clouds is to assemble large collections of low-cost disks using a cluster file system to handle disk striping and data redundancy thereby providing SAN-like capabilities. As a result, the economics are quite similar to those of local disk; (b) We play in the MySQL market, where the vast majority of the databases use the local disk, making the comparison quite valid…for us. That said, we find that MySQL also commands a large percentage of the installed base on the cloud, making the comparison valid in general.
My broader point: Historically, the shared-nothing database had many advantages over the shared-disk database, particularly in the area of shared storage. Two major factors were at play: (1) shared storage was very expensive; (2) shared-disk databases split the storage performance across multiple nodes, meaning that performance of “Z” meant that a 4-node shared-disk database would only deliver 1/4 x Z to each node, making it expensive to deliver comparable performance on a per node basis. The cloud minimizes (and is on a trajectory to eliminate) shared-nothing’s historical advantage in these areas, by getting cheaper and faster. By rendering these traditional shared-nothing advantages moot, the two architectures are able to compete on other attributes, where shared-disk excels, such as operational simplicity and dynamic elasticity. These advantages are particularly relevant to the cloud. Shared-disk actually reduces costs in the cloud because it: (i) eliminates the need for redundant slaves (since each node provides fail-over to the other nodes); (ii) provides more evenly balanced load, since nodes are not specialized; (iii) supports dynamic elasticity at the database node level, where you only use/pay for the instances you need at the time.
2. Network Bandwidth: Anant suggests that this point is moot in comparing traditional and cloud computing.
Response: Maybe in the IBM/DB2 world where “many of the shared-nothing implementations of our clients use SANs”, but this is not the case in the MySQL world. Network performance plays a huge part in comparing shared-storage vs. local storage. Again from a historical perspective, back when shared-nothing became all the rage and MySQL took off as the M in LAMP, Ethernet and Fast Ethernet were a serious bottleneck on shared-disk performance. Now, with Gigabit Ethernet, Fiber Channel and Infiniband, there is further leveling of the playing field. This is not cloud specific. Improvements in network performance leveled the playing field between the two database architectures, but the storage costs described above still played a big part. It was after the cloud changed the economics on storage that we began to see a reassessment of the traditional bias for shared-nothing.
3. Virtualization: Scaling up/down stateless CPUs is easier in the shared-disk architecture. But global state (e.g. locks) undermine the independence of the virtualized nodes. In addition, the database typically likes to take control of the entire stack.
Response: ScaleDB does not take control of the entire stack, instead it is VM-friendly. ScaleDB’s implementation of the shared-disk model relies on a centralized lock manager, which also coordinates buffers and recovery among the nodes. It serves to coordinate the independent actions of the nodes, not to control them. They continue to act independently, from the perspective of the application. This combination makes ScaleDB very cloud friendly. You can surely argue that shared-nothing can scale to a larger number of nodes, but (a) most applications can get by with 50 or fewer database nodes; and (b) the process of scaling database nodes and maintaining those nodes in a shared-nothing cluster is quite painful.
If your argument is that shared-nothing has less state, it does, but it imposes more state information on the application, load balancer and the storage than shared-disk, so it is a trade-off. The key is to manage state in a scalable manner as we do in the ScaleDB lock manager.
4. As I understand it, the argument is for duplicate machines and distributed data that are loosely coupled, enabling rapid kill/restart in case of failure. The argument being that this is easier in shared-nothing.
Response: If I understand your point correctly, this would be easier in shared-disk. Shared-nothing introduces complexity in keeping replicates, backups, general database file reorganization, and QOS issues in a multi-tenant environment. By avoiding this pain, shared-disk is easier to maintain than shared-nothing. In short, the kill/redirect model of shared-disk provides faster response to failure that the kill/restart model employed by shared-nothing, and it is far easier to maintain.
Conclusion: In answer to points #1 and #2 above, advances in networking and storage have narrowed the gap between shared-disk. Cloud economics have then made this powerful shared storage economically compelling. For points, #3 and #4, the advantage goes to shared-disk. In addition, the natural synergy between cloud computing and shared-disk database go much further:
a. Instead of using a fixed partitioning model like shared-nothing, shared-disk is dynamically elastic. You can add storage capacity and compute capacity on the fly without interruption or additional work. In addition to the flexibility this affords to the developer, it also enables scaling on demand. The static partitioning model of shared-nothing invariably results in reserving over-capacity to accommodate for usage spikes and future growth. Since the cloud enables on-demand allocation of resources on a pay-per-use model, shared-disk is simply more compatible with the cloud.
b. The elimination of the partitioning/sharding of data and the replication, promotion and synching of slaves reduces the burden on the user and on the cloud administrator. Look closely at Amazon’s RDS and you’ll see that these things are disabled because they are a pain to maintain. The simplicity of the shared-disk architecture wins this as well.
c. Economics 1: See the cloud database white paper I wrote on this. Compute instances are more expensive than storage in the cloud. Since shared-disk generally uses fewer compute instances—by eliminating slaves and through better distribution of database requests via cluster-level load balancing—the cost of a shared-disk system will, in most cases, be lower than shared-nothing.
d. Economics 2: Since shared-disk is more dynamic, enabling scaling on the fly, one can replace a large instance used by the more rigid shared-nothing database, with a collection of smaller instances. Given the disproportionate increase in pricing of large instances, relative to aggregate performance of less expensive smaller instances, it is more economical to use shared-disk in the cloud. Consider, for example, using a 10-node shared-disk cluster costing $.85 per hour versus a single Quadruple Extra Large Instance for a shared-nothing database costing $2.40 per hour (costing almost three times as much). Then consider that you could scale down to two nodes in the shared-disk example during slow times, paying only $.17, instead of maintaining the $2.40 per hour shared-nothing database.
I maintain my assertion that both network performance and cloud storage have leveled the playing field for the underlying economic and performance comparisons between shared-nothing and shared-disk databases. On such a level technical and economic field, the functionality, availability and operational ease-of-use delivered by shared-disk make it a superior solution for OLTP clustering in the cloud.
2 hours ago