Friday, February 26, 2010

Will the NoSQL Movement Unseat the Database Behemoths?

With the introduction of each new platform, comes the opportunity for new thinking, new applications and new winners. DEC and Oracle were beneficiaries of the move to the minicomputer. Microsoft was the main beneficiary of the move to the PC. Sun rode the workstation to fame. Today’s exciting new platform is the cloud, and one of the upstart contenders is NoSQL.

One might argue that the cloud is merely the hosting of well established platforms such as the PC. Larry Ellison has made this very claim. However, the cloud is very different.

How is the cloud different? Sometimes when you combine things, the combination is very different than the components. For example, Salt (NaCl) is very different from its poisonous individual components. Cloud computing enjoys a similar combinatory effect. Sure it is merely a mixture of PC platforms, virtualization, lots of Linux and low-cost scalable disk arrays. But the combination is more about dynamic on-demand elasticity, elimination of capital expense, instant access to compute resources (versus slow hardware requisitioning), reduced IT headcount hassles, etc. In other words, cloud computing is no longer about the components, it is more about changing how we think about and use computing resources; it is a new paradigm for the consumption of computing resources.

With this new paradigm, comes a new mentality. Cloud developers expect that all aspects of the cloud to scale dynamically. This is where the shared-nothing SQL database comes up short. It is also where the NoSQL option excels.

We in the SQL world could easily dismiss NoSQL, saying NoSQL = NoEnterprise. How can you build a real application on something that doesn’t offer transactions, data consistency, SQL, etc. Real database people turn up their noses at those little key-value pair NoSQL toys. Not so fast.

SimpleDB just fired a shot across the bow of the database big boys with forced consistency. Sure you pay a price for this, and it should only be invoked when it is truly needed, but the point is you CAN do it. The history of technology is littered with the bodies of high-end products that were cannibalized from below, as lighter-weight platforms won the price/volume game. Cloud will definitely win the price/volume game; you simply cannot beat the economics. The question is who will win the cloud database war.

NoSQL databases (e.g. Cassandra, SimpleDB, BigTable, CouchDB, Mongo DB, etc.) will continue to nibble away at the rationale for sticking with big SQL databases. As the leading web database, MySQL became the de facto cloud database, since web and Web 2.0 applications were the early adopters of the cloud. But MySQL cannot rest on its laurels. NoSQL solutions are nipping at MySQL’s heels and their dynamic elasticity is quite appealing.

Now enterprise customers are beginning to move to the cloud. At the same time, NoSQL solutions are adding capabilities once reserved to relational databases. This raises a LOT of questions:

1. Will NoSQL undermine its scalability as it adds more enterprise capabilities (Will these extensions bolt on smoothly or will they result in an awkward and ultimately unscalable Frankenstein)?

2. Will the big SQL database vendors continue to dismiss NoSQL as toys, or will they see them for the threat they are becoming (Should we expect the commercial database vendors to start buying NoSQL solutions)?

3. Will MySQL be the first to succumb to the NoSQL onslaught (Did Oracle just buy yesterday’s cloud database leader)?

4. Will a third-party candidate like ScaleDB, with its shared-disk architecture win with a “best of both worlds” approach that scales dynamically and provides enterprise SQL capabilities?

5. Will SQL and NoSQL co-exist as different tools for different problems, or with they evolve into direct competitors across most major segments?

My Thoughts:
At the moment, SQL databases and NoSQL are different tools for different problems. I think this remains the case, but I believe that NoSQL will spread its reach by adding capabilities that begin to eat into traditional relational database segments. I suspect that the large commercial database companies, after ignoring NoSQL for too long, will resort to buying some of them and integrating them into their product portfolios. Companies focused solely on worldwide scalability like Google, will remain wedded to NoSQL, because any technology that doesn’t scale to 10,000 servers is a non-starter. Enterprises will take a “right tool for the job” approach, employing all of the above.

NoSQL and map-reduce technologies will excel in non-transactional roles like data warehouses, business intelligence (DW/BI). In the OLTP space, SQL databases will remain far more prominent. However, the pain of dynamically scaling shared-nothing databases—and sharding is a pain—will create a need for the dynamically elastic shared-disk databases like ScaleDB. The sweet spot for shared-disk probably peaks at about 80-100 database servers. This level of scaling should be sufficient for all but the largest companies. Beyond that, NoSQL (utilizing little or no scale-limiting constraints like forced consistency) will be the only option.

I would love to hear your thoughts in the comments section below…

Wednesday, February 10, 2010

Cloud Computing: Shared-Disk vs. Shared-Nothing

Anant Jhingran (IBM’s CTO, Information Management, Analytics and Optimization) challenged our assertion that the cloud benefits the shared-disk database architecture. For me to enter into a battle of technical vision with Anant is equivalent to bringing a knife to a gun battle, but I enjoy a good challenge.

1. Cloud storage: Anant argues that (a) SANs won’t beat local disk in costs; (b) many shared-nothing databases use SANs anyway. To quote Inigo Montoya from Princess Bride: “Let me ‘splain. No there is too much. Let me sum up

Response: (a) While some clouds use traditional SAN or NAS storage, the trend among clouds is to assemble large collections of low-cost disks using a cluster file system to handle disk striping and data redundancy thereby providing SAN-like capabilities. As a result, the economics are quite similar to those of local disk; (b) We play in the MySQL market, where the vast majority of the databases use the local disk, making the comparison quite valid…for us. That said, we find that MySQL also commands a large percentage of the installed base on the cloud, making the comparison valid in general.

My broader point: Historically, the shared-nothing database had many advantages over the shared-disk database, particularly in the area of shared storage. Two major factors were at play: (1) shared storage was very expensive; (2) shared-disk databases split the storage performance across multiple nodes, meaning that performance of “Z” meant that a 4-node shared-disk database would only deliver 1/4 x Z to each node, making it expensive to deliver comparable performance on a per node basis. The cloud minimizes (and is on a trajectory to eliminate) shared-nothing’s historical advantage in these areas, by getting cheaper and faster. By rendering these traditional shared-nothing advantages moot, the two architectures are able to compete on other attributes, where shared-disk excels, such as operational simplicity and dynamic elasticity. These advantages are particularly relevant to the cloud. Shared-disk actually reduces costs in the cloud because it: (i) eliminates the need for redundant slaves (since each node provides fail-over to the other nodes); (ii) provides more evenly balanced load, since nodes are not specialized; (iii) supports dynamic elasticity at the database node level, where you only use/pay for the instances you need at the time.

2. Network Bandwidth: Anant suggests that this point is moot in comparing traditional and cloud computing.

Response: Maybe in the IBM/DB2 world where “many of the shared-nothing implementations of our clients use SANs”, but this is not the case in the MySQL world. Network performance plays a huge part in comparing shared-storage vs. local storage. Again from a historical perspective, back when shared-nothing became all the rage and MySQL took off as the M in LAMP, Ethernet and Fast Ethernet were a serious bottleneck on shared-disk performance. Now, with Gigabit Ethernet, Fiber Channel and Infiniband, there is further leveling of the playing field. This is not cloud specific. Improvements in network performance leveled the playing field between the two database architectures, but the storage costs described above still played a big part. It was after the cloud changed the economics on storage that we began to see a reassessment of the traditional bias for shared-nothing.

3. Virtualization: Scaling up/down stateless CPUs is easier in the shared-disk architecture. But global state (e.g. locks) undermine the independence of the virtualized nodes. In addition, the database typically likes to take control of the entire stack.

Response: ScaleDB does not take control of the entire stack, instead it is VM-friendly. ScaleDB’s implementation of the shared-disk model relies on a centralized lock manager, which also coordinates buffers and recovery among the nodes. It serves to coordinate the independent actions of the nodes, not to control them. They continue to act independently, from the perspective of the application. This combination makes ScaleDB very cloud friendly. You can surely argue that shared-nothing can scale to a larger number of nodes, but (a) most applications can get by with 50 or fewer database nodes; and (b) the process of scaling database nodes and maintaining those nodes in a shared-nothing cluster is quite painful.

If your argument is that shared-nothing has less state, it does, but it imposes more state information on the application, load balancer and the storage than shared-disk, so it is a trade-off. The key is to manage state in a scalable manner as we do in the ScaleDB lock manager.

4. As I understand it, the argument is for duplicate machines and distributed data that are loosely coupled, enabling rapid kill/restart in case of failure. The argument being that this is easier in shared-nothing.

Response: If I understand your point correctly, this would be easier in shared-disk. Shared-nothing introduces complexity in keeping replicates, backups, general database file reorganization, and QOS issues in a multi-tenant environment. By avoiding this pain, shared-disk is easier to maintain than shared-nothing. In short, the kill/redirect model of shared-disk provides faster response to failure that the kill/restart model employed by shared-nothing, and it is far easier to maintain.

Conclusion: In answer to points #1 and #2 above, advances in networking and storage have narrowed the gap between shared-disk. Cloud economics have then made this powerful shared storage economically compelling. For points, #3 and #4, the advantage goes to shared-disk. In addition, the natural synergy between cloud computing and shared-disk database go much further:

a. Instead of using a fixed partitioning model like shared-nothing, shared-disk is dynamically elastic. You can add storage capacity and compute capacity on the fly without interruption or additional work. In addition to the flexibility this affords to the developer, it also enables scaling on demand. The static partitioning model of shared-nothing invariably results in reserving over-capacity to accommodate for usage spikes and future growth. Since the cloud enables on-demand allocation of resources on a pay-per-use model, shared-disk is simply more compatible with the cloud.

b. The elimination of the partitioning/sharding of data and the replication, promotion and synching of slaves reduces the burden on the user and on the cloud administrator. Look closely at Amazon’s RDS and you’ll see that these things are disabled because they are a pain to maintain. The simplicity of the shared-disk architecture wins this as well.

c. Economics 1: See the cloud database white paper I wrote on this. Compute instances are more expensive than storage in the cloud. Since shared-disk generally uses fewer compute instances—by eliminating slaves and through better distribution of database requests via cluster-level load balancing—the cost of a shared-disk system will, in most cases, be lower than shared-nothing.

d. Economics 2: Since shared-disk is more dynamic, enabling scaling on the fly, one can replace a large instance used by the more rigid shared-nothing database, with a collection of smaller instances. Given the disproportionate increase in pricing of large instances, relative to aggregate performance of less expensive smaller instances, it is more economical to use shared-disk in the cloud. Consider, for example, using a 10-node shared-disk cluster costing $.85 per hour versus a single Quadruple Extra Large Instance for a shared-nothing database costing $2.40 per hour (costing almost three times as much). Then consider that you could scale down to two nodes in the shared-disk example during slow times, paying only $.17, instead of maintaining the $2.40 per hour shared-nothing database.

I maintain my assertion that both network performance and cloud storage have leveled the playing field for the underlying economic and performance comparisons between shared-nothing and shared-disk databases. On such a level technical and economic field, the functionality, availability and operational ease-of-use delivered by shared-disk make it a superior solution for OLTP clustering in the cloud.