
Monday, January 7, 2013

Database Virtualization, What it Really Means


This is a response to a blog post by analyst and marketing consultant Curt Monash.

Originally virtualization meant running one operating system in a window inside of another operating system, e.g. running Linux on a Windows machine using Microsoft Virtual PC or VMware. Then virtualization evolved to mean slicing a single server into many for more granular resource allocation (Curt’s ex uno plures, translated: out of one, many). It has since expanded to include e pluribus unum (from many, one) and e pluribus ad pluribus (from many to many). This is evidenced in the use of the term “virtualization” in the compound terms: server virtualization, storage virtualization, network virtualization and now database virtualization.

Server Virtualization: Abstracts physical servers, presenting them as one or more logical entities. VMware enables dividing a single physical resource (compute or storage) into multiple smaller units (one-to-many), as well as combining multiple physical units into a single logical unit, which it calls clustering (many-to-one). Since a clustered collection of physical servers can address a clustered collection of physical storage devices, it also supports the many-to-many configuration. If we extract the essence of virtualization, it is the ability to address compute and storage resources logically, while abstracting the underlying physical representation.
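
To make the one-to-many and many-to-one directions concrete, here is a minimal Python sketch. The class names, host sizes and helper functions are purely illustrative assumptions, not any vendor's API.

class Host:
    def __init__(self, name, cores, ram_gb):
        self.name, self.cores, self.ram_gb = name, cores, ram_gb

def carve(host, vm_specs):
    """One-to-many: slice one physical host into smaller logical machines."""
    if sum(c for _, c, _ in vm_specs) > host.cores or \
       sum(r for _, _, r in vm_specs) > host.ram_gb:
        raise ValueError("host capacity exceeded")
    return [{"vm": name, "cores": c, "ram_gb": r, "host": host.name}
            for name, c, r in vm_specs]

def pool(hosts):
    """Many-to-one: address several physical hosts as a single logical unit."""
    return {"cores": sum(h.cores for h in hosts),
            "ram_gb": sum(h.ram_gb for h in hosts)}

hosts = [Host("h1", 16, 64), Host("h2", 16, 64)]
print(carve(hosts[0], [("web", 4, 8), ("db", 8, 32)]))   # one-to-many
print(pool(hosts))                                       # many-to-one: {'cores': 32, 'ram_gb': 128}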

This modern definition of virtualization is also evident in the following terms:

Storage Virtualization: Splitting a single disk into multiple virtual partitions (one-to-many), presenting a single logical view that spans multiple physical disks (RAID), and splitting multiple disks across multiple logical storage devices, often for high availability (mirroring or LUNs). See also Logical Volume Management.
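
The many-to-one case, a single logical address space spanning several physical disks, can be sketched as a simple block-mapping function in the RAID-0 style. The disk count and stripe size below are assumptions chosen for illustration.

STRIPE_BLOCKS = 8   # blocks per stripe unit (assumed)
DISKS = 4           # number of physical disks (assumed)

def logical_to_physical(lba):
    """Map one logical block address onto (disk, block-on-that-disk)."""
    stripe = lba // STRIPE_BLOCKS            # which stripe unit the block falls in
    offset = lba % STRIPE_BLOCKS             # position inside that unit
    disk = stripe % DISKS                    # stripe units rotate across the disks
    block_on_disk = (stripe // DISKS) * STRIPE_BLOCKS + offset
    return disk, block_on_disk

for lba in (0, 7, 8, 9, 40):
    print(lba, "->", logical_to_physical(lba))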

Network Virtualization: “In computing, network virtualization is the process of combining hardware and software network resources and network functionality into a single, software-based administrative entity, a virtual network.” This is a many-to-one model. 

Virtual IP Addresses: This enables the application to have a single IP address that actually maps to multiple NICs. 

So this brings us to the topic of defining “Database Virtualization”. We believe that the most comprehensive description of database virtualization is the abstraction of physical resources (compute, data, RAM) into a logical representation, supporting many-to-one, one-to-many and many-to-many relationships. This is exactly what ScaleDB does better than any other database company, and this is why we are considered leaders in the nascent field of database virtualization.

ScaleDB provides a single logical view (of a single database) while that database is actually comprised of a cluster of multiple database instances operating over shared data. Whether you call this many-to-one (many database nodes acting as one logical database) or one-to-many (one logical database split across many nodes) is a matter of perspective. In either case, this enables independent scaling of both compute and I/O, eliminating the need for painful sharding, while also supporting multi-tenancy.

Curt then puts words in our mouths stating that we claim: “Any interesting database topology should be called ‘database virtualization’.” We make no such claim. In fact, we state very clearly on our database virtualization page: “Database virtualization means different things to different people; from simply running the database executable in a virtual machine, or using virtualized storage, to a fully virtualized elastic database cluster composed of modular compute and storage components that are assembled on the fly to accommodate your database needs.”

In the marketing world, perception is reality. Since people are making claims of providing database virtualization, it is only prudent to include and compare their products, in a comprehensive evaluation of the space. Just as Curt addresses many things that are not databases (e.g. Memcached) in order to provide the reader with a comprehensive understanding, so do we when talking about database virtualization. One need only consider that we include “running the database executable in a virtual machine” as one of the approaches that some consider to be “database virtualization.” While we consider our approach to be the best solution for database virtualization, ignoring what other people consider to be database virtualization would have displayed extreme hubris on our part.

We appreciate Curt shining the light on database virtualization and we agree that it is “a hot subject right now.” It is a new field and therefore requires a comprehensive evaluation of all claims and approaches, enabling customers to decide which is the best approach for their needs. We remain quite confident that we will continue to lead the database virtualization market based upon our architectural advantages.

As the database virtualization market heats up, and as we enhance our solution set, we remain confident that Curt and other analysts will come to appreciate our unique advantages in this market. 

Thursday, September 22, 2011

Lack of Business Visibility Cripples Traditional SQL DaaS, Drives NewSQL

More and more public cloud companies are moving to managed cloud services to improve their value-add (price premium) and the stickiness of their solution. However, the shift to database as a service (DaaS) severely reduces the DBA’s visibility into the business, thus limiting the ability to hand-tune the database to the requirements of the application and the business. The solution is a cloud database that eliminates hand-tuning of the database, thereby enabling the DBA to be equally effective even with limited visibility into the business and application needs. It is these unique needs, particularly for SQL databases, that are fueling the NewSQL movement.

DBAs traditionally have insight into the company, enabling them to hand-tune the database in collaboration with the development team, in areas such as:

1. Performance Trade-offs/Tuning: The database is partitioned and tuned to address business requirements, maximizing performance of certain critical processes, while slowing less critical processes.

2. System Maintenance Planning: You need to shut down the database to do an application/database upgrade, repartition, etc., but you may need to coordinate with the development team, and the schedule may change up until the last moment.

3. Application Evolution: The database must be designed and tuned to accommodate the planned changes in the application.

4. Consulting with Application Developers: Since the database partitioning and performance are hand-tuned, the DBA must collaborate closely with the application development team on design, development and deployment.

5. Partitioning Requirements: When partitioning (or repartitioning) your database, you’ll need to partition the data to suit application requirements, avoiding things like cross-partition joins, range scans, aggregates, etc., which can cause tremendous performance penalties if not implemented correctly.

6. Moving Processes to the Application Layer: Single-server databases can handle joins, range scans, aggregates and the like internally. However, when you partition the database, these functions are typically moved to the application layer. The application must add the logic to accomplish these things, along with the routing code to point to the correct partitions (see the sketch below this list). As a result, an application written to a single-server API does not work in a multi-server configuration.
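
To make items 5 and 6 concrete, here is a minimal sketch of the routing and scatter-gather logic a sharded application typically has to absorb. The shard list, the hash-based shard key and the execute callback are all assumptions for illustration; this is not ScaleDB code.

import hashlib

SHARDS = ["db-node-0", "db-node-1", "db-node-2", "db-node-3"]   # assumed node names

def shard_for(customer_id):
    """Routing code: pick the partition that holds this customer's rows."""
    digest = hashlib.md5(str(customer_id).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

def get_orders(execute, customer_id):
    """Single-shard query: routed to exactly one node."""
    node = shard_for(customer_id)
    return execute(node, "SELECT * FROM orders WHERE customer_id = %s", (customer_id,))

def total_revenue(execute):
    """Cross-shard aggregate: the database used to do this internally;
    after partitioning, the application does the scatter-gather itself."""
    partials = [execute(node, "SELECT SUM(amount) FROM orders", ()) for node in SHARDS]
    return sum(p or 0 for p in partials)

Every new cross-shard feature adds more of this plumbing, which is exactly the burden the list above describes.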

When moving from a self-managed database—either in the cloud or on premise—to a DaaS, the “DBA-in-the-cloud” doesn’t have that visibility into the business requirements, performance requirements, development schedule, and more. This lack of visibility turns the already challenging task of hand-tuning the database into a near impossibility using traditional databases.

A real world example: You are the DBA-in-the-cloud. Your customer has been running on a single server, but now needs to scale out across multiple database nodes to accommodate growth. How do you split the data? You don’t know which queries have higher performance priority. You don’t know the development plan for the new version of the application. You don’t know when it is convenient to shut down the application to implement the partitioning. You need to inform the application developers how they should implement their routing code to send database requests to the right nodes. You need to inform the application developers that they need to handle joins, range scans and aggregates at the application level, since the database can no longer handle those.

DBAs have enough of a challenge scaling and maintaining databases when they have full visibility into the business. To address these challenges without that level of visibility is unrealistic. I refer to this problem as the “Blind DBA” challenge, because the DaaS DBA has a serious lack of visibility into the requirements and inner workings of the company.

ScaleDB is a NewSQL database uniquely designed to handle the Blind DBA challenge inherent in DaaS implementations. ScaleDB is based on a shared-disk architecture that scales in a manner that is invisible to the application. It eliminates the need to push functions (joins, range scans, aggregates) up to the application layer. It eliminates the need to coordinate your application and database design to work around partitions, because there are no partitions. It eliminates the performance trade-offs in partitioning design; you get consistent performance across the database. It eliminates scheduling and coordinating application shut-downs to repartition, because it doesn’t use partitions. In short, with ScaleDB the DBA needn’t have any visibility into the business to deliver optimal database performance via a single API. This is what makes ScaleDB an ideal solution for cloud companies implementing the database as a managed service or DaaS.

Tuesday, September 20, 2011

Cloud DaaS Managed Service Fuels NewSQL Market

As public clouds are commoditized, the public cloud vendors are increasingly moving to higher margin and stickier managed services. In the early days of the public cloud, renting compute and storage was unique, exciting, sticky and profitable. It has quickly become a commodity. In order to provide differentiation, maintain margins and create barriers to customer exit against increasing competition, the cloud is moving toward a collection of managed services.

Public clouds are growing beyond simple compute instances to platform as a service (PaaS). PaaS is in turn composed of various modules, including database as a service (DaaS). In the early days you rented a number of compute instances, loaded your database software and you were the DBA managing all aspects of that database. Increasingly, public clouds are moving toward a DaaS model, where the cloud customer writes to a simple database API and the cloud provider is the DBA.

If the database resides in a single server and does not require high-availability, providing that as a managed service is no problem. Of course, if this is the use case, then it is no problem for the customer to manage their own database. In other words, there is little value to a managed service.

The real value-add for the customer, and hence the real price premium, is derived by offering things like auto-scaling across multiple servers, hot backup, high-availability, etc. If the public cloud provider can offer a SQL-based DaaS, where the customer writes to a simple API and everything else is handled for them, that is a tremendous value and customers will pay a premium for it.

While this sounds simple, public cloud companies soon learn that the Devil is in the details. Managing someone else’s database, without insight into their business processes, performance demands, scaling demands, evolving application requirements, and more, is extremely challenging and demands a new class of DBMS. These demands have created a market need that is now being filled by companies using the moniker “NewSQL”.

In short, when it comes to DaaS, public cloud vendors want the following:
• Simple “write to our API and we’ll handle the messy stuff like scaling, HA, etc.”
• Premium value that translates to a higher profit margin business
• Barriers to customer exit

Future posts will delve into the operational demands of DaaS, and how these demands are driving NewSQL DBMS architectures and features.

Wednesday, August 24, 2011

The Future of NoSQL (Companies)…

A friend recently bought a GM car. I proceeded to inform him that I am shorting GM stock (technically a put option). He was shocked. “But they make great cars,” he exclaimed. I responded, “I’m not shorting the cars, I’m shorting the company.” Why am I recounting this exchange? Because I believe that the new wave of NoSQL companies—as opposed to the rebranded ODBMS—presents the same situation. I am long the products, but short the companies.

Let me explain. NoSQL companies have built some very cool products that solve real business problems. The challenge is that they are all open source products serving niche markets. They have customer funnels that are simply too small to sustain the companies given their low conversion/monetization rates.

These companies could certainly be tasty acquisition targets for companies that actually make money. But as standalone companies, sadly, I would short them. On that note, I am off to the NoSQL Now! Conference. Hopefully, this post won't get me beat-up while cruising the conference.


Wednesday, August 3, 2011

Cloud Elasticity & Databases

The primary reasons people are moving to the public cloud are: (1) replace capital expenses with operating expenses (pay as you go); (2) use shared resources for processes like back-up, maintenance, networking (shared expenses); (3) use shared infrastructure that enables you to pay only for those resources you actually use, instead of consuming your maximum load resources at all times (pay-per-use). The first thing you’ll notice is that all 3 cloud benefits have their basis in finances or the cloud business model.

We will focus on #3 above: Pay-Per-Use. The old school model was to build your compute infrastructure for the maximum load today, plus growth over the life-cycle of the equipment, plus some buffer so the systems don’t get overloaded from spikes in usage. The net result is that your average usage might run 10% of the potential of the infrastructure you mortgaged your home to buy. In other words, you were paying 10X more than you would pay if you only paid by usage. In reality, you might pay half as much to run on the cloud, with the balance of the savings going to the cloud company in the form of profits. This works and it is a win-win for both you and the public cloud.
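
A quick worked version of that arithmetic, with assumed numbers, shows how pay-per-use can roughly halve the bill even at a healthy per-unit markup:

peak_servers = 100            # capacity you must own to survive peak load (assumed)
avg_utilization = 0.10        # the ~10% average utilization mentioned above
owned_cost_per_server = 1.0   # normalized cost of owning and running one server
cloud_markup = 5.0            # assumed cloud premium per server actually used

owned_cost = peak_servers * owned_cost_per_server
cloud_cost = peak_servers * avg_utilization * owned_cost_per_server * cloud_markup

print(owned_cost)   # 100.0: you pay for peak capacity around the clock
print(cloud_cost)   # 50.0: roughly half, even at a 5x per-unit premium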

To achieve this Pay-Per-Use ideal, and the compelling financial advantages it enables, the infrastructure must scale elastically. You must be able to add compute power seamlessly and on the fly, without shut-down. How important is this elasticity? Amazon named their service “EC2” for “Elastic Compute Cloud”. Elastic is the first word; I would say it is pretty important. Besides, if the cloud weren’t elastic, you would simply be paying for the same computer costs, plus the public cloud company’s markup for expenses and profit.

So how elastic are public clouds? The entire cloud stack is elastic, except for one piece: the SQL database. Cloud companies recognized that the SQL database was the Achilles’ heel of cloud elasticity. To address this problem, they created NoSQL, which delivers database-like capabilities but removes the things that make a SQL database inelastic, namely SQL, ACID compliance, data consistency, transactions, etc.

NewSQL appears to be the response from the database vendors, who believe that there is a market for SQL databases that provide cloud elasticity. Not all NewSQL solutions address elasticity, but a few of us do. In my next blog post, I’ll address whether or not database elasticity is important…hint: it depends upon your needs.

Monday, July 25, 2011

ScaleDB: Shared-Disk / Shared-Nothing Hybrid

The two primary database architectures, shared-disk and shared-nothing, each have their advantages. Shared-disk has functional advantages such as high availability, elasticity, ease of set-up and maintenance, and the elimination of partitioning/sharding and master-slave configurations. The shared-nothing advantages are better performance and lower costs. What if you could offer a database that is a hybrid of the two, one that offers the advantages of both? This sounds too good to be true, but it is in fact what ScaleDB has done.

The underlying architecture is shared-disk, but in many situations it can operate like shared-nothing. You see, the problems with shared-disk arise from the messaging necessary to (a) ship data among nodes and storage; and (b) synchronize the nodes in the cluster. The trick is to move the messaging outside of the transaction so it doesn’t impact performance. The way to achieve that is to exploit locality. Let me explain.

When using a shared-disk database, if your application or load balancer just randomly sprays the database requests to any node in the cluster, all of the nodes end up sharing all of the data. This involves a lot of data shipping between nodes and messaging to keep track of which node has what data and what they have done to it. This is at the core of the challenge for companies like ours to build shared-disk databases…it ain’t easy. There are many things you can do to optimize performance in such a scenario like local caching, shared cache (we use CAS, Oracle uses CacheFusion), etc. However, the bottom line is that even with these optimizations, random distribution of database requests results in suboptimal database performance for some scenarios.

Once you have solved the worst case scenario of random database requests, you can start optimizing for the intelligent routing of database requests. By this I mean that either the application or the load balancer sends specific database requests to specific nodes in the cluster. Intelligent database request routing results in something we in the shared-disk database world call locality. The database nodes are able to operate on local data while only updating the rest of the cluster asynchronously. In this scenario, the database nodes, which are still using a shared-disk architecture, operate much more independently, like shared-nothing. As a result, data shipping and messaging are almost completely eliminated, resulting in performance comparable to shared-nothing, while still maintaining the advantages of shared-disk.
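
A small simulation, with assumed node counts and workload, illustrates the difference between randomly spraying requests and affinity-based routing: with affinity routing each node works almost entirely on data it already holds, which is the locality described above. This is an illustration of the concept, not ScaleDB’s internal mechanism.

import random

NODES = 4
random.seed(1)

def hit_rate(route, requests):
    caches = [set() for _ in range(NODES)]   # per-node local caches
    hits = 0
    for key in requests:
        node = route(key)
        if key in caches[node]:
            hits += 1                        # data is already local on this node
        else:
            caches[node].add(key)            # miss: data must be shipped to this node
    return hits / len(requests)

workload = [random.randrange(1000) for _ in range(20000)]   # assumed key distribution

def random_route(key):
    return random.randrange(NODES)           # spray requests anywhere

def affinity_route(key):
    return key % NODES                       # same key range always hits the same node

print("random routing hit rate:  ", round(hit_rate(random_route, workload), 2))
print("affinity routing hit rate:", round(hit_rate(affinity_route, workload), 2))

Random routing also ends up caching every key on every node it touches, which is the data sharing and shipping overhead described two paragraphs above.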

The trick is for the database to recognize on-the-fly when the separate nodes can and cannot operate in this independent fashion. This is complicated by the fact that the database must recognize and adapt to locality which can evolve as database usage changes, nodes are added or removed, etc. This is one aspect of the secret sauce that is built into ScaleDB.

Note: Now that we’ve built a shared-disk database that can recognize locality and respond by acting (and performing) like a shared-nothing database, how do we achieve locality? There are many ways to achieve locality. It can be built into the application, or you can rely on a SQL-aware routing/caching solution like those available from NetScaler and ScaleArc that handle this for you.