
Thursday, May 2, 2013

Thoughts on Xeround and Free!


Everybody loves free. It is the best marketing term one could use. Once you say “FREE” the people come running. Free makes you very popular. Whether you are a politician offering something for free, or a company providing free stuff, you gain instant popularity.

Xeround is shutting down their MySQL Database as a Service (DBaaS) because their free instances, while popular, simply did not convert into sufficient paid instances to support the company. While I am sad to see them fail, because I appreciate the hard work required to deliver database technology, this announcement was not unexpected.

My company was at Percona Live, the MySQL conference, and I had some additional conversations along these same lines. One previously closed source company announced that they were open sourcing their code, and the announcement was very popular. A keynote speaker mentioned it and the crowd clapped excitedly. Was it because they couldn’t wait to edit the code? Probably not. Was it because now the code would evolve faster? Probably not, since it is very low-level and niche-oriented, and there will be few committers. No, I think it was the excitement of “free”. The company was excited about a 49X increase in web traffic, but had no idea what the impact would be on actual revenues.

I spoke with another company, also a low-level and niche product, and they have been open source from the start. I asked about their revenues; they are essentially non-existent. The bottom line is that the plan was for them to make money on services…well, Percona, Pythian, SkySQL and others have the customer relationships, and they scoop up all of the consulting and support revenue while this company makes bupkis. I feel for them.

I had a friend tell me that ScaleDB should open-source our code to get more customers. Yes, open source gets you a lot of free users…not customers. It is a hard path to sell your first 10, 25, 50, and so on customers, but the revenue from those customers fuels additional development and makes you a fountain of technology. Open source and free are great for getting big quickly and getting acquired, but it seems that if the acquisition doesn’t happen, then you can quickly run out of money using this model (see Xeround).

I realize that this is an unpopular position. I realize that everybody loves free. I realize that open source has additional advantages (no lock-in, rapid development, etc.), but in my opinion, open source works in only two scenarios: (1) where the absolute volume is huge, creating a funnel for conversion (e.g. Linux); (2) where you need to unseat an entrenched competitor and you have other sources of revenue (e.g. OpenStack).

I look forward to your comments. We also look forward to working with Xeround customers who are looking for another solution.

Monday, January 7, 2013

Database Virtualization, What it Really Means


This is a response to a blog post by analyst and marketing consultant Curt Monash.

Originally, virtualization meant running one operating system in a window inside another operating system, e.g. running Linux on a Windows machine using Microsoft Virtual PC or VMWare. Then virtualization evolved to mean slicing a single server into many for more granular resource allocation (Curt’s ex uno plures, translated: out of one, many). It has since expanded to include e pluribus unum (from many, one) and e pluribus ad pluribus (from many to many). This is evidenced in the use of the term “virtualization” to create the compound terms: server virtualization, storage virtualization, network virtualization and now database virtualization.

Server Virtualization: Abstracts the physical (servers), presenting it as a logical entity or entities. VMWare enables dividing single physical resources (compute or storage) into multiple smaller units (one-to-many), as well as combining multiple physical units into a single logical unit, which they call clustering (many-to-one). Since a clustered collection of physical servers can address a clustered collection of physical storage devices, it therefore also supports the many-to-many configuration. If we extract the essence of virtualization it is the ability to address compute and storage resources logically, while abstracting the underlying physical representation.

This modern definition of virtualization is also evident in the following terms:

Storage Virtualization: Splitting a single disk into multiple virtual partitions (one-to-many), presenting a single logical view that spans multiple physical disks (RAID), or spreading multiple disks, often for high-availability, across multiple logical storage devices (mirroring or LUNs). See also: Logical Volume Management.

Network Virtualization: “In computing, network virtualization is the process of combining hardware and software network resources and network functionality into a single, software-based administrative entity, a virtual network.” This is a many-to-one model. 

Virtual IP Addresses: This enables the application to have a single IP address that actually maps to multiple NICs. 

So this brings us to the topic of defining “Database Virtualization”. We believe that the most comprehensive description of database virtualization is the abstraction of physical resources (compute, data, RAM) into a logical representation, supporting many-to-one, one-to-many and many-to-many relationships. This is exactly what ScaleDB does better than any other database company, and this is why we are considered leaders in the nascent field of database virtualization.

ScaleDB provides a single logical view (of a single database) while that database is actually comprised of a cluster of multiple database instances operating over shared data. Whether you call this many-to-one (many database nodes acting as one logical database) or one-to-many (one logical database split across many nodes) is a matter of perspective. In either case, this enables independent scaling of both compute and I/O, eliminating the need for painful sharding, while also supporting multi-tenancy.

Curt then puts words in our mouths, stating that we claim: “Any interesting database topology should be called “database virtualization”.” We make no such claim. In fact, we state very clearly on our database virtualization page: “Database virtualization means different things to different people; from simply running the database executable in a virtual machine, or using virtualized storage, to a fully virtualized elastic database cluster composed of modular compute and storage components that are assembled on the fly to accommodate your database needs.”

In the marketing world, perception is reality. Since people are making claims of providing database virtualization, it is only prudent to include and compare their products, in a comprehensive evaluation of the space. Just as Curt addresses many things that are not databases (e.g. Memcached) in order to provide the reader with a comprehensive understanding, so do we when talking about database virtualization. One need only consider that we include “running the database executable in a virtual machine” as one of the approaches that some consider to be “database virtualization.” While we consider our approach to be the best solution for database virtualization, ignoring what other people consider to be database virtualization would have displayed extreme hubris on our part.

We appreciate Curt shining the light on database virtualization and we agree that it is “a hot subject right now.” It is a new field and therefore requires a comprehensive evaluation of all claims and approaches, enabling customers to decide which is the best approach for their needs. We remain quite confident that we will continue to lead the database virtualization market based upon our architectural advantages.

As the database virtualization market heats up, and as we enhance our solution set, we remain confident that Curt and other analysts will come to appreciate our unique advantages in this market. 

Tuesday, September 20, 2011

Cloud DaaS Managed Service Fuels NewSQL Market

As public clouds are commoditized, the public cloud vendors are increasingly moving to higher margin and stickier managed services. In the early days of the public cloud, renting compute and storage was unique, exciting, sticky and profitable. It has quickly become a commodity. In order to provide differentiation, maintain margins and create barriers to customer exit, against increasing competition, the cloud is moving toward a collection of managed services.

Public clouds are growing beyond simple compute instances to platform as a service (PaaS). PaaS is then comprised of various modules, including database as a service (DaaS). In the early days you rented a number of compute instances, loaded your database software and you were the DBA managing all aspects of that database. Increasingly, public clouds are moving toward a DaaS model, where the cloud customer writes to a simple database API and the cloud provider is the DBA.

If the database resides in a single server and does not require high-availability, providing that as a managed service is no problem. Of course, if this is the use case, then it is no problem for the customer to manage their own database. In other words, there is little value to a managed service.

The real value-add for the customer, and hence the real price premium, is derived by offering things like auto-scaling across multiple servers, hot backup, high-availability, etc. If the public cloud provider can offer a SQL-based DaaS, where the customer writes to a simple API and everything else is handled for them, that is a tremendous value and customers will pay a premium for it.

While this sounds simple, public cloud companies soon learn that the Devil is in the details. Managing someone else’s database, without insight into their business processes, performance demands, scaling demands, evolving application requirements, and more, is extremely challenging and demands a new class of DBMS. These demands have created a market need that is now being filled by companies using the moniker “NewSQL”.

In short, when it comes to DaaS, public cloud vendors want the following:
• Simple “write to our API and we’ll handle the messy stuff like scaling, HA, etc.”
• Premium value that translates to a higher profit margin business
• Barriers to customer exit

Future posts will delve into the operational demands of DaaS, and how these demands are driving NewSQL DBMS architectures and features.

Monday, July 25, 2011

ScaleDB: Shared-Disk / Shared-Nothing Hybrid

The primary database architectures—shared-disk and shared-nothing—each have their advantages. Shared-disk has functional advantages such as high-availability, elasticity, ease of set-up and maintenance, and the elimination of partitioning/sharding and master-slave configurations. The shared-nothing advantages are better performance and lower costs. What if you could offer a database that is a hybrid of the two, one that offers the advantages of both? This sounds too good to be true, but it is in fact what ScaleDB has done.

The underlying architecture is shared-disk, but in many situations it can operate like shared-nothing. You see the problems with shared-disk arise from the messaging necessary to (a) ship data among nodes and storage; and (b) synchronize the nodes in the cluster. The trick is to move the messaging outside of the transaction so it doesn’t impact performance. The way to achieve that is to exploit locality. Let me explain.

When using a shared-disk database, if your application or load balancer just randomly sprays the database requests to any node in the cluster, all of the nodes end up sharing all of the data. This involves a lot of data shipping between nodes and messaging to keep track of which node has what data and what they have done to it. This is at the core of the challenge for companies like ours to build shared-disk databases…it ain’t easy. There are many things you can do to optimize performance in such a scenario like local caching, shared cache (we use CAS, Oracle uses CacheFusion), etc. However, the bottom line is that even with these optimizations, random distribution of database requests results in suboptimal database performance for some scenarios.

Once you have solved the worst case scenario of random database requests, you can start optimizing for the intelligent routing of database requests. By this I mean that either the application or the load balancer sends specific database requests to specific nodes in the cluster. Intelligent database request routing results in something we in the shared-database world call locality. The database nodes are able to operate on local data while only updating the rest of the cluster asynchronously. In this scenario, the database nodes, which are still using a shared-disk architecture, operate much more independently, like shared-nothing. As a result, data shipping and messaging are almost completely eliminated, resulting in performance comparable to shared-nothing, while still maintaining the advantages of shared-disk.

The trick is for the database to recognize on-the-fly when the separate nodes can and cannot operate in this independent fashion. This is complicated by the fact that the database must recognize and adapt to locality which can evolve as database usage changes, nodes are added or removed, etc. This is one aspect of the secret sauce that is built into ScaleDB.

Note: Now that we’ve built a shared-disk database that can recognize locality and respond by acting (and performing) like a shared-nothing database, how do we achieve locality? There are many ways to achieve locality. It can be built into the application, or you can rely on a SQL-aware routing/caching solution like those available from Netscaler and Scalarc that handle this for you.
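To make the application-side option concrete, here is a minimal, hypothetical sketch of locality routing (the class and node names are invented for illustration; this is not ScaleDB’s API). The idea is simply that requests carrying the same shard key always land on the same database node, so each node’s working set stays local and cross-node data shipping is minimized.

```python
import hashlib

class LocalityRouter:
    """Route each shard key to a stable, deterministic node choice."""

    def __init__(self, nodes):
        self.nodes = nodes  # e.g. ["db-node-1", "db-node-2", "db-node-3"]

    def node_for(self, shard_key):
        # A stable hash means the same key always routes to the same node,
        # which is what gives that node locality over the key's data.
        digest = hashlib.md5(str(shard_key).encode()).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

router = LocalityRouter(["db-node-1", "db-node-2", "db-node-3"])
# All requests for customer 42 hit the same node, every time.
assert router.node_for(42) == router.node_for(42)
```

One caveat worth noting: simple modulo hashing reshuffles most keys whenever a node is added or removed, so an elastic cluster of the kind discussed above would more likely use a consistent-hashing ring, which preserves locality for most keys across membership changes.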

Monday, August 2, 2010

Database Architectures & Performance II

As described in the prior post, the shared-disk performance dilemma is simple:

1. If each node stores/processes data in memory, versus disk, it is much faster.
2. Each node must expose the most recent data to the other nodes, so those other nodes are not using old data.

In other words, #1 above says flush data to disk VERY INFREQUENTLY for better performance, while #2 says flush everything to disk IMMEDIATELY for data consistency.
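The tension between the two can be illustrated with a toy example (illustrative Python, not database code): a write-through node flushes every update to “disk” for consistency, while a write-back node keeps updates in memory and flushes rarely, which is faster but leaves other nodes reading stale data until a flush happens.

```python
class Node:
    """Toy node contrasting write-through and write-back flush policies."""

    def __init__(self, write_through):
        self.write_through = write_through
        self.cache = {}        # fast in-memory state
        self.disk = {}         # slow shared state visible to other nodes
        self.disk_writes = 0

    def write(self, key, value):
        self.cache[key] = value
        if self.write_through:
            self.flush()       # every write pays a disk I/O (consistent, slow)

    def flush(self):
        self.disk.update(self.cache)
        self.disk_writes += 1

wt, wb = Node(write_through=True), Node(write_through=False)
for i in range(100):
    wt.write("k", i)
    wb.write("k", i)
wb.flush()  # write-back flushes once, after the fact
assert wt.disk_writes == 100 and wb.disk_writes == 1
# ...but before that single flush, another node reading wb's "disk"
# would have seen no value for "k" at all: the consistency problem.
```

The 100-to-1 difference in disk writes is the performance side of the dilemma; the stale reads in between are the consistency side.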

Oracle recognized this dilemma when they built Oracle Parallel Server (OPS), the precursor to Oracle Real Application Cluster (RAC). In order to address the problem, Oracle developed Cache Fusion.

Cache Fusion is a peer-based shared cache. Each node works with a certain set of data in its local cache, until another node needs that data. When one node needs data from another node, it requests it directly from that node’s cache, bypassing the disk completely. In order to minimize this data swapping between the local caches, RAC applications are optimized for data locality. Data locality means routing certain data requests to certain nodes, thereby enjoying a higher cache hit ratio and reducing data swapping between caches. Static data locality, built into the application, severely complicates the process of adding/removing nodes in the cluster.

ScaleDB encountered the same conflict between performance and consistency (or RAM vs. disk). However, ScaleDB’s shared cache was designed with the cloud in mind. The cloud imposes certain additional design criteria:

1. The number of nodes will increase/decrease in an elastic fashion.
2. A large percentage of MySQL users will require low-cost PC-based storage.

Clearly, some type of shared-cache is imperative. Memcached demonstrates the efficiency of utilizing a separate cache tier above the database, so why not do something very similar beneath the database (between the database nodes and the storage)? The local cache on each database node is the most efficient use of cache, since it avoids a network hop. The shared cache tier, in ScaleDB’s case, the Cache Accelerator Server (CAS), then serves as a fast cache for data swapping.

The Cluster Manager coordinates the interactions between nodes. This includes both database locking and data swapping via the CAS. Nodes maintain the data in their local cache until that data is required by another node. The Cluster Manager then coordinates that data swapping via the CAS. This approach is more dynamic, because it doesn’t rely on prior knowledge about the location of that data. This enables ScaleDB to support dynamic elasticity of database nodes, which is critical for cloud computing.
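The lookup path can be sketched in a few lines, with a dictionary standing in for each tier (the class and method names here are invented for illustration; the real CAS protocol, locking and eviction handled by the Cluster Manager are far more involved). Each node checks its own local cache first, then the shared cache tier, and only then goes to storage.

```python
class TwoTierCache:
    """Sketch of a node's view of a local cache over a shared cache tier."""

    def __init__(self, shared, storage):
        self.local = {}          # per-node cache: no network hop
        self.shared = shared     # CAS-like tier, shared by all nodes
        self.storage = storage   # slow shared disk

    def get(self, key):
        if key in self.local:
            return self.local[key]       # fastest path
        if key in self.shared:
            value = self.shared[key]     # one network hop, no disk I/O
        else:
            value = self.storage[key]    # disk read, slowest path
        self.local[key] = value
        return value

    def release(self, key):
        # When another node needs this block, the cluster manager asks
        # this node to hand it to the shared tier instead of the disk.
        if key in self.local:
            self.shared[key] = self.local.pop(key)

shared, storage = {}, {"page-7": "data"}
node_a = TwoTierCache(shared, storage)
node_b = TwoTierCache(shared, storage)
assert node_a.get("page-7") == "data"  # disk read; page now local to A
node_a.release("page-7")               # A hands the page to the shared tier
assert node_b.get("page-7") == "data"  # B reads from the shared tier, not disk
```

The point of the sketch is the handoff: once node A releases the page, node B’s read is served from the shared cache tier, so the swap between nodes never touches the disk.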

[Diagram omitted: how ScaleDB’s Cache Accelerator Server (CAS) is implemented.]

While the diagram above shows a variety of physical servers, these can be virtual servers. An entire cluster, including the lock manager, database nodes and CAS could be implemented on just two physical servers.

Both ScaleDB and Oracle rely upon a shared cache to improve performance, while maintaining data consistency. Both approaches have their relative pros and cons. ScaleDB’s tier-based approach to shared cache is optimized for cloud environments, where dynamic elasticity is important. ScaleDB’s approach also enables some very interesting advantages in the storage tier, which will be enumerated in subsequent posts.

Thursday, April 1, 2010

ScaleDB Introduces Clustered Database Based Upon Water Vapor

ScaleDB is proud to announce the introduction of a database that takes data storage to a new level, and a new altitude. ScaleDB’s patent pending “molecular-flipping technology” enables low energy molecular flipping that changes selected water molecules from H20 to HOH, representing positive and negative states that mimic the storage mechanism used on hard drive disks.

“Because we act at the molecular level, we achieve massive storage density with minimal energy consumption, which is critical in today’s data centers, where energy consumption is the primary cost,” said Mike Hogan, ScaleDB CEO. “A single thimble of water vapor provides the same storage capacity as a high-end SAN.”

The technology does have one small challenge: persistence. Clouds are not known for their persistence. ScaleDB relies on the Cumulus formation, since it is far beefier than some of those wimpy cirrus clouds. However, when deployed in the data center, the dry heat can be particularly damaging to cloud maintenance. One of the company’s patents centers around using heavy water, which resists evaporation and is therefore far more persistent than its lighter brethren. The company has already received approval from the IAEA to commercialize this technique.

This new technology considerably improves ScaleDB’s “green cred”. By greatly reducing energy consumption in data centers, it cuts their carbon footprint, leaving little more than a toeprint. Once the cloud storage—which has a 3-year half-life—is worn out, you can release it into the atmosphere. There it mingles with natural clouds, making them denser and more reflective. Leading IPCC climate scientists have modeled the effects of this mingling and the scientific consensus is that it will reduce global temperatures by 5-6 degrees centigrade within 20 years (+/- 10 degrees centigrade). The company is in negotiations with Al Gore to promote this new technology, but they cannot comment on these negotiations because the mere fact that such negotiations are in fact happening is covered by a strict NDA and the even more legally binding pinky promise.

ScaleDB set out to become THE cloud database company and today’s announcement really takes that to a whole new level. The tentative name for this new database is VaporWare.