Friday, July 12, 2013

Why You Should Embrace Database Virtualization

This article addresses the benefits of database virtualization. Before we proceed, however, it is important to explain that database virtualization does NOT mean simply running a DBMS inside a virtual machine.

Database Virtualization, More Than Running a DBMS in a Virtual Machine
While running a DBMS in a VM can provide advantages (and disadvantages), it is NOT database virtualization. A typical database fuses together the data (or I/O) with the processing (CPU utilization) to operate as a single unit. Simply running that single unit in a VM does not provide the benefits detailed below; that is not database virtualization, it is merely server virtualization.

An Example of the Database Virtualization Problem
Say you have a database handling banking and I have $10MM in the bank (I wish). Now let’s assume that the bank is busy, so it bursts that database across 3 VM nodes in typical cloud-style. Now each of those 3 nodes gets a command to wire out the full $10MM. Each node sees its balance at $10MM, so each one wires out the full amount, for a total wire transfer of $30MM…see the problem? In order to dynamically burst your database across nodes, you need a distributed locking mechanism so that all nodes see the same data and can lock other nodes from altering that data independently. This sounds easy, but making it perform well is a massive undertaking. Only two products have solved this problem: Oracle RAC and ScaleDB (for MySQL or MariaDB).
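
To make the race concrete, here is a minimal, hypothetical Python sketch (the class and names are invented for illustration, not taken from any product): a single in-process lock stands in for the distributed lock manager that a virtualized database must provide across all of its nodes.

```python
import threading

# Hypothetical illustration of the problem above: three "nodes" each receive a
# wire-out command. A cluster-wide lock forces them to read and update the
# shared balance one at a time.

class SharedAccount:
    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()  # stand-in for a distributed lock manager

    def wire_out(self, amount, node_id):
        with self.lock:               # serialize access to the shared balance
            if self.balance >= amount:
                self.balance -= amount
                print(f"node {node_id}: wired ${amount:,}")
            else:
                print(f"node {node_id}: insufficient funds")

account = SharedAccount(10_000_000)
threads = [threading.Thread(target=account.wire_out, args=(10_000_000, n))
           for n in range(3)]
for t in threads: t.start()
for t in threads: t.join()
# With the lock and a shared balance, exactly one wire succeeds. Give each
# node its own private copy of the balance and all three succeed -- the $30MM bug.
```

With a shared view of the data and a cluster-wide lock, exactly one wire succeeds; without them, each node happily approves the full amount.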

Defining Database Virtualization
  • It should enable the application to talk to a single virtual instance of the database, when in fact there are N actual nodes acting over the data.
  • It should separate the data processing (CPU) from the data (I/O) so that each can scale on demand and independently of the other.
  • It should enable the actual processing of the data to be distributed to the various nodes on the storage tier (function shipping) to achieve maximum performance. Note: in practice, this is similar to MapReduce (see the sketch after this list).
  • It should provide tiered caching for performance, while also ensuring cache coherence across the entire cluster.
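
As a rough illustration of the function-shipping bullet above (a hypothetical Python sketch; the data and function names are invented), each storage node filters and partially aggregates only its local slice of the data, and the compute tier merely merges the small partial results, much like a map step followed by a reduce step:

```python
from functools import reduce

# Each storage node scans only its local slice and returns a small partial
# aggregate (the "map" / function-shipping step).
def node_partial_sum(local_rows, predicate):
    return sum(r["amount"] for r in local_rows if predicate(r))

# Hypothetical data slices living on three storage nodes.
node_slices = [
    [{"region": "US", "amount": 120}, {"region": "EU", "amount": 80}],
    [{"region": "US", "amount": 200}],
    [{"region": "EU", "amount": 50}, {"region": "US", "amount": 30}],
]

# The compute tier ships the predicate to each node and merges the partials
# (the "reduce" step), instead of pulling every row across the network.
partials = [node_partial_sum(rows, lambda r: r["region"] == "US")
            for rows in node_slices]
total = reduce(lambda a, b: a + b, partials)
print(total)  # 350
```
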
Benefits of Database Virtualization

Higher Server Utilization: When the data is fused to the CPU as a single unit, that one node is responsible for handling all usage spikes for its collection of data. This forces you to split the data thinly across many servers (silos), forcing you to run each server at a low utilization rate. Database virtualization decouples the data from the processing so that a spike in usage can be shared across many nodes on the fly. This enables you to run a virtualized database at a very high utilization rate.

Reduced Infrastructure Costs: Database virtualization enables you to use fewer servers, less power, fewer OS, tool, and application licenses, fewer network switches, and less storage, among other things.

Reduced Manpower Costs: Database virtualization simplifies the DBA’s job: since it uses only one schema and no sharding, it also simplifies backup processes, enabling the DBA to handle more databases. It reduces the application developer’s workload because it eliminates code related to sharding, e.g. database routing, rebuilding relationships between shards (e.g. joins), and more. It also simplifies the network admin’s job because he manages fewer servers and they are identical.

Reduced Complexity: You only have a single database image, so elastically scaling up/down is simple and fast.

Increased Flexibility: Database virtualization brings the same flexibility to the database that server virtualization brings to the application tier. Resources are allocated and reallocated on the fly. If your usage profile changes, e.g. payroll one day, benefits the next, a virtual database uses the same shared infrastructure for any workload, while a traditional database does not.

Quality of Service: Since database images can move on the fly, without downtime, a noisy neighbor or noisy network problem is solved by simply moving the database to another node in your pool.

Availability: Unlike a traditional database, virtualized database nodes see all of the data, so they inherently provide failover for one another, addressing unplanned downtime. As for planned downtime, simply move the process to another server and take down the one that needs service, again without interruption.

Improved Performance: Because the pooled cache across the storage tier uses a Least Recently Used (LRU) algorithm, it can dedicate huge amounts of pooled cache to the current workload, enabling near in-memory performance. Also, as mentioned above, the distribution of processing to the storage tier enables high-performance parallel processing.
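
As a toy illustration of the LRU idea only (this is not ScaleDB’s cache code), a Python ordered dictionary is enough to show how the pages touched by the current workload stay resident while stale pages are evicted:

```python
from collections import OrderedDict

class LRUPageCache:
    """Toy LRU cache: recently used pages stay, stale ones are evicted."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()

    def get(self, page_id):
        if page_id not in self.pages:
            return None
        self.pages.move_to_end(page_id)      # mark as most recently used
        return self.pages[page_id]

    def put(self, page_id, data):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)
        self.pages[page_id] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)   # evict least recently used page

cache = LRUPageCache(capacity=2)
cache.put("p1", "...")
cache.put("p2", "...")
cache.get("p1")           # p1 is now the most recently used
cache.put("p3", "...")    # evicts p2, the least recently used page
print(list(cache.pages))  # ['p1', 'p3']
```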

True database virtualization delivers a huge set of advantages that in many ways mirror the benefits server virtualization provides to applications. For this reason, we expect database virtualization to be the next big thing, following in the footsteps of server, storage and network virtualization.


Saturday, July 6, 2013

Don't Fall for the Fake Loan Fraud



We have been approached by a Sheik (who claimed to be Sheikh A. R. Khalid Bin Mahfouz) and an investment banker out of London (who claimed his name is Harry Holt). But they change names faster than you change your underwear. Both were very excited to invest in the company. The Sheik wanted an equity investment, but we had to set up a bank account somewhere in Asia first, which would have required a minimum deposit. The “investment banker” needed his upfront money for the attorney to draft the agreement. I think he even had a real attorney, who said she did need money before she would draft any agreement.

What to look for in a fake loan scam:

  1. Minimal due diligence, eager to invest large sums of money
  2. They use generic email addresses (Yahoo, Gmail, etc.) not tied to a company
  3. They have little to no Internet footprint (LinkedIn, search, etc.)
  4. They have minimal if any documents/brochures. The one from “Harry” was amusing. I copied sections of its text and found them word for word on various investment and VC websites. It was a plagiarized patchwork quilt of other websites.
  5. The term sheet from “Harry” looked pretty good because it was word for word from a book on the Internet (Note: Search for this line on Google: “Bullet repayment at the Redemption Price, plus any and all accrued but unpaid interest at the Maturity Date, subject to the mandatory prepayment provisions”)
  6. You can ask for references (which they will ignore or maybe give fake ones) but don’t waste your time.
If you want to check deeper, look into the email headers. If you are using Gmail, just click the drop-down on the right and select “Show original” to see the headers. There are sites that track these scams, such as Project Honeypot (http://www.projecthoneypot.org) or Interfraud (http://interfraud.org/ni_fake_loans.htm). If you are contacted for one of these fake loan/investment scams, copy the headers and send them to the email address posted at the Interfraud site so they can have the scammer’s email shut down.

Then you can copy their IP address and search for it. It will probably show up in Project Honeypot.
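
If you want to script that step, a short Python sketch along these lines (the header text below is invented for the example) pulls the originating IP out of the raw headers so you can look it up:

```python
import re

# Paste the raw headers from Gmail's "Show original" here (sample is made up).
raw_headers = """Received: by mx.google.com with SMTP id xyz789
Received: from mail.example-scammer.com (unknown [203.0.113.57])
        by mx.google.com with ESMTP id abc123
From: "Harry" <harry@example.com>
"""

# Headers are stacked newest-first, so the last matching Received line is
# usually the one closest to the actual sender.
hops = re.findall(r"Received: from .*?\[(\d{1,3}(?:\.\d{1,3}){3})\]", raw_headers)
if hops:
    print("Originating IP to look up:", hops[-1])  # 203.0.113.57
```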

If you still aren’t sure because you want to believe that someone fell in love with you or your company and cannot wait to invest, inform them that: “My board of directors will not disburse any upfront money for any purpose, or set-up any bank account. Any fees MUST be paid out of proceeds from your investment and payment will occur no sooner than 30-days after the wire clears our account.” That should cause them to decide to pass on your opportunity because you are “unreasonable”.

It is truly despicable that this kind of heartless scum plays on the hopes and dreams of entrepreneurs to extract their money. But I guess they wouldn’t play this game if there weren’t suckers out there falling for it regularly enough. Don’t be a sucker.

Wednesday, June 5, 2013

Problems with Open Source: Part 2

In my prior post on the problems with open source, I wrote that one issue that impacts open source revenues is the macro economy, and how a declining or difficult macro economy can result in reduction of revenues to open source companies. The following article talks about how financially troubled Spain is saving a "fortune" by moving to open source. The Spanish government's savings are coming at the expense of proprietary server software companies--most likely Microsoft--but I would be willing to bet that none of this "savings" is flowing to the open source vendors. That is what happens in a difficult macro economy.

Thursday, May 30, 2013

Problems with Open Source



Monty Widenius wrote about the problems with the open source model, or more specifically the problems he is experiencing with his open source project MariaDB. In a nutshell, it lacks two things: (1) developers committing code; (2) users paying. He then focuses primarily on #2, the lack of paying customers.

I believe that Monty’s concerns are the result of a number of factors: 

  1. Maturity (coolness factor): When a product is new and cool, developers want to work on it and customers get in the spirit and want to pay for it to continue to evolve. But once it becomes mature…eh, not so much.
  2. Maturity (downstream revenues): When a product is new and cutting-edge, “experts” make a ton of money. Look at Hadoop experts now. But as it becomes mainstream, the experts are making far less and feel less charitable toward their respective open source project.
  3. Maturity (market adoption): When you are one of the few early adopters of an open source project you may be more charitable toward the company in an effort to see it survive. Once it gains universal appeal, you figure that the rest of the people will pay so you don’t need to…in other words, “they are a success now, no need to continue funding them.”
  4. Macro Economy: If the macro economy is tight, as it is now, and companies are looking for where to cut, it is easier to cut funding to an “optional donation” than to cut one more individual. This is similar to the “downstream revenues” issue above but at the company level.

Open source projects follow a cycle, just like most everything in life. Commercial products achieve peak revenues with maturity and broad adoption. I believe that open source projects are the inverse, with maturity comes a decline in revenues. Ironically, it could well be that success is dangerous to a company's health.

Monty has some interesting ideas on separating "free" from open source...at least to some degree.

Monday, May 6, 2013

Large Database



Just a heads-up that we have added a Large Database page on ScaleDB that talks about many of the issues facing people trying to implement a large database, such as design, backup/restore, to index or not to index, and much more. Enjoy.

Thursday, May 2, 2013

Thoughts on Xeround and Free!


Everybody loves free. It is the best marketing term one could use. Once you say “FREE” the people come running. Free makes you very popular. Whether you are a politician offering something for free, or a company providing free stuff, you gain instant popularity.

Xeround is shutting down their MySQL Database as a Service (DBaaS) because their free instances, while popular, simply did not convert into sufficient paid instances to support the company. While I am sad to see them fail, because I appreciate the hard work required to deliver database technology, this announcement was not unexpected.

My company was at Percona Live, the MySQL conference, and I had some additional conversations along these same lines. One previously closed-source company announced that they were open sourcing their code, and it was a very popular announcement. A keynote speaker mentioned it and the crowd clapped excitedly. Was it because they couldn’t wait to edit the code? Probably not. Was it because now the code would evolve faster? Probably not, since it is very low-level and niche-oriented, and there will be few committers. No, I think it was the excitement of “free”. The company was excited about a 49X increase in web traffic, but had no idea what the impact would be on actual revenues.

I spoke with another company, also a low-level and niche product, that has been open source from the start. I asked about their revenues; they are essentially non-existent. The bottom line is that the plan was for them to make money on services…well, Percona, Pythian, SkySQL and others have the customer relationships, so they scoop up all of the consulting and support revenue while this company makes bupkis. I feel for them.

I had a friend tell me that ScaleDB should open source our code to get more customers. Yes, open source gets you a lot of free users…not customers. It is a hard path to sell your first 10…25…50…etc. customers, but the revenue from those customers fuels additional development and makes you a fountain of technology. Open source and free are great for getting big quickly and getting acquired, but it seems that if the acquisition doesn’t happen, you can quickly run out of money using this model (see Xeround).

I realize that this is an unpopular position. I realize that everybody loves free. I realize that open source has additional advantages (no lock-in, rapid development, etc.), but in my opinion, open source works in only two scenarios: (1) where the absolute volume is huge, creating a funnel for conversion (e.g. Linux); (2) where you need to unseat an entrenched competitor and you have other sources of revenue (e.g. OpenStack).

I look forward to your comments. We also look forward to working with Xeround customers who are looking for another solution.

Monday, January 7, 2013

Database Virtualization, What it Really Means


This is a response to a blog post by analyst and marketing consultant Curt Monash.

Originally virtualization meant running one operating system in a window inside of another operating system, e.g. running Linux on a Windows machine using Microsoft Virtual PC or VMWare. Then virtualization evolved to mean slicing a single server into many for more granular resource allocation (Curt’s ex uno plures, translated: out of one, many). It has since expanded to include e pluribus unum (from many, one) and e pluribus ad pluribus (from many to many). This is evidenced in the use of the term “virtualization” to create the compound terms: server virtualization, storage virtualization, network virtualization and now database virtualization.

Server Virtualization: Abstracts the physical servers, presenting them as a logical entity or entities. VMWare enables dividing single physical resources (compute or storage) into multiple smaller units (one-to-many), as well as combining multiple physical units into a single logical unit, which they call clustering (many-to-one). Since a clustered collection of physical servers can address a clustered collection of physical storage devices, it therefore also supports the many-to-many configuration. If we extract the essence of virtualization, it is the ability to address compute and storage resources logically, while abstracting the underlying physical representation.

This modern definition of virtualization is also evident in the following terms:

Storage Virtualization: Splitting a single disk into multiple virtual partitions (one-to-many), presenting a single logical view that spans multiple physical disks (RAID, many-to-one), and splitting multiple disks, often for high availability, across multiple logical storage devices (mirroring or LUNs). See also Logical Volume Management.

Network Virtualization: “In computing, network virtualization is the process of combining hardware and software network resources and network functionality into a single, software-based administrative entity, a virtual network.” This is a many-to-one model. 

Virtual IP Addresses: This enables the application to have a single IP address that actually maps to multiple NICs. 

So this brings us to the topic of defining “Database Virtualization”. We believe that the most comprehensive description of database virtualization is abstracting physical resources (compute, data, RAM) into a logical representation, supporting many-to-one, one-to-many and many-to-many relationships. This is exactly what ScaleDB does better than any other database company, and this is why we are considered leaders in the nascent field of database virtualization.

ScaleDB provides a single logical view (of a single database) while that database actually comprises a cluster of multiple database instances operating over shared data. Whether you call this many-to-one (many database nodes acting as one logical database) or one-to-many (one logical database split across many nodes) is a matter of perspective. In either case, this enables independent scaling of both compute and I/O, eliminating the need for painful sharding, while also supporting multi-tenancy.
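
To picture the many-to-one case, here is a purely hypothetical Python sketch (none of these names or addresses come from ScaleDB): the application only ever names a logical database, and a trivial router hands each connection to any physical node, which is only safe because every node operates over the same shared data, unlike sharding, where data placement dictates routing:

```python
import itertools

# Purely hypothetical sketch of "many nodes, one logical database": the
# application names the logical database; a tiny router hands each connection
# to any physical node, since every node operates over the same shared data.
PHYSICAL_NODES = ["db-node-1:3306", "db-node-2:3306", "db-node-3:3306"]
_next_node = itertools.cycle(PHYSICAL_NODES)

def connect(logical_db="virtual-db"):
    node = next(_next_node)
    print(f"connecting to {logical_db} via {node}")
    return node  # a real client would open a connection to this address here

for _ in range(4):
    connect()  # the application never knows or cares which node it landed on
```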

Curt then puts words in our mouths, stating that we claim: “Any interesting database topology should be called ‘database virtualization’.” We make no such claim. In fact, we state very clearly on our database virtualization page: “Database virtualization means different things to different people; from simply running the database executable in a virtual machine, or using virtualized storage, to a fully virtualized elastic database cluster composed of modular compute and storage components that are assembled on the fly to accommodate your database needs.”

In the marketing world, perception is reality. Since people are making claims of providing database virtualization, it is only prudent to include and compare their products, in a comprehensive evaluation of the space. Just as Curt addresses many things that are not databases (e.g. Memcached) in order to provide the reader with a comprehensive understanding, so do we when talking about database virtualization. One need only consider that we include “running the database executable in a virtual machine” as one of the approaches that some consider to be “database virtualization.” While we consider our approach to be the best solution for database virtualization, ignoring what other people consider to be database virtualization would have displayed extreme hubris on our part.

We appreciate Curt shining the light on database virtualization and we agree that it is “a hot subject right now.” It is a new field and therefore requires a comprehensive evaluation of all claims and approaches, enabling customers to decide which is the best approach for their needs. We remain quite confident that we will continue to lead the database virtualization market based upon our architectural advantages.

As the database virtualization market heats up, and as we enhance our solution set, we remain confident that Curt and other analysts will come to appreciate our unique advantages in this market.