
Friday, July 12, 2013

Why You Should Embrace Database Virtualization

This article addresses the benefits provided by database virtualization. Before we proceed, however, it is important to explain that database virtualization does NOT mean simply running a DBMS inside a virtual machine.

Database Virtualization, More Than Running a DBMS in a Virtual Machine
While running a DBMS in a VM can provide advantages (and disadvantages), it is NOT database virtualization. A typical database fuses the data (or I/O) with the processing (CPU utilization) so that they operate as a single unit. Simply running that single unit in a VM does not provide the benefits detailed below; that is not database virtualization, it is merely server virtualization.

An Example of the Database Virtualization Problem
Say you have a database handling banking, and I have $10MM in the bank (I wish). Now let’s assume that the bank is busy, so it bursts that database across 3 VM nodes in typical cloud style. Now each of those 3 nodes gets a command to wire out the full $10MM. Each node sees its balance at $10MM, so each one wires out the full amount, for a total wire transfer of $30MM…see the problem? In order to dynamically burst your database across nodes, you need a distributed locking mechanism, so that all nodes see the same data and can prevent other nodes from independently altering that data. This sounds easy, but making it perform well is a massive undertaking. Only two products have solved this problem: Oracle RAC and ScaleDB (for MySQL or MariaDB).
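
To make the failure mode concrete, here is a minimal sketch of the coordination problem. The names are hypothetical, and an in-process threading.Lock stands in for what would, in a real cluster, be a network-visible distributed lock manager:

```python
import threading

class SharedAccount:
    """Stands in for a row that multiple database nodes can all see."""
    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()  # stand-in for a distributed lock manager

    def wire_out(self, amount):
        # Each "node" must hold the cluster-wide lock before reading the
        # balance, so no node can act on a stale value.
        with self._lock:
            if self.balance >= amount:
                self.balance -= amount
                return True   # wire approved
            return False      # funds already gone: wire rejected

account = SharedAccount(10_000_000)
results = []

# Three "nodes" receive the same $10MM wire request at the same time.
threads = [threading.Thread(target=lambda: results.append(account.wire_out(10_000_000)))
           for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results.count(True))  # 1 -- only one wire goes out, not three
```

Remove the lock and all three threads read a $10MM balance, so all three wires succeed; with it, exactly one does.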

Defining Database Virtualization
  • It should enable the application to talk to a single virtual instance of the database, when in fact there are N actual nodes acting on the data.
  • It should separate the data processing (CPU) from the data (I/O) so that each can scale on demand and independently from the other.
  • It should enable the actual processing of the data to be distributed to the various nodes on the storage tier (function shipping) to achieve maximum performance; see the sketch after this list. Note: in practice, this is similar to MapReduce.
  • It should provide tiered caching, for performance, but also ensure cache coherence across the entire cluster.
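
Here is a small illustration of the function-shipping idea from the list above (my own sketch, with invented data and names, not ScaleDB's API): each storage node runs the filter and aggregate next to its local rows, and only small partial results cross the network.

```python
def node_partial_sum(local_rows, predicate):
    # Runs on the storage node, next to the data (the "map" step).
    return sum(r["amount"] for r in local_rows if predicate(r))

def distributed_sum(storage_nodes, predicate):
    # The coordinator only combines partial results (the "reduce" step).
    return sum(node_partial_sum(rows, predicate) for rows in storage_nodes)

# Three storage nodes, each holding a slice of the table.
nodes = [
    [{"region": "EU", "amount": 10}, {"region": "US", "amount": 5}],
    [{"region": "EU", "amount": 7}],
    [{"region": "US", "amount": 3}, {"region": "EU", "amount": 1}],
]
print(distributed_sum(nodes, lambda r: r["region"] == "EU"))  # 18
```
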
Benefits of Database Virtualization

Higher Server Utilization: When the data is fused to the CPU as a single unit, that one node is responsible for handling all usage spikes for its collection of data. This forces you to split the data thinly across many servers (silos), forcing you to run each server at a low utilization rate. Database virtualization decouples the data from the processing, so that a spike in usage can be shared across many nodes on the fly. This enables you to run a virtualized database at a very high utilization rate.

Reduced Infrastructure Costs: Database virtualization enables you to use fewer servers, which means less power and fewer OS, tool, and application licenses, network switches, and storage devices, among other things.

Reduced Manpower Costs: Database virtualization simplifies the DBA’s job: because it uses only one schema and no sharding, and because it simplifies backup processes, the DBA can handle more databases. It simplifies the application developer’s job, because it eliminates code related to sharding, e.g. database routing and rebuilding relationships between shards (such as joins). It also simplifies the network admin’s job, because he manages fewer servers and they are identical.
Reduced Complexity: You only have a single database image, so elastically scaling up/down is simple and fast.

Increased Flexibility: Database virtualization brings the same flexibility to the database that server virtualization brings to the application tier. Resources are allocated and reallocated on the fly. If your usage profile changes, e.g. payroll one day, benefits the next, a virtual database uses the same shared infrastructure for any workload, while a traditional database does not.

Quality of Service: Since database images can move on the fly, without downtime, a noisy neighbor or noisy network is solved by simply moving the database to another node in your pool.

Availability: Unlike a traditional database, virtualized database nodes see all of the data, so they inherently provide failover for one another, addressing unplanned downtime. In regards to planned downtime, simply move the process to another server and take down the one that needs service, again without interruption.

Improved Performance: Because the pooled cache across the storage tier uses a Least Recently Used (LRU) algorithm, it can free up huge amounts of pooled cache for the current workload, enabling near in-memory performance. Also, as mentioned above, the distribution of processing to the storage tier enables high-performance parallel processing.

True database virtualization delivers a huge set of advantages that in many ways mirror the benefits server virtualization provides to applications. For this reason, we expect database virtualization to be the next big thing, following in the footsteps of server, storage and network virtualization.


Thursday, September 22, 2011

Lack of Business Visibility Cripples Traditional SQL DaaS, Drives NewSQL

More and more public cloud companies are moving to managed cloud services to improve their value-add (price premium) and the stickiness of their solutions. However, the shift to a database as a service (DaaS) severely reduces the DBA’s visibility into the business, thus limiting the ability to hand-tune the database to the requirements of the application and the business. The solution is a cloud database that eliminates hand-tuning, thereby enabling the DBA to be equally effective even with limited visibility into the business and application needs. It is these unique needs, particularly for SQL databases, that are fueling the NewSQL movement.

DBAs traditionally have insight into the company, enabling them to hand-tune the database on a collaborative basis with the development team. Examples include:

1. Performance Trade-offs/Tuning: The database is partitioned and tuned to address business requirements, maximizing performance of certain critical processes, while slowing less critical processes.

2. System Maintenance Planning: You need to shut down the database to do an application/database upgrade, repartition, etc., but you may need to coordinate with the development team, and the schedule may change up until the last moment.

3. Application Evolution: The database must be designed and tuned to accommodate the planned changes in the application.

4. Consulting with Application Developers: Since the database partitioning and performance are hand-tuned, the DBA must collaborate closely with the application development team on design, development and deployment.

5. Partitioning Requirements: When partitioning (or repartitioning) your database, you’ll need to partition the data to suit application requirements, avoiding things like cross-partition joins, range scans, aggregates, etc., which can cause tremendous performance penalties if not implemented correctly.

6. Moving Processes to the Application Layer: Single-server databases can handle joins, range scans, aggregates and the like internally. However, when you partition the database, these functions are typically moved to the application layer. The application must add the logic to accomplish these things. It must also add the routing code to point to the correct partitions. As a result, an application written to a single-server API does not work in a multi-server configuration.
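
To make point 6 concrete, here is a hypothetical sketch (names and schema invented for illustration) of the routing and join logic that partitioning forces into the application:

```python
def shard_for(user_id, num_shards):
    return user_id % num_shards  # routing code the application must maintain

# Two shards, each holding its own slice of the users and orders tables.
shards = [
    {"users": {0: "alice"}, "orders": [(0, "book")]},
    {"users": {1: "bob"},   "orders": [(1, "lamp"), (1, "desk")]},
]

def user_orders(user_id):
    # The application, not the database, routes the request...
    shard = shards[shard_for(user_id, len(shards))]
    name = shard["users"][user_id]
    # ...and rebuilds the users/orders join by hand.
    return [(name, item) for uid, item in shard["orders"] if uid == user_id]

print(user_orders(1))  # [('bob', 'lamp'), ('bob', 'desk')]
```

None of this code exists when a single database instance, or a database that scales invisibly, can execute the join itself.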

When moving from a self-managed database—either in the cloud or on premise—to a DaaS, the “DBA-in-the-cloud” doesn’t have that visibility into the business requirements, performance requirements, development schedule, and more. This lack of visibility turns the already challenging task of hand-tuning the database into a near impossibility using traditional databases.

A real-world example: You are the DBA-in-the-cloud. Your customer has been running on a single server, but they now need to scale out across multiple database nodes to accommodate growth. How do you split the data? You don’t know which queries have higher performance priority. You don’t know the development plan for the new version of the application. You don’t know when it is convenient to shut down the application to implement the partitioning. You need to inform the application developers how they should implement their routing code to send database requests to the right nodes. You need to inform the application developers that they must handle joins, range scans and aggregates at the application level, since the database can no longer handle those.

DBAs have enough of a challenge scaling and maintaining databases when they have full visibility into the business. To address these challenges without that level of visibility is unrealistic. I refer to this problem as the “Blind DBA” challenge, because the DaaS DBA has a serious lack of visibility into the requirements and inner workings of the company.

ScaleDB is a NewSQL database uniquely designed to handle the Blind DBA challenge inherent in DaaS implementations. ScaleDB is based on a shared-disk architecture that scales in a manner that is invisible to the application. It eliminates the need to push functions (joins, range scans, aggregates) up to the application layer. It eliminates the need to coordinate your application and database design to work around partitions, because there are no partitions. It eliminates the performance trade-offs of partitioning design; you get consistent performance across the database. It eliminates scheduling and coordinating application shut-downs to repartition, again because it doesn’t use partitions. In short, with ScaleDB the DBA needn’t have any visibility into the business to deliver optimal database performance via a single API. This is what makes ScaleDB an ideal solution for cloud companies implementing the database as a managed service or DaaS.

Monday, August 15, 2011

Do you need an elastic database?

Not every company or application needs an elastic database. Some applications can get by just fine on a single database server, rendering database elasticity moot from their perspective. To make this determination, simply ask yourself:

1. Will I need more than a single database server?
Look at your current load and your projected growth and ask yourself whether it will exceed the capacity of a single server. If it doesn’t now and won’t in the future, then you don’t need an elastic database.

2. Will my load fluctuate sufficiently to warrant the investment in elasticity?
If your database requirements won’t experience fluctuations in demand—e.g. daily, weekly, monthly, seasonal changes in the number of servers required—then elasticity isn’t important. For example, if you have a social networking application that requires 2 database nodes 24x7, but peaks at 10 nodes for 2 hours a night, then elasticity is important. If your database has a steady load that requires 3 database nodes and that load doesn’t change, then elasticity isn’t important.

3. Will my database load grow with time?
If your database load will grow with time, then you need to evaluate the growth pattern and ask yourself whether you can handle this expansion manually, or whether you would rather rely on an elastic database to grow seamlessly.

If your answer to all three questions is no, then you don’t really need an elastic database. There are certainly other reasons to consider ScaleDB (e.g. high availability, eliminating partitioning/sharding, eliminating master-slave issues, etc.), but elasticity may not be one of them.

Wednesday, August 3, 2011

Cloud Elasticity & Databases

The primary reasons people are moving to the public cloud are: (1) replace capital expenses with operating expenses (pay as you go); (2) use shared resources for processes like back-up, maintenance, networking (shared expenses); (3) use shared infrastructure that enables you to pay only for those resources you actually use, instead of consuming your maximum load resources at all times (pay-per-use). The first thing you’ll notice is that all 3 cloud benefits have their basis in finances or the cloud business model.

We will focus on #3 above: pay-per-use. The old-school model was to build your compute infrastructure for the maximum load today, plus growth over the life-cycle of the equipment, plus some buffer so the systems don’t get overloaded by spikes in usage. The net result is that your average usage might run at 10% of the potential of the infrastructure you mortgaged your home to buy. In other words, you were paying 10X more than you would if you paid only for usage. In reality, you might pay half as much to run on the cloud, with the balance of the savings going to the cloud company in the form of profits. This works, and it is a win-win for both you and the public cloud.

To achieve this pay-per-use ideal, and the compelling financial advantages it enables, the infrastructure must scale elastically. You must be able to add compute power seamlessly and on the fly, without shut-down. How important is this elasticity? Amazon named their service “EC2” for “Elastic Compute Cloud”. Elastic is the first word; I would say it is pretty important. Besides, if the cloud weren’t elastic, you would simply be paying the same computing costs, plus the public cloud company’s markup for expenses and profit.

So how elastic are public clouds? The entire cloud stack is elastic, except for one piece, the SQL database. Cloud companies recognized that the SQL database was the Achilles heel of cloud elasticity. To address this problem, they created NoSQL, which delivers database-like capabilities, but removes the things that make a SQL database inelastic; namely SQL, ACID-compliance, data consistency, transactions, etc.

NewSQL appears to be the response from the database vendors, who believe that there is a market for SQL databases that provide cloud elasticity. Not all NewSQL solutions address elasticity, but a few of us do. In my next blog post, I’ll address whether or not database elasticity is important…hint: it depends upon your needs.

Monday, July 25, 2011

ScaleDB: Shared-Disk / Shared-Nothing Hybrid

The primary database architectures—shared-disk and shared-nothing—each have their advantages. Shared-disk has functional advantages such as high availability, elasticity, ease of set-up and maintenance, and the elimination of partitioning/sharding and master-slave replication. The shared-nothing advantages are better performance and lower costs. What if you could offer a database that is a hybrid of the two, one that offers the advantages of both? This sounds too good to be true, but it is in fact what ScaleDB has done.

The underlying architecture is shared-disk, but in many situations it can operate like shared-nothing. You see, the problems with shared-disk arise from the messaging necessary to (a) ship data among nodes and storage; and (b) synchronize the nodes in the cluster. The trick is to move the messaging outside of the transaction so it doesn’t impact performance. The way to achieve that is to exploit locality. Let me explain.

When using a shared-disk database, if your application or load balancer just randomly sprays the database requests to any node in the cluster, all of the nodes end up sharing all of the data. This involves a lot of data shipping between nodes, and messaging to keep track of which node has what data and what they have done to it. This is at the core of the challenge for companies like ours that build shared-disk databases…it ain’t easy. There are many things you can do to optimize performance in such a scenario, like local caching and shared cache (we use CAS, Oracle uses Cache Fusion). However, the bottom line is that even with these optimizations, random distribution of database requests results in suboptimal database performance for some scenarios.

Once you have solved the worst-case scenario of random database requests, you can start optimizing for the intelligent routing of database requests. By this I mean that either the application or the load balancer sends specific database requests to specific nodes in the cluster. Intelligent database request routing results in something we in the shared-disk database world call locality. The database nodes are able to operate on local data while only updating the rest of the cluster asynchronously. In this scenario, the database nodes, which are still using a shared-disk architecture, operate much more independently, like shared-nothing. As a result, data shipping and messaging are almost completely eliminated, resulting in performance comparable to shared-nothing, while still maintaining the advantages of shared-disk.
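
A minimal sketch of what such intelligent routing can look like, assuming (as in shared-disk) that any node can serve any request, so the routing affects only cache warmth, never correctness. The node names are invented:

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical cluster nodes

def route(customer_id):
    # Deterministic hash routing: the same customer always lands on the same
    # node, so that node's local cache already holds the customer's rows.
    digest = hashlib.md5(str(customer_id).encode()).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

print(route(42) == route(42))                 # True: repeat requests stay local
print(sorted({route(c) for c in range(20)}))  # work still spreads across nodes
```

Because the shared-disk layer guarantees correctness regardless of where a request lands, the routing can be rebalanced as nodes come and go; a stale route costs only a few cache misses.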

The trick is for the database to recognize on-the-fly when the separate nodes can and cannot operate in this independent fashion. This is complicated by the fact that the database must recognize and adapt to locality which can evolve as database usage changes, nodes are added or removed, etc. This is one aspect of the secret sauce that is built into ScaleDB.

Note: Now that we’ve built a shared-disk database that can recognize locality and respond by acting (and performing) like a shared-nothing database, how do we achieve locality? There are many ways to achieve locality. It can be built into the application, or you can rely on a SQL-aware routing/caching solution like those available from Netscaler and Scalarc that handle this for you.

Monday, March 14, 2011

The CAP Theorem Event Horizon

The CAP Theorem has become a convenient excuse for throwing data consistency under the bus. It is automatically assumed that every distributed system falls prey to CAP and therefore must sacrifice one of the three objectives, with consistency being the consistent fall guy. This automatic assumption is simply false. I am not debating the validity of the CAP Theorem, but instead positing that the onset of CAP limitations—what I call the CAP event horizon—does not start as soon as you move to a second master database node. Certain approaches can, in fact, extend the CAP event horizon.

Physics tells us that different properties apply at different scales. For example, quantum physics displays properties that do not apply at larger scale. We see similar nuances in scaling databases. For example, if you are running a master slave database, using synchronous replication with a single slave is no problem. Add nine more slaves and it slows the system. Add another ninety slaves and you have a real problem with synchronous replication. In other words, consistency at small scale is no problem, but at large scale it becomes impossible, because your latency goes parabolic.

If you break the database into master (read/write) and slave (read-only) functions, you can operate a handful of slaves using synchronous replication without crossing the CAP event horizon, at least among the slaves. However, the master does present a SPOF (single point of failure), undermining availability.
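
A toy model (my own assumption, not from the measurements above) makes the latency argument tangible: a synchronous commit completes only when the slowest replica acknowledges, and the expected maximum of N random ack times climbs as N grows:

```python
import random

def commit_latency(num_replicas, mean_ms=5.0):
    # A synchronous commit waits for the SLOWEST replica's ack; each ack
    # time is drawn from an exponential distribution (an assumption).
    return max(random.expovariate(1.0 / mean_ms) for _ in range(num_replicas))

random.seed(1)
for n in (1, 10, 100):
    avg = sum(commit_latency(n) for _ in range(10_000)) / 10_000
    print(f"{n:3d} synchronous replicas -> avg commit wait ~{avg:4.1f} ms")
```
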

Using a shared-nothing architecture, as soon as you introduce more than a single master, you hit the CAP event horizon. However, shared-disk / shared-cache systems like Oracle RAC and ScaleDB extend the CAP event horizon. They don’t invalidate the CAP Theorem; they merely extend the event horizon by addressing the CAP issues while maintaining low latency.

Shared-disk databases enable multiple nodes to share the same data. Internal processes ensure that the data remains consistent across all nodes, while providing availability and partition tolerance. These processes entail a certain overhead from inter-nodal messaging. There are techniques that can be applied to dramatically reduce this inter-nodal messaging, resulting in a system that delivers the advantages of shared-disk, while delivering a performance profile rivaling shared-nothing, but that will have to wait for a later post.

While shared-nothing databases cross the CAP event horizon as soon as you add a second master, shared-disk databases extend this event horizon well into a handful of database nodes. Optimizations can extend it further, to dozens of database nodes (all masters). As you move to web-scale applications, you will certainly cross the CAP event horizon, but most OLTP-type applications can operate quite effectively on ten or fewer database servers, and in that case there is no need to throw consistency under the bus solely in the name of the CAP Theorem.


Friday, September 17, 2010

ScaleDB Cache Accelerator Server (CAS): A Game Changer for Clustered Databases

ScaleDB and Oracle RAC are both clustered databases that use a shared-disk architecture. As I have mentioned previously, they both actually share data via a shared cache, so it might be more appropriate to call them shared-cache databases.

Whether it is called shared-disk or shared-cache, these databases must orchestrate the sharing of a single set of data amongst multiple nodes. This introduces two challenges: the physical sharing of the data and the logical sharing of the data.

Physical Sharing:
Raw storage is meant to work on a 1:1 basis with a single server. In order to share that data amongst multiple servers, you need either a Network File System (NFS), which shares whole files, or a Cluster File System (CFS), which shares data blocks.

Logical Sharing:
This is specific to databases. A database may request a single block of data from the storage and then it may coordinate multiple sequential changes to that block, with only the final results being written back to the storage. The database can also discriminate between reading the data and writing the data, to facilitate parallelizing these actions.

Databases must control the logical sharing of data, in order to ensure that the database doesn’t become corrupted or inconsistent, and to ensure that it provides good performance. Because logical sharing is very specific to the database, it is something that clustered databases must handle themselves. This function is addressed by a lock manager.
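
As an illustration of the logical sharing a lock manager enforces, here is a minimal sketch (a conceptual toy, not ScaleDB's or Oracle's actual implementation): any number of nodes may hold a read lock on a block, but a write lock must be exclusive.

```python
class BlockLockManager:
    def __init__(self):
        self.readers = {}  # block_id -> set of nodes holding read locks
        self.writer = {}   # block_id -> node holding the write lock

    def acquire_read(self, block_id, node):
        if self.writer.get(block_id):           # an active writer blocks reads
            return False
        self.readers.setdefault(block_id, set()).add(node)
        return True

    def acquire_write(self, block_id, node):
        others = self.readers.get(block_id, set()) - {node}
        if others or self.writer.get(block_id) not in (None, node):
            return False                        # readers or a writer are active
        self.writer[block_id] = node
        return True

    def release(self, block_id, node):
        self.readers.get(block_id, set()).discard(node)
        if self.writer.get(block_id) == node:
            del self.writer[block_id]

lm = BlockLockManager()
print(lm.acquire_read("blk7", "node-a"))   # True: shared read
print(lm.acquire_read("blk7", "node-b"))   # True: reads can coexist
print(lm.acquire_write("blk7", "node-c"))  # False: readers still hold the block
```
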

Physical sharing of data requires less integration with the database logic. As such, you can use a general-purpose NFS or CFS to provide the physical file-sharing capabilities. This is what Oracle RAC does: it relies upon Oracle Cluster File System 2 (OCFS2) to provide generic physical file sharing. OCFS2 in turn relies upon a SAN or NAS that supports multi-attach, since all of the database nodes must share the same physical files. The NAS or SAN then handles the data duplication for high availability and other services like back-up and more.

ScaleDB takes a different approach. ScaleDB not only handles the logical data sharing—with its lock manager—but it also handles the physical data sharing with its Cache Accelerator Server (CAS). CAS connects directly to the storage and handles the sharing of that data among the database nodes. Because CAS is purpose-built for the ScaleDB database it does not need services such as membership management, which create complexity and overhead in a general purpose CFS. Furthermore, ScaleDB is able to tune the CAS, in conjunction with the lock manager, to extract superior performance.

CAS also offers additional benefits. It provides a scalable shared cache that enables the database nodes to share data via the cache, which is much faster than sharing via the disk. Furthermore, since it eliminates the need for an NFS or CFS, it enables you to work with any storage. You can choose to use local storage—inside the CAS—cloud storage, or a SAN or NAS. Many in the MySQL community balk at the high cost of SAN storage, with its fibre channel adapters, switches and high-cost disks. CAS supports low-cost local storage, while providing a seamless path to high-end storage as needed. Furthermore, CAS servers are deployed in pairs, so the data is mirrored. Because the data is mirrored, you have redundant storage, even when using local storage inside the servers running CAS. Because it can operate on commodity hardware and because it works with any storage, CAS is ideal for cloud computing.

In summary, clustered databases like Oracle RAC and ScaleDB must implement their own lock managers to manage the logical sharing of data amongst the database nodes. Providing a purpose-built solution for the physical sharing of the data, while not required, does provide some significant advantages over using a general purpose NFS or CFS.

Monday, August 2, 2010

Database Architectures & Performance II

As described in the prior post, the shared-disk performance dilemma is simple:

1. If each node stores/processes data in memory, versus disk, it is much faster.
2. Each node must expose the most recent data to the other nodes, so those other nodes are not using old data.

In other words, #1 above says flush data to disk VERY INFREQUENTLY for better performance, while #2 says flush everything to disk IMMEDIATELY for data consistency.

Oracle recognized this dilemma when they built Oracle Parallel Server (OPS), the precursor to Oracle Real Application Cluster (RAC). In order to address the problem, Oracle developed Cache Fusion.

Cache Fusion is a peer-based shared cache. Each node works with a certain set of data in its local cache, until another node needs that data. When one node needs data from another node, it requests it directly from that node’s cache, bypassing the disk completely. In order to minimize this data swapping between the local caches, RAC applications are optimized for data locality. Data locality means routing certain data requests to certain nodes, thereby enjoying a higher cache hit ratio and reducing data swapping between caches. Static data locality, built into the application, severely complicates the process of adding/removing nodes in the cluster.

ScaleDB encountered the same conflict between performance and consistency (or RAM vs. disk). However, ScaleDB’s shared cache was designed with the cloud in mind. The cloud imposes certain additional design criteria:

1. The number of nodes will increase/decrease in an elastic fashion.
2. A large percentage of MySQL users will require low-cost PC-based storage.

Clearly, some type of shared-cache is imperative. Memcached demonstrates the efficiency of utilizing a separate cache tier above the database, so why not do something very similar beneath the database (between the database nodes and the storage)? The local cache on each database node is the most efficient use of cache, since it avoids a network hop. The shared cache tier, in ScaleDB’s case, the Cache Accelerator Server (CAS), then serves as a fast cache for data swapping.

The Cluster Manager coordinates the interactions between nodes. This includes both database locking and data swapping via the CAS. Nodes maintain the data in their local cache until that data is required by another node. The Cluster Manager then coordinates that data swapping via the CAS. This approach is more dynamic, because it doesn’t rely on a prior knowledge about the location of that data. This enables ScaleDB to support dynamic elasticity of database nodes, which is critical for cloud computing.
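
The lookup path this implies can be sketched in a few lines (a conceptual illustration under my own assumptions, with plain dicts standing in for the CAS and the storage): a node checks its local cache, then the shared CAS tier, and only then touches the disk, warming both tiers on the way back.

```python
class TieredCache:
    """A database node's read path: local cache -> shared CAS -> storage."""
    def __init__(self, cas, storage):
        self.local = {}         # this node's cache: fastest, no network hop
        self.cas = cas          # shared cache tier (a dict stands in for CAS)
        self.storage = storage  # slowest tier (a dict stands in for disk)

    def read(self, block_id):
        if block_id in self.local:
            return self.local[block_id], "local"
        if block_id in self.cas:
            block, tier = self.cas[block_id], "cas"       # no disk I/O needed
        else:
            block, tier = self.storage[block_id], "disk"  # slowest path
            self.cas[block_id] = block                    # warm the shared tier
        self.local[block_id] = block                      # warm this node's cache
        return block, tier

cas, disk = {}, {"blk1": b"row data"}
node_a, node_b = TieredCache(cas, disk), TieredCache(cas, disk)
node_a.read("blk1")            # cold read: goes to disk and warms the CAS
print(node_b.read("blk1")[1])  # "cas": node B is served by the shared tier
```
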

The following diagram describes how ScaleDB’s Cache Accelerator Server (CAS) is implemented.

While the diagram above shows a variety of physical servers, these can be virtual servers. An entire cluster, including the lock manager, database nodes and CAS could be implemented on just two physical servers.

Both ScaleDB and Oracle rely upon a shared cache to improve performance, while maintaining data consistency. Both approaches have their relative pros and cons. ScaleDB’s tier-based approach to shared cache is optimized for cloud environments, where dynamic elasticity is important. ScaleDB’s approach also enables some very interesting advantages in the storage tier, which will be enumerated in subsequent posts.

Tuesday, July 20, 2010

Database Architectures & Performance

For decades the debate between shared-disk and shared-nothing databases has raged. The shared-disk camp points to the laundry list of functional benefits such as improved data consistency, high-availability, scalability and elimination of partitioning/replication/promotion. The shared-nothing camp shoots back with superior performance and reduced costs. Both sides have a point.

First, let’s look at the performance issue. RAM (average access time of 200 nanoseconds) is considerably faster than disk (average access time of 12,000,000 nanoseconds). Let me put this 200:12,000,000 ratio into perspective: a task that takes a single minute in RAM would take 41 days on disk. So why do I bring this up?

Shared-Nothing: Since the shared-nothing database has sole ownership of its data—it doesn’t share the data with other nodes—it can operate in the machine’s local RAM, only writing infrequently to disk (flushing the data to disk). This makes shared-nothing databases very fast.

Shared-Disk: A shared-disk database cannot rely on the machine’s local RAM, because every write by one node must be instantly available to the other nodes, to ensure that they don’t use stale data and corrupt the database. So instead of relying on local RAM, all write transactions must be written to disk. This is where the one-minute-to-41-days ratio above comes into play and kills the performance of shared-disk databases.

Let’s look at some of the ways databases can utilize RAM instead of disk to improve performance:

Read Cache: Databases typically use the RAM as a fast read cache. Upon reading data from the disk, this data is stored in the read cache so that subsequent use of that data is satisfied from RAM instead of the disk. For example, upon reading a person’s name from disk, that name is stored in the cache for fast access. The database wouldn’t need to read that name from disk again until that person’s name is changed (rare), or that RAM space is reused for a piece of data that is used more frequently. Read cache can significantly improve database performance.

BOTH shared-disk and shared-nothing databases can exploit read cache. The shared-disk database just needs a system to either invalidate or update the data in read cache when one of the nodes has made a change. This is pretty standard in shared-disk databases.
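
A minimal sketch of such a read cache, with the cross-node invalidation just described (an illustration with invented names, not any particular engine's code):

```python
from collections import OrderedDict

class ReadCache:
    def __init__(self, capacity=8):
        self.cache = OrderedDict()
        self.capacity = capacity

    def get(self, key, read_from_disk):
        if key in self.cache:
            self.cache.move_to_end(key)      # mark as recently used
            return self.cache[key]
        value = read_from_disk(key)          # slow path: hit the disk
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # evict the least recently used
        return value

    def invalidate(self, key):
        self.cache.pop(key, None)            # another node changed this data

disk = {"name:7": "Alice"}
nodes = [ReadCache(), ReadCache()]
for n in nodes:
    n.get("name:7", disk.get)                # both nodes now cache the name

disk["name:7"] = "Alicia"                    # node 0 writes the row...
nodes[1].invalidate("name:7")                # ...so node 1 drops its stale copy
print(nodes[1].get("name:7", disk.get))      # re-read from disk: "Alicia"
```
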

Background Writing: Writing data to the disk is by far the most time consuming process in a write transaction. During the transaction, that portion of the data is locked, meaning it is unavailable for other functions. So, if you can move the writing of the data outside of the transaction—write the data in the background—you get faster transactions, which means less locking contention, which means faster throughput.

SHARED-NOTHING databases can exploit this performance enhancement, since each server owns the data in its RAM. However, shared-disk databases cannot do this, because they need to share that updated data with the other database nodes in the cluster. Since the local node’s cache is not shared in a shared-disk database, the only option is to use the shared disk to share that data across the nodes.
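
Here is the write-behind idea in miniature (a toy sketch, not a real storage engine): transactions update RAM and return immediately, and a later flush writes the dirty pages to disk in one batch.

```python
class WriteBehindBuffer:
    def __init__(self, disk):
        self.ram = {}      # dirty pages, owned exclusively by this node
        self.disk = disk

    def write(self, key, value):
        self.ram[key] = value    # the transaction ends here: RAM only

    def flush(self):
        # In a real engine this would run on a background thread, off the
        # transaction's critical path; here we call it explicitly.
        self.disk.update(self.ram)
        self.ram.clear()

disk = {}
buf = WriteBehindBuffer(disk)
for i in range(3):
    buf.write(f"row:{i}", i)  # three fast commits, zero disk writes so far
buf.flush()                   # one deferred disk write covers all of them
print(disk)                   # {'row:0': 0, 'row:1': 1, 'row:2': 2}
```
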

Transactional Cache: The next step in utilizing RAM instead of disk is to use it in a transactional manner. This means that the database can make multiple changes to data in RAM prior to writing the final results to disk. For example, if you have 100 widgets, you can store that inventory count in RAM, and then decrement it with each sale. If you sell 23 widgets, then instead of writing each transaction to disk, you update it in RAM. When you flush this data to disk, it results in a single disk write, writing the inventory number 77, instead of writing each of the 23 transactions individually to disk.

SHARED-NOTHING databases can perform transactions on data while it is in RAM. Once again, shared-disk databases cannot do this, because you might have multiple nodes updating the inventory. Since they cannot look into each other’s local RAM, they must once again write each transaction to disk.
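
The widget arithmetic above, as code (a sketch assuming this node exclusively owns the inventory row, which is exactly the ownership a shared-disk node lacks):

```python
class InventoryCache:
    def __init__(self, disk, item, count):
        self.disk, self.item = disk, item
        self.count = count        # the inventory figure lives in RAM
        self.disk_writes = 0

    def sell(self, qty=1):
        self.count -= qty         # per-sale update touches RAM only

    def flush(self):
        self.disk[self.item] = self.count
        self.disk_writes += 1

disk = {}
inv = InventoryCache(disk, "widget", 100)
for _ in range(23):
    inv.sell()                    # 23 transactions, zero disk writes
inv.flush()
print(disk["widget"], inv.disk_writes)  # 77 1 -- one write instead of 23
```
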

As you can see, shared-nothing databases have an inherent performance advantage. The next blog post will address how modern shared-disk databases address these performance challenges.

Thursday, April 1, 2010

ScaleDB Introduces Clustered Database Based Upon Water Vapor

ScaleDB is proud to announce the introduction of a database that takes data storage to a new level, and a new altitude. ScaleDB’s patent-pending “molecular-flipping technology” enables low-energy molecular flipping that changes selected water molecules from H2O to HOH, representing positive and negative states that mimic the storage mechanism used on hard disk drives.

“Because we act at the molecular level, we achieve massive storage density with minimal energy consumption, which is critical in today’s data centers, where energy consumption is the primary cost,” said Mike Hogan, ScaleDB CEO. “A single thimble of water vapor provides the same storage capacity as a high-end SAN.”

The technology does have one small challenge: persistence. Clouds are not known for their persistence. ScaleDB relies on the Cumulus formation, since it is far beefier than some of those wimpy cirrus clouds. However, when deployed in the data center, the dry heat can be particularly damaging to cloud maintenance. One of the company’s patents centers around using heavy water, which resists evaporation and is therefore far more persistent than its lighter brethren. The company has already received approval from the IAEA to commercialize this technique.

This new technology considerably improves ScaleDB’s “green cred”. By greatly reducing energy consumption in data centers, it cuts their carbon footprint, leaving little more than a toeprint. Once the cloud storage—which has a 3-year half-life—is worn out, you can release it into the atmosphere. There it mingles with natural clouds, making them denser and more reflective. Leading IPCC climate scientists have modeled the effects of this mingling and the scientific consensus is that it will reduce global temperatures by 5-6 degrees centigrade within 20 years (+/- 10 degrees centigrade). The company is in negotiations with Al Gore to promote this new technology, but it cannot comment on these negotiations, because the mere fact that such negotiations are in fact happening is covered by a strict NDA and the even more legally binding pinky promise.

ScaleDB set out to become THE cloud database company and today’s announcement really takes that to a whole new level. The tentative name for this new database is VaporWare.