Showing posts with label mysql. Show all posts
Wednesday, June 5, 2013
Problems with Open Source: Part 2
In my prior post on the problems with open source, I wrote that one issue affecting open source revenues is the macro economy: a declining or difficult macro economy can reduce the revenues of open source companies. The following article describes how financially troubled Spain is saving a "fortune" by moving to open source. The Spanish government's savings come at the expense of proprietary server software companies (most likely Microsoft), but I would be willing to bet that none of these "savings" are flowing to the open source vendors. That is what happens in a difficult macro economy.
Wednesday, August 24, 2011
The Future of NoSQL (Companies)…
A friend recently bought a GM car. I proceeded to inform him that I am shorting GM stock (technically, through a put option). He was shocked. “But they make great cars,” he exclaimed. I responded, “I’m not shorting the cars, I’m shorting the company.” Why am I recounting this exchange? Because I believe that the new wave of NoSQL companies, as opposed to the rebranded ODBMS vendors, presents the same situation. I am long the products, but short the companies.
Let me explain. NoSQL companies have built some very cool products that solve real business problems. The challenge is that they are all open source products serving niche markets. They have customer funnels that are simply too small to sustain the companies given their low conversion/monetization rates.
These companies could certainly be tasty acquisition targets for companies that actually make money. But as standalone companies, sadly, I would short them. On that note, I am off to the NoSQL Now! Conference. Hopefully, this post won't get me beaten up while I'm cruising the conference.
Monday, August 15, 2011
Do you need an elastic database?
Not every company or application needs an elastic database. Some applications can get by just fine on a single database server, rendering database elasticity moot from their perspective. To make this determination, simply ask yourself:
1. Will I need more than a single database server?
Look at your current load and your projected growth and ask yourself whether they will exceed the capacity of a single server. If they don’t now and won’t in the future, then you don’t need an elastic database.
2. Will my load fluctuate sufficiently to warrant the investment in elasticity?
If your database requirements won’t experience fluctuations in demand—e.g. daily, weekly, monthly, seasonal changes in the number of servers required—then elasticity isn’t important. For example, if you have a social networking application that requires 2 database nodes 24x7, but peaks at 10 nodes for 2 hours a night, then elasticity is important. If your database has a steady load that requires 3 database nodes and that load doesn’t change, then elasticity isn’t important.
3. Will my database load grow with time?
If your database load will grow with time, then you need to evaluate the growth pattern and decide whether you can handle this expansion manually or would rather rely on an elastic database to grow seamlessly.
If your answer to all three questions is no, then you don’t really need an elastic database. There are certainly other reasons to consider ScaleDB (e.g., high availability, eliminating partitioning/sharding, eliminating master-slave issues), but elasticity may not be one of them.
Wednesday, August 3, 2011
Cloud Elasticity & Databases
The primary reasons people are moving to the public cloud are: (1) replace capital expenses with operating expenses (pay as you go); (2) use shared resources for processes like back-up, maintenance, networking (shared expenses); (3) use shared infrastructure that enables you to pay only for those resources you actually use, instead of consuming your maximum load resources at all times (pay-per-use). The first thing you’ll notice is that all 3 cloud benefits have their basis in finances or the cloud business model.
Let's focus on #3 above: pay-per-use. The old-school model was to build your compute infrastructure for the maximum load today, plus growth over the life cycle of the equipment, plus some buffer so the systems don’t get overloaded by spikes in usage. The net result is that your average usage might run at 10% of the capacity of the infrastructure you mortgaged your home to buy. In other words, you were paying 10X more than you would if you paid only for usage. In reality, you might pay half as much to run on the cloud, with the rest of the savings going to the cloud company as profit. This works, and it is a win-win for both you and the public cloud provider.
To achieve this pay-per-use ideal, and the compelling financial advantages it enables, the infrastructure must scale elastically. You must be able to add compute power seamlessly and on the fly, without a shutdown. How important is this elasticity? Amazon named its service “EC2” for “Elastic Compute Cloud.” Elastic is the first word, so I would say it is pretty important. Besides, if the cloud weren’t elastic, you would simply be paying the same compute costs, plus the public cloud company’s markup for expenses and profit.
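The arithmetic behind the 10X figure is simple enough to spell out. In this sketch, the 100 capacity units, 10 used units, and 5X cloud markup are illustrative assumptions chosen to reproduce the "10% utilization" and "half as much" numbers above; they are not measured prices.

```python
def provisioning_costs(capacity_units: int, used_units: int,
                       unit_price: int, cloud_markup: int) -> tuple[int, int, int]:
    """Compare three ways of paying for the same workload (illustrative units)."""
    # Old-school model: you buy and pay for peak capacity whether or not it is used.
    owned = capacity_units * unit_price
    # Ideal pay-per-use at the same unit price: you pay only for what you use.
    pay_per_use = used_units * unit_price
    # Public cloud: pay-per-use, but the provider adds a markup for expenses and profit.
    cloud = used_units * unit_price * cloud_markup
    return owned, pay_per_use, cloud


owned, ideal, cloud = provisioning_costs(capacity_units=100, used_units=10,
                                         unit_price=1, cloud_markup=5)
print(owned // ideal)  # 10: owning peak capacity costs 10X the ideal pay-per-use bill
print(cloud / owned)   # 0.5: even with a 5X markup, the cloud bill is half of owning
```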
So how elastic are public clouds? The entire cloud stack is elastic, except for one piece: the SQL database. Cloud companies recognized that the SQL database was the Achilles’ heel of cloud elasticity. To address this problem, they created NoSQL, which delivers database-like capabilities but removes the things that make a SQL database inelastic: namely SQL, ACID compliance, data consistency, transactions, etc.
NewSQL appears to be the response from the database vendors, who believe that there is a market for SQL databases that provide cloud elasticity. Not all NewSQL solutions address elasticity, but a few of us do. In my next blog post, I’ll address whether or not database elasticity is important…hint: it depends upon your needs.
Monday, August 2, 2010
Database Architectures & Performance II
As described in the prior post, the shared-disk performance dilemma is simple:
1. If each node stores/processes data in memory, versus disk, it is much faster.
2. Each node must expose the most recent data to the other nodes, so those other nodes are not using old data.
In other words, #1 above says flush data to disk VERY INFREQUENTLY for better performance, while #2 says flush everything to disk IMMEDIATELY for data consistency.
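A toy model makes the dilemma concrete. The `SharedDisk` and `Node` classes below are hypothetical, not Oracle or ScaleDB code: a node that writes every update through to the shared disk keeps other nodes current but pays one disk write per update, while a node that keeps updates in memory (write-back) does no disk I/O but leaves other nodes reading stale data until it flushes.

```python
class SharedDisk:
    """Hypothetical shared storage that every node can read."""

    def __init__(self):
        self.blocks = {}
        self.writes = 0  # each write stands in for slow disk I/O

    def write(self, key, value):
        self.blocks[key] = value
        self.writes += 1

    def read(self, key):
        return self.blocks.get(key)


class Node:
    def __init__(self, disk, write_through):
        self.disk = disk
        self.write_through = write_through
        self.cache = {}
        self.dirty = set()

    def update(self, key, value):
        self.cache[key] = value
        if self.write_through:
            self.disk.write(key, value)  # requirement #2: other nodes see it immediately
        else:
            self.dirty.add(key)          # requirement #1: stay in memory, flush later

    def flush(self):
        for key in sorted(self.dirty):
            self.disk.write(key, self.cache[key])
        self.dirty.clear()


def simulate(write_through, updates=100):
    disk = SharedDisk()
    node_a = Node(disk, write_through)
    for i in range(updates):
        node_a.update("balance", i)
    seen_by_node_b = disk.read("balance")  # node B only sees the shared disk
    return disk.writes, seen_by_node_b


print(simulate(write_through=True))   # (100, 99): consistent, but 100 disk writes
print(simulate(write_through=False))  # (0, None): no disk I/O, but node B sees nothing
```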
Oracle recognized this dilemma when it built Oracle Parallel Server (OPS), the precursor to Oracle Real Application Clusters (RAC). To address the problem, Oracle developed Cache Fusion.
Cache Fusion is a peer-based shared cache. Each node works with a certain set of data in its local cache until another node needs that data. When one node needs data held by another node, it requests the data directly from that node’s cache, bypassing the disk completely. To minimize this data swapping between local caches, RAC applications are optimized for data locality: routing certain data requests to certain nodes, which yields a higher cache hit ratio and less data swapping between caches. Static data locality, built into the application, severely complicates adding nodes to or removing nodes from the cluster.
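The cost of static data locality shows up as soon as the node count changes. As a simplified stand-in for application-level routing (an assumption; real RAC applications may partition data differently), this sketch routes each key to `key % node_count` and counts how many keys land on a different node when a fifth node is added to a four-node cluster.

```python
def route(key: int, node_count: int) -> int:
    """Static locality: the application always sends `key` to the same node."""
    return key % node_count


def keys_remapped(keys, old_nodes: int, new_nodes: int) -> int:
    """Count keys whose home node changes when the cluster is resized."""
    return sum(1 for k in keys if route(k, old_nodes) != route(k, new_nodes))


moved = keys_remapped(range(1000), old_nodes=4, new_nodes=5)
print(moved)  # 800: only keys congruent to 0-3 mod 20 keep their node
```

Growing from four to five nodes sends 80% of the keys to a new home, and with them most of each node's warm cache, which is why locality hard-coded into the application makes resizing the cluster painful.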
ScaleDB encountered the same conflict between performance and consistency (or RAM vs. disk). However, ScaleDB’s shared cache was designed with the cloud in mind. The cloud imposes certain additional design criteria:
1. The number of nodes will increase/decrease in an elastic fashion.
2. A large percentage of MySQL users will require low-cost, PC-based storage.
Clearly, some type of shared cache is imperative. Memcached demonstrates the efficiency of a separate cache tier above the database, so why not do something very similar beneath the database (between the database nodes and the storage)? The local cache on each database node is the most efficient use of cache, since it avoids a network hop. The shared cache tier (in ScaleDB’s case, the Cache Accelerator Server, or CAS) then serves as a fast cache for data swapping.
The Cluster Manager coordinates the interactions between nodes, including both database locking and data swapping via the CAS. Nodes keep data in their local caches until another node requires it; the Cluster Manager then coordinates the swap via the CAS. This approach is more dynamic because it doesn’t rely on prior knowledge of where the data lives, which enables ScaleDB to support dynamic elasticity of database nodes, a critical requirement for cloud computing.
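The coordination described above can be sketched in a few dozen lines. Everything here is hypothetical: `CacheAcceleratorServer`, `DatabaseNode`, and `ClusterManager` are illustrative stand-ins, not ScaleDB's API; ownership of a page acts as a simple exclusive lock; and the storage tier is not modeled. What the sketch does show is a manager that learns where each page lives at runtime instead of relying on a routing rule fixed in advance, so nodes can join or leave without re-partitioning.

```python
class CacheAcceleratorServer:
    """Hypothetical shared cache tier sitting between database nodes and storage."""

    def __init__(self):
        self.pages = {}

    def put(self, page_id, data):
        self.pages[page_id] = data

    def get(self, page_id):
        return self.pages.get(page_id)  # None if the page never passed through the CAS


class DatabaseNode:
    def __init__(self, name):
        self.name = name
        self.local_cache = {}  # fastest tier: no network hop


class ClusterManager:
    """Tracks which node holds the latest copy of each page; swaps pages via the CAS."""

    def __init__(self, cas):
        self.cas = cas
        self.owner = {}  # page_id -> DatabaseNode, learned at runtime, not fixed in advance
        self.nodes = {}

    def join(self, node):
        self.nodes[node.name] = node  # no re-partitioning needed

    def leave(self, node):
        # Push the departing node's pages to the CAS so surviving nodes can pick them up.
        for page_id, data in node.local_cache.items():
            self.cas.put(page_id, data)
            self.owner.pop(page_id, None)
        node.local_cache.clear()
        del self.nodes[node.name]

    def _acquire(self, node, page_id):
        holder = self.owner.get(page_id)
        if holder is node:
            return  # page is already in this node's local cache
        if holder is not None:
            # The current holder hands its copy to the CAS and drops it (exclusive ownership).
            data = holder.local_cache.pop(page_id, None)
            if data is not None:
                self.cas.put(page_id, data)
        data = self.cas.get(page_id)
        if data is not None:
            node.local_cache[page_id] = data
        self.owner[page_id] = node

    def read(self, node, page_id):
        self._acquire(node, page_id)
        return node.local_cache.get(page_id)

    def write(self, node, page_id, data):
        self._acquire(node, page_id)
        node.local_cache[page_id] = data


cm = ClusterManager(CacheAcceleratorServer())
a, b = DatabaseNode("a"), DatabaseNode("b")
cm.join(a)
cm.join(b)
cm.write(a, "page-7", "v1")
print(cm.read(b, "page-7"))  # v1, swapped from node a through the CAS to node b
```

A new node added with `join` needs no routing change: the first time it asks for a page, the manager finds the current holder and swaps the page through the CAS.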
The following diagram describes how ScaleDB’s Cache Accelerator Server (CAS) is implemented.
While the diagram above shows a variety of physical servers, these can be virtual servers. An entire cluster, including the lock manager, database nodes and CAS could be implemented on just two physical servers.
Both ScaleDB and Oracle rely upon a shared cache to improve performance, while maintaining data consistency. Both approaches have their relative pros and cons. ScaleDB’s tier-based approach to shared cache is optimized for cloud environments, where dynamic elasticity is important. ScaleDB’s approach also enables some very interesting advantages in the storage tier, which will be enumerated in subsequent posts.
Wednesday, January 13, 2010
HP Needs a Linux OLTP Database...FAST
Oracle, after dating HP, Dell, NetApp and EMC, has found its mate in Sun. Oracle is now becoming a systems company and is unceremoniously dumping these former paramours. This leaves the spurned lovers to find alternate accommodations, especially in the area of the database.
As I have stated previously on this blog, the clear partner of choice on the Windows front is Microsoft. This is demonstrated by today’s partner announcement around MS SQL Server for OLTP. But who is their partner in the Linux segment?
The following are contenders:
* Postgres (HP rolls their own)
* EnterpriseDB (pre-rolled Postgres)
* Ingres or Sybase—Oracle has felled them both in the past, but they are hoping for new life with a big sugar daddy like HP.
* ScaleDB, if HP is going after the cloud and the MySQL market
I don’t see them going for a NoSQL solution because NoSQL = NoEnterprise, making it a non-starter for HP. One way or the other, HP needs a solution for OLTP on Linux and they are on the clock.
For OLAP, HP has Neoview. If they felt the need, there are a number of OLAP solutions out there, such as Greenplum, Netezza, Aster Data, ParAccel, Ingres/Vectorwise and others. That said, I think HP feels that it is holding a good hand in the OLAP space, but Linux-based OLTP just became a gaping hole in its product suite. Today's partnership with Microsoft confirms this problem, but it only solves the Windows half, not the Linux half.