The Niche in the Big Data Processing Market

esProc is also engaged in big data processing, a crowded market in which it has cleverly found its own niche.

Broadly, there are two types of big data platforms: databases scaled out with the MPP strategy, and the Hadoop ecosystem.

A database has a scalability limit and cannot hold data beyond it, and installing MPP costs an organization an arm and a leg. Most importantly, the common industry view is that traditional databases running on minicomputers and special-purpose hardware have little future, even if they perform rather well today, so it is unwise to invest more in deploying them. In practice, this makes the database a small-data platform.

The Hadoop system is designed for extremely large clusters and devotes huge resources to fault tolerance and to task distribution and management. It shows its strength only when a cluster has several hundred or even more than a thousand nodes; on a cluster of more modest size, its performance is unimpressive.

So there are traditional databases for handling small data and Hadoop for tackling extremely large-scale data, but no product specifically designed for the far more common medium-scale workloads. In the absence of such a product, organizations with small- and medium-scale clusters choose Hadoop, without realizing that they are using a sledgehammer to crack a nut.

Now we have esProc. It is a tool intended for small- and medium-scale clusters of several to dozens of nodes, ideally sitting within a single switch. That is the computing environment most users actually have in real-world business practice. At this scale the need for fault tolerance is low and task distribution is relatively simple, so esProc can put as much of its limited resources as possible into the actual computation and achieve higher performance.

In fact, esProc's performance is high enough that a computation requiring many Hadoop nodes can often be done with just a few nodes, or even a single machine. An esProc cluster simply never needs to be large.

esProc's second niche lies in computations that are hard to solve in SQL.

Many database vendors are committed to offering optimal SQL solutions to big data problems. That field is crowded and nearly exhausted, and real breakthroughs are hard to come by. The backend computations of multidimensional analysis, for example, are simple to handle in SQL because a wealth of deep, mature optimizations already exists. It is commercially wiser for esProc to tackle the less explored problems.

Yet there remain many computing scenarios, such as procedural computations, where a high-efficiency algorithm is difficult or impossible to express in SQL, forcing programmers to fall back on workarounds. In most of these cases the data volume is large and efficiency matters. In a word, esProc specifically targets complicated computations over large amounts of data.
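
To make this concrete, here is a classic example of the kind of computation meant: finding, for each stock, the longest streak of consecutively rising closing prices. The sketch below is written in plain Python purely as an illustration, not as esProc code; the file name stock.csv and the columns code, date, and close are assumptions.

```python
# A computation that is awkward in SQL but natural procedurally:
# the longest run of consecutive rises in closing price, per stock.
import csv
from collections import defaultdict

def longest_rising_streaks(path):
    # Collect (date, close) pairs for each stock code.
    series = defaultdict(list)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            series[row["code"]].append((row["date"], float(row["close"])))

    streaks = {}
    for code, rows in series.items():
        rows.sort()  # assumes ISO-formatted dates, so string order == time order
        best = run = 0
        prev = None
        for _, close in rows:
            # Extend the run while the price keeps rising; otherwise reset it.
            run = run + 1 if prev is not None and close > prev else 0
            best = max(best, run)
            prev = close
        streaks[code] = best
    return streaks

print(longest_rising_streaks("stock.csv"))
```

In SQL, the same task typically requires nested window functions or a self-join to simulate this single ordered pass, which is exactly the kind of workaround referred to above.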