What Does esProc Bring?

The mainstream technology for structured data computing is the relational database (or data warehouse). The alternatives are less convenient. Java, for instance, takes many lines of code for a simple sum, let alone for grouping and filtering operations. NoSQL databases fare worse still because they trade computing ability for scalability.
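To make the point about Java concrete, here is a minimal sketch of a filtered, grouped sum written with plain loops. The `Order` class, its fields, and the sample values are hypothetical and exist only for illustration. In SQL the same computation is a single statement along the lines of SELECT region, SUM(amount) FROM orders WHERE amount >= 50 GROUP BY region.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupedSum {
    // Hypothetical row type, assumed for illustration only.
    static final class Order {
        final String region;
        final double amount;

        Order(String region, double amount) {
            this.region = region;
            this.amount = amount;
        }
    }

    // Keep orders whose amount is at least minAmount, then sum amounts per region.
    static Map<String, Double> sumByRegion(List<Order> orders, double minAmount) {
        Map<String, Double> totals = new LinkedHashMap<>();
        for (Order o : orders) {
            if (o.amount < minAmount) {
                continue; // filtering step
            }
            totals.merge(o.region, o.amount, Double::sum); // grouping and summing step
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
                new Order("East", 120.0),
                new Order("West", 80.0),
                new Order("East", 30.0),
                new Order("West", 200.0));
        // prints {East=120.0, West=280.0}
        System.out.println(sumByRegion(orders, 50.0));
    }
}
```

Even with a helper such as `Map.merge`, the row type, the loop, and the result container all have to be written by hand, which is the overhead the paragraph above refers to.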

But a database’s computing ability is closed: SQL can’t compute data stored outside the database. Data must be loaded into the database before it can be computed.

Databases are designed to be closed because they require data to meet their norms, which they enforce through complete data schemas and constraint checking.

Computational needs can come from anywhere, and databases cannot serve all of them well. One important case is the non-database data that contemporary applications generate in large quantities.

Years ago this wasn’t a problem, because for most applications one central database could handle every computation from transactions to analyses. Things have changed a lot. Non-database data of many kinds, such as crawled web data, Excel files uploaded from second-level or third-level organizations, computer-generated logs, and XML or JSON data coming from cloud services, has become an important data source. Even database data may be spread across several databases, often heterogeneous ones from different vendors, because it belongs to different applications. Computations then often involve several databases with different structures. Though many databases claim to handle such hybrid computations, actual performance is poor and using the feature is inconvenient, so in practice the ability is of little use.

In both cases a database cannot compute the data directly. The general practice is to load the non-database data, or the data from the other databases, into one big database, which requires deploying more or bigger databases. This is costly and resource-intensive. And since real-time data loading is rare in practice, real-time computing is hard to achieve. Moreover, data loading is itself a kind of computation (it often involves data structure conversion, code modification, etc.), yet it can’t take advantage of the database’s computing ability either. We can only hard-code it outside the database.
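As a rough illustration of why loading counts as computation, the sketch below shows the kind of conversion code that typically has to be hand-written outside the database before rows can be inserted. The input layout (name, amount, and a dd/MM/yyyy date separated by commas) and the `LoadPrep` class are assumptions made up for this example, not something any particular system prescribes.

```java
import java.util.ArrayList;
import java.util.List;

public class LoadPrep {
    // Convert raw "name,amount,dd/MM/yyyy" lines into rows whose date is in
    // ISO yyyy-MM-dd form, skipping lines that do not match the assumed layout.
    static List<String[]> toRows(List<String> lines) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            String[] fields = line.split(",");
            if (fields.length != 3) {
                continue; // malformed line: wrong number of fields
            }
            String[] date = fields[2].trim().split("/");
            if (date.length != 3) {
                continue; // malformed date
            }
            String isoDate = date[2] + "-" + date[1] + "-" + date[0];
            rows.add(new String[] { fields[0].trim(), fields[1].trim(), isoDate });
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> raw = List.of(
                "alice, 12.50, 05/01/2024",
                "broken line",
                "bob, 7.00, 17/02/2024");
        for (String[] row : toRows(raw)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

None of this filtering or reshaping can be expressed in SQL until the data is already inside the database, so it ends up as separate code to write and maintain.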

Usually we store partially computed and aggregated data, called intermediate data, in the database to achieve high-performance presentation (in report development). But why use precious database space to store data that is neither original nor critical? Because we want the computing ability that data presentation needs. The database schema and constraints are useless here. The intermediate data consumes resources and leads to a bloated database full of messy data that is difficult to maintain.

esProc introduces an open computing ability that is independent of databases.

esProc, a pure computing engine, can compute data coming from any type of source in real time, so it can easily process data stored in multiple databases. If non-database data does need to be loaded into a database first, esProc can handle the loading, which is essentially a kind of computation too. It stores intermediate data in the file system for easier management and more convenient, efficient computation, saving both database space and computational resources.

In other words, esProc lets databases focus on their strengths and duties (storage, consistency, modelling and constraints). Users no longer need to deploy more databases or scale them out just to get computing power. They now have the cost-effective, lightweight esProc to provide high-performance computing ability. After all, having the right tool for the job makes it a whole lot easier.