Big data parallel computing

esProc supports multithreaded parallel computing, so terabyte-scale data can be processed with simple code. A large task is divided into multiple smaller subtasks that run at the same time, making good use of the computer’s hardware and software.

Easy coding

esProc encapsulates parallel statements, making them simpler, more intuitive, and better suited to structured data. It can divide a file by bytes into approximately equal segments and process them simultaneously. The split is fast, and every segment still consists of complete rows: esProc automatically skips the partial row at the head of a segment and completes the partial row at its tail. esProc also provides handy functions to combine, merge, or further process the results returned from all threads.
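The byte-based splitting described above can be sketched in Python. This is a minimal illustration of the general technique, not esProc's implementation: the function names (`segment_bounds`, `read_segment`, `parallel_sum`) are hypothetical, and the input is assumed to be a UTF-8 comma-separated file with newline-terminated rows and no header row. A row belongs to the segment in which its first byte lies, so each worker skips the partial row at its head and reads past its end boundary to finish its last row.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def segment_bounds(file_size, n_segments):
    """Split [0, file_size) into n_segments byte ranges of roughly equal length."""
    step = file_size // n_segments
    bounds = [i * step for i in range(n_segments)] + [file_size]
    return list(zip(bounds[:-1], bounds[1:]))


def read_segment(path, start, end):
    """Return the complete rows whose first byte lies in [start, end)."""
    rows = []
    with open(path, "rb") as f:
        if start > 0:
            # Step back one byte and discard up to the next newline. This skips
            # the partial head row, or nothing if a row starts exactly at `start`.
            f.seek(start - 1)
            f.readline()
        while f.tell() < end:
            line = f.readline()  # may run past `end` to complete the tail row
            if not line:
                break
            rows.append(line.rstrip(b"\r\n").decode("utf-8"))
    return rows


def parallel_sum(path, column, n_segments=4):
    """Sum one numeric column, processing each byte segment in its own thread."""
    size = os.path.getsize(path)

    def work(bounds):
        total = 0.0
        for row in read_segment(path, *bounds):
            total += float(row.split(",")[column])
        return total

    with ThreadPoolExecutor(max_workers=n_segments) as pool:
        # Merge step: combine the partial results returned by all threads.
        return sum(pool.map(work, segment_bounds(size, n_segments)))
```

Because every row is read by exactly one segment, the partial results can be merged by simple addition. In CPython, threads mainly help with the I/O part of this work, so the sketch shows the splitting logic rather than a performance result.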

Optimum path control

When esProc’s parallel-processing functions are used, the computing engine configures the parallel threads automatically. These functions are used in the same way as their ordinary counterparts but perform better. Programmers can also choose an optimum path themselves. For instance, they can decide how best to break a task apart according to data size, the number of CPU cores, and hard drive rotation speed; use the fork statement to set the number of threads they want; or use a cursor function to retrieve a specified number of records from a file for in-memory computing. This controllable path selection helps tap as much of the hardware’s potential as possible.
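Two of these choices, setting the thread count and fetching a fixed number of records at a time, can be sketched in Python. This is an illustration under assumptions, not esProc syntax: `cursor_fetch`, `choose_workers`, and `batched_total` are hypothetical names, and the records are assumed to be plain numbers.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def cursor_fetch(records, batch_size):
    """Yield lists of at most batch_size records, like fetching from a cursor."""
    it = iter(records)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def choose_workers(requested=None):
    """Use the requested thread count, or default to the number of CPU cores."""
    if requested is not None:
        return max(1, requested)
    return os.cpu_count() or 1


def batched_total(records, batch_size, workers=None):
    """Sum all records by aggregating each fetched batch in a worker thread."""
    with ThreadPoolExecutor(max_workers=choose_workers(workers)) as pool:
        # Note: pool.map submits every batch up front; a production version
        # would limit how many batches are held in memory at once.
        return sum(pool.map(sum, cursor_fetch(records, batch_size)))
```

For example, `list(cursor_fetch(range(10), 4))` yields three batches of 4, 4, and 2 records, and `batched_total(range(1, 101), 10, workers=3)` aggregates ten batches on three threads.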

Low-level program optimization

esProc optimizes its low-level functions, and combining them with parallel processing improves performance further.

When retrieving big files from external storage, esProc does not need to convert the data stream into objects, which gives it better performance than JDBC. In data traversal, esProc functions execute more efficiently than a database interpreter and outperform scripting languages such as Perl. For in-memory association computations and complicated order-related computations, esProc matches values by sequence number rather than by hash algorithm, and so achieves higher performance than conventional databases.
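The difference between matching by sequence number and matching by hash can be illustrated with a small Python sketch. The data and function names are made up for illustration, and the sketch assumes the lookup table's keys are the sequence numbers 1..N, so that a key can be used directly as a position. It shows the idea only and makes no claim about esProc's internals or about relative speed in Python.

```python
# Lookup table stored in key order: the product with sequence number k
# sits at position k - 1.
PRODUCT_NAMES = ["apple", "pear", "plum"]
PRODUCT_PRICES = [1.5, 2.0, 3.25]

# Each order is (product sequence number, quantity).
ORDERS = [(1, 2), (3, 1), (2, 4)]


def join_by_sequence_number(orders, names, prices):
    """Match each order to its product by position; no hashing or searching."""
    return [(names[no - 1], qty * prices[no - 1]) for no, qty in orders]


def join_by_hash(orders, names, prices):
    """The conventional alternative: build a hash table, then probe it per row."""
    table = {i + 1: (name, price) for i, (name, price) in enumerate(zip(names, prices))}
    result = []
    for no, qty in orders:
        name, price = table[no]
        result.append((name, qty * price))
    return result
```

Both functions return the same rows; the positional version skips building and probing a hash table because the sequence number already says where the matching record is.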

Scale-out ability

Besides single-node multithreaded parallel processing, esProc supports scalable multi-node parallel processing. Programmers can add or remove nodes in the cluster as necessary, and can freely change or assign the controlling node that distributes the task. A node can be a sub-node while also serving as the main node that distributes a task to other nodes acting as its sub-nodes. In this way, a multilevel, nested invocation can be created.
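The multilevel, nested invocation described above can be modeled in a single Python process. This is a toy sketch, not esProc's cluster API: the `Node` class is hypothetical, threads stand in for network calls to remote machines, and the task is assumed to be summing a list of numbers.

```python
from concurrent.futures import ThreadPoolExecutor


class Node:
    """A node that computes locally, or distributes work to its sub-nodes."""

    def __init__(self, name, sub_nodes=()):
        self.name = name
        self.sub_nodes = list(sub_nodes)

    def run(self, numbers):
        # A leaf node, or a task too small to split, is computed locally.
        if not self.sub_nodes or len(numbers) <= 1:
            return sum(numbers)
        # Otherwise this node acts as the main node for its own sub-nodes:
        # deal the numbers out round-robin and merge the partial sums.
        n = len(self.sub_nodes)
        parts = [numbers[i::n] for i in range(n)]
        with ThreadPoolExecutor(max_workers=n) as pool:
            partials = pool.map(
                lambda pair: pair[0].run(pair[1]), zip(self.sub_nodes, parts)
            )
            return sum(partials)
```

A two-level cluster can then be built by nesting: `Node("main", [Node("a", [Node("a1"), Node("a2")]), Node("b")])`. Here node `a` is a sub-node of `main` and at the same time the main node for `a1` and `a2`.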