High Performance

esProc optimizes its algorithms for structured data processing to support in-memory and external-memory computation, parallel computation and order-related computation. Programmers can therefore freely choose the best approach for the characteristics of their data and algorithms, achieving higher performance than conventional databases and scripting languages such as Perl.

File retrieval

Compare the performance of esProc and Oracle JDBC in retrieving an 890 MB file with 6 columns.

Hardware environment: PC, Core(TM) i5-3450 (4 cores, 4 threads in total), 16 GB RAM, SSD

Software environment: CentOS 6.4, JDK 1.6, Oracle 11g, esProc 3.1

Note: Test results are measured in seconds. Each test has its own test report.

The test results show that esProc is more than twice as fast as Oracle JDBC. Oracle JDBC has to convert the data stream into objects, while Java-based esProc doesn’t need to. As a result, esProc’s external-memory file approach achieves much higher throughput than Oracle JDBC.

File traversal

Traverse a 25 GB file with 6 columns, running grouping-and-aggregation and query-and-filtering algorithms, to compare the performance of esProc and Oracle with 1, 2, 4, 8 and 16 threads.

Hardware environment: T610, 2 × Intel Xeon E5620 CPUs, 24 GB RAM, 800 GB RAID 5 HDD

Software environment: CentOS 6.4, JDK 1.6, Oracle 11g, esProc 3.1

When processing big data (data exceeding memory capacity), Oracle gains little from parallel processing, while esProc’s performance improves clearly as the thread count goes from 1 to 2 to 4.
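The segmented traversal tested above can be illustrated with a minimal Python sketch. It is not esProc code and not the benchmark script; it only shows the general pattern of splitting a file into byte ranges on line boundaries, aggregating each range in its own worker, and merging the partial results. The function names, the two-column "key,amount" layout and the sample data are all assumptions made for the illustration. CPython threads will not show a real speedup on CPU-bound parsing, so the sketch demonstrates the structure rather than the performance.

```python
import os
import tempfile
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def segment_bounds(path, parts):
    """Split a file into `parts` byte ranges that begin and end on line boundaries."""
    size = os.path.getsize(path)
    cuts = [0]
    with open(path, "rb") as f:
        for i in range(1, parts):
            f.seek(size * i // parts)
            f.readline()              # move forward to the start of the next full line
            cuts.append(f.tell())
    cuts.append(size)
    return list(zip(cuts[:-1], cuts[1:]))


def aggregate_segment(path, start, end):
    """Group by the first column and sum the second column inside one byte range."""
    totals = Counter()
    with open(path, "rb") as f:
        f.seek(start)
        while f.tell() < end:
            key, amount = f.readline().decode().strip().split(",")[:2]
            totals[key] += int(amount)
    return totals


def parallel_group_sum(path, workers):
    """Aggregate every segment in its own worker, then merge the partial sums."""
    bounds = segment_bounds(path, workers)
    merged = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(lambda b: aggregate_segment(path, *b), bounds):
            merged.update(partial)    # Counter.update adds the values per key
    return dict(merged)


def write_sample(rows):
    """Write `rows` lines of hypothetical 'key,amount' data; return the file path."""
    with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
        for i in range(rows):
            tmp.write(f"k{i % 3},{i}\n")
    return tmp.name
```

Calling parallel_group_sum(path, 1) and parallel_group_sum(path, 4) on the same file should return the same totals; only the number of segments traversed at once changes.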

Concurrent processing

Traverse a big file with a total size of 16 GB using 4 concurrent tasks, each handling 4 GB of data, running grouping-and-aggregation and query-and-filtering algorithms, to compare the performance of esProc and Oracle with 1 task, 2 concurrent tasks and 4 concurrent tasks.

Hardware environment: PC, Core(TM) i5-3450 (4 cores) running up to 4 concurrent tasks, 16 GB RAM, SSD

Software environment: CentOS 6.4, JDK 1.6, Oracle 11g, esProc 3.1

When the total data size exceeds the available memory, esProc benefits more from concurrent processing than Oracle does. Under the same conditions, esProc’s execution time is steadier, whereas Oracle shows a wide swing between its maximum and minimum execution times.

Processing text files

Process a 28 GB text file with 6 columns to compare the performance of single-threaded esProc, Java, Perl and four-threaded esProc.

Hardware environment: PC, Core(TM) i5-3450 (4 cores, 4 threads in total), 16 GB RAM, SSD

Software environment: CentOS 6.4, JDK 1.6, Perl, esProc 3.1

The test results show that in single-threaded processing esProc, an interpreted language based on Java, has no significant performance loss compared with Java itself. Java-based esProc even outperforms Perl, another interpreted language, which is based on C.

With its multithreaded architecture, esProc increases performance enormously with simple code. Java also gains noticeably from multiple threads, but its code is very complicated. Perl has no advantage: its single-threaded performance is mediocre and its multithreaded code is complex.

Taken together, esProc is the most practical choice for processing text files because it combines high performance with concise code.

In-memory computing

Test the performance of esProc in processing in-memory data with 8 million rows and 6 columns, using simple computations, normal computations and complex related computations (those involving foreign key associations), and compare the results with Oracle’s performance on the same tasks.

Hardware environment: PC, Core(TM) i5-3450 (4 cores, 4 threads in total), 16 GB RAM, SSD

Software environment: CentOS 6.4, JDK 1.6, Oracle 11g, esProc 3.1

esProc and Oracle are neck and neck in handling simple non-related computations, but esProc greatly surpasses Oracle in handling complex related computations. This is because esProc uses pointers to reference foreign key values and so doesn’t need to calculate hash values to find matches, while Oracle needs a hash algorithm to perform a JOIN.

When there is enough available memory, and the data to be manipulated can be loaded into memory and reorganized in advance with esProc, you will get better performance than with traditional databases. esProc is therefore suitable for in-memory computing that requires high performance.
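The idea of referencing foreign key values by pointer can be illustrated with a small Python sketch. It is not esProc’s implementation; the Customer and Order classes and both function names are hypothetical. The key lookup happens once, while the data is being reorganized in memory, and later related computations simply follow the stored object references instead of matching keys row by row.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    cid: int
    region: str


@dataclass
class Order:
    oid: int
    customer: object   # a customer id when loaded, a Customer reference after linking
    amount: float


def link_foreign_keys(orders, customers):
    """One-time reorganization: replace each customer id with a direct reference."""
    by_id = {c.cid: c for c in customers}   # keys are hashed here, and only here
    for order in orders:
        order.customer = by_id[order.customer]


def amount_by_region(orders):
    """Related computation after linking: follow the reference, no key matching per row."""
    totals = {}
    for order in orders:
        region = order.customer.region
        totals[region] = totals.get(region, 0.0) + order.amount
    return totals
```

After link_foreign_keys has run once, any number of queries such as amount_by_region can reuse the linked records without repeating the join.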

Script execution

Test the performance of an esProc multi-step script on two sets of test data: one with 4 million rows and 6 columns, and one with 2.4 billion rows and 6 columns. The results are compared with those of an Oracle stored procedure performing the same tasks.

Hardware environment: PC, Core(TM) i5-3450 (4 cores, 4 threads in total), 16 GB RAM, SSD

Software environment: CentOS 6.4, JDK 1.6, Oracle 11g, esProc 3.1

The test results show that esProc and Oracle perform equally well when the data can be loaded into memory in one go. When the data is too big to be loaded into memory entirely, esProc soundly beats the Oracle stored procedure in both data retrieval and computation, because the stored procedure’s interpreter is poor. In real-world business, many complex problems cannot be programmed directly in SQL, so the data has to be retrieved row by row with a stored procedure before it is handled. In those cases, you can achieve higher efficiency by retrieving the data out of the database and handling it with esProc.
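The pattern of pulling data out of the database and processing it step by step outside can be sketched in Python as follows. This is only an illustration under stated assumptions: the standard-library sqlite3 module stands in for the real database, and the quotes table, its day and price columns, and the longest_rising_run function are hypothetical. The computation, the longest run of consecutive price rises, is an order-related task that is awkward to express directly in SQL; here the rows are fetched in batches so the whole result never has to fit in memory.

```python
import sqlite3


def longest_rising_run(conn, batch_size=1000):
    """Return the largest number of consecutive rises in price, reading rows in batches."""
    cur = conn.execute("SELECT price FROM quotes ORDER BY day")
    best = run = 0
    prev = None
    while True:
        rows = cur.fetchmany(batch_size)   # only one batch is held in memory at a time
        if not rows:
            break
        for (price,) in rows:
            # extend the current run on a rise, otherwise start over
            run = run + 1 if prev is not None and price > prev else 0
            best = max(best, run)
            prev = price
    return best
```

The running state (prev, run, best) is carried across batches, so the result does not depend on the batch size chosen.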