Data computations within a Java program

It's not uncommon to compute and process data within a Java program due to a variety of factors.

Separation of business logic and data storage due to system design

Diversity of data sources

SQL difficulty in implementing complex computational logic

Pain points of Java-based data computations

Complicated & lengthy code / Complex logic is hard to implement / Difficult to code and maintain
The huge block of code implements only the grouping and aggregation on one field
To address the Java pain points:

Solution: esProc – the professional computational package for Java

The specialized JARs intended for Java-based computations aim to simplify Java code
Note: SPL is the scripting language on which esProc is based. SPL script is the counterpart of the stored procedure in RDB. A SPL script will be passed to a Java program through JDBC interface to be executed or to achieve the structured computation.

SPL grouping

Concise is important for a program

Regular grouping operation

Count the days each person is on duty

A
1 =file(“/Users/test/duty.xlsx”).importxls@tx()
2 =A1.groups(name;count(name):count)
With esProc, the basic operation can be coded directly in SQL:
SELECT name,count(name) FROM user/test/duty.xlsx GROUP BY name

Intra-group Top N

Get the duty records of the first three days for each person per month

A
1 =file(“/Users/test/duty.xlsx”).importxls@tx()
2 =A1.groups(month(workday):mon,name;~.top(3):top3)

Alignment grouping

List the number of countries that use Chinese, English and French as the official language respectively in order

A
1 =connect(“mysql”)
2 =A1.query@x("select * from world.countrylanguage where isofficial='T'")
3 [Chinese,English,French]
4 =A2.align@a(A3,Language)
5 =A4.new(A3(#):name,~.len():cnt)

Complex SQL simplification

Computing goal: The largest number of days when a specified stock rises consecutively.
select max(continuousDays)-1
from (select count(*) continuousDays
	from (select sum(changeSign) over(order by tradeDate) unRiseDays
		from (select tradeDate,
			case when closePrice>lag(closePrice) over(order by tradeDate)
			then 0 else 1 end changeSign
		from stock) )
	group by unRiseDays)

SQL solution

  • A triple-layer nested query is still needed even when the window function is employed;
  • Can you understand this?
A
1 =stock.sort(tradeDate)
2 =0
3 =A1.max(A2=if(closePrice>closePrice[-1],A2+1,0))

SPL solution

According to our natural way of thinking, we sort data by date (line1), compare closing price of each day with the previous one, add 1 if the price is higher and reset the count if it is lower, and finally get the largest count (line3)

Heterogeneous data sources support

Java lacks certain computational ability to support heterogeneous data sources

esProc offers built-in interface to enable fast and convenient access

  • MongoDB、Redis、Cassandra、ElasticSearch
  • Hadoop
  • Excel、Json、XML、TXT
  • Http Restful、Web Services、OLAP4j 、...
Enable mongoDBexceltxt to enhance computational ability to handle even complex computations

Broad & powerful computational abilities (Part)

As MongoDB's collaborator

  • JOIN operations
  • Multi-computation of one table
  • Simple IN condition
  • Foreign key objectification
  • Simple implementation of APPLY algorithm
  • Concatenation loop
  • Multi-table concatenation
  • Union operation
  • TopN
  • ……

As database's collaborator

  • Pass value list to SQL though sequence parameter
  • Translate SQL to syntax of different database products
  • Single table parallel retrieval by segment
  • Multi-table parallel retrieval by segment
  • Multi-database parallel write
  • Dynamic SQL in same-structure tables
  • ……

Structured file computation

  • Small file filtering
  • Text retrieval
  • Sum on small files
  • Inter-row calculations on small files
  • Small file sorting
  • Small file grouping & aggregation
  • Multi-small-file join query
  • Small file post-grouping filtering
  • ……

Big data computation

  • Big file filtering
  • Sum on big files
  • Inter-row calculations on big files
  • Big file distinct count
  • Big file sorting
  • Big file grouping & aggregation with a small result set
  • Big file grouping & aggregation with a large result set
  • ……