Agile Data Computing Engine

What is esProc?

What does esProc solve?

Data File Processor

  • Self processing text \Excel
  • Non professional developers quickly explore the data at hand
  • Text types are various and format is not standard
  • Excel has various versions and special layout
  • Batch processing of files

SQL Enhancer

  • Solve difficult database problems on desktop
  • Computation with complex logic
  • Multi step procedure computing
  • Large amount of temporary calculation
  • Exploratory analysis with high degree of freedom

Report Data Source

  • Diverse data sources,cross database & cross source
  • SQL with complex logic
  • Tedious Java custom data source
  • Stored procedure outside database that is low coupling and portable

JAVA Computing Layer

  • SQL is forbidden, require java to implement algorithm
  • Non RDB data source
  • Cross database computation
  • Complex algorithm, procedural algorithm
  • Stored procedure outside database that is hot deployed and low coupling
Note: SPL is the scripting language used by esProc. DFX is the script file, which is equivalent to an external stored procedure. Report uses the JDBC interface to pass in SPL statements to realize structured calculation or script execution.

How does esProc do it?

Independent computing engine

esProc does not rely on the computing power of the database, and has a complete agile syntax system.

No database is required during calculation.

  • It is Not converted to SQL and executed by database
  • Universal computing can be implemented for any data source

Agile programming language

SPL(Structured Process Language)

Combined the common advantages of SQL and Java
  • Adapt to set batch computation (SQL advantage)
  • Good at step-by-step procedure computing (Java advantage)
Well designed rich library functions and consistency syntax
  • Easier to master and better performance than Python
e.g.Count the longest consecutively rising trading days for a stock.

Python

1 import pandas as pd
2 def iterate(col):
3 prev = 0;
4 res = 0;
5 val = 0;
6 for curr in col:
7 if curr – prev > 0:
8 res += 1;
9 else:
10 res = 0;
11 prev = curr;
12 if val < res:
13 val = res;
14 return val;
15 data = pd.read_excel('D:/Stock.xlsx',sheet_name=0). sort_values('tradeDate').groupby('Company')['closePrice'].apply(iterate);

SQL

1 select max(continuousDays)-1
2 from (select count(*) continuousDays
3 from (select sum(changeSign) over(order by tradeDate) unRiseDays
4 from (select tradeDate,
5 case when closePrice>lag(closePrice) over(order by tradeDate)
6 then 0 else 1 end changeSign
7 from stock) )
8 group by unRiseDays)

SPL

  A
1 =stock.sort(tradeDate)
2 =0
3 =A1.max(A2=if(closePrice>closePrice[-1],A2+1,0))
Syntax suitable for describing a natural way of thinking!
Professional syntax in the field of set operation, code is more concise for the same process!

APPLICATION

CASEAn insurance company - stored procedures outside the database

Vertica does not support stored procedures. Complex business logic can only be written in Java. Long code is hard to maintain.

User Evaluation

  1. The best use for us is to pass parameters to the Vertica database.
  2. Each cell becomes a data array that are easy to use, compare and manipulate. It is very logical and you have made it user friendly.

Application Effect

  1. Enable Vertica to support stored procedures outside the database
  2. The implementation of procedural algorithm is simple and friendly
  3. BIRT data preparation is simple and efficient

CASEA bank - computing middleware

There are 44 kinds of Excel, which are parsed by Java and imported into database. The development cycle is long and hard coding is difficult.

Reducing development cost

All 44 kinds of Excel analysis reduced from 32 person days to 6 person days

Improving development efficiency

Each Excel parsing code changed from 100 lines to 3 lines

Reducing maintenance cost

In addition to the short code and easy maintenance, script hot deployment takes effect immediately after modification

Now take a deep dive into esProc!