esProc

What esProc is

esProc is a programming language designed specifically for handling (semi)structured data, with a focused yet comprehensive class library. It is not an object-oriented language, so it has none of the complex concepts such as inheritance and overloading; it uses the notion of an object only to make object-related methods easier to describe. Any programmer with basic programming skills, such as familiarity with BASIC, can learn it quickly. esProc is Java-based and dynamically interpreted, so it can generate code at run time, which makes programs more flexible and development less complicated.

esProc defines its niche as the handling of (semi)structured data, so it doesn’t provide algorithms for directly performing data analysis, data mining and machine learning, nor is it an expert at processing media and map data.

Compared with high-level programming languages like Java, esProc has abundant basic objects and methods for structured-data computing, which is common in data analysis, data handling and data preparation. This lets it express the same algorithm in much more concise code than Java, and gives it higher development efficiency than high-level languages such as Java.

For example, Java needs dozens of lines of code, sometimes nearly a hundred, to filter a data set, and more still if generic data types and arbitrary conditions must be supported. esProc does the job in a single line.
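To make the comparison concrete, here is a minimal Java sketch of filtering a data set whose records use a generic type and whose condition is passed in by the caller. It is an illustration only, not code from esProc or from any particular application; the class name `RecordFilter`, the field names and the sample orders are all made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical helper class; all names and data are illustrative assumptions.
class RecordFilter {

    // Returns the records that satisfy the condition, in their original order.
    static List<Map<String, Object>> filter(List<Map<String, Object>> records,
                                            Predicate<Map<String, Object>> condition) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> record : records) {
            if (condition.test(record)) {
                result.add(record);
            }
        }
        return result;
    }

    // Builds one record from alternating field names and values.
    static Map<String, Object> record(Object... fieldsAndValues) {
        Map<String, Object> r = new LinkedHashMap<>();
        for (int i = 0; i + 1 < fieldsAndValues.length; i += 2) {
            r.put((String) fieldsAndValues[i], fieldsAndValues[i + 1]);
        }
        return r;
    }

    // Made-up sample data standing in for a real data set.
    static List<Map<String, Object>> sampleOrders() {
        List<Map<String, Object>> orders = new ArrayList<>();
        orders.add(record("id", 1, "client", "A", "amount", 120.0));
        orders.add(record("id", 2, "client", "B", "amount", 80.0));
        orders.add(record("id", 3, "client", "A", "amount", 300.0));
        return orders;
    }

    public static void main(String[] args) {
        // Keep only the orders whose amount exceeds 100.
        List<Map<String, Object>> large = filter(sampleOrders(),
                r -> ((Double) r.get("amount")) > 100.0);
        System.out.println(large);
    }
}
```

Even in this small form, the record type, the record builder and the filtering loop all have to be written by hand before the actual condition appears.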

esProc integrates easily with Java applications. Since it is developed in Java, the two are fully compatible. It is designed to be integrated, so it can be invoked from a Java main program. It is particularly convenient to use esProc to prepare data sources for a Java reporting tool.

With SQL, piecemeal multi-step computations are admittedly a hassle, order-related computations in particular. Programmers normally have to retrieve the data from the database and process it in Java or another language, because SQL is not thoroughly set-oriented, lacks support for discrete records, and does not work stepwise. esProc is designed to fill this gap, permitting a more intuitive implementation of non-equi-grouping, reuse of grouped data, and order-related or multi-step computations. It combines the merits of SQL and Java: programmers can apply SQL-style batch processing to set operations while keeping the flexibility that Java provides.
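As an illustration of an order-related computation, the following Java sketch finds the longest run of consecutive days on which a closing price rose. This is the kind of task that is typically handled outside the database once the rows have been fetched in date order. The problem, the class name `LongestRise` and the price series are assumptions chosen for the example, not something taken from esProc.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical example class; the name and data are illustrative assumptions.
class LongestRise {

    // Returns the length of the longest run of consecutive days on which the
    // closing price was higher than on the previous day.
    // The prices must already be sorted by date.
    static int longestRisingRun(List<Double> closingPrices) {
        int longest = 0;
        int current = 0;
        for (int i = 1; i < closingPrices.size(); i++) {
            if (closingPrices.get(i) > closingPrices.get(i - 1)) {
                current++;                       // the rise continues
                longest = Math.max(longest, current);
            } else {
                current = 0;                     // the run is broken
            }
        }
        return longest;
    }

    public static void main(String[] args) {
        // Made-up price series, ordered by trading day.
        List<Double> prices = Arrays.asList(10.0, 11.0, 12.0, 11.5, 12.0, 12.5, 13.0, 12.0);
        System.out.println(longestRisingRun(prices));
    }
}
```

The logic depends on comparing each record with its neighbour, which is exactly the discrete-record, stepwise style that plain SQL expresses awkwardly.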

Yet esProc cannot and doesn’t aim to replace SQL, despite its much simpler syntax in most scenarios.

Retrieving data from a database can incur a heavy I/O cost. When simple operations are applied to big data, retrieving the data takes much longer than performing the computation, so it is more appropriate to handle the data inside the database. Besides, SQL’s metadata system makes the syntax more transparent: programmers need not concern themselves with the physical storage scheme. esProc, as a pure computing engine without a complete storage mechanism, can handle data from all types of files and databases, but it uses different syntax for in-memory and external-memory computations, which call for different approaches.

Choosing esProc doesn’t mean abandoning SQL. Instead, esProc complements SQL in computational scenarios where SQL is weak, such as complex multi-step computations and computations involving heterogeneous databases.

Apart from SQL, the industry has no other standard programming language that specializes in handling structured data. However, SQL has the computational weaknesses mentioned above, as well as application limitations caused by its closed nature: we cannot freely handle a local file in SQL. As a result, people often turn to scripting languages like Python (pandas) and R for those scenarios.

In fact, Python (pandas) and R are designed for mathematical statistics and analysis. Although they provide the dataframe object, they are not specialized tools for processing structured data, and they offer no direct support for external-memory computations. esProc specializes in structured-data computing, offering the table sequence object for in-memory data (a counterpart to, and superset of, the dataframe) and the cursor object for external-memory data. It makes multithreaded parallel processing convenient to code, and it is simple to configure and use with heterogeneous data sources (XLS, JSON, and MongoDB).
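The difference between in-memory and external-memory computation can be sketched in plain Java. The example below totals amounts by key while reading one line at a time, so only the running totals are held in memory rather than the whole data set. It illustrates the general idea only and is not how the esProc cursor is implemented; the class name `StreamingSum` and the `key,amount` line format are assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical example class; name and input format are illustrative assumptions.
class StreamingSum {

    // Reads "key,amount" lines one at a time and totals the amounts per key.
    // Rows are discarded after use; only the per-key totals stay in memory.
    static Map<String, Double> sumByKey(Reader source) {
        Map<String, Double> totals = new TreeMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue;                    // skip blank lines
                }
                String[] parts = line.split(",");
                String key = parts[0].trim();
                double amount = Double.parseDouble(parts[1].trim());
                totals.merge(key, amount, Double::sum);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return totals;
    }

    public static void main(String[] args) {
        // A StringReader stands in for a file too large to load at once.
        Reader data = new StringReader("A,10\nB,5\nA,2.5\n");
        System.out.println(sumByKey(data));
    }
}
```

In a real program the `Reader` would wrap a large file, and the same loop would run unchanged regardless of the file size.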

However, esProc is poor at mathematical statistics and analysis because it lacks the necessary class libraries.

Apart from individually handled analyses, structured-data computing often takes place within an application. Being integration-friendly with Java, esProc can be easily invoked by a Java main program. Python and R are much less integration-friendly: it is hard to write an algorithm in Python that a Java program can invoke directly.

Where to use esProc

Independent operation

esProc’s Integrated Development Environment (IDE) is highly interactive. Anyone with basic programming skills can use it as a desktop interactive analysis tool.

The key to interactive computing is being able to view and reference intermediate results conveniently, so that each step can be decided from the result of the previous one. esProc’s cellset-style coding naturally retains intermediate results in cells, where they can be viewed when needed. Programmers can reference these results directly without having to name them, which makes stepwise interactive computation extremely convenient. A typical scripting language performs interactive computations from the command line, which is far less efficient.

esProc can access and handle heterogeneous data sources, including the common databases and the files stored in local file system, such as TXT and XLS files. The final result can be viewed, or written back to the database or the file.

esProc is also intended as a development tool for writing code that is executed repeatedly. Debugging an esProc cellset is easier and more intuitive than debugging traditional text code. Besides running in the IDE, esProc can be started from the command line by external job-scheduling software. With its support for various data sources and its strong computing power, esProc can perform tasks such as scheduled data manipulation (similar to ETL).

Working as a Java class library

As mentioned above, esProc is also designed to be integrated.

esProc provides a JDBC interface through which esProc code can be invoked like a database stored procedure. Parameter passing, code execution and result return all follow the JDBC standard, so programmers familiar with JDBC can pick up esProc quickly. The esProc RTL is provided as JARs that can be deployed and distributed with an application, so the integration is seamless.
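The stored-procedure style of invocation might look roughly like the Java sketch below, which uses only the standard `java.sql` API. The driver class name, the connection URL and the script name `countSales` are assumptions for illustration and should be checked against the esProc documentation. Running `runScript` also requires the esProc JARs on the classpath, which this sketch does not provide.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical sketch; driver class, URL and script names are assumptions.
class EsProcJdbcSketch {

    // Assumed driver class and URL; verify against the esProc documentation.
    static final String DRIVER_CLASS = "com.esproc.jdbc.InternalDriver";
    static final String URL = "jdbc:esproc:local://";

    // Builds standard JDBC call text such as "call countSales(?,?)".
    static String callText(String scriptName, int paramCount) {
        StringBuilder sb = new StringBuilder("call ").append(scriptName).append("(");
        for (int i = 0; i < paramCount; i++) {
            if (i > 0) {
                sb.append(",");
            }
            sb.append("?");
        }
        return sb.append(")").toString();
    }

    // Invokes a script as if it were a stored procedure and prints the
    // first column of each returned row.
    static void runScript(String scriptName, Object... params)
            throws ClassNotFoundException, SQLException {
        Class.forName(DRIVER_CLASS);
        try (Connection con = DriverManager.getConnection(URL);
             CallableStatement st = con.prepareCall(callText(scriptName, params.length))) {
            for (int i = 0; i < params.length; i++) {
                st.setObject(i + 1, params[i]);  // JDBC parameters are 1-based
            }
            if (st.execute()) {                  // true when a result set is returned
                try (ResultSet rs = st.getResultSet()) {
                    while (rs.next()) {
                        System.out.println(rs.getObject(1));
                    }
                }
            }
        }
    }

    public static void main(String[] args) {
        // Only prints the call text; no esProc runtime is needed for this.
        System.out.println(callText("countSales", 2));
    }
}
```

A caller would use something like `EsProcJdbcSketch.runScript("countSales", 2023, "East")`, where the script name and both arguments are hypothetical.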

As far as we know, Java has no universal class library for structured data, so programmers have to hardcode this type of computation, which is a cumbersome process. esProc, being integration-friendly and computationally powerful, can work as a Java class library for processing batches of (semi)structured data. When no database is involved (for example, when a text file is handled), SQL’s easy-to-use computing capability is not available. At other times the algorithm is difficult to code in SQL, and the data has to be retrieved from the database for computation. In all these scenarios, esProc can assist Java with the computation.

Like a SQL statement, a single esProc statement can be invoked directly. If the code is short, programmers can write it as one long statement and pass it through the esProc JDBC, without having to create a script file. This saves them the trouble of managing a script and increases programming flexibility.

Preparing data source for reporting tools

As a special case of Java applications, reporting tools can integrate esProc code through JDBC to supply their own data sources.

Report development involves many complex, temporary computations. They are rarely implemented successfully in the reporting tool because of their complexity; they cause unreasonable storage usage if performed within the database because of their temporariness; and they lead to tight coupling between the Java application and the computing code if carried out in an intermediate Java program. By using esProc as dedicated middleware to prepare the report data source, these computations can be detached and executed separately, which makes development much easier. Moreover, the esProc script can be managed together with the report template, which effectively reduces the complexity of application management.

Here are some typical scenarios exemplifying esProc’s role in solving computational problems. All the examples come from Q&A sites on the internet; they are first-hand, real-life problems that have been simplified to facilitate understanding.