esProc is well suited to ETL processes with complex business logic. To date, no ready-made module exists for such tasks, so implementing this kind of ETL process always requires scripting.
With the professional esProc IDE and grid-style scripting, users can program more efficiently. esProc makes complex computing goals, including flexibly hand-coded ETL logic, convenient to achieve. It supports a wide range of data sources and can compute across data from different kinds of sources. For big data, the heavy ETL workload can be distributed across multiple inexpensive PCs through a parallel computing framework. esProc scripts can also be called from the command line, so they work with the scheduling facilities of various operating systems.
High development efficiency. esProc is a tool for scripting in a grid. Computing logic can be laid out conveniently in 2D space, so a business algorithm is easier to express in code. The grid supports step-by-step computing by nature: each cell represents one computing unit or step, which lets esProc break the complicated business logic of an ETL process into simple steps. The grid-style script gives an intuitive view of code indentation and scope, and streamlines cell reference and reuse. Users can reference any cell by its native cell name, so they need not define variables. By clicking a cell, users can inspect its computed result directly, without having to search for it in a list of variables. Grid-style scripting also makes true debugging of ETL processes possible.
Agile algorithms. With genuine support for the set data type, esProc simplifies structured data computations and makes it easier to compute flexibly from the business perspective. esProc supports ordered sets, so set members can be accessed by position and order-related computations are convenient, for example ranking, sorting, year-over-year comparison, and link relative ratio calculation. With the “set of sets” mechanism for representing groups, esProc can use equi-grouping, alignment grouping, and enumeration grouping to solve various grouping problems easily. In addition, users can handle individual records of a data set in the same way as objects, which gives much more flexible and free access to the data. Many ETL processes that are tough to implement in SQL or stored procedures can be solved quite easily with esProc's agile syntax.
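The ordered-set and grouping ideas above can be illustrated outside esProc. The following is a minimal Python sketch, not esProc syntax: the helper names (`ratio_to_earlier`, `rank_desc`, `align_group`, `enum_group`) and the sample figures are hypothetical, and the enumeration grouping shown here assumes that a record may land in several groups when conditions overlap.

```python
from typing import Callable, Dict, Hashable, List, Sequence


def ratio_to_earlier(values: Sequence[float], lag: int = 1) -> List[float]:
    """Ratio of each member to the member `lag` positions before it.

    lag=1 gives the link relative ratio; lag=12 on monthly data gives a
    year-over-year ratio. Both rely on the sequence being ordered.
    """
    return [values[i] / values[i - lag] for i in range(lag, len(values))]


def rank_desc(values: Sequence[float]) -> List[int]:
    """Competition ranking, largest first: 1 + number of strictly larger values."""
    return [1 + sum(other > v for other in values) for v in values]


def align_group(records: Sequence[Dict], key: Callable[[Dict], Hashable],
                order: Sequence[Hashable]) -> List[List[Dict]]:
    """Alignment grouping: groups follow the given `order`.

    Keys in `order` with no matching record yield empty groups; records
    whose key is not listed in `order` are left out.
    """
    groups: Dict[Hashable, List[Dict]] = {k: [] for k in order}
    for record in records:
        k = key(record)
        if k in groups:
            groups[k].append(record)
    return [groups[k] for k in order]


def enum_group(records: Sequence[Dict],
               conditions: Sequence[Callable[[Dict], bool]]) -> List[List[Dict]]:
    """Enumeration grouping: one group per condition, in condition order."""
    return [[r for r in records if cond(r)] for cond in conditions]


# Hypothetical sample data.
monthly_sales = [100.0, 120.0, 90.0, 135.0]
orders = [
    {"region": "East", "amount": 50},
    {"region": "West", "amount": 300},
    {"region": "East", "amount": 700},
]

link_relative = ratio_to_earlier(monthly_sales)            # month-over-month ratios
ranks = rank_desc(monthly_sales)
by_region = align_group(orders, lambda r: r["region"], ["West", "North", "East"])
by_size = enum_group(orders, [lambda r: r["amount"] < 100,
                              lambda r: r["amount"] >= 100])
```

Each grouping result is a list of lists, which mirrors the "set of sets" idea: a group is itself a set of whole records, so it can be filtered or aggregated further instead of being reduced to a single value immediately.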
Comprehensive data source support. esProc supports hybrid computing across different data sources, including all kinds of databases as well as non-database file sources, and allows results to be written back to one or several data sources. It offers a wide selection of functions for computing on both structured and unstructured data. Additionally, esProc supports files on the local machine and remote files on a LAN, local files and big data files in the HDFS distributed file system, and both common txt or Excel files and files in a proprietary format that offers better performance.
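To make the idea of hybrid computing concrete, here is a minimal Python sketch, again not esProc syntax, that combines a text-file source with a database source and writes the result back to the database. Everything in it is an assumption made for illustration: the in-memory SQLite database stands in for a real database, the embedded CSV string stands in for a file, and the table and column names are hypothetical.

```python
import csv
import io
import sqlite3

# Stand-in for a text file data source (hypothetical sample content).
ORDERS_CSV = """order_id,customer_id,amount
1,C01,250.0
2,C02,80.0
3,C01,120.0
"""


def load_orders(text):
    """Parse the file source into typed records."""
    return [
        {"order_id": int(row["order_id"]),
         "customer_id": row["customer_id"],
         "amount": float(row["amount"])}
        for row in csv.DictReader(io.StringIO(text))
    ]


def open_customer_db():
    """Create the database source: a lookup table plus an empty result table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (customer_id TEXT PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)",
                     [("C01", "Acme"), ("C02", "Globex")])
    conn.execute("CREATE TABLE customer_totals (name TEXT PRIMARY KEY, total REAL)")
    return conn


def total_by_customer(orders, conn):
    """Join file records with database rows and sum amounts per customer name."""
    names = dict(conn.execute("SELECT customer_id, name FROM customers"))
    totals = {}
    for order in orders:
        name = names[order["customer_id"]]
        totals[name] = totals.get(name, 0.0) + order["amount"]
    return totals


def write_back(conn, totals):
    """Write the computed result back to the database source."""
    conn.executemany("INSERT OR REPLACE INTO customer_totals VALUES (?, ?)",
                     sorted(totals.items()))
    conn.commit()


conn = open_customer_db()
totals = total_by_customer(load_orders(ORDERS_CSV), conn)
write_back(conn, totals)
```

The join happens in the script rather than in either source, which is the point of hybrid computing: neither the file nor the database has to be loaded into the other before the two can be combined.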
Convenient command-line scheduling. Users can execute esProc scripts directly from the command line, launch them regularly with the scheduling facility provided by the OS, and run ETL tasks on various operating systems, including Windows, Linux, Unix, and Mac. esProc also provides a JDBC API, so users can implement much more flexible scheduling and control of ETL tasks in code.
Parallel framework that speeds up the ETL process. esProc supports parallel computing on big data and can accomplish ETL tasks involving terabytes of data in an HDFS file or a database file. With the parallel computing framework, a massive amount of data can be allocated equally to multiple computing nodes, so each node only needs to handle a small part of the data. esProc supports multilevel distributed computing, in which each node can act either as a main node that allocates work and summarizes results, or as a sub node that performs specific tasks. The node machines can be high-end servers or inexpensive Windows or Linux PCs.
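The allocate-and-summarize pattern described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not esProc's implementation: worker threads on one machine stand in for sub nodes, and the function names (`split_evenly`, `sub_node_task`, `main_node`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_evenly(data, parts):
    """Split `data` into `parts` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        chunks.append(data[start:start + size])
        start += size
    return chunks


def sub_node_task(chunk):
    """Work done by one sub node: a partial sum and a partial count."""
    return sum(chunk), len(chunk)


def main_node(data, nodes):
    """Main node: allocate chunks to sub nodes, then summarize the partial results."""
    chunks = split_evenly(data, nodes)
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        partials = list(pool.map(sub_node_task, chunks))
    total = sum(partial_sum for partial_sum, _ in partials)
    count = sum(partial_count for _, partial_count in partials)
    return total, count


total, count = main_node(list(range(1, 101)), nodes=4)
average = total / count
```

Returning a partial sum and count, rather than a partial average, keeps the summary step exact when chunk sizes differ. A multilevel arrangement would follow the same shape, with `sub_node_task` itself splitting its chunk and summarizing the results from a further layer of nodes.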