Diversified Data Sources

esProc supports computations across different data sources and can write the results back to one or more data sources. It supports not only relational databases but also MongoDB, Cassandra, and other NoSQL databases, and it provides a rich set of functions for computing on structured and semi-structured data. esProc can directly access files on the local machine and on the LAN, and it can seamlessly access distributed file systems such as HDFS. It supports common formats such as text and Excel files, as well as less common proprietary binary formats that offer better performance.

Relational databases: esProc supports all JDBC-enabled databases, such as Oracle, MS SQL Server, MySQL, and DB2.

Text data sources: esProc supports structured text data sources, such as *.txt and *.log files, with custom row and column separators. It also directly supports Excel files of various versions and binary files in proprietary formats.
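To illustrate what custom row and column separators mean in practice, here is a minimal Java sketch that splits raw text into records. It is not esProc's API; the class name DelimitedText and its parse method are hypothetical, and the separators are treated as literal strings rather than patterns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of esProc: parses delimited text into records
// using caller-supplied row and column separators.
class DelimitedText {
    static List<String[]> parse(String text, String rowSep, String colSep) {
        List<String[]> rows = new ArrayList<>();
        // Pattern.quote makes separators such as "|" or "." match literally
        for (String line : text.split(Pattern.quote(rowSep))) {
            if (line.isEmpty()) continue; // skip blank rows
            // limit -1 keeps trailing empty fields, e.g. "a|" -> ["a", ""]
            rows.add(line.split(Pattern.quote(colSep), -1));
        }
        return rows;
    }
}
```

For example, parse("id|name;1|Alice;2|Bob;", ";", "|") yields three records, the second being ["1", "Alice"].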

Seamless HDFS access: esProc has built-in functions for accessing HDFS through compatible access paths. It can use cursors to retrieve and process large files stored in HDFS, with multithreaded parallel computing.
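To give a rough idea of how a large file can be split for multithreaded processing, the Java sketch below divides a file into byte ranges, lets each thread skip the partial line at the start of its range, and sums one integer per line. It illustrates the general technique only and is not esProc's implementation: a local file stands in for HDFS, and the class name ParallelSegmentSum and the one-number-per-line layout are assumptions.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: a local file stands in for a large HDFS file.
class ParallelSegmentSum {
    // Sums one integer per line, processing byte ranges of the file in parallel.
    static long sum(String path, int threads) throws Exception {
        long length;
        try (RandomAccessFile f = new RandomAccessFile(path, "r")) {
            length = f.length();
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            long step = (length + threads - 1) / threads; // ceiling division
            for (int i = 0; i < threads; i++) {
                long start = Math.min(i * step, length);
                long end = Math.min(start + step, length);
                parts.add(pool.submit(() -> sumSegment(path, start, end)));
            }
            long total = 0;
            for (Future<Long> p : parts) total += p.get();
            return total;
        } finally {
            pool.shutdown();
        }
    }

    // A line belongs to the segment that contains its first byte.
    static long sumSegment(String path, long start, long end) throws IOException {
        long total = 0;
        try (RandomAccessFile f = new RandomAccessFile(path, "r")) {
            if (start > 0) {
                // Step back one byte and discard the rest of that line, so reading
                // begins at the first line that starts at or after `start`.
                f.seek(start - 1);
                f.readLine();
            }
            while (f.getFilePointer() < end) {
                String line = f.readLine();
                if (line == null) break;
                if (!line.isBlank()) total += Long.parseLong(line.trim());
            }
        }
        return total;
    }
}
```

A line that crosses a range boundary is read in full by the thread whose range contains its first byte, and skipped by the next thread, so every line is counted exactly once.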

NoSQL databases: esProc can access JDBC-enabled NoSQL databases, including HBase and Cassandra. It also provides MongoDB-specific functions that query data using MongoDB's own query syntax.

Semi-structured data: esProc provides a programming interface that lets users write custom esProc functions for parsing XML, SOAP, or other semi-structured data. These custom functions behave no differently from native esProc functions when handling structured data.
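As an illustration of the kind of logic such a custom parsing function might contain, the Java sketch below flattens XML into structured records, one map per record element. It does not use esProc's function interface; the class name XmlToRecords, the recordTag parameter, and the one-level record layout are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical parser, not esProc's API: each <recordTag> element becomes one
// record, and each of its child elements becomes a field (name -> text).
class XmlToRecords {
    static List<Map<String, String>> parse(String xml, String recordTag) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Reject DOCTYPE declarations to avoid XXE attacks on untrusted input
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList records = doc.getElementsByTagName(recordTag);
        List<Map<String, String>> rows = new ArrayList<>();
        for (int i = 0; i < records.getLength(); i++) {
            Map<String, String> row = new LinkedHashMap<>(); // keeps field order
            NodeList fields = records.item(i).getChildNodes();
            for (int j = 0; j < fields.getLength(); j++) {
                Node field = fields.item(j);
                if (field.getNodeType() == Node.ELEMENT_NODE) { // skip whitespace text nodes
                    row.put(field.getNodeName(), field.getTextContent().trim());
                }
            }
            rows.add(row);
        }
        return rows;
    }
}
```

Once the data is in this tabular shape, it can be filtered, grouped, or joined like any other structured source.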

Beyond data retrieval, esProc can write results back to the original data source, to a data source of a different type, or to several data sources at once. Its built-in write-back functions support both modifying a single record and writing data in batches.

Because esProc provides a standard JDBC interface to the calling program, it can be combined with specific data sources to build an easy-to-use hybrid database. Traditionally, working with multiple data sources has required high-end reporting tools, hard-to-maintain ETL processes, or an expensive data warehouse. Since it is not tied to any particular data source, esProc natively supports hybrid computing across various data sources. This makes it easier to combine a NoSQL database with a traditional database, removes the single-source restriction when building reports, and lets Java applications cope with increasingly complex data environments.
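At its core, hybrid computing means combining records that came from different sources. The Java sketch below shows one common way to do that in memory, a hash join, with orders standing in for rows from a relational table and customers standing in for documents from a NoSQL store. The record types, field names, and the assumption that customer IDs are unique are all hypothetical; esProc's own join functions are not shown here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory join of data from two different sources.
class HybridJoin {
    // e.g. rows read from a relational table (hypothetical schema)
    record Order(int id, String customerId, double amount) {}
    // e.g. documents read from a NoSQL store (hypothetical schema)
    record Customer(String id, String name) {}
    record OrderWithCustomer(int orderId, String customerName, double amount) {}

    // Inner hash join: build a hash table on customers, then probe it with each order.
    static List<OrderWithCustomer> join(List<Order> orders, List<Customer> customers) {
        Map<String, Customer> byId = new HashMap<>();
        for (Customer c : customers) byId.put(c.id(), c); // assumes unique customer IDs
        List<OrderWithCustomer> out = new ArrayList<>();
        for (Order o : orders) {
            Customer c = byId.get(o.customerId());
            if (c != null) { // orders without a matching customer are dropped
                out.add(new OrderWithCustomer(o.id(), c.name(), o.amount()));
            }
        }
        return out;
    }
}
```

Building the hash table on one input and streaming the other keeps the join at linear cost, which is why this approach is a common choice when the two sides cannot be joined inside a single database.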