Parallel Algorithms Optimized for Medium and Small Clusters
esProc optimizes parallel computing for medium and small clusters by supporting multithreaded computing on a single node and centerless parallel computing across multiple nodes. This makes it well suited to data-intensive, computation-intensive, and high-concurrency tasks, as well as tasks that draw on multiple heterogeneous data sources. esProc provides a scripting language with native support for big data computing, which saves users the effort of implementing algorithm details in a high-level language. As a result, they can develop programs easily and more conveniently meet business needs that require complex computing procedures.
Ideal for medium and small clusters. esProc has a simple structure and places low requirements on the hardware and runtime environment of node machines, which makes it easy to deploy and maintain on medium and small clusters. For users of such clusters who pursue performance, esProc provides two methods for exchanging data between nodes: direct in-memory exchange and an external file cache. Users can thus strike a balance between performance and fault tolerance. esProc has a controllable task distribution mechanism that lets programmers distribute the computing load according to the characteristics of the task and the hardware. It can also allocate resources intelligently to keep big data computing reliable and stable. Moreover, esProc permits global variables and private space within a node, which improves performance while keeping task execution stable.
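The trade-off between the two exchange methods can be illustrated with a minimal Python sketch. It is not esProc code and does not use any esProc API; the function names, the pickle format, and the temporary cache directory are all assumptions chosen for illustration. It only shows why handing over a result in memory is fast but volatile, while writing it to a file cache costs I/O but lets the result be read back later.

```python
import os
import pickle
import tempfile


def exchange_in_memory(result):
    """Hand the result object straight to the consumer (hypothetical helper).

    Nothing is copied or written, so this is fast, but the result is lost
    if the process holding it fails.
    """
    return result


def exchange_via_file(result, cache_dir):
    """Write the result to a cache file and return its path (hypothetical helper).

    This costs serialization and disk I/O, but the file can be re-read
    after a failure as long as the cache directory survives.
    """
    fd, path = tempfile.mkstemp(suffix=".cache", dir=cache_dir)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(result, f)
    return path


def load_from_file(path):
    """Read a cached result back from disk."""
    with open(path, "rb") as f:
        return pickle.load(f)


partial = {"region": "east", "total": 1250}  # made-up intermediate result

with tempfile.TemporaryDirectory() as cache_dir:
    direct = exchange_in_memory(partial)
    cached_path = exchange_via_file(partial, cache_dir)
    recovered = load_from_file(cached_path)
```

In this sketch `direct` is the very same object as `partial`, whereas `recovered` is an equal copy rebuilt from disk, which is what makes the file route tolerant of a consumer restart.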
Complex business logic handling. With esProc scripts designed specifically for big data processing, users can easily reach computing goals that involve many steps or complex logic. The grid-style esProc script supports step-by-step computing by breaking a complex task into multiple simple steps. Users can reference any cell directly by its name, without having to define a variable, and can monitor the computation at every intermediate step. esProc makes multi-table associations easy to create, and it supports set-type data, discrete records, ordered sets, object references, and set-style grouping. Besides making algorithms convenient to implement, esProc can receive external parameters, divide a task, assign parallel nodes dynamically, and summarize the computing results, all in a single script.
Big data computing. esProc can compute terabytes of data from databases or HDFS files. With its parallel computing framework, massive data can be distributed across multiple computing nodes so that each node only needs to process a small volume of data. esProc supports multilevel distributed computing, in which every node can act either as a main node that allocates tasks and summarizes results or as a sub node that carries out specific computing jobs. A node machine can be a high-end server or an inexpensive PC, running either a Windows client or a Linux server system.
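The allocate-and-summarize pattern described above can be sketched in Python. This is a toy model under stated assumptions, not esProc's implementation: worker threads stand in for sub nodes, a short list of integers stands in for a large data set, and the names `split_evenly`, `sub_node_job`, and `main_node` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_evenly(rows, parts):
    """Split rows into `parts` contiguous segments whose sizes differ by at most one."""
    base, extra = divmod(len(rows), parts)
    segments, start = [], 0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)
        segments.append(rows[start:end])
        start = end
    return segments


def sub_node_job(segment):
    """Work done by one sub node: a partial sum over its own segment only."""
    return sum(segment)


def main_node(rows, node_count):
    """Allocate one segment per worker, then summarize the partial results."""
    segments = split_evenly(rows, node_count)
    # Threads are a stand-in for remote sub nodes in this sketch.
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(sub_node_job, segments))
    return partials, sum(partials)


amounts = list(range(1, 11))  # stand-in for a large data set
partials, total = main_node(amounts, 3)
print(partials, total)  # [10, 18, 27] 55
```

A multilevel arrangement would follow the same shape: a sub node could itself call `main_node` on its segment and fan the work out further before returning its partial result.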
Compute-intensive task handling. The esProc parallel framework is also suited to compute-intensive tasks that demand high CPU performance. The framework allows a task to be segmented into several parts and allocated evenly to multiple computers. The computing load on each node is then relatively small, and overall performance improves greatly. Traditionally, the capacity to handle compute-intensive tasks has been obtained from a high-end database server or cluster. With esProc, users can achieve the same level of performance with ordinary PCs and desktop CPUs, without introducing expensive database hardware and software.
High concurrency task handling. esProc is also well suited to Web and reporting applications, which are characterized by high concurrency. The data volume and computing workload of such applications are moderate, but the requests for computing are numerous and intensive. The esProc parallel computing framework is flexible: task assignment can be controlled dynamically through external parameters, so that highly concurrent requests can be allocated evenly across the computing nodes. To handle the growing number of concurrent requests as an organization grows, esProc allows seamless expansion by simply modifying the parameters (which may be kept in a file), without altering the script.
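The idea of scaling out by changing parameters rather than the script can be sketched in Python as follows. This is an illustration under assumptions, not esProc's mechanism: the JSON parameter format, the node names, and the round-robin policy are all hypothetical, and the parameter text could equally be read from a file.

```python
import json
from collections import Counter
from itertools import cycle


def make_dispatcher(params_text):
    """Build a round-robin dispatcher from a JSON parameter string."""
    nodes = json.loads(params_text)["nodes"]
    ring = cycle(nodes)

    def assign(request_id):
        # Each incoming request goes to the next node in the ring.
        return request_id, next(ring)

    return assign


def load_per_node(params_text, request_count):
    """Count how many of `request_count` requests each node receives."""
    assign = make_dispatcher(params_text)
    return Counter(node for _, node in (assign(i) for i in range(request_count)))


# Hypothetical external parameters, before and after adding a node.
two_nodes = '{"nodes": ["node-a", "node-b"]}'
three_nodes = '{"nodes": ["node-a", "node-b", "node-c"]}'

before = load_per_node(two_nodes, 6)
after = load_per_node(three_nodes, 6)
print(dict(before))  # {'node-a': 3, 'node-b': 3}
print(dict(after))   # {'node-a': 2, 'node-b': 2, 'node-c': 2}
```

The dispatching code is identical in both runs; only the parameter text changes, and the per-node load drops from three requests to two once the third node is listed.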
Multi-data-source handling. esProc supports computing over data from multiple or heterogeneous data sources, including various types of structured data, unstructured data, database data, local files, large HDFS files, and distributed databases. Because esProc provides main applications with a consistent JDBC interface, esProc and its data sources together can form an easy-to-use hybrid database. Conventionally, multi-data-source computing requires high-end reporting tools, hard-to-maintain ETL, and an expensive data warehouse. esProc greatly reduces the coupling between big data and traditional databases and removes the single-source restriction on reports. In addition, it enables Java applications to handle increasingly complex requirements for big data computing.
Current IT development is bringing explosive growth in data, computing, and concurrency. The data environment is becoming more complex, and computing goals are becoming ever more challenging. esProc gives its users powerful computing tools and convenient scripting for developing modern big data applications easily.