Background

Excel

In effect, Excel, instead of many of the BI tools, is the most widely used desktop data analysis tool.

Excel is simple, intuitive, easy to use and to understand, particularly suitable for the average analysts who are incapable of programming and don’t have knowledge about mathematical models. Moreover, the result of each action operated on an Excel worksheet will be immediately shown to enable programmers to decide how to make the next move. This represents the typical model analysts use to perform analytics. It is neither necessary nor possible to model objects beforehand.

But we’ve found some Excel defects as data analysis becomes increasingly complicated. Simply and briefly put, they are the “4M” problems:

1. Multi-row records

There’s no definite concept of structured records in Excel. The single-row record is a record that corresponds to a single row. If a record has too many data items and needs to occupy multiple rows or if it has sub-records – the record is a multi-row record. It’s very complicated to edit a multi-row record and to perform operations on it.

2. Multi-level tables

Excel provides functions for grouping data, yet, unlike an ungrouped worksheet, many operations become impossible or need to be performed in a different way with the resulting hierarchical table, where consecutive operations are hard to be carried out. What’s worse, the grouping generates multi-row records, on which the cross-group copying of formulas that reference the aggregate values can’t be performed correctly and intelligently. As a result, computing errors arise.

3. Multi-table joins

Excel isn’t a relational-algebra-based product. It doesn’t have specialized functions to join tables; it only provides functions such as Lookup for simple cross-page cell reference, which are complicated to use and perform poorly.

4. Multi-member value

It is known that Excel has the concept of set, enabling aggregate operations on cells within a certain range. But it doesn’t provide explicit set data type, making it impossible to store set-typed values and limiting its abilities of handling set-oriented operations.

We’ll further explain the problems later through examples.

esCalc

To solve those Excel problems, we created another model for handling the spreadsheet data, and from this model the brand-new spreadsheet software – esCalc – was born.

Instead of an improved version of Excel, esCalc bases its data and operational models directly on the relational algebra, making it have more in common with the relational database. esCalc has a definition for records, and provides most of SQL’s computational features like computed columns, sorting, filtering, grouping and performing distinct, as well as join and union between multiple tables. Because of its spreadsheet interactive interface, esCalc can be regarded as a visualized SQL calculator.

Different from the general database client softwares that use field name in referencing a data item, esCalc inherits Excel’s grid style, in which cells are used to name data items and describe formulas, and supports automatic and intelligent copying of formulas. The more intuitive way makes it easy to be manipulated by the layman and handy to express order-related computations which SQL isn’t good at.

Furthermore, esCalc includes a multi-level data model to increase related computing capabilities on the basis of SQL, enabling it to support hierarchical tables containing a main table and its sub-tables, to perform operations such as filtering, sorting, re-grouping and ungrouping on the grouped worksheet, and to copy formulas automatically and intelligently between cells at different levels.

esCalc is designed for performing interactive data analysis. It doesn’t support such extensive functionalities as Excel does, but it’s more sophisticated and more adept with handling batched data. esCalc is intended to be a cooperator of Excel, rather than an unnecessary rival. Not only can esCalc retrieve an xls file to analyze and process it, also it can export the computational result in xls format for further processing in Excel. But it’s necessary to point out that esCalc isn’t a plug-in for Excel, it is an independent application.

Compared with Excel, the big functionalities for data handling that esCac hasn’t are VBA scripting and pivot table. In our opinion, VBA is too difficult for average analysts; besides, many VBA scripts are created only in order to complement those functionalities that are not so convenient to be carried out in Excel. With the same functionalities better provided in esCalc, VBA scripting is not as important as it is in Excel. For analysts who are capable of programming, we provide esProc, another product of the RaqSoft software family, for occasions where scripts are needed. The software offers scripting abilities that are much more powerful than VBA. As for the pivot table, Excel has already exceled in its feature and leaves little room for improvement. So the target of esCalc is to generate files of xls format as the data sources for Excel pivot tables.

Now let’s turn to discussing the above-mentioned “4M” problems of Excel in data analysis and handling through examples, and provide their esCalc solutions.