Grouping

Formulas and their copying

We mentioned in the preceding part that there are records with their sub-records. But in many cases the hierarchical records are generated from group operations.

The Excel data model doesn’t support multi-level worksheets. Though the group operation is provided, it is treated specially. Aggregate operations performed on the summary level after data grouping use SUBTOTAL, which is difficult to memorize, instead of more familiar functions like SUM/COUNT; otherwise group members won’t be correctly located. 

As mentioned previously, for the formulas in cells at the level of detail data, on one hand we can’t simply use the drag-and-drop method to perform the batched copying (because detail data is inconsecutive data areas separated by summary rows) but can only perform the cross-group copying manually; on the other hand, when formulas reference cells at the summary level or involve cross-row calculations (such as calculations of percentages and YOY rate), even the manual operation can’t guarantee a correct copying according to the Excel rule of formula copying, and, moreover, manual modification of mistakenly copied formulas is needed. All the work is too tedious to bear when there are a lot of groups.

Excel provides the sign $ to reference summary cells in a one-level worksheet, but it becomes helpless when facing the multi-level worksheet.

esCalc has a data model that supports multi-level tables. Aggregate operations performed on the summary level after data grouping still use common functions like sum/count. Particularly, esCalc distinguishes the levels to which the cells belong, and handles the copying of formulas that reference cells both at detail data level (including inter-row reference) and at summary data level according to different situations. The intra-group copying only adjusts cells at the detail data level, while cross-group copying changes the cells at summary data level. What the esCalc users need to do is to reference a desirable cell intuitively, without having to distinguish different levels themselves using the sign $ (actually the sign can merely reference data from one level and falls short of the need). With esCalc, formulas can be correctly copied even calculations involve multiple levels of summary data.

Calculate the average temperature difference in each month, for example, based on the following sheet: 

esCalc_homeintro_group_1

esCalc stores the same type of data in homo-rows. Thus as E3 calculates the average temperature difference in January, its homo-cells corresponding to other months calculate their respective average temperature differences at the same time, saving users the trouble of copying formulas. Here’s the result:

esCalc_homeintro_group_2

In the above data handling, the month data is sub-rows and their parent row is the quarter data. Calculations performed on the sub-rows won’t affect the parent row; and similarly, data handling in the parent row won’t affect the sub-rows. For instance, enter the formula ={A3}.count() in E2 to calculate the number of records in each quarter. Here’s the result:

esCalc_homeintro_group_3

esCalc formulas can be intelligently copied according to different data structures, instead of being mechanically copied according to positions of cells. The esCalc copying rule is more reasonable.

Another example is to calculate the precipitation based on the climate table:

esCalc_homeintro_group_4

Of which E2 and E6 respectively calculate the average precipitation of the current quarter, as E2’s formula ={D3}.avg(). To calculate the difference between the precipitation in each month and the average precipitation in the corresponding quarter, just enter the formula =D3-E2 in E3. Here’s the result:

esCalc_homeintro_group_5

Formulas have been intelligently adjusted during the copying according to the hierarchical level to which the target cell belongs. For instance, we click on E6 and know that the formula has been adjusted as ={D7}.avg(), which calculates the average precipitation of the current quarter; click on E8 and see the formula have been adapted as =D8-E6, which means subtracting the average precipitation value of the current quarter from the precipitation of the current month. So we can see that esCalc can correctly copy the formula to both a cell that sits on a group’s summary row and one that sits on a group’s detail row.

Post-grouping operations

That data grouping in Excel is special is also reflected by the difficulty in handling post-grouping operations. We can’t perform operations such as sorting and filtering freely on the grouped worksheet table as what we do with a single-level table.

For example, in order to find out ranks of sellers on performances, we want to group and aggregate the order records by sellers and then sort groups according to aggregate amounts. To do that we need to first perform group and aggregate and then the sort by aggregate values; during the sorting, members of a group need to move together with the aggregate value. But we can’t perform this kind of sorting automatically in Excel. In a modified version of this example, for each seller we want to delete the small orders, each of which makes up less than 1% of the seller’s total sales amount, and then re-calculate the total amount. This requires grouping data and calculating the percentage of each member in each group, and performing filtering on all groups by the percentages (here the non-related calculations discussed above will also be used). Excel can’t make it all at once due to its lack of support to operations on the multi-level worksheet; it can only handle the groups separately one by one.

esCalc sees a multi-level worksheet as normal, and makes it open to all operations. So it’s easy for esCalc to handle the above scenarios. In esCalc, during the sorting by aggregate values after data grouping, detail rows of a group will move together with their summary row, which is again an application of esCalc record conception (a group with its members as a whole can be regarded as a record). The post-grouping filtering on detail rows will be performed once and for all by copying all groups at one time.

Here’s the order table: 

esCalc_homeintro_group_6

F1 calculates total sales amount of orders with the formula ={E3}.sum(). F3 calculates the percentage of each order’s amount in the total amount with the formula =round(E3/F1,4).

esCalc permits various operations on a grouped worksheet, such as filtering. To delete every order whose amount makes up less than 1% of the total sales amount, we select F3 to do the filtering:

esCalc_homeintro_group_7

Here’s what we get through data filtering:

esCalc_homeintro_group_8

Structure editing

Excel doesn’t support inserting or deleting a data level based on a grouped worksheet. To change the existing data structure, we need to clear groups and re-group data, making the work we did on the summary level (the calculated cells) a waste. Sometimes it is the summary values, instead of the details, that we desired.

In the worksheet in which the small orders have been removed, for example, we need to group the records by the ordered products to see which products are more popular among each seller’s non-small orders. To do this we need to insert another level of groups into the double-level grouped worksheet, and to aggregate and sort each group. As we are only interested in the group and aggregate results, we want to delete the detail level of data. But we can’t perform these operations automatically in Excel. We have to copy the intermediate results out into a new worksheet for further handling. Even worse is that since the grouped data is not continuous, even the copying action can’t be carried out automatically.

In esCalc, we re-group the preceding worksheet table by products, and here’s the result:

esCalc_homeintro_group_9

We can do further computations based on the re-grouped worksheet. To calculate the total sales amount for each seller, for instance, we enter the same formula ={E4}.sum() in both E2 and E3. Here’s the result:

esCalc_homeintro_group_10

We entered the same formulas in F1, E2 and E3 to calculate total sales amount, but we get different results because they are entered in cells that sit at different levels.

Now we select E2 to perform a sorting in descending order to sort the worksheet data by the total sales amounts of sellers. The result is as follows:

esCalc_homeintro_group_11

In esCalc, when grouping rows move because of sorting or other operations, their sub-rows will follow suit.

There’s nothing particular for esCalc to carry out these operations. Because it defines the hierarchical level as a nature of the worksheet, enabling free insertion or deletion of a level and automatic copying of an action operated on a row to all its homo-rows (similar concept to homo-cells). So all detail rows will be deleted simultaneously if we execute an action on a certain detail row.

The data model for esCalc spreadsheet encompasses a hierarchical structure, making grouping and ungrouping the normal operations that can be still performed on the same worksheet as filtering and sorting. Here’s an analogy between spreadsheet data models and a numerical system. Within the range of integers, we are free to do addition, subtraction and multiplication but we can’t do division at will, because the quotient isn’t necessarily an integer. But if we expand the range into the rational numbers, the division operation becomes naturally as well, though we need to redefine the rules for other operations in the expanded scope. Likewise, when esCalc extends the data model for worksheets to include a multi-level structure, it also redefines rules for carrying out sorting, filtering and generating computed columns (to support the smart copying of formulas across different levels, for instance). By doing so, related operations can be performed consecutively, ensuring an interactive data analysis to proceed smoothly.