SPL CookBook

Here comes SPL (not SQL) Cookbook.

SPL (Structured Process Language) is the programming language used by esProc, the professional data computing engine. The language encapsulates a wealth of libraries. A data computing and processing program written in SPL is much faster and simpler than one written in SQL or Python. See comparison:《SPL vs SQL》,《SPL vs Python》.

The SPL Cookbook collects hundreds of data processing tasks and provides the SPL solutions. They cover most of the scenarios the data analysts may encounter in real-world businesses. Once you learn how to implement these tasks, you can handle the common data analysis work effortlessly.

CONTENTS

• 1.1 Access a record with its sequence number
• 1.2 Generate a nonexistent group name according to the sequence number in aggregate operation
• 1.3 Group records and do calculation by sequence numbers in each group
• 1.4 Define subsets by the initial sequence number and the specified step value
• 1.5 Loop through sequence numbers to access records and do inter-row calculation
• 1.6 Comparison of sequences
• 1.7 Alignment calculation between members in sequences
• 1.8 Compare whether two sequences are equal
• 1.9 Location: locate a member in the sequence
• 1.10 Location: grouping by the positions of members in a sequence
• 1.11 Location: find a record by the position and do inter-row calculation
• 1.12 Location: find records by positions and do inter-row calculation
• 1.13 Location: Group & count by segment
• 1.14 Location: Group & calculate average value by segment
• 1.15 Location: obtain records by their original sequence numbers after sorting
• 1.16 Location: Group members by positions repeatedly
• 1.17 Location: check whether a record contains all specified members
• 1.18 Location: determine whether a record exists by the primary key value
• 1.19 Location: inter-row calculation over Top N records
• 1.20 Select: find the record with the minimum value
• 1.21 Select: find the record with the maximum value
• 1.22 Select: search data by segment
• 1.23 Select: Top N
• 1.24 Select: Find a record according to the primary key value

• 2.1 Get records by checking whether a target value is contained in a specified set
• 2.2 Get records by checking whether a target value is contained in a specified set (the set is relatively large)
• 2.3 Get records by matched foreign key values
• 2.4 Get records by matched non-foreign-key values
• 2.5 Speed up non-foreign-key mapping
• 2.6 Get records by matched multi-field foreign key values
• 2.7 An example of self join simplification
• 2.8 Get records by mismatched foreign key values
• 2.9 Get mismatched records
• 2.10 An example of simplifying SQL double negation
• 2.11 Get matching records
• 2.12 Compare with all results of subquery

• 3.1 Get the maximum value
• 3.2 Get the sequence number of the record with the maximum value and do inter-row calculation
• 3.3 Get another field value of the record with the maximum value
• 3.4 Find top N field values
• 3.5 Get the sequence numbers of records with top N values of a specified field
• 3.6 Get records with top N values in a specified field
• 3.7 Get other field values of the records with top N values of a specified field
• 3.8 Get top N records in each group after grouping
• 3.9 Perform grouping & aggregation and get top N records in each group

Chapter 4 Grouping & Aggregation READ

• 4.1 Aggregation operation: SUM
• 4.2 Aggregation operation: MAX & MIN
• 4.3 Aggregation operation: AVERAGE
• 4.4 Aggregation operation：COUNT
• 4.5 Aggregation operation： logic AND
• 4.6 Aggregation operation：logic OR
• 4.7 Aggregation operation： Count distinct members
• 4.8 Aggregation operation：MEDIAN
• 4.9 Aggregation operation：RANKING
• 4.10 Aggregation operation： An application scenario of RANKING

• 5.1 Group by the specified order, each group keeps only one record
• 5.2 Group in specified order
• 5.3 Group in specified order and put unmatched records in a new group
• 5.4 Group by sequence number, each group keeps only one record
• 5.5 Group by sequence number
• 5.6 Repeatedly grouped by sequence numbers
• 5.7 Group by segments of field values
• 5.8 Group by segment according to expression result
• 5.9 Group by enumerated conditions, records are not repeatedly grouped
• 5.10 Group by enumerated conditions, unmatched records are put in a new group
• 5.11 Repeatedly group by enumerated conditions

• 6.1 Inter-row calculation in subsets after grouping
• 6.2 Group in the order of record and perform count
• 6.3 Ordered conditional grouping
• 6.4 Group by sequence number
• 6.5 Multilevel grouping & aggregation
• 6.6 Ordered grouping of big data
• 6.7 Ordered conditional grouping of big data

• 7.1 Merge a sequence and a new member in loop
• 7.2 Loop assignment
• 7.3 Loop calculation: complex inter-row calculation
• 7.4 Loop calculation: maximum continuous rising days
• 7.5 Loop calculation: nested loop
• 7.6 Loop calculation: loop number
• 7.7 Loop calculation：calculate adjacent data by position during the loop calculation
• 7.8 Loop calculation: iterative accumulation
• 7.9 Loop calculation：group and calculate ranking
• 7.10 Loop calculation： calculate dense ranking in each group
• 7.11 Loop calculation: iterative sum
• 7.12 Loop calculation: custom iterative calculation

Chapter 8 Join query over multiple tablesREAD

• 8.1 Perform filtering through multi-level association
• 8.2 Switch foreign key field values to the corresponding records
• 8.3 Get records by matched foreign key values
• 8.4 Get records by mismatched foreign key values
• 8.5 Join query over two tables
• 8.6 Perform a multi-field join and conditional filtering over two tables
• 8.7 Join query over multiple tables
• 8.8 Join two tables of the same order by merging
• 8.9 Join big data tables of the same order by merging
• 8.10 Perform a left join by multi-field primary key of dimension table
• 8.11 Perform a left join between two tables
• 8.12 Perform a full join between two tables
• 8.13 Cartesian product with filter condition
• 8.14 Use Cartesian product to calculate matrix multiplication
• 8.15 Use left join to calculate Cartesian product
• 8.16 Join query between big data tables and large dimension table
• 8.17 Fast join query between small data table and large dimension table
• 8.18 Fast join query over same-order data tables and large dimension table
• 8.19 Join two tables through locating records by sequence numbers
• 8.20 Perform an alignment join by positions to shuffle values of a field
• 8.21 Perform alignment join over multiple tables by sequence numbers
• 8.22 Cross Apply operation
• 8.23 Outer Apply operation
• 8.24 Convert Apply operation to Cartesian product
• 8.25 Complex uses of Apply operation

• 9.1 Concatenation of two sets
• 9.2 Intersection of two sets
• 9.3 Union of two sets
• 9.4 Difference of two sets
• 9.5 XOR operation of two sets
• 9.6 Mixed use of concatenation and difference
• 9.7 Set operations of sequences: intersection and union
• 9.8 Concatenation of all set members in a sequence
• 9.9 The union of all set members in a sequence
• 9.10 Merge same-order sets in the current order to calculate concatenation
• 9.11 Merge same-order sets to calculate union
• 9.12 Merge same-order sets to calculate intersection
• 9.13 Merge same-order sets to calculate XOR
• 9.14 Merge same-order sets to calculate difference
• 9.15 Merge table sequences by primary key to calculate concatenation
• 9.16 Merge table sequences to find differences
• 9.17 Merge unordered tables to calculate union
• 9.18 Aggregation of sequences: union & difference
• 9.19 Aggregation of sequences：intersection
• 9.20 Perform mixed set operations over two small files
• 9.21 Perform complex set operations over two small files
• 9.22 Merge two big data tables to calculate concatenation
• 9.23 Merge two big data tables to calculate union

• 10.1 Row to column transposition
• 10.2 Column to row transposition
• 10.3 Bidirectional transposition
• 10.4 Dynamic row to column transposition
• 10.5 Row to column transposition with dynamic columns by filling into a table
• 10.6 Convert multiple rows to multiple rows of another form
• 10.7 Transpose rows to columns by position-based value assignment
• 10.8 Transpose rows to columns, and do inter-column calculations at the same time
• 10.9 Dynamic transposition after the main and sub table join
• 10.10 Dynamic row to column transposition after multi-table join
• 10.11 Transposition in column-layout

• 11.1 Recursively search single references
• 11.2 Traverse all files in the directory
• 11.3 Recursively search all references by loop
• 11.4 Recursively search references until the specified value
• 11.5 Search the upper level reference
• 11.6 Find records with the specified value in the reference chain with the parent value listed
• 11.7 Search for leaf records
• 11.8 Find all upper level references
• 11.9 Hanoi Tower problem
• 11.10 Pirate treasure division problem
• 11.11 Traverse the directories to summarize all the files

Chapter 12 Using structured text dataREAD

• 12.1 Filter small files
• 12.2 Read certain fields in a text file
• 12.3 Read data in a text file using specified separator
• 12.4 Aggregate data in a small file to get sum
• 12.5 Inter-column calculation in a small file
• 12.6 Perform comprehensive calculations using small text files
• 12.7 Read untitled structured text data
• 12.8 Read a text file using specified data type and format
• 12.9 Read structured text data according to the specified character set
• 12.10 Sort data in a small text file in ascending order
• 12.11 Sort data in a small text file in descending order
• 12.12 Sort structured data in a small text file by multi fields in specified order
• 12.13 Perform grouping & aggregation over a small file
• 12.14 Perform filter after grouping over a small file
• 12.15 Deduplication for a small file
• 12.16 Count distinct for small file data
• 12.17 Perform grouping & count distinct in each group over a small file
• 12.18 Associatively query data over multiple files
• 12.19 Join small files to query non-associative field
• 12.20 Join small associative files into a wide table
• 12.21 Combine data from multiple text files
• 12.22 Divide data in a text file into groups and write them to different files
• 12.23 Write data in a text file to different files according to judgements

Chapter 13 Using structured big text fileREAD

• 13.1 Filter a big file
• 13.2 Perform aggregate sum over a big text file
• 13.3 Inter-column calculation in a big text file
• 13.4 Perform comprehensive calculations over a big text file
• 13.5 Sort a big text file
• 13.6 Sort a big text file in descending order
• 13.7 Sort a big text file by multiple fields in specified order
• 13.8 Find records in a big data table that match data in another big data table
• 13.9 Perform grouping & aggregation over a big file, with small result set
• 13.10 Perform grouping & aggregation over a big file, with large result set
• 13.11 Filter after grouping over a big file
• 13.12 Deduplication of big text file
• 13.13 Count distinct over a big text file
• 13.14 Group & count distinct in each group over a big text file
• 13.15 Group a big file by values of a certain field, and query record containing the max value of another field in each group
• 13.16 Combine & calculate data in multiple big data files
• 13.17 The join filter over a large file and a small file
• 13.18 Join a large file and a small file into a wide table to query
• 13.19 Merge-join two big files
• 13.20 Set operations of multiple big text files
• 13.21 Divide a big text file into groups and write them to different files
• 13.22 Write data in a large text file to different files according to judgements
• 13.23 Organize a fixed-structure big text file into structured data
• 13.24 Organize a big file with indefinite-line structure into structured data
• 13.25 Find the lines containing keyword in all big text files in the specified directory
• 13.26 Replace specified text in all text files under the specified directory
• 13.27 Count the frequencies of each word in a big text file
• 13.28 Count the frequencies of each letter in a big text file
• 13.29 Remove duplicate lines from a big text file
• 13.30 Remove repeated paragraphs from a big text file

Chapter 14 Use SQL to query text data directly READ

• 14.1 Filter
• 14.2 Aggregate
• 14.3 Inter-column calculation
• 14.4 CASE statement
• 14.5 Sort
• 14.6 TOP-N
• 14.7 Group & Aggregate
• 14.8 Filter after grouping
• 14.9 Select distinct
• 14.10 Count distinct
• 14.11 Count distinct in each group after grouping
• 14.12 Join query over two text files
• 14.13 Join query over multiple files
• 14.14 Multi-level join query over multiple files
• 14.15 Using nested subquery
• 14.16 Using common table expression (CTE)
• 14.17 Using command line to execute SQL

• 15.1 Read xlsx data in simple format
• 15.3 Read free format xlsx data
• 15.4 Read the crosstab in an xlsx file
• 15.5 Read the main & sub table in xlsx file
• 15.6 Read big xlsx file
• 15.7 Write a data table to xlsx file
• 15.8 Append data table to xlsx file
• 15.9 Write data table to different worksheets of an xlsx file
• 15.10 Export a large amount of data to xlsx file
• 15.11 Sort after join
• 15.12 Specify display attributes
• 15.13 Fill in the specified cell or area of an xlsx file
• 15.14 Export row-style report to xlsx file
• 15.15 Export multi-level grouped report to xlsx file
• 15.16 Export crosstab report to xlsx file
• 15.17 Combine multiple xlsx files of same structure
• 15.18 Split an xlsx file and export it to different xlsx files

Chapter 16 Using JSON and XML dataREAD

• 16.1 Import single-layer JSON file
• 16.2 Import multi-layer JSON file with same-structure detailed data
• 16.3 Import multi-layer JSON file with different-structure detailed data
• 16.4 Nested aggregation
• 16.5 Get field values recursively & combine members of sequences recursively to get SUM
• 16.6 Store a JSON file to the database
• 16.7 Store a multi-layer JSON file to multiple database tables
• 16.8 Output the data table as an XML string with elements only
• 16.9 Import an element-only XML file and organize it according to specified format
• 16.10 Import XML file with both elements and attributes
• 16.11 Import XML file, perform alignment combing and then filtering
• 16.12 Import elements with different structure of the specified layer from an XML file
• 16.13 Import elements of the specified layer from an XML file with different sub-node element structure
• 16.14 Join query over XML file and database data
• 16.15 Parse XML data in batches
• 16.16 Call external WebService according to parameters and import XML data
• 16.17 Get different data from XML file according to parameters

• 17.1 Organize a multi-line, fixed-structure text
• 17.2 Organize a varied structure text
• 17.3 Parse text with regular expression and organize it into structured data
• 17.4 Parse text with regular expression and organize it into structured data (One record corresponds to multiple lines)
• 17.5 Read in text and perform transposition
• 17.6 Organize a complex text file into structured data
• 17.7 Search all text files in the specified directory to find the lines containing keywords
• 17.8 Replace string in all text files in a specified directory
• 17.9 Count the frequencies of each English word in a text file
• 17.10 Remove duplicate lines from a text file
• 17.11 Count the frequencies of each letter in a text file
• 17.12 Remove duplicate paragraph from a text file

Chapter 18 String & datetime handlingREAD

• 18.1 Concatenate strings in two columns
• 18.2 Concatenate string and other type of value
• 18.3 Concatenate members in a sequence
• 18.4 Add quotation marks to members when concatenating members of a sequence
• 18.5 Convert a table sequence to CSV format
• 18.6 Split a string into a sequence of characters
• 18.7 Split strings into a sequence of words
• 18.8 Use tab as a separator to split a string
• 18.9 Use comma as the separator to split a string
• 18.10 Split a string into two segments by specified separator
• 18.11 Split a string with regular expression
• 18.12 Parse a string into numerical value
• 18.13 Parse a percentage string into a numerical value
• 18.14 Automatically parse a string into the proper data type
• 18.15 Split a string and parse the split members into proper data types
• 18.16 Parse string to table sequence
• 18.17 Parse the string type field in a table sequence with regular expression
• 18.18 Parse indefinite-structure text with regular expression
• 18.19 Use code to parse string type fields in a table sequence
• 18.20 Modify the filter condition in the SQL statement
• 18.21 Translate standard SQL statements into specified database format
• 18.22 Parse and analyze HTML file
• 18.23 Parse HTML file to get table sequence
• 18.24 Calculate the date N days after a certain date
• 18.25 Calculate the number of days between two dates
• 18.26 Calculate the number of seconds / minutes between two datetimes
• 18.27 Calculate the first day and last day of the week
• 18.28 Calculate the average daily sales for a quarter
• 18.29 Calculate age
• 18.30 Calculate the date N months before a certain date
• 18.31 Calculate the date after N working days
• 18.32 Get a sequence of working days
• 18.33 Get a sequence of dates between two dates
• 18.34 Divide the period between two dates equally into n segments

• 19.1 Pass value list to SQL through sequence parameter
• 19.2 Translate standard SQL to syntax of different database products
• 19.3 Retrieve data from a single table by segment with parallel processing
• 19.4 Multi-table retrieval with parallel processing
• 19.5 Update databases with parallel processing
• 19.6 Compose dynamic SQL on same-structure tables – (1)
• 19.7 Compose dynamic SQL on same-structure tables – (2)

• 20.1 JOIN operations
• 20.2 Multi-operation on a single table
• 20.3 Simple IN operator
• 20.4 Foreign key objectification
• 20.5 A simple implementation of APPLY algorithm
• 20.6 Traversal-based calculation and concatenation
• 20.7 Multi-table concatenation
• 20.8 Union
• 20.9 Intersection
• 20.10 Difference
• 20.11 Get sequence number of a member in a sequence
• 20.12 Intersection of sequence and record sequence field
• 20.13 TopN calculations
• 20.14 Nested-structure data aggregation
• 20.15 Nested subdocuments query
• 20.16 Multi-attribute subdocuments concatenation
• 20.17 Nested List subdocuments query
• 20.18 Cross aggregation
• 20.19 Group by segment
• 20.20 Group by category
• 20.21 Export data as CSV
• 20.22 Export to database
• 20.23 Import data to MongoDB
• 20.24 Mixed data sources computations

Chapter 21 As Reporting Tools' CollaboratorREAD

• 21.1 Column-group layout
• 21.2 Column-group layout in stagger pattern
• 21.3 Print wide table horizontally
• 21.4 Row copy
• 21.5 Conditional group display
• 21.6 Dynamically insert a sub table to master table
• 21.7 Join fields horizontally to generate a table
• 21.8 Crosstab inter-column calculation
• 21.9 Row-to-column transposition
• 21.10 Aggregation by specified months
• 21.11 Dynamic data source
• 21.12 Intra-group inter-row calculations
• 21.13 Split a field value into records
• 21.14 Dynamic grouping by time interval
• 21.15 IN condition query