01Dynamic parameter query

It is suitable to perform dynamic parameter query in IDE.

There are xls files named after years. We want to specify file name and range of amounts through parameters and perform queries on files.
Orders2009.xlsx
Orders2010.xlsx

Code and parameters
Execute and debug
xlsimport@t(;,3) means reading the xls file from row 3 to the end. The first row is used as column header after being retrieved.

02Process-related computation

Process-related computations are complicated. They are suitable to be handled in IDE.

Compare two csv files (the old one and the modified, new one) having same number of rows and find how many rows in them having same data (no modification) and how many rows having inconsistent data (modified).
Orders_old.csv
Orders_new.csv
Compare records of the two files in the same row through loop processes.

ABC
1=old=file("Orders_old.csv").import@ct() /Read old file
2=new=file("Orders_new.csv").import@ct() /Read new file
3for old.len()=cmp(old(A3),new(A3))/Compare records on same row in the two files in loop
4 =@|B3//Append comparison result to B4
5=B4.count(~==0) //Number of same rows
6=B4.count(~!=0) //Number of different rows
  1. “for n”means looping through 1 to n; the loop body is the indented B3-B4;
  2. We can reference a default loop variable, which is cell A3 the for statement settles, or a default loop number, like #A3, within the loop body;
  3. We can use a cell name, like A3 and B4, as a default variable name, or define a meaningful name, like old and new;
  4. @ represents the current cell.

More examples

  • Give a 5% performance-based bonus to salespeople whose performance rank in top 10% in the year 2014;
  • Find all subordinates of a certain organization and the number of subordinate levels;
  • Based on user payment details table, generate a summary table recording payable amount per month in the year 2014 for each user;
  • According to keywords in text file 1, query text file 2 and output corresponding strings;
  • According to stock closing prices, list average closing price in 20 days for each transaction date from 2020-01-01 to 2020-01-10;
  • Based on sale table, find the number of days needed to achieve 20 orders in each month of the year 2014;
  • Based on employee income table, get the rank of each employee’s income in their department;
  • Based on score table, get rank of each subject for the student whose ID is 8 in class one;
  • According to stock closing prices, calculate the cumulative transaction amount of the whole year for each transaction date in the year 2019;
  • Based on sale table, get the date when the sale target of 150,000 for the first quarter in 2014 is achieve.

03Complex computation

The IDE is suitable for handling hard to write complex computations.

Count the longest continuously rising dates for a certain stock.
TDATE,CLOSING,OPENING,HIGHEST,LOWEST,VOLUME
1/2/2020,3085.2,3066.34,3098.1,3066.34,29.25B
1/3/2020,3083.79,3089.02,3093.82,3074.52,26.15B
1/6/2020,3083.41,3070.91,3107.2,3065.31,31.26B
1/7/2020,3104.8,3085.49,3105.45,3084.33,27.66B
1/8/2020,3066.89,3094.24,3094.24,3059.13,29.79B
AB
1=T("share_index.csv").sort(TDATE)/Read file and sort it
2=A1.group@i(CLOSING < CLOSING[-1])/Put neighboring records where prices rise to same group
3=A2.max(~.count())/Get the max value
CLOSING[-1] represents the previous closing price; CLOSING < CLOSING[-1] puts neighboring records with rising prices to same group when price is down (each record with downed price is put to a separate group).

There is an xls file storing attendance data, where 7 rows per person per day form one record. We want to perform a row-to-column transposition to generate a new file where each record is made up of two rows per person per day.

ABCDE
1Per_Codein_outDateTimeType
2111026312013-10-1109:17:14In
3111026362013-10-1109:17:14Break
4111026352013-10-1111:38:21Return
5111026302013-10-1111:43:21NULL
6111026362013-10-1113:21:30Break
7111026352013-10-1114:25:58Return
8111026322013-10-1118:28:55Out
ABCDEF
1Per_CodeDateInOutBreakReturn
2111026310/11/201309:17:1418:28:5511:37:0011:38:21
3111026310/11/201309:17:1418:28:5513:21:3014:25:58
The task involves a large amount of order-based computation and computations on grouped subsets.

AB
1=T("attendance.xlsx").sort(Per_Code,Date,Time)/Read data and sort it according to Per_Code, Date and Time
2=A1.group@o((#-1)\7)/Put every 7 records to same group according to Per_Code
3=A2.(~([1,7,2,3,1,7,5,6]))/Get ordered data of the current day from each group
4=A3.conj(
[~.Per_Code,~.Date]
|~.(Time).m([1,2,3,4])
|[~.Per_Code,~.Date]
|~.(Time).m([5,6,7,8]))
/Arrange data in a group into a sequence and concatenate groups
5=create(Per_Code,Date,In,Out,Break,Return)/Create an empty result set
6>A5.record(A4)/Insert data to result set
7=file("result.xlsx").xlsexport@t(A5)/Export result set to a new xls
  1. group@o means that sorting is not necessary before grouping;
  2. The m() function accesses members (can be records) of a set according to ordinal numbers or an array.

More examples

  • Calculate average salary of employees in each of certain states, and get average of employees in all the other states;
  • Calculate growth rate of closing price in each of the last 10 transaction dates in 2019 for a certain stock, compared with the previous date;
  • Find the Olympic games where China is before Russia in terms of medal total;
  • Calculate earnings yield of each date within a certain time period for two stock indexes;
  • For a certain stock, calculate growth rate of the day having the highest closing price, compared with the previous day, within a certain time period;
  • For a certain stock, calculate growth rate of transaction amount compared with the previous day for each of transaction dates when growth rate of closing price is above 3% within a certain time period;
  • According to several salary intervals, count employees and calculate average salary in each interval;
  • Find courses not included in the Select_Course table (those no one selects) according to the order of the table;
  • Get payable amount of each month in 2014 for each user according to user payment details table.

04Parse text files of complex formats

Parsing a text file of complex format involves multiple steps and it is more convenient to handle it in IDE.

A txt file contains multiple e-mails. Each mail consists of 4-field records, where the number of rows under Content field for each record varies. We want to query records according to Receiver interval and date interval.
Sender:
Melody<Melody@bus.emory.edu>
Receiver:
Susan<Susan@google.com>
Date:
1/14/2020
Content:
Do you yahoo!?
SBC Yahoo!DSL - Dow only @29.95per month!

Sender:
Tom<Tom@163.com>
Receiver:
rose<rose@163.com>
Date:
2/24/2020
Content:
IMPORTANT NOTICE:
The information in this email(and any attachments) is confidential.
If you are not the intended recipient, you must not use or disseminate the
… 
The query is not difficult but parsing records with different number of rows is hard.

AB
1=file("mail.txt").import@si()/Read file as a set of rows
2=A1.select(~!="")/Remove empty rows
3=A2.group@i(~=="Sender:")/Group the set according to marker “Sender:”; each group is a record made up of an indefinite number of rows
4=A3.new(~(2):Sender,~(4):Receiver,date(~(6)):Date ,~.to(8,).concat():Content)/Organize the record having an indefinite number of rows into a standard one
5=A4.select(like@c(Receiver,"*"+arg_name+*") && Date>arg_begin && Date<=arg_end )/Query records
  1. group@i closes the current group and create a new one when the specified condition is true.
  2. ~ represents the current member of a set; In A3 it is the current row, and in A4 it is the current group, which corresponds to a record containing an indefinite number of rows.
  3. In A4, ~(2) gets the 2nd row of the current group; ~.to(8,) gets rows from the 8th to the end of the current group.

More examples

  • Parse records made up of an indefinite number of rows into standard text according to a regular expression;
  • Parse text of cross table format into standard text (inverse transposition);
  • Match fields of a csv file according to double quotation marks rather than splitting them simply with the comma;
  • A certain field of a csv file contains subfields represented as sectionalized string;
  • The subfield of a certain field in a csv file is sectionalized string, where first section does not contain value when it is Name;
  • When Type field is 1 Version field does not have any subfields, otherwise Version field has subfields;
  • Traverse all files in a certain directory, and find and replace the specified string;
  • Remove duplicate paragraphs in a txt file while maintaining order of the original paragraphs;
  • There are whitespaces at both end of string field values in a txt file;
  • It is a standard two-dimensional table when values of a certain field are split according to separator into N rows.

05Parse/generate xls files of complex formats

Certain xls files have uniquely complex formats, and they become convenient to handle in IDE.

Cells in an xls file contain data in the form of key=value, and there are empty cells, too. We want to enter key and query all corresponding values.
ABCDE
1 No=123ver=456Cos=789
2No=678ver=783No=900U=89
3No=330Y=67ver=890Cos=311F=19
Each cell corresponds to one record and we want to arrange the file into standard format.

AB
1=file("keyvalue.xlsx").xlsimport@w()/Load data as the form that cells in a row compose a small set and multiple small sets compose a larger set
2=A1.conj()/Concatenate the double-layer set into a single-layer set
3=A2.select(~)/Remove empty cells
4=A3.(~.split("="))/Split each cell value into two parts according to =
5=A4.new(~(1):key,~(2):value)/Standardize records through naming part 1 key field and part 2 value field
6=A5.select(key==arg_key)/Perform query
xlsimport@w parses an xls file into a set of sets.

It is hard to generate an xls file with complex format, and IDE makes the process easy and convenient.

An xls template contains blue table header and white empty cells, and we need to fill white cells with data. The completed table is as follows:
Traditional ways (like POI) involve a huge block of code when filling data to cells of irregular format.

ABCDEF
1Mengniu Funds2017358.2364300
28.550200100400200
3182.676.343.728.516.4
41201.07300.27900.8
515464
6=file("e:/result.xlsx")=A6.xlsopen()
7=C6.xlscell("B2",1;A1)=C6.xlscell("J2",1;B1)=C6.xlscell("L2",1;C1)
8=C6.xlscell("B3",1;D1)=C6.xlscell("G3",1;E1)=C6.xlscell("K3",1;F1)
9=C6.xlscell("B6",1;[A2:F2].concat("\t"))=C6.xlscell("H6",1;[A3:E3].concat("\t"))
10=C6.xlscell("B9",1;[A4:F4].concat("\t"))=C6.xlscell("B11",1;[A5:C5].concat("\t"))
11=A6.xlswrite(B6)
  1. A1-F5 contains data to be filled; A7-F10 is filling process.
  2. Row 6, 9 and 11 contain continuous cells, and SPL can fill data to them at one time through simple code.

More examples

  • Certain rows do not have the 2nd column, and other columns in these rows should shift right one column;
  • Skip certain invalid columns in an xls file;
  • Read a free-style xls file;
  • Read an xls file of cross table format;
  • Read an xls file of primary-sub table format;
  • Generate a summary table of comparisons of economic indicators where there are big gaps between China and the US according the xls file containing economic indicators of countries worldwide;
  • Arrange multiple card-style table in a sheet into a standard two-dimensional table;
  • Merge multiple card-style sheets into a standard two-dimensional table, where each row corresponds one card;
  • It is a standard two-dimensional table when values of a certain column are split according to separator into N rows;
  • Split a file into multiple sheets or files according to a fixed number of rows;
  • Combine a specified column with the other columns and write combinations to sheets or files respectively;
  • Split a row-wise file into multiple card-style sheets, where each card corresponds to one row.

06Diverse data sources

The SPL IDE supports a variety of data sources. Besides csv\txt\xls\xlst, it also supports dozens of other sources, including database \json\xml\http\SalesForce to name a few. This enables cross-source computations.

Retrieve data from http, join it with a csv and perform grouping & aggregation.

AB
1=Orders=json(httpfile("http://127.0.0.1:6868/api/orders").read())/Retrieve data from http
2=Employees=T("d:/Emp.csv")/Retrieve data from csv
3=join@1(Orders:o,SellerId;Employees:e,EId)/Association
4=A3.group(e.Detp,o:dept.Client:client;sum(Amount):amt,count(1):cnt)/Group & summarize