﻿

# Implement Basic Functions of SQL: R Language vs. esProc

As we know, SQL users have to finish composing all codes and then run them all at one time, resulting in a poor ability to interact. However, the simple and easy-to-understand query syntax of SQL is always welcomed by programmers. R and esProc as the computation and analysis tool are surely required to offer the similar query syntax. In this essay, we will discuss how they implement the basic functions of SQL through R and esProc through some examples.

The example data is from 2 tables of the classical Northwind database:
Orders table with the main fields: OrderID, EmployeeID, OrderDate, Freight, CustomerID
Customer table with the main fields: CustomerID, CompanyName

Retrieve data of the entire table

SQL solution: select * from Orders
R solution: A1 esProc solution: \$select * from Orders
Comments: The word count of esProc and SQL codes only differs by one word. R has a strong flavor of programming.

Where: Search for the order which has a freightage higher than 100 and is placed before the year of 1998

SQL solution: SELECT * FROM Orders WHERE Freight > 100 AND OrderDate < ’1998-01-01′
R solution: subset(A1,Freight > 100 & OrderDate < as.POSIXlt(’1998-01-01′))
esProc solution: =A1.select(Freight > 100 && OrderDate < date(“1998-01-01″))
Comments: Both SQL and R solutions are close to SQL to some extent. R is of the typical function style, and esProc is of the typical object style. The programmers prefer the former, while the business experts prefer the latter, esProc, for esProc is easier for them to understand.

Order: Sort by employees correctly, and then sort by freightage in reverse order

SQL solution: SELECT * FROM Orders ORDER BY EmployeeID ,Freight DESC
R solution: A1[order(A1\$ EmployeeID,-A1\$Freight),]
esProc solution: =A1.sort(EmployeeID,Freight:-1)
The R solution is to retrieve 2 vectors from A1 at first, pass them to the function order to group them together for sorting, and then export the serial numbers. At last, data will be rearranged according to the serial number. The computation process of R is rather winding, not as straightforward as SQL. It is because that R is good at handling vector, the access to structured data will take column as the basic element, and the parameter usually takes up a whole column. By comparison, SQL takes the record (column) as the basic unit, with parameter as the column name.
esProc solution resembles that of SQL because esProc takes the record (row) as the basic element .

Group & Sum: Summarize by employee, sum up the freightage, and count the orders:

SQL solution: SELECT EmployeeID, COUNT(OrderID), SUM(Freight) FROM Orders GROUP BY EmployeeID
esProc solution: =A1.group(EmployeeID; EmployeeID, ~.count(), ~.sum(Freight))
R solution:
A4<-aggregate(A1\$Freight,list(A1\$EmployeeID),sum)
A4\$count<-tapply(A1\$Freight,A1\$EmployeeID,length)
Comments: In this case, it is obvious that R and SQL differ greatly. The algorithms available in R may be clearer for mathematicians, and means more learning efforts for users accustomed to SQL.

Join: Perform left join on Orders table and Customers table by CustomerID.

SQL Solution:
Select * from Orders left join Customers on Orders.CustomerID =Customers.CustomerID
esProc solution:
=join@1(A1:CustomerID:Orders, B1:CustomerID:Customers)
R solution:
merge(A1,B1,by.x=”CustomerID”,by.y=”CustomerID”,all.x=TRUE)
Comments: The join of SQL equals to join of esProc or the merge of R. Similarly, the left join of SQL equals to join@1 of esProc, or merge(…all.x=TRUE) of R. Obviously, esProc is more alike SQL in the respects of both the syntax conventions and the literal meanings.

Distinct: Remove the duplicate CustomerID

SQL solution: select distinct CustomerID from Orders
R solution: unique(B2\$CustomerID)
esProc solution: =B2.id(CustomerID)
Comments: The keywords of the two solutions respectively differ to that of SQL. However, their usages are basically the same to that of SQL. In which, R is the typical function style, and esProc is the typical object style.

Like: Search for the record with Island in ShipName

SQL solution: select * from Orders where ShipName like ‘%Island%’
R solution: subset(A1,grepl(“Island”,ShipName,ignore.case = TRUE))
esProc solution: =A1.select(like@c(ShipName ,”*Island*”))
Comments: R supports several means to match, including the regular expressions, and is more powerful than esProc in this respect. The usages of esProc are more close to that of SQL, and fit for those who are familiar with SQL.

As can be seen from the above comparison, esProc has a coding style more close to that of SQL since esProc supports the data type of “Record” by the infrastructure, making it more suitable for users that are familiar with SQL. In addition, compared with R, esProc provides the representation style of “object + functions” that is much easier for business experts to accept.

R is more resourceful in details, ideal for the programmers and mathematicians. In addition, supporting the regular expressions and other functions makes R more open as a preferred analysis tool for programmers.