Filter according to cumulative value
Task:Find the top n customers whose sales accounts for half of the total sales, and rank them by sales in descending order.
Python
| 1 | import pandas as pd |
| 2 | sale_file = "E:\\txt\\sales_volume.csv" |
| 3 | sale_info = pd.read_csv(sale_file) |
| 4 | sale_info.sort_values(by=‘Amount’,inplace=True,ascending=False) |
| 5 | half_amount = sale_info[‘Amount’].sum()/2 |
| 6 | vip_list = [] |
| 7 | amount = 0 |
| 8 | for client_info in sale_info.itertuples(): |
| 9 | amount += getattr(client_info, ‘Amount’) |
| 10 | if amount < half_amount: |
| 11 | vip_list.append(getattr(client_info, ‘Client’)) |
| 12 | else: |
| 13 | vip_list.append(getattr(client_info, ‘Client’)) |
| 14 | break |
| 15 | print(vip_list) |
Pandas does not have an existing loop function to calculate the position that meets the conditions, so it can only use for loop to complete the calculation.
esProc
| A | ||
| 1 | E:\\txt\\sales_volume.csv | Data storage path |
| 2 | =file(A1).import@tc().sort@z(Amount) | Import data and sort by amount in descending order |
| 3 | =A2.sum(Amount)/2 | Calculate half of total sales |
| 4 | =A2.pselect(cum(Amount)>=A3) | Find the position where the cumulative sales is more than half of total sales |
| 5 | =A2.(Client).m(:A4) | Take the client names before the position |
It is recommended to use loop function to calculate in esProc, and filter and record the position information according to the conditions.
