Imagine following dataframe is given.
import pandas as pd
products = ['Apple', 'Apple', 'Carrot', 'Eggplant', 'Eggplant']
customer_demand_date = ['2023-01-01', '2023-01-07', '2023-01-01', '2023-01-01', '2023-01-07', '2023-01-14']
col_02_2023 = [0, 20, 0, 0, 0, 10]
col_03_2023 = [20, 30, 10, 0, 10, 0]
col_04_2023 = [10, 40, 50, 30, 40, 10]
col_05_2023 = [40, 40, 60, 50, 60, 20]
data = {'Products': products,
'customer_demand_date': customer_demand_date,
'02_2023': col_02_2023,
'03_2023': col_03_2023,
'04_2023': col_04_2023,
'05_2023': col_05_2023}
df = pd.DataFrame(data)
print(df)
Products customer_demand_date 02_2023 03_2023 04_2023 05_2023
0 Apple 2023-01-01 0 20 10 40
1 Apple 2023-01-07 20 30 40 40
2 Carrot 2023-01-01 0 10 50 60
3 Egg 2023-01-01 0 0 30 50
4 Egg 2023-01-07 0 10 40 60
5 Egg 2023-01-14 0 0 10 20
I have columns products, custome_demand_date (every week there is new customer demand for products per upcoming months) and months with quantity demand. How can I determine which product has experienced the most frequent changes in customer demand over the months, and sort the products in descending order of frequency of change? I have tried to group by product, accumulate the demand quantity but none of them can analyze the data both horizontally (per customer demand date) and vertically (per months). Desired output:
Sorted products Ranking(or %, or count of changes)
Egg 1 (or 70% or 13)
Apple 2 (or 52% or 8)
Carrot 3 (22% or 3)
Either ranking or % of change frequency or count of changes.
- Note: percentages in desired output are random numbers
I'd really appreciate if you have any clever approach to solve this problem? Thanks
CodePudding user response:
One way is to define a function that counts horizontal and vertical changes which you can apply to each group individually.
import pandas as pd
from io import StringIO
def change_freq(x, months):
# count horizontal changes
chngs_horizontal = x[months].diff(axis=1).fillna(0).astype(bool).sum().sum()
# count vertical changes
chngs_vertical = x[months].diff(axis=0).fillna(0).astype(bool).sum().sum()
return chngs_horizontal chngs_vertical
# sample data
data = StringIO("""
Products,customer_demand_date,02_2023,03_2023,04_2023,05_2023
Apple,2023-01-01,0,20,10,40
Apple,2023-01-07,20,30,40,40
Carrot,2023-01-01,0,10,50,60
Egg,2023-01-01,0,0,30,50
Egg,2023-01-07,0,10,40,60
Egg,2023-01-14,0,0,10,20
""")
df = pd.read_csv(data, sep=",")
# count horizontal and vertical changes by product
result = df.groupby('Products').apply(change_freq, ['02_2023','03_2023','04_2023','05_2023'])
result.sort_values(ascending=False)
This returns
Products
Egg 13
Apple 8
Carrot 3
CodePudding user response:
Proposed script
import pandas as pd
from datetime import datetime
products = ['Apple', 'Apple', 'Carrot', 'Eggplant', 'Eggplant']
customer_demand_date = ['2023-01-01', '2023-01-07', '2023-01-01', '2023-01-01', '2023-01-07']
col_02_2023 = [0, 20, 0, 40, 50]
col_03_2023 = [20, 30, 10, 50, 50]
col_04_2023 = [10, 40, 50, 50, 60]
col_05_2023 = [40, 40, 60, 60, 60]
data = {'Products': products,
'customer_demand_date': customer_demand_date,
'02_2023': col_02_2023,
'03_2023': col_03_2023,
'04_2023': col_04_2023,
'05_2023': col_05_2023}
df = pd.DataFrame(data)
def func(g):
g['diff'] = g['count'].diff()
return g
df['count'] = df.apply(lambda x: x[2:6].sum(), axis=1)
result = (df.groupby(['Products'])
.apply(lambda g: func(g))
)
print(result)
Result
Products customer_demand_date 02_2023 ... 05_2023 count diff
0 Apple 2023-01-01 0 ... 40 70 NaN
1 Apple 2023-01-07 20 ... 40 130 60.0
2 Carrot 2023-01-01 0 ... 60 120 NaN
3 Eggplant 2023-01-01 40 ... 60 200 NaN
4 Eggplant 2023-01-07 50 ... 60 220 20.0
[5 rows x 8 columns]
To sort inside a month
result['month'] = [datetime.strptime(date, '%Y-%m-%d').strftime('%Y-%m') for date in customer_demand_date]
result = (result.groupby(['month'])
.apply(lambda g: g.sort_values('diff', ascending=False))
)
Products customer_demand_date 02_2023 ... count diff month
month ...
2023-01 1 Apple 2023-01-07 20 ... 130 60.0 2023-01
4 Eggplant 2023-01-07 50 ... 220 20.0 2023-01
0 Apple 2023-01-01 0 ... 70 NaN 2023-01
2 Carrot 2023-01-01 0 ... 120 NaN 2023-01
3 Eggplant 2023-01-01 40 ... 200 NaN 2023-01
[5 rows x 9 columns]