Sort the products based on the frequency of changes in customer demand

Imagine the following dataframe is given.

import pandas as pd 
products = ['Apple', 'Apple', 'Carrot', 'Egg', 'Egg', 'Egg']
customer_demand_date = ['2023-01-01', '2023-01-07', '2023-01-01', '2023-01-01', '2023-01-07', '2023-01-14']
col_02_2023 = [0, 20, 0, 0, 0, 0]
col_03_2023 = [20, 30, 10, 0, 10, 0]
col_04_2023 = [10, 40, 50, 30, 40, 10]
col_05_2023 = [40, 40, 60, 50, 60, 20]

data = {'Products': products,
        'customer_demand_date': customer_demand_date,
        '02_2023': col_02_2023,
        '03_2023': col_03_2023,
        '04_2023': col_04_2023,
        '05_2023': col_05_2023}

df = pd.DataFrame(data)

print(df) 

  Products customer_demand_date  02_2023  03_2023  04_2023  05_2023
0    Apple           2023-01-01        0       20       10       40
1    Apple           2023-01-07       20       30       40       40
2   Carrot           2023-01-01        0       10       50       60
3      Egg           2023-01-01        0        0       30       50
4      Egg           2023-01-07        0       10       40       60
5      Egg           2023-01-14        0        0       10       20

I have the columns Products, customer_demand_date (every week there is a new customer demand for each product for the upcoming months), and one column per month holding the demanded quantity. How can I determine which product has experienced the most frequent changes in customer demand over the months, and sort the products in descending order of change frequency? I have tried grouping by product and accumulating the demand quantity, but neither approach analyzes the data both horizontally (across the months of one customer demand date) and vertically (across the customer demand dates of one month). Desired output:

Sorted products      Ranking(or %, or count of changes) 
Egg                  1 (or 70% or 13)   
Apple                2 (or 52% or 8)
Carrot               3 (or 22% or 3)

Either ranking or % of change frequency or count of changes.

Note: the percentages in the desired output are random numbers.

I'd really appreciate any clever approach to solving this problem. Thanks.

CodePudding user response：

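One way is to define a function that counts horizontal changes (between neighbouring months in the same row) and vertical changes (between consecutive demand dates in the same month), and then apply it to each product group individually.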

import pandas as pd
from io import StringIO

def change_freq(x, months):
    # count horizontal changes
    chngs_horizontal = x[months].diff(axis=1).fillna(0).astype(bool).sum().sum()
    # count vertical changes
    chngs_vertical = x[months].diff(axis=0).fillna(0).astype(bool).sum().sum()
    return chngs_horizontal + chngs_vertical

# sample data
data = StringIO("""
Products,customer_demand_date,02_2023,03_2023,04_2023,05_2023
Apple,2023-01-01,0,20,10,40
Apple,2023-01-07,20,30,40,40
Carrot,2023-01-01,0,10,50,60
Egg,2023-01-01,0,0,30,50
Egg,2023-01-07,0,10,40,60
Egg,2023-01-14,0,0,10,20
""")

df = pd.read_csv(data, sep=",")

# count horizontal and vertical changes by product
result = df.groupby('Products').apply(change_freq, ['02_2023','03_2023','04_2023','05_2023'])
result.sort_values(ascending=False)

This returns

Products
Egg       13
Apple      8
Carrot     3
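
The question also mentions a ranking or a percentage. As a rough sketch (not part of the answer above), the same counts can be divided by the number of neighbour comparisons that were possible for each product. The helper `change_stats` and the column names `changes`, `possible`, `pct` and `rank` are made up for illustration, and the rows are assumed to be ordered by customer_demand_date within each product.

```python
import pandas as pd

months = ['02_2023', '03_2023', '04_2023', '05_2023']

# same sample data as in the answer above
df = pd.DataFrame({
    'Products': ['Apple', 'Apple', 'Carrot', 'Egg', 'Egg', 'Egg'],
    'customer_demand_date': ['2023-01-01', '2023-01-07', '2023-01-01',
                             '2023-01-01', '2023-01-07', '2023-01-14'],
    '02_2023': [0, 20, 0, 0, 0, 0],
    '03_2023': [20, 30, 10, 0, 10, 0],
    '04_2023': [10, 40, 50, 30, 40, 10],
    '05_2023': [40, 40, 60, 50, 60, 20],
})

def change_stats(values):
    """Count changed neighbour pairs and the number of pairs compared."""
    horizontal = values[:, 1:] != values[:, :-1]  # month to month, same demand date
    vertical = values[1:, :] != values[:-1, :]    # demand date to demand date, same month
    return {'changes': int(horizontal.sum() + vertical.sum()),
            'possible': horizontal.size + vertical.size}

# assumption: rows are already ordered by customer_demand_date within each product
stats = {product: change_stats(group[months].to_numpy())
         for product, group in df.groupby('Products')}

summary = pd.DataFrame.from_dict(stats, orient='index')
summary['pct'] = 100 * summary['changes'] / summary['possible']
summary['rank'] = summary['changes'].rank(ascending=False, method='min').astype(int)
summary = summary.sort_values('changes', ascending=False)
print(summary)
```

Note that the raw count and the percentage can disagree: Carrot has only one row, so all 3 of its 3 possible comparisons are changes, which puts it first if you sort by `pct` instead of `changes`. Which measure fits depends on whether products with more demand dates should be allowed to accumulate more changes.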

CodePudding user response：

Proposed script

import pandas as pd 
from datetime import datetime

products = ['Apple', 'Apple', 'Carrot', 'Eggplant', 'Eggplant']
customer_demand_date = ['2023-01-01', '2023-01-07', '2023-01-01', '2023-01-01', '2023-01-07']

col_02_2023 = [0, 20, 0, 40, 50]
col_03_2023 = [20, 30, 10, 50, 50]
col_04_2023 = [10, 40, 50, 50, 60]
col_05_2023 = [40, 40, 60, 60, 60]

data = {'Products': products,
        'customer_demand_date': customer_demand_date,
        '02_2023': col_02_2023,
        '03_2023': col_03_2023,
        '04_2023': col_04_2023,
        '05_2023': col_05_2023}

df = pd.DataFrame(data)


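# total demand over the four month columns for each row
df['count'] = df[['02_2023', '03_2023', '04_2023', '05_2023']].sum(axis=1)

# change in total demand versus the previous demand date of the same product
result = df.copy()
result['diff'] = result.groupby('Products')['count'].diff()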

print(result)

Result

   Products customer_demand_date  02_2023  ...  05_2023  count  diff
0     Apple           2023-01-01        0  ...       40     70   NaN
1     Apple           2023-01-07       20  ...       40    130  60.0
2    Carrot           2023-01-01        0  ...       60    120   NaN
3  Eggplant           2023-01-01       40  ...       60    200   NaN
4  Eggplant           2023-01-07       50  ...       60    220  20.0

[5 rows x 8 columns]

To sort inside a month

result['month'] = [datetime.strptime(date, '%Y-%m-%d').strftime('%Y-%m') for date in customer_demand_date]

result = (result.groupby(['month'])
        .apply(lambda g: g.sort_values('diff', ascending=False))
      )

           Products customer_demand_date  02_2023  ...  count  diff    month
month                                              ...                      
2023-01 1     Apple           2023-01-07       20  ...    130  60.0  2023-01
        4  Eggplant           2023-01-07       50  ...    220  20.0  2023-01
        0     Apple           2023-01-01        0  ...     70   NaN  2023-01
        2    Carrot           2023-01-01        0  ...    120   NaN  2023-01
        3  Eggplant           2023-01-01       40  ...    200   NaN  2023-01

[5 rows x 9 columns]
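
The tables above still have one row per demand date. If a single figure per product is wanted, one possible follow-up (an assumption, not something this answer states) is to sum the absolute `diff` values per product. This measures how much the total demand moved between demand dates, which is not the same as how often individual cells changed. The name `ranking` is made up for illustration.

```python
import pandas as pd

month_cols = ['02_2023', '03_2023', '04_2023', '05_2023']

# same data as in the script above
df = pd.DataFrame({
    'Products': ['Apple', 'Apple', 'Carrot', 'Eggplant', 'Eggplant'],
    'customer_demand_date': ['2023-01-01', '2023-01-07', '2023-01-01',
                             '2023-01-01', '2023-01-07'],
    '02_2023': [0, 20, 0, 40, 50],
    '03_2023': [20, 30, 10, 50, 50],
    '04_2023': [10, 40, 50, 50, 60],
    '05_2023': [40, 40, 60, 60, 60],
})

# total demand per row, then its change versus the previous row of the same product
df['count'] = df[month_cols].sum(axis=1)
df['diff'] = df.groupby('Products')['count'].diff()

# sum of absolute changes per product; products with a single row get 0
ranking = (df['diff'].abs()
             .groupby(df['Products'])
             .sum()
             .sort_values(ascending=False))
print(ranking)
```

A product with only one demand date, such as Carrot here, has no previous row to compare against, so it ends up with 0.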