Home > database >  Comparing values of same column depending on values in different column from the same pandas datafra
Comparing values of same column depending on values in different column from the same pandas datafra

Time:09-15

Here is my data frame:

month year category size sold
6 2022 shirt M 52
5 2022 shirt M 45
1 2022 shirt S 61
12 2021 shirt S 89
12 2021 pant S 72
7 2022 shirt M 42
8 2022 shirt M 55
8 2022 pants 41 9

What I would like is to roll up previous month to another column:

Like this:

current_month_year previous_month_year category size sold_current sold_previous
6-2022 5-2022 shirt M 52 45
1-2022 12-2021 shirt S 61 89
12-2021 pant S 72 0
8-2022 7-2022 shirt M 55 42
8-2022 pant 41 9 0

how would I do this?

I have no idea how to do this so don't have any code to show.

CodePudding user response:

Will start by creating the column current_month_year based on the columns month and year

df['current_month_year'] = df['month'].astype(str)   '-'   df['year'].astype(str)

[Out]:
   month  year category size  sold current_month_year
0      6  2022    shirt    M    52             6-2022
1      5  2022    shirt    M    45             5-2022
2      1  2022    shirt    S    61             1-2022
3     12  2021    shirt    S    89            12-2021
4     12  2021     pant    S    72            12-2021

Then, let's now focus on the column previous_month_year. The approach will be similar at first, but then we will add the conditions

df['previous_month_year'] = df['month'].astype(str)   '-'   df['year'].astype(str)

[Out]:
   month  year category size  sold current_month_year previous_month_year
0      6  2022    shirt    M    52             6-2022              6-2022
1      5  2022    shirt    M    45             5-2022              5-2022
2      1  2022    shirt    S    61             1-2022              1-2022
3     12  2021    shirt    S    89            12-2021             12-2021
4     12  2021     pant    S    72            12-2021             12-2021

Then let's add the conditions:

• If month is 1 (January), the previous month is 12 (December) with less 1 year

df.loc[df['month'] == 1, 'previous_month_year'] = '12-'   (df['year'] - 1).astype(str)

[Out]:
   month  year category size  sold current_month_year previous_month_year
0      6  2022    shirt    M    52             6-2022              6-2022
1      5  2022    shirt    M    45             5-2022              5-2022
2      1  2022    shirt    S    61             1-2022             12-2021
3     12  2021    shirt    S    89            12-2021             12-2021
4     12  2021     pant    S    72            12-2021             12-2021

• If month is different than 1 (January), then subtract 1 to the month and the year

df.loc[df['month'] != 1, 'previous_month_year'] = (df['month'] - 1).astype(str)   '-'   df['year'].astype(str)

[Out]:
   month  year category size  sold current_month_year previous_month_year
0      6  2022    shirt    M    52             6-2022              5-2022
1      5  2022    shirt    M    45             5-2022              4-2022
2      1  2022    shirt    S    61             1-2022             12-2021
3     12  2021    shirt    S    89            12-2021             11-2021
4     12  2021     pant    S    72            12-2021             11-2021
5      7  2022    shirt    M    42             7-2022              6-2022
6      8  2022    shirt    M    55             8-2022              7-2022
7      8  2022    pants   41     9             8-2022              7-2022

Finally, as OP wants to have the column sold_previous to show the number of sold items in the previous_month_year for a given category in a given current_month_year, the following will do the work (if there isn't previous_month_year, the value of the cell will stay 0 - as defined before the loop):

df['sold_previous'] = 0
for i in range(len(df)):
    if df.loc[i, 'current_month_year'] == df.loc[i, 'previous_month_year']:
        df.loc[i, 'sold_previous'] = 0
    else:
        df.loc[i, 'sold_previous'] = df.loc[(df['current_month_year'] == df.loc[i, 'previous_month_year']) & (df['category'] == df.loc[i, 'category']), 'sold'].sum()

[Out]:
   month  year category  ... current_month_year  previous_month_year sold_previous
0      6  2022    shirt  ...             6-2022               5-2022            45
1      5  2022    shirt  ...             5-2022               4-2022             0
2      1  2022    shirt  ...             1-2022              12-2021            89
3     12  2021    shirt  ...            12-2021              11-2021             0
4     12  2021     pant  ...            12-2021              11-2021             0
5      7  2022    shirt  ...             7-2022               6-2022            52
6      8  2022    shirt  ...             8-2022               7-2022            42
7      8  2022    pants  ...             8-2022               7-2022             0

If one wants to change columns names, for example sold to sold_current, one can do the following

df['sold_current'] = df['sold'] 

CodePudding user response:

You can create a new DataFrame with the output columns you want and iterate over the original DataFrame to filter the size and category in adjacent months to get the previous sales. The current sales and the rest of the rows are easy to add by simply copying from one DataFrame to the other and doing a small transformation to calculate the previous month.

I would do something like this:

df = pd.DataFrame({'month': [6, 5, 1, 12, 12, 7, 8, 8], 'year': [2022, 2022, 2022, 2021, 2021, 2022, 2022, 2022], 'category': ['shirt', 'shirt', 'shirt', 'shirt', 'pant', 'shirt', 'shirt', 'pants'], 'size': ['M', 'M', 'S', 'S', 'S', 'M', 'M', '41'], 'sold': [52, 45, 61, 89, 72, 42, 55, 9]})

df_new = pd.DataFrame(columns=['current_month_year', 'previous_month_year', 'category', 'size', 'sold_current', 'sold_previous'])

df_new['current_month_year'] = df['month'].astype(str)   '-'   df['year'].astype(str)
for index, row in df.iterrows():
    if row['month'] == 1:
        df_new.loc[index, 'previous_month_year'] = '12-'   str(row['year'] - 1)
    else:
        df_new.loc[index, 'previous_month_year'] = str(row['month'] - 1)   '-'   str(row['year'])

df_new['category'] = df['category']
df_new['size'] = df['size']
df_new['sold_current'] = df['sold']

for index, row in df.iterrows():
    sold_previous = df[(df['month'].astype(str)   '-'   df['year'].astype(str)) == df_new.loc[index, 'previous_month_year']]
    sold_previous = sold_previous[sold_previous['category'] == df_new.loc[index, 'category']]
    sold_previous = sold_previous[sold_previous['size'] == df_new.loc[index, 'size']]
    sold_previous = sold_previous['sold'].values
    if sold_previous.size > 0:
        df_new.loc[index, 'sold_previous'] = sold_previous[0]
    else:
        df_new.loc[index, 'sold_previous'] = 0

This would be the output:

current_month_year  previous_month_year category    size    sold_current    sold_previous
0   6-2022  5-2022  shirt   M   52  45
1   5-2022  4-2022  shirt   M   45  0
2   1-2022  12-2021 shirt   S   61  89
3   12-2021 11-2021 shirt   S   89  0
4   12-2021 11-2021 pant    S   72  0
5   7-2022  6-2022  shirt   M   42  52
6   8-2022  7-2022  shirt   M   55  42
7   8-2022  7-2022  pants   41  9   0
  • Related