Python: apply if else to multiple variables within a grouping to create a new column

Time:08-24

I have some data that looks like this:

import pandas as pd

d = {'id': ["A","A","A","A","A","A","B","B","B","B","B","B"],
     'month': [1,1,1,1,2,2,1,1,2,2,2,2],
     'week': [1,2,3,4,1,2,1,2,1,2,3,4]}
example_df = pd.DataFrame(data=d)

I want to group by id and create a new column based on the contents of month and week, but I get this error: KeyError: 'month'. Here is my attempt:

example_df['final_score'] = (
example_df.groupby(['id'])
.transform(lambda x: 'converted' if ((x['month'] == 1) &
                                     (x['week'].isin([3,4].any())))
           else 'not_converted')
                                 )

Does anyone know what's going on here?

CodePudding user response:

groupby.transform handles only one column at a time, so the lambda receives a single column as a Series and x['month'] raises the KeyError.
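To see what transform passes to the function, here is a minimal sketch that selects a single column before transforming. It rebuilds the question's data under the name df, and has_week_3_or_4 and received are names chosen for this example only. The function gets the week values of one id as a Series, and the single value it returns is broadcast back to every row of that id.

```python
import pandas as pd

# Rebuild the example data from the question (df is a name chosen for this sketch).
df = pd.DataFrame({
    'id': ["A"] * 6 + ["B"] * 6,
    'month': [1, 1, 1, 1, 2, 2, 1, 1, 2, 2, 2, 2],
    'week': [1, 2, 3, 4, 1, 2, 1, 2, 1, 2, 3, 4],
})

received = []

def has_week_3_or_4(s):
    # Record the type of object transform hands over: one column of one group.
    received.append(type(s).__name__)
    # A scalar result is broadcast to all rows of the group.
    return s.isin([3, 4]).any()

mask = df.groupby('id')['week'].transform(has_week_3_or_4)

print(set(received))
print(mask.tolist())
```

Both ids contain a week 3 or 4, so every entry of mask is True here. The function never sees the month column, which is why indexing x['month'] inside it cannot work.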

Use groupby.transform('any') to build a mask to use with numpy.where:

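import numpy as np

# True for rows where month is 1
m1 = example_df['month'].eq(1)
# True for every row of an id that has week 3 or 4 in any of its rows
m2 = example_df['week'].isin([3,4]).groupby(example_df['id']).transform('any')

example_df['final_score'] = np.where(m1 & m2, 'converted', 'not_converted')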

output:

   id  month  week    final_score
0   A      1     1      converted
1   A      1     2      converted
2   A      1     3      converted
3   A      1     4      converted
4   A      2     1  not_converted
5   A      2     2  not_converted
6   B      1     1      converted
7   B      1     2      converted
8   B      2     1  not_converted
9   B      2     2  not_converted
10  B      2     3  not_converted
11  B      2     4  not_converted
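
If the intent is instead that both conditions hold in the same row somewhere within the id (an assumption about what the question wants, not something it states), the two tests can be combined before grouping. A minimal sketch, with df, same_row and group_hit as names chosen for this example:

```python
import numpy as np
import pandas as pd

# Rebuild the example data from the question (df is a name chosen for this sketch).
df = pd.DataFrame({
    'id': ["A"] * 6 + ["B"] * 6,
    'month': [1, 1, 1, 1, 2, 2, 1, 1, 2, 2, 2, 2],
    'week': [1, 2, 3, 4, 1, 2, 1, 2, 1, 2, 3, 4],
})

# True where a single row has month 1 and week 3 or 4.
same_row = df['month'].eq(1) & df['week'].isin([3, 4])
# Broadcast to every row of an id that has at least one such row.
group_hit = same_row.groupby(df['id']).transform('any')

df['final_score'] = np.where(group_hit, 'converted', 'not_converted')
print(df)
```

With the example data, every row of A is labelled converted, because A has weeks 3 and 4 in month 1. Every row of B is labelled not_converted, because B's weeks 3 and 4 fall in month 2.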

CodePudding user response:

What result do you want? Maybe it is one you can get with this line of code?

import numpy as np
# each comparison gets its own parentheses, because & binds more tightly than ==
example_df['final_score'] = np.where((example_df['month'] == 1) & example_df['week'].isin([3, 4]), "converted", "not_converted")
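
A note on combining conditions this way: in Python, & binds more tightly than ==, so each comparison needs its own parentheses before the results are combined. A small sketch with plain integers (the values are arbitrary, chosen only for illustration):

```python
month = 2
flag = 1

# Without parentheses this parses as month == (2 & flag), i.e. 2 == 0.
unparenthesized = month == 2 & flag
# With parentheses the comparison is evaluated first, then combined with flag.
parenthesized = (month == 2) & flag

print(unparenthesized)
print(parenthesized)
```

The first expression is False even though month equals 2, while the second is truthy. The same parsing rule applies when the operands are pandas Series.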

CodePudding user response:

I don't see why you need a groupby and transform for this, given that your condition does not depend on the grouping.

A simple apply like this should work:

example_df['final_score'] = example_df.apply(
    lambda x: 'converted' if (x['month'] == 1 and x['week'] in [3, 4])
    else 'not_converted',
    axis=1)

Output

   id  month  week    final_score
0   A      1     1  not_converted
1   A      1     2  not_converted
2   A      1     3      converted
3   A      1     4      converted
4   A      2     1  not_converted
5   A      2     2  not_converted
6   B      1     1  not_converted
7   B      1     2  not_converted
8   B      2     1  not_converted
9   B      2     2  not_converted
10  B      2     3  not_converted
11  B      2     4  not_converted