Home > other >  python dataframe groupby transform with condition
python dataframe groupby transform with condition

Time:12-17

I have a dataframe like this:

ColA ColB ColC ColD
A Type1 10 Red
A Type1 20 Blue
A Type2 10 Blue
A Type2 10 Red
A Type2 30 Blue
B Type1 70 Blue
B Type1 10 Red
B Type2 19 Yellow
B Type2 50 Blue
B Type2 12 Yellow
B Type2 15 Blue

I would like to use groupby (ColA and ColB) and transform with condition based on ColD="Blue" to calculate avg ColC into ColE, and I've already made sure each ColA and Colb combination must contains one "Blue" in ColD.

What I would like to get:

ColA ColB ColC ColD ColE
A Type1 10 Red 20
A Type1 20 Blue 20
A Type2 10 Blue 20
A Type2 10 Red 20
A Type2 30 Blue 20
B Type1 70 Blue 70
B Type1 10 Red 70
B Type2 19 Yellow 37.5
B Type2 50 Blue 37.5
B Type2 12 Yellow 37.5
B Type2 15 Blue 37.5

So far, I am successful on groupby transform, but adding condition on ColD throw it off, any ideas?

I've tried without condition on df.groupby(['ColA', 'ColB'])['ColC'].transform('mean') but don't know how to add condition ColD= "Blue" into it.

CodePudding user response:

Mask the values in ColC where ColD is not Blue then do groupby and transform on masked column

m = df['ColD'] != 'Blue'
df['ColE'] = df['ColC'].mask(m).groupby([df['ColA'], df['ColB']]).transform('mean')

   ColA   ColB  ColC    ColD  ColE
0     A  Type1    10     Red  20.0
1     A  Type1    20    Blue  20.0
2     A  Type2    10    Blue  20.0
3     A  Type2    10     Red  20.0
4     A  Type2    30    Blue  20.0
5     B  Type1    70    Blue  70.0
6     B  Type1    10     Red  70.0
7     B  Type2    19  Yellow  32.5
8     B  Type2    50    Blue  32.5
9     B  Type2    12  Yellow  32.5
10    B  Type2    15    Blue  32.5

CodePudding user response:

Let's filter the dataframe by ColD column then do a groupby.transform to get the mean of Blue then do another group.transform to fill the other color.

df['ColE'] = (df[df['ColD'].eq('Blue')]
              .groupby(['ColA', 'ColB'])['ColC'].transform('mean')
              .reindex(df.index)
              .groupby([df['ColA'], df['ColB']])
              .transform(lambda col: col.ffill().bfill()))
print(df)

   ColA   ColB  ColC    ColD  ColE
0     A  Type1    10     Red  20.0
1     A  Type1    20    Blue  20.0
2     A  Type2    10    Blue  20.0
3     A  Type2    10     Red  20.0
4     A  Type2    30    Blue  20.0
5     B  Type1    70    Blue  70.0
6     B  Type1    10     Red  70.0
7     B  Type2    19  Yellow  32.5
8     B  Type2    50    Blue  32.5
9     B  Type2    12  Yellow  32.5
10    B  Type2    15    Blue  32.5

CodePudding user response:

You were almost there! :)

df = pd.DataFrame(dict(cola=[1,1,3,4], colb=[1,1,3,7], colc=[5,1,6,9], cold=['b','b','y','b']))

df['cole'] = df.groupby(['cola', 'colb']).colc.transform('mean').where(df.cold == 'b')

|    |   cola |   colb |   colc | cold   |   cole |
|---:|-------:|-------:|-------:|:-------|-------:|
|  0 |      1 |      1 |      5 | b      |      3 |
|  1 |      1 |      1 |      1 | b      |      3 |
|  2 |      3 |      3 |      6 | y      |    nan |
|  3 |      4 |      7 |      9 | b      |      9 |
  • Related