I have a dataframe like this:
ColA | ColB | ColC | ColD |
---|---|---|---|
A | Type1 | 10 | Red |
A | Type1 | 20 | Blue |
A | Type2 | 10 | Blue |
A | Type2 | 10 | Red |
A | Type2 | 30 | Blue |
B | Type1 | 70 | Blue |
B | Type1 | 10 | Red |
B | Type2 | 19 | Yellow |
B | Type2 | 50 | Blue |
B | Type2 | 12 | Yellow |
B | Type2 | 15 | Blue |
I would like to use groupby (ColA and ColB) and transform with condition based on ColD="Blue" to calculate avg ColC into ColE, and I've already made sure each ColA and Colb combination must contains one "Blue" in ColD.
What I would like to get:
ColA | ColB | ColC | ColD | ColE |
---|---|---|---|---|
A | Type1 | 10 | Red | 20 |
A | Type1 | 20 | Blue | 20 |
A | Type2 | 10 | Blue | 20 |
A | Type2 | 10 | Red | 20 |
A | Type2 | 30 | Blue | 20 |
B | Type1 | 70 | Blue | 70 |
B | Type1 | 10 | Red | 70 |
B | Type2 | 19 | Yellow | 37.5 |
B | Type2 | 50 | Blue | 37.5 |
B | Type2 | 12 | Yellow | 37.5 |
B | Type2 | 15 | Blue | 37.5 |
So far, I am successful on groupby transform, but adding condition on ColD throw it off, any ideas?
I've tried without condition on df.groupby(['ColA', 'ColB'])['ColC'].transform('mean')
but don't know how to add condition ColD= "Blue" into it.
CodePudding user response:
Mask
the values in ColC
where ColD
is not Blue
then do groupby
and transform
on masked column
m = df['ColD'] != 'Blue'
df['ColE'] = df['ColC'].mask(m).groupby([df['ColA'], df['ColB']]).transform('mean')
ColA ColB ColC ColD ColE
0 A Type1 10 Red 20.0
1 A Type1 20 Blue 20.0
2 A Type2 10 Blue 20.0
3 A Type2 10 Red 20.0
4 A Type2 30 Blue 20.0
5 B Type1 70 Blue 70.0
6 B Type1 10 Red 70.0
7 B Type2 19 Yellow 32.5
8 B Type2 50 Blue 32.5
9 B Type2 12 Yellow 32.5
10 B Type2 15 Blue 32.5
CodePudding user response:
Let's filter the dataframe by ColD
column then do a groupby.transform
to get the mean of Blue
then do another group.transform
to fill the other color.
df['ColE'] = (df[df['ColD'].eq('Blue')]
.groupby(['ColA', 'ColB'])['ColC'].transform('mean')
.reindex(df.index)
.groupby([df['ColA'], df['ColB']])
.transform(lambda col: col.ffill().bfill()))
print(df)
ColA ColB ColC ColD ColE
0 A Type1 10 Red 20.0
1 A Type1 20 Blue 20.0
2 A Type2 10 Blue 20.0
3 A Type2 10 Red 20.0
4 A Type2 30 Blue 20.0
5 B Type1 70 Blue 70.0
6 B Type1 10 Red 70.0
7 B Type2 19 Yellow 32.5
8 B Type2 50 Blue 32.5
9 B Type2 12 Yellow 32.5
10 B Type2 15 Blue 32.5
CodePudding user response:
You were almost there! :)
df = pd.DataFrame(dict(cola=[1,1,3,4], colb=[1,1,3,7], colc=[5,1,6,9], cold=['b','b','y','b']))
df['cole'] = df.groupby(['cola', 'colb']).colc.transform('mean').where(df.cold == 'b')
| | cola | colb | colc | cold | cole |
|---:|-------:|-------:|-------:|:-------|-------:|
| 0 | 1 | 1 | 5 | b | 3 |
| 1 | 1 | 1 | 1 | b | 3 |
| 2 | 3 | 3 | 6 | y | nan |
| 3 | 4 | 7 | 9 | b | 9 |