I have a dataframe. I want to group the rows on some columns, count the values of another column within each group, take the maximum of those counts, and attach that max as a new column to the dataframe.
I tried:
df["max_pred"] = df.groupby(['fid','prefix','pred_text1'],
sort=False)["pred_text1"].transform("max")
However, that returns the max pred_text1 value in each group rather than a count; what I want is the number of repetitions of the most frequent pred_text1.
For example:
A B C
a d b
a d b
a d b
a d a
a d a
b b c
b b c
b b d
If I group the rows by A and B, count the values of C, take the max count for each group, and store it in a new column F, I expect:
A B F E
a d 3 b
a d 3 b
a d 3 b
a d 3 b
a d 3 b
b b 2 c
b b 2 c
b b 2 c
Also, E shows the most frequent item, i.e. the value of C that the max count belongs to.
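For reference, the example frame above can be reproduced with this minimal setup:
import pandas as pd

df = pd.DataFrame({
    'A': ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b'],
    'B': ['d', 'd', 'd', 'd', 'd', 'b', 'b', 'b'],
    'C': ['b', 'b', 'b', 'a', 'a', 'c', 'c', 'd'],
})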
CodePudding user response:
You can use groupby.transform with value_counts:
# F = size of the largest group of identical C values within each (A, B) group
df['F'] = (df.groupby(['A', 'B'])['C']
             .transform(lambda g: g.value_counts(sort=False).max())
           )
Variant with collections.Counter:
from collections import Counter

# Same idea: Counter(g).values() are the per-value counts, take their max
df['F'] = (df.groupby(['A', 'B'])['C']
             .transform(lambda g: max(Counter(g).values()))
           )
Output:
A B C F
0 a d b 3
1 a d b 3
2 a d b 3
3 a d a 3
4 a d a 3
5 b b c 2
6 b b c 2
7 b b d 2
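If you also want the most frequent value itself (your E column), one possible sketch is a second transform with idxmax (ties go to whichever value value_counts lists first):
# E = the C value that occurs most often within each (A, B) group
df['E'] = (df.groupby(['A', 'B'])['C']
             .transform(lambda g: g.value_counts().idxmax())
           )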
Update: to get both E and F at once, I would use a merge here:
cols = ['A', 'B']

# df.value_counts() counts each (A, B, C) combination in descending order,
# head(1) keeps the most frequent C per (A, B) group,
# and the result is merged back onto df on the common columns A and B
out = df.merge(df.value_counts().groupby(cols).head(1)
                 .reset_index(name='F').rename(columns={'C': 'E'})
               )
Output:
A B C E F
0 a d b b 3
1 a d b b 3
2 a d b b 3
3 a d a b 3
4 a d a b 3
5 b b c c 2
6 b b c c 2
7 b b d c 2
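For clarity, the intermediate frame that gets merged back onto df (called top here just for illustration) should look like this for the example data:
top = (df.value_counts().groupby(cols).head(1)
         .reset_index(name='F').rename(columns={'C': 'E'}))
print(top)
#    A  B  E  F
# 0  a  d  b  3
# 1  b  b  c  2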
CodePudding user response:
Another option is with get_dummies; for large enough data, I'd expect @mozway's solution to scale better:
import pandas as pd

# One indicator column per distinct value of C, summed within each (A, B) group
temp = (pd
        .get_dummies(df, columns=['C'], prefix="", prefix_sep="")
        .groupby(['A', 'B'])
        .transform('sum'))

# F is the highest count in each row, E is the value of C that achieved it
(df
 .drop(columns='C')
 .assign(F=temp.max(axis=1), E=temp.idxmax(axis=1))
)
A B F E
0 a d 3 b
1 a d 3 b
2 a d 3 b
3 a d 3 b
4 a d 3 b
5 b b 2 c
6 b b 2 c
7 b b 2 c
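As a rough illustration of the mechanism, with the example data temp should end up with one column per distinct C value, holding the per-group counts broadcast to every row:
print(temp)
#    a  b  c  d
# 0  2  3  0  0
# 1  2  3  0  0
# 2  2  3  0  0
# 3  2  3  0  0
# 4  2  3  0  0
# 5  0  0  2  1
# 6  0  0  2  1
# 7  0  0  2  1
temp.max(axis=1) then gives the count (F), and temp.idxmax(axis=1) gives the winning column name, i.e. the most frequent C value (E).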