I'm trying to count the occurance of grouped values and write values in a column using apply and grouby function on a dataframe. I have the following data frame:
df = pd.DataFrame({'colA': ['name1', 'name2', 'name2', 'name4', 'name2', 'name5', 'name5'], 'colB': ['red', 'yellow', 'yellow', 'black', 'yellow', 'green', 'blue']})
I have two statements which gives me the correct result, but I need the result in combination. One is:
df_new = df.groupby("colA").count()
which gives
colA
name1 1
name2 3
name4 1
name5 2
the other one is
df_new = df.groupby("colA")["colB"].apply(lambda lists: ','.join(color)).reset_index(name='Color')
and gives
colA Color
0 name1 red
1 name2 yellow,yellow,yellow
2 name4 black
3 name5 green,blue
what I need is the combination looking like
colA Color Count grouped A
0 name1 red 1
1 name2 yellow,yellow,yellow 3
2 name4 black 1
3 name5 green,blue 2
Trying to combinate in many ways and of course did research but I couldn't get it right.
Thanks in advance.
CodePudding user response:
You can join first to second as new column and use colA
to assign values in correct places.
df_new = df_2.join(df_1, on='colA')
It needed also df_1.rename(columns={'colB': 'Count grouped A'})
import pandas as pd
df = pd.DataFrame({'colA': ['name1', 'name2', 'name2', 'name4', 'name2', 'name5', 'name5'], 'colB': ['red', 'yellow', 'yellow', 'black', 'yellow', 'green', 'blue']})
df_1 = df.groupby("colA").count().rename(columns={'colB': 'Count grouped A'})
df_2 = df.groupby("colA")["colB"].apply(lambda lists: ','.join(lists)).reset_index(name='Color')
df_new = df_2.join(df_1, on='colA')
print(df_new)
EDIT:
The same with small changes
first
groups = df.groupby("colA")
and later two timesgroups...
.apply(','.join)
instead of.apply(lambda lists: ','.join(lists))
import pandas as pd
df = pd.DataFrame({'colA': ['name1', 'name2', 'name2', 'name4', 'name2', 'name5', 'name5'], 'colB': ['red', 'yellow', 'yellow', 'black', 'yellow', 'green', 'blue']})
groups = df.groupby("colA")
df_1 = groups.count().rename(columns={'colB': 'Count grouped A'})
df_2 = groups["colB"].apply(','.join).reset_index(name='Color')
df_new = df_2.join(df_1, on='colA')
print(df_new)
EDIT:
If you would keep Color
as list
then it could be simpler.
You could use .str.len()
to count elements in list
.str
suggests that it has functions for strings but some of them work also with list
(ie. .str[1:4]
) or even with dictionary
(ie. .str[key]
)
import pandas as pd
df = pd.DataFrame({'colA': ['name1', 'name2', 'name2', 'name4', 'name2', 'name5', 'name5'], 'colB': ['red', 'yellow', 'yellow', 'black', 'yellow', 'green', 'blue']})
df_new = df.groupby("colA")["colB"].apply(list).reset_index(name='Color')
df_new['Count grouped A'] = df_new['Color'].str.len()
print(df_new)