I'm trying to group a DataFrame by two columns and count the difference occurence in the third column. What I've got so far is:
import pandas as pd
df = pd.DataFrame({'colA': ['name1', 'name2', 'name2', 'name4', 'name2', 'name5'], 'colB': ['red', 'yellow', 'yellow', 'black', 'yellow', 'green'], 'colC': ['value1', 'vlaue2', 'value2', 'value4', 'value5', 'value6']})
df_grouped = df.groupby(["colA", "colB"])["colC"].count()
print(df_grouped)
Where the output is:
colA colB
name1 red 1
name2 yellow 3
name4 black 1
name5 green 1
What I'm trying to do is, see colA and colB as 1 colum and count (and show) the occurance of the different values in colC. So in this case it shoud be:
colA colB colC count
0 name1 red value1 1
1 name2 yellow vlaue2 2
2 value5 1
3 name4 black value4 1
4 name5 green value6 1
Thanks in advance.
Edit: What I also tried is this:
df["colAB"] = df["colA"].map(str) " " df["colB"]
df_grouped = df.groupby(["colAB", "colC"])["colC"].count().reset_index(name="count")
But it also didn't bring the right result. To be more specific. What I need is every value (colC) and the number how often it occurs for the combination of colA and colB. Even if there are 2 columns for colA and ColB the combination of the two is one distinct value.
CodePudding user response:
If I understand correctly you just need to group by 3 col A,B and C:
df_grouped = df.groupby(["colA", "colB","colC"])["colC"].count().reset_index(name="count")
Output :
>
colA colB colC count
0 name1 red value1 1
1 name2 yellow value2 2
2 name2 yellow value5 1
3 name4 black value4 1
4 name5 green value6 1
df_grouped = df.groupby(["colA", "colB","colC"])["colC"].count()
>
colA colB colC
name1 red value1 1
name2 yellow value2 2
value5 1
name4 black value4 1
name5 green value6 1