I have a Data Frame like that :
df = pd.DataFrame({"id": [1, 2, 2, 1, 3, 3, 2, 1, 2],
"colors": ['green', 'blue', 'green', 'yellow', 'yellow', 'yellow', 'green', 'red', 'black']
So there are two series with same length but no logical patern between id and colors. Eg id 1 can be blue green, red etc.
That I want to do is to create an other Data Frame with a first row of each unique id and next rows with the count of all possible colors.
In my example, I want the result to be :
id | green | blue | yellow | red | black |
---|---|---|---|---|---|
1 | 1 | 0 | 1 | 1 | 0 |
2 | 2 | 1 | 0 | 0 | 1 |
3 | 0 | 0 | 2 | 0 | 0 |
I try : but the == doesn't work.
print(df.loc[df.id.unique() == df["id"], "colors"].value_counts())
CodePudding user response:
You could use pivot_table
:
df.pivot_table(index="id", columns="colors", aggfunc=len, fill_value=0)
# out:
colors black blue green red yellow
id
1 0 0 1 1 1
2 1 1 2 0 0
3 0 0 0 0 2