So I need to count the number of occurrences of a value per year, per animal. I've managed to do it but it's outputting a single column kind of dataframe rather than the data being in workable cells.
I've used:
df.groupby(["Animal", "year"])["value"].count()
and:
df.groupby(["Animal", "year", "value"]).size().unstack(fill_value=0)
and I tried a pivot but it gave an error, in the error message it said too much data but the main error was "Index contains duplicate entries". I've about 13000 rows of data.
I have no idea what to do with the response. I can't call the columns like you would with a Dataframe, I don't know how to. I want to be able to create proportions, like value A is 10% and value B is 90% for animal 1 for year 2020, in a Dataframe.
I tried to do df.to_frame()
after the count()
created a series, one but it just created a one column DataFrame.
The data is like:
Animal Year Value
1 2020 A
1 2020 A
1 2019 B
1 2019 B
2 2020 A
And I need it to be:
Animal Year A B
1 2020 2 0
1 2019 0 2
2 2020 1 0
But in a full proper Dataframe of 4 columns, not squished into one column. Then I can create the % proportions from there.
Thanks in advance!!
CodePudding user response:
Try:
x = df.pivot_table(
index=["Animal", "Year"], columns="Value", aggfunc="size", fill_value=0
).reset_index()
x.columns.name = None
print(x)
Prints:
Animal Year A B
0 1 2019 0 2
1 1 2020 2 0
2 2 2020 1 0