Basically, I have the columns date and intensity, which I have grouped by date this way:
intensity = dataframe_scraped.groupby(["date","intensity"]).count()['sentiment']
which yielded the following results:
date     intensity
2021-01  negative           33
         neutral            72
         positive           44
         strong_negative    24
         strong_positive    22
                            ..
2022-05  positive           13
         strong_negative    20
         strong_positive    16
         weak_negative      12
         weak_positive      18
I want to calculate the percentages of these numerical values by date in order to bar-plot it later. Any ideas on how to achieve this?
I've tried something naïve along the lines of:
100 * dataframe_scraped.groupby(["date","intensity"]).count()['sentiment'] / dataframe_scraped.groupby(["date","intensity"]).count()['sentiment'].transform('sum')
CodePudding user response:
I think this should work:
df.value_counts(subset=["date", "intensity"]) / df.value_counts(subset=["date"])
This divides the count of each (date, intensity) pair by the total count for that date (so negative's 33 divided by the sum of all 2021-01 counts, for example). Multiply the result by 100 if you want percentages rather than fractions.
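If you would rather start from the grouped counts in the question, here is a minimal sketch of a different route (not the one-liner above): group the counts again on the date level and use transform('sum') to get per-date totals. The sample rows and the names df, counts, totals and percentages are made up for illustration; the real data would come from dataframe_scraped.

```python
import pandas as pd

# Hypothetical sample rows standing in for dataframe_scraped
df = pd.DataFrame({
    "date": ["2021-01", "2021-01", "2021-01", "2021-01", "2022-05", "2022-05"],
    "intensity": ["negative", "negative", "neutral", "positive",
                  "positive", "weak_positive"],
    "sentiment": [-0.4, -0.6, 0.0, 0.5, 0.7, 0.2],
})

# Same grouping as in the question: one count per (date, intensity) pair
counts = df.groupby(["date", "intensity"])["sentiment"].count()

# Total count per date, broadcast back onto every (date, intensity) row
totals = counts.groupby(level="date").transform("sum")

# Share of each intensity within its date, as a percentage
percentages = 100 * counts / totals
print(percentages)
```

The attempt in the question called transform('sum') on the counts directly, without first grouping on the date level. For the bar plot, percentages.unstack("intensity") should give one row per date and one column per intensity, which DataFrame.plot.bar can draw.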
The other interpretation of your question is that you want each count as a proportion of all rows in the whole dataframe, in which case you could use this:
df.value_counts(subset=["date", "intensity"], normalize=True)
which returns each (date, intensity) count divided by the total number of rows, so the values sum to 1 across the whole dataframe rather than within each date.
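To see how this second interpretation differs from the per-date one, here is a small sketch on made-up rows (the data and the name overall are illustrative only, and the column names are assumed to be date and intensity as in the question):

```python
import pandas as pd

# Hypothetical sample rows; only the two grouping columns are needed here
df = pd.DataFrame({
    "date": ["2021-01", "2021-01", "2021-01", "2021-01", "2022-05", "2022-05"],
    "intensity": ["negative", "negative", "neutral", "positive",
                  "positive", "weak_positive"],
})

# Fraction of ALL rows that fall in each (date, intensity) pair
overall = df.value_counts(subset=["date", "intensity"], normalize=True)
print(overall)

# The fractions add up to 1 over the whole dataframe, not per date
print(overall.sum())
```

With these rows, (2021-01, negative) covers 2 of the 6 rows overall, whereas within 2021-01 alone it would be 2 of 4.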