How do I calculate the percentage (counted non-numerical values) in Pandas?

Time:05-15

I have a dataframe with the columns date and intensity, which I grouped by both columns this way:

intensity = dataframe_scraped.groupby(["date","intensity"]).count()['sentiment']

This yielded the following counts:

date     intensity      
2021-01  negative           33
         neutral            72
         positive           44
         strong_negative    24
         strong_positive    22
                            ..
2022-05  positive           13
         strong_negative    20
         strong_positive    16
         weak_negative      12
         weak_positive      18

I want to convert these counts into percentages within each date so that I can bar-plot them later. Any ideas on how to achieve this?

I've tried something naïve along the lines of:

100 * dataframe_scraped.groupby(["date","intensity"]).count()['sentiment'] / dataframe_scraped.groupby(["date","intensity"]).count()['sentiment'].transform('sum')

CodePudding user response:

I think this should work:

df.value_counts(subset=["date", "intensity"]) / df.value_counts(subset=["date"])

This takes the count for each (date, intensity) pair and divides it by the total count for that date (for example, the 33 negative rows divided by the sum of all 2021-01 rows). Multiply the result by 100 to get percentages.
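The per-date calculation can also be sketched with groupby and transform, which avoids relying on how pandas aligns two differently shaped indexes during division. This is a minimal sketch, not the answer's own method: the toy dataframe and the names counts, totals, percent and wide are made up for illustration, and the column names are assumed to match the question.

```python
import pandas as pd

# Hypothetical toy data standing in for dataframe_scraped
df = pd.DataFrame({
    "date": ["2021-01"] * 4 + ["2021-02"] * 2,
    "intensity": ["negative", "negative", "neutral", "positive",
                  "neutral", "positive"],
    "sentiment": [-0.5, -0.4, 0.0, 0.6, 0.1, 0.7],
})

# Number of rows for each (date, intensity) pair
counts = df.groupby(["date", "intensity"])["sentiment"].count()

# Total rows per date, repeated on every (date, intensity) row of that date
totals = counts.groupby(level="date").transform("sum")

# Share of each intensity within its own date, as a percentage
percent = 100 * counts / totals

# One row per date, one column per intensity; missing pairs become 0
wide = percent.unstack("intensity", fill_value=0)
```

The naive attempt in the question calls transform('sum') on the ungrouped counts; grouping by the date level first is what makes the sum per date. If matplotlib is installed, wide.plot.bar(stacked=True) should give one bar per date.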

The other interpretation of your question is that you want each count as a proportion of all rows in the whole dataframe, in which case you could use this:

df.value_counts(subset=["date", "intensity"], normalize=True)

which returns each (date, intensity) count as a fraction of the total number of rows, so all the values together sum to 1.
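As a small illustration of this second interpretation, here is a sketch on made-up data. The dataframe and the names share and percent_of_all are hypothetical; the column names are assumed to match the question.

```python
import pandas as pd

# Hypothetical toy data: four rows in total
df = pd.DataFrame({
    "date": ["2021-01", "2021-01", "2021-01", "2021-02"],
    "intensity": ["negative", "negative", "neutral", "positive"],
})

# Fraction of ALL rows that fall into each (date, intensity) pair
share = df.value_counts(subset=["date", "intensity"], normalize=True)

# Same values expressed as percentages of the whole dataframe
percent_of_all = 100 * share
```

Here the two negative rows in 2021-01 make up half of the four rows, so their share is 0.5 regardless of how many rows that date has on its own.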
