Calculating proportions by each subgroup (dplyr)

Time:10-26

I have a dataset of Reddit posts. Each row contains the post, its date, the sentiment predicted by an ML model from the post's content, and whether the post is directed at a specific politician.

Here is a data example:

post      date         mood      directed_to_whom
Cartman   2012-09-03.  negative  Romney
Cartman   2012-09-06.  negative  Romney
Cartman   2012-09-13.  negative  Romney
Cartman   2012-09-15.  neutral   Bush
Mackey    2012-09-03.  negative  Bush
Mackey    2012-09-08.  neutral   Bush
Mackey    2012-09-13.  neutral   post
Garrison  2012-09-03.  negative  Romney
Garrison  2012-09-04.  negative  pre
Garrison  2012-09-04.  negative  pre
Garrison  2012-09-05.  negative  Obama

I have created a graph showing the monthly share of negative, neutral, and positive posts over time (code below). However, I would also like to create a variable that measures the number or share of posts that were negative and directed at Obama, or positive and directed at Romney. Is that possible?

ggplot(both_group, aes(x = as.Date(month_year), fill = sentiment, y = sentiment_percentage)) +
    geom_bar(stat = "identity", position = position_dodge()) +
    scale_x_date(date_breaks = "1 month", date_labels = "%b %Y") +
    xlab("Sentiment") +
    theme(plot.title = element_text(size = 18, face = "bold")) +
    scale_y_continuous(name = "Sentiment share") +
    theme_classic() +
    theme(plot.title = element_text(size = 5, face = "bold"),
          axis.text.x = element_text(angle = 90, vjust = 0.5))

Here is the output: (image not included)

CodePudding user response:

What about something like this? P.S. I edited your data to make the plot in the example more interesting.

library(tidyverse)

dat |>
  # extract the month number from the date
  mutate(month = lubridate::ymd(date) |>
           lubridate::month()) |>
  # count posts per month, mood, and target
  count(month, mood, directed_to_whom) |>
  group_by(month, directed_to_whom) |>
  # share of each mood among that month's posts directed at the same person
  mutate(freq = n / sum(n)) |>
  ungroup() |>
  filter((mood == "negative" & directed_to_whom == "Obama") |
           (mood == "positive" & directed_to_whom == "Romney")) |>
  unite(grp, mood, directed_to_whom, sep = " toward ") |>
  ggplot(aes(month, freq, color = grp)) +
  geom_point() +
  geom_line()
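
If you want the count and share stored as variables rather than only plotted, one option is to flag the two combinations first and then summarise by month. This is a minimal sketch on its own toy data, not the data from the question: the names `posts`, `neg_obama`, `pos_romney`, and `monthly` are made up for illustration. Note that the denominator here is all posts in the month, whereas the pipeline above divides by the posts directed at the same person in that month. Pick whichever definition of "share" you need.

```r
library(dplyr)

# Hypothetical toy data using the same column names as the example
posts <- tibble(
  date = as.Date(c("2012-09-03", "2012-09-04", "2012-09-06",
                   "2012-10-04", "2012-10-06", "2012-10-06")),
  mood = c("negative", "negative", "positive",
           "positive", "positive", "neutral"),
  directed_to_whom = c("Obama", "Obama", "Romney",
                       "Obama", "Romney", "Romney")
)

monthly <- posts |>
  mutate(
    month_year = format(date, "%Y-%m"),
    # logical flags for the two combinations of interest
    neg_obama  = mood == "negative" & directed_to_whom == "Obama",
    pos_romney = mood == "positive" & directed_to_whom == "Romney"
  ) |>
  group_by(month_year) |>
  summarise(
    n_posts          = n(),
    n_neg_obama      = sum(neg_obama),   # count of flagged posts
    share_neg_obama  = mean(neg_obama),  # flagged posts / all posts in the month
    n_pos_romney     = sum(pos_romney),
    share_pos_romney = mean(pos_romney),
    .groups = "drop"
  )

monthly
```

Because the flags are logical, `sum()` gives the number of matching posts and `mean()` gives their share, so both measures come out of the same `summarise()` call.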

Example Data:


dat <- read_table("post        date            mood         directed_to_whom
Cartman   2012-09-03.       negative           Romney
Cartman  2012-09-06.        positive           Romney
Cartman  2012-09-13.        negative           Romney 
Cartman    2012-09-15.      neutral           Bush
Mackey   2012-09-03.       negative           Obama
Mackey  2012-09-08.        neutral            Obama
Mackey  2012-09-13.        neutral            Obama
Garrison   2012-09-03.      positive          Romney
Garrison  2012-09-04.       negative          Obama
Garrison  2012-09-04.       negative          Obama
Garrison  2012-10-04.       negative          Obama
Garrison  2012-10-04.       positive          Obama
Garrison  2012-09-04.       positive          Obama
Garrison  2012-09-04.       negative          Obama
Garrison  2012-11-04.       negative          Obama
Cartman  2012-09-06.        positive           Romney
Cartman  2012-10-06.        positive           Romney
Cartman  2012-10-06.        positive           Romney
Cartman  2012-10-06.        neutral           Romney
Cartman  2012-11-06.        negative           Romney
Cartman  2012-12-06.        positive           Romney
Garrison  2012-11-04.       positive          Obama
Garrison  2012-12-05.     negative           Obama")
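
One caveat about the pipeline above: `lubridate::month()` returns only the month number, so posts from the same month in different years would be pooled together. If your real data spans more than one year, a possible workaround is to group by the first day of each month instead. The sketch below assumes the dates carry a trailing period as in the example and strips it before parsing. The names `raw`, `month_start`, and `by_month` are hypothetical.

```r
library(dplyr)
library(lubridate)

# Hypothetical dates in the same "YYYY-MM-DD." format as the example data
raw <- tibble(date = c("2012-09-03.", "2012-09-15.", "2012-10-04.", "2013-09-01."))

by_month <- raw |>
  mutate(
    # drop the trailing period, then parse as a Date
    date = as.Date(sub("\\.$", "", date)),
    # round down to the first day of the month, keeping the year
    month_start = floor_date(date, unit = "month")
  ) |>
  count(month_start)

by_month
```

Since `month_start` is a `Date`, it can go straight onto a `scale_x_date()` axis like the one in the question.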