Stata drop if equivalent for string variable in ggplot (R)

I am trying to produce a graph for a categorical variable with three sub-groups, but I would like to present the results for only two of them. In Stata, this can be done while producing the graph by adding something like the line below, but I am not sure whether there is an R equivalent.

drop if sentiment== "neutral"

Here is a data example:

dput(head(sample_graph, 5))
structure(list(sentiment = structure(c(3L, 2L, 4L, NA, 2L), .Label = c("meg", 
"negative", "neutral", "positive"), class = "factor"), treatment_announcement = c("pre", 
"pre", "pre", "pre", "post"), n = c(78L, 150L, 87L, 1L, 829L), 
    sentiment_percentage = c(0.246835443037975, 0.474683544303797, 
    0.275316455696203, 0.00316455696202532, 0.490822972172883
    ), am = structure(c(2L, 2L, 2L, 2L, 1L), .Label = c("post", 
    "pre"), class = "factor")), class = c("grouped_df", "tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -5L), groups = structure(list(
    treatment_announcement = c("post", "pre"), .rows = structure(list(
        5L, 1:4), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -2L), .drop = TRUE))

I have used the code below before. It works, but it removes every observation in this category from the data frame itself, whereas I only want to drop them for visualization purposes. For instance, after running it, my observations declined from 8000 to 6323.

sample_graph<- sample_graph %>%
   drop_na() %>% 
   filter(sentiment != "neutral")

Therefore, I have attempted dropping the specific subgroup within the ggplot itself, but I am facing an error: "Problem with filter() input ..2. i Input ..2 is aes(x = treatment_announcement, fill = sentiment, y = sentiment_percentage)."

ggplot(sample_graph %>% filter(sentiment != "neutral", aes(x = treatment_announcement, fill = sentiment, y = sentiment_percentage))) +
    geom_bar(stat = "identity", position=position_dodge()) +
    scale_fill_grey() +
    ylab("percentage") +
    theme(text=element_text(size=20)) +
  scale_fill_manual(values = c("positive" = "green", 
                               "negative" = "red")) +
    theme(plot.title = element_text(size = 18, face = "bold")) +
    scale_x_discrete(limits = c("pre", "post")) +
  theme_bw()

Following Allen's advice below, I tried the following:

twitter_posts |> 
  drop_na() |> 
  filter(sentiment != "neutral") |> 
  select(sentiment, treatment_announcement) |>     # we're only interested in sentiment & treatment_announcement
    group_by(sentiment) %>%   # group data and
    add_count(treatment_announcement) |>       # add count of treatment_announcement
    unique() |>           # remove duplicates
    ungroup() |>            # remove grouping
    group_by(treatment_announcement) |>        # group by treatment_announcement 
    mutate(sentiment_percentage = n/sum(n)) |>  # ...calculating percentage
    mutate(sentiment = as.factor(sentiment)) |>        # change to factors so that ggplot treats...
    mutate(am = as.factor(treatment_announcement))  |>   
  ggplot(data = teacher_posts, aes(x = treatment_announcement, fill = sentiment, y = sentiment_percentage)) +
    geom_bar(stat = "identity", position=position_dodge()) +
    scale_fill_grey() +
    xlab("Treatment refers to the implementation of the wage subsidy program targeted at jobless teachers") +
    ylab("percentage") +
    theme(text=element_text(size=20)) +
  scale_fill_manual(values = c("positive" = "green", 
                               "negative" = "red")) +
    theme(plot.title = element_text(size = 18, face = "bold")) +
    scale_x_discrete(limits = c("pre", "post")) +
  theme_bw()

And I am receiving this error "Mapping should be created with aes() or aes_()." although I have the aes mapping for the plot.

CodePudding user response：

You can do some version of this either by piping the filtered data into ggplot or by calling filter inside the data argument:

library(tidyverse)
library(palmerpenguins)

penguins <- penguins 



# Option 1: filter in the pipeline, then pipe the result into ggplot()
penguins |> 
  drop_na() |> 
  filter(species != "Adelie") |> 
  ggplot(aes(x = bill_length_mm, y = body_mass_g)) +
  geom_point()



# Option 2: filter inside the data argument
ggplot(data = filter(penguins, species != "Adelie"), aes(x = bill_length_mm, y = body_mass_g)) +
  geom_point()
#> Warning: Removed 1 rows containing missing values (geom_point).

Created on 2022-07-18 by the reprex package (v2.0.1)

So, taking the code you provided, it would look something like this:

twitter_posts |> 
  drop_na() |> 
  filter(sentiment != "neutral") |> 
  select(sentiment, treatment_announcement) |>   # we're only interested in sentiment & treatment_announcement
  group_by(sentiment) |>                         # group data and
  add_count(treatment_announcement) |>           # add count of treatment_announcement
  unique() |>                                    # remove duplicates
  ungroup() |>                                   # remove grouping
  group_by(treatment_announcement) |>            # group by treatment_announcement
  mutate(sentiment_percentage = n/sum(n)) |>     # ...calculating percentage
  mutate(sentiment = as.factor(sentiment)) |>    # change to factors so that ggplot treats...
  mutate(am = as.factor(treatment_announcement)) |>
  ggplot(aes(x = treatment_announcement, fill = sentiment, y = sentiment_percentage)) +
    geom_bar(stat = "identity", position = position_dodge()) +
    # scale_fill_grey() is dropped: scale_fill_manual() would replace it anyway
    scale_fill_manual(values = c("positive" = "green", 
                                 "negative" = "red")) +
    scale_x_discrete(limits = c("pre", "post")) +
    xlab("Treatment refers to the implementation of the wage subsidy program targeted at jobless teachers") +
    ylab("percentage") +
    theme_bw() +   # complete theme first, so the theme() tweaks below are not overwritten
    theme(text = element_text(size = 20)) +
    theme(plot.title = element_text(size = 18, face = "bold"))

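So you would be doing your data cleaning and then plotting it. Because the pipe already passes the cleaned data frame as the first argument of ggplot(), you do not need to include the data argument. If you also pass data = explicitly, the piped data frame ends up in the mapping position instead, which is what produces the "Mapping should be created with aes()" error.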
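As a minimal sketch of that point, the snippet below uses a small made-up data frame (`toy` and its values are invented for illustration, not taken from your data) to show that filtering on the way into ggplot() leaves the original object untouched, since nothing is assigned back to it. geom_col() is used here as the equivalent of geom_bar(stat = "identity").

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data standing in for the summary table in the question
toy <- data.frame(
  sentiment = c("negative", "neutral", "positive", "negative", "neutral", "positive"),
  treatment_announcement = c("pre", "pre", "pre", "post", "post", "post"),
  sentiment_percentage = c(0.5, 0.2, 0.3, 0.4, 0.3, 0.3)
)

# Filter inside the pipeline; `toy` itself is never reassigned
p <- toy |>
  filter(sentiment != "neutral") |>
  ggplot(aes(x = treatment_announcement, y = sentiment_percentage, fill = sentiment)) +
  geom_col(position = position_dodge()) +
  scale_fill_manual(values = c("positive" = "green", "negative" = "red")) +
  scale_x_discrete(limits = c("pre", "post"))

nrow(toy)  # the original data frame still has all of its rows
```

Printing `p` draws the plot with only the positive and negative bars, while `toy` remains available for any other analysis.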

CodePudding user response：

If I were you, I would just create a new dataframe by filtering your original one with

newdataframe <- originaldataframe %>% 
  filter(variable == "value")   # placeholder column name and value

or something in this style.

From there generating the new graph should be trivial if you already have a working code.
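A minimal sketch of this approach, assuming the column and value from the question (`sentiment`, "neutral"); `toy_posts` is an invented data frame and `plot_data` is a hypothetical name. The filtered copy lives under its own name, so the original keeps every row, and droplevels() removes the now-unused "neutral" factor level from the copy.

```r
library(dplyr)
library(ggplot2)

# Hypothetical raw data, including a neutral row and a missing value
toy_posts <- data.frame(
  sentiment = factor(c("negative", "neutral", "positive", "positive", NA)),
  treatment_announcement = c("pre", "pre", "post", "pre", "post")
)

# Separate copy used only for plotting
plot_data <- toy_posts |>
  filter(!is.na(sentiment), sentiment != "neutral") |>
  droplevels()

p <- ggplot(plot_data, aes(x = treatment_announcement, fill = sentiment)) +
  geom_bar(position = position_dodge())

nrow(toy_posts)              # original row count is unchanged
levels(plot_data$sentiment)  # "neutral" is no longer a level in the copy
```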

Maybe it is not the most polished way to do it, but it's fast and effective.

Hope it helps.