Stata drop if equivalent for string variable in ggplot (R)

Time:07-19

I am trying to produce a graph for a categorical variable with three sub-groups, but I would like to present the results for only two of them. In Stata, this can be done while producing the graph by adding something like the line below, but I am not sure whether there is an R equivalent.

drop if sentiment== "neutral"

Here is a data example:

dput(head(sample_graph, 5))
structure(list(sentiment = structure(c(3L, 2L, 4L, NA, 2L), .Label = c("meg", 
"negative", "neutral", "positive"), class = "factor"), treatment_announcement = c("pre", 
"pre", "pre", "pre", "post"), n = c(78L, 150L, 87L, 1L, 829L), 
    sentiment_percentage = c(0.246835443037975, 0.474683544303797, 
    0.275316455696203, 0.00316455696202532, 0.490822972172883
    ), am = structure(c(2L, 2L, 2L, 2L, 1L), .Label = c("post", 
    "pre"), class = "factor")), class = c("grouped_df", "tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -5L), groups = structure(list(
    treatment_announcement = c("post", "pre"), .rows = structure(list(
        5L, 1:4), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -2L), .drop = TRUE))

I have used the code below before. It works, but it drops every observation in this category from the data frame itself, whereas I only want to exclude them for visualization purposes. For instance, after running it, my observations declined from 8000 to 6323.

sample_graph<- sample_graph %>%
   drop_na() %>% 
   filter(sentiment != "neutral")

Therefore, I have attempted dropping the specific subgroup within the ggplot itself, but I am facing an error: "Problem with filter() input ..2. i Input ..2 is aes(x = treatment_announcement, fill = sentiment, y = sentiment_percentage)."

ggplot(sample_graph %>% filter(sentiment != "neutral", aes(x = treatment_announcement, fill = sentiment, y = sentiment_percentage))) +
    geom_bar(stat = "identity", position=position_dodge()) +
    scale_fill_grey() +
    ylab("percentage") +
    theme(text=element_text(size=20)) +
  scale_fill_manual(values = c("positive" = "green", 
                               "negative" = "red")) +
    theme(plot.title = element_text(size = 18, face = "bold")) +
    scale_x_discrete(limits = c("pre", "post")) +
  theme_bw()

Following Allen's advice below, I tried the following:

twitter_posts |> 
  drop_na() |> 
  filter(sentiment != "neutral") |> 
  select(sentiment, treatment_announcement) |>     # we're only interested in sentiment & treatment_announcement
    group_by(sentiment) %>%   # group data and
    add_count(treatment_announcement) |>       # add count of treatment_announcement
    unique() |>           # remove duplicates
    ungroup() |>            # remove grouping
    group_by(treatment_announcement) |>        # group by treatment_announcement 
    mutate(sentiment_percentage = n/sum(n)) |>  # ...calculating percentage
    mutate(sentiment = as.factor(sentiment)) |>        # change to factors so that ggplot treats...
    mutate(am = as.factor(treatment_announcement))  |>   
  twitter_posts (data = teacher_posts, aes(x = treatment_announcement, fill = sentiment, y = sentiment_percentage)) +
    geom_bar(stat = "identity", position=position_dodge()) +
    scale_fill_grey() +
    xlab("Treatment refers to the implementation of the wage subsidy program targeted at jobless teachers") +
    ylab("percentage") +
    theme(text=element_text(size=20)) +
  scale_fill_manual(values = c("positive" = "green", 
                               "negative" = "red")) +
    theme(plot.title = element_text(size = 18, face = "bold")) +
    scale_x_discrete(limits = c("pre", "post")) +
  theme_bw()

And I am receiving this error "Mapping should be created with aes() or aes_()." although I have the aes mapping for the plot.

CodePudding user response:

You can do this either by piping the filtered data into ggplot() or by calling filter() inside the data argument:

library(tidyverse)
library(palmerpenguins)

penguins <- penguins 



penguins |> 
  drop_na() |> 
  filter(species != "Adelie") |> 
  ggplot(aes(x = bill_length_mm, y = body_mass_g)) +
  geom_point()



ggplot(data = filter(penguins, species != "Adelie"), aes(x = bill_length_mm, y = body_mass_g)) +
  geom_point()
#> Warning: Removed 1 rows containing missing values (geom_point).

Created on 2022-07-18 by the reprex package (v2.0.1)

So, taking the code you provided, it would look something like this:

twitter_posts |> 
  drop_na() |> 
  filter(sentiment != "neutral") |> 
  select(sentiment, treatment_announcement) |>     # we're only interested in sentiment & treatment_announcement
    group_by(sentiment) %>%   # group data and
    add_count(treatment_announcement) |>       # add count of treatment_announcement
    unique() |>           # remove duplicates
    ungroup() |>            # remove grouping
    group_by(treatment_announcement) |>        # group by treatment_announcement 
    mutate(sentiment_percentage = n/sum(n)) |>  # ...calculating percentage
    mutate(sentiment = as.factor(sentiment)) |>        # change to factors so that ggplot treats...
    mutate(am = as.factor(treatment_announcement))  |>   
  ggplot(aes(x = treatment_announcement, fill = sentiment, y = sentiment_percentage)) +
    geom_bar(stat = "identity", position=position_dodge()) +
    scale_fill_grey() +
    xlab("Treatment refers to the implementation of the wage subsidy program targeted at jobless teachers") +
    ylab("percentage") +
    theme(text=element_text(size=20)) +
  scale_fill_manual(values = c("positive" = "green", 
                               "negative" = "red")) +
    theme(plot.title = element_text(size = 18, face = "bold")) +
    scale_x_discrete(limits = c("pre", "post")) +
  theme_bw()


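So you do your data cleaning and then plot the result in the same pipeline. Because the cleaned data frame is piped into ggplot() as its first argument, you do not need to include the data argument. Since nothing is assigned back to twitter_posts, the original data frame keeps all of its rows.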
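To illustrate that filtering inside the pipeline does not touch the original data, here is a minimal sketch. The toy values below are made up for illustration (they are not the asker's data), and the sketch assumes R 4.1 or later for the native pipe, with dplyr and ggplot2 installed.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data standing in for the asker's summary table
sample_graph <- tibble(
  sentiment = c("neutral", "negative", "positive",
                "neutral", "negative", "positive"),
  treatment_announcement = c("pre", "pre", "pre", "post", "post", "post"),
  sentiment_percentage = c(0.25, 0.47, 0.28, 0.20, 0.49, 0.31)
)

# Filter inside the pipeline only; nothing is assigned back to sample_graph
p <- sample_graph |>
  filter(sentiment != "neutral") |>
  ggplot(aes(x = treatment_announcement, fill = sentiment,
             y = sentiment_percentage)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  scale_fill_manual(values = c("positive" = "green", "negative" = "red")) +
  scale_x_discrete(limits = c("pre", "post"))

nrow(sample_graph)  # still 6: the neutral rows remain in the original
```

Printing p draws the chart with only the positive and negative bars, while sample_graph itself still contains every category.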

CodePudding user response:

If I were you, I would just create a new data frame by filtering your original one with

newdataframe <- originaldataframe %>%
  filter(variable != "value")  # placeholder condition, e.g. sentiment != "neutral"

or something in this style.

From there, generating the new graph should be trivial if you already have working code.
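As a rough sketch of this approach, the example below assigns the filtered rows to a new name rather than overwriting the original, which is probably why the row count in the question fell from 8000 to 6323. The column values are invented for illustration, and only dplyr is assumed.

```r
library(dplyr)

# Hypothetical toy data; the real data frame in the question has about 8000 rows
originaldataframe <- tibble(
  sentiment = c("neutral", "negative", "positive", NA, "negative"),
  treatment_announcement = c("pre", "pre", "pre", "pre", "post")
)

# Assign the filtered rows to a NEW name instead of overwriting the original
newdataframe <- originaldataframe %>%
  filter(sentiment != "neutral")

nrow(originaldataframe)  # 5: unchanged
nrow(newdataframe)       # 3: filter() also drops the row where sentiment is NA
```

Note that filter() discards rows where the condition evaluates to NA, so the missing sentiment disappears from newdataframe even without drop_na().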

Maybe it is not the most polished way to do it, but it's fast and effective.

Hope it helps.
