I am trying to produce a graph for a categorical variable with three sub-groups, but I would like to strictly present the results for two groups. In Stata, this can be done while producing a graph by adding something like, but I am not sure if there is an R equivalent?
drop if sentiment== "neutral"
Here is the a data example:
dput(head(sample_graph, 5))
(list(sentiment = structure(c(3L, 2L, 4L, NA, 2L), .Label = c("meg",
"negative", "neutral", "positive"), class = "factor"), treatment_announcement = c("pre",
"pre", "pre", "pre", "post"), n = c(78L, 150L, 87L, 1L, 829L),
sentiment_percentage = c(0.246835443037975, 0.474683544303797,
0.275316455696203, 0.00316455696202532, 0.490822972172883
), am = structure(c(2L, 2L, 2L, 2L, 1L), .Label = c("post",
"pre"), class = "factor")), class = c("grouped_df", "tbl_df",
"tbl", "data.frame"), row.names = c(NA, -5L), groups = structure(list(
treatment_announcement = c("post", "pre"), .rows = structure(list(
5L, 1:4), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -2L), .drop = TRUE))
I have used this code before, which works well but it drops all observations under this category, but I only want to drop them for visualization purposes, not all rows in the df itself. For instance, after running the code below, my observations declined from 8000 to 6323.
sample_graph<- sample_graph %>%
drop_na() %>%
filter(sentiment != "neutral")
Therefore, I have attempted dropping the specific subgroup within the ggplot itself, but I am facing an error: "Problem with filter()
input ..2
.
i Input ..2
is aes(x = treatment_announcement, fill = sentiment, y = sentiment_percentage)
."
ggplot(sample_graph %>% filter(sentiment != "neutral", aes(x = treatment_announcement, fill = sentiment, y = sentiment_percentage)))
geom_bar(stat = "identity", position=position_dodge())
scale_fill_grey()
ylab("percentage")
theme(text=element_text(size=20))
scale_fill_manual(values = c("positive" = "green",
"negative" = "red"))
theme(plot.title = element_text(size = 18, face = "bold"))
scale_x_discrete(limits = c("pre", "post"))
theme_bw()
Following Allen's advice below, I tried the following:
twitter_posts |>
drop_na() |>
filter(sentiment != "neutral") |>
select(sentiment, treatment_announcement) |> # we're only interested in sentiment & treatment_announcement
group_by(sentiment) %>% # group data and
add_count(treatment_announcement) |> # add count of treatment_announcement
unique() |> # remove duplicates
ungroup() |> # remove grouping
group_by(treatment_announcement) |> # group by treatment_announcement
mutate(sentiment_percentage = n/sum(n)) |> # ...calculating percentage
mutate(sentiment = as.factor(sentiment)) |> # change to factors so that ggplot treats...
mutate(am = as.factor(treatment_announcement)) |>
twitter_posts (data = teacher_posts, aes(x = treatment_announcement, fill = sentiment, y = sentiment_percentage))
geom_bar(stat = "identity", position=position_dodge())
scale_fill_grey()
xlab("Treatment refers to the implementation of the wage subsidy program targeted at jobless teachers")
ylab("percentage")
theme(text=element_text(size=20))
scale_fill_manual(values = c("positive" = "green",
"negative" = "red"))
theme(plot.title = element_text(size = 18, face = "bold"))
scale_x_discrete(limits = c("pre", "post"))
theme_bw()
And I am receiving this error "Mapping should be created with aes()
or aes_()
." although I have the aes mapping for the plot.
CodePudding user response:
You can do some version of this via piping to ggplot or using filter in the data argument
library(tidyverse)
library(palmerpenguins)
penguins <- penguins
penguins |>
drop_na() |>
filter(species != "Adelie") |>
ggplot(aes(x = bill_length_mm, y = body_mass_g))
geom_point()
ggplot(data = filter(penguins,species != "Adelie"), aes(x = bill_length_mm, y = body_mass_g))
geom_point()
#> Warning: Removed 1 rows containing missing values (geom_point).
Created on 2022-07-18 by the reprex package (v2.0.1)
So taking the code you provided it would look something like this
twitter_posts |>
drop_na() |>
filter(sentiment != "neutral") |>
select(sentiment, treatment_announcement) |> # we're only interested in sentiment & treatment_announcement
group_by(sentiment) %>% # group data and
add_count(treatment_announcement) |> # add count of treatment_announcement
unique() |> # remove duplicates
ungroup() |> # remove grouping
group_by(treatment_announcement) |> # group by treatment_announcement
mutate(sentiment_percentage = n/sum(n)) |> # ...calculating percentage
mutate(sentiment = as.factor(sentiment)) |> # change to factors so that ggplot treats...
mutate(am = as.factor(treatment_announcement)) |>
ggplot(aes(x = treatment_announcement, fill = sentiment, y = sentiment_percentage))
geom_bar(stat = "identity", position=position_dodge())
scale_fill_grey()
xlab("Treatment refers to the implementation of the wage subsidy program targeted at jobless teachers")
ylab("percentage")
theme(text=element_text(size=20))
scale_fill_manual(values = c("positive" = "green",
"negative" = "red"))
theme(plot.title = element_text(size = 18, face = "bold"))
scale_x_discrete(limits = c("pre", "post"))
theme_bw()
So you would be doing your data cleaning and then plotting it. Because you are piping it you do not need to include the data argument.
CodePudding user response:
If I were you, I would just create a new dataframe by filtering your original one with
newdataframe <- originaldataframe %>%
filter(variable==)
or something in this style.
From there generating the new graph should be trivial if you already have a working code.
Maybe is not the most polished way to do it, but its fast and effective.
Hope it helps.