I have a project at work where I am trying to plot data on movies.
My goal is to plot 'definite interest' and 'total awareness' together on the Y-axis against 'estimated admissions' on the X-axis. I will be using ggplot2 and want the two Y variables in different colors.
The main issue I am having is filtering the movies by 'year' and 'window' in ggplot. The window I want is T-0 and the year of release should be 2018. I only know how to map values to the X- and Y-axis in ggplot without conditions, so I could use some guidance.
How should I go about filtering the data and plotting the X and Y values? The Excel sheet is attached. I understand the code below may not be correct; I come from a Java background, so I am not sure what I am doing in R.
If my_data$window = T-0 {
my_data$window <- TRUE
} else {
FALSE
}
I expected this to set 'window' to TRUE for rows where it equals T-0. I attempted the same for 'year'.
x <- my_data$estimatedAdmissions
y1 <- my_data$definiteInterest
y2 <- my_data$totalAwareness
plot(x, y1, y2, filter 1, filter 2)
I expect to plot those values and will change y1 & y2 to different colors later.
Answer:
Please provide a reproducible example next time - we can't copy your screenshot into R, so I had to recreate your dataset myself like so.
# Recreated sample data; the totalAwareness values are random placeholders
dat <- data.frame(year = 2018,
                  title = rep(c("a", "b"), each = 6),
                  estimatedAdmissions = rep(c(287351, 29518), each = 6),
                  window = rep(c("T+2", "T+1", "T-0", "T-1", "T-2", "T-3"), 2),
                  definiteInterest = c(13, 15, 25, 22, 16, 27, 15, 14, 23, 20, 28, 29),
                  totalAwareness = round(runif(n = 12, min = 4, max = 31)))
You can use subset() to filter rows, and then ggplot() with geom_point() to create a scatterplot. ggplot2 works best with data in long format, so I reshaped your data longer with pivot_longer() from tidyr first.
library(tidyr)
library(ggplot2)

dat |>
  # keep only 2018 releases at window T-0
  subset(year == 2018 & window == "T-0") |>
  # stack the two measures into one column so colour can map to them
  pivot_longer(cols = c(definiteInterest, totalAwareness),
               names_to = "measure", values_to = "percentage") |>
  ggplot(aes(x = estimatedAdmissions, y = percentage, colour = measure)) +
  geom_point()
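Since you mention a Java background, it may help to see why no if/else is needed for the filter. In R, a comparison on a column is evaluated for every row at once and returns a vector of TRUE/FALSE values, which you can use to index the data frame. Here is a minimal base R sketch of that idea; the `toy` data frame and its values are made up for illustration and are not your real data.

```r
# Hypothetical toy data with the same column names as the question
toy <- data.frame(
  year                = c(2018, 2018, 2017, 2018),
  window              = c("T-0", "T-1", "T-0", "T-0"),
  estimatedAdmissions = c(287351, 287351, 120000, 29518),
  definiteInterest    = c(25, 22, 18, 23)
)

# Each comparison yields one TRUE/FALSE per row; `&` combines them element-wise
keep <- toy$year == 2018 & toy$window == "T-0"

# Index rows with the logical vector; the empty column slot keeps all columns
filtered <- toy[keep, ]

# subset() selects the same rows with less typing
same <- subset(toy, year == 2018 & window == "T-0")

print(filtered)
```

Note the double `==` for comparison (a single `=` assigns) and the quotes around "T-0"; without quotes R would try to subtract 0 from a variable named T.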