R Version: 4.2.2
RStudio Version: 2022.07.2 Build 576
Windows Version: Windows 11 Home, 22H2
I'm using the "flights" data from the "nycflights13" package. I added two variables, "cancel_status" and "sche_dep_exact_time", and I'm trying to make a plot with geom_boxplot, but R always automatically removes the cases where cancel_status is "Y". I don't know why. Please help me out, thank you very much!
I have checked other posts with the same "removed rows containing non-finite values" problem. Every post suggests that a limit is set somewhere in the code, so R removes the cases outside that limit, but I don't see any limit in my case.
Code to add the variables:
`flights_cancel_status <- flights %>%
  mutate(cancel_status = ifelse(is.na(dep_time), "Y", "N"),
         sche_dep_hour = dep_time %/% 100,
         sche_dep_min = dep_time %% 100,
         sche_dep_exact_time = sche_dep_hour + sche_dep_min / 60)`
Code to plot:
`ggplot(data = flights_cancel_status) +
  geom_boxplot(mapping = aes(x = sche_dep_exact_time,
                             y = cancel_status))`
The warning message:
Warning message:
Removed 8255 rows containing non-finite values (`stat_boxplot()`).
I need a plot with sche_dep_exact_time on the x-axis and cancel_status on the y-axis.
CodePudding user response:
You define `cancel_status` as `ifelse(is.na(dep_time), "Y", "N")`. In other words, when `cancel_status` is `"Y"`, the `dep_time` values were `NA`. So all your calculations based on `dep_time` are `NA` for entries where `cancel_status == "Y"`, including `sche_dep_exact_time`.
`ggplot2` removes `NA` entries, and this is why you get no `sche_dep_exact_time` boxplot for `cancel_status == "Y"`.
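One possible fix, sketched under an assumption: the `sche_dep_*` names suggest the scheduled departure time was intended, so the version below derives them from `sched_dep_time` (a column of `flights` that has no missing values) instead of `dep_time`. If the actual departure time really is what you want, the "Y" group cannot be drawn, because those flights never departed.

```r
library(dplyr)
library(ggplot2)
library(nycflights13)

# Assumption: the "sche_dep_*" columns are meant to describe the *scheduled*
# departure, so they are computed from sched_dep_time rather than dep_time.
# dep_time is still used to flag cancelled flights (it is NA for them).
flights_cancel_status <- flights %>%
  mutate(
    cancel_status = ifelse(is.na(dep_time), "Y", "N"),
    sche_dep_hour = sched_dep_time %/% 100,
    sche_dep_min = sched_dep_time %% 100,
    sche_dep_exact_time = sche_dep_hour + sche_dep_min / 60
  )

# Sanity check before plotting: count missing values in each group.
na_check <- flights_cancel_status %>%
  group_by(cancel_status) %>%
  summarise(n = n(), n_missing = sum(is.na(sche_dep_exact_time)))
print(na_check)

# Same plot as in the question; both groups now have finite x values.
p <- ggplot(data = flights_cancel_status) +
  geom_boxplot(mapping = aes(x = sche_dep_exact_time, y = cancel_status))
print(p)
```

Running a per-group `sum(is.na(...))` check like the one above on the original data is also a quick way to confirm where the removed rows come from.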