R ggplot2 Warning Message: Removed rows containing non-finite values

Time: 11-13

R version: 4.2.2

RStudio version: 2022.07.2 Build 576

Windows version: Windows 11 Home, 22H2

I'm using the "flights" data from the "nycflights13" package. I added two variables, "cancel_status" and "sche_dep_exact_time", and I'm trying to make a plot with geom_boxplot(), but R always removes the cases where cancel_status is "Y". I don't know why. Please help me out, thank you very much!

I have checked other posts about the same "Removed rows containing non-finite values" warning. They all say that a limit in the code (for example, an axis limit) makes R remove the cases that fall outside it, but I don't see any limit in my code.

Code to add the variables:

```r
library(dplyr)
library(nycflights13)

flights_cancel_status <- flights %>%
  mutate(cancel_status = ifelse(is.na(dep_time), "Y", "N"),
         sche_dep_hour = dep_time %/% 100,
         sche_dep_min = dep_time %% 100,
         sche_dep_exact_time = sche_dep_hour + sche_dep_min / 60)
```

Code to plot:

```r
library(ggplot2)

ggplot(data = flights_cancel_status) +
  geom_boxplot(mapping = aes(x = sche_dep_exact_time,
                             y = cancel_status))
```

The warning message:

Warning message: Removed 8255 rows containing non-finite values (stat_boxplot()).

I need a plot with sche_dep_exact_time on the x axis and cancel_status on the y axis.

Answer:

You define cancel_status as ifelse(is.na(dep_time), "Y", "N"). In other words, cancel_status is "Y" exactly when dep_time is NA. Because NA %/% 100, NA %% 100 and any arithmetic involving NA all return NA, every variable you compute from dep_time is NA for the rows where cancel_status == "Y", including sche_dep_exact_time.
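To see the NA propagation without loading the full dataset, here is a minimal base-R sketch. The dep_time vector is a hypothetical toy stand-in for flights$dep_time (HHMM times, NA for a cancelled flight); the arithmetic is the same as in the question's mutate() call.

```r
# Hypothetical toy values standing in for flights$dep_time (HHMM, NA = cancelled)
dep_time <- c(517L, NA, 1432L, NA)

cancel_status <- ifelse(is.na(dep_time), "Y", "N")
sche_dep_hour <- dep_time %/% 100   # NA %/% 100 is NA
sche_dep_min  <- dep_time %% 100    # NA %% 100 is NA
sche_dep_exact_time <- sche_dep_hour + sche_dep_min / 60  # NA + anything is NA

# Count the finite (plottable) times per group
kept <- tapply(is.finite(sche_dep_exact_time), cancel_status, sum)
kept  # N: 2, Y: 0 -> no finite values are left for the "Y" group
```

Every "Y" row carries an NA time, so the "Y" group has nothing left once the non-finite rows are dropped.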

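Before computing the boxplot statistics, ggplot2 drops every row whose mapped values are missing (NA) or otherwise non-finite, and it reports how many it dropped in the "Removed ... rows containing non-finite values" warning. Since every row with cancel_status == "Y" has an NA sche_dep_exact_time, no data remain for that group, which is why you get no boxplot for cancel_status == "Y".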
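If the goal is the scheduled departure time (as the name sche_dep_exact_time suggests), one possible fix is to compute it from sched_dep_time instead of dep_time. This is a sketch under that assumption, not something stated above: it relies on nycflights13 recording sched_dep_time (also HHMM) even for flights that never departed, so the "Y" rows keep a finite value. The check line should print 0 if that holds.

```r
library(dplyr)
library(ggplot2)
library(nycflights13)

# Assumption: the intended x variable is the *scheduled* departure time,
# so it is derived from sched_dep_time, which is filled in for cancelled flights too.
flights_cancel_status <- flights %>%
  mutate(cancel_status = ifelse(is.na(dep_time), "Y", "N"),
         sche_dep_hour = sched_dep_time %/% 100,
         sche_dep_min = sched_dep_time %% 100,
         sche_dep_exact_time = sche_dep_hour + sche_dep_min / 60)

# Number of rows ggplot2 would drop as non-finite
sum(!is.finite(flights_cancel_status$sche_dep_exact_time))

# Continuous x with discrete y gives horizontal boxplots
ggplot(data = flights_cancel_status) +
  geom_boxplot(mapping = aes(x = sche_dep_exact_time, y = cancel_status))
```

cancel_status is still based on dep_time, so "Y" still means the flight has no recorded departure; only the plotted time now comes from a column that is not missing for those flights.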
