Home > Software engineering >  How to change values in 1 column based of date/time range in a different column in R
How to change values in 1 column based of date/time range in a different column in R

Time:03-25

I have a dataframe with a DATE/TIME column and a column with some numeric values. I'd like to change some numeric values to "N/A" based of a range of DATE/TIME they are recorded at.

This is what my dataframe looks like

df = structure(list(Date_Time_GMT_3 = structure(c(1592226000, 1592226900, 
                                                  1592227800, 1592228700, 1592229600, 1592230500), class = c("POSIXct", 
                                                                                                             "POSIXt"), tzone = "EST"), diff_20676892_AIR_X3lh = c(NA, 0.385999999999999, 
                                                                                                                                                                   0.193, 0.290000000000001, 0.385, 0.576000000000001), diff_20819828_B1LH_DOUBLE_CHECK = c(NA, 
                                                                                                                                                                                                                                                            0, 0, 0, 0.0949999999999989, 0)), row.names = c(NA, 6L), class = "data.frame")

I want to change all values for diff_20819828_B1LH_DOUBLE_CHECK to N/A if they are between 2020-06-15 08:30:00 and 2020-06-15 09:00:00

I tried this code

df[df$Date_Time_GMT_3 > "2020-06-15 08:30:00"| < "2020-06-15 09:00:00"] = "NA"

but to no surprise this doesn't work. How can I fix this?

CodePudding user response:

Your date column is in "EST", so you can do this:

df[df$Date_Time_GMT_3 > as.POSIXct("2020-06-15 08:30:00", tz="EST") &
     df$Date_Time_GMT_3 < as.POSIXct("2020-06-15 09:00:00", tz="EST"),3] <- NA

      Date_Time_GMT_3 diff_20676892_AIR_X3lh diff_20819828_B1LH_DOUBLE_CHECK
1 2020-06-15 08:00:00                     NA                              NA
2 2020-06-15 08:15:00                  0.386                           0.000
3 2020-06-15 08:30:00                  0.193                           0.000
4 2020-06-15 08:45:00                  0.290                              NA
5 2020-06-15 09:00:00                  0.385                           0.095
6 2020-06-15 09:15:00                  0.576                           0.000

Note that there is only one row between those times, row 4, and above changes the value(s) in the 3rd column for such row(s) to NA

CodePudding user response:

Your base R code isn't working because

  1. You didn't specify which column's values should be changed
  2. You're using an | instead of an &
  3. After a logical operator you need to repeat which vector to assess
  4. You're not telling R that those strings are date-times.

Langtang's solution is very neat. Another option using dplyr and lubridate is:

library(dplyr)
library(lubridate)
df %>% mutate(diff_20819828_B1LH_DOUBLE_CHECK = na_if(
  diff_20819828_B1LH_DOUBLE_CHECK,
  Date_Time_GMT_3 %within% interval("2020-06-15 08:30:00", "2020-06-15 09:00:00")
))
  • Related