I have a dataset and am trying to recode a date column. Some rows contain the placeholder date 1900-01-01, which I want to change to missing. All other rows should keep the original date.
In SAS I would just do something like:
if date = '1900-01-01' then cleandate = ""; else cleandate = date;
Sample data:
df <- data.frame(
  id = c(1, 2, 3),
  date = as.Date(c("1900-01-01", "1984-01-01", "1900-01-01"))
)
Desired outcome:
id | date |
---|---|
1 | NA (or however R assigns missing data so it won't be included in calculations) |
2 | 1984-01-01 |
3 | NA (or however R assigns missing data so it won't be included in calculations) |
CodePudding user response:
In R, take a copy of the existing date column, and then replace the values. Just be sure you specify the date in the YYYY-MM-DD default format.
df$cleandate <- df$date
df$cleandate[df$cleandate == '1900-01-01'] <- NA
df
# id date cleandate
#1 1 1900-01-01 <NA>
#2 2 1984-01-01 1984-01-01
#3 3 1900-01-01 <NA>
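The comparison above works because R converts the string to a Date before comparing, and that conversion expects the default format. If the placeholder is written in some other format, one option is to parse it explicitly first. This is only a minimal sketch: the variable name sentinel and the "01/01/1900" input are made up for illustration.

```r
df <- data.frame(
  id = c(1, 2, 3),
  date = as.Date(c("1900-01-01", "1984-01-01", "1900-01-01"))
)

# Hypothetical: the placeholder arrives as month/day/year text,
# so tell as.Date() how to read it
sentinel <- as.Date("01/01/1900", format = "%m/%d/%Y")

# Same copy-then-replace approach, comparing Date to Date
df$cleandate <- df$date
df$cleandate[df$cleandate == sentinel] <- NA
```

The new column is still of class Date, so date arithmetic and formatting keep working on the rows that were not replaced.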
The same result can be accomplished in one step using replace:
df$cleandate <- replace(df$date, df$date == '1900-01-01', NA)
df
# id date cleandate
#1 1 1900-01-01 <NA>
#2 2 1984-01-01 1984-01-01
#3 3 1900-01-01 <NA>
You could of course skip the copy and just overwrite the values in date, if that is preferred:
df$date[df$date == '1900-01-01'] <- NA
df
# id date
#1 1 <NA>
#2 2 1984-01-01
#3 3 <NA>
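On the "won't be included in calculations" part of the question: NA is R's missing value, but many summary functions return NA when any input is missing unless you ask them to drop missing values with na.rm = TRUE. A small sketch using the sample data (the variable names earliest_raw, earliest and n_missing are just for illustration):

```r
df <- data.frame(
  id = c(1, 2, 3),
  date = as.Date(c("1900-01-01", "1984-01-01", "1900-01-01"))
)
df$date[df$date == "1900-01-01"] <- NA

earliest_raw <- min(df$date)                # NA: missing values propagate by default
earliest     <- min(df$date, na.rm = TRUE)  # "1984-01-01": NAs dropped first
n_missing    <- sum(is.na(df$date))         # 2: number of recoded rows
```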
CodePudding user response:
1) Create a copy of df, called df2, so that the input is preserved, and then use the is.na<- replacement function. It sets the elements on the left-hand side to NA wherever the right-hand side is TRUE.
df2 <- df
is.na(df2$date) <- df2$date == "1900-01-01"
1a) Or with transform:
transform(df, date = `is.na<-`(date, date == "1900-01-01"))
2) We can use na_if in dplyr:
library(dplyr)
df %>% mutate(date = na_if(date, as.Date("1900-01-01")))
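To see what the is.na<- replacement function from option 1 does in isolation, here is a minimal sketch on plain numeric vectors. The vectors x and y are made-up examples, not part of the question's data.

```r
# A logical vector on the right marks which positions become NA
x <- c(10, 20, 30)
is.na(x) <- c(TRUE, FALSE, TRUE)
x
# NA 20 NA

# The right-hand side is usually a condition, as in option 1
y <- c(10, 20, 10)
is.na(y) <- y == 10
y
# NA 20 NA
```

Because the assignment modifies the object in place, option 1 works on the copy df2 rather than on df itself.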