Based on the data and code below, how can I change the value 1/0/1900
to values based on the corresponding row value in the year
column?
Data code:
df = structure(list(year = c("2005", "2004", "ORIG", "ORIG", "2000-2001",
"2000-2003"), date = c("1/0/1900", "1/0/1900", "1/0/1900", "1/0/1900",
"1/0/1900", "1/0/1900")), class = "data.frame", row.names = c(NA,
-6L))
desired = structure(list(year = c("2005", "2004", "ORIG", "ORIG", "2000-2001",
"2000-2001"), date = c("01/01/2005", "01/01/2004", "01/01/2005", "01/01/2005",
"01/01/2000", "01/01/2000")), class = "data.frame", row.names = c(NA,
-6L))
# Current approach replaces every `1/0/1900` to `01/01/2005`
df = df %>% mutate(date = ifelse(date == "1/0/1900",
"01/01/2005",
date))
CodePudding user response:
This feels clunky, perhaps I'm missing something?
df %>%
mutate(
date = paste0(sub("/[^/]*$", "/", date),
substring(if_else(year == "ORIG", first(year), year), 1, 4))
)
# year date
# 1 2005 1/0/2005
# 2 2004 1/0/2004
# 3 ORIG 1/0/2005
# 4 ORIG 1/0/2005
# 5 2000-2001 1/0/2000
# 6 2000-2003 1/0/2000
However, if you are finding /1900
dates in your data, that suggests that a database somewhere had a "null" that was converted into a zero-date like value. Perhaps it should be handled more directly at the data source?
Regex explanation:
/[^/]*$
^ literal '/'
^^^^^ any characters that are not the literal '/'
^ end of string
Essentially: remove from the end of string any non-/
. We could also have used substring(date, 1, nchar(date) - 4)
in place of sub(..)
:
df %>%
mutate(
date = paste0(substring(date, 1, nchar(date) - 4),
substring(if_else(year == "ORIG", first(year), year), 1, 4))
)
CodePudding user response:
We could use case_when
to replace the last 4 digits in 'date' where the 'year' is 'ORIG' to 2005 or replace with substr
ing of year column
library(dplyr)
library(stringr)
df %>%
mutate(date = case_when(year == 'ORIG' ~
str_replace(date, '\\d{4}$', '2005'),
TRUE ~ str_replace(date, '\\d{4}$', substr(year, 1, 4))))
-output
year date
1 2005 1/0/2005
2 2004 1/0/2004
3 ORIG 1/0/2005
4 ORIG 1/0/2005
5 2000-2001 1/0/2000
6 2000-2003 1/0/2000