Home > Mobile >  change repeating row value based on values on another column
change repeating row value based on values on another column

Time:09-28

Based on the data and code below, how can I change the value 1/0/1900 to values based on the corresponding row value in the year column?

Data code:

df = structure(list(year = c("2005", "2004", "ORIG", "ORIG", "2000-2001", 
"2000-2003"), date = c("1/0/1900", "1/0/1900", "1/0/1900", "1/0/1900", 
"1/0/1900", "1/0/1900")), class = "data.frame", row.names = c(NA, 
-6L))

desired = structure(list(year = c("2005", "2004", "ORIG", "ORIG", "2000-2001", 
"2000-2001"), date = c("01/01/2005", "01/01/2004", "01/01/2005", "01/01/2005", 
"01/01/2000", "01/01/2000")), class = "data.frame", row.names = c(NA, 
-6L))

# Current approach replaces every `1/0/1900` to `01/01/2005`
df = df %>% mutate(date = ifelse(date == "1/0/1900", 
                                        "01/01/2005", 
                                        date))

CodePudding user response:

This feels clunky, perhaps I'm missing something?

df %>%
  mutate(
    date = paste0(sub("/[^/]*$", "/", date),
                  substring(if_else(year == "ORIG", first(year), year), 1, 4))
  )
#        year     date
# 1      2005 1/0/2005
# 2      2004 1/0/2004
# 3      ORIG 1/0/2005
# 4      ORIG 1/0/2005
# 5 2000-2001 1/0/2000
# 6 2000-2003 1/0/2000

However, if you are finding /1900 dates in your data, that suggests that a database somewhere had a "null" that was converted into a zero-date like value. Perhaps it should be handled more directly at the data source?

Regex explanation:

/[^/]*$
^         literal '/'
 ^^^^^    any characters that are not the literal '/'
      ^   end of string

Essentially: remove from the end of string any non-/. We could also have used substring(date, 1, nchar(date) - 4) in place of sub(..):

df %>%
  mutate(
    date = paste0(substring(date, 1, nchar(date) - 4),
                  substring(if_else(year == "ORIG", first(year), year), 1, 4))
  )

CodePudding user response:

We could use case_when to replace the last 4 digits in 'date' where the 'year' is 'ORIG' to 2005 or replace with substring of year column

library(dplyr)
library(stringr)
df %>% 
  mutate(date = case_when(year == 'ORIG' ~ 
    str_replace(date, '\\d{4}$', '2005'), 
    TRUE ~ str_replace(date, '\\d{4}$', substr(year, 1, 4))))

-output

 year     date
1      2005 1/0/2005
2      2004 1/0/2004
3      ORIG 1/0/2005
4      ORIG 1/0/2005
5 2000-2001 1/0/2000
6 2000-2003 1/0/2000
  • Related