return values existing in other rows based on column match-CodePudding

I have the following dataframe:

df <- data.frame(Business_Date = c("01/01/2018", "01/01/2019", "01/01/2020"), Business_Date_2019 = c("01/01/2019","01/01/2019","01/01/2019"), Television = c("A","B","C"))

And I would like to create a fourth column based which contains the Television for that specific date in 2019:

desired_output <- data.frame(Business_Date = c("01/01/2018", "01/01/2019", "01/01/2020"), Business_Date_2019 = c("01/01/2019","01/01/2019","01/01/2019"), Television = c("A","B","C"),
                             Television_2019 = c("B","B","B"))

However, when I write:

df$Television_2019 <- case_when(
  df$Business_Date == df$Business_Date_2019 ~ df$Television,
  TRUE ~ NA_character)

It doesn't return the desired_output:

  Business_Date Business_Date_2019 Television Television_2019
1    01/01/2018         01/01/2019          A            <NA>
2    01/01/2019         01/01/2019          B               B
3    01/01/2020         01/01/2019          C            <NA>

As row 1 and row 3 also have in the second column "01/01/2019", I would like to also return "B" in the fourth column for those rows.

How to write a script that returns the desired_output from df?

CodePudding user response：

You can accomplish this by creating a named vector to use as a lookup table:

tv_lookup <- setNames(df$Television, df$Business_Date)
df$Television_2019 <- tv_lookup[df$Business_Date_2019]

df
#   Business_Date Business_Date_2019 Television Television_2019
# 1    01/01/2018         01/01/2019          A               B
# 2    01/01/2019         01/01/2019          B               B
# 3    01/01/2020         01/01/2019          C               B

The reason case_when(df$Business_Date == df$Business_Date_2019) doesn't give you what you want is that it compares each element in Business_Date_2019 to the element in the same position in Business_Date. With your example data, this results in:

"01/01/2018" == "01/01/2019"  # FALSE, therefore <NA>
"01/01/2019" == "01/01/2019"  # TRUE, therefore "B"
"01/01/2020" == "01/01/2019"  # FALSE, therefore <NA>