I have the following dataframe:
df <- data.frame(Business_Date = c("01/01/2018", "01/01/2019", "01/01/2020"), Business_Date_2019 = c("01/01/2019","01/01/2019","01/01/2019"), Television = c("A","B","C"))
And I would like to create a fourth column based which contains the Television for that specific date in 2019:
desired_output <- data.frame(Business_Date = c("01/01/2018", "01/01/2019", "01/01/2020"), Business_Date_2019 = c("01/01/2019","01/01/2019","01/01/2019"), Television = c("A","B","C"),
Television_2019 = c("B","B","B"))
However, when I write:
df$Television_2019 <- case_when(
df$Business_Date == df$Business_Date_2019 ~ df$Television,
TRUE ~ NA_character)
It doesn't return the desired_output:
Business_Date Business_Date_2019 Television Television_2019
1 01/01/2018 01/01/2019 A <NA>
2 01/01/2019 01/01/2019 B B
3 01/01/2020 01/01/2019 C <NA>
As row 1 and row 3 also have in the second column "01/01/2019"
, I would like to also return "B"
in the fourth column for those rows.
How to write a script that returns the desired_output from df?
CodePudding user response:
You can accomplish this by creating a named vector to use as a lookup table:
tv_lookup <- setNames(df$Television, df$Business_Date)
df$Television_2019 <- tv_lookup[df$Business_Date_2019]
df
# Business_Date Business_Date_2019 Television Television_2019
# 1 01/01/2018 01/01/2019 A B
# 2 01/01/2019 01/01/2019 B B
# 3 01/01/2020 01/01/2019 C B
The reason case_when(df$Business_Date == df$Business_Date_2019)
doesn't give you what you want is that it compares each element in Business_Date_2019
to the element in the same position in Business_Date
. With your example data, this results in:
"01/01/2018" == "01/01/2019" # FALSE, therefore <NA>
"01/01/2019" == "01/01/2019" # TRUE, therefore "B"
"01/01/2020" == "01/01/2019" # FALSE, therefore <NA>