I have a dataset that has values for Y and X variables, for certain cities, and for different dates. But the cities don't necessarily have values for all the dates, which is a problem for me.
date | location | X | Y |
---|---|---|---|
2018-10-12 | PARIS | 10 | 20 |
2018-10-12 | NEW YORK | 15 | 20 |
2018-10-13 | PARIS | 16 | 8 |
2018-10-13 | LOS ANGELES | 16 | 8 |
So I want, for each date, a value per city of Y and X like this :
date | location | X | Y |
---|---|---|---|
2018-10-12 | PARIS | 10 | 20 |
2018-10-12 | NEW YORK | 15 | 20 |
2018-10-12 | LOS ANGELES | NA | NA |
2018-10-13 | PARIS | 16 | 8 |
2018-10-13 | LOS ANGELES | 16 | 8 |
2018-10-13 | NEW YORK | NA | NA |
I tried isolating unique dates in a new dataframe and using the 'left_join' function of dplyr package but it doesn't work... How can i achieve this ?
Thanks for your help!
CodePudding user response:
You can use expand
from the tidyr
package to get all possible combinations of date
and location
, and then left_join
to your existing data frame:
cities |>
expand(date, location) |>
left_join(cities)
# Joining, by = c("date", "location")
# # A tibble: 6 x 4
# date location X Y
# <chr> <chr> <int> <int>
# 1 2018-10-12 LOS ANGELES NA NA
# 2 2018-10-12 NEW YORK 15 20
# 3 2018-10-12 PARIS 10 20
# 4 2018-10-13 LOS ANGELES 16 8
# 5 2018-10-13 NEW YORK NA NA
# 6 2018-10-13 PARIS 16 8