I have two tables:
one is the "visits" table:
-------- --------- --------
| date | user_id | visits |
-------- --------- --------
| 1/1/18 | A | 2 |
-------- --------- --------
| 1/2/18 | A | 4 |
-------- --------- --------
| 1/3/18 | A | 10 |
-------- --------- --------
| 1/4/18 | A | 34 |
-------- --------- --------
| 1/5/18 | A | 23 |
-------- --------- --------
| 1/1/18 | B | 15 |
-------- --------- --------
| 1/2/18 | B | 12 |
-------- --------- --------
| 1/1/18 | C | 10 |
-------- --------- --------
| 1/1/18 | D | 5 |
-------- --------- --------
| 1/2/18 | D | 12 |
-------- --------- --------
| 1/3/18 | D | 15 |
-------- --------- --------
| 1/4/18 | D | 25 |
-------- --------- --------
| 1/1/18 | E | 18 |
-------- --------- --------
| 1/1/18 | G | 21 |
-------- --------- --------
| 1/2/18 | G | 10 |
-------- --------- --------
Another one is the "location" table:
--------- ----------
| user_id | location |
--------- ----------
| A | 1 |
--------- ----------
| B | 1 |
--------- ----------
| C | 1 |
--------- ----------
| D | 2 |
--------- ----------
| E | 3 |
--------- ----------
| F | 3 |
--------- ----------
| G | 3 |
--------- ----------
Note:
-
- If a user does not visit, he/she will not show up in the "visits" table. His/her visit is 0 that day.
-
- The "location" table has the complete population of users.
Question: I would like to extend the "visits" table, such that it looks like this:
-------- --------- --------
| date | user_id | visits |
-------- --------- --------
| 1/1/18 | A | 2 |
-------- --------- --------
| 1/2/18 | A | 4 |
-------- --------- --------
| 1/3/18 | A | 10 |
-------- --------- --------
| 1/4/18 | A | 34 |
-------- --------- --------
| 1/5/18 | A | 23 |
-------- --------- --------
| 1/1/18 | B | 15 |
-------- --------- --------
| 1/2/18 | B | 12 |
-------- --------- --------
| 1/3/18 | B | 0 |
-------- --------- --------
| 1/4/18 | B | 0 |
-------- --------- --------
| 1/5/18 | B | 0 |
-------- --------- --------
| 1/1/18 | C | 10 |
-------- --------- --------
| 1/2/18 | C | 0 |
-------- --------- --------
| 1/3/18 | C | 0 |
-------- --------- --------
| 1/4/18 | C | 0 |
-------- --------- --------
| 1/5/18 | C | 0 |
-------- --------- --------
| 1/1/18 | D | 5 |
-------- --------- --------
| 1/2/18 | D | 12 |
-------- --------- --------
| 1/3/18 | D | 15 |
-------- --------- --------
| 1/4/18 | D | 25 |
-------- --------- --------
| 1/5/18 | D | 0 |
-------- --------- --------
| 1/1/18 | E | 18 |
-------- --------- --------
| 1/2/18 | E | 0 |
-------- --------- --------
| 1/3/18 | E | 0 |
-------- --------- --------
| 1/4/18 | E | 0 |
-------- --------- --------
| 1/5/18 | E | 0 |
-------- --------- --------
| 1/1/18 | F | 0 |
-------- --------- --------
| 1/2/18 | F | 0 |
-------- --------- --------
| 1/3/18 | F | 0 |
-------- --------- --------
| 1/4/18 | F | 0 |
-------- --------- --------
| 1/5/18 | F | 0 |
-------- --------- --------
| 1/1/18 | G | 21 |
-------- --------- --------
| 1/2/18 | G | 10 |
-------- --------- --------
| 1/3/18 | G | 0 |
-------- --------- --------
| 1/4/18 | G | 0 |
-------- --------- --------
| 1/5/18 | G | 0 |
-------- --------- --------
A table in this format is easier for me to do further analysis with the whole population in one table.
I would like to code this in R, ideally using tidyverse. I can't wrap my head around how to achieve this. Appreciate any insights and help into this. Thanks so much!
CodePudding user response:
We may need complete
here
library(dplyr)
library(tidyr)
visits %>%
complete(date, user_id = location$user_id, fill = list(visits = 0))