Here's an ice cream shop data frame in R, that shows the flavors at a store month to month.
df <- data.frame(date = as.Date(c(rep("2022-01-01", 3),
rep("2022-02-01", 3),
rep("2022-03-01", 4))),
flavor = c("Almond", "Apple", "Apricot",
"Almond", "Maple", "Mint",
"Almond", "Maple", "Mint", "Pumpkin"))
#> date flavor
#> 1 2022-01-01 Almond
#> 2 2022-01-01 Apple
#> 3 2022-01-01 Apricot
#> 4 2022-02-01 Almond
#> 5 2022-02-01 Maple
#> 6 2022-02-01 Mint
#> 7 2022-03-01 Almond
#> 8 2022-03-01 Maple
#> 9 2022-03-01 Mint
#> 10 2022-03-01 Pumpkin
I created a script to show the ice cream flavors added for any particular month. You can also see flavors removed, as a list. Note that there were not any flavors removed in March (2022-03-01
, flavors.removed
equals <chr [0]>
).
library(dplyr)
library(tidyr)
df %>%
group_by(date) %>%
summarize(flavors = list(flavor)) %>%
mutate(flavors.added = mapply(setdiff, flavors, lag(flavors)),
flavors.removed = mapply(setdiff, lag(flavors), flavors)) %>%
ungroup %>%
select(-flavors) %>%
unnest_longer(flavors.added)
#> # A tibble: 6 x 3
#> date flavors.added flavors.removed
#> <date> <chr> <list>
#> 1 2022-01-01 Almond <NULL>
#> 2 2022-01-01 Apple <NULL>
#> 3 2022-01-01 Apricot <NULL>
#> 4 2022-02-01 Maple <chr [2]>
#> 5 2022-02-01 Mint <chr [2]>
#> 6 2022-03-01 Pumpkin <chr [0]>
When I attempt to capture information on the flavors removed via a call to unnest_longer(flavors.removed)
I end up inadvertently filtering out all information for 2022-03-01
because the flavors.removed
list is empty (<chr [0]>
) for this 2022-03-01
time period.
library(dplyr)
library(tidyr)
df %>%
group_by(date) %>%
summarize(flavors = list(flavor)) %>%
mutate(flavors.added = mapply(setdiff, flavors, lag(flavors)),
flavors.removed = mapply(setdiff, lag(flavors), flavors)) %>%
ungroup %>%
select(-flavors) %>%
unnest_longer(flavors.added) %>%
unnest_longer(flavors.removed) %>%
pivot_longer(-date, names_to = "type", values_to = "flavor") %>%
arrange(date, type) %>%
unique()
#> # A tibble: 8 x 3
#> date type flavor
#> <date> <chr> <chr>
#> 1 2022-01-01 flavors.added Almond
#> 2 2022-01-01 flavors.added Apple
#> 3 2022-01-01 flavors.added Apricot
#> 4 2022-01-01 flavors.removed NA
#> 5 2022-02-01 flavors.added Maple
#> 6 2022-02-01 flavors.added Mint
#> 7 2022-02-01 flavors.removed Apple
#> 8 2022-02-01 flavors.removed Apricot
Is there a better approach to individually identify flavors added and removed, month to month? I need to regain the ninth row as shown below, that gets filtered out utilizing my flawed approach.
#> # A tibble: 9 x 3
#> date type flavor
#> <date> <chr> <chr>
#> 1 2022-01-01 flavors.added Almond
#> 2 2022-01-01 flavors.added Apple
#> 3 2022-01-01 flavors.added Apricot
#> 4 2022-01-01 flavors.removed NA
#> 5 2022-02-01 flavors.added Maple
#> 6 2022-02-01 flavors.added Mint
#> 7 2022-02-01 flavors.removed Apple
#> 8 2022-02-01 flavors.removed Apricot
#> 9 2022-03-01 flavors.added Pumpkin
CodePudding user response:
A possible solution if you don't need the NA
on row 4:
df %>%
group_by(date) %>%
summarize(flavors = list(flavor)) %>%
mutate(flavors.added = mapply(setdiff, flavors, lag(flavors)),
flavors.removed = mapply(setdiff, lag(flavors), flavors)) %>%
ungroup %>%
select(-flavors) %>%
pivot_longer(-date, names_to = "type", values_to = "flavor") %>%
unnest(flavor)
# A tibble: 8 × 3
date type flavor
<date> <chr> <chr>
1 2022-01-01 flavors.added Almond
2 2022-01-01 flavors.added Apple
3 2022-01-01 flavors.added Apricot
4 2022-02-01 flavors.added Maple
5 2022-02-01 flavors.added Mint
6 2022-02-01 flavors.removed Apple
7 2022-02-01 flavors.removed Apricot
8 2022-03-01 flavors.added Pumpkin
CodePudding user response:
I find it more straightforward to find the added and removed flavors separately, and then bind them together at the end if you need to.
You can use tidyr::unchop(keep_empty = TRUE)
to keep from dropping the empty rows in this case.
library(tidyverse)
df <- tibble(
date = as.Date(c(
rep("2022-01-01", 3),
rep("2022-02-01", 3),
rep("2022-03-01", 4)
)),
flavor = c(
"Almond", "Apple", "Apricot",
"Almond", "Maple", "Mint",
"Almond", "Maple", "Mint", "Pumpkin"
)
)
df
#> # A tibble: 10 × 2
#> date flavor
#> <date> <chr>
#> 1 2022-01-01 Almond
#> 2 2022-01-01 Apple
#> 3 2022-01-01 Apricot
#> 4 2022-02-01 Almond
#> 5 2022-02-01 Maple
#> 6 2022-02-01 Mint
#> 7 2022-03-01 Almond
#> 8 2022-03-01 Maple
#> 9 2022-03-01 Mint
#> 10 2022-03-01 Pumpkin
flavors <- df %>%
group_by(date) %>%
summarize(flavors = list(flavor)) %>%
ungroup()
flavors
#> # A tibble: 3 × 2
#> date flavors
#> <date> <list>
#> 1 2022-01-01 <chr [3]>
#> 2 2022-02-01 <chr [3]>
#> 3 2022-03-01 <chr [4]>
# Find added flavors
added <- flavors %>%
mutate(added = mapply(setdiff, flavors, lag(flavors)), .keep = "unused") %>%
unchop(added, keep_empty = TRUE) %>%
pivot_longer(added, names_to = "type", values_to = "flavor")
# Find removed flavors
removed <- flavors %>%
mutate(removed = mapply(setdiff, lag(flavors), flavors), .keep = "unused") %>%
unchop(removed, keep_empty = TRUE) %>%
pivot_longer(removed, names_to = "type", values_to = "flavor")
added
#> # A tibble: 6 × 3
#> date type flavor
#> <date> <chr> <chr>
#> 1 2022-01-01 added Almond
#> 2 2022-01-01 added Apple
#> 3 2022-01-01 added Apricot
#> 4 2022-02-01 added Maple
#> 5 2022-02-01 added Mint
#> 6 2022-03-01 added Pumpkin
removed
#> # A tibble: 4 × 3
#> date type flavor
#> <date> <chr> <chr>
#> 1 2022-01-01 removed <NA>
#> 2 2022-02-01 removed Apple
#> 3 2022-02-01 removed Apricot
#> 4 2022-03-01 removed <NA>
bind_rows(added, removed) %>%
arrange(date, type)
#> # A tibble: 10 × 3
#> date type flavor
#> <date> <chr> <chr>
#> 1 2022-01-01 added Almond
#> 2 2022-01-01 added Apple
#> 3 2022-01-01 added Apricot
#> 4 2022-01-01 removed <NA>
#> 5 2022-02-01 added Maple
#> 6 2022-02-01 added Mint
#> 7 2022-02-01 removed Apple
#> 8 2022-02-01 removed Apricot
#> 9 2022-03-01 added Pumpkin
#> 10 2022-03-01 removed <NA>
Created on 2022-06-02 by the reprex package (v2.0.1)