I have been so frustrated to count the number of clients who made conversion or not in the 3 single categories (app, desktop, web) scenarios and the rest of the overlapping categories (app & Web, app & desktop, web & desktop, app & web & desktop) scenarios. Here is the sample dataset I am working on.
I could figure out the single category count with the aggregate and group_by function in r, however, I can't really figure out how to work on the overlap categories.
Really really thanks so much if someone could help me on this!!! Thanks!!!
df <- data.frame(list(ClientID = c("1", "1", "1", "2", "2", "3", "3" , "3" , "3" , "4" ),
device = c("App", "Web", "App", "Web", "Web", "App", "Desktop", "App", "App", "Web"),
conversion = c("0", "0", "0", "0", "1", "1", "0", "1", "0", "1")) )
Below is the desired outcome:
Scenario With Conversion Without Conversion
App
Web
Desktop
App & Web
App & Desktop
Web & Desktop
App & Desktop & Web
CodePudding user response:
You can prepare a list of device combinations (i.e. all 7 possibilities), and then use setequal()
and unique()
in a helper function, like this
- list of device combinations, using
combn()
device_combinations = unlist(
sapply(1:3, \(i) combn(c("App", "Web" ,"Desktop"), i, simplify = F)),
recursive = F
)
- Helper function,
f
(takes a list of devices, and indicates whether the unique list of devices meets each of the device combinations (it can meet at max one only, by usingunique()
andsetequal()
f <- function(d) sapply(device_combinations, \(ds) setequal(ds, unique(d)))
- Now apply the function by
conversion
andClientID
, unnest, count byconversion
andScenario
and pivot to your desired wide format
df %>%
group_by(conversion,ClientID) %>%
summarize(v = f(device), .groups="drop") %>%
unnest(v) %>%
mutate(
Scenario=rep(sapply(device_combinations,paste,collapse=","),length.out=n()),
conversion =if_else(conversion=="0", "Without Conversion", "With Conversion")
) %>%
group_by(Scenario, conversion) %>%
summarize(ct = sum(v), .groups="drop") %>%
pivot_wider(id_cols = Scenario, names_from = conversion,values_from = ct)
Output:
Scenario `With Conversion` `Without Conversion`
<chr> <int> <int>
1 App 1 0
2 App,Desktop 0 1
3 App,Web 0 1
4 App,Web,Desktop 0 0
5 Desktop 0 0
6 Web 2 1
7 Web,Desktop 0 0
CodePudding user response:
I thought I would add my answer in addition to the one that's already here.
Updated based on your comment
This doesn't look all that neat, but it does do what you're expecting. Although the other answer looks far better.
I started by collecting the count of unique customers by conversion by device.
library(tidyverse)
dfn <- map(c("0", "1"),
function(k) {
with(df[df$conversion == k,], table(device, ClientID)) %>%
as.data.frame() %>% filter(Freq > 0) %>% select(-Freq) %>%
distinct() %>% group_by(device) %>% summarise(cnt = n())
}
)
dfa <- data.frame(device = unique(df$device)) %>%
left_join(., dfn[[1]]) %>% setNames(., c("device", "0")) %>%
left_join(., dfn[[2]])
names(dfa)[3] <- "1"
dfa[is.na(dfa)] <- 0
dfa
# device 0 1
# 1 App 2 1
# 2 Web 2 2
# 3 Desktop 1 0
Then I wanted the combinations. There are only three here, so I could probably write the combinations faster than coding them. However, I've provided a more dynamic approach.
dbls = RcppAlgos::comboGeneral(dfa$device, 2)
# [,1] [,2]
# [1,] App Desktop
# [2,] App Web
# [3,] Desktop Web
# Levels: App Desktop Web
Now I'll use data frame df
to calculate the counts for the combinations determined in dbls
.
dfb <- map2_dfr(rep(1:nrow(dbls), 2), rep(c("0", "1"), 3),
function(x, y){
gimme = with(df[df$conversion == y, ], table(device, ClientID)) %>%
as.data.frame() %>%
filter(Freq > 0, device %in% c(dbls[x, 1], dbls[x, 2])) %>%
select(-Freq) %>% distinct() %>% group_by(ClientID) %>%
mutate(cnt = n()) %>% filter(cnt == 2)
if(nrow(gimme) == 0){
c(device = paste0(dbls[x, 1], " & ", dbls[x, 2]),
wh = y, cnt = 0)
} else {
c(device = paste0(dbls[x, 1], " & ", dbls[x, 2]),
wh = y, cnt = length(unique(gimme$ClientID)))
}
}) %>% pivot_wider(names_from = "wh", values_from = "cnt")
# # A tibble: 3 × 3
# device `0` `1`
# <chr> <chr> <chr>
# 1 App & Desktop 1 0
# 2 App & Web 1 0
# 3 Desktop & Web 0 0
Last, but not least, I combined the two frames.
I don't know which conversion (0 or 1) is with or without, so I just left the markers.
rbind(dfa, dfb)
# device 0 1
# 1 App 2 1
# 2 Web 2 2
# 3 Desktop 1 0
# 4 App & Desktop 1 0
# 5 App & Web 1 0
# 6 Desktop & Web 0 0