I currently have this table and I want to sum the total number of purchases per each ID.
Input:
id | purchases | time |
---|---|---|
a | need | 1:00 |
a | want | 1:30 |
a | none | 2:00 |
b | need | 1:15 |
b | want | 1:30 |
c | none | 1:10 |
c | none | 1:30 |
d | none | 2:00 |
d | need | 2:10 |
d | want | 2:15 |
d | none | 2:35 |
e | none | 3:10 |
e | none | 3:50 |
f | need | 2:55 |
f | want | 3:15 |
f | need | 3:20 |
the purchases column was primarily not existent and instead there were item names. so I created this column first and then proceeded to try to reach the below output
Desired first output: total items bought, number of needs and wants separately, the output column is yes if first purchase is a need, no if it isn't, none if there were no purchases
id | total | need | want | output |
---|---|---|---|---|
a | 2 | 1 | 1 | yes |
b | 2 | 1 | 1 | yes |
c | 0 | 0 | 0 | none |
d | 2 | 1 | 1 | no |
e | 0 | 0 | 0 | none |
f | 3 | 2 | 1 | yes |
I am using dplyr so I would appreciate the suggested code to be feasible for adding in a dplyr pipeline.
What I tried to do
actions %>% group_by (id) %>% arrange(id) %>%
mutate(purchases = ifelse(type == "Buy" & obj_category == "Books" | type == "Buy" & obj_category == "Car" | type=="Buy" & obj_category == "Business" | type == "Buy", "need",
ifelse(type == "Buy" & obj_category == "Sweets" | type == "Buy" & obj_category == "Electronics" | type == "Buy" & obj_category == "Business" | type == "Buy" & obj_category == "House", "want", "none"))) %>%
summarise(need = ifelse(purchases == "need", 1, 0),
want = ifelse(purchases == "want", 1, 0))
thank you in advance
CodePudding user response:
You could try
library(dplyr)
df %>%
group_by(id) %>%
summarise(need = sum(purchases == "need"),
want = sum(purchases == "want"),
total = need want,
output = case_when(first(purchases) == "need" ~ "yes",
total == 0 ~ "none",
TRUE ~ "no"))
# # A tibble: 6 × 5
# id need want total output
# <chr> <int> <int> <int> <chr>
# 1 a 1 1 2 yes
# 2 b 1 1 2 yes
# 3 c 0 0 0 none
# 4 d 1 1 2 no
# 5 e 0 0 0 none
# 6 f 2 1 3 yes
A general version if there are more categories in purchases
:
library(dplyr)
library(janitor)
df %>%
tabyl(id, purchases) %>%
select(-none) %>%
adorn_totals("col") %>%
left_join(
df %>% group_by(id) %>%
summarise(output = case_when(purchases[1] == "need" ~ "yes",
all(purchases == "none") ~ "none",
TRUE ~ "no")))
Data
df <- structure(list(id = c("a", "a", "a", "b", "b", "c", "c", "d",
"d", "d", "d", "e", "e", "f", "f", "f"), purchases = c("need",
"want", "none", "need", "want", "none", "none", "none", "need",
"want", "none", "none", "none", "need", "want", "need"), time = c("1:00",
"1:30", "2:00", "1:15", "1:30", "1:10", "1:30", "2:00", "2:10",
"2:15", "2:35", "3:10", "3:50", "2:55", "3:15", "3:20")), class = "data.frame", row.names = c(NA, -16L))
CodePudding user response:
Here's a solution with dplyr
and janitor
:
library(dplyr)
library(janitor)
df %>%
janitor::tabyl(id, purchases) %>%
left_join(df %>% group_by(id) %>% slice(1), by = "id") %>%
rowwise() %>%
mutate(total = sum(c_across(need:want))) %>%
ungroup() %>%
mutate(purchases = ifelse(purchases == "need", "yes", "no"),
purchases = ifelse(total == 0, "none", purchases)) %>%
select(-c(time, total))
Which gives:
# A tibble: 6 × 5
id need none want purchases
<chr> <dbl> <dbl> <dbl> <chr>
1 a 1 1 1 yes
2 b 1 0 1 yes
3 c 0 2 0 no
4 d 1 2 1 no
5 e 0 2 0 no
6 f 2 0 1 yes