Summarize output of table to simple columns-CodePudding

I currently have this table and I want to sum the total number of purchases per each ID.

Input:

id	purchases	time
a	need	1:00
a	want	1:30
a	none	2:00
b	need	1:15
b	want	1:30
c	none	1:10
c	none	1:30
d	none	2:00
d	need	2:10
d	want	2:15
d	none	2:35
e	none	3:10
e	none	3:50
f	need	2:55
f	want	3:15
f	need	3:20

the purchases column was primarily not existent and instead there were item names. so I created this column first and then proceeded to try to reach the below output

Desired first output: total items bought, number of needs and wants separately, the output column is yes if first purchase is a need, no if it isn't, none if there were no purchases

id	total	need	want	output
a	2	1	1	yes
b	2	1	1	yes
c	0	0	0	none
d	2	1	1	no
e	0	0	0	none
f	3	2	1	yes

I am using dplyr so I would appreciate the suggested code to be feasible for adding in a dplyr pipeline.

What I tried to do

actions %>% group_by (id) %>% arrange(id) %>%
  mutate(purchases = ifelse(type == "Buy" & obj_category == "Books" | type == "Buy" & obj_category == "Car" | type=="Buy" & obj_category == "Business" | type == "Buy", "need",
                            ifelse(type == "Buy" & obj_category == "Sweets" | type == "Buy" & obj_category == "Electronics" | type == "Buy" & obj_category == "Business" | type == "Buy" & obj_category == "House", "want", "none"))) %>%
  summarise(need = ifelse(purchases == "need", 1, 0),
            want = ifelse(purchases == "want", 1, 0))

thank you in advance

CodePudding user response：

You could try

library(dplyr)

df %>%
  group_by(id) %>%
  summarise(need = sum(purchases == "need"),
            want = sum(purchases == "want"),
            total = need   want,
            output = case_when(first(purchases) == "need" ~ "yes",
                               total == 0 ~ "none",
                               TRUE ~ "no"))

# # A tibble: 6 × 5
#   id     need  want total output
#   <chr> <int> <int> <int> <chr> 
# 1 a         1     1     2 yes   
# 2 b         1     1     2 yes   
# 3 c         0     0     0 none  
# 4 d         1     1     2 no    
# 5 e         0     0     0 none  
# 6 f         2     1     3 yes

A general version if there are more categories in purchases:

library(dplyr)
library(janitor)

df %>% 
  tabyl(id, purchases) %>%
  select(-none) %>%
  adorn_totals("col") %>%
  left_join(
    df %>% group_by(id) %>%
      summarise(output = case_when(purchases[1] == "need" ~ "yes",
                                   all(purchases == "none") ~ "none",
                                   TRUE ~ "no")))

Data

df <- structure(list(id = c("a", "a", "a", "b", "b", "c", "c", "d", 
"d", "d", "d", "e", "e", "f", "f", "f"), purchases = c("need", 
"want", "none", "need", "want", "none", "none", "none", "need", 
"want", "none", "none", "none", "need", "want", "need"), time = c("1:00", 
"1:30", "2:00", "1:15", "1:30", "1:10", "1:30", "2:00", "2:10", 
"2:15", "2:35", "3:10", "3:50", "2:55", "3:15", "3:20")), class = "data.frame", row.names = c(NA, -16L))

CodePudding user response：

Here's a solution with dplyr and janitor:

library(dplyr)
library(janitor)

df %>% 
  janitor::tabyl(id, purchases) %>% 
  left_join(df %>% group_by(id) %>% slice(1), by = "id") %>% 
  rowwise() %>% 
  mutate(total = sum(c_across(need:want))) %>% 
  ungroup() %>% 
  mutate(purchases = ifelse(purchases == "need", "yes", "no"),
         purchases = ifelse(total == 0, "none", purchases)) %>% 
  select(-c(time, total))

Which gives:

# A tibble: 6 × 5
  id     need  none  want purchases
  <chr> <dbl> <dbl> <dbl> <chr>    
1 a         1     1     1 yes      
2 b         1     0     1 yes      
3 c         0     2     0 no       
4 d         1     2     1 no       
5 e         0     2     0 no       
6 f         2     0     1 yes