I have a table where the rows are transactions and columns are products bought in that transaction.
An example:
Date | Transaction | product 1 | product 2 | product 3 |
---|---|---|---|---|
1-1 | 1 | Apple | Banana | |
1-1 | 2 | Orange | ||
1-1 | 3 | Apple | Orange | |
1-2 | 4 | |||
1-2 | 5 | Apple | Banana | |
1-3 | 6 | orange | Apple | Banana |
I want to know how many products were sold on what day. So that I will get a table as such:
Date | Apple | Banana | Orange |
---|---|---|---|
1-1 | 2 | 1 | 2 |
1-2 | 1 | 1 | |
1-3 | 1 | 1 | 1 |
The challenge is, however, that there are a lot of products. Hence I won't be able to manually type them out in order for them to count.
Is anyone able to help me out? Thanks in advance!
CodePudding user response:
library(tidyverse)
data <- tribble(
~Date, ~Transaction, ~product.1, ~product.2, ~product.3,
"1-1", 1L, "Apple", "Banana", NA,
"1-1", 2L, "Orange", NA, NA,
"1-1", 3L, "Apple", "Orange", NA,
"1-2", 4L, NA, NA, NA,
"1-2", 5L, "Apple", "Banana", NA,
"1-3", 6L, "Orange", "Apple", "Banana"
)
data %>%
pivot_longer(-c(Date, Transaction)) %>%
count(Date, value) %>%
pivot_wider(names_from = value, values_from = n, values_fill = list(n = 0))
#> # A tibble: 3 x 5
#> Date Apple Banana Orange `NA`
#> <chr> <int> <int> <int> <int>
#> 1 1-1 2 1 2 4
#> 2 1-2 1 1 0 4
#> 3 1-3 1 1 1 0
Created on 2021-12-08 by the reprex package (v2.0.1)
CodePudding user response:
We could use base R
with table
- unlist
the 'product' columns to a vector
while rep
licating the 'Date' by the number of 'product' columns and use table
table(rep(df1$Date, 3), unlist(df1[3:5]))
Apple Banana Orange
1-1 2 1 2
1-2 1 1 0
1-3 1 1 1
CodePudding user response:
Here is a version with pivoting: @akrun: I tried values_fill = list(n = 0)
but gives Error: Can't convert to .
df %>%
select(-Transaction) %>%
pivot_longer(
-Date
) %>%
na.omit() %>%
group_by(Date) %>%
add_count(value) %>%
select(-name) %>%
pivot_wider(
names_from = value,
values_from = n,
values_fn = list
) %>%
unnest(cols = c(Apple, Banana, Orange)) %>%
distinct()
Date Apple Banana Orange
<chr> <int> <int> <int>
1 1-1 2 1 2
2 1-2 1 1 NA
3 1-3 1 1 1