I'd like to do association analysis using apriori algorithm.
To do that, I have to make a dataset.
What I have data is like this.
data.frame("order_number"=c("100145", "100155", "100155", "100155",
"500002", "500002", "500002", "500007"),
"order_item"=c("27684535","15755576",
"1357954","124776249","12478324","15755576","13577","27684535"))
order_number order_item
1 100145 27684535
2 100155 15755576
3 100155 1357954
4 100155 124776249
5 500002 12478324
6 500002 15755576
7 500002 13577
8 500007 27684535
and I want to transfer the data like this
data.frame("order_number"=c("100145","100155","500002","500007"),
"col1"=c("27684535","15755576","12478324","27684535"),
"col2"=c(NA,"1357954","15755576",NA),
"col3"=c(NA,"124776249","13577",NA))
order_number col1 col2 col3
1 100145 27684535 <NA> <NA>
2 100155 15755576 1357954 124776249
3 500002 12478324 15755576 13577
4 500007 27684535 <NA> <NA>
Thank you for your help.
CodePudding user response:
This would be a case of pivot_wider
(or other functions for changing column layout). First step would be creating a row id variable to note whether each is 1, 2 or 3, then shaping this into the dataframe you want:
df <- data.frame("order_number"=c("100145", "100155", "100155", "100155",
"500002", "500002", "500002", "500007"),
"order_item"=c("27684535","15755576",
"1357954","124776249","12478324","15755576","13577","27684535"))
library(tidyr)
library(dplyr)
df |>
group_by(order_number) |>
mutate(rank = row_number()) |>
pivot_wider(names_from = rank, values_from = order_item,
names_prefix = "col")
#> # A tibble: 4 × 4
#> # Groups: order_number [4]
#> order_number col1 col2 col3
#> <chr> <chr> <chr> <chr>
#> 1 100145 27684535 <NA> <NA>
#> 2 100155 15755576 1357954 124776249
#> 3 500002 12478324 15755576 13577
#> 4 500007 27684535 <NA> <NA>