I have a dataset sorted by ID, with several fruits per ID. What I want to do is find all possible combinations of 2 fruits within each ID, without repetition (the Apple-Banana combination should be treated as the same as Banana-Apple). An ID with only one fruit should keep that fruit on its own row.
As an example:
ID | Fruit |
---|---|
1 | Apple |
1 | Banana |
1 | Blueberry |
2 | Apple |
3 | Orange |
3 | Banana |
3 | Apple |
3 | Blueberry |
What I want to create is:
ID | Combination |
---|---|
1 | Apple Banana |
1 | Apple Blueberry |
1 | Banana Blueberry |
2 | Apple |
3 | Banana Orange |
3 | Apple Orange |
3 | Blueberry Orange |
3 | Apple Banana |
3 | Banana Blueberry |
3 | Apple Blueberry |
The example dataset:
ID <- c(1,1,1,2,3,3,3,3)
Fruit <- c("Apple","Banana","Blueberry","Apple","Orange","Banana","Apple","Blueberry")
dataset <- data.frame(ID, Fruit)
Answer:
With dplyr, you could use summarise() together with combn():
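library(dplyr)
# One row per pair of fruits within each ID; an ID with a single fruit keeps it as is.
# Note: since dplyr 1.1.0, returning several rows per group from summarise() is
# deprecated (it emits a warning) and reframe() is the recommended replacement.
dataset %>%
  group_by(ID) %>%
  summarise(Fruit = if(n() > 1) combn(Fruit, 2, simplify = FALSE) else list(Fruit))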
# # A tibble: 10 × 2
# # Groups: ID [3]
# ID Fruit
# <dbl> <list>
# 1 1 <chr [2]>
# 2 1 <chr [2]>
# 3 1 <chr [2]>
# 4 2 <chr [1]>
# 5 3 <chr [2]>
# 6 3 <chr [2]>
# 7 3 <chr [2]>
# 8 3 <chr [2]>
# 9 3 <chr [2]>
# 10 3 <chr [2]>
where Fruit is a list-column containing each pair of fruits for each ID.
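The 10 rows in that output can be checked by hand, assuming the example data from the question: an ID with n > 1 fruits yields choose(n, 2) unordered pairs without repetition, and an ID with a single fruit keeps one row. A small base R sketch of that count:

```r
# Example IDs from the question
ID <- c(1, 1, 1, 2, 3, 3, 3, 3)

# Number of fruits per ID: 3, 1 and 4
n_per_id <- as.vector(table(ID))

# n > 1 fruits give choose(n, 2) unordered pairs; a single fruit gives one row
# holding that fruit (choose(1, 2) on its own would be 0)
rows_per_id <- ifelse(n_per_id > 1, choose(n_per_id, 2), 1)

rows_per_id       # 3 1 6
sum(rows_per_id)  # 10
```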
If you want to collapse each pair into a single string, so that Fruit becomes a character column instead of a list-column, just add FUN = toString to combn().
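(Notice the difference between the else branches of the two versions: the first uses else list(Fruit), so that the single-fruit case also returns a list, like combn(..., simplify = FALSE) does; the second uses just else Fruit, because combn(..., FUN = toString) already returns a character vector.)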
dataset %>%
  group_by(ID) %>%
  # FUN = toString collapses each pair into one string, e.g. "Apple, Banana"
  summarise(Fruit = if(n() > 1) combn(Fruit, 2, FUN = toString) else Fruit)
# # A tibble: 10 × 2
# # Groups: ID [3]
# ID Fruit
# <dbl> <chr>
# 1 1 Apple, Banana
# 2 1 Apple, Blueberry
# 3 1 Banana, Blueberry
# 4 2 Apple
# 5 3 Orange, Banana
# 6 3 Orange, Apple
# 7 3 Orange, Blueberry
# 8 3 Banana, Apple
# 9 3 Banana, Blueberry
# 10 3 Apple, Blueberry
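Why the else branches differ can be seen by checking what combn() returns in each case. This is a small base R sketch; the vector fruits is only an illustrative input (the fruits of ID 1), not part of either answer:

```r
# Illustrative input: the fruits of ID 1 in the question
fruits <- c("Apple", "Banana", "Blueberry")

# simplify = FALSE: a list with one length-2 character vector per pair
pairs_list <- combn(fruits, 2, simplify = FALSE)
# FUN = toString: each pair is collapsed into one string, giving a character result
pairs_chr <- combn(fruits, 2, FUN = toString)

is.list(pairs_list)       # TRUE
lengths(pairs_list)       # 2 2 2
is.character(pairs_chr)   # TRUE
as.vector(pairs_chr)      # "Apple, Banana" "Apple, Blueberry" "Banana, Blueberry"

# For an ID with one fruit, the else branch has to produce the same kind of value
single <- "Apple"
is.list(list(single))     # TRUE, like pairs_list
is.character(single)      # TRUE, like pairs_chr
```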
Answer:
Here is a base R option:
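with(
  dataset,
  # by() applies the function to the Fruit values of each ID; combn() takes pairs,
  # or single elements when an ID has only one fruit (pmin(2, length(x)) is then 1).
  # stack() turns the per-ID results into a two-column data frame and rev() puts
  # the ID column (ind) first.
  rev(stack(by(Fruit, ID, function(x) as.vector(combn(x, pmin(2, length(x)), toString)))))
)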
which gives
ind values
1 1 Apple, Banana
2 1 Apple, Blueberry
3 1 Banana, Blueberry
4 2 Apple
5 3 Orange, Banana
6 3 Orange, Apple
7 3 Orange, Blueberry
8 3 Banana, Apple
9 3 Banana, Blueberry
10 3 Apple, Blueberry
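If you need exactly the layout from the question (each pair in alphabetical order, separated by a space, with ID and Combination columns), a variation of the same base R idea is sketched below. It swaps by()/stack() for split()/lapply(), and pair_strings is a hypothetical helper name, not part of base R:

```r
ID <- c(1, 1, 1, 2, 3, 3, 3, 3)
Fruit <- c("Apple", "Banana", "Blueberry", "Apple", "Orange", "Banana", "Apple", "Blueberry")
dataset <- data.frame(ID, Fruit, stringsAsFactors = FALSE)

# Hypothetical helper: every unordered pair for one ID, sorted alphabetically and
# joined with a space; a single fruit (m = 1) is returned on its own
pair_strings <- function(x) {
  x <- as.character(x)  # work on plain strings, in case Fruit is a factor
  as.vector(combn(x, min(2, length(x)), function(p) paste(sort(p), collapse = " ")))
}

# split() groups the fruits by ID; unlist() flattens the per-ID results in ID order
per_id <- lapply(split(dataset$Fruit, dataset$ID), pair_strings)
res <- data.frame(
  ID = rep(as.numeric(names(per_id)), lengths(per_id)),
  Combination = unlist(per_id, use.names = FALSE),
  stringsAsFactors = FALSE
)
res
```

Sorting inside each pair is what turns "Orange Banana" into "Banana Orange", so the rows match the expected table in the question, in the same order.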
Answer:
For reference, here is an explicit loop in base R:
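uniID = unique(dataset$ID)
res = NULL
# Loop over the ID values themselves (not their positions), so this also works
# when the IDs are not 1, 2, 3, ...
for (id in uniID)
{
  sameIDdf = dataset[dataset$ID == id, ]
  x = nrow(sameIDdf)
  if (x > 1)
  {
    # Each row of comb holds the row indices of one unordered pair of fruits
    comb = t(combn(1:x, 2))
    for (i in 1:nrow(comb))
    {
      res = rbind(res, data.frame(ID = id, Combination = paste(sameIDdf[comb[i, 1], 'Fruit'], sameIDdf[comb[i, 2], 'Fruit'])))
    }
  } else
  {
    # Only one fruit for this ID: keep it as its own row
    res = rbind(res, data.frame(ID = id, Combination = sameIDdf[1, 'Fruit']))
  }
}
res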
Result:
ID Combination
1 Apple Banana
1 Apple Blueberry
1 Banana Blueberry
2 Apple
3 Orange Banana
3 Orange Apple
3 Orange Blueberry
3 Banana Apple
3 Banana Blueberry
3 Apple Blueberry
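Calling rbind() inside the loop copies the growing data frame on every iteration. A common alternative, sketched here under the same example data, keeps the loop but stores one data frame per ID in a preallocated list and binds them once at the end; pieces and fruits are illustrative names:

```r
ID <- c(1, 1, 1, 2, 3, 3, 3, 3)
Fruit <- c("Apple", "Banana", "Blueberry", "Apple", "Orange", "Banana", "Apple", "Blueberry")
dataset <- data.frame(ID, Fruit, stringsAsFactors = FALSE)

uniID <- unique(dataset$ID)
# One data frame per ID is stored here and combined once after the loop
pieces <- vector("list", length(uniID))
for (k in seq_along(uniID)) {
  fruits <- dataset$Fruit[dataset$ID == uniID[k]]
  if (length(fruits) > 1) {
    comb <- combn(fruits, 2)  # character matrix, one pair per column
    combination <- paste(comb[1, ], comb[2, ])
  } else {
    combination <- fruits     # single fruit kept on its own
  }
  pieces[[k]] <- data.frame(ID = uniID[k], Combination = combination,
                            stringsAsFactors = FALSE)
}
res <- do.call(rbind, pieces)
res
```

The rows come out in the same order as the loop above, since combn() enumerates the pairs in the same way whether it is given row indices or the fruit names directly.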