I have the following dataset:
ID<-c("A","B","C","D","E")
Fruits1<-c("orange","apple","pineapple","apple","pineapple")
Fruits2<-c("apple","orange","apple","pineapple","orange")
data<-tibble(ID,Fruits1,Fruits2)
data
# A tibble: 5 × 3
ID Fruits1 Fruits2
<chr> <chr> <chr>
1 A orange apple
2 B apple orange
3 C pineapple apple
4 D apple pineapple
5 E Pineapple orange
I want to create a new column called FruitsDiff, which combines columns Fruits1 and Fruits2 like so:
# A tibble: 5 × 4
ID Fruits1 Fruits2 FruitsDiff
<chr> <chr> <chr> <chr>
1 A orange apple apple-orange
2 B apple orange apple-orange
3 C pineapple apple apple-pineapple
4 D apple pineapple apple-pineapple
5 E pineapple orange orange-pineapple
My requirement for this new column is that the fruits be ordered alphabetically, regardless of which fruit comes first in the data frame (so for example, for line 1, even if orange comes before apple, the variable in FruitsDiff is apple-orange). I usually use paste() to combine columns, but this is not going to help me in this situation.
Any suggestions? Also, the separator between the two variables can be anything, I just used a dash for the sake of the example.
CodePudding user response:
We get the elementwise min/max
between the 'Fruit' columns with pmin/pmax
(where the min/max will be based on alphabetic order) and paste
(str_c
) the output from those functions to create the new column 'FruitsDiff'
library(dplyr)
library(stringr)
data %>%
mutate(FruitsDiff = str_c(pmin(Fruits1, Fruits2),
pmax(Fruits1, Fruits2), sep = '-'))
-output
# A tibble: 5 × 4
ID Fruits1 Fruits2 FruitsDiff
<chr> <chr> <chr> <chr>
1 A orange apple apple-orange
2 B apple orange apple-orange
3 C pineapple apple apple-pineapple
4 D apple pineapple apple-pineapple
5 E pineapple orange orange-pineapple
Or in base R
, use the same pmin/pmax
to paste
data$FruitsDiff <- with(data, paste(pmin(Fruits1, Fruits2),
pmax(Fruits1, Fruits2), sep = '-'))
Or with apply
and MARGIN = 1
, loop over the rows, sort
the elements and paste
with collapse
as argument
data$FruitsDiff <- apply(data[-1], 1, \(x) paste(sort(x), collapse = '-'))