I have a data like this:
how to change become:
I need some helps to fit it...
CodePudding user response:
Here is a dplyr
option
library(dplyr)
df %>%
group_by(X1) %>%
summarise(
X3 = names(sort(table(X2), decreasing = TRUE))[1],
X2 = toString(X2),
.groups = "drop") %>%
select(X1, X2, X3)
## A tibble: 3 × 3
# X1 X2 X3
# <chr> <chr> <chr>
#1 A high, high high
#2 B middle, middle, high middle
#3 C low, middle, middle, high middle
Note: Since you're overwriting X2
, we need to carry out the calculation for X3
first and then summarise X2
. The last select
call reorders columns to match your expected output.
Note also that this might need some careful checking of edge cases on your side to deal with ties in word frequencies.
Sample data
df <- data.frame(
X1 = c("A", "A", "B", "B", "B", "C", "C", "C", "C"),
X2 = c("high", "high", "middle", "middle", "high", "low", "middle", "middle", "high"))
CodePudding user response:
Using base, group by X1, collapse the strings - toString, and get the most frequent string - table:
aggregate(X2 ~ X1, df, function(i) c(toString(i), names(sort(-table(i)))[ 1 ]))
# X1 X2.1 X2.2
# 1 A high, high high
# 2 B middle, middle, high middle
# 3 C low, middle, middle, high middle