How to Calculate the highest text frequency by column in R-CodePudding

I have a data like this:

how to change become:

I need some helps to fit it...

CodePudding user response：

Here is a dplyr option

library(dplyr)
df %>%
    group_by(X1) %>%
    summarise(
        X3 = names(sort(table(X2), decreasing = TRUE))[1],
        X2 = toString(X2),
        .groups = "drop") %>%
    select(X1, X2, X3)
## A tibble: 3 × 3
#  X1    X2                        X3    
#  <chr> <chr>                     <chr> 
#1 A     high, high                high  
#2 B     middle, middle, high      middle
#3 C     low, middle, middle, high middle

Note: Since you're overwriting X2, we need to carry out the calculation for X3 first and then summarise X2. The last select call reorders columns to match your expected output.

Note also that this might need some careful checking of edge cases on your side to deal with ties in word frequencies.

Sample data

df <- data.frame(
    X1 = c("A", "A", "B", "B", "B", "C", "C", "C", "C"),
    X2 = c("high", "high", "middle", "middle", "high", "low", "middle", "middle", "high"))

CodePudding user response：

Using base, group by X1, collapse the strings - toString, and get the most frequent string - table:

aggregate(X2 ~ X1, df, function(i) c(toString(i), names(sort(-table(i)))[ 1 ]))
#   X1                      X2.1   X2.2
# 1  A                high, high   high
# 2  B      middle, middle, high middle
# 3  C low, middle, middle, high middle