Im trying to Create a column with the string that appears the most in the row and create another column with the number of times this most prevalent string appeared.
To facilitate my question this is what im trying to achieve:
My actual DF
What im trying to obtain: most prevalente category and count
example df: d
f <- data.frame(ID = 1:4,
V1 = c("A","B","C","D"),
V2 = c("A", "B","D","B"),
V3 = c("A","C","D","B"))
CodePudding user response:
Here is another way:
count <- sapply(apply(f[, -1], 1, table), max)
count
# [1] 3 2 2 2
category <- names(sapply(apply(f[, -1], 1, table), which.max))
category
# [1] "A" "B" "D" "B"
f2 <- data.frame(f, category, count)
f2
# ID V1 V2 V3 category count
# 1 1 A A A A 3
# 2 2 B B C B 2
# 3 3 C D D D 2
# 4 4 D B B B 2
CodePudding user response:
df <- data.frame(ID = 1:4,
V1 = c("A","B","C","D"),
V2 = c("A", "B","D","B"),
V3 = c("A","C","D","B"))
library(data.table)
setDT(df)
other <- melt(df, id.vars = "ID", measure.vars = c("V1", "V2", "V3"))
other <- other[, .N, by = .(ID, value)]
colnames(other) <- c("ID", "category", "count")
other <- other[, .SD[which.max(count)], by = .(ID)]
res <- merge(df, other, by = c("ID"))
res
CodePudding user response:
- We can use
dplyr
rowwise
function to applytable
to each row fromV1:V3
library(dplyr)
df |> rowwise() |>
mutate(category = names(table(c_across(V1:V3)))[which.max(table(c_across(V1:V3)))] ,
count = max(table(c_across(V1:V3))))
- Output
# A tibble: 4 × 6
# Rowwise:
ID V1 V2 V3 category count
<int> <chr> <chr> <chr> <chr> <int>
1 1 A A A A 3
2 2 B B C B 2
3 3 C D D D 2
4 4 D B B B 2