I have data set similar to this:
df_out <- data.frame(
"name" = c("1", "2", "3", "4", "5", "6", "7", "8"),
"Factor1"=rep(c("A","B","C"),times= c(2,1,5)),
"col3"=rep(c("T","S"),times= c(2,6)),
"col4"=rep(c("E","D"),times= c(6,2)))
df_out
I want to mutate all the columns and assign new value to them based on their counts so for all the columns I want the value with the highest occurrence to be the consensus and all of the rest values to be the non consensus and NA stay as it. So the output similar to this:
df_out2 <- data.frame(
"name" = c("1", "2", "3", "4", "5", "6", "7", "8"),
"Factor1"=rep(c("non-consensus","consensus"),times= c(3,5)),
"col3"=rep(c("non-consensus","consensus"),times= c(2,6)),
"col4"=rep(c("consensus","non-consensus"),times= c(6,2)))
df_out2
Any help is appreciated.
CodePudding user response:
You could do
library(tidyverse)
df_out %>%
mutate(across(Factor1:col4,
~ ifelse(.x == names(rev(sort(table(.x))))[1], "consensus", "non-consensus")))
#> name Factor1 col3 col4
#> 1 1 non-consensus non-consensus consensus
#> 2 2 non-consensus non-consensus consensus
#> 3 3 non-consensus consensus consensus
#> 4 4 consensus consensus consensus
#> 5 5 consensus consensus consensus
#> 6 6 consensus consensus consensus
#> 7 7 consensus consensus non-consensus
#> 8 8 consensus consensus non-consensus
Created on 2022-12-11 with reprex v2.0.2
CodePudding user response:
a base R approach:
df_out2 <-
cbind(df_out$name,
df_out[-1] |> ## don't manipulate name column
lapply(function(column){ ## apply this function to each column
level_counts = table(column) ## count observations per factor level
ifelse(level_counts[column] == max(level_counts),
'consensus', 'non consensus'
)
}
) |> as.data.frame() ## convert list of columns to data frame
)
df_out$name Factor1 col3 col4
1 1 non consensus non consensus consensus
2 2 non consensus non consensus consensus
3 3 non consensus consensus consensus
4 4 consensus consensus consensus
5 5 consensus consensus consensus
6 6 consensus consensus consensus
7 7 consensus consensus non consensus
8 8 consensus consensus non consensus
CodePudding user response:
Here is one solution with pivoting: The key points are setting the grouping variable at the right position and remove the n
to get the desired solution:
add_count
is same as group_by(...) and mutate
library(dplyr)
library(tidyr)
df_out %>%
pivot_longer(-name,
names_to = "Factors",
values_to= "Values") %>%
add_count(Factors, Values) %>%
group_by(Factors) %>%
mutate(Values = ifelse(n==max(n), "consensus", "non-consensus")) %>%
select(-n) %>%
pivot_wider(names_from = Factors,
values_from = Values)