Assigning new value to the highest occurrence value in R-CodePudding

I have data set similar to this:

df_out <- data.frame(
  "name" = c("1", "2", "3", "4", "5", "6", "7", "8"),
  "Factor1"=rep(c("A","B","C"),times= c(2,1,5)),
  "col3"=rep(c("T","S"),times= c(2,6)),
  "col4"=rep(c("E","D"),times= c(6,2)))
df_out

I want to mutate all the columns and assign new value to them based on their counts so for all the columns I want the value with the highest occurrence to be the consensus and all of the rest values to be the non consensus and NA stay as it. So the output similar to this:

df_out2 <- data.frame(
  "name" = c("1", "2", "3", "4", "5", "6", "7", "8"),
  "Factor1"=rep(c("non-consensus","consensus"),times= c(3,5)),
  "col3"=rep(c("non-consensus","consensus"),times= c(2,6)),
  "col4"=rep(c("consensus","non-consensus"),times= c(6,2)))
df_out2

Any help is appreciated.

CodePudding user response：

You could do

library(tidyverse)

df_out %>%
  mutate(across(Factor1:col4, 
    ~ ifelse(.x == names(rev(sort(table(.x))))[1], "consensus", "non-consensus")))
#>   name       Factor1          col3          col4
#> 1    1 non-consensus non-consensus     consensus
#> 2    2 non-consensus non-consensus     consensus
#> 3    3 non-consensus     consensus     consensus
#> 4    4     consensus     consensus     consensus
#> 5    5     consensus     consensus     consensus
#> 6    6     consensus     consensus     consensus
#> 7    7     consensus     consensus non-consensus
#> 8    8     consensus     consensus non-consensus

^{Created on 2022-12-11 with reprex v2.0.2}

CodePudding user response：

a base R approach:

df_out2 <- 
cbind(df_out$name,
      df_out[-1] |> ## don't manipulate name column
      lapply(function(column){ ## apply this function to each column
        level_counts = table(column) ## count observations per factor level        
        ifelse(level_counts[column] == max(level_counts),
               'consensus', 'non consensus'
               )
      }
      ) |> as.data.frame() ## convert list of columns to data frame
      )

  df_out$name       Factor1          col3          col4
1           1 non consensus non consensus     consensus
2           2 non consensus non consensus     consensus
3           3 non consensus     consensus     consensus
4           4     consensus     consensus     consensus
5           5     consensus     consensus     consensus
6           6     consensus     consensus     consensus
7           7     consensus     consensus non consensus
8           8     consensus     consensus non consensus

CodePudding user response：

Here is one solution with pivoting: The key points are setting the grouping variable at the right position and remove the n to get the desired solution:

add_count is same as group_by(...) and mutate

library(dplyr)
library(tidyr)

df_out %>%
  pivot_longer(-name, 
               names_to = "Factors",
               values_to= "Values") %>% 
  add_count(Factors, Values) %>% 
  group_by(Factors) %>% 
  mutate(Values = ifelse(n==max(n), "consensus", "non-consensus")) %>% 
  select(-n) %>% 
  pivot_wider(names_from = Factors,
              values_from = Values)