How to count the occurrences of a certain character value in multiple columns in R

I am only interested in the occurrences of one particular value, for example 'stateA', so I do not want to use table() as suggested in most answers I can find. I would like to store the counts in a new data frame or in the last row. What is the tidiest way to do this? Thank you very much.

The data looks like

df <- data.frame(
  var1 = paste0('state', sample(LETTERS[1:20], 1000, replace = TRUE)),
  var2 = paste0('state', sample(LETTERS[1:20], 1000, replace = TRUE)),
  var3 = paste0('state', sample(LETTERS[1:20], 1000, replace = TRUE)),
  var4 = paste0('state', sample(LETTERS[1:20], 1000, replace = TRUE)),
  var5 = paste0('state', sample(LETTERS[1:20], 1000, replace = TRUE))
)

CodePudding user response:

Solution using base functionality:

colSums(dat == 'State A')

# X1 X2 X3 X4 X5 
# 42 46 49 55 54

To convert it to data.frame:

res <- colSums(dat == 'State A')
data.frame(t(res))

#   X1 X2 X3 X4 X5
# 1 42 46 49 55 54

Or even simpler:

rbind(data.frame(), colSums(dat == 'State A'))

#   X1 X2 X3 X4 X5
# 1 42 46 49 55 54

Data used in this example:

set.seed(74170239)

dat <- matrix(
  sprintf('State %s', sample(LETTERS[1:20], 5000, replace = TRUE)),
  ncol = 5
) |>
  data.frame()
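
The question also asks about storing the counts in the last row. A minimal sketch of both storage options follows, using a small made-up data frame (`toy` and its values are assumptions for illustration, not the real data). Note that appending the counts to columns of strings coerces the counts to character, so a separate data frame is probably the cleaner choice if the counts are needed as numbers later.

```r
# Small hypothetical data frame standing in for the real data
toy <- data.frame(
  var1 = c("stateA", "stateB", "stateA"),
  var2 = c("stateC", "stateA", "stateB"),
  stringsAsFactors = FALSE
)

# Per-column counts of the target value (a named numeric vector)
counts <- colSums(toy == "stateA")

# Option 1: keep the counts in a separate one-row data frame (stays numeric)
counts_df <- data.frame(t(counts))

# Option 2: append the counts as the last row; because the columns hold
# strings, the appended counts are coerced to character
with_total <- rbind(toy, counts_df)
```

With option 2 the last row has to be excluded again before any further counting, which is another reason to prefer option 1.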

CodePudding user response:

Here is a tidyverse option:

library(tidyverse)


df |>
  summarise(across(everything(), ~sum(. == "stateA")))
#>   var1 var2 var3 var4 var5
#> 1   40   53   62   49   46

EDIT: Here is another option. It's definitely not better than the other proposed solutions, but I felt bad for accidentally stealing an answer from @Limey in the comments.

map_dfc(colnames(df), \(x) tibble(!!sym(x) := sum(df[x] == "stateA")))
#> # A tibble: 1 x 5
#>    var1  var2  var3  var4  var5
#>   <int> <int> <int> <int> <int>
#> 1    67    37    55    55    64

*Note that the two solutions give different results because I did not set a seed before creating the example data frame, so it was regenerated with new random values between runs.
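
To make the comparison reproducible, the example data can be generated after a call to set.seed(). The sketch below assumes an arbitrary seed (123) and a hypothetical helper `make_col()`, neither of which is in the original post, and it uses only base R so that it runs without extra packages. It checks that two different counting approaches agree when they see the same data.

```r
set.seed(123)  # arbitrary seed, chosen only for illustration

# Hypothetical helper: one column of 1000 random state labels
make_col <- function() {
  paste0("state", sample(LETTERS[1:20], 1000, replace = TRUE))
}

df <- data.frame(
  var1 = make_col(), var2 = make_col(), var3 = make_col(),
  var4 = make_col(), var5 = make_col()
)

# Two ways of counting the same value in every column
via_colsums <- colSums(df == "stateA")
via_sapply  <- sapply(df, function(col) sum(col == "stateA"))

# TRUE when both approaches return the same counts
all.equal(via_colsums, via_sapply)
```

The same idea applies to the tidyverse versions: as long as df is created once after set.seed() and reused, every approach should report identical counts.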