I am only interested in the occurrences of one specific value, for example 'stateA', so I do not want to use table() as suggested in most answers I can find. I would like to store the per-column counts either as a new data frame or as an extra last row. What is the tidiest way to do this? Thank you very much.
The data looks like this:
df <- data.frame(
  var1 = paste0('state', sample(LETTERS[1:20], 1000, replace = TRUE)),
  var2 = paste0('state', sample(LETTERS[1:20], 1000, replace = TRUE)),
  var3 = paste0('state', sample(LETTERS[1:20], 1000, replace = TRUE)),
  var4 = paste0('state', sample(LETTERS[1:20], 1000, replace = TRUE)),
  var5 = paste0('state', sample(LETTERS[1:20], 1000, replace = TRUE))
)
Answer:
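Solution using base R functionality (note that the example data at the end of this answer uses values like 'State A' rather than the question's 'stateA'):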
colSums(dat == 'State A')
# X1 X2 X3 X4 X5
# 42 46 49 55 54
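Why this works: comparing a data frame with a single string returns a logical matrix of the same shape, and `colSums()` adds up the `TRUE` values (each counts as 1). A minimal sketch on a small, hypothetical data frame (the columns `a`, `b` and their values are made up for illustration) makes the result easy to check by hand:

```r
# Small illustrative data frame (hypothetical values, not the question's data)
toy <- data.frame(
  a = c("stateA", "stateB", "stateA"),
  b = c("stateC", "stateA", "stateD")
)

# Element-wise comparison: a logical matrix with one column per variable
hits <- toy == "stateA"

# TRUE counts as 1, so the column sums are the per-column counts
counts <- colSums(hits)
print(counts)

# sum() over the whole matrix gives the total across all columns instead
total <- sum(hits)
print(total)
```

Here `counts` is `a = 2, b = 1` and `total` is 3.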
To convert the result to a data.frame:
res <- colSums(dat == 'State A')
data.frame(t(res))
# X1 X2 X3 X4 X5
# 1 42 46 49 55 54
Or even simpler:
rbind(data.frame(), colSums(dat == 'State A'))
# X1 X2 X3 X4 X5
# 1 42 46 49 55 54
Data used in this example:
set.seed(74170239)
dat <- matrix(
  sprintf('State %s', sample(LETTERS[1:20], 5000, replace = TRUE)),
  ncol = 5
) |>
  data.frame()
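The question also asks about storing the counts in the last row of the data. A minimal sketch, assuming a shortened two-column version of the question's data (the seed and the number of columns are arbitrary choices for brevity): `rbind()` appends the counts as one extra row. Because every column holds character strings, the counts are coerced to character as well, so keeping them in a separate data frame, as above, is usually easier to compute with.

```r
# Shortened version of the question's data (two columns instead of five)
set.seed(1)
df <- data.frame(
  var1 = paste0("state", sample(LETTERS[1:20], 1000, replace = TRUE)),
  var2 = paste0("state", sample(LETTERS[1:20], 1000, replace = TRUE))
)

# Per-column counts of the value of interest
counts <- colSums(df == "stateA")

# Append the counts as a new last row; the values become character strings
df_with_counts <- rbind(df, counts)
tail(df_with_counts, 2)
```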
Answer:
Here is a tidyverse option:
library(tidyverse)
df |>
  summarise(across(everything(), ~sum(. == "stateA")))
#> var1 var2 var3 var4 var5
#> 1 40 53 62 49 46
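For comparison, roughly the same one-row result can be sketched in base R: `lapply()` applies the counting function to every column (the role `across(everything(), ...)` plays above) and `as.data.frame()` turns the named list into a single-row data frame (the role of `summarise()`). This is an illustrative equivalent rather than part of the original answer, and the small data frame is hypothetical:

```r
# Hypothetical toy data in the question's format
toy <- data.frame(
  var1 = c("stateA", "stateB", "stateA"),
  var2 = c("stateC", "stateA", "stateD"),
  var3 = c("stateB", "stateB", "stateC")
)

# Count "stateA" in every column, then collect the counts into one row
counts_df <- as.data.frame(lapply(toy, function(col) sum(col == "stateA")))
counts_df
```

The result has one row with `var1 = 2`, `var2 = 1` and `var3 = 0`; a column with no matches still appears, with a count of zero.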
EDIT: Here is another option. It's definitely not better than the other proposed solutions, but I felt bad for accidentally stealing an answer from @Limey in the comments.
map_dfc(colnames(df), \(x) tibble(!!sym(x) := sum(df[x] == "stateA")))
#> # A tibble: 1 x 5
#> var1 var2 var3 var4 var5
#> <int> <int> <int> <int> <int>
#> 1 67 37 55 55 64
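If the data can contain missing values, note that `NA == "stateA"` is `NA`, so a plain `sum()` over that column returns `NA`. A minimal sketch of a reusable per-column counter (the helper name `count_value` is hypothetical) that takes the value as a parameter and uses `na.rm = TRUE` so missing cells count as non-matches:

```r
# Hypothetical helper: count how often `value` occurs in each column of `data`.
# na.rm = TRUE treats NA cells as non-matches instead of making the sum NA.
count_value <- function(data, value) {
  vapply(data, function(col) sum(col == value, na.rm = TRUE), integer(1))
}

# Toy data with a missing value (hypothetical)
toy <- data.frame(
  x = c("stateA", NA, "stateA"),
  y = c("stateB", "stateA", "stateC")
)

res <- count_value(toy, "stateA")
res
```

`res` is a named integer vector (`x = 2`, `y = 1`); `data.frame(as.list(res))` turns it into a one-row data frame.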
*Note that the two solutions give different results because I did not set a seed when generating the example data frame.
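Calling `set.seed()` before generating the data makes the random sample, and therefore the counts, reproducible from run to run. A minimal sketch, assuming a hypothetical helper `make_df()` that rebuilds data shaped like the question's `df` (the seed value 123 is arbitrary):

```r
# Hypothetical helper: five columns of random "stateX" labels, like the question's df
make_df <- function(n = 1000) {
  cols <- lapply(1:5, function(i) {
    paste0("state", sample(LETTERS[1:20], n, replace = TRUE))
  })
  as.data.frame(setNames(cols, paste0("var", 1:5)))
}

# Same seed before each call, so both calls draw the same sample
set.seed(123)
counts_1 <- colSums(make_df() == "stateA")

set.seed(123)
counts_2 <- colSums(make_df() == "stateA")

identical(counts_1, counts_2)  # TRUE
```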