Count the categorical variable in R


sat_score<-c(100,4,30,4,20)
state <-c("NC","NC","CA","WA","NC")
id <- 1: 5

data<-data.frame(sat_score,state,id)

The data looks like this:

> data
  sat_score state id
1       100    NC  1
2         4    NC  2
3        30    CA  3
4         4    WA  4
5        20    NC  5

If I want to see the frequency of each state, I can use the following code:

library(dplyr)  # provides %>% and count()

data %>%
   count(state)

and the result looks like this:

> data %>%
     count(state)
  state n
1    CA 1
2    NC 3
3    WA 1

However, I don't want the frequency table for the whole "state" variable.

I want to know how many times "NC" appears in the "state" column,

so the result should be the number 3.

How can I do this?

CodePudding user response:

library(dplyr)

sat_score<-c(100,4,30,4,20)
state <-c("NC","NC","CA","WA","NC")
id <- 1: 5

data<-data.frame(sat_score,state,id)

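# Option 1: tally() sums the logical condition and returns a one-row data frame (n = 3)
data %>% 
  tally(state == "NC")

# Option 2: count every state, keep the NC row, and pull out n as a plain number
data %>% 
  count(state) %>% 
  filter(state == 'NC') %>% 
  pull(n)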

CodePudding user response:

One thing you could do is create a named list of all the counts, using each state as the element name, so that you can easily access any state's individual count by its name:

library(tidyverse)
sat_score<-c(100,4,30,4,20)
state <-c("NC","NC","CA","WA","NC")
id <- 1: 5

data<-data.frame(sat_score,state,id)

# get the state counts
state_count <- count(data, state) 
# put the counts in a list
state_list <- as.list(state_count$n)
# name each count with the state name
names(state_list) <- state_count$state
# access the individual count value with $statename
state_list$NC
#> [1] 3

Created on 2022-05-15 by the reprex package (v2.0.1)

This gives a more general way of getting any of the states' counts. However, if you're really only looking for the value of NC and don't plan to access any other values, then philiptomk's solution, sum(data$state == "NC"), is probably the way to go.
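
For completeness, here is a minimal base R sketch of that one-liner, using the same example data. The variable names nc_count and nc_from_table are just illustrative, and the table() line is an alternative I'm adding for comparison rather than something from the answers above.

```r
# Rebuild the example data from the question
sat_score <- c(100, 4, 30, 4, 20)
state <- c("NC", "NC", "CA", "WA", "NC")
id <- 1:5
data <- data.frame(sat_score, state, id)

# The comparison yields a logical vector; sum() counts the TRUE values
nc_count <- sum(data$state == "NC")
nc_count
#> [1] 3

# Alternative: table() tabulates every state, [["NC"]] extracts a single count
nc_from_table <- table(data$state)[["NC"]]
nc_from_table
#> [1] 3
```

Neither version needs dplyr. If the column could contain missing values, sum(data$state == "NC", na.rm = TRUE) avoids getting NA back.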