How to get number of occurrences per column?

Time:04-28

I would like to know how many "NOK" my data has per column.

Up to now I only have df %>% count("NOK")

which counts all occurrences across the entire data frame, but I would like the counts split up by column. How do I do that?

I have about 70 columns so I don't want to enter the column names manually.

edit:

dput(df[1:10, c("Vis_housing", "Seasoning", "Seas_HV_pos")])
structure(list(Vis_housing = structure(c(2L, 3L, 3L, 3L, 3L, 
3L, 2L, 3L, 3L, 3L), .Label = c("0", "NOK", "OK"), class = "factor"), 
    Seasoning = structure(c(3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L), .Label = c("0", "NOK", "OK"), class = "factor"), Seas_HV_pos = c(100, 
    33, 19, 27, 27, 20, 17, 23, 10, 80)), row.names = 5:14, class = "data.frame")

CodePudding user response:

The easiest way is

colSums(df == "NOK")

# Vis_housing   Seasoning Seas_HV_pos
#           2           9           0
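One caveat, in case the real data contains missing values: comparing `NA` with `"NOK"` yields `NA`, and `colSums()` then returns `NA` for that column unless `na.rm = TRUE` is set. A minimal sketch, using a made-up data frame `df_na` (not part of the question's data):

```r
# Hypothetical data with one missing value in column a
df_na <- data.frame(
  a = c("NOK", NA, "OK"),
  b = c("OK", "NOK", "NOK")
)

# Without na.rm, the count for column a is NA
colSums(df_na == "NOK")

# With na.rm = TRUE, missing values are ignored
nok_per_col <- colSums(df_na == "NOK", na.rm = TRUE)
nok_per_col
```

Whether ignoring missing values is the right choice depends on what an `NA` means in your data.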

If you want to drop the numeric columns first and keep only the non-numeric ones, expand it to

colSums(Filter(Negate(is.numeric), df) == "NOK")

# Vis_housing   Seasoning
#           2           9
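Since the question starts from a dplyr pipe, here is a rough dplyr equivalent of the same count. It is only a sketch: it assumes dplyr 1.0.0 or later (for `across()` and `where()`), and it rebuilds the sample from the question's `dput()` output by hand. The name `nok_counts` is just for illustration.

```r
library(dplyr)

# Sample data rebuilt from the dput() output in the question
df <- data.frame(
  Vis_housing = factor(
    c("NOK", "OK", "OK", "OK", "OK", "OK", "NOK", "OK", "OK", "OK"),
    levels = c("0", "NOK", "OK")
  ),
  Seasoning = factor(c("OK", rep("NOK", 9)), levels = c("0", "NOK", "OK")),
  Seas_HV_pos = c(100, 33, 19, 27, 27, 20, 17, 23, 10, 80)
)

# Count "NOK" in every factor column; numeric columns are skipped
nok_counts <- df %>%
  summarise(across(where(is.factor), ~ sum(.x == "NOK", na.rm = TRUE)))

nok_counts
```

This returns a one-row data frame with one column per factor column, so none of the 70 column names has to be typed by hand.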

CodePudding user response:

This counts every distinct value in each non-numeric column. Numeric columns are dropped first, because two rows rarely share exactly the same number and counting them would make the result too messy:

library(tidyverse)

# example data
data <- mpg

data %>%
  select(-where(is.numeric)) %>%
  pivot_longer(everything()) %>%
  count(name, value)
#> # A tibble: 78 × 3
#>    name  value          n
#>    <chr> <chr>      <int>
#>  1 class 2seater        5
#>  2 class compact       47
#>  3 class midsize       41
#>  4 class minivan       11
#>  5 class pickup        33
#>  6 class subcompact    35
#>  7 class suv           62
#>  8 drv   4            103
#>  9 drv   f            106
#> 10 drv   r             25
#> # … with 68 more rows

Created on 2022-04-27 by the reprex package (v2.0.0)
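To get only the "NOK" counts from this long-format approach, one option is to filter before counting. The following is a sketch on the sample from the question, not on `mpg`; converting the factor columns to character before pivoting is my own precaution in case the columns do not all share the same levels.

```r
library(dplyr)
library(tidyr)

# Sample data rebuilt from the dput() output in the question
df <- data.frame(
  Vis_housing = factor(
    c("NOK", "OK", "OK", "OK", "OK", "OK", "NOK", "OK", "OK", "OK"),
    levels = c("0", "NOK", "OK")
  ),
  Seasoning = factor(c("OK", rep("NOK", 9)), levels = c("0", "NOK", "OK")),
  Seas_HV_pos = c(100, 33, 19, 27, 27, 20, 17, 23, 10, 80)
)

nok_long <- df %>%
  select(-where(is.numeric)) %>%                  # drop numeric columns
  mutate(across(everything(), as.character)) %>%  # avoid factor-level clashes
  pivot_longer(everything()) %>%                  # one row per cell
  filter(value == "NOK") %>%                      # keep only "NOK" cells
  count(name)                                     # rows per original column

nok_long
```

Note that a column with no "NOK" at all does not appear in this result, whereas `colSums()` reports it with a count of 0.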
