I would like to know how many "NOK" values my data has per column.
Up to now I only have df %>% count("NOK")
which counts all occurrences across the entire data frame, but I would like the counts split up by column. How do I do this?
I have about 70 columns, so I don't want to enter the column names manually.
edit:
dput(df[1:10, c("Vis_housing", "Seasoning", "Seas_HV_pos")])
structure(list(Vis_housing = structure(c(2L, 3L, 3L, 3L, 3L,
3L, 2L, 3L, 3L, 3L), .Label = c("0", "NOK", "OK"), class = "factor"),
Seasoning = structure(c(3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L), .Label = c("0", "NOK", "OK"), class = "factor"), Seas_HV_pos = c(100,
33, 19, 27, 27, 20, 17, 23, 10, 80)), row.names = 5:14, class = "data.frame")
CodePudding user response:
The easiest way is
colSums(df == "NOK")
# Vis_housing   Seasoning Seas_HV_pos
#           2           9           0
If you want to drop the numeric columns first and count only in the non-numeric ones, expand it to
colSums(Filter(Negate(is.numeric), df) == "NOK")
# Vis_housing   Seasoning
#           2           9
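One caveat worth sketching: if a column contains missing values, the comparison yields NA for those cells and colSums() returns NA for the whole column unless na.rm = TRUE is set. The example below rebuilds the data from the dput() output in the question; the df_na copy and the blanked-out cell are hypothetical, added only to illustrate the effect.

```r
# Data copied from the dput() output in the question
df <- structure(list(
  Vis_housing = structure(c(2L, 3L, 3L, 3L, 3L, 3L, 2L, 3L, 3L, 3L),
                          .Label = c("0", "NOK", "OK"), class = "factor"),
  Seasoning = structure(c(3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
                        .Label = c("0", "NOK", "OK"), class = "factor"),
  Seas_HV_pos = c(100, 33, 19, 27, 27, 20, 17, 23, 10, 80)
), row.names = 5:14, class = "data.frame")

# Hypothetical: set one entry to NA to show how missing values behave
df_na <- df
df_na$Vis_housing[1] <- NA

# Without na.rm, the count for a column containing NA is itself NA
with_na <- colSums(df_na == "NOK")

# With na.rm = TRUE, NA cells are skipped and the remaining "NOK" are counted
nok_counts <- colSums(df_na == "NOK", na.rm = TRUE)
print(nok_counts)
```

Whether skipping missing values is the right choice depends on the data; if an NA should be flagged rather than ignored, leaving na.rm at its default makes the affected columns stand out.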
CodePudding user response:
This counts every distinct value in each non-numeric column. Numeric columns are dropped because it is unlikely that two rows share exactly the same number, so including them would make the result too messy:
library(tidyverse)
# example data
data <- mpg
data %>%
  select(-where(is.numeric)) %>%
  pivot_longer(everything()) %>%
  count(name, value)
#> # A tibble: 78 × 3
#>    name  value          n
#>    <chr> <chr>      <int>
#>  1 class 2seater        5
#>  2 class compact       47
#>  3 class midsize       41
#>  4 class minivan       11
#>  5 class pickup        33
#>  6 class subcompact    35
#>  7 class suv           62
#>  8 drv   4            103
#>  9 drv   f            106
#> 10 drv   r             25
#> # … with 68 more rows
Created on 2022-04-27 by the reprex package (v2.0.0)
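To get only the "NOK" counts from this approach, the long table can be filtered before counting. Below is a minimal sketch on a small data frame that mirrors the question's sample (rebuilt by hand here, so treat the exact values as an assumption); the names nok_long and nok_wide are just illustrative. An across() variant is included as well, since it keeps one column per original column.

```r
library(dplyr)
library(tidyr)

# Small stand-in for the question's data (values mirror the dput() sample)
df <- data.frame(
  Vis_housing = factor(c("NOK", rep("OK", 5), "NOK", rep("OK", 3)),
                       levels = c("0", "NOK", "OK")),
  Seasoning   = factor(c("OK", rep("NOK", 9)),
                       levels = c("0", "NOK", "OK")),
  Seas_HV_pos = c(100, 33, 19, 27, 27, 20, 17, 23, 10, 80)
)

# Long format: one row per column that contains at least one "NOK"
nok_long <- df %>%
  select(-where(is.numeric)) %>%
  pivot_longer(everything()) %>%
  filter(value == "NOK") %>%
  count(name)

# Wide format: one count per non-numeric column, zeros included
nok_wide <- df %>%
  summarise(across(!where(is.numeric), ~ sum(.x == "NOK", na.rm = TRUE)))

print(nok_long)
print(nok_wide)
```

Note that the filtered long version silently drops columns with no "NOK" at all, while the across() version reports them as 0, which may be preferable with 70 columns.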