Find the count of unique values in all columns in a dataframe without including NA values (R)

Given the reproducible data frame below, I want to find the number of unique values in each column, excluding missing (NA) values. The code below counts NA as a value, so the cardinality of the nat_country column shows as 4 in n_unique_values (it should be 3). In Python, pandas' nunique() function ignores NA values by default. How can one achieve this in R?

nat_country = c("United-States", "Germany", "United-States", "United-States", "United-States", "United-States", "Taiwan", NA)
age = c(14,15,45,78,96,58,25,36)
dat = data.frame(nat_country, age)
n_unique_values  = t(data.frame(apply(dat, 2, function(x) length(unique(x)))))

Answer:

You can use dplyr::n_distinct() with na.rm = TRUE:

library(dplyr)
sapply(dat, n_distinct, na.rm = TRUE)
# or, with purrr: purrr::map_dbl(dat, n_distinct, na.rm = TRUE)

#nat_country         age 
#          3           8

In base R (4.1.0 or later, for the \(x) lambda shorthand), you can use na.omit() as well:

sapply(dat, \(x) length(unique(na.omit(x))))
#nat_country         age 
#          3           8
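The question's count of 4 comes from unique() itself, which keeps NA as one of the distinct values. If you want something that behaves more like pandas' nunique() (which drops NA unless dropna=False), a small wrapper is enough. Note that `nunique()` and its `dropna` argument below are hypothetical names chosen to mirror pandas; they are not an existing R function. A minimal sketch:

```r
# Hypothetical helper mirroring pandas' nunique(): count distinct values,
# dropping NA (and NaN, since is.na(NaN) is TRUE) unless dropna = FALSE.
nunique <- function(x, dropna = TRUE) {
  u <- unique(x)                  # unique() keeps NA as its own value
  if (dropna) u <- u[!is.na(u)]   # remove the NA entry before counting
  length(u)
}

nat_country <- c("United-States", "Germany", "United-States", "United-States",
                 "United-States", "United-States", "Taiwan", NA)
age <- c(14, 15, 45, 78, 96, 58, 25, 36)
dat <- data.frame(nat_country, age)

# vapply() checks that each column yields exactly one integer
# and keeps the column names on the result.
counts <- vapply(dat, nunique, integer(1))
print(counts)   # nat_country = 3, age = 8

# With dropna = FALSE the NA is counted again, reproducing the question's 4.
nunique(dat$nat_country, dropna = FALSE)
```

Using vapply() rather than sapply() is a design choice: it fails loudly if a column ever returns something other than a single integer, instead of silently changing the shape of the result.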

Answer:

You could also use purrr's map() or map_dfr() with n_distinct():

library(dplyr)
library(purrr)
dat %>% 
  map_dfr(., n_distinct, na.rm = TRUE)

#   nat_country   age
#         <int> <int>
# 1           3     8

library(dplyr)
library(purrr)

dat %>% 
  map(., n_distinct, na.rm = TRUE) %>% 
  unlist()

# nat_country         age 
#           3           8
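If you want the one-row table shape that map_dfr() returns without depending on purrr, a base R sketch (rebuilding the question's dat) is to lapply() over the columns and wrap the resulting named list in as.data.frame():

```r
nat_country <- c("United-States", "Germany", "United-States", "United-States",
                 "United-States", "United-States", "Taiwan", NA)
age <- c(14, 15, 45, 78, 96, 58, 25, 36)
dat <- data.frame(nat_country, age)

# lapply() returns a named list with one non-NA distinct count per column;
# as.data.frame() turns that list into a single-row data frame.
n_unique_values <- as.data.frame(
  lapply(dat, function(x) length(unique(na.omit(x))))
)
print(n_unique_values)
```

Unlike the question's t(data.frame(apply(...))), this keeps each column's type instead of going through apply(), which first coerces the whole data frame to a character matrix.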

Answer:

In base R you can use table(), which drops NA values by default (useNA = "no"). Set useNA = "ifany" or "always" if you want NA counted as an entry.

sapply(dat, function(x) length(table(x)))
# nat_country         age 
#           3           8
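One caveat with table(): for a factor it reports every level, including unused ones with a count of 0, so length(table(x)) can exceed the number of values actually present. This does not affect the question's data, because data.frame() keeps nat_country as character (the default since R 4.0.0), but unused levels are common after subsetting a factor. A small illustration with a made-up factor f:

```r
# A made-up factor with an unused level "b" and one missing value.
f <- factor(c("a", "a", NA), levels = c("a", "b"))

length(table(f))                    # 2: unused level "b" is listed with count 0
length(table(f, useNA = "ifany"))   # 3: an <NA> entry is added as well
length(unique(na.omit(f)))          # 1: only values actually present
sum(table(f) > 0)                   # 1: keep only levels that occur
```

If factor columns may be involved, counting the non-zero entries (or calling droplevels() first) keeps the table() approach consistent with the unique()-based answers.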