Check whether any column in the dataframe only has two unique values NA and 0

Time:10-17

I have a data frame in which some columns contain only the values 0 and NA. I can find the columns that have exactly 2 unique values, but how can I check whether any column in the data frame contains only 0 and NA?

apply(production_list_DF, 2, function(a) length(unique(a))==2) 

This only checks whether each column has exactly 2 unique values, not which values they are.

CodePudding user response:

Does this work?

set.seed(1)
df <- data.frame(col1 = sample(1:10,5,F),
                 col2 = sample(1:10,5,F),
                 col3 = sample(c(NA,0),5,T),
                 col4 = sample(c(NA,0),5,T))

df
  col1 col2 col3 col4
1    9    7   NA    0
2    4    2   NA   NA
3    7    3    0   NA
4    1    8    0   NA
5    2    1    0   NA

apply(df,2,function(x) all(x %in% c(NA,0)))
 col1  col2  col3  col4 
FALSE FALSE  TRUE  TRUE 

To get the column names:

names(df[apply(df,2,function(x) all(x %in% c(NA,0)))])
[1] "col3" "col4"

Using sapply:

sapply(df, function(x) all(x %in% c(NA,0)))
 col1  col2  col3  col4 
FALSE FALSE  TRUE  TRUE 
names(df[sapply(df, function(x) all(x %in% c(NA,0)))])
[1] "col3" "col4"
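Note that all(x %in% c(NA, 0)) is also TRUE for a column that is entirely 0 or entirely NA. If a column should count only when it contains both values, a stricter check is possible. The following is a minimal sketch under that stricter reading of the question; has_only_zero_and_na and demo are hypothetical names used for illustration only:

```r
# Hypothetical helper: TRUE only if the column contains at least one NA,
# at least one 0, and nothing else.
has_only_zero_and_na <- function(x) {
  anyNA(x) &&                      # at least one NA
    any(x == 0, na.rm = TRUE) &&   # at least one 0
    all(is.na(x) | x == 0)         # no other values
}

# Illustrative data, not taken from the question
demo <- data.frame(a = c(0, NA, 0),    # both 0 and NA
                   b = c(0, 0, 0),     # only 0
                   c = c(NA, NA, NA),  # only NA
                   d = c(1, 0, NA))    # contains another value

strict <- sapply(demo, has_only_zero_and_na)
names(demo)[strict]
```

With this helper, only column a of the illustrative data qualifies, whereas the %in% check would also accept b and c.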

CodePudding user response:

Base R Solutions:

I prefer colSums with sapply:

> colSums(sapply(df, `%in%`, c(0, NA))) == nrow(df)
 col1  col2  col3  col4 
FALSE FALSE  TRUE  TRUE 
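To see why the colSums version works, it can help to look at the intermediate result: sapply with `%in%` returns a logical matrix with one column per data frame column, and colSums counts the TRUE values in each. A small sketch, using a hypothetical two-column data frame named toy:

```r
# Illustrative data, not taken from the question
toy <- data.frame(x = c(0, NA, 0),
                  y = c(1, 0, NA))

# One logical column per data frame column: TRUE where the value is 0 or NA
m <- sapply(toy, `%in%`, c(0, NA))
m

# Number of matching values per column
counts <- colSums(m)

# A column qualifies when every row matches
only_zero_na <- counts == nrow(toy)
only_zero_na
```

One caveat with this form: for a data frame with a single row, sapply simplifies to a vector rather than a matrix, and colSums then fails, so the all(...) variant is the safer choice in that case.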

Or with a function:

> sapply(df, function(x) all(x %in% c(NA, 0)))
 col1  col2  col3  col4 
FALSE FALSE  TRUE  TRUE 

Example data frame taken from @KarthikS:

set.seed(1)
df <- data.frame(col1 = sample(1:10,5,F),
                 col2 = sample(1:10,5,F),
                 col3 = sample(c(NA,0),5,T),
                 col4 = sample(c(NA,0),5,T))

df
  col1 col2 col3 col4
1    9    7   NA    0
2    4    2   NA   NA
3    7    3    0   NA
4    1    8    0   NA
5    2    1    0   NA

For the column names:

> names(df)[colSums(sapply(df, `%in%`, c(0, NA))) == nrow(df)]
[1] "col3" "col4"

Or:

> names(df)[sapply(df, function(x) all(x %in% c(NA, 0)))]
[1] "col3" "col4"

P.S. For an all-numeric data frame like this one, sapply could be replaced with apply(df, 2, ...) in all examples here.
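The two are not always interchangeable, though. apply first converts the data frame with as.matrix, and for a data frame with mixed column types that produces a character matrix, so the comparison against the number 0 may no longer behave as it does on the original numeric column. sapply works on each column in its own type. A sketch with a hypothetical data frame named mixed that has one character column:

```r
# Illustrative data, not taken from the question
mixed <- data.frame(id  = c("a", "b", "c"),
                    n   = c(0, NA, 0),
                    big = c(10, 0, NA))

# Column-wise check on the original column types
by_sapply <- sapply(mixed, function(x) all(x %in% c(NA, 0)))
by_sapply

# apply() sees a character matrix here, so numbers are compared as text
char_matrix <- as.matrix(mixed)
typeof(char_matrix)

by_apply <- apply(mixed, 2, function(x) all(x %in% c(NA, 0)))
by_apply
```

Comparing by_sapply and by_apply on your own data is a quick way to find out whether the coercion matters; when in doubt, sapply (or vapply with a logical(1) template) is the more predictable choice for data frames.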
