I have a data frame in which some columns contain only the values 0 and NA. I can check whether any column has exactly 2 unique values, but how can I check whether any column in the data frame contains only 0 and NA?
apply(production_list_DF, 2, function(a) length(unique(a))==2)
This only checks whether each column has exactly 2 unique values, whatever those values are.
CodePudding user response:
Does this work:
set.seed(1)
df <- data.frame(col1 = sample(1:10, 5, FALSE),
                 col2 = sample(1:10, 5, FALSE),
                 col3 = sample(c(NA, 0), 5, TRUE),
                 col4 = sample(c(NA, 0), 5, TRUE))
df
  col1 col2 col3 col4
1    9    7   NA    0
2    4    2   NA   NA
3    7    3    0   NA
4    1    8    0   NA
5    2    1    0   NA
apply(df,2,function(x) all(x %in% c(NA,0)))
col1 col2 col3 col4
FALSE FALSE TRUE TRUE
To get the column names:
names(df[apply(df,2,function(x) all(x %in% c(NA,0)))])
[1] "col3" "col4"
Using sapply:
sapply(df, function(x) all(x %in% c(NA,0)))
col1 col2 col3 col4
FALSE FALSE TRUE TRUE
names(df[sapply(df, function(x) all(x %in% c(NA,0)))])
[1] "col3" "col4"
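Note that all(x %in% c(NA, 0)) also returns TRUE for a column that is entirely 0 or entirely NA. If "only 0 and NA" should mean that both values actually occur, a stricter test is needed. Below is a minimal sketch under that reading of the question; the helper name only_zero_and_na and the data frame df2 are made up for illustration and are not part of the original post.

```r
# Hypothetical helper: TRUE only if x holds nothing but 0 and NA,
# and both of them occur at least once.
only_zero_and_na <- function(x) {
  all(x %in% c(0, NA)) &&        # no values other than 0 / NA
    anyNA(x) &&                  # at least one NA
    any(x == 0, na.rm = TRUE)    # at least one 0
}

# Small illustrative data frame (an assumption, not from the question)
df2 <- data.frame(mixed    = c(0, NA, 0),
                  all_zero = c(0, 0, 0),
                  all_na   = c(NA, NA, NA),
                  other    = c(1, 0, NA))

res <- sapply(df2, only_zero_and_na)
names(df2)[res]
# [1] "mixed"
```

Whether all-zero or all-NA columns should count depends on what you want to do with the result, so pick whichever variant matches your data.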
CodePudding user response:
Base R Solutions:
I prefer colSums with sapply:
> colSums(sapply(df, `%in%`, c(0, NA))) == nrow(df)
col1 col2 col3 col4
FALSE FALSE TRUE TRUE
Or with a function:
> sapply(df, function(x) all(x %in% c(NA, 0)))
col1 col2 col3 col4
FALSE FALSE TRUE TRUE
Example data frame taken from @KarthikS:
set.seed(1)
df <- data.frame(col1 = sample(1:10, 5, FALSE),
                 col2 = sample(1:10, 5, FALSE),
                 col3 = sample(c(NA, 0), 5, TRUE),
                 col4 = sample(c(NA, 0), 5, TRUE))
df
  col1 col2 col3 col4
1    9    7   NA    0
2    4    2   NA   NA
3    7    3    0   NA
4    1    8    0   NA
5    2    1    0   NA
For column names:
> names(df)[colSums(sapply(df, `%in%`, c(0, NA))) == nrow(df)]
[1] "col3" "col4"
Or:
> names(df)[sapply(df, function(x) all(x %in% c(NA, 0)))]
[1] "col3" "col4"
P.S. sapply could be replaced with apply(df, 2, ...) in all examples here.
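One caveat when choosing between the two: apply() first converts the data frame to a matrix, so numeric columns may be turned into text if the data frame also holds character or factor columns. Working column by column keeps each column's original type. A small sketch with vapply, assuming a made-up data frame mixed_df that is not part of the question:

```r
# Illustrative data frame with a character column
# (an assumption, not taken from the question)
mixed_df <- data.frame(id    = c("a", "b", "c"),
                       flag  = c(0, NA, 0),
                       count = c(3, 0, NA),
                       stringsAsFactors = FALSE)

# vapply checks each column on its own type and guarantees
# exactly one logical value per column
is_zero_na <- vapply(mixed_df, function(x) all(x %in% c(0, NA)), logical(1))
names(mixed_df)[is_zero_na]
# [1] "flag"
```

vapply is used here instead of sapply only because it fixes the return type; sapply gives the same result for this data.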