Home > Back-end >  Identify columns that have only 0 values
Identify columns that have only 0 values

Time:09-26

I would like some help with the following question:

As you can see the executable code below it generates a graph with dots and a line. These points are generated from my df1 dataset. But notice that there are some DR that are only 0, for example, the DR09, DR10 and DR12. So I'd like to do some function or something that idenfied when all DRs are 0, and that weren't considered when calculating my datas variable, because as it is, it's considering the whole df1 dataset.

Thank you very much!

library(dplyr)
library(lubridate)
library(tidyverse)

df1 <- structure(
  list(date1 = c("2021-06-28","2021-06-28","2021-06-28","2021-06-28","2021-06-28",
                 "2021-06-28","2021-06-28","2021-06-28"),
       date2 = c("2021-04-02","2021-04-03","2021-04-08","2021-04-09","2021-04-10","2021-07-01","2021-07-02","2021-07-03"),
       Week= c("Friday","Saturday","Thursday","Friday","Saturday","Thursday","Friday","Monday"),
       DR01 = c(4,1,4,3,3,4,3,6), DR02= c(4,2,6,7,3,2,7,4),DR03= c(9,5,4,3,3,2,1,5),
       DR04 = c(5,4,3,3,6,2,1,9),DR05 = c(5,4,5,3,6,2,1,9),
       DR06 = c(2,4,3,3,5,6,7,8),DR07 = c(2,5,4,4,9,4,7,8),
       DR08 = c(0,0,0,1,2,0,0,0),DR09 = c(0,0,0,0,0,0,0,0),DR010 = c(0,0,0,0,0,0,0,0),DR011 = c(0,4,0,0,0,0,0,0), DR012 = c(0,0,0,0,0,0,0,0)),
  class = "data.frame", row.names = c(NA, -8L))

#Generate graph

dmda<-"2021-07-01"

datas<-df1 %>%
  filter(date2 == ymd(dmda)) %>%
  summarize(across(starts_with("DR"), sum)) %>%
  pivot_longer(everything(), names_pattern = "DR(. )", values_to = "val") %>%
  mutate(name = as.numeric(name))
colnames(datas)<-c("Days","Numbers")

attach(datas)
plot(Numbers ~ Days, ylim=c(0,20))

model <- nls(Numbers ~ b1*Days^2 b2,start = list(b1 = 47,b2 = 0))

new.data <- data.frame(Days = seq(min(Days),max(Days),len = 45))
lines(new.data$Days,predict(model,newdata = new.data))

enter image description here

CodePudding user response:

We may do this with select i.e. check whether the column is numeric (is.numeric) and where all the values are 0

library(dplyr)
df1 %>%
    select(where(~ is.numeric(.) && all(. == 0))) %>% 
    names
[1] "DR09"  "DR010" "DR012"

If we want to select columns that doesn't include them, then negate (!)

df1 %>%
    select(!where(~ is.numeric(.) && all(. == 0)))
        date1      date2     Week DR01 DR02 DR03 DR04 DR05 DR06 DR07 DR08 DR011
1 2021-06-28 2021-04-02   Friday    4    4    9    5    5    2    2    0     0
2 2021-06-28 2021-04-03 Saturday    1    2    5    4    4    4    5    0     4
3 2021-06-28 2021-04-08 Thursday    4    6    4    3    5    3    4    0     0
4 2021-06-28 2021-04-09   Friday    3    7    3    3    3    3    4    1     0
5 2021-06-28 2021-04-10 Saturday    3    3    3    6    6    5    9    2     0
6 2021-06-28 2021-07-01 Thursday    4    2    2    2    2    6    4    0     0
7 2021-06-28 2021-07-02   Friday    3    7    1    1    1    7    7    0     0
8 2021-06-28 2021-07-03   Monday    6    4    5    9    9    8    8    0     0

Or in base R

names(which(sapply(df1, function(x) is.numeric(x) && all(x == 0))))
[1] "DR09"  "DR010" "DR012"

CodePudding user response:

We could use colSums:

df1[, colSums(df1 != 0) > 0]

output:

       date1      date2     Week DR01 DR02 DR03 DR04 DR05 DR06 DR07 DR08 DR011
1 2021-06-28 2021-04-02   Friday    4    4    9    5    5    2    2    0     0
2 2021-06-28 2021-04-03 Saturday    1    2    5    4    4    4    5    0     4
3 2021-06-28 2021-04-08 Thursday    4    6    4    3    5    3    4    0     0
4 2021-06-28 2021-04-09   Friday    3    7    3    3    3    3    4    1     0
5 2021-06-28 2021-04-10 Saturday    3    3    3    6    6    5    9    2     0
6 2021-06-28 2021-07-01 Thursday    4    2    2    2    2    6    4    0     0
7 2021-06-28 2021-07-02   Friday    3    7    1    1    1    7    7    0     0
8 2021-06-28 2021-07-03   Monday    6    4    5    9    9    8    8    0     0

CodePudding user response:

We can use Filter with any -

Filter(function(x) any(x != 0), df1)

#       date1      date2     Week DR01 DR02 DR03 DR04 DR05 DR06 DR07 DR08 DR011
#1 2021-06-28 2021-04-02   Friday    4    4    9    5    5    2    2    0     0
#2 2021-06-28 2021-04-03 Saturday    1    2    5    4    4    4    5    0     4
#3 2021-06-28 2021-04-08 Thursday    4    6    4    3    5    3    4    0     0
#4 2021-06-28 2021-04-09   Friday    3    7    3    3    3    3    4    1     0
#5 2021-06-28 2021-04-10 Saturday    3    3    3    6    6    5    9    2     0
#6 2021-06-28 2021-07-01 Thursday    4    2    2    2    2    6    4    0     0
#7 2021-06-28 2021-07-02   Friday    3    7    1    1    1    7    7    0     0
#8 2021-06-28 2021-07-03   Monday    6    4    5    9    9    8    8    0     0

Similar logic can also be implemented using keep and discard function in purrr.

purrr::keep(df1, ~any(.x != 0))
purrr::discard(df1, ~all(.x == 0))
  •  Tags:  
  • r
  • Related