Home > Mobile >  Dplyr - Select if a column exists and summarize if it does exist
Dplyr - Select if a column exists and summarize if it does exist

Time:11-17

I am able to check for a specific column if it exists using 'contains' in dplyr . I struggle with evaluating the summary of the expression if it does not exist.

Here is my code snippet:

  df <- Prod%>%
      group_by(Entity)%>%
      select(Entity,`Cum.Oil`,`Cum.Gas`,contains("EUR")%>%
      summarise(Oil = mean(`Cum.Oil`), Gas = mean(`Cum.Gas`), EUR=mean(EUR))

How can I ignore 'EUR' expression in the summarise expression if the EUR column does not exist?

CodePudding user response:

Something like this should work:

df <- Prod%>%
      group_by(Entity)%>%
      summarise(across(any_of(c('Cum.Oil', 'Cum.Gas', 'Eur')), ~mean(.x), 
                .names = '{.col %>% str_remove("Cum.")}' )

Can't test without some reprex, though.

Tip: You can also use any_of in select statements:

df <- Prod%>%
      group_by(Entity)%>%
      select(any_of(c('Entity', 'Cum.Oil', 'Cum.Gas', "EUR"))

CodePudding user response:

If it is more convinient you can use ifelse constructions within dplyr pipes as well though I consider @Juan answer more Rly elegant:

Prod <- data.frame(Entity = c("a", "b", "a"),`Cum.Oil` = 1:3,`Cum.Gas`=c(2,4,6)
                   , EUR = c(7,9,9)
)
Prod %>% 
  group_by(Entity) %>% {
  ifelse(exists("EUR", .), 
         . <- summarise(., Oil = mean(`Cum.Oil`), 
                        Gas = mean(`Cum.Gas`), 
                        EUR = mean(EUR)
         ), 
         . <- summarise(., Oil = mean(`Cum.Oil`), 
                        Gas = mean(`Cum.Gas`)
         )
  )  
  .
} 

CodePudding user response:

Juan C's answer is the way to go about this.

But just as reference, we can use purrr::when() to create pipe friendly conditionals. When we want to use functions like exists() in when() we can specify the envir argument as follows: exists("EUR", envir = as.environment(.)).

library(tidyverse)

Prod <- data.frame(Entity = c("a", "b", "a"),
                   `Cum.Oil` = 1:3,
                   `Cum.Gas`=c(2,4,6),
                   EUR = c(7,9,9))

Prod %>%
  group_by(Entity)%>%
  select(Entity,`Cum.Oil`,`Cum.Gas`, contains("EUR")) %>%
  
  when(exists("EUR", envir = as.environment(.)) ~ 
         summarise(., Oil = mean(`Cum.Oil`), Gas = mean(`Cum.Gas`), EUR=mean(EUR)),
       
       TRUE ~
         summarise(., Oil = mean(`Cum.Oil`), Gas = mean(`Cum.Gas`))
  )

#> # A tibble: 2 × 4
#>   Entity   Oil   Gas   EUR
#>   <chr>  <dbl> <dbl> <dbl>
#> 1 a          2     4     8
#> 2 b          2     4     9

Created on 2022-11-16 with reprex v2.0.2

  • Related