Rename columns if present in a data frame - pipe operator with if else

I'm importing and tidying a large number of data sets and trying to streamline my code.

While the data sets are similar, more recent ones have different column headings.

For example:

df1 <- data.frame(a = c(1, 1, 3), b = 4:6, c = 5:7)

df2 <- data.frame(a = c(1, 2, 1, 1), b1 = 1:4, c = 8:11)

I have created a function to tidy this data: where b1 is present it is renamed to b, and the rows are filtered to a == 1 in all cases (this is a simplified version of the various bits of wrangling):

fun1 <- function(x) {
  if(any(grep('b1', colnames(x)) > 0)) {
    x %>%  
      filter(a == "1") %>%
      rename(b = b1) 
  } else {
    x %>%  
      filter(a == 1)
  }
}


fun1(df1)

However, this still requires the filter(a == 1) step to be repeated within the function, once in the if branch and once in the else branch.

I would therefore like to filter(a == 1) all datasets, and then apply the if(), else(), rename() stage, e.g.:

fun1 <- function(x) {
  x %>%
    filter(a == "1") %>%
    if(any(grep('b1', colnames(x)) > 0)) {
      rename(x, b = b1)
    } else {
    }
}

fun1(df1)

However, this returns the following error:

Error in if (.) any(grep("b1", colnames(x)) > 0) else { : 
  argument is not interpretable as logical
In addition: Warning message:
In if (.) any(grep("b1", colnames(x)) > 0) else { :
  the condition has length > 1 and only the first element will be used

What am I getting wrong?

Thanks, Jack

CodePudding user response：

You could also chain the if else part as shown below:

fun1 <- function(x) {
  x %>%
    {if('b1'%in% names(.)) rename(., b = b1) else .} %>%
    filter(a == 1)
}

fun1(df2)
  a b  c
1 1 1  8
2 1 3 10
3 1 4 11

 fun1(df1)
  a b c
1 1 4 5
2 1 5 6
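As for why the original attempt failed: magrittr inserts the left-hand side as the first argument of the call on the right, so the piped data frame ends up in the condition slot of `if`, which is what the `if (.)` in the error message shows. Wrapping the if/else in braces stops that insertion and exposes the data as `.` instead. A minimal sketch that also keeps the order asked for in the question (filter first, then rename conditionally); the name `rename_b1` is only illustrative:

```r
library(dplyr)

df1 <- data.frame(a = c(1, 1, 3), b = 4:6, c = 5:7)
df2 <- data.frame(a = c(1, 2, 1, 1), b1 = 1:4, c = 8:11)

# `rename_b1` is a hypothetical name used for this sketch
rename_b1 <- function(x) {
  x %>%
    filter(a == 1) %>%
    {
      # Inside the braces `.` is the filtered data frame;
      # `else .` passes it through unchanged when b1 is absent
      if ("b1" %in% names(.)) rename(., b = b1) else .
    }
}

rename_b1(df1)
rename_b1(df2)
```

Note that the condition checks `names(.)` rather than `colnames(x)`, so it always refers to the data at that point in the chain.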

CodePudding user response：

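We can use dplyr with rename_with and filter, so we get a function that is more versatile and will also work with unexpected suffixes on any of the variables. First rename with a function that keeps only the leading run of letters in each name (so b1 becomes b), then filter on a == 1, all within a single pipe chain. Note that str_extract comes from stringr, and that this assumes the stripped names stay unique.

library(dplyr)
library(stringr)  # provides str_extract()

my_wrangler <- function(df) {
  df %>%
    # keep only the leading letters of every column name, e.g. "b1" -> "b"
    rename_with(~ str_extract(.x, "[[:alpha:]]+"), .cols = everything()) %>%
    filter(a == 1)
}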

> my_wrangler(df2)
  a b  c
1 1 1  8
2 1 3 10
3 1 4 11
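If stripping digits from every column name is broader than needed, a narrower variant is to rename only the columns listed in a lookup vector and silently skip any that are missing. This is a sketch of a different technique from the one above: it relies on tidyselect's `any_of()` accepting a named character vector inside `rename()`, and the names `lookup` and `tidy_any_of` are made up for illustration:

```r
library(dplyr)

df1 <- data.frame(a = c(1, 1, 3), b = 4:6, c = 5:7)
df2 <- data.frame(a = c(1, 2, 1, 1), b1 = 1:4, c = 8:11)

# Hypothetical lookup: new name on the left, old name on the right
lookup <- c(b = "b1")

# `tidy_any_of` is an illustrative name, not part of dplyr
tidy_any_of <- function(df) {
  df %>%
    # any_of() ignores old names that are not present, so df1 passes through untouched
    rename(any_of(lookup)) %>%
    filter(a == 1)
}

tidy_any_of(df1)
tidy_any_of(df2)
```

Further old/new pairs can be added to `lookup` without touching the function itself.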

CodePudding user response：

Updated solution: here is a far more concise way of doing it with the base R pipe (R 4.1 or later), again suggested by Mr. Onyambu:

df2 |> 
  subset(a == 1) |> 
  (\(.)setNames(., sub('^b1$', 'b', names(.))))()

  a b  c
1 1 1  8
3 1 3 10
4 1 4 11

We could also do it with the base R pipe like this (inspired by Mr. Onyambu, as always). I wrapped the steps in an anonymous function, so you only need to change the data set at the start of the pipeline:

df1 |>
  {\(x) {
    tmp <- subset(x, a == 1) 
    if("b1" %in% colnames(tmp)) {
      names(tmp)[names(tmp) == "b1"] <- "b"
    } 
    tmp
  }  
}()

  a b c
1 1 4 5
2 1 5 6

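Or with df2: putting df2 instead of df1 at the start of the same pipeline renames b1 to b and returns the three rows where a == 1.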

CodePudding user response：

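fun1 <- function(x) {
  # filter once, up front
  store <- x %>% filter(a == 1)
  # rename only when b1 is actually present
  if ("b1" %in% colnames(store)) {
    store <- rename(store, b = b1)
  }
  # always return the data frame (an if without else would return NULL for df1)
  store
}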

fun1(df2)
  a b  c
1 1 1  8
2 1 3 10
3 1 4 11