Passing column name to a function in R to exclude certain columns using a function

Time:08-31

I have a dataset in long format that contains (a) columns with the percentage change by year and (b) columns with the absolute change by year, besides (c) the other data.

I need to write a function that, depending on a TRUE/FALSE parameter I call difference, excludes either the columns whose names contain PERC_CHANGE or EUR, or the columns whose names contain ABS_CHANGE or EUR, and then returns the resulting data frame.

Here is the reproducible code chunk:

df=structure(list(SCENARIO = c("BC", "BC", "BC", "BC"), INSTITUTE = c("BCR", 
"BCR", "BCR", "BCR"), METHOD_DEC = c("BIL", "CARLA", 
"CARLA", "CARLA"), CLASS = c("SME", "BANK", "CORPORATE", 
"SME"), EUR_Y_2021 = c(13446986L, 0L, 0L, 0L), EUR_Y_2022 = c(16460885L, 
133047L, 728991L, 665L), ABS_CHANGE_N_2021 = c(0L, 0L, 0L, 0L
), ABS_CHANGE_N_2022 = c(1815796L, -1039290L, 2768626L, -499L
), PERC_CHANGE_N_2022 = c(0.0227073699984259, -0.00992854123296549, 
0.0608814672317806, -0.233723653395784), PERC_CHANGE_N_2023 = c(0.0722801890040687, 
-0.0115649941812915, 0.145799497480829, -0.402341920374707)), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -4L), groups = structure(list(
    CLASS = c("BANK", "CORPORATE", "SME", "SME"), METHOD_DEC = c("CARLA", 
    "CARLA", "BIL", "CARLA"), INSTITUTE = c("BCR", "BCR", 
    "BCR", "BCR"), SCENARIO = c("BC", "BC", "BC", "BC"), .rows = structure(list(
        2L, 3L, 1L, 4L), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -4L), .drop = TRUE))

And this is what I had in mind:

test_func <- function(df, difference) {
    if (difference==TRUE) {
        df=df %>% select(-contains("ABS_CHANGE" | contains("EUR"))
       } else {                 
        df=df %>% select(-contains("PERC_CHANGE" | contains("EUR"))
                            }
       }
return (df)

 test_func(df,difference=FALSE)

CodePudding user response:

You are missing a few parentheses (each contains() call needs its own closing parenthesis) and you misplaced the closing curly bracket, which left return(df) outside the function body. Based on your code, try:

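library(dplyr)  # provides %>%, select() and contains()

test_func <- function(df, difference) {
    # difference = TRUE drops the ABS_CHANGE columns, FALSE drops PERC_CHANGE;
    # the EUR columns are dropped in both cases.
    if (difference) {
        df <- df %>% select(-(contains("ABS_CHANGE") | contains("EUR")))
    } else {
        df <- df %>% select(-(contains("PERC_CHANGE") | contains("EUR")))
    }
    return(df)
}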

Output with difference = TRUE, followed by difference = FALSE:

# A tibble: 4 × 6
# Groups:   CLASS, METHOD_DEC, INSTITUTE, SCENARIO [4]
  SCENARIO INSTITUTE METHOD_DEC CLASS     PERC_CHANGE_N_2022 PERC_CHANGE_N_2023
  <chr>    <chr>     <chr>      <chr>                  <dbl>              <dbl>
1 BC       BCR       BIL        SME                  0.0227              0.0723
2 BC       BCR       CARLA      BANK                -0.00993            -0.0116
3 BC       BCR       CARLA      CORPORATE            0.0609              0.146 
4 BC       BCR       CARLA      SME                 -0.234              -0.402 

# A tibble: 4 × 6
# Groups:   CLASS, METHOD_DEC, INSTITUTE, SCENARIO [4]
  SCENARIO INSTITUTE METHOD_DEC CLASS     ABS_CHANGE_N_2021 ABS_CHANGE_N_2022
  <chr>    <chr>     <chr>      <chr>                 <int>             <int>
1 BC       BCR       BIL        SME                       0           1815796
2 BC       BCR       CARLA      BANK                      0          -1039290
3 BC       BCR       CARLA      CORPORATE                 0           2768626
4 BC       BCR       CARLA      SME                       0              -499

Update: Added output.
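
A further variation, not taken from the answer above: since matches() accepts a regular expression, both substrings can be folded into a single pattern. This is only a minimal sketch. The names toy and drop_cols are hypothetical, and the values in toy are made up purely to mirror the column naming in the question.

```r
library(dplyr)

# Hypothetical stand-in data: same naming pattern as the question, invented values
toy <- data.frame(
  CLASS              = c("SME", "BANK"),
  EUR_Y_2021         = c(100L, 0L),
  ABS_CHANGE_N_2021  = c(5L, -3L),
  PERC_CHANGE_N_2022 = c(0.02, -0.01)
)

# Hypothetical helper: drop the EUR columns plus one family of change columns
drop_cols <- function(df, difference = TRUE) {
  # One regular expression covers both substrings; "|" means "or" inside the pattern
  pattern <- if (difference) "ABS_CHANGE|EUR" else "PERC_CHANGE|EUR"
  df %>% select(-matches(pattern))
}

names(drop_cols(toy, TRUE))   # CLASS and the PERC_CHANGE column remain
names(drop_cols(toy, FALSE))  # CLASS and the ABS_CHANGE column remain
```

Like contains(), matches() ignores case by default, so the pattern would also catch lower-case column names.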

CodePudding user response:

Since the only difference between the two branches is 'ABS_CHANGE' vs 'PERC_CHANGE', we could write the function as:

test_func <- function(df, difference = TRUE) {
    nm <- if (difference) 'ABS_CHANGE' else 'PERC_CHANGE'
    df %>%
        select(-contains(nm), -contains("EUR"))
}

Testing:

> test_func(df)
# A tibble: 4 × 6
# Groups:   CLASS, METHOD_DEC, INSTITUTE, SCENARIO [4]
  SCENARIO INSTITUTE METHOD_DEC CLASS     PERC_CHANGE_N_2022 PERC_CHANGE_N_2023
  <chr>    <chr>     <chr>      <chr>                  <dbl>              <dbl>
1 BC       BCR       BIL        SME                  0.0227              0.0723
2 BC       BCR       CARLA      BANK                -0.00993            -0.0116
3 BC       BCR       CARLA      CORPORATE            0.0609              0.146 
4 BC       BCR       CARLA      SME                 -0.234              -0.402 
> test_func(df, FALSE)
# A tibble: 4 × 6
# Groups:   CLASS, METHOD_DEC, INSTITUTE, SCENARIO [4]
  SCENARIO INSTITUTE METHOD_DEC CLASS     ABS_CHANGE_N_2021 ABS_CHANGE_N_2022
  <chr>    <chr>     <chr>      <chr>                 <int>             <int>
1 BC       BCR       BIL        SME                       0           1815796
2 BC       BCR       CARLA      BANK                      0          -1039290
3 BC       BCR       CARLA      CORPORATE                 0           2768626
4 BC       BCR       CARLA      SME                       0              -499

Instead of a logical difference argument, the function could take the column-name substring directly. In that case, we don't need any if/else:

test_function <- function(df, col_sub) {
    df %>%
       select(-contains(col_sub), -contains("EUR"))
}

and then test it as:

test_function(df, "ABS_CHANGE")
test_function(df, "PERC_CHANGE")
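
If dplyr is not available, roughly the same substring-based idea can be sketched in base R with grepl(). This is an illustration under assumptions rather than a drop-in replacement: drop_by_substring and toy are hypothetical names, the values are invented, and the example uses a plain data.frame rather than the grouped tibble from the question.

```r
# Hypothetical base R counterpart of test_function(); the name is illustrative
drop_by_substring <- function(df, col_sub) {
  # TRUE for every column whose name contains col_sub or "EUR" as a literal string
  to_drop <- grepl(col_sub, names(df), fixed = TRUE) |
    grepl("EUR", names(df), fixed = TRUE)
  # Keep the remaining columns; drop = FALSE keeps the result a data.frame
  df[, !to_drop, drop = FALSE]
}

# Hypothetical stand-in data with the same naming pattern as the question
toy <- data.frame(
  CLASS              = c("SME", "BANK"),
  EUR_Y_2021         = c(100L, 0L),
  ABS_CHANGE_N_2021  = c(5L, -3L),
  PERC_CHANGE_N_2022 = c(0.02, -0.01)
)

names(drop_by_substring(toy, "ABS_CHANGE"))
names(drop_by_substring(toy, "PERC_CHANGE"))
```

One difference to keep in mind: with fixed = TRUE, grepl() matches case-sensitively, whereas contains() ignores case by default.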