I have a dataset in the long format that contains a) columns with the percentage change by year and b) columns with the absolute change by year, besides c) the other data.
I need to write a function that, depending on the value of a TRUE/FALSE parameter I call difference, excludes either the columns containing PERC_CHANGE or EUR in their names, or those containing ABS_CHANGE or EUR, and then returns the resulting dataframe.
Here is the reproducible code chunk:
df=structure(list(SCENARIO = c("BC", "BC", "BC", "BC"), INSTITUTE = c("BCR",
"BCR", "BCR", "BCR"), METHOD_DEC = c("BIL", "CARLA",
"CARLA", "CARLA"), CLASS = c("SME", "BANK", "CORPORATE",
"SME"), EUR_Y_2021 = c(13446986L, 0L, 0L, 0L), EUR_Y_2022 = c(16460885L,
133047L, 728991L, 665L), ABS_CHANGE_N_2021 = c(0L, 0L, 0L, 0L
), ABS_CHANGE_N_2022 = c(1815796L, -1039290L, 2768626L, -499L
), PERC_CHANGE_N_2022 = c(0.0227073699984259, -0.00992854123296549,
0.0608814672317806, -0.233723653395784), PERC_CHANGE_N_2023 = c(0.0722801890040687,
-0.0115649941812915, 0.145799497480829, -0.402341920374707)), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -4L), groups = structure(list(
CLASS = c("BANK", "CORPORATE", "SME", "SME"), METHOD_DEC = c("CARLA",
"CARLA", "BIL", "CARLA"), INSTITUTE = c("BCR", "BCR",
"BCR", "BCR"), SCENARIO = c("BC", "BC", "BC", "BC"), .rows = structure(list(
2L, 3L, 1L, 4L), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -4L), .drop = TRUE))
And this is what I had in mind:
test_func <- function(df, difference) {
if (difference==TRUE) {
df=df %>% select(-contains("ABS_CHANGE" | contains("EUR"))
} else {
df=df %>% select(-contains("PERC_CHANGE" | contains("EUR"))
}
}
return (df)
test_func(df,difference=FALSE)
CodePudding user response:
You are missing a few parentheses (each contains() call needs its own closing parenthesis) and have misplaced a curly bracket, so return (df) ended up outside the function body. Based on your code, try:
test_func <- function(df, difference) {
if (difference==TRUE) {
df=df %>% select(-(contains("ABS_CHANGE") | contains("EUR")))
} else {
df=df %>% select(-(contains("PERC_CHANGE") | contains("EUR")))
}
return (df)
}
Output (first with difference = TRUE, then with difference = FALSE):
# A tibble: 4 × 6
# Groups: CLASS, METHOD_DEC, INSTITUTE, SCENARIO [4]
SCENARIO INSTITUTE METHOD_DEC CLASS PERC_CHANGE_N_2022 PERC_CHANGE_N_2023
<chr> <chr> <chr> <chr> <dbl> <dbl>
1 BC BCR BIL SME 0.0227 0.0723
2 BC BCR CARLA BANK -0.00993 -0.0116
3 BC BCR CARLA CORPORATE 0.0609 0.146
4 BC BCR CARLA SME -0.234 -0.402
# A tibble: 4 × 6
# Groups: CLASS, METHOD_DEC, INSTITUTE, SCENARIO [4]
SCENARIO INSTITUTE METHOD_DEC CLASS ABS_CHANGE_N_2021 ABS_CHANGE_N_2022
<chr> <chr> <chr> <chr> <int> <int>
1 BC BCR BIL SME 0 1815796
2 BC BCR CARLA BANK 0 -1039290
3 BC BCR CARLA CORPORATE 0 2768626
4 BC BCR CARLA SME 0 -499
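The two contains() calls can also be collapsed into a single regular expression with matches(), which may be easier to read than nested parentheses. This is only a minimal sketch, assuming dplyr >= 1.0 so that ! negation works inside select(); the name test_func_regex and the small toy tibble are made up for illustration and are not the question's data:

```r
library(dplyr)

# Illustrative stand-in for the question's data (values are invented)
toy <- tibble(
  CLASS = c("SME", "BANK"),
  EUR_Y_2021 = c(10L, 0L),
  ABS_CHANGE_N_2022 = c(5L, -3L),
  PERC_CHANGE_N_2022 = c(0.02, -0.01)
)

# Hypothetical variant of test_func: one regex instead of two contains() calls
test_func_regex <- function(df, difference = TRUE) {
  # "A|B" matches column names containing either A or B
  pattern <- if (difference) "ABS_CHANGE|EUR" else "PERC_CHANGE|EUR"
  # matches() ignores case by default; ! drops the matched columns
  df %>% select(!matches(pattern))
}

names(test_func_regex(toy, TRUE))
names(test_func_regex(toy, FALSE))
```

As with contains("EUR"), the pattern matches anywhere in a column name, so any other column whose name happens to include "EUR" would be dropped as well.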
CodePudding user response:
As the only difference between the two branches is 'ABS_CHANGE' vs 'PERC_CHANGE', we could write the function as:
test_func <- function(df, difference = TRUE) {
nm <- if(difference) 'ABS_CHANGE' else 'PERC_CHANGE'
df %>%
select(-contains(nm), -contains("EUR"))
}
Testing:
> test_func(df)
# A tibble: 4 × 6
# Groups: CLASS, METHOD_DEC, INSTITUTE, SCENARIO [4]
SCENARIO INSTITUTE METHOD_DEC CLASS PERC_CHANGE_N_2022 PERC_CHANGE_N_2023
<chr> <chr> <chr> <chr> <dbl> <dbl>
1 BC BCR BIL SME 0.0227 0.0723
2 BC BCR CARLA BANK -0.00993 -0.0116
3 BC BCR CARLA CORPORATE 0.0609 0.146
4 BC BCR CARLA SME -0.234 -0.402
> test_func(df, FALSE)
# A tibble: 4 × 6
# Groups: CLASS, METHOD_DEC, INSTITUTE, SCENARIO [4]
SCENARIO INSTITUTE METHOD_DEC CLASS ABS_CHANGE_N_2021 ABS_CHANGE_N_2022
<chr> <chr> <chr> <chr> <int> <int>
1 BC BCR BIL SME 0 1815796
2 BC BCR CARLA BANK 0 -1039290
3 BC BCR CARLA CORPORATE 0 2768626
4 BC BCR CARLA SME 0 -499
Instead of a logical difference argument, the function could take the column-name substring to drop as its argument. In that case, we don't need any if/else:
test_function <- function(df, col_sub) {
df %>%
select(-contains(col_sub), -contains("EUR"))
}
and then test as
test_function(df, "ABS_CHANGE")
test_function(df, "PERC_CHANGE")
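One thing to watch with a free-text substring is that contains() silently matches nothing when the string is misspelled, so a typo would drop only the EUR columns without any error. A possible guard, sketched here under the assumption that only the two substrings from the question are valid, is base R's match.arg(); the name test_function_checked and the toy tibble are hypothetical:

```r
library(dplyr)

# Illustrative stand-in for the question's data (values are invented)
toy <- tibble(
  CLASS = c("SME", "BANK"),
  EUR_Y_2021 = c(10L, 0L),
  ABS_CHANGE_N_2022 = c(5L, -3L),
  PERC_CHANGE_N_2022 = c(0.02, -0.01)
)

# Hypothetical variant of test_function that validates col_sub
test_function_checked <- function(df, col_sub = c("ABS_CHANGE", "PERC_CHANGE")) {
  # match.arg() stops with an error unless col_sub is one of the choices above
  col_sub <- match.arg(col_sub)
  df %>%
    select(-contains(col_sub), -contains("EUR"))
}

names(test_function_checked(toy, "ABS_CHANGE"))
names(test_function_checked(toy, "PERC_CHANGE"))
```

Calling it with a misspelled value such as "ABS_CHNAGE" then fails immediately instead of returning a data frame that still contains both sets of change columns.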