Could you help me write this function correctly? First, I'll show you an example:
df1 <- structure(
list(
X1 = c(1, 1, 1, 1),
X2 = c("4","3","1","2"),
X3 = c("1", "2","3","2"),
X4 = c("1", "2","3","2"),
XM1 = c(200, 300, 200, 200),
XMR0 = c(300, 300, 300, 300),
XMR01 = c(300, 300, 300, 300),
XMR02 = c(300,300,300,300),
XMR03 = c(300,300,300,300),
XMR04 = c(300,250,350,350)),row.names = c(NA, 4L), class = "data.frame")
library(dplyr)

f1 <- function(data){
  data %>%
    # keep X1..X4 and XM1, then compute XM1 - XMR* for every XMR column
    transmute(across(matches("^X\\d+$")),
              XM1, across(starts_with("XMR"), ~ XM1 - .x,
                          .names = "{.col}_PV" ))
}
f1(df1)
> f1(df1)
X1 X2 X3 X4 XM1 XMR0_PV XMR01_PV XMR02_PV XMR03_PV XMR04_PV
1 1 4 1 1 200 -100 -100 -100 -100 -100
2 1 3 2 2 300 0 0 0 0 50
3 1 1 3 3 200 -100 -100 -100 -100 -150
4 1 2 2 2 200 -100 -100 -100 -100 -150
Now I have a similar database, but the column names are different.
df1 <- structure(
list(
Id = c(1, 1, 1, 1),
date1 = c("2022-01-06","2022-01-06","2022-01-06","2022-01-06"),
date2 = c("2022-01-02","2022-01-03","2022-01-09","2022-01-10"),
Week = c("Sunday","Monday","Sunday","Monday"),
Category = c("EFG", "ABC","EFG","ABC"),
DR1 = c(200, 300, 200, 200),
DRM0 = c(300, 300, 300, 300),
DRM01 = c(300, 300, 300, 300),
DRM02 = c(300,300,300,300),
DRM03 = c(300,300,300,300),
DRM04 = c(300,250,350,350)),row.names = c(NA, 4L), class = "data.frame")
So I would like to create a function that can be called f2. What would my function look like now, compared to f1 above?
Expected output:
Id date2 Week Category DR1 DRM0_PV DRM01_PV DRM02_PV DRM03_PV DRM04_PV
1 1 2022-01-02 Sunday EFG 200 -100 -100 -100 -100 -100
2 1 2022-01-03 Monday ABC 300 0 0 0 0 50
3 1 2022-01-09 Sunday EFG 200 -100 -100 -100 -100 -150
4 1 2022-01-10 Monday ABC 200 -100 -100 -100 -100 -150
CodePudding user response:
We may add some additional arguments to the function as input:
- colnm - name of the column that the other columns are subtracted from, given as a string (ensym converts it to a symbol, which is then evaluated with !!; because we use ensym, an unquoted argument also works as input)
- pat - prefix pattern of the column names to loop over with across
- cols_del - columns to be deleted. By default it is NULL, so if the fourth argument is not supplied, none of the columns are deleted.
library(dplyr)

f1 <- function(data, colnm, pat, cols_del = NULL){
  # accept the column either as a string or as an unquoted name
  colnm <- rlang::ensym(colnm)
  data %>%
    mutate(!! colnm, across(starts_with(pat), ~ !! colnm - .x,
                            .names = "{.col}_PV" ), .keep = "unused") %>%
    select(-any_of(cols_del))
}
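As a minimal sketch of the ensym point above, the quoted and unquoted forms of colnm should give the same result. The data frame toy below is hypothetical (it is not from the question) and only mimics the shape of the real data; f1 is repeated so the example runs on its own.

```r
library(dplyr)

# Hypothetical toy data with the same layout as the question's data frame.
toy <- data.frame(
  Id    = c(1, 2),
  date1 = c("2022-01-06", "2022-01-06"),
  DR1   = c(200, 300),
  DRM0  = c(300, 300),
  DRM01 = c(300, 250)
)

f1 <- function(data, colnm, pat, cols_del = NULL) {
  # ensym() turns both "DR1" and DR1 into the symbol DR1
  colnm <- rlang::ensym(colnm)
  data %>%
    mutate(!!colnm, across(starts_with(pat), ~ !!colnm - .x,
                           .names = "{.col}_PV"), .keep = "unused") %>%
    select(-any_of(cols_del))
}

quoted   <- f1(toy, "DR1", "DRM", "date1")  # column name as a string
unquoted <- f1(toy, DR1, "DRM", "date1")    # column name unquoted

# Both calls should return the same data frame.
identical(quoted, unquoted)
```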
The code loops across the columns that have the prefix 'DRM'/'XMR' and subtracts each of them from the column passed in colnm. With .keep = "unused", mutate keeps the new columns plus any columns that were not used to compute them: because we are creating new columns with .names, the looped columns are not returned in the data. We still need 'DR1' or 'XM1', so it is selected explicitly (!! colnm). In the last step, select(-any_of(cols_del)) removes the columns named in cols_del from the output.
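The effect of .keep = "unused" can be seen in isolation with a small sketch. The data frame toy and its column names a, b, c are made up for illustration; a plays the role of 'DR1'/'XM1' and b the role of a 'DRM'/'XMR' column.

```r
library(dplyr)

# Hypothetical data: 'a' is the reference column, 'b' is looped over,
# 'c' is never touched.
toy <- data.frame(a = c(1, 2), b = c(10, 20), c = c(5, 5))

res <- toy %>%
  mutate(
    a,                                   # re-create 'a' so it is kept
    across(starts_with("b"), ~ a - .x,   # a - b, stored under a new name
           .names = "{.col}_PV"),
    .keep = "unused"                     # drop columns that were only read
  )

# 'b' was only read, so it is dropped; 'a', 'c' and 'b_PV' remain.
names(res)
```

Leaving out the bare a in the mutate call would mark a as used only, and it would then be dropped along with b.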
Testing:
> f1(df1, "DR1", "DRM", "date1")
Id date2 Week Category DR1 DRM0_PV DRM01_PV DRM02_PV DRM03_PV DRM04_PV
1 1 2022-01-02 Sunday EFG 200 -100 -100 -100 -100 -100
2 1 2022-01-03 Monday ABC 300 0 0 0 0 50
3 1 2022-01-09 Sunday EFG 200 -100 -100 -100 -100 -150
4 1 2022-01-10 Monday ABC 200 -100 -100 -100 -100 -150
Using the original 'df1' (the first example, with the X/XMR columns):
> f1(df1, "XM1", "XMR")
X1 X2 X3 X4 XM1 XMR0_PV XMR01_PV XMR02_PV XMR03_PV XMR04_PV
1 1 4 1 1 200 -100 -100 -100 -100 -100
2 1 3 2 2 300 0 0 0 0 50
3 1 1 3 3 200 -100 -100 -100 -100 -150
4 1 2 2 2 200 -100 -100 -100 -100 -150
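For comparison, a direct adaptation of the question's original f1 to the new column names, without the extra arguments, might look like the sketch below. It is only an illustration under the assumption that the columns to keep are exactly Id, date2, Week, Category and DR1, as in the expected output; the name df2 is used here just to avoid clashing with the first 'df1'.

```r
library(dplyr)

# Second data frame from the question, stored under a different name.
df2 <- structure(
  list(
    Id = c(1, 1, 1, 1),
    date1 = c("2022-01-06", "2022-01-06", "2022-01-06", "2022-01-06"),
    date2 = c("2022-01-02", "2022-01-03", "2022-01-09", "2022-01-10"),
    Week = c("Sunday", "Monday", "Sunday", "Monday"),
    Category = c("EFG", "ABC", "EFG", "ABC"),
    DR1 = c(200, 300, 200, 200),
    DRM0 = c(300, 300, 300, 300),
    DRM01 = c(300, 300, 300, 300),
    DRM02 = c(300, 300, 300, 300),
    DRM03 = c(300, 300, 300, 300),
    DRM04 = c(300, 250, 350, 350)),
  row.names = c(NA, 4L), class = "data.frame")

# Hard-coded variant: keep the identifying columns and DR1,
# then compute DR1 - DRM* for every column starting with "DRM".
f2 <- function(data) {
  data %>%
    transmute(Id, date2, Week, Category, DR1,
              across(starts_with("DRM"), ~ DR1 - .x,
                     .names = "{.col}_PV"))
}

out <- f2(df2)
out
```

The parameterised f1 above is more reusable; this version only trades that flexibility for being closer to the original code.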