I have a tibble with many variables organised this way:
tibble(
A = rep("A",10),
xyz1 = rnorm(10),
xyz2 = rnorm(10),
xyz3 = rnorm(10),
abc1 = rnorm(10),
abc2 = rnorm(10),
abb3 = rnorm(10),
acc4 = rnorm(10)
)
where xyz
, abc
, etc. are placeholder. After the placeholder there is a number. Assume it can be any integer.
In my tibble, that number can be any integer.
I want to trasmute
it according to the formula
xyzn = xyzn - 'xyzn-1'
, where n
is the symbol for the counted integer.
Whereas 'xyzn-1'
does not exist, the result can be ignored and not join the transmute
.
Schematic output:
tibble(
A = A
xyz2 = xyz2 - xyz1,
xyz3 = xyz3 - xyz2,
abc2 = abc2 - abc1
)
CodePudding user response:
Perhaps this helps
library(dplyr)
library(tidyr)
df1 %>%
mutate(rn = row_number()) %>%
pivot_longer(cols = -c(A, rn), names_to = c(".value", "ind"),
names_pattern = "(\\D )(\\d )",
names_transform = list(ind = as.integer)) %>%
arrange(A, rn, ind) %>%
group_by(A, rn) %>%
mutate(across(-ind, ~ c(NA, diff(.x)))) %>%
ungroup %>%
pivot_wider(names_from = ind, values_from = xyz:acc,
names_sep = "") %>%
select(-rn) %>%
select(where(~ any(complete.cases(.x))))
-output
# A tibble: 10 × 4
A xyz2 xyz3 abc2
<chr> <dbl> <dbl> <dbl>
1 A -1.60 1.75 -1.53
2 A 2.89 -3.81 0.0701
3 A -0.657 0.920 -0.912
4 A 0.305 0.395 -0.477
5 A -0.289 1.39 1.38
6 A -0.103 0.426 -1.38
7 A -2.16 1.44 -0.913
8 A -0.260 -0.249 -1.45
9 A -1.15 1.14 -1.42
10 A -0.306 0.198 -0.118
Or may be
cbind(df1[1], do.call(cbind, unname(Filter(nrow, lapply(split.default(df1[-1],
sub("\\d $", "", names(df1)[-1])),
\(x) {
i1 <- order(as.integer(sub("\\D ", "", names(x))))
x <- x[i1]
x[-1]- x[-ncol(x)]
})))
))
-output
A abc2 xyz2 xyz3
1 A -1.52667071 -1.5985160 1.7533450
2 A 0.07013841 2.8939503 -3.8113492
3 A -0.91213998 -0.6573093 0.9197824
4 A -0.47712113 0.3049918 0.3945995
5 A 1.37871603 -0.2886773 1.3933839
6 A -1.37608993 -0.1031296 0.4264927
7 A -0.91313982 -2.1630265 1.4407289
8 A -1.45439105 -0.2598476 -0.2493127
9 A -1.41590040 -1.1490018 1.1383060
10 A -0.11775196 -0.3061306 0.1984115
CodePudding user response:
Usually the {dplyover} package can help with this kind of problems (disclaimer: I'm the maintainer). However, in your specific case the problem is not that easy to solve due to he specific conditions for variable selection.
In the approach below we first construct the variable names that we want to subtract from each other myvars1
and myvars2
.
After that we can use dplyover::across2()
together with all_of()
.
See the code comments for what we do in each step:
library(dplyr)
library(stringr)
library(dplyover) # https://timteafan.github.io/dplyover/
# get all variable stems
all_stems <- dplyover::cut_names("[0-9]$", .vars = names(df1))
# exlcude stems which don't start with 1
use_stems <- all_stems[paste0(all_stems, 1) %in% names(df1)]
# construct regex pattern to select all vars with > 1
patrn <- paste0("(", paste(use_stems, collapse = "|"), ")[^1]$")
# select vars with > 1
myvars1 <- grep(patrn, names(df1), value = TRUE)
# select vars to substract from `myvars1`
myvars2 <- str_replace(myvars1, "\\d$", ~ as.numeric(.x) - 1)
# use `dplyover::across2()` with `all_of()`
df1 %>%
transmute(
A = A, # dplyover doesn't support the `.keep` argument so we need a workaround
across2(all_of(myvars1),
all_of(myvars2),
~ .x - .y,
.names = "{xcol}")
)
#> # A tibble: 10 × 4
#> A xyz2 xyz3 abc2
#> <chr> <dbl> <dbl> <dbl>
#> 1 A 0.847 -1.19 0.413
#> 2 A 1.00 0.946 -3.26
#> 3 A 0.856 -1.11 -2.62
#> 4 A -0.325 1.47 1.11
#> 5 A -1.18 0.0830 2.78
#> 6 A -2.65 -0.520 -0.337
#> 7 A 0.197 -0.447 0.347
#> 8 A -0.484 1.18 -0.717
#> 9 A -1.94 1.81 1.05
#> 10 A -3.80 1.36 0.202
The from OP:
df1 <- tibble(
A = rep("A",10),
xyz1 = rnorm(10), # 2
xyz2 = rnorm(10), # 1 2
xyz3 = rnorm(10), # 1
abc1 = rnorm(10), # 2
abc2 = rnorm(10), # 1
abb3 = rnorm(10), #
acc4 = rnorm(10) #
)
Created on 2023-01-07 with reprex v2.0.2