Home > Blockchain >  How to scale up a transmute in tidyverse?
How to scale up a transmute in tidyverse?

Time:01-08

I have a tibble with many variables organised this way:

tibble(
  A = rep("A",10),
  xyz1 = rnorm(10),
  xyz2 = rnorm(10),
  xyz3 = rnorm(10),
  abc1 = rnorm(10),
  abc2 = rnorm(10),
  abb3 = rnorm(10),
  acc4 = rnorm(10)
)

where xyz, abc, etc. are placeholder. After the placeholder there is a number. Assume it can be any integer. In my tibble, that number can be any integer.

I want to trasmute it according to the formula

xyzn = xyzn - 'xyzn-1', where n is the symbol for the counted integer.

Whereas 'xyzn-1' does not exist, the result can be ignored and not join the transmute.

Schematic output:

tibble(
  A = A
  xyz2 = xyz2 - xyz1,
  xyz3 = xyz3 - xyz2,
  abc2 = abc2 - abc1
)

CodePudding user response:

Perhaps this helps

library(dplyr)
library(tidyr)
df1 %>% 
  mutate(rn = row_number()) %>% 
  pivot_longer(cols = -c(A, rn), names_to = c(".value", "ind"), 
   names_pattern = "(\\D )(\\d )",
    names_transform = list(ind = as.integer)) %>% 
  arrange(A, rn, ind) %>% 
  group_by(A, rn) %>%
  mutate(across(-ind, ~ c(NA, diff(.x)))) %>%
  ungroup %>% 
 pivot_wider(names_from = ind, values_from = xyz:acc,
   names_sep = "") %>% 
 select(-rn) %>% 
 select(where(~ any(complete.cases(.x))))

-output

# A tibble: 10 × 4
   A       xyz2   xyz3    abc2
   <chr>  <dbl>  <dbl>   <dbl>
 1 A     -1.60   1.75  -1.53  
 2 A      2.89  -3.81   0.0701
 3 A     -0.657  0.920 -0.912 
 4 A      0.305  0.395 -0.477 
 5 A     -0.289  1.39   1.38  
 6 A     -0.103  0.426 -1.38  
 7 A     -2.16   1.44  -0.913 
 8 A     -0.260 -0.249 -1.45  
 9 A     -1.15   1.14  -1.42  
10 A     -0.306  0.198 -0.118 

Or may be

cbind(df1[1],  do.call(cbind, unname(Filter(nrow, lapply(split.default(df1[-1], 
sub("\\d $", "", names(df1)[-1])), 
     \(x) {
    i1 <- order(as.integer(sub("\\D ", "", names(x))))
    x <- x[i1]
    x[-1]- x[-ncol(x)]
   })))
))

-output

  A        abc2       xyz2       xyz3
1  A -1.52667071 -1.5985160  1.7533450
2  A  0.07013841  2.8939503 -3.8113492
3  A -0.91213998 -0.6573093  0.9197824
4  A -0.47712113  0.3049918  0.3945995
5  A  1.37871603 -0.2886773  1.3933839
6  A -1.37608993 -0.1031296  0.4264927
7  A -0.91313982 -2.1630265  1.4407289
8  A -1.45439105 -0.2598476 -0.2493127
9  A -1.41590040 -1.1490018  1.1383060
10 A -0.11775196 -0.3061306  0.1984115

CodePudding user response:

Usually the {dplyover} package can help with this kind of problems (disclaimer: I'm the maintainer). However, in your specific case the problem is not that easy to solve due to he specific conditions for variable selection.

In the approach below we first construct the variable names that we want to subtract from each other myvars1 and myvars2.

After that we can use dplyover::across2() together with all_of().

See the code comments for what we do in each step:

library(dplyr)
library(stringr)
library(dplyover) # https://timteafan.github.io/dplyover/


# get all variable stems
all_stems <- dplyover::cut_names("[0-9]$", .vars = names(df1))

# exlcude stems which don't start with 1
use_stems <- all_stems[paste0(all_stems, 1) %in% names(df1)]

# construct regex pattern to select all vars with > 1
patrn <- paste0("(", paste(use_stems, collapse = "|"), ")[^1]$")

# select vars with > 1
myvars1 <- grep(patrn, names(df1), value = TRUE)
# select vars to substract from `myvars1`
myvars2 <- str_replace(myvars1, "\\d$", ~ as.numeric(.x) - 1)

# use `dplyover::across2()` with `all_of()`
df1 %>% 
  transmute(
    A = A, # dplyover doesn't support the `.keep` argument so we need a workaround
    across2(all_of(myvars1),
            all_of(myvars2),
            ~ .x - .y,
            .names = "{xcol}")
    )
#> # A tibble: 10 × 4
#>    A       xyz2    xyz3   abc2
#>    <chr>  <dbl>   <dbl>  <dbl>
#>  1 A      0.847 -1.19    0.413
#>  2 A      1.00   0.946  -3.26 
#>  3 A      0.856 -1.11   -2.62 
#>  4 A     -0.325  1.47    1.11 
#>  5 A     -1.18   0.0830  2.78 
#>  6 A     -2.65  -0.520  -0.337
#>  7 A      0.197 -0.447   0.347
#>  8 A     -0.484  1.18   -0.717
#>  9 A     -1.94   1.81    1.05 
#> 10 A     -3.80   1.36    0.202

The from OP:

df1 <- tibble(
  A = rep("A",10),
  xyz1 = rnorm(10), #   2 
  xyz2 = rnorm(10), # 1 2
  xyz3 = rnorm(10), # 1
  abc1 = rnorm(10), #   2
  abc2 = rnorm(10), # 1
  abb3 = rnorm(10), #
  acc4 = rnorm(10)  #
)

Created on 2023-01-07 with reprex v2.0.2

  • Related