Difference Between Adjacent Column Pairs in SAS and r-CodePudding

I would like to take the difference between columns 1 & 2, 3 & 4, 5 & 6, 7 & 8, and so on. I originally had 55 corresponding column pairs (110 columns total) and needed to get 55 difference columns. I ended up coding each column difference by hand, but I thought I could probably do this much more efficiently. Perhaps by the use of arrays in SAS. I would like to solve this problem in r as well.

Synthetic data is below and if anyone knows how to quickly generate sequential paired column names like var1_apple, var1_banana, var2_apple, var2_banana, var3_apple, var3_banana,..., in r (without just typing out a vector of column names) that would be very helpful as well.

Thank you!

## create a dataframe with random values of 1:10. ncols x nrows = 200
df <- data.frame(matrix(sample(1:10, 200, replace = TRUE), ncol = 20, nrow = 10))

CodePudding user response：

Adjusting data to be column pairs:

df <- data.frame(matrix(sample(1:10, 200, replace = TRUE), ncol = 20, nrow = 10))
names(df) <- paste0("var", rep(1:10, each = 2), "_", rep(c("apple", "banana")))


names(df)
[1] "var1_apple"   "var1_banana"  "var2_apple"   "var2_banana"  "var3_apple"   "var3_banana" 
 [7] "var4_apple"   "var4_banana"  "var5_apple"   "var5_banana"  "var6_apple"   "var6_banana" 
[13] "var7_apple"   "var7_banana"  "var8_apple"   "var8_banana"  "var9_apple"   "var9_banana" 
[19] "var10_apple"  "var10_banana"

library(tidyverse)
df %>%
  mutate(row = row_number()) %>%
  pivot_longer(-row, names_to = c("var", ".value"), names_sep = "_")

# A tibble: 100 × 4
     row var   apple banana
   <int> <chr> <int>  <int>
 1     1 var1      8      7
 2     1 var2      4      9
 3     1 var3      7      3
 4     1 var4      6     10
 5     1 var5     10     10
 6     1 var6      1      1
 7     1 var7      2     10
 8     1 var8      7      9
 9     1 var9      3      8
10     1 var10     2      6
# … with 90 more rows
# ℹ Use `print(n = ...)` to see more rows

CodePudding user response：

I think @Tom's comment is spot-on. Restructuring the data probably makes sense if you are working with paired data. E.g.:

od <- names(df)[c(TRUE,FALSE)]
ev <- names(df)[c(FALSE,TRUE)]

data.frame(
    odd      = unlist(df[od]),
    oddname  = rep(od,each=nrow(df)),
    even     = unlist(df[ev]),
    evenname = rep(ev,each=nrow(df))
)
##      odd oddname even evenname
##X11     7      X1   10       X2
##X12     6      X1    1       X2
##X13     2      X1    6       X2
##X14     5      X1    2       X2
##X15     3      X1    1       X2
## ...

It is then trival to take one column from another in this structure.

If you must have the matrix-like output, then that is also achievable:

od <- names(df)[c(TRUE,FALSE)]
ev <- names(df)[c(FALSE,TRUE)]
setNames(df[od] - df[ev], paste(od, ev, sep="_"))

##   X1_X2 X3_X4 X5_X6 X7_X8 X9_X10 X11_X12 X13_X14 X15_X16 X17_X18 X19_X20
##1     -3     2     4     4     -2       4       3       1      -3       9
##2      5     5     4     3     -1       3      -1      -3       5      -2
##3     -4     3     7     4     -5       1       1       5      -4       4
##4      3     0     6     3      4      -5       6       6      -7       4
##5      2     2     1     4     -6      -3       6       2       3       1
##6     -6    -2     4    -2      0       1       3       0       0      -7
##7      0    -6     3     7     -1       0       0      -5       3       1
##8     -1     3     3     1      2      -2      -5       3       0       0
##9     -4     1    -5    -2     -4       7       6      -2       4      -4
##10     2    -7     4    -1      0      -6      -4      -4       0       0