I would like to take the difference between columns 1 & 2, 3 & 4, 5 & 6, 7 & 8, and so on. I originally had 55 corresponding column pairs (110 columns total) and needed to get 55 difference columns. I ended up coding each column difference by hand, but I thought I could probably do this much more efficiently. Perhaps by the use of arrays in SAS. I would like to solve this problem in r as well.
Synthetic data is below and if anyone knows how to quickly generate sequential paired column names like var1_apple, var1_banana, var2_apple, var2_banana, var3_apple, var3_banana,..., in r (without just typing out a vector of column names) that would be very helpful as well.
Thank you!
## create a dataframe with random values of 1:10. ncols x nrows = 200
df <- data.frame(matrix(sample(1:10, 200, replace = TRUE), ncol = 20, nrow = 10))
CodePudding user response:
Adjusting data to be column pairs:
df <- data.frame(matrix(sample(1:10, 200, replace = TRUE), ncol = 20, nrow = 10))
names(df) <- paste0("var", rep(1:10, each = 2), "_", rep(c("apple", "banana")))
names(df)
[1] "var1_apple" "var1_banana" "var2_apple" "var2_banana" "var3_apple" "var3_banana"
[7] "var4_apple" "var4_banana" "var5_apple" "var5_banana" "var6_apple" "var6_banana"
[13] "var7_apple" "var7_banana" "var8_apple" "var8_banana" "var9_apple" "var9_banana"
[19] "var10_apple" "var10_banana"
library(tidyverse)
df %>%
mutate(row = row_number()) %>%
pivot_longer(-row, names_to = c("var", ".value"), names_sep = "_")
# A tibble: 100 × 4
row var apple banana
<int> <chr> <int> <int>
1 1 var1 8 7
2 1 var2 4 9
3 1 var3 7 3
4 1 var4 6 10
5 1 var5 10 10
6 1 var6 1 1
7 1 var7 2 10
8 1 var8 7 9
9 1 var9 3 8
10 1 var10 2 6
# … with 90 more rows
# ℹ Use `print(n = ...)` to see more rows
CodePudding user response:
I think @Tom's comment is spot-on. Restructuring the data probably makes sense if you are working with paired data. E.g.:
od <- names(df)[c(TRUE,FALSE)]
ev <- names(df)[c(FALSE,TRUE)]
data.frame(
odd = unlist(df[od]),
oddname = rep(od,each=nrow(df)),
even = unlist(df[ev]),
evenname = rep(ev,each=nrow(df))
)
## odd oddname even evenname
##X11 7 X1 10 X2
##X12 6 X1 1 X2
##X13 2 X1 6 X2
##X14 5 X1 2 X2
##X15 3 X1 1 X2
## ...
It is then trival to take one column from another in this structure.
If you must have the matrix-like output, then that is also achievable:
od <- names(df)[c(TRUE,FALSE)]
ev <- names(df)[c(FALSE,TRUE)]
setNames(df[od] - df[ev], paste(od, ev, sep="_"))
## X1_X2 X3_X4 X5_X6 X7_X8 X9_X10 X11_X12 X13_X14 X15_X16 X17_X18 X19_X20
##1 -3 2 4 4 -2 4 3 1 -3 9
##2 5 5 4 3 -1 3 -1 -3 5 -2
##3 -4 3 7 4 -5 1 1 5 -4 4
##4 3 0 6 3 4 -5 6 6 -7 4
##5 2 2 1 4 -6 -3 6 2 3 1
##6 -6 -2 4 -2 0 1 3 0 0 -7
##7 0 -6 3 7 -1 0 0 -5 3 1
##8 -1 3 3 1 2 -2 -5 3 0 0
##9 -4 1 -5 -2 -4 7 6 -2 4 -4
##10 2 -7 4 -1 0 -6 -4 -4 0 0