converting wide table to long programatically in R-CodePudding

my example dataset is wide and contains these values:

    olddata_wide <- read.table(header=TRUE, text='
 subject sex a b c a1 b1 c1  a2 b2 c2 
       1   M     7.9  12.3  10.7 7.5  12.1  10.3 8.1  12.5  10.9
       2   F     6.3  10.6  11.1 6.0  10.4  11.0 6.5  10.9  11.4
       3   F     9.5  13.1  13.8 9.3  13.0  13.5 9.8  13.5  13.9
       4   M    11.5  13.4  12.9 11.2  13.5  12.7 11.7  13.6  13.9
')

I would like to convert it to a long dataset. The problem is that I want to use multiple keycols at the same time - I want the columns a & b & c to become one long column called value, as well as columns a1 & b1 & c1 to value1 and a2 & b2 & c2 to value3. so the desired outcome is:

    subject sex value valueType value1 valueType1 value2 valueType2
 1:       1   M   7.9         a    7.5         a1    8.1         a2
 2:       2   F   6.3         a    6.0         a1    6.5         a2
 3:       3   F   9.5         a    9.3         a1    9.8         a2
 4:       4   M  11.5         a   11.2         a1   11.7         a2
 5:       1   M  12.3         b   12.1         b1   12.5         b2
 6:       2   F  10.6         b   10.4         b1   10.9         b2
 7:       3   F  13.1         b   13.0         b1   13.5         b2
 8:       4   M  13.4         b   13.5         b1   13.6         b2
 9:       1   M  10.7         c   10.3         c1   10.9         c2
10:       2   F  11.1         c   11.0         c1   11.4         c2
11:       3   F  13.8         c   13.5         c1   13.9         c2
12:       4   M  12.9         c   12.7         c1   13.9         c2

I know how to get the desired outcome programatically for one key column:

keycol <- "valueType"
valuecol <- "value"
gathercols <- c("a", "b", "c")

gather_(olddata_wide, keycol, valuecol, gathercols)

but how can I do this for multiple keycols at the same time?

CodePudding user response：

Here is one approach using melt() and dcast() from the data.table package.

setDT(olddata_wide)
setnames(olddata_wide, old=c("a","b","c"),  new=c("a0", "b0", "c0"))

df_long = dcast(
  melt(olddata_wide,c("subject","sex"),variable.name = "v")[,(c("v", "t")):=tstrsplit(v,"")],
  subject sex v~t, value.var="value",
)

df_long[, .(subject, sex, value=`0`,valueType=v, value1=`1`,valueType1 = paste0(v,"1"), value2=`2`,valueType2 = paste0(v,"2"))]

Output:

    subject sex value valueType value1 valueType1 value2 valueType2
 1:       1   M   7.9         a    7.5         a1    8.1         a2
 2:       1   M  12.3         b   12.1         b1   12.5         b2
 3:       1   M  10.7         c   10.3         c1   10.9         c2
 4:       2   F   6.3         a    6.0         a1    6.5         a2
 5:       2   F  10.6         b   10.4         b1   10.9         b2
 6:       2   F  11.1         c   11.0         c1   11.4         c2
 7:       3   F   9.5         a    9.3         a1    9.8         a2
 8:       3   F  13.1         b   13.0         b1   13.5         b2
 9:       3   F  13.8         c   13.5         c1   13.9         c2
10:       4   M  11.5         a   11.2         a1   11.7         a2
11:       4   M  13.4         b   13.5         b1   13.6         b2
12:       4   M  12.9         c   12.7         c1   13.9         c2

CodePudding user response：

Here is a (fairly clunky) tidyverse approach:

olddata_wide %>%
    pivot_longer(matches("^[abc]"), names_to = "valueType") %>%
    mutate(suffix = str_remove(valueType, "^.")) %>%
    pivot_wider(
        names_from = "suffix", values_from = c("value", "valueType"), names_sep = "", values_fn = list) %>%
    unnest(matches("value"))
## A tibble: 12 × 8
#   subject sex   value value1 value2 valueType valueType1 valueType2
#     <int> <chr> <dbl>  <dbl>  <dbl> <chr>     <chr>      <chr>     
# 1       1 M       7.9    7.5    8.1 a         a1         a2        
# 2       1 M      12.3   12.1   12.5 b         b1         b2        
# 3       1 M      10.7   10.3   10.9 c         c1         c2        
# 4       2 F       6.3    6      6.5 a         a1         a2        
# 5       2 F      10.6   10.4   10.9 b         b1         b2        
# 6       2 F      11.1   11     11.4 c         c1         c2        
# 7       3 F       9.5    9.3    9.8 a         a1         a2        
# 8       3 F      13.1   13     13.5 b         b1         b2        
# 9       3 F      13.8   13.5   13.9 c         c1         c2        
#10       4 M      11.5   11.2   11.7 a         a1         a2        
#11       4 M      13.4   13.5   13.6 b         b1         b2        
#12       4 M      12.9   12.7   13.9 c         c1         c2

The general idea is to reshape all columns matching "^[abc]" from wide to long, and then rebuild into a wide format according to your expected output.

CodePudding user response：

There are different solutions.

If your columns have names that don't follow a certain pattern, then I'd go with the sjmisc package.

    sjmisc::reshape_longer(
    olddata_wide ,
    columns = list(
    c("a", "b", "c"),
    c("a1", "b1", "c1"),
    c("a2", "b2", "c2")),
    values.to = c("value", "value1", "value2"))

You could also use data.table.

melt(setDT(olddata_wide), 
     measure = patterns("^[^0-9]$","[abc] 1", "[abc] 2"),
     variable.name = c("id"),
     value.name = c("value","value1", "value2"))