Home > Enterprise >  R data.table - Is there a more efficient way of melting multiple ways and merging results?
R data.table - Is there a more efficient way of melting multiple ways and merging results?

Time:11-03

I am attempting to melt a data.table in 2 different ways. I then end up having to merge the results - it's awkward because I have different measure.vars and wouldn't be scaleable if I had a wider table / more complex column names.

I start with this data.table:

    id p1 p2     p1_pos     p2_pos
 1:  1  A  F 0.70644404 0.75523969
 2:  2  B  G 0.96798381 0.26280453
 3:  3  C  H 0.35558517 0.45418777
 4:  4  D  I 0.14662296 0.01969177
 5:  5  E  J 0.45155647 0.41373110
 6:  6  A  F 0.81074292 0.19421395
 7:  7  B  G 0.49014540 0.02094569
 8:  8  C  H 0.01445689 0.20199638
 9:  9  D  I 0.80327645 0.73982715
10: 10  E  J 0.17625955 0.88250913

Next I proceed to melt twice and merge as so:

dat = data.table(id = as.character(rep(1:10)),
           p1 = rep(c("A", "B", "C", "D", "E"), 2),
           p2 = rep(c("F", "G", "H", "I", "J"), 2),
           p1_pos = runif(10),
           p2_pos = runif(10))

           
first_melt = melt(dat, id.vars = "id", 
                  measure.vars = c("p1", "p2"), 
                  variable.name = "loc", 
                  value.name = "name", 
                  value.factor = F, 
                  variable.factor = F)

second_melt = melt(dat,
                   id.vars = "id", 
                   measure.vars = c("p1_pos", "p2_pos"), 
                   variable.name = "loc", 
                   value.name = "pos", 
                   value.factor = F, 
                   variable.factor = F)

second_melt[, loc := substr(loc, 1,2)]
result = merge(first_melt, second_melt, by = c("id", "loc"))
result[order(id)]

The awkwardness comes from having different "measure.vars" that then need to be merged on.

This produces the desired result though:

    id loc name        pos
 1:  1  p1    A 0.70644404
 2:  1  p2    F 0.75523969
 3: 10  p1    E 0.17625955
 4: 10  p2    J 0.88250913
 5:  2  p1    B 0.96798381
 6:  2  p2    G 0.26280453
 7:  3  p1    C 0.35558517
 8:  3  p2    H 0.45418777
 9:  4  p1    D 0.14662296
10:  4  p2    I 0.01969177
11:  5  p1    E 0.45155647
12:  5  p2    J 0.41373110
13:  6  p1    A 0.81074292
14:  6  p2    F 0.19421395
15:  7  p1    B 0.49014540
16:  7  p2    G 0.02094569
17:  8  p1    C 0.01445689
18:  8  p2    H 0.20199638
19:  9  p1    D 0.80327645
20:  9  p2    I 0.73982715

My question is whether these is a more efficient way to do this (ie. in a single melt command)? Or am I approaching this as best as is possible?

CodePudding user response:

This is simple way to merge two data frame by one same column name

B = merge(df,otherdataname,by.x=1,by.y=1)

CodePudding user response:

How about this?

data.table::melt(
  dat, id.vars = "id", measure.vars = patterns("p[0-9] $", "p[0-9] _pos"), 
  variable.name = "loc", value.name = c("name", "pos")
)[, loc := paste0("p", loc)][]
#        id    loc   name        pos
#     <int> <char> <char>      <num>
#  1:     1     p1      A 0.70644404
#  2:     2     p1      B 0.96798381
#  3:     3     p1      C 0.35558517
#  4:     4     p1      D 0.14662296
#  5:     5     p1      E 0.45155647
#  6:     6     p1      A 0.81074292
#  7:     7     p1      B 0.49014540
#  8:     8     p1      C 0.01445689
#  9:     9     p1      D 0.80327645
# 10:    10     p1      E 0.17625955
# 11:     1     p2      F 0.75523969
# 12:     2     p2      G 0.26280453
# 13:     3     p2      H 0.45418777
# 14:     4     p2      I 0.01969177
# 15:     5     p2      J 0.41373110
# 16:     6     p2      F 0.19421395
# 17:     7     p2      G 0.02094569
# 18:     8     p2      H 0.20199638
# 19:     9     p2      I 0.73982715
# 20:    10     p2      J 0.88250913
#        id    loc   name        pos

Data

library(data.table)
dat <- setDT(structure(list(id = 1:10, p1 = c("A", "B", "C", "D", "E", "A", "B", "C", "D", "E"), p2 = c("F", "G", "H", "I", "J", "F", "G", "H", "I", "J"), p1_pos = c(0.70644404, 0.96798381, 0.35558517, 0.14662296, 0.45155647, 0.81074292, 0.4901454, 0.01445689, 0.80327645, 0.17625955), p2_pos = c(0.75523969, 0.26280453, 0.45418777, 0.01969177, 0.4137311, 0.19421395, 0.02094569, 0.20199638, 0.73982715, 0.88250913)), row.names = c(NA, -10L), class = c("data.table", "data.frame"))_
  • Related