I am attempting to melt
a data.table
in 2 different ways. I then end up having to merge the results - it's awkward because I have different measure.vars
and wouldn't be scaleable if I had a wider table / more complex column names.
I start with this data.table:
id p1 p2 p1_pos p2_pos
1: 1 A F 0.70644404 0.75523969
2: 2 B G 0.96798381 0.26280453
3: 3 C H 0.35558517 0.45418777
4: 4 D I 0.14662296 0.01969177
5: 5 E J 0.45155647 0.41373110
6: 6 A F 0.81074292 0.19421395
7: 7 B G 0.49014540 0.02094569
8: 8 C H 0.01445689 0.20199638
9: 9 D I 0.80327645 0.73982715
10: 10 E J 0.17625955 0.88250913
Next I proceed to melt twice and merge as so:
dat = data.table(id = as.character(rep(1:10)),
p1 = rep(c("A", "B", "C", "D", "E"), 2),
p2 = rep(c("F", "G", "H", "I", "J"), 2),
p1_pos = runif(10),
p2_pos = runif(10))
first_melt = melt(dat, id.vars = "id",
measure.vars = c("p1", "p2"),
variable.name = "loc",
value.name = "name",
value.factor = F,
variable.factor = F)
second_melt = melt(dat,
id.vars = "id",
measure.vars = c("p1_pos", "p2_pos"),
variable.name = "loc",
value.name = "pos",
value.factor = F,
variable.factor = F)
second_melt[, loc := substr(loc, 1,2)]
result = merge(first_melt, second_melt, by = c("id", "loc"))
result[order(id)]
The awkwardness comes from having different "measure.vars" that then need to be merged on.
This produces the desired result though:
id loc name pos
1: 1 p1 A 0.70644404
2: 1 p2 F 0.75523969
3: 10 p1 E 0.17625955
4: 10 p2 J 0.88250913
5: 2 p1 B 0.96798381
6: 2 p2 G 0.26280453
7: 3 p1 C 0.35558517
8: 3 p2 H 0.45418777
9: 4 p1 D 0.14662296
10: 4 p2 I 0.01969177
11: 5 p1 E 0.45155647
12: 5 p2 J 0.41373110
13: 6 p1 A 0.81074292
14: 6 p2 F 0.19421395
15: 7 p1 B 0.49014540
16: 7 p2 G 0.02094569
17: 8 p1 C 0.01445689
18: 8 p2 H 0.20199638
19: 9 p1 D 0.80327645
20: 9 p2 I 0.73982715
My question is whether these is a more efficient way to do this (ie. in a single melt
command)? Or am I approaching this as best as is possible?
CodePudding user response:
This is simple way to merge two data frame by one same column name
B = merge(df,otherdataname,by.x=1,by.y=1)
CodePudding user response:
How about this?
data.table::melt(
dat, id.vars = "id", measure.vars = patterns("p[0-9] $", "p[0-9] _pos"),
variable.name = "loc", value.name = c("name", "pos")
)[, loc := paste0("p", loc)][]
# id loc name pos
# <int> <char> <char> <num>
# 1: 1 p1 A 0.70644404
# 2: 2 p1 B 0.96798381
# 3: 3 p1 C 0.35558517
# 4: 4 p1 D 0.14662296
# 5: 5 p1 E 0.45155647
# 6: 6 p1 A 0.81074292
# 7: 7 p1 B 0.49014540
# 8: 8 p1 C 0.01445689
# 9: 9 p1 D 0.80327645
# 10: 10 p1 E 0.17625955
# 11: 1 p2 F 0.75523969
# 12: 2 p2 G 0.26280453
# 13: 3 p2 H 0.45418777
# 14: 4 p2 I 0.01969177
# 15: 5 p2 J 0.41373110
# 16: 6 p2 F 0.19421395
# 17: 7 p2 G 0.02094569
# 18: 8 p2 H 0.20199638
# 19: 9 p2 I 0.73982715
# 20: 10 p2 J 0.88250913
# id loc name pos
Data
library(data.table)
dat <- setDT(structure(list(id = 1:10, p1 = c("A", "B", "C", "D", "E", "A", "B", "C", "D", "E"), p2 = c("F", "G", "H", "I", "J", "F", "G", "H", "I", "J"), p1_pos = c(0.70644404, 0.96798381, 0.35558517, 0.14662296, 0.45155647, 0.81074292, 0.4901454, 0.01445689, 0.80327645, 0.17625955), p2_pos = c(0.75523969, 0.26280453, 0.45418777, 0.01969177, 0.4137311, 0.19421395, 0.02094569, 0.20199638, 0.73982715, 0.88250913)), row.names = c(NA, -10L), class = c("data.table", "data.frame"))_