I am trying to create a long data frame whose values are created from a lookup dataframe
df_lookup = data.frame(id = c(1,2,3), one = c(10,9,7), two = c(0,1,2), three = c(0,0,1))
df_lookup
#> id one two three
#> 1 1 10 0 0
#> 2 2 9 1 0
#> 3 3 7 2 1
The output I am looking for is a data frame with 30 rows, where first ten rows are all 1s, for the next 10 rows the values are nine 1s and one 2, for the last 10 rows the values are seven 1s, two 2s and one 3.
df_output
id bin
1: 1 1
2: 1 1
3: 1 1
4: 1 1
5: 1 1
6: 1 1
7: 1 1
8: 1 1
9: 1 1
10: 1 1
11: 2 1
12: 2 1
13: 2 1
14: 2 1
15: 2 1
16: 2 1
17: 2 1
18: 2 1
19: 2 1
20: 2 2
21: 3 1
22: 3 1
23: 3 1
24: 3 1
25: 3 1
26: 3 1
27: 3 1
28: 3 2
29: 3 2
30: 3 3
Based on some online search for similar questions like here
I was able to come up with the following code
df_lookup = data.frame(id = c(1,2,3), one = c(10,9,7), two = c(0,1,2), three = c(0,0,1))
col_names = c("one","two","three")
setDT(df_lookup)
df_output = data.frame()
for (j in 1:length(col_names)){
temp_df = df_lookup[, .(rep(j, get(as.character(col_names[j])))),.(id)]
df_output = rbind(df_output,temp_df)
}
names(df_output) = c("id","bin")
df_output = df_output[order(df_output$id,df_output$bin),]
While this solves the purpose, it can take some time to run when there are many 'id' or many 'df_lookup' tables which I need to loop through.
So wanted to check if there's any optimal/faster way to achieve 'df_output'
CodePudding user response:
A data.table
solution using melt()
and rep()
library(data.table)
df_lookup = data.frame(id = c(1,2,3),
one = c(10,9,7),
two = c(0,1,2),
three = c(0,0,1))
dt <- data.table::as.data.table(df_lookup)
# into long format
dt_melt <- melt(dt, id.vars = "id")
dt_melt
#> id variable value
#> 1: 1 one 10
#> 2: 2 one 9
#> 3: 3 one 7
#> 4: 1 two 0
#> 5: 2 two 1
#> 6: 3 two 2
#> 7: 1 three 0
#> 8: 2 three 0
#> 9: 3 three 1
dt_exploded <- dt_melt[, rep(variable, value), by = id]
dt_exploded[, bin := data.table::fcase(V1 == "one", 1,
V1 == "two", 2,
V1 == "three", 3)][]
#> id V1 bin
#> 1: 1 one 1
#> 2: 1 one 1
#> 3: 1 one 1
#> 4: 1 one 1
#> 5: 1 one 1
#> 6: 1 one 1
#> 7: 1 one 1
#> 8: 1 one 1
#> 9: 1 one 1
#> 10: 1 one 1
#> 11: 2 one 1
#> 12: 2 one 1
#> 13: 2 one 1
#> 14: 2 one 1
#> 15: 2 one 1
#> 16: 2 one 1
#> 17: 2 one 1
#> 18: 2 one 1
#> 19: 2 one 1
#> 20: 2 two 2
#> 21: 3 one 1
#> 22: 3 one 1
#> 23: 3 one 1
#> 24: 3 one 1
#> 25: 3 one 1
#> 26: 3 one 1
#> 27: 3 one 1
#> 28: 3 two 2
#> 29: 3 two 2
#> 30: 3 three 3
#> id V1 bin
You could do the mapping of one
to 1
at any time though