Home > Back-end >  R Repeat values based on value in dataframe
R Repeat values based on value in dataframe

Time:08-19

I am trying to create a long data frame whose values are created from a lookup dataframe

df_lookup = data.frame(id = c(1,2,3), one = c(10,9,7), two = c(0,1,2), three = c(0,0,1))

df_lookup
#>   id one two three
#> 1  1  10   0     0
#> 2  2   9   1     0
#> 3  3   7   2     1

The output I am looking for is a data frame with 30 rows, where first ten rows are all 1s, for the next 10 rows the values are nine 1s and one 2, for the last 10 rows the values are seven 1s, two 2s and one 3.

df_output

   id bin
 1:  1   1
 2:  1   1
 3:  1   1
 4:  1   1
 5:  1   1
 6:  1   1
 7:  1   1
 8:  1   1
 9:  1   1
10:  1   1
11:  2   1
12:  2   1
13:  2   1
14:  2   1
15:  2   1
16:  2   1
17:  2   1
18:  2   1
19:  2   1
20:  2   2
21:  3   1
22:  3   1
23:  3   1
24:  3   1
25:  3   1
26:  3   1
27:  3   1
28:  3   2
29:  3   2
30:  3   3

Based on some online search for similar questions like here

I was able to come up with the following code

df_lookup = data.frame(id = c(1,2,3), one = c(10,9,7), two = c(0,1,2), three = c(0,0,1))

col_names = c("one","two","three")
setDT(df_lookup)

df_output = data.frame()

for (j in 1:length(col_names)){
  temp_df = df_lookup[, .(rep(j, get(as.character(col_names[j])))),.(id)] 
  df_output = rbind(df_output,temp_df) 
}

names(df_output) = c("id","bin")

df_output = df_output[order(df_output$id,df_output$bin),]

While this solves the purpose, it can take some time to run when there are many 'id' or many 'df_lookup' tables which I need to loop through.

So wanted to check if there's any optimal/faster way to achieve 'df_output'

CodePudding user response:

A data.table solution using melt() and rep()

library(data.table)

df_lookup = data.frame(id = c(1,2,3),
                       one = c(10,9,7),
                       two = c(0,1,2), 
                       three = c(0,0,1))

dt <- data.table::as.data.table(df_lookup)

# into long format
dt_melt <- melt(dt, id.vars = "id")
dt_melt
#>    id variable value
#> 1:  1      one    10
#> 2:  2      one     9
#> 3:  3      one     7
#> 4:  1      two     0
#> 5:  2      two     1
#> 6:  3      two     2
#> 7:  1    three     0
#> 8:  2    three     0
#> 9:  3    three     1
dt_exploded <- dt_melt[, rep(variable, value), by = id]
dt_exploded[, bin := data.table::fcase(V1 == "one", 1,
                                       V1 == "two", 2,
                                       V1 == "three", 3)][]
#>     id    V1 bin
#>  1:  1   one   1
#>  2:  1   one   1
#>  3:  1   one   1
#>  4:  1   one   1
#>  5:  1   one   1
#>  6:  1   one   1
#>  7:  1   one   1
#>  8:  1   one   1
#>  9:  1   one   1
#> 10:  1   one   1
#> 11:  2   one   1
#> 12:  2   one   1
#> 13:  2   one   1
#> 14:  2   one   1
#> 15:  2   one   1
#> 16:  2   one   1
#> 17:  2   one   1
#> 18:  2   one   1
#> 19:  2   one   1
#> 20:  2   two   2
#> 21:  3   one   1
#> 22:  3   one   1
#> 23:  3   one   1
#> 24:  3   one   1
#> 25:  3   one   1
#> 26:  3   one   1
#> 27:  3   one   1
#> 28:  3   two   2
#> 29:  3   two   2
#> 30:  3 three   3
#>     id    V1 bin

You could do the mapping of one to 1 at any time though

  • Related