How can I flatten/expand lists embedded in a data frame?

Time:06-23

I have a data frame in which one column is a list of vectors, with various lengths. This code creates an example:

df1 <- data.frame(
  A=c("able", "baker", "carr"),
  B=c("whiskey", "tango", "foxtrot")
)
df1$C <- list(14, c(2,18,32), c(10,6))

The actual data originated in an SPSS database. (BTW, I could not figure out how to create this example with a single statement.)

I'd like to convert it to a data frame like the one created by the following code:

df2 <- data.frame(
  A=c("able", rep("baker",3), rep("carr",2)),
  B=c("whiskey", rep("tango",3), rep("foxtrot",2)),
  C=c(14, 2, 18, 32, 10, 6)
)

I don't want to resort to ugly surgery and looping -- been there, done that.

CodePudding user response:

library(tidyr)
df2 <- unnest(df1, cols = "C")

Result:

# A tibble: 6 × 3
  A     B           C
  <chr> <chr>   <dbl>
1 able  whiskey    14
2 baker tango       2
3 baker tango      18
4 baker tango      32
5 carr  foxtrot    10
6 carr  foxtrot     6

CodePudding user response:

This avoids explicit looping, though I'm not sure whether it counts as ugly surgery. Here is a base R approach, made much cleaner thanks to @onyambu:

# Repeat each row once per element of its df1$C entry, then replace C
# with the flattened values and reset the row names: res => data.frame
res <- transform(
  df1[rep(seq_len(nrow(df1)), lengths(df1$C)), ],
  C = unlist(df1$C),
  row.names = NULL
)
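
To make the row-repetition idea easier to follow, here is a minimal step-by-step sketch of the same technique. The intermediate names `n`, `idx`, and `out` are illustrative only and do not appear in the answer above; the example data frame is rebuilt so the snippet runs on its own.

```r
# Rebuild the example data frame from the question
df1 <- data.frame(
  A = c("able", "baker", "carr"),
  B = c("whiskey", "tango", "foxtrot")
)
df1$C <- list(14, c(2, 18, 32), c(10, 6))

n   <- lengths(df1$C)               # elements per row: 1 3 2
idx <- rep(seq_len(nrow(df1)), n)   # row index, repeated: 1 2 2 2 3 3
out <- df1[idx, c("A", "B")]        # duplicate the ordinary columns
out$C <- unlist(df1$C)              # flattened values, in row order
rownames(out) <- NULL               # reset row names to 1..6
```

Because `unlist()` walks the list in row order, its output lines up with the repeated row index, which is why the two pieces can simply be put side by side.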

CodePudding user response:

We can repeat each row according to the length of its list element and then unlist the list column.

cbind(df1[rep.int(seq_len(dim(df1)[1]), lengths(df1$C)), -3], C=unlist(df1$C))
#         A       B  C
# 1    able whiskey 14
# 2   baker   tango  2
# 2.1 baker   tango 18
# 2.2 baker   tango 32
# 3    carr foxtrot 10
# 3.1  carr foxtrot  6

Accordingly, to generalize it to data frames with several list columns, like this one,

df2
#       A       B         C         D         E
# 1  able whiskey        14 2, 18, 32 2, 18, 32
# 2 baker   tango 2, 18, 32        14    10, 99
# 3  carr foxtrot     10, 6    10, 99        14

we might do:

flatten <- function(dat, cols) {
  ls <- as.data.frame(sapply(dat[cols], lengths))
  lsp <- apply(ls, 1, prod)
  cbind(dat[rep.int(seq_len(dim(dat)[1]), lsp), setdiff(names(dat), cols)], 
        sapply(cols, function(x) 
          unlist(Map(function(x, y) rep(x, each=prod(y)), 
                     dat[[x]], 
                     apply(ls[setdiff(names(ls), x)], 1, prod)))))
}

flatten(df2, cols=c('C', 'D', 'E'))
#         A       B  C  D  E
# 1    able whiskey 14  2  2
# 1.1  able whiskey 14  2  2
# 1.2  able whiskey 14  2  2
# 1.3  able whiskey 14 18 18
# 1.4  able whiskey 14 18 18
# 1.5  able whiskey 14 18 18
# 1.6  able whiskey 14 32 32
# 1.7  able whiskey 14 32 32
# 1.8  able whiskey 14 32 32
# 2   baker   tango  2 14 10
# 2.1 baker   tango  2 14 10
# 2.2 baker   tango 18 14 10
# 2.3 baker   tango 18 14 99
# 2.4 baker   tango 32 14 99
# 2.5 baker   tango 32 14 99
# 3    carr foxtrot 10 10 14
# 3.1  carr foxtrot 10 10 14
# 3.2  carr foxtrot  6 99 14
# 3.3  carr foxtrot  6 99 14
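
Note that the output above repeats some combinations and omits others (for baker, C = 2 is never paired with E = 99). If the goal is every combination of the list elements within a row, the following is a sketch under that assumption, built on `expand.grid()`. The helper name `flatten_cross` and the data frame name `demo` are hypothetical and not part of the answer above.

```r
# Example with three list columns (same values as df2 above)
demo <- data.frame(
  A = c("able", "baker", "carr"),
  B = c("whiskey", "tango", "foxtrot")
)
demo$C <- list(14, c(2, 18, 32), c(10, 6))
demo$D <- list(c(2, 18, 32), 14, c(10, 99))
demo$E <- list(c(2, 18, 32), c(10, 99), 14)

# Hypothetical helper: per-row Cartesian product of the list columns
flatten_cross <- function(dat, cols) {
  keep <- setdiff(names(dat), cols)
  pieces <- lapply(seq_len(nrow(dat)), function(i) {
    # All combinations of the elements stored in row i of each list column
    combos <- expand.grid(lapply(dat[cols], `[[`, i), KEEP.OUT.ATTRS = FALSE)
    # Repeat the ordinary columns of row i once per combination
    cbind(dat[rep.int(i, nrow(combos)), keep, drop = FALSE],
          combos, row.names = NULL)
  })
  do.call(rbind, pieces)
}

res_cross <- flatten_cross(demo, cols = c("C", "D", "E"))
nrow(res_cross)  # 9 + 6 + 4 = 19 rows, all distinct
```

This loops over rows with `lapply()`, so it trades some speed for clarity; for large data a vectorized or tidyr-based approach may be preferable.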

BTW, to create your example with a single statement you may use list2DF (available since R 4.0.0).

list2DF(list(A=c("able", "baker", "carr"),
             B=c("whiskey", "tango", "foxtrot"),
             C=list(14, c(2, 18, 32), c(10, 6))))
#       A       B         C
# 1  able whiskey        14
# 2 baker   tango 2, 18, 32
# 3  carr foxtrot     10, 6
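
On R versions older than 4.0.0, where `list2DF()` is not available, one alternative is to protect the list with `I()` inside `data.frame()`. This is only a sketch; the name `df1_asis` is illustrative, and the resulting column carries the extra class "AsIs", which the functions used in this thread (`lengths()`, `unlist()`) handle like an ordinary list.

```r
# Single-statement construction using I() to keep the list as one column
df1_asis <- data.frame(
  A = c("able", "baker", "carr"),
  B = c("whiskey", "tango", "foxtrot"),
  C = I(list(14, c(2, 18, 32), c(10, 6)))
)

lengths(df1_asis$C)  # elements per row: 1 3 2
```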

Data:

df1 <- structure(list(A = c("able", "baker", "carr"), B = c("whiskey", 
"tango", "foxtrot"), C = list(14, c(2, 18, 32), c(10, 6))), row.names = c(NA, 
-3L), class = "data.frame")

df2 <- structure(list(A = c("able", "baker", "carr"), B = c("whiskey", 
"tango", "foxtrot"), C = list(14, c(2, 18, 32), c(10, 6)), D = list(
    c(2, 18, 32), 14, c(10, 99)), E = list(c(2, 18, 32), c(10, 
99), 14)), row.names = c(NA, -3L), class = "data.frame")