I have a data frame in which one column is a list of vectors of varying lengths. This code creates an example:
df1 <- data.frame(
A=c("able", "baker", "carr"),
B=c("whiskey", "tango", "foxtrot")
)
df1$C <- list(14, c(2,18,32), c(10,6))
The actual data originated in an SPSS database. (BTW, I could not figure out how to create this example with a single statement.)
I'd like to convert it to a data frame like the one created by the following code:
df2 <- data.frame(
A=c("able", rep("baker",3), rep("carr",2)),
B=c("whiskey", rep("tango",3), rep("foxtrot",2)),
C=c(14, 2, 18, 32, 10, 6)
)
I don't want to resort to ugly surgery and looping -- been there, done that.
CodePudding user response:
library(tidyr)
df2 <- unnest(df1, cols = "C")
Result:
# A tibble: 6 × 3
  A     B           C
  <chr> <chr>   <dbl>
1 able  whiskey    14
2 baker tango       2
3 baker tango      18
4 baker tango      32
5 carr  foxtrot    10
6 carr  foxtrot     6
CodePudding user response:
This avoids explicit looping, though I'm not sure whether it counts as ugly surgery. Here is a base R approach, made much cleaner thanks to @onyambu:
# Repeat each row of df1 once per element of its C entry, then
# replace the list column with the flattened values: res => data.frame
res <- transform(
  df1[rep(seq(nrow(df1)), lengths(df1$C)), ],
  C = unlist(df1$C),
  row.names = NULL
)
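The same idea can be wrapped in a small reusable function. This is only a sketch: the name `unroll_list_col` and its arguments are hypothetical, not part of base R, and it assumes `col` names a single list column of atomic vectors.

```r
# Hypothetical helper (not a base R function): expand one list column
# so that each element of each list entry gets its own row.
unroll_list_col <- function(dat, col) {
  n <- lengths(dat[[col]])                        # elements per row
  out <- dat[rep(seq_len(nrow(dat)), n), , drop = FALSE]
  out[[col]] <- unlist(dat[[col]])                # flattened values, same order
  rownames(out) <- NULL                           # drop names like "2.1"
  out
}

df1 <- data.frame(
  A = c("able", "baker", "carr"),
  B = c("whiskey", "tango", "foxtrot")
)
df1$C <- list(14, c(2, 18, 32), c(10, 6))

res <- unroll_list_col(df1, "C")
res
```

Rows whose list entry is empty are repeated zero times, so under this sketch they disappear from the result.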
CodePudding user response:
We might repeat rows (rep.int) according to the lengths of the lists and unlist them.
cbind(df1[rep.int(seq_len(dim(df1)[1]), lengths(df1$C)), -3], C=unlist(df1$C))
# A B C
# 1 able whiskey 14
# 2 baker tango 2
# 2.1 baker tango 18
# 2.2 baker tango 32
# 3 carr foxtrot 10
# 3.1 carr foxtrot 6
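The row index passed to `[` is what does the work here. A minimal illustration, using only the list column from the question:

```r
# List column from the question: element lengths are 1, 3 and 2
C <- list(14, c(2, 18, 32), c(10, 6))

# Each row number is repeated once per element in its list entry
idx <- rep.int(seq_along(C), lengths(C))
idx
# [1] 1 2 2 2 3 3

# unlist() yields the values in the same order, so the two line up
vals <- unlist(C)
vals
# [1] 14  2 18 32 10  6
```

Indexing a data frame with repeated row numbers is also why the result above carries row names such as 2.1 and 2.2.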
Accordingly, to generalize it to data frames like this,
df2
# A B C D E
# 1 able whiskey 14 2, 18, 32 2, 18, 32
# 2 baker tango 2, 18, 32 14 10, 99
# 3 carr foxtrot 10, 6 10, 99 14
we might do:
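flatten <- function(dat, cols) {
  # Number of elements in each list column, per row
  ls <- as.data.frame(sapply(dat[cols], lengths))
  # Output rows per input row: product of those lengths
  lsp <- apply(ls, 1, prod)
  cbind(dat[rep.int(seq_len(dim(dat)[1]), lsp), setdiff(names(dat), cols)],
        # Repeat each element by the product of the other columns' lengths
        sapply(cols, function(x)
          unlist(Map(function(x, y) rep(x, each=prod(y)),
                     dat[[x]],
                     apply(ls[setdiff(names(ls), x)], 1, prod)))))
}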
flatten(df2, cols=c('C', 'D', 'E'))
# A B C D E
# 1 able whiskey 14 2 2
# 1.1 able whiskey 14 2 2
# 1.2 able whiskey 14 2 2
# 1.3 able whiskey 14 18 18
# 1.4 able whiskey 14 18 18
# 1.5 able whiskey 14 18 18
# 1.6 able whiskey 14 32 32
# 1.7 able whiskey 14 32 32
# 1.8 able whiskey 14 32 32
# 2 baker tango 2 14 10
# 2.1 baker tango 2 14 10
# 2.2 baker tango 18 14 10
# 2.3 baker tango 18 14 99
# 2.4 baker tango 32 14 99
# 2.5 baker tango 32 14 99
# 3 carr foxtrot 10 10 14
# 3.1 carr foxtrot 10 10 14
# 3.2 carr foxtrot 6 99 14
# 3.3 carr foxtrot 6 99 14
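Note that in the output above the list columns are repeated in blocks, so within a row not every combination of values appears (for "able", D and E are always equal). If the goal is instead one output row per combination, a sketch along the following lines may help. The name `flatten_grid` is hypothetical, and the approach (one `expand.grid()` call per input row) is a substitute for the function above, not a fix of it.

```r
df2 <- structure(list(
  A = c("able", "baker", "carr"),
  B = c("whiskey", "tango", "foxtrot"),
  C = list(14, c(2, 18, 32), c(10, 6)),
  D = list(c(2, 18, 32), 14, c(10, 99)),
  E = list(c(2, 18, 32), c(10, 99), 14)
), row.names = c(NA, -3L), class = "data.frame")

# Hypothetical helper: for each input row, build every combination of the
# values held in the list columns, then attach the ordinary columns.
flatten_grid <- function(dat, cols) {
  keep <- setdiff(names(dat), cols)
  pieces <- lapply(seq_len(nrow(dat)), function(i) {
    # Named list with the i-th entry of each list column
    vals <- lapply(dat[cols], `[[`, i)
    grid <- expand.grid(vals, KEEP.OUT.ATTRS = FALSE)
    cbind(dat[rep(i, nrow(grid)), keep, drop = FALSE], grid)
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

out <- flatten_grid(df2, cols = c("C", "D", "E"))
nrow(out)
# [1] 19
```

The row count matches the output above (9 + 6 + 4), but each row is now a distinct combination, with the first list column varying fastest.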
BTW, to create your example with a single statement you may use list2DF.
list2DF(list(A=c("able", "baker", "carr"),
B=c("whiskey", "tango", "foxtrot"),
C=list(14, c(2, 18, 32), c(10, 6))))
# A B C
# 1 able whiskey 14
# 2 baker tango 2, 18, 32
# 3 carr foxtrot 10, 6
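As far as I know, list2DF was only added in R 4.0.0. On older versions, a possible alternative is to protect the list with I() inside data.frame(); the sketch below assumes nothing beyond base R.

```r
# I() marks the list "as is", so data.frame() stores it as one list column
# instead of trying to split it into several columns
df1 <- data.frame(
  A = c("able", "baker", "carr"),
  B = c("whiskey", "tango", "foxtrot"),
  C = I(list(14, c(2, 18, 32), c(10, 6)))
)

# The column is still a list of vectors, with the extra class "AsIs"
lengths(df1$C)
# [1] 1 3 2
```

The extra "AsIs" class is usually harmless, and unlist(df1$C) returns the same values as for a plain list column.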
Data:
df1 <- structure(list(A = c("able", "baker", "carr"), B = c("whiskey",
"tango", "foxtrot"), C = list(14, c(2, 18, 32), c(10, 6))), row.names = c(NA,
-3L), class = "data.frame")
df2 <- structure(list(A = c("able", "baker", "carr"), B = c("whiskey",
"tango", "foxtrot"), C = list(14, c(2, 18, 32), c(10, 6)), D = list(
c(2, 18, 32), 14, c(10, 99)), E = list(c(2, 18, 32), c(10,
99), 14)), row.names = c(NA, -3L), class = "data.frame")