I have a tibble that includes some list-columns.
library(dplyr)
df <- structure(
list(ID = 1:5, V1 = list(1.71, -0.59, 0.73, -0.93, 0.18),
V2 = list(-0.08, c(0.59, 0.87), -1.87, -1.38, 0.83),
V3 = list(-0.25, -0.02, -0.97, -1.62, 0.54),
V4 = list(-0.12, 0.73, -0.36, 0.55, c(0.92, -0.49)),
V5 = list(c(-0.11, 0.14), -0.2, c(-1.12, -0.91), 0.14, c(1.56, 0.33))),
row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"))
# # A tibble: 5 × 6
# ID V1 V2 V3 V4 V5
# <int> <list> <list> <list> <list> <list>
# 1 1 <dbl [1]> <dbl [1]> <dbl [1]> <dbl [1]> <dbl [2]>
# 2 2 <dbl [1]> <dbl [2]> <dbl [1]> <dbl [1]> <dbl [1]>
# 3 3 <dbl [1]> <dbl [1]> <dbl [1]> <dbl [1]> <dbl [2]>
# 4 4 <dbl [1]> <dbl [1]> <dbl [1]> <dbl [1]> <dbl [1]>
# 5 5 <dbl [1]> <dbl [1]> <dbl [1]> <dbl [2]> <dbl [2]>
I want to simplify those list-columns where every cell has length 1, i.e. V1 and V3, into vector-columns. If any cell in a column has length greater than 1, that column should be kept as it is. The expected output is as follows:
# A tibble: 5 × 6
ID V1 V2 V3 V4 V5
<int> <dbl> <list> <dbl> <list> <list>
1 1 1.71 <dbl [1]> -0.25 <dbl [1]> <dbl [2]>
2 2 -0.59 <dbl [2]> -0.02 <dbl [1]> <dbl [1]>
3 3 0.73 <dbl [1]> -0.97 <dbl [1]> <dbl [2]>
4 4 -0.93 <dbl [1]> -1.62 <dbl [1]> <dbl [1]>
5 5 0.18 <dbl [1]> 0.54 <dbl [2]> <dbl [2]>
I have achieved it with tedious lapply() and if statements. I look forward to a tidyverse solution or a neat base one. Thanks for any help.
CodePudding user response:
You can use where() inside across() to select the columns whose cells all have length 1, and then apply unlist() to them.
library(dplyr)
df %>%
mutate(across(where(~ all(lengths(.x) == 1)), unlist))
# # A tibble: 5 × 6
# ID V1 V2 V3 V4 V5
# <int> <dbl> <list> <dbl> <list> <list>
# 1 1 1.71 <dbl [1]> -0.25 <dbl [1]> <dbl [2]>
# 2 2 -0.59 <dbl [2]> -0.02 <dbl [1]> <dbl [1]>
# 3 3 0.73 <dbl [1]> -0.97 <dbl [1]> <dbl [2]>
# 4 4 -0.93 <dbl [1]> -1.62 <dbl [1]> <dbl [1]>
# 5 5 0.18 <dbl [1]> 0.54 <dbl [2]> <dbl [2]>
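Note that the predicate above is also TRUE for the plain integer column ID. That is harmless here, since unlist() returns an atomic vector unchanged, but if you prefer to touch only list-columns, a stricter variant might look like the sketch below. The three-row df is a cut-down stand-in for the data in the question, not the original tibble.

```r
library(dplyr)

# Cut-down stand-in for the question's data (assumption, for illustration):
# V1 has only length-1 cells, V2 has one cell of length 2.
df <- tibble(
  ID = 1:3,
  V1 = list(1.71, -0.59, 0.73),
  V2 = list(-0.08, c(0.59, 0.87), -1.87)
)

# Select only columns that are lists AND whose cells all have length 1,
# then flatten each of them into an atomic vector.
res <- df %>%
  mutate(across(where(~ is.list(.x) && all(lengths(.x) == 1)), unlist))
```

With this guard, ID is never passed to unlist(), V1 becomes a double vector, and V2 stays a list-column.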
CodePudding user response:
A base R solution could be:
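# Flag columns whose cells all have length 1 (the \(i) lambda syntax needs R >= 4.1)
i1 <- sapply(df, \(i) all(lengths(i) == 1))
# Flatten the flagged columns into atomic vectors
df[i1] <- lapply(df[i1], unlist)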
df
# A tibble: 5 × 6
ID V1 V2 V3 V4 V5
<int> <dbl> <list> <dbl> <list> <list>
1 1 1.71 <dbl [1]> -0.25 <dbl [1]> <dbl [2]>
2 2 -0.59 <dbl [2]> -0.02 <dbl [1]> <dbl [1]>
3 3 0.73 <dbl [1]> -0.97 <dbl [1]> <dbl [2]>
4 4 -0.93 <dbl [1]> -1.62 <dbl [1]> <dbl [1]>
5 5 0.18 <dbl [1]> 0.54 <dbl [2]> <dbl [2]>
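If you need this in more than one place, the same idea can be wrapped in a small function. The following is only a sketch: simplify_list_cols is a hypothetical helper name, vapply() is swapped in for sapply() so the result is always a logical vector, and an is.list() check is added so that only list-columns are touched. The small data frame is a stand-in for the data in the question.

```r
# Hypothetical helper (name is an assumption, not from any package):
# flatten every list-column whose cells all have length 1.
simplify_list_cols <- function(d) {
  is_simple <- vapply(
    d,
    function(col) is.list(col) && all(lengths(col) == 1),
    logical(1)
  )
  # Skip the assignment entirely when no column qualifies.
  if (any(is_simple)) {
    d[is_simple] <- lapply(d[is_simple], unlist)
  }
  d
}

# Stand-in data (assumption): V1 can be flattened, V2 cannot.
df <- data.frame(ID = 1:3)
df$V1 <- list(1.71, -0.59, 0.73)
df$V2 <- list(-0.08, c(0.59, 0.87), -1.87)

out <- simplify_list_cols(df)
```

The function returns a modified copy, so the original df keeps its list-columns unless you assign the result back to it.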