Simplify list-columns where all cells have lengths 1 into vector-columns

Time:01-13

I have a tibble that includes some list-columns.

library(dplyr)

df <- structure(
  list(ID = 1:5, V1 = list(1.71, -0.59, 0.73, -0.93, 0.18),
       V2 = list(-0.08, c(0.59, 0.87), -1.87, -1.38, 0.83), 
       V3 = list(-0.25, -0.02, -0.97, -1.62, 0.54),
       V4 = list(-0.12, 0.73, -0.36, 0.55, c(0.92, -0.49)),
       V5 = list(c(-0.11, 0.14), -0.2, c(-1.12, -0.91), 0.14, c(1.56, 0.33))),
  row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"))

# # A tibble: 5 × 6
#      ID V1        V2        V3        V4        V5       
#   <int> <list>    <list>    <list>    <list>    <list>   
# 1     1 <dbl [1]> <dbl [1]> <dbl [1]> <dbl [1]> <dbl [2]>
# 2     2 <dbl [1]> <dbl [2]> <dbl [1]> <dbl [1]> <dbl [1]>
# 3     3 <dbl [1]> <dbl [1]> <dbl [1]> <dbl [1]> <dbl [2]>
# 4     4 <dbl [1]> <dbl [1]> <dbl [1]> <dbl [1]> <dbl [1]>
# 5     5 <dbl [1]> <dbl [1]> <dbl [1]> <dbl [2]> <dbl [2]>

I want to simplify the list-columns in which every cell has length 1, i.e. V1 and V3, into vector-columns. If any cell in a column has length greater than 1, that column should be kept as it is. The expected output is as follows:

# A tibble: 5 × 6
     ID    V1 V2           V3 V4        V5       
  <int> <dbl> <list>    <dbl> <list>    <list>   
1     1  1.71 <dbl [1]> -0.25 <dbl [1]> <dbl [2]>
2     2 -0.59 <dbl [2]> -0.02 <dbl [1]> <dbl [1]>
3     3  0.73 <dbl [1]> -0.97 <dbl [1]> <dbl [2]>
4     4 -0.93 <dbl [1]> -1.62 <dbl [1]> <dbl [1]>
5     5  0.18 <dbl [1]>  0.54 <dbl [2]> <dbl [2]>

I have achieved it with tedious lapply() calls and if statements. I am looking for a tidyverse solution or a neat base R one. Thanks for any help.

CodePudding user response:

You can use where() inside across() to select the columns whose cells all have length 1, and then apply unlist() to them.

library(dplyr)

df %>%
  mutate(across(where(~ all(lengths(.x) == 1)), unlist))

# # A tibble: 5 × 6
#      ID    V1 V2           V3 V4        V5       
#   <int> <dbl> <list>    <dbl> <list>    <list>   
# 1     1  1.71 <dbl [1]> -0.25 <dbl [1]> <dbl [2]>
# 2     2 -0.59 <dbl [2]> -0.02 <dbl [1]> <dbl [1]>
# 3     3  0.73 <dbl [1]> -0.97 <dbl [1]> <dbl [2]>
# 4     4 -0.93 <dbl [1]> -1.62 <dbl [1]> <dbl [1]>
# 5     5  0.18 <dbl [1]>  0.54 <dbl [2]> <dbl [2]>
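
Note that the predicate above is also TRUE for ID, because every element of an atomic vector has length 1; unlist() leaves such a column unchanged, so the result is the same. If you would rather touch list-columns only, here is a minimal sketch of a stricter variant. The is.list() guard, the smaller example tibble, and the name simplified are assumptions added for illustration, not part of the answer above.

```r
library(dplyr)

# Smaller stand-in for the tibble in the question (illustrative data only)
df <- tibble(
  ID = 1:3,
  V1 = list(1.71, -0.59, 0.73),
  V2 = list(-0.08, c(0.59, 0.87), -1.87)
)

# Only list-columns whose cells all have length 1 are flattened;
# atomic columns such as ID are never selected by the predicate.
simplified <- df %>%
  mutate(across(where(~ is.list(.x) && all(lengths(.x) == 1)), unlist))

# Inspect the resulting column types
sapply(simplified, class)
```

With this data, V1 becomes a double vector while V2 stays a list-column, since its second cell has length 2.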

CodePudding user response:

A base R solution:

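# Flag the columns whose cells all have length 1 (the \(i) lambda needs R >= 4.1)
i1 <- sapply(df, \(i) all(lengths(i) == 1))
# Flatten only the flagged columns
df[i1] <- lapply(df[i1], unlist)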

df
# A tibble: 5 × 6
     ID    V1 V2           V3 V4        V5       
  <int> <dbl> <list>    <dbl> <list>    <list>   
1     1  1.71 <dbl [1]> -0.25 <dbl [1]> <dbl [2]>
2     2 -0.59 <dbl [2]> -0.02 <dbl [1]> <dbl [1]>
3     3  0.73 <dbl [1]> -0.97 <dbl [1]> <dbl [2]>
4     4 -0.93 <dbl [1]> -1.62 <dbl [1]> <dbl [1]>
5     5  0.18 <dbl [1]>  0.54 <dbl [2]> <dbl [2]>
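
If this is needed in more than one place, the same idea can be wrapped in a small function. The following is a sketch under stated assumptions: the name simplify_list_cols is hypothetical, vapply() is swapped in for sapply() so the predicate result is always a logical vector, and an is.list() guard is added so atomic columns are skipped.

```r
# Hypothetical helper: flatten list-columns whose cells all have length 1.
simplify_list_cols <- function(data) {
  # vapply() always returns a logical vector, one value per column
  is_simple <- vapply(
    data,
    function(col) is.list(col) && all(lengths(col) == 1),
    logical(1)
  )
  # Only assign when at least one column qualifies
  if (any(is_simple)) {
    data[is_simple] <- lapply(data[is_simple], unlist)
  }
  data
}

# Illustrative data frame with two list-columns
df <- data.frame(ID = 1:3)
df$V1 <- list(1.71, -0.59, 0.73)
df$V2 <- list(-0.08, c(0.59, 0.87), -1.87)

out <- simplify_list_cols(df)
str(out)
```

Here V1 is returned as a numeric vector and V2 is left as a list, because one of its cells has length 2.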