R separate variables in one row of multiple columns into multiple rows

Time:06-22

I have a data frame with 563 columns.

Each column currently has a single row that contains a vector of values, and the vectors differ in length.

For example:

col_1            col_2               col_3       ...   col_563  
c("1","2","3")   c("1","2","3"...)   c("1","2")        c("1","2","3"...)

I want to split the values in each column into multiple rows, leaving the shorter columns empty at the bottom:

col_1     col_2   col_3   ...   col_563  
"1"       "1"     "1"           "1"
"2"       "2"     "2"           "2"
"3"       "3"                   "3"
          "4"                   "4" 
          "5"

I have tried:

separate_rows(df, "row1":"row563", convert = TRUE)

But I got the error:

Error in `fn()`:
! In row 1, can't recycle input of size 778 to size 124.
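Most likely the error comes from how `separate_rows()` expands a row: it splits every selected column and lays the pieces out side by side, so all of them must yield the same number of pieces (or a single piece that can be recycled). Base R's `data.frame()` has a similar constraint, which makes the problem and the fix easy to see. A minimal sketch with made-up vectors `x` and `y` (hypothetical data, not from the question):

```r
# Two made-up columns of different lengths (hypothetical example data)
x <- c("1", "2", "3")
y <- c("1", "2")

# data.frame() can only recycle a column whose length divides the row count;
# 3 is not a multiple of 2, so this call fails
res <- tryCatch(data.frame(x = x, y = y), error = function(e) e)
print(inherits(res, "error"))

# Padding the shorter vector with NA up to the common length makes it work
n <- max(length(x), length(y))
length(y) <- n
padded <- data.frame(x = x, y = y)
print(padded)
```

Padding every column with `NA` up to the longest length is exactly the idea the answers below rely on.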

Does anyone know how I should proceed?

Sorry if this question has already been asked. I've spent several hours searching and haven't found an answer.

Thank you!

Answer:

In Base R:

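# Flatten the one-row list columns into a named list of plain vectors
a <- unlist(df, recursive = FALSE)
# Pad every vector with NA up to the length of the longest one
data.frame(lapply(a, `length<-`, max(lengths(a))))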

  col_1 col_2 col3 col_5
1     1     1    1     1
2     2     2    2     2
3     3     3   NA     3
4    NA     4   NA     4
5    NA     5   NA    NA
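The key step is the replacement function `` `length<-` ``: giving an atomic vector a larger length pads it with `NA`. Since the question's values are character strings such as `"1"`, one possible base R stand-in for `separate_rows(convert = TRUE)` is to run `type.convert()` on the padded result. A sketch wrapping both steps in a helper; the name `pad_list_columns()` and the sample data `df_chr` are hypothetical:

```r
# `length<-` pads an atomic vector with NA when the new length is larger
v <- c("1", "2", "3")
length(v) <- 5  # v is now c("1", "2", "3", NA, NA)
print(v)

# Hypothetical helper: turn a one-row data frame of list columns into a
# regular data frame, padding the shorter columns with NA
pad_list_columns <- function(df, convert = TRUE) {
  cols <- unlist(df, recursive = FALSE)  # named list of plain vectors
  n <- max(lengths(cols))                # length of the longest column
  out <- data.frame(lapply(cols, `length<-`, n))
  # Rough analogue of convert = TRUE: guess each column's type
  if (convert) out <- type.convert(out, as.is = TRUE)
  out
}

# Hypothetical input shaped like the question's, with character values
df_chr <- structure(list(col_1 = list(c("1", "2", "3")),
                         col_2 = list(c("1", "2", "3", "4", "5")),
                         col_3 = list(c("1", "2"))),
                    row.names = c(NA, -1L), class = "data.frame")
res <- pad_list_columns(df_chr)
str(res)
```

With `convert = TRUE` the numeric-looking strings become integer columns; pass `convert = FALSE` to keep them as character.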

Alternatively, with the tidyverse:

library(tidyverse)

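# flatten() turns the list columns into a named list of vectors; pad each
# one with NA up to the longest length (lengths(), not length(), which
# would only count the columns)
flatten(df) %>%
   map_dfc(`length<-`, max(lengths(.)))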

# A tibble: 5 x 4
  col_1 col_2  col3 col_5
  <int> <int> <int> <int>
1     1     1     1     1
2     2     2     2     2
3     3     3    NA     3
4    NA     4    NA     4
5    NA     5    NA    NA

where the example data is:

df <- structure(list(col_1 = list(1:3), col_2 = list(1:5), col3 = list(
1:2), col_5 = list(1:4)), row.names = c(NA, -1L), class = "data.frame")
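With 563 columns it may be worth checking that the reshaping neither drops nor invents values. One possible check, assuming the original vectors contain no genuine `NA` values: strip the `NA` padding from each column of the base R result and compare with the flattened input. The names `res` and `recovered` are illustrative only.

```r
# The example data from above: a one-row data frame of list columns
df <- structure(list(col_1 = list(1:3), col_2 = list(1:5), col3 = list(
1:2), col_5 = list(1:4)), row.names = c(NA, -1L), class = "data.frame")

# Base R answer: flatten, then pad with NA to the longest length
a <- unlist(df, recursive = FALSE)
res <- data.frame(lapply(a, `length<-`, max(lengths(a))))

# Undo the padding column by column (assumes no real NAs in the data)
recovered <- lapply(res, function(col) col[!is.na(col)])
print(identical(recovered, a))
```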