R Split one column in a df into different columns based on length

Time:12-09

I have a data frame with a column containing text data. I want to split that one column into several columns based on character length, with at most five characters in each new column. For example, if the column contains "I need more data today" (22 characters in total), I need the first 5 characters in col1, the next 5 in col2, the next 5 in col3, and so on.

Answer:

You could use read.fwf, which reads fixed-width fields, treating each string as one line of a fixed-width file:

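txt <- c("I need more data today", "I need", "I", "I need more data")
n <- 5                                 # maximum characters per column
mx <- max(nchar(txt))                  # the longest string determines the column count
# ceiling() avoids a trailing zero-width field when mx is a multiple of n
read.fwf(textConnection(txt), widths = rep(n, ceiling(mx / n)))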

    V1    V2    V3    V4   V5
1 I nee d mor e dat a tod   ay
2 I nee     d  <NA>  <NA> <NA>
3     I  <NA>  <NA>  <NA> <NA>
4 I nee d mor e dat     a <NA>
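Since the question starts from a column in a data frame, here is a sketch of attaching the pieces back onto it. The data frame df and its column text are hypothetical names. Passing colClasses = "character" through to read.table (read.fwf forwards extra arguments to it) keeps a chunk such as "12345" from being converted to a number, and col.names gives the new columns readable names.

```r
# Hypothetical data frame holding the text in column `text`
df <- data.frame(id = 1:4,
                 text = c("I need more data today", "I need", "I",
                          "I need more data"),
                 stringsAsFactors = FALSE)

n <- 5                                   # maximum characters per column
k <- ceiling(max(nchar(df$text)) / n)    # number of chunk columns needed

con <- textConnection(df$text)           # treat each string as one input line
chunks <- read.fwf(con,
                   widths = rep(n, k),
                   col.names = paste0("col", seq_len(k)),
                   colClasses = "character")  # keep every chunk as text
close(con)

df <- cbind(df, chunks)                  # original columns plus col1..colk
print(df)
```

Positions past the end of a shorter string come back as NA, as in the output above.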

Answer:

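Here is one possible way: use strsplit with a Perl lookbehind to split each string after every n characters (here, five), pad each resulting piece vector with NA up to the length of the longest one (lengths() finds that length, and lapply applies `length<-`), and finally use do.call to rbind the padded vectors into a matrix: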
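The strsplit call is the least obvious step, so here it is on a single string (pattern and pieces are just illustrative names). The lookbehind (?<=.....) matches a zero-width position preceded by five characters. Because R's strsplit restarts the search on the remaining text after each split, the string is cut every five characters rather than at every position after the fifth.

```r
n <- 5
# Zero-width lookbehind built from n dots: "(?<=.....)"
pattern <- paste0("(?<=", strrep(".", n), ")")
# perl = TRUE is required for lookbehind support
pieces <- strsplit("I need more data today", pattern, perl = TRUE)[[1]]
print(pattern)
print(pieces)  # pieces: "I nee" "d mor" "e dat" "a tod" "ay"
```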

Example data:

txt <- c("I need more data today", "I need", "I", "I need more data")

Code:

n <- 5
# Zero-width lookbehind "(?<=.....)": split after every n characters
splttxt <- strsplit(txt, paste0("(?<=", strrep(".", n), ")"), perl = TRUE)
# Pad every piece vector with NA up to the longest one, then stack them as rows
finaltxt <- do.call(rbind, lapply(splttxt, `length<-`, max(lengths(splttxt))))
colnames(finaltxt) <- paste0("col", seq_len(ncol(finaltxt)))

Output:

#      col1    col2    col3    col4    col5
# [1,] "I nee" "d mor" "e dat" "a tod" "ay"
# [2,] "I nee" "d"     NA      NA      NA  
# [3,] "I"     NA      NA      NA      NA  
# [4,] "I nee" "d mor" "e dat" "a"     NA  
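If a regular expression feels like overkill, a base-R alternative (not taken from either answer above) is to compute the chunk start positions and let the vectorized substring() cut every string at once. This is a sketch assuming the column contains no NA values; the function name split_fixed and its prefix argument are hypothetical. Unlike the strsplit version, it returns a data frame rather than a matrix.

```r
# Hypothetical helper: cut each string in `x` into consecutive n-character
# chunks and return a data frame with one column per chunk position.
# Assumes `x` is a character vector without NA values.
split_fixed <- function(x, n, prefix = "col") {
  k <- ceiling(max(nchar(x)) / n)            # columns needed for the longest string
  starts <- seq(1L, by = n, length.out = k)  # 1, 1 + n, 1 + 2n, ...
  cols <- lapply(starts, function(s) {
    chunk <- substring(x, s, s + n - 1L)     # "" once s is past the end of a string
    chunk[!nzchar(chunk)] <- NA_character_   # mark missing chunks as NA
    chunk
  })
  names(cols) <- paste0(prefix, seq_len(k))
  as.data.frame(cols, stringsAsFactors = FALSE)
}

txt <- c("I need more data today", "I need", "I", "I need more data")
res <- split_fixed(txt, 5)
print(res)
```

To append the pieces to an existing data frame, something like cbind(df, split_fixed(df$text, 5)) should work, with your own column name in place of the hypothetical text.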