I have a data frame with a column containing text data. I want to split that one column into multiple columns based on length. For example, if the column contains the text "I need more data today", I want to split it so that each new column holds at most five characters. In this case the total length is 22, so I need the first 5 characters in col1, the next 5 in col2, the next 5 in col3, and so on.
CodePudding user response:
You could use read.fwf, a function for reading fixed-width data:
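# txt is the character vector of strings to split
n <- 5                     # width of each piece
mx <- max(nchar(txt))      # length of the longest string
# full-width fields, then one field for the remainder
read.fwf(textConnection(txt), c(rep(n, mx %/% n), mx %% n))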
     V1    V2    V3    V4   V5
1 I nee d mor e dat a tod   ay
2 I nee     d  <NA>  <NA> <NA>
3     I  <NA>  <NA>  <NA> <NA>
4 I nee d mor e dat     a <NA>
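To make the widths argument concrete, here is a small sketch of what it evaluates to for the example vector (txt is copied from the example data in the other answer). The ceiling() variant is my own assumption, not part of the answer above: it avoids a trailing width of 0 when the longest string is an exact multiple of n, and it relies on fields that run past the end of a line simply coming back shorter, which the original call already depends on for the shorter rows.

```r
txt <- c("I need more data today", "I need", "I", "I need more data")
n <- 5
mx <- max(nchar(txt))                     # longest string has 22 characters

# Widths as computed in the answer: four full fields plus the remainder
widths <- c(rep(n, mx %/% n), mx %% n)    # 5 5 5 5 2

# Assumed alternative: round up, so no zero-width field when mx %% n == 0
widths_alt <- rep(n, ceiling(mx / n))     # 5 5 5 5 5

res <- read.fwf(textConnection(txt), widths_alt)
```

For this data both width vectors should give the same five columns; they only differ when mx is a multiple of n.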
CodePudding user response:
Here is one possible way: use strsplit to split after every n characters (here, five) into a list, lapply with `length<-` to pad each list element with NA up to the longest length (which gives the number of columns), and do.call with rbind to bind the list elements into a matrix.
Example data:
txt <- c("I need more data today", "I need", "I", "I need more data")
Code:
n <- 5
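# Split after every n characters, using a lookbehind for n arbitrary characters
splttxt <- strsplit(txt,
                    paste0("(?<=", paste(rep(".", times = n), collapse = ""), ")"),
                    perl = TRUE)
# Pad each element with NA to the longest length, then bind the rows into a matrix
finaltxt <- do.call(rbind, lapply(lapply(splttxt, unlist),
                                  `length<-`, max(lengths(splttxt))))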
colnames(finaltxt) <- paste0("col", 1:ncol(finaltxt))
Output:
#      col1    col2    col3    col4    col5
# [1,] "I nee" "d mor" "e dat" "a tod" "ay"
# [2,] "I nee" "d"     NA      NA      NA
# [3,] "I"     NA      NA      NA      NA
# [4,] "I nee" "d mor" "e dat" "a"     NA
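Since the question starts from a data frame column rather than a bare vector, here is a rough sketch of wrapping the idea in a helper and binding the pieces back onto the data frame. The names split_fixed, df and text are hypothetical. Note that this sketch swaps in substring() instead of the lookbehind regex used above, so it is a different technique from the answer's strsplit call, and it assumes the column has no NA values.

```r
# Hypothetical helper: split each string into pieces of at most n characters
split_fixed <- function(x, n = 5) {
  x <- as.character(x)
  k <- max(1, ceiling(max(nchar(x)) / n))   # number of columns needed
  starts <- seq(1, by = n, length.out = k)  # 1, 6, 11, ...
  # One column per start position; substring() is vectorised over the rows
  out <- sapply(starts, function(s) substring(x, s, s + n - 1))
  out <- matrix(out, nrow = length(x))      # stay a matrix even for one row
  out[!nzchar(out)] <- NA                   # empty pieces become NA
  colnames(out) <- paste0("col", seq_len(k))
  out
}

# Hypothetical data frame with the text in a column called "text"
df <- data.frame(text = c("I need more data today", "I need", "I",
                          "I need more data"),
                 stringsAsFactors = FALSE)
df <- cbind(df, split_fixed(df$text, n = 5))
```

After this, df keeps the original text column and gains col1 to col5, with NA wherever a string was too short to reach that column.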