I have a data frame, and for one of the variables I need to split each observation by ","
I used:
y <- strsplit(as.character(x), ",")
I get a data set that shows every split character in its own row, not in the same row it was in before.
I have this: "a,b,c,d…" and need this: "a" "b" "c"… for each row.
CodePudding user response:
`strsplit` returns a `list` of `vector`s. If the elements contain different numbers of `,`, the `lengths` of the `list` elements will differ. In that case (the general case), pad `NA` at the end of each element up to the maximum of the `lengths` of the `list`, and then `rbind` the elements to create a `matrix` in base R.
# assuming the data.frame object is named 'df1', split the column x
# by `,` followed by zero or more spaces (`\\s*`)
lst1 <- with(df1, strsplit(as.character(x), ",\\s*"))
# find the maximum length among the list elements
mx <- max(lengths(lst1))
# pad NA at the end of the shorter elements with `length<-`
# and rbind the list elements into a matrix
out <- do.call(rbind, lapply(lst1, `length<-`, mx))
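To see the padding step in action, here is a minimal self-contained sketch of the same approach. The sample values, the object name `df1`, the column name `x`, and the final `cbind` back onto the data frame are assumptions made for illustration; they are not from the question.

```r
# Hypothetical sample data; 'df1' and column 'x' are assumed names
df1 <- data.frame(x = c("a,b,c,d", "e, f", "g"), stringsAsFactors = FALSE)

# Split each value on a comma followed by optional whitespace
lst1 <- strsplit(as.character(df1$x), ",\\s*")

# The longest split determines the number of columns
mx <- max(lengths(lst1))

# Pad the shorter vectors with NA, then stack them row-wise into a matrix
out <- do.call(rbind, lapply(lst1, `length<-`, mx))

# Optionally attach the pieces to the original data frame (one row per input row)
res <- cbind(df1, as.data.frame(out, stringsAsFactors = FALSE))
print(res)
```

Each input row stays on its own row, and rows with fewer pieces are filled with `NA` on the right.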
This can also be done with tidyverse after splitting into a list
library(dplyr)
library(tidyr)
df1 %>%
  mutate(y = strsplit(as.character(x), ",\\s*")) %>%
  unnest_wider(y, names_sep = "")
CodePudding user response:
You could use separate() from the tidyr package, together with dplyr
library(tidyr)
library(dplyr)
# Create data (name the column directly; set_names() is not exported by tidyr or dplyr)
data <- tibble(var1 = rep(c("a,b,c", "ab,c", "cb,a"), 5))
data %>%
  separate(var1, into = c("var2", "var3", "var4"), # Names of new columns
           sep = ",",       # Separate at each comma
           fill = "right",  # Pad missing pieces on the right with NA
           remove = FALSE)  # Keep the original variable