I have this first dataset, and I want to create the desired dataset by splitting the text in its Name column. How could I do this?
Basically, the new variables should be split after "XYZ-1" or "AAA-2". I appreciate any help. Thanks!
1st dataset:
Name <- c("A B XYZ-1 Where","C AAA-2 When","ABC R SS XYZ-1 Where")
x <- data.frame(Name)
desired dataset:
Name <- c("A B XYZ-1 Where","C AAA-2 When","ABC R SS XYZ-1 Where")
Study <- c("A B XYZ-1","C AAA-2","ABC R SS XYZ-1")
Question <- c("Where","When","Where")
x <- data.frame(Name,Study,Question)
Name                  Study           Question
A B XYZ-1 Where       A B XYZ-1       Where
C AAA-2 When          C AAA-2         When
ABC R SS XYZ-1 Where  ABC R SS XYZ-1  Where
CodePudding user response:
Use separate and pass a regex lookaround in sep to match one or more spaces (\\s+) that follow three uppercase letters, a - and a digit ([A-Z]{3}-\\d) and that precede an uppercase letter ([A-Z]):
library(tidyr)
separate(x, Name, into = c("Study", "Question"),
         sep = "(?<=[A-Z]{3}-\\d)\\s+(?=[A-Z])", remove = FALSE)
-output
                  Name          Study Question
1      A B XYZ-1 Where      A B XYZ-1    Where
2         C AAA-2 When        C AAA-2     When
3 ABC R SS XYZ-1 Where ABC R SS XYZ-1    Where
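To see where that lookaround pattern matches without loading tidyr, here is a minimal base R sketch. It assumes perl = TRUE (needed for the lookbehind) and reuses the Name vector from the question; the variable names pattern and pos are only illustrative.

```r
Name <- c("A B XYZ-1 Where", "C AAA-2 When", "ABC R SS XYZ-1 Where")

# Same pattern as in sep: whitespace after "LLL-d" and before an uppercase letter
pattern <- "(?<=[A-Z]{3}-\\d)\\s+(?=[A-Z])"

# regexpr() gives the 1-based start of the first match in each string
pos <- regexpr(pattern, Name, perl = TRUE)

# Everything before the match is the study, everything after it is the question
Study <- substr(Name, 1, pos - 1)
Question <- substr(Name, pos + attr(pos, "match.length"), nchar(Name))

data.frame(Name, Study, Question)
```

Because the lookbehind and lookahead consume no characters, only the whitespace itself is removed at the split point.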
CodePudding user response:
Here is a base R solution using strsplit with regex:
# Split at the last space: a space followed only by non-space characters up to the end
df <- data.frame(do.call(rbind, strsplit(x$Name, ' (?=[^ ]+$)', perl = TRUE)))
colnames(df) <- c("Study", "Question")
cbind(x[1], df)
                  Name          Study Question
1      A B XYZ-1 Where      A B XYZ-1    Where
2         C AAA-2 When        C AAA-2     When
3 ABC R SS XYZ-1 Where ABC R SS XYZ-1    Where
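One thing worth noting about this approach: the pattern splits at the last space in the string, so it relies on the question part being a single word. A small sketch of the difference, using a made-up input with a multi-word question (not part of the original data):

```r
# Hypothetical input whose question part contains spaces
s <- "A B XYZ-1 Where and when"

# Splitting at the last space cuts inside the question text
last_space <- strsplit(s, " (?=[^ ]+$)", perl = TRUE)[[1]]

# Splitting after the "LLL-d" study code keeps the question whole
after_code <- strsplit(s, "(?<=[A-Z]{3}-\\d)\\s+(?=[A-Z])", perl = TRUE)[[1]]

last_space   # "A B XYZ-1 Where and" "when"
after_code   # "A B XYZ-1" "Where and when"
```

For the data shown in the question both patterns give the same result; the choice only matters if questions can be longer than one word.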