I've been beating my head against this for awhile and was hoping for some suggestions. I'm trying to extract semicolon delimited text from a row in a data frame, performing an internal lookup on a string in that row based on the extracted values, and then outputting that (along with another extracted variable) into a long format...and then repeating for every row in the data frame. I can do the first and last manipulations with str_split, and I think i could just loop everything with apply, but the internal lookup (join?) has me tied in knots. I'd like to imagine that i could do this w/ dplyr but
Starting with a data frame:
name<-"Adam, B.C.; Dave, E.F.; Gerald, H."
school<-"[Adam, B.C.; Gerald, H.]U.Penn; [Dave, E.F.]U.Georgia"
index<-12345
foo<-data.frame(name,school,index)
foo
name school index
1 Adam, B.C.; Dave, E.F.; Gerald, H. [Adam, B.C.; Gerald, H.]U.Penn; [Dave, E.F.]U.Georgia 12345
Desired output:
name school index
Adam, B.C. U.Penn 12345
Dave, E.F. U.Georgia 12345
Gerald, H. U.Penn 12345
etc. etc. etc.
thanks!
CodePudding user response:
A mixture of tidyr::separate()
and tidyr::seperate_rows()
could do the trick:
library(tidyverse)
foo |>
tidyr::separate_rows(school, sep = "\\[", convert = T) |>
tidyr::separate(col = school, into = c("name", "school"), sep = "]") |>
tidyr::separate_rows(name, sep = ";", convert = T) |>
slice(-1) |>
mutate(across(everything(), trimws)) |>
mutate(across(everything(), str_remove, ";" ))
Output:
# A tibble: 3 x 3
index name school
<chr> <chr> <chr>
1 12345 Adam, B.C. U.Penn
2 12345 Gerald, H. U.Penn
3 12345 Dave, E.F. U.Georgia