Home > Net >  extracting data from a data frame row, performing internal lookup, and restructuring into long forma
extracting data from a data frame row, performing internal lookup, and restructuring into long forma

Time:11-22

I've been beating my head against this for awhile and was hoping for some suggestions. I'm trying to extract semicolon delimited text from a row in a data frame, performing an internal lookup on a string in that row based on the extracted values, and then outputting that (along with another extracted variable) into a long format...and then repeating for every row in the data frame. I can do the first and last manipulations with str_split, and I think i could just loop everything with apply, but the internal lookup (join?) has me tied in knots. I'd like to imagine that i could do this w/ dplyr but

Starting with a data frame:

name<-"Adam, B.C.; Dave, E.F.; Gerald, H."
school<-"[Adam, B.C.; Gerald, H.]U.Penn; [Dave, E.F.]U.Georgia"
index<-12345
foo<-data.frame(name,school,index)
foo
                                name                                                school index
1 Adam, B.C.; Dave, E.F.; Gerald, H. [Adam, B.C.; Gerald, H.]U.Penn; [Dave, E.F.]U.Georgia 12345

Desired output:

name  school  index
Adam, B.C.  U.Penn  12345
Dave, E.F.  U.Georgia 12345
Gerald, H.  U.Penn 12345

etc. etc. etc.

thanks!

CodePudding user response:

A mixture of tidyr::separate() and tidyr::seperate_rows() could do the trick:

library(tidyverse)
foo |> 
  tidyr::separate_rows(school, sep = "\\[", convert = T) |> 
  tidyr::separate(col = school, into = c("name", "school"), sep = "]") |> 
  tidyr::separate_rows(name, sep = ";", convert = T) |> 
  slice(-1) |> 
  mutate(across(everything(), trimws)) |> 
  mutate(across(everything(), str_remove, ";" ))

Output:

# A tibble: 3 x 3
  index name       school   
  <chr> <chr>      <chr>    
1 12345 Adam, B.C. U.Penn   
2 12345 Gerald, H. U.Penn   
3 12345 Dave, E.F. U.Georgia
  • Related