Convert a properly formatted string to data frame

Time:12-26

I have

x<-"1, A | 2, B | 10, C "

x is always formatted this way: | denotes a new row, the first value in each row is variable1, and the second value is variable2.

I would like to convert it to a data.frame

  variable1 variable2
1         1         A
2         2         B
3        10         C

I haven't found any package that can treat | as a row separator.

How can I convert it to data.frame?

CodePudding user response:

We may use read.table from base R to read the string into two columns after replacing the | with \n

read.table(text = gsub("|", "\n", x, fixed = TRUE), sep=",", 
    header = FALSE, col.names = c("variable1", "variable2"), strip.white = TRUE )

-output

  variable1 variable2
1         1         A
2         2         B
3        10         C
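To see why this works, it can help to look at the intermediate string that read.table receives. The following is a minimal sketch in base R only; the names txt and df are just illustrative.

```r
x <- "1, A | 2, B | 10, C "

# fixed = TRUE makes gsub treat "|" as a literal character rather than
# as the regex alternation operator
txt <- gsub("|", "\n", x, fixed = TRUE)
cat(txt)  # prints the three rows, one per line

# sep = "," splits each line into fields; strip.white = TRUE trims the
# spaces left around each field
df <- read.table(text = txt, sep = ",", header = FALSE,
                 col.names = c("variable1", "variable2"),
                 strip.white = TRUE)

# read.table guesses the column types: variable1 is read as integer,
# variable2 as character (assuming R >= 4.0, where strings are not
# converted to factors by default)
str(df)
```

Without fixed = TRUE, the pattern "|" would be interpreted as a regular expression that matches the empty string, so a newline would be inserted between every character.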

Or use fread from data.table

library(data.table)
fread(gsub("|", "\n", x, fixed = TRUE), col.names = c("variable1", "variable2"))
   variable1 variable2
1:         1         A
2:         2         B
3:        10         C

Or, with the tidyverse, use separate_rows to split the string into rows and then separate to split each row into two columns

library(tidyr)
library(dplyr)
tibble(x = trimws(x)) %>% 
  separate_rows(x, sep = "\\s*\\|\\s*") %>%
  separate(x, into = c("variable1", "variable2"), sep = ",\\s*", convert = TRUE)
# A tibble: 3 × 2
  variable1 variable2
      <int> <chr>    
1         1 A        
2         2 B        
3        10 C      

CodePudding user response:

Here's a way using scan().

x <- "1, A | 2, B | 10, C "

# split on "|" into rows, then split each row on ", " into its two fields
do.call(rbind.data.frame,
        strsplit(scan(text = x, what = "A", sep = "|", quiet = TRUE, strip.white = TRUE), ", ")) |>
  setNames(c("variable1", "variable2"))
#   variable1 variable2
# 1         1         A
# 2         2         B
# 3        10         C

Note: this uses the native pipe |>, which requires R 4.1.0 or later; R version 4.1.2 (2021-11-01) was used here.
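If the same conversion is needed in several places, the scan() idea can be wrapped in a small function. This is only a sketch under stated assumptions: parse_rows is a hypothetical helper name (not from any package), every row is assumed to contain exactly two comma-separated fields, and type.convert() is added so that variable1 comes back as integer rather than character.

```r
# parse_rows() is a hypothetical helper, not part of base R or any package
parse_rows <- function(x, col_names = c("variable1", "variable2")) {
  # split the string into rows on "|", trimming surrounding whitespace
  rows <- scan(text = x, what = character(), sep = "|",
               quiet = TRUE, strip.white = TRUE)
  # split each row into fields on a comma followed by optional whitespace
  parts <- strsplit(rows, ",\\s*")
  # stack the fields into a character matrix, then into a data frame
  df <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
  names(df) <- col_names
  # convert columns that look numeric; as.is = TRUE keeps strings as character
  type.convert(df, as.is = TRUE)
}

res <- parse_rows("1, A | 2, B | 10, C ")
str(res)
```

Unlike the pipe version above, this form does not rely on |>, so it should also run on R versions older than 4.1.0.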

  •  Tags:  
  • r