Regex separate based on specific space - R (dplyr::separate)

Time:09-28

I want to separate a column based on a specific space.

That is, I do not want to split on every space, only on spaces that meet a certain condition.

I've tried splitting on every space that meets the condition, but this removes the characters that make up the condition along with the space. For example, if I split on every space that is preceded by a letter and followed by a number, the letter, the space, and the number are all removed.

In the code I tried using separate() (which is exported by tidyr, not dplyr), but if there is a better solution I would take it!

Thanks in advance!

Code

library(tidyverse)

df <- tibble(
  column = c("Current Assets 3a 10.001", "Cash and Equivalents 2b 1.009", "Debt 2.050")
)


# separate() comes from tidyr, and the column in df is named `column`
df %>%
  tidyr::separate(column,
                  into = c("column1", "column2", "column3"),
                  sep = 'insert regex pattern here')

# Ideally I would want something like this

tibble(
  column1 = c("Current Assets", "Cash and Equivalents", "Debt"),
  column2 = c("3a", "2b", NA),
  column3 = c(10.001, 1.009, 2.050)
  
)

Answer:

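You can pass a regex pattern with capture groups to tidyr::extract. Each capture group becomes one output column, and making the second group optional handles rows such as "Debt 2.050" that have no code between the label and the number.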

tidyr::extract(df, column, c("column1", "column2", "column3"),
               '(.*?)\\s(\\d[a-z])?\\s?(\\d+\\.\\d+)', convert = TRUE)

#  column1              column2 column3
#  <chr>                <chr>     <dbl>
#1 Current Assets       "3a"      10.0 
#2 Cash and Equivalents "2b"       1.01
#3 Debt                 ""         2.05
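
If you do want to split on the specific space itself, as the question describes, lookarounds are one way to test the neighbouring characters without consuming them. The following is a minimal base R sketch, not part of the answer above: the vector `x`, the helper list `pieces`, and the `result` data frame are illustrative names, and the pattern assumes the condition is "a letter before the space and a digit after it". Rows that lack the middle code get an NA filled in by hand.

library(stats)  # base R only; no tidyverse needed for this sketch

x <- c("Current Assets 3a 10.001", "Cash and Equivalents 2b 1.009", "Debt 2.050")

# (?<=[A-Za-z]) and (?=\\d) are zero-width: only the space is consumed,
# so the letter before it and the digit after it stay in the pieces.
pieces <- strsplit(x, "(?<=[A-Za-z])\\s(?=\\d)", perl = TRUE)

# Rows without a code split into two pieces; put NA in the middle slot.
pieces <- lapply(pieces, function(p) {
  if (length(p) == 2) c(p[1], NA, p[2]) else p
})

result <- data.frame(
  column1 = vapply(pieces, `[`, character(1), 1),
  column2 = vapply(pieces, `[`, character(1), 2),
  column3 = as.numeric(vapply(pieces, `[`, character(1), 3)),
  stringsAsFactors = FALSE
)

result

Unlike the extract() output above, column2 here is NA rather than "" for the "Debt" row, which matches the desired tibble in the question. The same lookaround pattern can probably be used as the sep argument of tidyr::separate, but the two-piece rows would still need this kind of manual padding, which is why extract() is the more direct fit.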