Regex - separate multiple words and whitespace from decimal numbers at the end

Time:11-15

I have a string containing words, whitespace and numbers (integers and decimals). I want to separate them into two columns in a data frame so that column A contains the text and column B contains the number. It seems like a super simple task but I cannot figure out how to capture the text. I did capture the numbers though.

require(tidyr)
df <- data.frame(x = c("This is text0", "This is a bit more text 0.01", "Even more text12.231"))

I captured the number in column B, but I cannot figure out what regex to put in the first set of parentheses to get the text into column A:

df |> 
  extract(x, c("A", "B"), "()(\\d+\\.*\\d*)")
#  A      B
#1        0
#2     0.01
#3   12.231

Answer:

You can use

df |> extract(x, c("A", "B"), "^(.*?)\\s*(\\d+(?:\\.\\d+)?)$")

See the regex demo

Details:

  • ^ - start of string
  • (.*?) - Group 1: zero or more characters other than line breaks, as few as possible
  • \s* - zero or more whitespaces
  • (\d+(?:\.\d+)?) - Group 2: one or more digits, followed by an optional sequence of a dot and one or more digits
  • $ - end of string
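If tidyr is not available, the same pattern can be applied with base R's regexec() and regmatches(). This is a sketch under assumptions, not part of the original answer: it assumes R 3.3.0 or later (for perl = TRUE in regexec()) and rebuilds the example data frame so it runs on its own.

```r
# Base R sketch of the answer's pattern (illustrative, not the answer's code).
df <- data.frame(x = c("This is text0",
                       "This is a bit more text 0.01",
                       "Even more text12.231"),
                 stringsAsFactors = FALSE)

# Same regex as above; perl = TRUE enables the lazy .*? and (?:...) group.
pattern <- "^(.*?)\\s*(\\d+(?:\\.\\d+)?)$"

# regexec() records where the full match and each group start;
# regmatches() turns those positions into substrings.
parts <- regmatches(df$x, regexec(pattern, df$x, perl = TRUE))

# Each element is c(full_match, group1, group2); keep the two groups.
df$A <- vapply(parts, `[`, character(1), 2)
df$B <- as.numeric(vapply(parts, `[`, character(1), 3))
print(df[, c("A", "B")])
```

Rows that do not match the pattern come back as NA in both columns, because regmatches() returns an empty vector for them and indexing that vector yields NA.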

Answer:

We capture one or more letters or spaces (([A-Za-z ]+)), followed by any whitespace, and then the digits and dots ([0-9.]+); convert = TRUE turns column B into a number.

library(tidyr)
extract(df, x, into = c("A", "B"), "([A-Za-z ]+)\\s*([0-9.]+)", convert = TRUE)
                         A      B
1             This is text  0.000
2 This is a bit more text   0.010
3           Even more text 12.231
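In the output above, row 2 reads "This is a bit more text " with a trailing space: the greedy [A-Za-z ]+ class also consumes the space before 0.01, so \s* has nothing left to match. Below is a minimal base R sketch of the same pattern that trims column A afterwards. It is an illustration, not the answer's code; it assumes perl = TRUE reproduces the greedy backtracking seen in the tidyr output, and A_raw is just an illustrative name.

```r
# Base R sketch of the second answer's pattern (illustrative only).
df <- data.frame(x = c("This is text0",
                       "This is a bit more text 0.01",
                       "Even more text12.231"),
                 stringsAsFactors = FALSE)

# perl = TRUE: greedy, backtracking matching of [A-Za-z ]+.
pattern <- "([A-Za-z ]+)\\s*([0-9.]+)"
parts <- regmatches(df$x, regexec(pattern, df$x, perl = TRUE))

# Group 1 can end in a space (row 2), so strip surrounding whitespace.
A_raw <- vapply(parts, `[`, character(1), 2)
df$A <- trimws(A_raw)
df$B <- as.numeric(vapply(parts, `[`, character(1), 3))
print(df[, c("A", "B")])
```

Because [A-Za-z ] stops at any digit or punctuation, text such as "Version 2 text 3.5" would be split at the first digit; the anchored pattern from the first answer handles that case.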