I have a string containing words, whitespace and numbers (integers and decimals). I want to separate them into two columns in a data frame so that column A contains the text and column B contains the number. It seems like a super simple task, but I cannot figure out how to capture the text. I did capture the numbers, though.
require(tidyr)
df <- data.frame(x = c("This is text0", "This is a bit more text 0.01", "Even more text12.231"))
I captured the number in column B, but I cannot figure out what regex to put in the first set of parentheses to get the text in A:
df |>
  extract(x, c("A", "B"), "()(\\d+\\.*\\d*)")
# A B
#1 0
#2 0.01
#3 12.231
CodePudding user response:
You can use
extract(x, c("A", "B"), "^(.*?)\\s*(\\d+(?:\\.\\d+)?)$")
Details:
^ - start of string
(.*?) - Group 1: any zero or more chars other than line break chars, as few as possible
\s* - zero or more whitespaces
(\d+(?:\.\d+)?) - Group 2: one or more digits and then an optional sequence of . and one or more digits
$ - end of string
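To check how the two groups split each string, the same pattern can be tried without tidyr. This is only a minimal sketch in base R, assuming the sample strings from the question; regexec() and regmatches() are swapped in for extract() here, and the names pattern, m, parts and res are made up for illustration.

```r
# Sample strings from the question
x <- c("This is text0", "This is a bit more text 0.01", "Even more text12.231")

# Same pattern as above; perl = TRUE so that \d, (?:...) and the lazy .*? are PCRE syntax
pattern <- "^(.*?)\\s*(\\d+(?:\\.\\d+)?)$"

# Each list element holds: full match, Group 1, Group 2
m <- regmatches(x, regexec(pattern, x, perl = TRUE))

# Stack the matches into a matrix (assumes every string matched, as it does here)
parts <- do.call(rbind, m)

res <- data.frame(A = parts[, 2],
                  B = as.numeric(parts[, 3]),
                  stringsAsFactors = FALSE)
res
```

Because Group 1 is lazy and \s* sits between the groups, the space before the number in the second string ends up in neither column.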
CodePudding user response:
We capture one or more letters/spaces, ([A-Za-z ]+), followed by any whitespace and then the digits with ., ([0-9.]+)
library(tidyr)
extract(df, x, into = c("A", "B"), "([A-Za-z ]+)\\s*([0-9.]+)", convert = TRUE)
A B
1 This is text 0.000
2 This is a bit more text 0.010
3 Even more text 12.231