I have a data column called col1 in a dataframe df, with each value formatted something like the following:
{"option1":"option2","option3":4,"options":[0.1,0.9]}
How do I clean this up so that each value in this field contains only the first number inside the square brackets (i.e. 0.1)?
CodePudding user response:
string <- '{"option1":"option2","option3":4,"options":[0.1,0.9]}'
(a <- jsonlite::fromJSON(string))
$option1
[1] "option2"
$option3
[1] 4
$options
[1] 0.1 0.9
If you want the first value:
a$options[1]
[1] 0.1
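To apply the same idea to every row of the question's df rather than to a single string, one option is to parse each value in turn. This is only a minimal sketch, assuming every value of col1 is valid JSON with a non-empty numeric "options" array. The helper name first_option, the new column first_opt, and the second example row are made up for illustration.

```r
library(jsonlite)

# Example data mirroring the question; the second row is invented for illustration.
df <- data.frame(
  col1 = c(
    '{"option1":"option2","option3":4,"options":[0.1,0.9]}',
    '{"option1":"option5","option3":7,"options":[0.3,0.7]}'
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: parse one JSON string and return the first element of "options".
first_option <- function(json) {
  fromJSON(json)$options[1]
}

# vapply checks that each call returns exactly one numeric value.
df$first_opt <- vapply(df$col1, first_option, numeric(1), USE.NAMES = FALSE)
df$first_opt
```

Parsing the JSON is slower than a regex on large columns, but it does not depend on where "options" appears within the string.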
CodePudding user response:
library(dplyr)
df <- data.frame(col1 = '{"option1":"option2","option3":4,"options":[0.1,0.9]}')
df %>%
  mutate(
    # capture the run of digits and dots that follows "["
    num1 = as.numeric(gsub('.*\\[([\\d.]+)(.*)', '\\1', col1, perl = TRUE))
  )
col1 num1
1 {"option1":"option2","option3":4,"options":[0.1,0.9]} 0.1
CodePudding user response:
Using gsub twice: remove everything up to "[", then remove everything after ",", and convert to numeric:
x <- '{"option1":"option2","option3":4,"options":[0.1,0.9]}'
as.numeric(gsub(",.*", "", gsub(".*\\[", "", x)))
# [1] 0.1
(There must be a better single-pass regex solution.)
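One way to do this in a single call is a capture group with sub. This is a minimal sketch, assuming the array of interest is the last bracketed list in the string and that its first element is followed by "," or "]". perl = TRUE is set so that the escaped "]" inside the character class is read PCRE-style. The name first_num is just for illustration.

```r
x <- '{"option1":"option2","option3":4,"options":[0.1,0.9]}'

# Capture everything after "[" up to the first "," or "]" and discard the rest.
first_num <- as.numeric(sub(".*\\[([^,\\]]+).*", "\\1", x, perl = TRUE))
first_num
```

Both sub and as.numeric are vectorised, so x can be replaced by a whole column such as df$col1. Values that do not match the pattern are returned unchanged by sub and then become NA (with a warning) in as.numeric.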
CodePudding user response:
Here is an alternative solution using parse_number from the readr package combined with stringr's str_extract: the pattern \\[.*?\\] matches everything between the square brackets, and parse_number gets the first number:
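library(readr)
library(stringr)
# same example string as in the first answer
string <- '{"option1":"option2","option3":4,"options":[0.1,0.9]}'
parse_number(str_extract(string, '\\[.*?\\]'))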
[1] 0.1