How do I get only a portion of this string to populate a column in R?

I have a column called col1 in a data frame df, and each value is formatted like the following:

{"option1":"option2","option3":4,"options":[0.1,0.9]}

How do I clean this up so that each value in this column contains only the first number inside the square brackets (i.e. 0.1)?

Answer:

string <- '{"option1":"option2","option3":4,"options":[0.1,0.9]}'
(a <- jsonlite::fromJSON(string))
# $option1
# [1] "option2"
#
# $option3
# [1] 4
#
# $options
# [1] 0.1 0.9

If you want the first value:

a$options[1]
# [1] 0.1
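jsonlite::fromJSON() parses one JSON string at a time, so to fill a column you need to apply it to every element of col1. A minimal sketch, assuming every value has an "options" array with at least one number; the second sample row is made up for illustration:

```r
library(jsonlite)

# Hypothetical data frame with two JSON strings in col1
df <- data.frame(
  col1 = c('{"option1":"option2","option3":4,"options":[0.1,0.9]}',
           '{"option1":"option2","option3":4,"options":[0.25,0.75]}'),
  stringsAsFactors = FALSE
)

# Parse each string and keep the first element of "options"
df$num1 <- vapply(
  df$col1,
  function(s) fromJSON(s)$options[1],
  numeric(1),
  USE.NAMES = FALSE
)

df$num1
# [1] 0.10 0.25
```

Using vapply() rather than sapply() means a row without a usable "options" value raises an error instead of silently producing a list or a value of the wrong type.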

Answer:

library(dplyr)

df <- data.frame(col1 = '{"option1":"option2","option3":4,"options":[0.1,0.9]}')

df %>% 
  mutate(
    # capture the digits/dots right after "[" and drop everything else
    num1 = as.numeric(gsub('.*\\[([\\d.]+)(.*)', '\\1', col1, perl = TRUE))
  )

#                                                    col1 num1
# 1 {"option1":"option2","option3":4,"options":[0.1,0.9]}  0.1
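If you'd rather not load dplyr, the same capture-group idea works in base R by assigning to the column directly. A minimal sketch; the second row is hypothetical sample data:

```r
# Hypothetical data frame with two JSON strings in col1
df <- data.frame(
  col1 = c('{"option1":"option2","option3":4,"options":[0.1,0.9]}',
           '{"option1":"option2","option3":4,"options":[0.25,0.75]}'),
  stringsAsFactors = FALSE
)

# sub() is vectorised: for each row, capture the run of digits/dots
# right after "[" and replace the whole string with that capture
df$num1 <- as.numeric(sub(".*\\[([0-9.]+).*", "\\1", df$col1))
df$num1
# [1] 0.10 0.25
```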

Answer:

Using gsub twice: remove everything up to and including "[", then remove everything from the first "," onward, and convert the result to numeric:

x <- '{"option1":"option2","option3":4,"options":[0.1,0.9]}'

as.numeric(gsub(",.*", "", gsub(".*\\[", "", x)))
# [1] 0.1

(There must be a better single-pass regex solution.)
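One single-pass option, as a sketch: a Perl-style lookbehind with regexpr() matches only the digits and dots that immediately follow "[", and regmatches() pulls that substring out.

```r
x <- '{"option1":"option2","option3":4,"options":[0.1,0.9]}'

# (?<=\\[) requires the match to be preceded by "[" without consuming it
m <- regexpr("(?<=\\[)[0-9.]+", x, perl = TRUE)
as.numeric(regmatches(x, m))
# [1] 0.1
```

Note that regmatches() drops elements with no match, so on a column where some rows lack the bracket the result will be shorter than the input; a sub() call with a capture group keeps one value per row.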

Answer:

Here is an alternative solution using parse_number() from the readr package combined with str_extract() from stringr:

The pattern '\\[.*?\\]' matches the square brackets and everything between them, and parse_number() then returns the first number it finds:

library(readr)
library(stringr)

string <- '{"option1":"option2","option3":4,"options":[0.1,0.9]}'

parse_number(str_extract(string, '\\[.*?\\]'))
# [1] 0.1
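Staying within stringr, str_match() can capture the first number directly with a group, which avoids the separate parse_number() step. A sketch; the pattern assumes the number contains no commas or closing brackets, and the second input is a hypothetical row without an "options" array:

```r
library(stringr)

string <- c('{"option1":"option2","option3":4,"options":[0.1,0.9]}',
            '{"option1":"option2","option3":4}')  # hypothetical row with no "["

# str_match() returns a matrix: column 1 is the full match,
# column 2 is the first capture group (NA where nothing matched)
first_num <- str_match(string, "\\[([^,\\]]+)")[, 2]
as.numeric(first_num)
# [1] 0.1  NA
```

Unlike regmatches(), str_match() keeps one row per input, so the result can be assigned straight back to a data frame column.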