Convert one string column into three columns

I am trying to separate the estimates and their confidence intervals into three columns, so that a column holding values of the form 99.99 [-99.9, 99.9] is converted into three separate columns.

Please consider the data:

out <- 
structure(list(name = c("total_gray_vol_0_to_psychosis_24", "total_gray_vol_24_to_psychosis_48",
"psychosis_0_to_total_gray_vol_24", "psychosis_24_to_total_gray_vol_48"
), Std.Estimate = c(0.304045656442265, 1.48352171485462, 0.673583361513608,
0.703098685562618), Std.SE = c(0.239964279466103, 2.72428816136731,
0.112111316151443, 0.14890331153936), CI = c("0.3 [-0.17, 0.77]",
"1.48 [-3.86, 6.82]", "0.67 [0.45, 0.89]", "0.7 [0.41, 0.99]"
)), class = "data.frame", row.names = c(NA, -4L))

The farthest I got was to extract the first digit with:

library(stringr)
str_match(out$CI, pattern= "([[0-9] ]*)([[0-9] ]*)([[0-9] ]*)")

But this is not working: it returns only the first digits and, for some reason, four columns.

How do I split the column CI into three columns (estimate, lower, upper) correctly?

CodePudding user response：

You could also use tidyr::extract, as follows. Note that the regex argument must define as many capturing groups as there are names in the into argument.

library(tidyr)
# The new columns are character; add convert = TRUE to get numeric columns
out %>%
  extract(CI, c('estimate', 'lower', 'upper'), '([-\\d.]+)\\s+\\[([-\\d.]+)\\W+([-\\d.]+)\\]')

                               name Std.Estimate    Std.SE estimate lower upper
1  total_gray_vol_0_to_psychosis_24    0.3040457 0.2399643      0.3 -0.17  0.77
2 total_gray_vol_24_to_psychosis_48    1.4835217 2.7242882     1.48 -3.86  6.82
3  psychosis_0_to_total_gray_vol_24    0.6735834 0.1121113     0.67  0.45  0.89
4 psychosis_24_to_total_gray_vol_48    0.7030987 0.1489033      0.7  0.41  0.99
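
The capture-group idea is not specific to tidyr. As a minimal sketch, assuming R 3.4.0 or later (where utils::strcapture is available), the same three groups can be pulled out in base R. The vector ci and the names pattern, proto and parts are illustrative only, and the pattern assumes every entry looks like "0.3 [-0.17, 0.77]":

```r
# Illustrative input in the same format as out$CI
ci <- c("0.3 [-0.17, 0.77]", "1.48 [-3.86, 6.82]")

# One capturing group per output column: estimate, lower, upper
pattern <- "([-0-9.]+)[[:space:]]+\\[([-0-9.]+),[[:space:]]*([-0-9.]+)\\]"

# The prototype supplies the column names and the types to convert to
proto <- data.frame(estimate = numeric(), lower = numeric(), upper = numeric())

# Returns a data frame with one numeric column per capturing group
parts <- strcapture(pattern, ci, proto)
str(parts)
```

Because the prototype columns are numeric, the captured strings are converted on the way out, so no separate as.numeric() step is needed.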

CodePudding user response：

Here is an option using tidyr::separate

library(dplyr)
library(tidyr)
library(stringr)

out %>%
    # Split on whitespace: "0.3", "[-0.17," and "0.77]"
    separate(CI, c("estimate", "lower", "upper"), sep = "\\s") %>%
    mutate(across(
        c(estimate, lower, upper), 
        ~ .x %>% str_remove_all("\\[|\\]|,|\\s") %>% as.numeric()))
#                               name Std.Estimate    Std.SE estimate lower upper
#1  total_gray_vol_0_to_psychosis_24    0.3040457 0.2399643     0.30 -0.17  0.77
#2 total_gray_vol_24_to_psychosis_48    1.4835217 2.7242882     1.48 -3.86  6.82
#3  psychosis_0_to_total_gray_vol_24    0.6735834 0.1121113     0.67  0.45  0.89
#4 psychosis_24_to_total_gray_vol_48    0.7030987 0.1489033     0.70  0.41  0.99

First, split each entry on whitespace, which gives three pieces such as "0.3", "[-0.17," and "0.77]". Then remove the brackets, commas and any remaining whitespace from the new columns and coerce them to numeric.
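
The split, strip and coerce steps described above can also be sketched without tidyr or dplyr. This is only an illustration under the assumption that every entry contains exactly two spaces, as in the sample data; ci, pieces, cleaned and parts are made-up names:

```r
# Illustrative input in the same format as out$CI
ci <- c("0.3 [-0.17, 0.77]", "1.48 [-3.86, 6.82]")

# Step 1: split on whitespace, giving three pieces per entry
pieces <- strsplit(ci, "[[:space:]]+")

# Step 2: drop "[", "]" and "," from each piece, then coerce to numeric
cleaned <- lapply(pieces, function(p) as.numeric(gsub("[][,]", "", p)))

# Step 3: stack the rows and name the columns
parts <- as.data.frame(do.call(rbind, cleaned))
names(parts) <- c("estimate", "lower", "upper")
str(parts)
```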

CodePudding user response：

Using base R

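# Strip the brackets and commas, then let read.table() parse the three numbers
out <- cbind(out, read.table(text = gsub("[][]|,", "", out$CI),
    header = FALSE, col.names = c("estimate", "lower", "upper")))
out$CI <- NULL  # drop the original string column

-output

> out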
                               name Std.Estimate    Std.SE estimate lower upper
1  total_gray_vol_0_to_psychosis_24    0.3040457 0.2399643     0.30 -0.17  0.77
2 total_gray_vol_24_to_psychosis_48    1.4835217 2.7242882     1.48 -3.86  6.82
3  psychosis_0_to_total_gray_vol_24    0.6735834 0.1121113     0.67  0.45  0.89
4 psychosis_24_to_total_gray_vol_48    0.7030987 0.1489033     0.70  0.41  0.99
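
To see why the base R approach works, it may help to look at the intermediate string that read.table() receives. The sketch below uses a small illustrative vector ci rather than the full data frame; stripped and parts are made-up names:

```r
# Illustrative input in the same format as out$CI
ci <- c("0.3 [-0.17, 0.77]", "1.48 [-3.86, 6.82]")

# "[][]" is a character class containing "]" and "[";
# the alternation "|," also removes the commas
stripped <- gsub("[][]|,", "", ci)
print(stripped)
# [1] "0.3 -0.17 0.77"  "1.48 -3.86 6.82"

# Each element is now a whitespace-separated row that read.table() can parse
parts <- read.table(text = stripped, header = FALSE,
                    col.names = c("estimate", "lower", "upper"))
str(parts)
```

Since read.table() infers column types, the three new columns come back as numeric without an explicit conversion.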