I am trying to split the estimates and CIs into three columns, so that a column with entries of the form 99.99 [-99.9, 99.9] is converted into three separate columns.
Please consider the data:
out <-
structure(list(name = c("total_gray_vol_0_to_psychosis_24", "total_gray_vol_24_to_psychosis_48",
"psychosis_0_to_total_gray_vol_24", "psychosis_24_to_total_gray_vol_48"
), Std.Estimate = c(0.304045656442265, 1.48352171485462, 0.673583361513608,
0.703098685562618), Std.SE = c(0.239964279466103, 2.72428816136731,
0.112111316151443, 0.14890331153936), CI = c("0.3 [-0.17, 0.77]",
"1.48 [-3.86, 6.82]", "0.67 [0.45, 0.89]", "0.7 [0.41, 0.99]"
)), class = "data.frame", row.names = c(NA, -4L))
The farthest I got was to extract the first digit with:
library(stringr)
str_match(out$CI, pattern= "([[0-9] ]*)([[0-9] ]*)([[0-9] ]*)")
But this is not working, as it is returning only the first digits, and for some reason four columns.
- How do I split the column CI into three columns (estimate, lower, upper) correctly?
CodePudding user response:
You could also use tidyr::extract as follows. Note that the regex argument needs as many capturing groups as there are names in the into argument.
library(tidyr)

out %>%
  extract(CI, c('estimate', 'lower', 'upper'), '([-\\d.]+)\\s+\\[([-\\d.]+)\\W+([-\\d.]+)\\]')
name Std.Estimate Std.SE estimate lower upper
1 total_gray_vol_0_to_psychosis_24 0.3040457 0.2399643 0.3 -0.17 0.77
2 total_gray_vol_24_to_psychosis_48 1.4835217 2.7242882 1.48 -3.86 6.82
3 psychosis_0_to_total_gray_vol_24 0.6735834 0.1121113 0.67 0.45 0.89
4 psychosis_24_to_total_gray_vol_48 0.7030987 0.1489033 0.7 0.41 0.99
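The new columns above are still character (note 0.3 rather than 0.30). A minimal sketch of a variant that returns numeric columns via convert = TRUE is shown below. It also replaces \\W+ with an explicit comma separator, because \\W+ is greedy and would swallow the minus sign of a negative upper bound. The data frame ci_df and its second row are hypothetical and only serve to exercise that case.

```r
library(tidyr)

# Hypothetical rows; the second one has a negative upper bound,
# which does not occur in the original data.
ci_df <- data.frame(CI = c("0.3 [-0.17, 0.77]", "-1.5 [-3.86, -0.2]"))

res <- tidyr::extract(
  ci_df, CI, c("estimate", "lower", "upper"),
  # three capture groups, one per name in `into`;
  # ",\\s*" keeps a leading "-" inside the third group
  regex = "([-0-9.]+)\\s+\\[([-0-9.]+),\\s*([-0-9.]+)\\]",
  convert = TRUE  # run type conversion on the new columns
)

str(res)
```

Calling tidyr::extract with its namespace avoids a clash with magrittr::extract if magrittr happens to be attached later.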
CodePudding user response:
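Here is an option using tidyr::separate:
library(dplyr)
library(tidyr)
library(stringr)

out %>%
  separate(CI, c("estimate", "lower", "upper"), sep = "\\s|[|]") %>%
  mutate(across(
    c(estimate, lower, upper),
    # strip brackets, commas and stray whitespace, then coerce to numeric
    ~ .x %>% str_remove_all("\\[|\\]|,|\\s") %>% as.numeric()))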
# name Std.Estimate Std.SE estimate lower upper
#1 total_gray_vol_0_to_psychosis_24 0.3040457 0.2399643 0.30 -0.17 0.77
#2 total_gray_vol_24_to_psychosis_48 1.4835217 2.7242882 1.48 -3.86 6.82
#3 psychosis_0_to_total_gray_vol_24 0.6735834 0.1121113 0.67 0.45 0.89
#4 psychosis_24_to_total_gray_vol_48 0.7030987 0.1489033 0.70 0.41 0.99
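First, split the entries on whitespace (the "[|]" part of sep is a character class that only matches a literal "|", so the brackets themselves are not split on), then remove "[", "]" and "," from the resulting new columns and coerce them to numeric.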
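To see the two steps in isolation, here is a small sketch on a bare character vector, assuming stringr is available. The names ci, pieces, cleaned and nums are illustrative only.

```r
library(stringr)

ci <- c("0.3 [-0.17, 0.77]", "1.48 [-3.86, 6.82]")

# Step 1: split each entry on whitespace into exactly three pieces.
# The brackets and the comma are still attached at this point.
pieces <- str_split_fixed(ci, "\\s+", 3)
pieces

# Step 2: remove brackets and commas, then coerce to numeric.
cleaned <- str_remove_all(pieces, "\\[|\\]|,")
nums <- matrix(
  as.numeric(cleaned),
  nrow = nrow(pieces),
  dimnames = list(NULL, c("estimate", "lower", "upper"))
)
nums
```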
CodePudding user response:
Using base R
out <- cbind(out, read.table(text = gsub("[][]|,", "", out$CI),
header = FALSE, col.names = c("estimate", "lower", "upper")))
Output:
> out$CI <- NULL
> out
name Std.Estimate Std.SE estimate lower upper
1 total_gray_vol_0_to_psychosis_24 0.3040457 0.2399643 0.30 -0.17 0.77
2 total_gray_vol_24_to_psychosis_48 1.4835217 2.7242882 1.48 -3.86 6.82
3 psychosis_0_to_total_gray_vol_24 0.6735834 0.1121113 0.67 0.45 0.89
4 psychosis_24_to_total_gray_vol_48 0.7030987 0.1489033 0.70 0.41 0.99
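For comparison, a regex-based base R sketch using regexec and regmatches. Like stringr::str_match, it returns the full match first and then one element per capture group, which is why a pattern with three groups produces four columns. The names ci, pattern, m and parts are illustrative, and the pattern assumes the exact "estimate [lower, upper]" layout seen in the data.

```r
ci <- c("0.3 [-0.17, 0.77]", "1.48 [-3.86, 6.82]")

# three capture groups: estimate, lower, upper
pattern <- "([-0-9.]+) \\[([-0-9.]+), ([-0-9.]+)\\]"

# one row per input string: full match + three groups = four columns
m <- do.call(rbind, regmatches(ci, regexec(pattern, ci)))
dim(m)

# drop the full-match column and convert the rest to numeric
parts <- as.data.frame(m[, -1, drop = FALSE], stringsAsFactors = FALSE)
names(parts) <- c("estimate", "lower", "upper")
parts[] <- lapply(parts, as.numeric)
parts
```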