The confidence_interval column is of type character:
confidence_interval |
---|
(245.0 - 345.2) |
(434.1 - 432.1) |
(123.5 - 1,120.2) |
I want to create two numeric columns: Upper Interval, which holds the first value in the parentheses, and Lower Interval, which holds the second value.
Upper Interval | Lower Interval |
---|---|
245.0 | 345.2 |
434.1 | 432.1 |
123.5 | 1120.2 |
How can this be done using R?
Thanks
CodePudding user response:
extract() from tidyr fits your case.
library(tidyr)
df %>%
extract(confidence_interval, into = c("Upper", "Lower"),
regex = "\\((.+),(.+)\\)", convert = TRUE)
# # A tibble: 3 × 2
# Upper Lower
# <dbl> <dbl>
# 1 245 345.
# 2 434. 432.
# 3 124. 901.
CodePudding user response:
This is one approach using sapply with strsplit and gsub:
setNames(data.frame(t(sapply(strsplit(df$confidence_interval, " - "), function(x)
gsub("\\(|\\)", "", x)))), c("Upper Interval", "Lower Interval"))
Upper Interval Lower Interval
1 245.0 345.2
2 434.1 432.1
3 123.5 1,901.2
Data
dput(df)
structure(list(confidence_interval = c("(245.0 - 345.2)", "(434.1 - 432.1)",
"(123.5 - 1,901.2)")), class = "data.frame", row.names = c(NA,
-3L))
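Note that the result above is still of type character, and the thousands separator in "1,901.2" would make a plain as.numeric() return NA. A minimal base R sketch of one way to get numeric columns, assuming the values are separated by " - " and that "," only ever appears as a thousands mark (the names cleaned, parts, bounds and result are illustrative):

```r
# Sample data in the question's format (" - " separator, "," as thousands mark).
df <- data.frame(
  confidence_interval = c("(245.0 - 345.2)", "(434.1 - 432.1)", "(123.5 - 1,901.2)"),
  stringsAsFactors = FALSE
)

# Assumption: "," is only a thousands separator, so it can be dropped
# together with the parentheses.
cleaned <- gsub("[(),]", "", df$confidence_interval)

# Split on the literal " - " (fixed = TRUE disables regex interpretation).
parts <- strsplit(cleaned, " - ", fixed = TRUE)

# vapply() enforces exactly two numeric values per row and fails otherwise.
bounds <- t(vapply(parts, as.numeric, numeric(2)))

result <- setNames(as.data.frame(bounds), c("Upper Interval", "Lower Interval"))
str(result)
```

Using vapply() instead of sapply() is a deliberate choice here: a row that does not split into exactly two pieces raises an error instead of silently changing the shape of the result.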
CodePudding user response:
Here is a solution.
ci <- c('(245.0,345.2)', '(434.1,432.1)', '(123.5,901.2)')
values <- strsplit(gsub('\\(|\\)', '', ci), split = ",")
upper <- sapply(values, function(x) as.numeric(x[[1]]))
lower <- sapply(values, function(x) as.numeric(x[[2]]))
upper
#> [1] 245.0 434.1 123.5
lower
#> [1] 345.2 432.1 901.2
I use gsub to remove the parentheses, and then strsplit to split each string at the ",".
I then use sapply to collect the pieces into a vector, because strsplit returns a list with one character vector per input string.
The OP's question was edited.
If the separator between the values is ' - ', use values <- strsplit(gsub('\\(|\\)', '', ci), split = " - ") instead.
The split parameter of strsplit is the pattern at which the function splits each string.
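As a small illustration of the split argument: strsplit() treats split as a regular expression by default, and fixed = TRUE makes it match literally. The sketch below assumes the ' - ' separator from the edited question; the names stripped, by_regex and by_literal are illustrative only.

```r
ci <- c("(245.0 - 345.2)", "(434.1 - 432.1)")
stripped <- gsub("\\(|\\)", "", ci)

# Default behaviour: split is interpreted as a regular expression.
by_regex <- strsplit(stripped, split = " - ")

# fixed = TRUE: split is matched literally, which avoids surprises when the
# separator contains regex metacharacters such as "." or "|".
by_literal <- strsplit(stripped, split = " - ", fixed = TRUE)

# For this separator both calls give the same list of character vectors.
same <- identical(by_regex, by_literal)

# One character vector per input string, so pick elements by position.
upper <- vapply(by_literal, function(x) as.numeric(x[1]), numeric(1))
lower <- vapply(by_literal, function(x) as.numeric(x[2]), numeric(1))
```

For a separator such as "." the two calls would differ, because an unescaped "." in a regular expression matches any character.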
CodePudding user response:
library(tidyverse)
ci <- c('(245.0,345.2)', '(434.1,432.1)', '(123.5,901.2)')
data.frame(ci) |>
mutate(ci2 = stringr::str_replace_all(ci, "\\(|\\)", "")) |>
separate(ci2, c('upper', 'lower'), sep =",", convert = TRUE)
#> ci upper lower
#> 1 (245.0,345.2) 245.0 345.2
#> 2 (434.1,432.1) 434.1 432.1
#> 3 (123.5,901.2) 123.5 901.2
CodePudding user response:
library(tidyverse)  # provides mutate(), across(), str_remove_all() and separate()
df %>%
mutate(across(confidence_interval, ~ str_remove_all(.x, "[^0-9,\\.]"))) %>%
separate(col = confidence_interval,
into = c("higher", "lower"),
sep = ",", convert = TRUE)
# A tibble: 3 × 2
higher lower
<dbl> <dbl>
1 245 345.
2 434. 432.
3 124. 901.
CodePudding user response:
Using strcapture:
ci <- c('(245.0,345.2)', '(434.1,432.1)', '(123.5,901.2)')
pattern <- "\\(([-.0-9]+),([-.0-9]+)\\)"
strcapture(pattern, ci, data.frame(upper.interval=numeric(), lower.interval=numeric()))
upper.interval lower.interval
1 245.0 345.2
2 434.1 432.1
3 123.5 901.2
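The same idea can be adapted to the ' - ' format of the edited question. The following is only a sketch: it assumes "," is a thousands separator, captures the pieces as character first, and converts them afterwards. The third input string is a made-up malformed entry that shows how strcapture reports rows that do not match the pattern.

```r
# The last element is a hypothetical malformed entry for illustration.
ci <- c("(245.0 - 345.2)", "(123.5 - 1,901.2)", "not available")

# Allow digits, dots and thousands commas on each side of " - ".
pattern <- "\\(([0-9.,]+) - ([0-9.,]+)\\)"

# Capture as character; rows that do not match yield NA in every column.
raw <- strcapture(
  pattern, ci,
  data.frame(upper = character(), lower = character(), stringsAsFactors = FALSE)
)

# Drop the thousands separators, then convert each column to numeric.
out <- as.data.frame(
  lapply(raw, function(col) as.numeric(gsub(",", "", col, fixed = TRUE)))
)
out
```

Capturing as character and converting in a second step keeps the thousands separator from breaking the numeric conversion.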