Create two numeric columns from One character column in R

Time:01-18

The confidence_interval column is of type character:

confidence_interval
(245.0 - 345.2)
(434.1 - 432.1)
(123.5 - 1,120.2)

I want to create two numeric columns: Upper Interval, containing the first value in the parentheses, and Lower Interval, containing the second value (as a number, without the thousands separator):

Upper Interval Lower Interval
245.0 345.2
434.1 432.1
123.5 1120.2

How can this be done using R?

Thanks

Answer:

extract() from tidyr fits your case.

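library(tidyr)

# Remove the thousands separator first; otherwise convert = TRUE leaves the Lower column as character
df$confidence_interval <- gsub(",", "", df$confidence_interval, fixed = TRUE)

df %>%
  extract(confidence_interval, into = c("Upper", "Lower"),
          regex = "\\((.+) - (.+)\\)", convert = TRUE)

# With the question's data this gives
#   Upper: 245.0 434.1 123.5
#   Lower: 345.2 432.1 1120.2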
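For comparison, the same capture-group idea works in base R with sub() and backreferences, without tidyr. This is a minimal sketch that assumes the question's " - " separator and that commas only ever appear as thousands separators; the df built here simply mirrors the question's data.

```r
# Question's data, rebuilt so the sketch is self-contained
df <- data.frame(confidence_interval = c("(245.0 - 345.2)", "(434.1 - 432.1)",
                                         "(123.5 - 1,120.2)"))

# Drop the thousands separator, then pull out each capture group with sub()
ci <- gsub(",", "", df$confidence_interval, fixed = TRUE)
pattern <- "^\\((.+) - (.+)\\)$"

df$Upper <- as.numeric(sub(pattern, "\\1", ci))  # first value in the parentheses
df$Lower <- as.numeric(sub(pattern, "\\2", ci))  # second value in the parentheses
df
```

If a row does not match the pattern, sub() returns it unchanged and as.numeric() turns it into NA with a warning, which makes malformed rows easy to spot.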

Answer:

This is one approach using sapply() with strsplit() and gsub(). as.numeric() turns the pieces into numbers, and removing the comma as well as the parentheses lets 1,901.2 parse:

setNames(data.frame(t(sapply(strsplit(df$confidence_interval, " - "), function(x)
  as.numeric(gsub("[(),]", "", x))))), c("Upper Interval", "Lower Interval"))
  Upper Interval Lower Interval
1          245.0          345.2
2          434.1          432.1
3          123.5         1901.2

Data

df <- structure(list(confidence_interval = c("(245.0 - 345.2)", "(434.1 - 432.1)",
"(123.5 - 1,901.2)")), class = "data.frame", row.names = c(NA,
-3L))
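A variant of the sapply() approach above, sketched with vapply(): declaring numeric(2) as the expected result makes R stop with an error if any row does not split into exactly two values, instead of quietly returning a list. The data is the Data block reproduced; the column names follow the question.

```r
df <- structure(list(confidence_interval = c("(245.0 - 345.2)", "(434.1 - 432.1)",
"(123.5 - 1,901.2)")), class = "data.frame", row.names = c(NA, -3L))

# Split on " - ", strip parentheses and thousands separators, convert to numeric;
# vapply() checks that every row yields exactly two numbers
parts <- vapply(strsplit(df$confidence_interval, " - ", fixed = TRUE),
                function(x) as.numeric(gsub("[(),]", "", x)),
                numeric(2))

# parts is a 2 x n matrix: row 1 holds the first values, row 2 the second
out <- data.frame(`Upper Interval` = parts[1, ], `Lower Interval` = parts[2, ],
                  check.names = FALSE)
out
```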

Answer:

Here is a solution.

ci <- c('(245.0,345.2)', '(434.1,432.1)', '(123.5,901.2)')

values <- strsplit(gsub('\\(|\\)', '', ci), split = ",")

upper <- sapply(values, function(x) as.numeric(x[[1]]))
lower <- sapply(values, function(x) as.numeric(x[[2]]))

upper
#> [1] 245.0 434.1 123.5
lower
#> [1] 345.2 432.1 901.2

I use gsub() to remove the parentheses and then strsplit() to split each string at the comma. strsplit() returns a list with one character vector per input string, so I use sapply() to pull the first or second element out of each vector as a number, which gives a plain numeric vector.
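To make the return value concrete: strsplit() gives a list holding one character vector per input string. The sketch below avoids calling sapply() twice by binding those vectors into a matrix; it assumes every string splits into exactly two parts, since do.call(rbind, ...) would otherwise recycle the shorter pieces.

```r
ci <- c('(245.0,345.2)', '(434.1,432.1)', '(123.5,901.2)')

# strsplit() returns a list with one character vector per input string
values <- strsplit(gsub('\\(|\\)', '', ci), split = ",")
str(values)

# Bind the vectors into a 3 x 2 character matrix, then convert it to numeric
m <- do.call(rbind, values)
storage.mode(m) <- "double"

upper <- m[, 1]
lower <- m[, 2]
```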

Edit: the OP's question was edited.

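If the separator between the values is ' - ', use values <- strsplit(gsub('[(),]', '', ci), split = " - ") instead. Removing the comma along with the parentheses lets as.numeric() parse values that contain a thousands separator, such as 1,120.2.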

The split parameter in strsplit is what the function will use to split the strings into two parts.
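One detail worth knowing about split: it is treated as a regular expression unless fixed = TRUE is given. " - " contains no regex metacharacters, so both forms agree here, but a separator such as "." behaves very differently.

```r
x <- "(123.5 - 1,120.2)"

# " - " has no special regex characters, so both calls split the same way
pieces_regex <- strsplit(x, " - ")[[1]]
pieces_fixed <- strsplit(x, " - ", fixed = TRUE)[[1]]

# As a regex, "." matches any character, so only empty strings remain
bad <- strsplit("245.0", ".")[[1]]
# With fixed = TRUE it splits at the literal dot
good <- strsplit("245.0", ".", fixed = TRUE)[[1]]
```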

Answer:

library(tidyverse)

ci <- c('(245.0,345.2)', '(434.1,432.1)', '(123.5,901.2)')      
data.frame(ci) |> 
  mutate(ci2 = stringr::str_replace_all(ci, "\\(|\\)", "")) |> 
  separate(ci2, c('upper', 'lower'), sep =",", convert = TRUE)
#>              ci upper lower
#> 1 (245.0,345.2) 245.0 345.2
#> 2 (434.1,432.1) 434.1 432.1
#> 3 (123.5,901.2) 123.5 901.2
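If you would rather stay in base R, read.table() can play the role of separate() here: strip the parentheses and let it split on the comma and detect the numeric columns. This sketch uses the same comma-separated ci vector; note that this particular call only fits that format, because read.table()'s sep must be a single character, so the question's " - " layout would need a different split.

```r
ci <- c('(245.0,345.2)', '(434.1,432.1)', '(123.5,901.2)')

# Strip the parentheses; each element becomes one line of "text" input,
# which read.table() splits on the comma and converts to numeric columns
res <- read.table(text = gsub("[()]", "", ci), sep = ",",
                  col.names = c("upper", "lower"))
res
```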

Answer:

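library(dplyr)
library(tidyr)
library(stringr)

# Written for the original comma-separated format, e.g. "(245.0,345.2)"
df %>%
  mutate(across(confidence_interval, ~ str_remove_all(.x, "[^0-9,\\.]"))) %>%
  separate(col = confidence_interval,
           into = c("higher", "lower"),
           sep = ",", convert = TRUE)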

# A tibble: 3 × 2
  higher lower
   <dbl> <dbl>
1   245   345.
2   434.  432.
3   124.  901.
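This answer targets the original comma-separated format. The sketch below shows why the character class matters once the question switched to " - " with thousands separators, and one possible adjustment that keeps the hyphen instead of the comma (it only works while both bounds are non-negative, since "-" then serves as the separator).

```r
old <- "(245.0,345.2)"        # format this answer was written for
new <- "(123.5 - 1,120.2)"    # format after the question was edited

# Keeping only digits, commas and dots works for the old format...
old_clean <- gsub("[^0-9,.]", "", old)   # "245.0,345.2"
# ...but glues the two numbers together for the new one
new_clean <- gsub("[^0-9,.]", "", new)   # "123.51,120.2"

# Keeping the hyphen instead of the comma leaves a usable separator
kept <- gsub("[^0-9.-]", "", new)        # "123.5-1120.2"
nums <- as.numeric(strsplit(kept, "-", fixed = TRUE)[[1]])
```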

Answer:

Using strcapture:

ci <- c('(245.0,345.2)', '(434.1,432.1)', '(123.5,901.2)')

pattern <- "\\(([-.0-9]+),([-.0-9]+)\\)"
strcapture(pattern, ci, data.frame(upper.interval=numeric(), lower.interval=numeric()))

  upper.interval lower.interval
1          245.0          345.2
2          434.1          432.1
3          123.5          901.2
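The same strcapture() call can be adapted to the question's current format. This is a sketch assuming " - " always separates the bounds; the commas are removed first because strcapture() converts each capture with as.numeric(), which returns NA for "1,120.2".

```r
ci <- c("(245.0 - 345.2)", "(434.1 - 432.1)", "(123.5 - 1,120.2)")

# Capture the two bounds on either side of " - "
pattern <- "\\(([.0-9]+) - ([.0-9]+)\\)"
res <- strcapture(pattern, gsub(",", "", ci, fixed = TRUE),
                  data.frame(upper.interval = numeric(), lower.interval = numeric()))
res
```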
