Separate numeric columns without special characters in R

I want to split the variable "population" into two columns. The first one ("pop1") should contain the first two digits of each value. The second one ("pop2") should contain the last digit.

df <- dplyr::tibble(
  city = c("a", "a", "b", "b", "c", "c"), 
  sex = c(1,0,1,0,1,0),
  age = c(1,2,1,2,1,2),
  population = c(100, 123, 189, 234, 221, 435),
  accidents = c(87, 98, 79, 43,45,65)
)

Expected output:

df <- dplyr::tibble(
  city = c("a", "a", "b", "b", "c", "c"), 
  sex = c(1,0,1,0,1,0),
  age = c(1,2,1,2,1,2),
  pop1 = c(10, 12, 18, 23, 22, 43),
  pop2 = c(0,3,9,4,1,5),
  accidents = c(87, 98, 79, 43,45,65)
)

Thanks

CodePudding user response：

A possible solution:

library(tidyverse)

df %>% 
  separate(population, into = paste0("pop", 1:2), sep = "(?=\\d$)", convert = TRUE)

#> # A tibble: 6 × 6
#>   city    sex   age  pop1  pop2 accidents
#>   <chr> <dbl> <dbl> <int> <int>     <dbl>
#> 1 a         1     1    10     0        87
#> 2 a         0     2    12     3        98
#> 3 b         1     1    18     9        79
#> 4 b         0     2    23     4        43
#> 5 c         1     1    22     1        45
#> 6 c         0     2    43     5        65

CodePudding user response：

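Another solution based on extract (note that pop1 and pop2 come back as character columns, as the output shows; adding convert = TRUE to the call would convert them to integers):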

library(tidyr)

df %>%
  extract(population,
          into = c("pop1", "pop2"),
          regex = "(\\d\\d)(\\d)")
# A tibble: 6 × 6
  city    sex   age pop1  pop2  accidents
  <chr> <dbl> <dbl> <chr> <chr>     <dbl>
1 a         1     1 10    0            87
2 a         0     2 12    3            98
3 b         1     1 18    9            79
4 b         0     2 23    4            43
5 c         1     1 22    1            45
6 c         0     2 43    5            65
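
Because population is already numeric, integer arithmetic is another way to get the same split without any regex. The following is only a minimal sketch, assuming that exactly one trailing digit should go into pop2 and everything before it into pop1 (which matches the three-digit values in the question). It uses base R's %/% and %% operators inside dplyr::mutate; the name `result` is just an illustrative choice.

```r
library(dplyr)

# Example data from the question
df <- tibble(
  city = c("a", "a", "b", "b", "c", "c"),
  sex = c(1, 0, 1, 0, 1, 0),
  age = c(1, 2, 1, 2, 1, 2),
  population = c(100, 123, 189, 234, 221, 435),
  accidents = c(87, 98, 79, 43, 45, 65)
)

result <- df %>%
  mutate(
    pop1 = population %/% 10,  # integer division drops the last digit
    pop2 = population %% 10,   # remainder keeps only the last digit
    .after = population        # place the new columns where population was
  ) %>%
  select(-population)          # drop the original column
```

With this approach pop1 and pop2 stay numeric (double), so no type conversion is needed afterwards. It would need adjusting if the values could be negative or non-integer.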