Home > Back-end >  Create columns dynamically with dplyr
Create columns dynamically with dplyr

Time:05-01

I have a column with 8 digit numbers from which I want to create 8 additional columns. Each column must contain n amount of digits, ranging from 1 to 8. So far, I have done the following successfully:

consolidado = tablas %>% 
        bind_rows() %>% 
        arrange(fecha) %>% 
        clean_names() %>% 
        mutate(n1 = str_extract(codigo, '[0-9]{1}'),
               n2 = str_extract(codigo, '[0-9]{2}'),
               n3 = str_extract(codigo, '[0-9]{3}'),
               n4 = str_extract(codigo, '[0-9]{4}'),
               n5 = str_extract(codigo, '[0-9]{5}'),
               n6 = str_extract(codigo, '[0-9]{6}'),
               n7 = str_extract(codigo, '[0-9]{7}'),
               n8 = str_extract(codigo, '[0-9]{8}'))

Where each column created in mutate contains the desired amount of digits. However, I was wondering if there was a shorter and more elegant way to approach this.

Here is some sample data from the first 10 rows of the data set:

structure(list(concepto = c("ACTIVO", "ACTIVO", "ACTIVO", "ACTIVO", 
"ACTIVO", "ACTIVO", "ACTIVO", "ACTIVO", "ACTIVO", "ACTIVO"), 
    fecha = structure(c(13879, 13879, 13879, 13879, 13879, 13879, 
    13879, 13879, 13879, 13879), class = "Date"), codigo = c("10000000", 
    "10000000", "10000000", "10000000", "10000000", "10000000", 
    "10000000", "10000000", "10000000", "10000000"), bancos = c("BCR   Banco de Costa Rica", 
    "BNCR   Banco Nacional de Costa Rica", "BANHVI   Banco Hipotecario de la Vivienda", 
    "POPULAR   Banco Popular y de desarrollo Comunal", "BANCO BAC SAN JOSE S A", 
    "BANCO BCT S A", "Banco Cathay de Costa Rica S A", "Banco Davivienda Costa Rica Sociedad Anonima", 
    "BANCO GENERAL COSTA RICA SOCIEDAD ANONIMA", "BANCO IMPROSA S A"
    ), valor = c(1675930811215, 2745357417558, 57759616119, 1119228752031, 
    658336496744, 71173584265, 24596619019, 469100562766, 6068169547, 
    202528077603), n1 = c("1", "1", "1", "1", "1", "1", "1", 
    "1", "1", "1"), n2 = c("10", "10", "10", "10", "10", "10", 
    "10", "10", "10", "10"), n3 = c("100", "100", "100", "100", 
    "100", "100", "100", "100", "100", "100"), n4 = c("1000", 
    "1000", "1000", "1000", "1000", "1000", "1000", "1000", "1000", 
    "1000"), n5 = c("10000", "10000", "10000", "10000", "10000", 
    "10000", "10000", "10000", "10000", "10000"), n6 = c("100000", 
    "100000", "100000", "100000", "100000", "100000", "100000", 
    "100000", "100000", "100000"), n7 = c("1000000", "1000000", 
    "1000000", "1000000", "1000000", "1000000", "1000000", "1000000", 
    "1000000", "1000000"), n8 = c("10000000", "10000000", "10000000", 
    "10000000", "10000000", "10000000", "10000000", "10000000", 
    "10000000", "10000000")), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame"))

Thanks in advance for any help you can provide!

CodePudding user response:

We can use map_dfc with set_names:

library(tidyverse)

consolidado %>% 
  mutate(map_dfc(set_names(1:8, paste0("n", 1:8)),
             ~ str_extract(codigo, paste0('[0-9]{',.x,'}'))))

#> # A tibble: 10 × 13
#>    concepto fecha      codigo bancos   valor n1    n2    n3    n4    n5    n6   
#>    <chr>    <date>     <chr>  <chr>    <dbl> <chr> <chr> <chr> <chr> <chr> <chr>
#>  1 ACTIVO   2008-01-01 10000… BCR  … 1.68e12 1     10    100   1000  10000 1000…
#>  2 ACTIVO   2008-01-01 10000… BNCR … 2.75e12 1     10    100   1000  10000 1000…
#>  3 ACTIVO   2008-01-01 10000… BANHV… 5.78e10 1     10    100   1000  10000 1000…
#>  4 ACTIVO   2008-01-01 10000… POPUL… 1.12e12 1     10    100   1000  10000 1000…
#>  5 ACTIVO   2008-01-01 10000… BANCO… 6.58e11 1     10    100   1000  10000 1000…
#>  6 ACTIVO   2008-01-01 10000… BANCO… 7.12e10 1     10    100   1000  10000 1000…
#>  7 ACTIVO   2008-01-01 10000… Banco… 2.46e10 1     10    100   1000  10000 1000…
#>  8 ACTIVO   2008-01-01 10000… Banco… 4.69e11 1     10    100   1000  10000 1000…
#>  9 ACTIVO   2008-01-01 10000… BANCO… 6.07e 9 1     10    100   1000  10000 1000…
#> 10 ACTIVO   2008-01-01 10000… BANCO… 2.03e11 1     10    100   1000  10000 1000…
#> # … with 2 more variables: n7 <chr>, n8 <chr>

Created on 2022-04-30 by the reprex package (v0.3.0)

Another option is dplyover::over (disclaimer: I'm the package maintainer). This makes the syntax more readable:

library(tidyverse)
library(dplyover) # https://timteafan.github.io/dplyover/

consolidado %>% 
  mutate(over(1:8,
              ~ str_extract(codigo, paste0('[0-9]{',.x,'}')),
              .names = "n{x}"))

#> # A tibble: 10 × 13
#>    concepto fecha      codigo bancos   valor n1    n2    n3    n4    n5    n6   
#>    <chr>    <date>     <chr>  <chr>    <dbl> <chr> <chr> <chr> <chr> <chr> <chr>
#>  1 ACTIVO   2008-01-01 10000… BCR  … 1.68e12 1     10    100   1000  10000 1000…
#>  2 ACTIVO   2008-01-01 10000… BNCR … 2.75e12 1     10    100   1000  10000 1000…
#>  3 ACTIVO   2008-01-01 10000… BANHV… 5.78e10 1     10    100   1000  10000 1000…
#>  4 ACTIVO   2008-01-01 10000… POPUL… 1.12e12 1     10    100   1000  10000 1000…
#>  5 ACTIVO   2008-01-01 10000… BANCO… 6.58e11 1     10    100   1000  10000 1000…
#>  6 ACTIVO   2008-01-01 10000… BANCO… 7.12e10 1     10    100   1000  10000 1000…
#>  7 ACTIVO   2008-01-01 10000… Banco… 2.46e10 1     10    100   1000  10000 1000…
#>  8 ACTIVO   2008-01-01 10000… Banco… 4.69e11 1     10    100   1000  10000 1000…
#>  9 ACTIVO   2008-01-01 10000… BANCO… 6.07e 9 1     10    100   1000  10000 1000…
#> 10 ACTIVO   2008-01-01 10000… BANCO… 2.03e11 1     10    100   1000  10000 1000…
#> # … with 2 more variables: n7 <chr>, n8 <chr>

Created on 2022-04-30 by the reprex package (v0.3.0)

  • Related