How to divide a data factor into subsets in R

Time:12-03

I have a factor column which looks like this

id               
2000.1.ABC0123
2010.11.BCD3652   

The format is "year", "month", and "identifier". I simply want to split these into two columns, with single-digit months padded with a leading "0", as follows.

Identifier     yearmonth    
ABC0123        200001 
BCD3652        201011  

I played around with "paste0" and "substr", but could not get it to work.

Any help is very much appreciated.

Thanks and best, D.

CodePudding user response:

A combination of strsplit and sprintf should give you the output you need.

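x = unlist(strsplit('2000.1.ABC0123', split='\\.'))  # split on literal dots
y = as.numeric(x[1:2])                               # year and month as numbers
sprintf('%d%02d', y[1], y[2])                        # yearmonth, month zero-padded to two digits
x[3]                                                 # identifier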
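
The snippet above handles a single string. Here is a minimal sketch of the same strsplit and sprintf idea applied to a whole factor column. The helper name split_id and the sample data frame df are made up for illustration and are not from the original answer.

```r
# Hypothetical sample data; the id column is a factor, as in the question.
df <- data.frame(id = factor(c("2000.1.ABC0123", "2010.11.BCD3652")))

# split_id is an illustrative helper name, not part of any package.
split_id <- function(id) {
  # Convert the factor to character before splitting on literal dots.
  parts <- strsplit(as.character(id), ".", fixed = TRUE)
  year  <- as.integer(vapply(parts, `[`, character(1), 1))
  month <- as.integer(vapply(parts, `[`, character(1), 2))
  data.frame(
    Identifier = vapply(parts, `[`, character(1), 3),
    # %02d pads single-digit months with a leading zero.
    yearmonth  = sprintf("%d%02d", year, month),
    stringsAsFactors = FALSE
  )
}

result <- split_id(df$id)
```

Calling as.character first matters because strsplit does not accept a factor directly.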

CodePudding user response:

In Base R:

read.table(text=as.character(df$id), sep='.', col.names = c('year', 'month', 'Identifier')) |>
  transform(yearmonth = sprintf("%d%02d", year, month))

  year month Identifier yearmonth
1 2000     1    ABC0123    200001
2 2010    11    BCD3652    201011

Tidyverse:

df %>%
  separate(id, c('year', 'month', 'Identifier'), convert = TRUE) %>%
  mutate(month = sprintf('%02d', month)) %>%
  unite('yearmonth', year, month, sep='')

  yearmonth Identifier
  <chr>     <chr>     
1 200001    ABC0123   
2 201011    BCD3652

CodePudding user response:

library(data.table)
DT <- fread("id               
2000.1.ABC0123
2010.11.BCD3652 ")

DT[, c("year", "month", "Identifier") := tstrsplit(id, ".", fixed = TRUE)]
DT[, yearmonth := paste0(year, sprintf("%02d", as.numeric(month)))]
#                 id year month Identifier yearmonth
# 1:  2000.1.ABC0123 2000     1    ABC0123    200001
# 2: 2010.11.BCD3652 2010    11    BCD3652    201011

CodePudding user response:

Here is an example using stringr and dplyr

library(tidyverse)

df_example <- tribble(~id,
                      '2000.1.ABC0123',
                      '2010.11.BCD3652')


df_example |> 
  mutate(split_cols = str_split(id,pattern = "\\.(?![:digit:])"),
         yearmonth  = split_cols |> map_chr(pluck(1)) |>  str_remove('\\.'),
         Identifier =  split_cols |> map_chr(pluck(2))
         )
#> # A tibble: 2 x 4
#>   id              split_cols yearmonth Identifier
#>   <chr>           <list>     <chr>     <chr>     
#> 1 2000.1.ABC0123  <chr [2]>  20001     ABC0123   
#> 2 2010.11.BCD3652 <chr [2]>  201011    BCD3652

Created on 2021-12-02 by the reprex package (v2.0.1)
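
Note that the output above gives 20001 for the first row, so the month is not zero-padded as the question asks. One possible adjustment is sketched below with stringr alone. It is a variation on the answer, not the answerer's code, and the vector name id is only for illustration.

```r
library(stringr)

# Illustrative input, matching the ids used in the question.
id <- c("2000.1.ABC0123", "2010.11.BCD3652")

# Split each id into exactly three pieces on literal dots.
parts <- str_split_fixed(id, fixed("."), n = 3)

# str_pad left-pads the month to width 2 with "0" before joining it to the year.
yearmonth  <- str_c(parts[, 1], str_pad(parts[, 2], width = 2, pad = "0"))
Identifier <- parts[, 3]
```

Padding the month as a string avoids converting it to a number first, which is the step sprintf('%02d', ...) would need.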
