Extracting strings by position in R, preferably with the tidyverse

I have a dataset as follows:

My_data <- tibble(ref = 1:3, codes = c(12204, 35478, 67456))

I want to separate the codes column as follows.

The first digit of the codes column forms a new variable, clouds.

The second and third digits of the codes column form a new variable, wind_direction.

The last two digits of the codes column form a new variable, wind_speed.
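The position-to-variable mapping above translates almost literally into stringr::str_sub() calls inside mutate(). This is a minimal sketch that assumes every code has exactly five digits (so no leading zero was lost when the codes were stored as numbers); the helper column code_chr and the name result are illustrative only.

```r
library(dplyr)
library(stringr)
library(tibble)

My_data <- tibble(ref = 1:3, codes = c(12204, 35478, 67456))

result <- My_data %>%
  mutate(
    code_chr       = as.character(codes),      # hypothetical helper: digits as text
    clouds         = str_sub(code_chr, 1, 1),  # digit 1
    wind_direction = str_sub(code_chr, 2, 3),  # digits 2-3
    wind_speed     = str_sub(code_chr, 4, 5)   # digits 4-5
  ) %>%
  select(-code_chr)                            # drop the helper, keep codes

result
```

The original codes column is kept, so the tibble is extended rather than replaced.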

NB: I know that str_match and str_match_all can do this. The problem is that they return a matrix. I want a solution that will extend the tibble to include the three additional variables.

Thank you.
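Regarding the NB about str_match(): the matrix it returns can still be attached to the tibble, because each capture-group column of that matrix is an ordinary character vector. A sketch, assuming the anchored pattern below and using m and result as illustrative names:

```r
library(dplyr)
library(stringr)
library(tibble)

My_data <- tibble(ref = 1:3, codes = c(12204, 35478, 67456))

# str_match() returns a character matrix: column 1 is the full match,
# columns 2-4 hold the three capture groups
m <- str_match(as.character(My_data$codes), "^(\\d)(\\d{2})(\\d{2})$")

# Copy the capture-group columns into the tibble as new variables
result <- My_data %>%
  mutate(
    clouds         = m[, 2],
    wind_direction = m[, 3],
    wind_speed     = m[, 4]
  )

result
```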

Answer:

You can use tidyr::extract() with an appropriate regular expression to do the splitting. Each capture group in the pattern becomes one new column:

library(tidyverse)

My_data %>% 
  mutate(codes = as.character(codes)) %>% 
  # one capture group per new column; r"{...}" is a raw string (R >= 4.0.0)
  extract(codes, c("clouds","wind_direction","wind_speed"), r"{(\d+)(\d{2})(\d{2})}")

#     ref clouds wind_direction wind_speed
#   <int> <chr>  <chr>          <chr>     
# 1     1 1      22             04        
# 2     2 3      54             78        
# 3     3 6      74             56
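Since tidyr 1.3.0, extract() is superseded by separate_wider_regex(), which takes a named vector of patterns instead of capture groups. A sketch of the same split, assuming tidyr >= 1.3.0 is installed (the name result is illustrative):

```r
library(dplyr)
library(tidyr)   # separate_wider_regex() requires tidyr >= 1.3.0
library(tibble)

My_data <- tibble(ref = 1:3, codes = c(12204, 35478, 67456))

result <- My_data %>%
  mutate(codes = as.character(codes)) %>%
  # each named pattern becomes a column; together they must match the whole string
  separate_wider_regex(
    codes,
    patterns = c(clouds = "\\d", wind_direction = "\\d{2}", wind_speed = "\\d{2}")
  )

result
```

By default the source column codes is removed and the new character columns take its place.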

Answer:

Another option is to use successive separate() calls that split into new columns by position (though @MrFlick's answer is more concise and splits everything in a single pass).

library(tidyverse)

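My_data %>%
  # split after the 1st character, e.g. "12204" -> "1" | "2204"
  separate(codes, into = c("clouds", "wind_direction"), sep = 1) %>% 
  # split the remainder after its 2nd character, e.g. "2204" -> "22" | "04"
  separate(wind_direction, into = c("wind_direction", "wind_speed"), sep = 2)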
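With tidyr >= 1.3.0, the two positional separate() calls can be collapsed into one separate_wider_position() call, where the names of the widths vector become the new column names. A sketch under that version assumption (the name result is illustrative):

```r
library(dplyr)
library(tidyr)   # separate_wider_position() requires tidyr >= 1.3.0
library(tibble)

My_data <- tibble(ref = 1:3, codes = c(12204, 35478, 67456))

result <- My_data %>%
  mutate(codes = as.character(codes)) %>%
  # widths are character counts: 1 for clouds, 2 + 2 for the wind fields
  separate_wider_position(
    codes,
    widths = c(clouds = 1, wind_direction = 2, wind_speed = 2)
  )

result
```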

Alternatively, insert a separator between the digit groups and then call separate() once:

My_data %>%
  # insert "_" after the 1st and 3rd characters, e.g. "12204" -> "1_22_04"
  mutate(codes = str_replace_all(codes, '^(.{1})(.{2})(.*)$', '\\1_\\2_\\3')) %>% 
  separate(codes, c("clouds","wind_direction","wind_speed"), sep = "_")

Output

    ref clouds wind_direction wind_speed
  <int> <chr>  <chr>          <chr>     
1     1 1      22             04        
2     2 3      54             78        
3     3 6      74             56
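All of the approaches above treat the codes as text and return character columns. If numeric variables are wanted instead, integer division and the modulo operator can pick out the digit groups directly. This sketch assumes each code is a non-negative number with at most five digits; note that "04" then becomes 4, and a code whose leading zero was dropped on import (for example 02204 stored as 2204) still splits as 0, 22, 4, which the string-based approaches would not handle correctly.

```r
library(dplyr)
library(tibble)

My_data <- tibble(ref = 1:3, codes = c(12204, 35478, 67456))

result <- My_data %>%
  mutate(
    clouds         = codes %/% 10000,         # digit 1
    wind_direction = (codes %/% 100) %% 100,  # digits 2-3
    wind_speed     = codes %% 100             # digits 4-5
  )

result
```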