I have a dataset as follows:
My_data <- tibble(ref = 1:3, codes = c(12204, 35478, 67456))
I want to separate the codes column as follows.
The first digit of the codes column forms a new variable clouds.
The second and third digits of the codes column form a new variable wind_direction.
The last two digits of the codes column form a new variable wind_speed.
NB: I know that str_match and str_match_all can do this. The problem is that they return a matrix. I want a solution that will extend the tibble to include the three additional variables.
Thank you.
CodePudding user response:
You can use the tidyr::extract function with an appropriate regular expression to do the splitting:
library(dplyr)
library(tidyr)
My_data %>%
  mutate(codes = as.character(codes)) %>%
  extract(codes, c("clouds", "wind_direction", "wind_speed"), r"{(\d)(\d{2})(\d{2})}")
# ref clouds wind_direction wind_speed
# <int> <chr> <chr> <chr>
# 1 1 1 22 04
# 2 2 3 54 78
# 3 3 6 74 56
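If integer columns are preferred over character ones, a minimal sketch (assuming the same My_data as in the question, and that losing leading zeros is acceptable) is to pass convert = TRUE to extract, which applies type conversion to the new columns. Note that a wind_speed of 04 then becomes 4.

```r
library(dplyr)
library(tidyr)

# Same data as in the question
My_data <- tibble(ref = 1:3, codes = c(12204, 35478, 67456))

result <- My_data %>%
  mutate(codes = as.character(codes)) %>%
  # convert = TRUE turns the captured strings into integers where possible
  tidyr::extract(codes, c("clouds", "wind_direction", "wind_speed"),
                 "(\\d)(\\d{2})(\\d{2})", convert = TRUE)

print(result)
```

The explicit tidyr:: prefix is only there to avoid a clash with magrittr::extract if magrittr happens to be attached.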
CodePudding user response:
Another option would be to use successive separate statements to split the column into new columns based on position (but @MrFlick's is a lot more efficient).
library(tidyverse)
My_data %>%
  separate(codes, into = c("clouds", "wind_direction"), sep = 1) %>%
  separate(wind_direction, into = c("wind_direction", "wind_speed"), sep = 2)
Or we could insert a separator between the numbers, and then again use separate:
My_data %>%
  mutate(codes = str_replace_all(codes, '^(.{1})(.{2})(.*)$', '\\1_\\2_\\3')) %>%
  separate(codes, c("clouds", "wind_direction", "wind_speed"), sep = "_")
Output
ref clouds wind_direction wind_speed
<int> <chr> <chr> <chr>
1 1 1 22 04
2 2 3 54 78
3 3 6 74 56
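As a further variation on the position-based approach, newer versions of tidyr (1.3.0 and later, as far as I know) provide separate_wider_position, which takes all the field widths in one call. This is only a sketch under the assumption that such a version is installed and that every code has exactly five digits; the function expects a character column, so codes is converted first.

```r
library(dplyr)
library(tidyr)

# Same data as in the question
My_data <- tibble(ref = 1:3, codes = c(12204, 35478, 67456))

result <- My_data %>%
  mutate(codes = as.character(codes)) %>%
  # Named widths give both the new column names and how many characters each takes
  separate_wider_position(
    codes,
    widths = c(clouds = 1, wind_direction = 2, wind_speed = 2)
  )

print(result)
```

The new columns stay as character here, so leading zeros such as 04 are preserved.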