I have data in untidy form. One of the columns contains combined information -
- section (a, b)
- question no. (1 or 01, 2 or 03, 3 or 03)
- the question (name, address, ...)
I wish to create a separate column for each.
Below is an example with the desired result.
library(tidyverse)
tbl <- tibble(
variable = c("a01_name", "a02_address", "a3_phone_number",
"b1a_total_income", "b01b_no_of_cars")
)
# desired result
desired_tbl <- tibble(
section = c("a", "a", "a", "b", "b"),
question_no = c(1, 2, 3, 1, 1),
sub_question_no = c(NA, NA, NA, "a", "b"),
question = c("name", "address", "phone_number", "total_income", "no_of_cars")
)
CodePudding user response:
We can use extract
library(tidyr)
library(dplyr)
extract(tbl, variable, into = c("section", "question_no", "sub_question_no",
"question"), "^(\\D )(\\d )([a-z] )?_(.*)", convert = TRUE) %>%
na_if("")
-output
# A tibble: 5 × 4
section question_no sub_question_no question
<chr> <int> <chr> <chr>
1 a 1 <NA> name
2 a 2 <NA> address
3 a 3 <NA> phone_number
4 b 1 a total_income
5 b 1 b no_of_cars