Given the following data:
data <- data.frame("Street" = c("Example1", "Example2", "Example3", "Example4"),
"Number" = c("10-20a", "4b-8", NA, "14-16"))
Street Number
1 Example1 10-20a
2 Example2 4b-8
3 Example3 <NA>
4 Example4 14-16
How can I create a data frame with each individual address that looks like this:
data_long <- data.frame("Street" = c(rep("Example1", 11), rep("Example2", 5),
"Example3", rep("Example4", 3)),
"Number" = c(10:19,"20a", "4b", 5:8, NA, c(14:16)))
Street Number
1 Example1 10
2 Example1 11
3 Example1 12
4 Example1 13
5 Example1 14
6 Example1 15
7 Example1 16
8 Example1 17
9 Example1 18
10 Example1 19
11 Example1 20a
12 Example2 4b
13 Example2 5
14 Example2 6
15 Example2 7
16 Example2 8
17 Example3 <NA>
18 Example4 14
19 Example4 15
20 Example4 16
I tried splitting the numbers and the letters and then using rowwise and mutate, but it did not work because the letter then ended up next to every address.
CodePudding user response:
We filter the non-NA elements and, in mutate, extract one or more digits (\d+) followed by a letter ([a-z]) as the 'v1' column. Then we loop over the start number and the number extracted after the - using map2, get the sequence (:) as a list column, and unnest (from tidyr) the list column to expand the data. Finally, we replace the value in 'Number' with the 'v1' value where they match in case_when, and bind the result with the NA rows.
library(dplyr)
library(stringr)
library(tidyr)
library(purrr)
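data %>%
  # expand only the rows that have a number range
  filter(complete.cases(Number)) %>%
  # v1: the house number with a letter suffix (e.g. "20a"), NA if none;
  # Number: integer sequence from the range start to the range end
  mutate(v1 = str_extract(Number, "\\d+[a-z]"),
         Number = map2(readr::parse_number(Number),
                       as.numeric(str_extract(Number, "-(\\d+)", group = 1)), `:`)) %>%
  unnest(Number) %>%
  # put the letter suffix back on the matching number
  mutate(Number = case_when(str_detect(v1, as.character(Number)) ~ v1,
                            TRUE ~ as.character(Number)), v1 = NULL) %>%
  # add the NA rows back
  bind_rows(data %>%
              filter(is.na(Number))) %>%
  arrange(Street)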
-output
# A tibble: 20 × 2
Street Number
<chr> <chr>
1 Example1 10
2 Example1 11
3 Example1 12
4 Example1 13
5 Example1 14
6 Example1 15
7 Example1 16
8 Example1 17
9 Example1 18
10 Example1 19
11 Example1 20a
12 Example2 4b
13 Example2 5
14 Example2 6
15 Example2 7
16 Example2 8
17 Example3 <NA>
18 Example4 14
19 Example4 15
20 Example4 16
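To make the extraction steps in the pipeline above easier to follow, here is a minimal sketch that runs them on the three non-NA strings outside of a data frame. The names x, suffix, start, end and ranges are only illustrative, and the group argument of str_extract assumes a recent stringr (1.5.0 or later, as far as I know).

```r
library(stringr)

# the non-NA values of the Number column from the question
x <- c("10-20a", "4b-8", "14-16")

# house number that carries a letter suffix, NA where there is none
suffix <- str_extract(x, "\\d+[a-z]")

# first number in each string (start of the range)
start <- readr::parse_number(x)

# digits after the hyphen (end of the range)
end <- as.numeric(str_extract(x, "-(\\d+)", group = 1))

# one integer sequence per input string
ranges <- Map(`:`, start, end)
```

Each element of ranges is what unnest later turns into separate rows, and suffix is what case_when uses to put "20a" and "4b" back in place of 20 and 4.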
CodePudding user response:
Here is another approach, worked out with the help of @akrun (see: How to expand rows and fill in the numbers between given start and end). Note that the NA for Example3 comes out as 0 with this code:
library(dplyr)
library(tidyr)
library(readr)
data %>%
separate_rows(Number, sep = "-") %>%
group_by(Street) %>%
mutate(Number1 = ifelse(is.na(Number), 0, parse_number(Number))) %>%
tidyr::complete(Number1 = full_seq(Number1, period = 1)) %>%
mutate(Number = coalesce(Number, as.character(Number1)), .keep="unused")
Street Number
<chr> <chr>
1 Example1 10
2 Example1 11
3 Example1 12
4 Example1 13
5 Example1 14
6 Example1 15
7 Example1 16
8 Example1 17
9 Example1 18
10 Example1 19
11 Example1 20a
12 Example2 4b
13 Example2 5
14 Example2 6
15 Example2 7
16 Example2 8
17 Example3 0
18 Example4 14
19 Example4 15
20 Example4 16
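In the output above, Example3 ends up as 0 instead of NA, because 0 is used as a placeholder so that full_seq has a number to work with. One possible way to keep the NA, sketched under the assumption that a helper column (here called missing, a name made up for this example) is acceptable, is to remember which rows were NA before expanding and restore them at the end:

```r
library(dplyr)
library(tidyr)
library(readr)

data <- data.frame(Street = c("Example1", "Example2", "Example3", "Example4"),
                   Number = c("10-20a", "4b-8", NA, "14-16"))

result <- data %>%
  # remember which rows had no number at all (hypothetical helper column)
  mutate(missing = is.na(Number)) %>%
  separate_rows(Number, sep = "-") %>%
  group_by(Street, missing) %>%
  # 0 is only a placeholder so that full_seq() gets a non-NA value
  mutate(Number1 = ifelse(is.na(Number), 0, parse_number(Number))) %>%
  complete(Number1 = full_seq(Number1, period = 1)) %>%
  ungroup() %>%
  # keep the original strings ("20a", "4b"), fill the gaps, restore the NA
  mutate(Number = if_else(missing, NA_character_,
                          coalesce(Number, as.character(Number1)))) %>%
  select(Street, Number)
```

Using a flag rather than turning "0" back into NA afterwards avoids touching a street that really has a house number 0, though for this data either would do.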