Use range of addresses in column values to create new rows with single addresses in R

Given the following data:

data <- data.frame("Street" = c("Example1", "Example2", "Example3", "Example4"),
                   "Number" = c("10-20a", "4b-8", NA, "14-16"))

    Street Number
1 Example1 10-20a
2 Example2   4b-8
3 Example3   <NA>
4 Example4  14-16

How can I create a data frame with each individual address that looks like this:

data_long <- data.frame("Street" = c(rep("Example1", 11), rep("Example2", 5),
                                     "Example3", rep("Example4", 3)),
                        "Number" = c(10:19,"20a", "4b", 5:8, NA, c(14:16)))

     Street Number
1  Example1     10
2  Example1     11
3  Example1     12
4  Example1     13
5  Example1     14
6  Example1     15
7  Example1     16
8  Example1     17
9  Example1     18
10 Example1     19
11 Example1    20a
12 Example2     4b
13 Example2      5
14 Example2      6
15 Example2      7
16 Example2      8
17 Example3   <NA>
18 Example4     14
19 Example4     15
20 Example4     16

I tried splitting the numbers and the letters and then using rowwise and mutate, but it did not work because the letter then ended up next to every address.

CodePudding user response：

We filter the non-NA rows and, in mutate, extract one or more digits (\d+) followed by a letter ([a-z]) into a 'v1' column. We then loop over the start and end numbers on either side of the - with map2 and build the sequence (:) as a list column. Finally we unnest (from tidyr) the list column to expand the data, restore the lettered value in 'Number' from the 'v1' column with case_when, and bind the NA rows back on.

library(dplyr)
library(stringr)
library(tidyr)
library(purrr)
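data %>%
  # keep only the rows that have a range to expand
  filter(complete.cases(Number)) %>%
  # v1: the endpoint that carries a letter suffix, e.g. "20a" or "4b"
  mutate(v1 = str_extract(Number, "\\d+[a-z]"),
         # sequence from the first number to the number after the "-"
         Number = map2(readr::parse_number(Number),
                       as.numeric(str_extract(Number, "-(\\d+)", group = 1)), `:`)) %>%
  unnest(Number) %>%
  # restore the lettered endpoint where v1 contains the expanded number
  mutate(Number = case_when(str_detect(v1, as.character(Number)) ~ v1,
                            TRUE ~ as.character(Number)),
         v1 = NULL) %>%
  # add the NA rows back and restore the street order
  bind_rows(data %>% filter(is.na(Number))) %>%
  arrange(Street)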

-output

# A tibble: 20 × 2
   Street   Number
   <chr>    <chr> 
 1 Example1 10    
 2 Example1 11    
 3 Example1 12    
 4 Example1 13    
 5 Example1 14    
 6 Example1 15    
 7 Example1 16    
 8 Example1 17    
 9 Example1 18    
10 Example1 19    
11 Example1 20a   
12 Example2 4b    
13 Example2 5     
14 Example2 6     
15 Example2 7     
16 Example2 8     
17 Example3 <NA>  
18 Example4 14    
19 Example4 15    
20 Example4 16
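
For comparison, the same extract, expand, and restore steps can be sketched in base R without any packages. This is only a rough sketch: the helper name expand_range is made up for illustration, and it assumes every non-NA value is either a single number or a "from-to" range whose ends may carry a letter suffix.

```r
data <- data.frame(Street = c("Example1", "Example2", "Example3", "Example4"),
                   Number = c("10-20a", "4b-8", NA, "14-16"),
                   stringsAsFactors = FALSE)

# Hypothetical helper: expand one "from-to" string into individual numbers.
expand_range <- function(x) {
  if (is.na(x)) return(NA_character_)
  parts <- strsplit(x, "-", fixed = TRUE)[[1]]   # e.g. c("10", "20a")
  nums  <- as.integer(gsub("\\D", "", parts))    # digits of each endpoint
  out   <- as.character(seq(nums[1], nums[length(nums)]))
  # Put the original endpoints (with any letter suffix) back in place.
  out[1] <- parts[1]
  out[length(out)] <- parts[length(parts)]
  out
}

expanded  <- lapply(data$Number, expand_range)
data_long <- data.frame(Street = rep(data$Street, lengths(expanded)),
                        Number = unlist(expanded),
                        stringsAsFactors = FALSE)
```

Because the endpoints are written back by position rather than matched by pattern, the letter only ever lands on the first or last address of a range.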

CodePudding user response：

Here is another approach, written with the help of @akrun (see: How to expand rows and fill in the numbers between given start and end):

library(dplyr)
library(tidyr)
library(readr)

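data %>%
  # one row per endpoint of each range
  separate_rows(Number, sep = "-") %>%
  group_by(Street) %>%
  # numeric part of each endpoint; NA becomes 0 so full_seq() has a value
  mutate(Number1 = ifelse(is.na(Number), 0, parse_number(Number))) %>%
  # fill in every integer between the endpoints within each street
  tidyr::complete(Number1 = full_seq(Number1, period = 1)) %>%
  # keep the original label (e.g. "20a") where present, else the filled number
  mutate(Number = coalesce(Number, as.character(Number1)), .keep = "unused")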

   Street   Number
   <chr>    <chr> 
 1 Example1 10    
 2 Example1 11    
 3 Example1 12    
 4 Example1 13    
 5 Example1 14    
 6 Example1 15    
 7 Example1 16    
 8 Example1 17    
 9 Example1 18    
10 Example1 19    
11 Example1 20a   
12 Example2 4b    
13 Example2 5     
14 Example2 6     
15 Example2 7     
16 Example2 8     
17 Example3 0     
18 Example4 14    
19 Example4 15    
20 Example4 16
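
Note that in the output above Example3 comes out as 0 rather than <NA>, because the NA is replaced by 0 before full_seq() is called. One possible variant, assuming the NA rows can simply be set aside and bound back on as in the first answer (the names with_na and result are just for illustration), might look like this:

```r
library(dplyr)
library(tidyr)
library(readr)

data <- data.frame(Street = c("Example1", "Example2", "Example3", "Example4"),
                   Number = c("10-20a", "4b-8", NA, "14-16"),
                   stringsAsFactors = FALSE)

# Set the rows without a number aside so they are never turned into 0.
with_na <- filter(data, is.na(Number))

result <- data %>%
  filter(!is.na(Number)) %>%
  separate_rows(Number, sep = "-") %>%
  group_by(Street) %>%
  mutate(Number1 = parse_number(Number)) %>%
  complete(Number1 = full_seq(Number1, period = 1)) %>%
  mutate(Number = coalesce(Number, as.character(Number1)), .keep = "unused") %>%
  ungroup() %>%
  # bind the untouched NA rows back on and restore the street order
  bind_rows(with_na) %>%
  arrange(Street)
```

This is a sketch rather than a drop-in replacement: it has the same limits as the original (one range per row, numbers in ascending order).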