Use range of addresses in column values to create new rows with single addresses in R

Time:01-19

Given the following data:

data <- data.frame("Street" = c("Example1", "Example2", "Example3", "Example4"),
                   "Number" = c("10-20a", "4b-8", NA, "14-16"))
    Street Number
1 Example1 10-20a
2 Example2   4b-8
3 Example3   <NA>
4 Example4  14-16

How can I create a data frame with each individual address that looks like this:

data_long <- data.frame("Street" = c(rep("Example1", 11), rep("Example2", 5),
                                     "Example3", rep("Example4", 3)),
                        "Number" = c(10:19,"20a", "4b", 5:8, NA, c(14:16)))
     Street Number
1  Example1     10
2  Example1     11
3  Example1     12
4  Example1     13
5  Example1     14
6  Example1     15
7  Example1     16
8  Example1     17
9  Example1     18
10 Example1     19
11 Example1    20a
12 Example2     4b
13 Example2      5
14 Example2      6
15 Example2      7
16 Example2      8
17 Example3   <NA>
18 Example4     14
19 Example4     15
20 Example4     16

I tried splitting the numbers and the letters and then using rowwise and mutate, but it did not work because the letter then ended up next to every address.

CodePudding user response:

We filter the non-NA elements and, in mutate, extract one or more digits (\d+) followed by a letter ([a-z]) as the 'v1' column. We then loop with map2 over the numbers on either side of the - and build the sequence (:) as a list column, and unnest (from tidyr) that list column to expand the data. Finally, case_when replaces the value in 'Number' with 'v1' where the two match, and the result is bound with the NA rows.
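To see what each extraction step contributes before it goes into the pipeline, the calls can be run on a plain character vector. This is only a sketch: the vector `x` and the names `suffixed`, `starts` and `ends` are made up for illustration, and the `group` argument of `str_extract` needs a recent stringr (1.5.0 or later, as far as I know).

```r
library(stringr)
library(readr)

# Hypothetical sample input: the non-NA values from the question
x <- c("10-20a", "4b-8", "14-16")

# digits followed by a lowercase letter, NA when there is no such token
suffixed <- str_extract(x, "\\d+[a-z]")

# first number in each string (start of the range)
starts <- parse_number(x)

# digits right after the "-" (end of the range), via the capture group
ends <- as.numeric(str_extract(x, "-(\\d+)", group = 1))

print(suffixed)
print(starts)
print(ends)
```

`starts` and `ends` are the two vectors that map2 walks over in parallel, and `suffixed` is what ends up in the 'v1' column.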

library(dplyr)
library(stringr)
library(tidyr)
library(purrr)
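data %>% 
  # expand only the rows that actually have a number
  filter(complete.cases(Number)) %>% 
  # v1: the digits-plus-letter token, e.g. "20a" or "4b" (NA if there is none)
  mutate(v1 = str_extract(Number, "\\d+[a-z]"),
  # list column: sequence from the first number to the number after the "-"
  Number = map2(readr::parse_number(Number), 
   as.numeric(str_extract(Number, "-(\\d+)", group = 1)), `:`)) %>% 
  unnest(Number) %>% 
  # put the lettered token back where its digits match, then drop v1
  mutate(Number = case_when(str_detect(v1, as.character(Number)) 
   ~ v1, 
  TRUE ~ as.character(Number)), v1 = NULL) %>% 
  # add the NA rows back and restore the street order
  bind_rows(data %>%
    filter(is.na(Number))) %>% 
    arrange(Street)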

-output

# A tibble: 20 × 2
   Street   Number
   <chr>    <chr> 
 1 Example1 10    
 2 Example1 11    
 3 Example1 12    
 4 Example1 13    
 5 Example1 14    
 6 Example1 15    
 7 Example1 16    
 8 Example1 17    
 9 Example1 18    
10 Example1 19    
11 Example1 20a   
12 Example2 4b    
13 Example2 5     
14 Example2 6     
15 Example2 7     
16 Example2 8     
17 Example3 <NA>  
18 Example4 14    
19 Example4 15    
20 Example4 16    
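One thing to watch with the str_detect step: it is a substring match, so it works for the sample data but can mislabel rows when a smaller number in the range happens to occur inside the lettered token. With a range such as "1-12a", the rows for 1, 2 and 12 would all match "12a". A possible variant, sketched here under the assumption that the token always has the form digits plus one letter, compares the numeric part of 'v1' with the number instead. The data frame `tricky` and the street "Example5" are hypothetical.

```r
library(dplyr)
library(stringr)
library(tidyr)
library(purrr)

# Hypothetical input where substring matching would go wrong
tricky <- data.frame(Street = "Example5", Number = "1-12a")

expanded <- tricky %>%
  mutate(v1 = str_extract(Number, "\\d+[a-z]"),
         Number = map2(readr::parse_number(Number),
                       as.numeric(str_extract(Number, "-(\\d+)", group = 1)),
                       `:`)) %>%
  unnest(Number) %>%
  # compare numerically instead of using str_detect(v1, as.character(Number))
  mutate(Number = case_when(readr::parse_number(v1) == Number ~ v1,
                            TRUE ~ as.character(Number)),
         v1 = NULL)

print(expanded$Number)
```

Only the row whose number equals the numeric part of the token gets the letter back; rows where 'v1' is NA still fall through to the TRUE branch.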

CodePudding user response:

Here is another approach that was generated with the help of @akrun (here: How to expand rows and fill in the numbers between given start and end):

library(dplyr)
library(tidyr)
library(readr)

data %>% 
  separate_rows(Number, sep = "-") %>% 
  group_by(Street) %>% 
  mutate(Number1 = ifelse(is.na(Number), 0, parse_number(Number))) %>% 
  tidyr::complete(Number1 = full_seq(Number1, period = 1)) %>% 
  mutate(Number = coalesce(Number, as.character(Number1)), .keep="unused")
   Street   Number
   <chr>    <chr> 
 1 Example1 10    
 2 Example1 11    
 3 Example1 12    
 4 Example1 13    
 5 Example1 14    
 6 Example1 15    
 7 Example1 16    
 8 Example1 17    
 9 Example1 18    
10 Example1 19    
11 Example1 20a   
12 Example2 4b    
13 Example2 5     
14 Example2 6     
15 Example2 7     
16 Example2 8     
17 Example3 0     
18 Example4 14    
19 Example4 15    
20 Example4 16 
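Note that in this output the Example3 row shows 0 rather than NA, because the missing value is replaced by 0 before the sequence is completed. If the NA should be kept, as in the expected output of the question, one possible follow-up is to set it back afterwards. This is a sketch that assumes every street appears only once in the original data; `na_streets` and `result` are names introduced just for this example.

```r
library(dplyr)
library(tidyr)
library(readr)

data <- data.frame(Street = c("Example1", "Example2", "Example3", "Example4"),
                   Number = c("10-20a", "4b-8", NA, "14-16"))

# streets whose number was missing in the original data
na_streets <- data$Street[is.na(data$Number)]

result <- data %>%
  separate_rows(Number, sep = "-") %>%
  group_by(Street) %>%
  mutate(Number1 = ifelse(is.na(Number), 0, parse_number(Number))) %>%
  complete(Number1 = full_seq(Number1, period = 1)) %>%
  mutate(Number = coalesce(Number, as.character(Number1)), .keep = "unused") %>%
  ungroup() %>%
  # restore NA for streets that had no number to begin with
  mutate(Number = if_else(Street %in% na_streets, NA_character_, Number))

print(result, n = 20)
```

The ungroup() call also drops the grouping that is otherwise left on the result.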