(Row)binding vectors with different lengths by pattern


My question seems to be a mixture of the following two threads:

Combine two data frames by rows (rbind) when they have different sets of columns

rbind data frames based on a common pattern in data frame name

I want to combine (by adding rows) the content of several vectors with different lengths based on the pattern of the vector name. Example data set:

million_cities_USA <- c("New York", "Los Angeles", "Chicago", "Houston", "Phoenix", "Philadelphia", "San Antonio", "San Diego", "Dallas")
million_cities_UK <- c("London", "Birmingham")
million_cities_Canada <- c("Toronto", "Montreal", "Calgary", "Ottawa", "Edmonton")

When I combine the vectors with rbind (see second answer in the second link), R starts recycling the shorter vectors:

as.data.frame(do.call(rbind, mget(ls(pattern="million_cities_*"))))

                            V1          V2      V3         V4       V5           V6          V7         V8     V9
million_cities_Canada  Toronto    Montreal Calgary     Ottawa Edmonton      Toronto    Montreal    Calgary Ottawa
million_cities_UK       London  Birmingham  London Birmingham   London   Birmingham      London Birmingham London
million_cities_USA    New York Los Angeles Chicago    Houston  Phoenix Philadelphia San Antonio  San Diego Dallas

Regarding the first link, I suppose that this can be avoided with dplyr's bind_rows. However, I couldn't combine the vectors at all with bind_rows. The error message indicated that bind_rows needs vectors with identical lengths to work:

as.data.frame(mget(ls(pattern="million_cities_*")) %>% bind_rows())

Error: Argument 2 must be length 5, not 2

How can I combine all vectors with the same name pattern by row and leave the missing columns of the shorter vectors empty instead of inserting the vector elements again?

CodePudding user response:

You can create a list column in a data frame and then unnest it.


library(tibble)  # tibble()
library(tidyr)   # unnest_wider(), %>%

l <- mget(ls(pattern="million_cities_*"))

tibble(cities = l) %>% 
  unnest_wider(cities)

The main annoyance is the message about "new names" that you will get for each row.

# A tibble: 3 x 9
  ...1     ...2        ...3    ...4    ...5     ...6         ...7        ...8      ...9  
  <chr>    <chr>       <chr>   <chr>   <chr>    <chr>        <chr>       <chr>     <chr> 
1 Toronto  Montreal    Calgary Ottawa  Edmonton NA           NA          NA        NA    
2 London   Birmingham  NA      NA      NA       NA           NA          NA        NA    
3 New York Los Angeles Chicago Houston Phoenix  Philadelphia San Antonio San Diego Dallas

You can avoid that by jumping to purrr so that you can create named tibble rows.


map_dfr(l, ~ as_tibble_row(set_names(.x, seq.int(.x))))
# A tibble: 3 x 9
  `1`      `2`         `3`     `4`     `5`      `6`          `7`         `8`       `9`   
  <chr>    <chr>       <chr>   <chr>   <chr>    <chr>        <chr>       <chr>     <chr> 
1 Toronto  Montreal    Calgary Ottawa  Edmonton NA           NA          NA        NA    
2 London   Birmingham  NA      NA      NA       NA           NA          NA        NA    
3 New York Los Angeles Chicago Houston Phoenix  Philadelphia San Antonio San Diego Dallas
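
If it matters which vector each row came from, one possible extension (a sketch, not part of the answer above) is to pass `.id` to `map_dfr()` so that the list names end up in their own column. The column name `source` is an arbitrary choice, and the anchored pattern `"^million_cities_"` is an assumption, used here because `ls()` treats `pattern` as a regular expression rather than a glob. Note that `map_dfr()` needs dplyr installed to bind the rows.

```r
library(purrr)
library(tibble)

million_cities_USA <- c("New York", "Los Angeles", "Chicago", "Houston", "Phoenix",
                        "Philadelphia", "San Antonio", "San Diego", "Dallas")
million_cities_UK <- c("London", "Birmingham")
million_cities_Canada <- c("Toronto", "Montreal", "Calgary", "Ottawa", "Edmonton")

# Named list of all vectors whose names start with "million_cities_"
l <- mget(ls(pattern = "^million_cities_"))

# One named tibble row per vector; .id stores the list names in a column
# ("source" is a hypothetical column name chosen for this sketch)
out <- map_dfr(l, ~ as_tibble_row(set_names(.x, seq_along(.x))), .id = "source")
```

The result has one extra leading column holding the vector names, followed by the columns `1` to `9` as before.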

CodePudding user response:

Here is one solution: transpose each vector with t(), convert the result to a one-row data frame with as.data.frame(), and combine the rows with bind_rows(). Shorter rows are then filled with NA instead of being recycled.


library(dplyr)

as.data.frame(t(million_cities_USA)) %>% 
  bind_rows(as.data.frame(t(million_cities_UK))) %>% 
  bind_rows(as.data.frame(t(million_cities_Canada)))
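
To avoid typing one bind_rows() call per vector, the same transpose idea can probably be applied to the whole list returned by mget(). The following is a minimal sketch under that assumption; the names `l`, `rows` and `out` are only illustrative.

```r
library(dplyr)

million_cities_USA <- c("New York", "Los Angeles", "Chicago", "Houston", "Phoenix",
                        "Philadelphia", "San Antonio", "San Diego", "Dallas")
million_cities_UK <- c("London", "Birmingham")
million_cities_Canada <- c("Toronto", "Montreal", "Calgary", "Ottawa", "Edmonton")

# Collect the vectors by name pattern (pattern is a regular expression)
l <- mget(ls(pattern = "^million_cities_"))

# One single-row data frame per vector; columns are auto-named V1, V2, ...
rows <- lapply(l, function(v) as.data.frame(t(v)))

# bind_rows() matches columns by name and fills missing ones with NA
out <- bind_rows(rows)
```

Because bind_rows() accepts a list of data frames, this scales to any number of matching vectors without changing the code.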

CodePudding user response:

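Using base R: pad every vector with NA up to the length of the longest one, then row-bind the padded vectors.

# Collect all vectors whose names match the pattern into a named list
lst1 <- mget(ls(pattern = 'million_cities'))
# Length of the longest vector
mx <- max(lengths(lst1))
# `length<-` extends each vector to mx elements, filling with NA
out <- do.call(rbind.data.frame, lapply(lst1, `length<-`, mx))
names(out) <- paste0("col", seq_along(out))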


> out
      col1        col2    col3    col4     col5         col6        col7      col8   col9
1  Toronto    Montreal Calgary  Ottawa Edmonton         <NA>        <NA>      <NA>   <NA>
2   London  Birmingham    <NA>    <NA>     <NA>         <NA>        <NA>      <NA>   <NA>
3 New York Los Angeles Chicago Houston  Phoenix Philadelphia San Antonio San Diego Dallas
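
A possible variation on the same padding idea, sketched here as an alternative and not taken from the answer above, is to let sapply() build a matrix and transpose it. This keeps the vector names as row names, which may be handy if the origin of each row matters. The names `mat` and `out2` are only illustrative.

```r
million_cities_USA <- c("New York", "Los Angeles", "Chicago", "Houston", "Phoenix",
                        "Philadelphia", "San Antonio", "San Diego", "Dallas")
million_cities_UK <- c("London", "Birmingham")
million_cities_Canada <- c("Toronto", "Montreal", "Calgary", "Ottawa", "Edmonton")

lst1 <- mget(ls(pattern = "^million_cities_"))
mx <- max(lengths(lst1))

# sapply() returns one column per (padded) vector; t() turns those into rows
mat <- t(sapply(lst1, `length<-`, mx))

# Optional: convert to a data frame; row names are the original vector names
out2 <- as.data.frame(mat)
names(out2) <- paste0("col", seq_along(out2))
```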