(Row)binding vectors with different lengths by pattern

My question seems to be a mixture of the following two threads:

Combine two data frames by rows (rbind) when they have different sets of columns

rbind data frames based on a common pattern in data frame name

I want to combine (by adding rows) the content of several vectors with different lengths based on the pattern of the vector name. Example data set:

million_cities_USA <- c("New York", "Los Angeles", "Chicago", "Houston", "Phoenix", "Philadelphia", "San Antonio", "San Diego", "Dallas")
million_cities_UK <- c("London", "Birmingham")
million_cities_Canada <- c("Toronto", "Montreal", "Calgary", "Ottawa", "Edmonton")

When I combine the vectors with rbind (see second answer in the second link), R starts recycling the shorter vectors:

as.data.frame(do.call(rbind, mget(ls(pattern="million_cities_*"))))

                            V1          V2      V3         V4       V5           V6          V7         V8     V9
million_cities_Canada  Toronto    Montreal Calgary     Ottawa Edmonton      Toronto    Montreal    Calgary Ottawa
million_cities_UK       London  Birmingham  London Birmingham   London   Birmingham      London Birmingham London
million_cities_USA    New York Los Angeles Chicago    Houston  Phoenix Philadelphia San Antonio  San Diego Dallas

Regarding the first link, I suppose that this can be avoided with dplyr's bind_rows. However, I couldn't combine the vectors at all with bind_rows. The error message indicated that bind_rows needs vectors with identical lengths to work:

library(dplyr)
as.data.frame(mget(ls(pattern="million_cities_*")) %>%
  bind_rows())

Error: Argument 2 must be length 5, not 2

How can I combine all vectors with the same name pattern by row and leave the missing columns of the shorter vectors empty instead of inserting the vector elements again?

CodePudding user response：

You can create a list column in a data frame and then unnest it.

library(dplyr)
library(tidyr)

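# Collect every vector whose name matches the pattern into a named list
l <- mget(ls(pattern="million_cities_*"))

# Store the list as a list column, then spread each element into its own column
tibble(cities = l) %>% 
  unnest_wider(cities)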

The main annoyance is the message about "new names" that you will get for each row.

# A tibble: 3 x 9
  ...1     ...2        ...3    ...4    ...5     ...6         ...7        ...8      ...9  
  <chr>    <chr>       <chr>   <chr>   <chr>    <chr>        <chr>       <chr>     <chr> 
1 Toronto  Montreal    Calgary Ottawa  Edmonton NA           NA          NA        NA    
2 London   Birmingham  NA      NA      NA       NA           NA          NA        NA    
3 New York Los Angeles Chicago Houston Phoenix  Philadelphia San Antonio San Diego Dallas

You can avoid that by jumping to purrr so that you can create named tibble rows.

library(purrr)
library(tibble)

map_dfr(l, ~ as_tibble_row(set_names(.x, seq_along(.x))))

# A tibble: 3 x 9
  `1`      `2`         `3`     `4`     `5`      `6`          `7`         `8`       `9`   
  <chr>    <chr>       <chr>   <chr>   <chr>    <chr>        <chr>       <chr>     <chr> 
1 Toronto  Montreal    Calgary Ottawa  Edmonton NA           NA          NA        NA    
2 London   Birmingham  NA      NA      NA       NA           NA          NA        NA    
3 New York Los Angeles Chicago Houston Phoenix  Philadelphia San Antonio San Diego Dallas
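As a variation on the purrr approach above, here is a minimal sketch that also keeps the vector names, which the output above drops. It assumes purrr 1.0.0 or later, where list_rbind() is available; the column prefix "city_" and the column name "source" are made up for illustration.

```r
library(purrr)
library(tibble)

million_cities_USA <- c("New York", "Los Angeles", "Chicago", "Houston", "Phoenix",
                        "Philadelphia", "San Antonio", "San Diego", "Dallas")
million_cities_UK <- c("London", "Birmingham")
million_cities_Canada <- c("Toronto", "Montreal", "Calgary", "Ottawa", "Edmonton")

# Named list of all matching vectors
l <- mget(ls(pattern = "^million_cities_"))

# Turn each vector into a one-row tibble with explicit column names
# ("city_" is an illustrative prefix, not from the original answer)
rows <- map(l, function(x) as_tibble_row(set_names(x, paste0("city_", seq_along(x)))))

# Stack the rows; columns missing from shorter rows are filled with NA,
# and the list names go into the (illustrative) "source" column
out <- list_rbind(rows, names_to = "source")
out
```

Because every row carries its own column names, there is no name repair and so no "new names" message.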

CodePudding user response：

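Here is one solution. Transpose each vector with t() and wrap it in as.data.frame() to get a one-row data frame with columns V1, V2, and so on. bind_rows() then matches those columns by name and fills the missing ones with NA instead of recycling.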

library(dplyr)

as.data.frame(t(million_cities_USA)) %>% 
  bind_rows(as.data.frame(t(million_cities_UK))) %>% 
  bind_rows(as.data.frame(t(million_cities_Canada)))
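If you don't want to spell out every vector, the same idea should also work on the list returned by mget(). This is only a sketch under that assumption; the helper name to_row and the column name "source" are made up for illustration.

```r
library(dplyr)

million_cities_USA <- c("New York", "Los Angeles", "Chicago", "Houston", "Phoenix",
                        "Philadelphia", "San Antonio", "San Diego", "Dallas")
million_cities_UK <- c("London", "Birmingham")
million_cities_Canada <- c("Toronto", "Montreal", "Calgary", "Ottawa", "Edmonton")

# Named list of all vectors whose names start with the pattern
lst <- mget(ls(pattern = "^million_cities_"))

# Hypothetical helper: one vector -> one-row data frame with columns V1, V2, ...
to_row <- function(x) as.data.frame(t(x))

# bind_rows() accepts a list of data frames; .id stores the list names
# in a new column (called "source" here for illustration)
out <- bind_rows(lapply(lst, to_row), .id = "source")
out
```

The .id column is optional, but it keeps track of which row came from which vector.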

CodePudding user response：

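Using base R: pad every vector with NA up to the length of the longest one, then row-bind the padded vectors.

# Named list of all matching vectors
lst1 <- mget(ls(pattern = 'million_cities'))
# Length of the longest vector
mx <- max(lengths(lst1))
# `length<-` extends each vector to mx elements, padding with NA
out <- do.call(rbind.data.frame, lapply(lst1, `length<-`, mx))
names(out) <- paste0("col", seq_along(out))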

-output

> out
      col1        col2    col3    col4     col5         col6        col7      col8   col9
1  Toronto    Montreal Calgary  Ottawa Edmonton         <NA>        <NA>      <NA>   <NA>
2   London  Birmingham    <NA>    <NA>     <NA>         <NA>        <NA>      <NA>   <NA>
3 New York Los Angeles Chicago Houston  Phoenix Philadelphia San Antonio San Diego Dallas
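A possible variation, in case you want the vector names as row names: pad with `length<-` as above, but build a character matrix with sapply() and transpose it. This is a sketch, not a tested replacement for the code above. It assumes at least one vector has more than one element, since otherwise sapply() returns a plain vector instead of a matrix.

```r
million_cities_USA <- c("New York", "Los Angeles", "Chicago", "Houston", "Phoenix",
                        "Philadelphia", "San Antonio", "San Diego", "Dallas")
million_cities_UK <- c("London", "Birmingham")
million_cities_Canada <- c("Toronto", "Montreal", "Calgary", "Ottawa", "Edmonton")

lst1 <- mget(ls(pattern = "^million_cities_"))
mx <- max(lengths(lst1))

# sapply() returns one column per vector (named after the list element);
# t() turns those columns into rows
mat <- t(sapply(lst1, `length<-`, mx))

out <- as.data.frame(mat)
names(out) <- paste0("col", seq_len(ncol(out)))
out
```

The row names of out are the names of the original vectors, so a row can be looked up with, for example, out["million_cities_UK", ].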