My question seems to be a mixture of the following two threads:
Combine two data frames by rows (rbind) when they have different sets of columns
rbind data frames based on a common pattern in data frame name
I want to combine (by adding rows) the content of several vectors with different lengths based on the pattern of the vector name. Example data set:
million_cities_USA <- c("New York", "Los Angeles", "Chicago", "Houston", "Phoenix", "Philadelphia", "San Antonio", "San Diego", "Dallas")
million_cities_UK <- c("London", "Birmingham")
million_cities_Canada <- c("Toronto", "Montreal", "Calgary", "Ottawa", "Edmonton")
When I combine the vectors with rbind (see second answer in the second link), R starts recycling the shorter vectors:
as.data.frame(do.call(rbind, mget(ls(pattern="million_cities_*"))))
                      V1       V2          V3      V4         V5       V6           V7          V8         V9
million_cities_Canada Toronto  Montreal    Calgary Ottawa     Edmonton Toronto      Montreal    Calgary    Ottawa
million_cities_UK     London   Birmingham  London  Birmingham London   Birmingham   London      Birmingham London
million_cities_USA    New York Los Angeles Chicago Houston    Phoenix  Philadelphia San Antonio San Diego  Dallas
Regarding the first link, I suppose that this can be avoided with dplyr's bind_rows. However, I couldn't combine the vectors at all with bind_rows. The error message indicated that bind_rows needs vectors with identical lengths to work:
library(dplyr)
as.data.frame(mget(ls(pattern="million_cities_*")) %>%
bind_rows())
Error: Argument 2 must be length 5, not 2
How can I combine all vectors with the same name pattern by row and leave the missing columns of the shorter vectors empty instead of inserting the vector elements again?
CodePudding user response:
You can create a list column in a data frame and then unnest it.
library(dplyr)
library(tidyr)
# ls(pattern = ) takes a regular expression, so anchor it with ^ instead of using a glob-style *
l <- mget(ls(pattern = "^million_cities_"))
tibble(cities = l) %>%
  unnest_wider(cities)
The main annoyance is the message about "new names" that you will get for each row.
# A tibble: 3 x 9
  ...1     ...2        ...3    ...4    ...5     ...6         ...7        ...8      ...9
  <chr>    <chr>       <chr>   <chr>   <chr>    <chr>        <chr>       <chr>     <chr>
1 Toronto  Montreal    Calgary Ottawa  Edmonton NA           NA          NA        NA
2 London   Birmingham  NA      NA      NA       NA           NA          NA        NA
3 New York Los Angeles Chicago Houston Phoenix  Philadelphia San Antonio San Diego Dallas
You can avoid that by jumping to purrr so that you can create named tibble rows.
library(purrr)
library(tibble)
# seq_along() is safer than seq.int() here: it also works for a vector of length 1
map_dfr(l, ~ as_tibble_row(set_names(.x, seq_along(.x))))
# A tibble: 3 x 9
  `1`      `2`         `3`     `4`     `5`      `6`          `7`         `8`       `9`
  <chr>    <chr>       <chr>   <chr>   <chr>    <chr>        <chr>       <chr>     <chr>
1 Toronto  Montreal    Calgary Ottawa  Edmonton NA           NA          NA        NA
2 London   Birmingham  NA      NA      NA       NA           NA          NA        NA
3 New York Los Angeles Chicago Houston Phoenix  Philadelphia San Antonio San Diego Dallas
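Another way to get rid of the "new names" messages while staying in tidyr is the names_sep argument of unnest_wider(), which builds the column names from the outer name plus a running index. As far as I know, newer tidyr releases even require it when the list elements are unnamed, but treat that as an assumption and check your installed version. A minimal sketch (the object name wide is just for illustration):

```r
library(tibble)
library(tidyr)

million_cities_USA <- c("New York", "Los Angeles", "Chicago", "Houston", "Phoenix",
                        "Philadelphia", "San Antonio", "San Diego", "Dallas")
million_cities_UK <- c("London", "Birmingham")
million_cities_Canada <- c("Toronto", "Montreal", "Calgary", "Ottawa", "Edmonton")

# Collect all vectors whose name starts with "million_cities_" into a named list
l <- mget(ls(pattern = "^million_cities_"))

# names_sep pastes the outer column name and the element position together,
# so the columns become cities_1, cities_2, ... and no name repair is needed
wide <- unnest_wider(tibble(cities = l), cities, names_sep = "_")
wide
```

Shorter vectors are padded with NA on the right, exactly as in the outputs above.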
CodePudding user response:
This is one solution. You can use as.data.frame() on the transpose t() of each vector and then combine the results with bind_rows(); shorter rows are padded with NA instead of being recycled.
library(dplyr)
as.data.frame(t(million_cities_USA)) %>%
bind_rows(as.data.frame(t(million_cities_UK))) %>%
bind_rows(as.data.frame(t(million_cities_Canada)))
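If there are many vectors, typing each name by hand gets tedious. The same transpose idea can probably be applied to the whole list returned by mget(). This is only a sketch: the object names lst and out and the .id column name source are my own choices, not part of the original answer.

```r
library(dplyr)

million_cities_USA <- c("New York", "Los Angeles", "Chicago", "Houston", "Phoenix",
                        "Philadelphia", "San Antonio", "San Diego", "Dallas")
million_cities_UK <- c("London", "Birmingham")
million_cities_Canada <- c("Toronto", "Montreal", "Calgary", "Ottawa", "Edmonton")

# Named list of all vectors matching the name pattern
lst <- mget(ls(pattern = "^million_cities_"))

# Turn every vector into a one-row data frame (columns V1, V2, ...),
# then stack them; bind_rows() fills the missing columns with NA.
# .id stores the list names (the original vector names) in a column.
out <- bind_rows(lapply(lst, function(v) as.data.frame(t(v))), .id = "source")
out
```

The source column keeps track of which vector each row came from, which the chained version above does not do.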
CodePudding user response:
Using base R
lst1 <- mget(ls(pattern = 'million_cities'))
mx <- max(lengths(lst1))
out <- do.call(rbind.data.frame, lapply(lst1, `length<-`,mx))
names(out) <- paste0("col", seq_along(out))
-output
> out
  col1     col2        col3    col4    col5     col6         col7        col8      col9
1 Toronto  Montreal    Calgary Ottawa  Edmonton <NA>         <NA>        <NA>      <NA>
2 London   Birmingham  <NA>    <NA>    <NA>     <NA>         <NA>        <NA>      <NA>
3 New York Los Angeles Chicago Houston Phoenix  Philadelphia San Antonio San Diego Dallas
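If you would like to keep the vector names as row names, a variation on the padding idea above is to go through a matrix first. This is a sketch under the assumption that all vectors are character vectors; mat and out are illustrative names.

```r
million_cities_USA <- c("New York", "Los Angeles", "Chicago", "Houston", "Phoenix",
                        "Philadelphia", "San Antonio", "San Diego", "Dallas")
million_cities_UK <- c("London", "Birmingham")
million_cities_Canada <- c("Toronto", "Montreal", "Calgary", "Ottawa", "Edmonton")

lst1 <- mget(ls(pattern = "^million_cities_"))
mx <- max(lengths(lst1))

# Pad every vector with NA up to the longest length; sapply() then returns
# a matrix with one column per vector, so transpose it to get one row per vector
mat <- t(sapply(lst1, `length<-`, mx))

# The row names are the original vector names, the columns are V1, V2, ...
out <- as.data.frame(mat)
out
```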