Home > front end >  Add rows to list of dataframes from another dataframe
Add rows to list of dataframes from another dataframe

Time:10-29

Let's have a list lis

chicago = data.frame('city' = rep('chicago'), 'year' = c(2018,2019,2020), 'population' = c(100, 105, 110))
paris = data.frame('city' = rep('paris'), 'year' = c(2018,2019,2020), 'population' = c(200, 205, 210))
berlin = data.frame('city' = rep('berlin'), 'year' = c(2018,2019,2020), 'population' = c(300, 305, 310))
bangalore = data.frame('city' = rep('bangalore'), 'year' = c(2018,2019,2020), 'population' = c(400, 405, 410))
lis = list(chicago = chicago, paris = paris, berlin = berlin, bangalore = bangalore)

Now I have a new df containing latest data for each city,

df = data.frame('city' = c('chicago', 'paris', 'berlin', 'bangalore'), 'year' = rep(2021), 'population' = c(115, 215, 315, 415))

I want to add each row of df to lis based on city.

I do it by,

#convert to datframe
lis = dplyr::bind_rows(lis)
#rbind
lis = rbind(lis, df)
#again convert to list
lis = split(lis, lis$city)

which is inefficient for large datsets. Is their any efficient alternate for large datsets?

Thank you.

Edit

Unit: seconds
 expr      min       lq     mean   median       uq      max neval
 ac() 22.43719 23.17452 27.85401 24.80335 25.62127 43.23373     5

The list contains 2239 dataframes and dimension of each dataframe is 310x15. Each of these dataframe grow daily.

CodePudding user response:

We may use imap to loop over the list, and filter the 'df' based on the names of the list to append the row in each of the list elements

library(dplyr)
library(purrr)
lis2 <- imap(lis, ~ .x %>%
                   bind_rows(df %>%
                              filter(city == .y)))

-output

> lis2
$chicago
     city year population
1 chicago 2018        100
2 chicago 2019        105
3 chicago 2020        110
4 chicago 2021        115

$paris
   city year population
1 paris 2018        200
2 paris 2019        205
3 paris 2020        210
4 paris 2021        215

$berlin
    city year population
1 berlin 2018        300
2 berlin 2019        305
3 berlin 2020        310
4 berlin 2021        315

$bangalore
       city year population
1 bangalore 2018        400
2 bangalore 2019        405
3 bangalore 2020        410
4 bangalore 2021        415

Or using base R with Map and rbind

Map(function(x, nm) rbind(x, df[df$city == nm,]), lis, names(lis))

Or use rbindlist from data.table

library(data.table)
rbindlist(c(lis, list(df)))[, .(split(.SD, city))]$V1

Or a slightly more efficient, will be with split

Map(rbind, lis, split(df, df$city)[names(lis)])
  • Related