Suppose we have a list lis:
chicago = data.frame('city' = rep('chicago'), 'year' = c(2018,2019,2020), 'population' = c(100, 105, 110))
paris = data.frame('city' = rep('paris'), 'year' = c(2018,2019,2020), 'population' = c(200, 205, 210))
berlin = data.frame('city' = rep('berlin'), 'year' = c(2018,2019,2020), 'population' = c(300, 305, 310))
bangalore = data.frame('city' = rep('bangalore'), 'year' = c(2018,2019,2020), 'population' = c(400, 405, 410))
lis = list(chicago = chicago, paris = paris, berlin = berlin, bangalore = bangalore)
Now I have a new data frame df containing the latest data for each city:
df = data.frame('city' = c('chicago', 'paris', 'berlin', 'bangalore'), 'year' = rep(2021), 'population' = c(115, 215, 315, 415))
I want to add each row of df to lis based on city.
I currently do it like this:
# convert to a data frame
lis = dplyr::bind_rows(lis)
# append the new rows with rbind
lis = rbind(lis, df)
# convert back to a list
lis = split(lis, lis$city)
which is inefficient for large datasets. Is there a more efficient alternative for large datasets?
Thank you.
Edit:
Unit: seconds
 expr      min       lq     mean   median       uq      max neval
 ac() 22.43719 23.17452 27.85401 24.80335 25.62127 43.23373     5
The list contains 2239 data frames, and the dimension of each data frame is 310x15. Each of these data frames grows daily.
CodePudding user response:
We may use imap to loop over the list and filter 'df' based on the names of the list, so that the matching row is appended to each of the list elements:
library(dplyr)
library(purrr)
lis2 <- imap(lis, ~ .x %>%
    bind_rows(df %>%
        filter(city == .y)))
-output
> lis2
$chicago
city year population
1 chicago 2018 100
2 chicago 2019 105
3 chicago 2020 110
4 chicago 2021 115
$paris
city year population
1 paris 2018 200
2 paris 2019 205
3 paris 2020 210
4 paris 2021 215
$berlin
city year population
1 berlin 2018 300
2 berlin 2019 305
3 berlin 2020 310
4 berlin 2021 315
$bangalore
city year population
1 bangalore 2018 400
2 bangalore 2019 405
3 bangalore 2020 410
4 bangalore 2021 415
Or using base R with Map and rbind:
Map(function(x, nm) rbind(x, df[df$city == nm,]), lis, names(lis))
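To make the base R version a bit easier to read, the anonymous function can be given a name. This is only a minimal sketch on a two-city toy version of the question's data; append_city is a hypothetical helper name, not a function from any package.

```r
# Toy inputs in the same shape as the question's data (two cities only)
lis <- list(
  chicago = data.frame(city = "chicago", year = c(2018, 2019, 2020),
                       population = c(100, 105, 110)),
  paris   = data.frame(city = "paris", year = c(2018, 2019, 2020),
                       population = c(200, 205, 210))
)
df <- data.frame(city = c("chicago", "paris"), year = 2021,
                 population = c(115, 215))

# Hypothetical helper: append the rows of `new_rows` whose city equals `nm`
append_city <- function(x, nm, new_rows) {
  rbind(x, new_rows[new_rows$city == nm, , drop = FALSE])
}

# Map() walks over the list elements and their names in parallel;
# MoreArgs passes the same `df` to every call
lis2 <- Map(append_city, lis, names(lis), MoreArgs = list(new_rows = df))

# Number of rows per city after appending
sapply(lis2, nrow)
```

The result keeps the names and order of lis, so it can be assigned back to lis directly.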
Or use rbindlist from data.table:
library(data.table)
rbindlist(c(lis, list(df)))[, .(split(.SD, city))]$V1
Or, slightly more efficiently, with split:
Map(rbind, lis, split(df, df$city)[names(lis)])
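Before switching over, it may be worth checking that the split-based append gives the same data as the bind/split round trip from the question. The sketch below does that on a two-city toy version of the data. It assumes base R only, so do.call(rbind, ...) stands in for dplyr::bind_rows, and strip_rownames is a hypothetical helper.

```r
# Toy inputs in the same shape as the question's data (two cities only)
lis <- list(
  chicago = data.frame(city = "chicago", year = c(2018, 2019, 2020),
                       population = c(100, 105, 110)),
  paris   = data.frame(city = "paris", year = c(2018, 2019, 2020),
                       population = c(200, 205, 210))
)
df <- data.frame(city = c("chicago", "paris"), year = 2021,
                 population = c(115, 215))

# Round trip from the question (do.call(rbind, ...) replaces bind_rows here)
combined <- rbind(do.call(rbind, lis), df)
round_trip <- split(combined, combined$city)

# Split-based append: each data frame is only bound to its own new rows
appended <- Map(rbind, lis, split(df, df$city)[names(lis)])

# Hypothetical helper: the two routes label rows differently,
# so row names are dropped before comparing
strip_rownames <- function(x) {
  rownames(x) <- NULL
  x
}

# Compare city by city, in the order of the original list
same <- all(mapply(
  function(a, b) isTRUE(all.equal(strip_rownames(a), strip_rownames(b))),
  round_trip[names(lis)], appended
))
same
```

Note that split() on the combined data returns the cities in alphabetical order, while the Map version keeps the original order of lis, which is why the comparison indexes round_trip by names(lis).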