I've created a nested list of data frames (note the data frames have different lengths):
[[1]]
[[1]][[1]]
# A tibble: 25 × 5
[[2]]
[[2]][[1]]
# A tibble: 35 × 5
[[3]]
[[3]][[1]]
# A tibble: 20 × 5
...
I tried to use mapply() to generate a column containing a unique identifier for each data frame in my list. The column would be based on two sequences of numbers: 1:10 and 1:100. For example, the column in the first data frame would contain 1.1, the second would contain 2.1, and so on, all the way up to 10.100.
a.b <- apply(expand.grid(c(1:10), c(1:100)), 1, paste, collapse = '.')
mapply(cbind, dfs_list, "Identifier" = a.b, SIMPLIFY = F)
However, the column is added to the parent list instead of to the data frames themselves:
[[1]]
Identifier
[1,] tbl_df,11 "1.1"
[[2]]
Identifier
[1,] tbl_df,11 "2.1"
[[3]]
Identifier
[1,] tbl_df,11 "3.1"
...
Is this a syntax error, or do I need to use a different approach entirely?
[Update]
Since first writing this post, I tried a slightly different approach after some trial and error. At first I thought I'd solved my problem, but the generated list was 13 GB instead of 18 MB and took much longer to write. I don't fully grasp the differences between the various apply functions, so something tells me I'm still off.
lapply(dfs_list, function(z) mapply(cbind, z, "Identifier" = a.b, SIMPLIFY = F))
CodePudding user response:
You may try this approach with lapply and Map:
result <- lapply(seq_along(dfs_list), function(x) {
  Map(cbind, dfs_list[[x]],
      Identifier = paste(x, seq_along(dfs_list[[x]]), sep = '.'))
})
result
#[[1]]
#[[1]][[1]]
#                   mpg cyl disp  hp drat Identifier
#Mazda RX4         21.0   6  160 110 3.90        1.1
#Mazda RX4 Wag     21.0   6  160 110 3.90        1.1
#Datsun 710        22.8   4  108  93 3.85        1.1
#Hornet 4 Drive    21.4   6  258 110 3.08        1.1
#Hornet Sportabout 18.7   8  360 175 3.15        1.1
#[[1]][[2]]
#                   mpg cyl disp  hp drat Identifier
#Mazda RX4 Wag     21.0   6  160 110 3.90        1.2
#Datsun 710        22.8   4  108  93 3.85        1.2
#Hornet 4 Drive    21.4   6  258 110 3.08        1.2
#Hornet Sportabout 18.7   8  360 175 3.15        1.2
#[[2]]
#[[2]][[1]]
#                   mpg cyl disp  hp drat Identifier
#Mazda RX4         21.0   6  160 110 3.90        2.1
#Mazda RX4 Wag     21.0   6  160 110 3.90        2.1
#Datsun 710        22.8   4  108  93 3.85        2.1
#Hornet 4 Drive    21.4   6  258 110 3.08        2.1
#Hornet Sportabout 18.7   8  360 175 3.15        2.1
data
It is easier to help if you provide data in a reproducible format:
dfs_list <- list(list(mtcars[1:5, 1:5], mtcars[2:5, 1:5]), list(mtcars[1:5, 1:5]))
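If the identifiers should come from the a.b vector in the question rather than from list positions, a minimal sketch along the same lines might look like the following. It assumes one identifier per outer list element. The helper name add_id and the result name result2 are made up for illustration, and a.b is cut down to the length of the list so that nothing gets recycled.

```r
# Sample data in the same shape as above: a list of lists of data frames
dfs_list <- list(list(mtcars[1:5, 1:5], mtcars[2:5, 1:5]), list(mtcars[1:5, 1:5]))

# Identifiers built as in the question: "1.1", "2.1", ..., "10.100"
a.b <- apply(expand.grid(1:10, 1:100), 1, paste, collapse = ".")

# add_id is a hypothetical helper: it adds the same id to every
# data frame inside one inner list
add_id <- function(inner, id) lapply(inner, cbind, Identifier = id)

# Use only as many identifiers as there are outer elements
result2 <- Map(add_id, dfs_list, a.b[seq_along(dfs_list)])

length(result2)                        # same length as dfs_list
unique(result2[[2]][[1]]$Identifier)   # identifier of the second outer element
```

Here every data frame in the same inner list shares one identifier, which is a different numbering from the position-based one above, so pick whichever matches your intent.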
CodePudding user response:
Map(`names<-`, dfs_list, a.b)
This gives each list item the name you made. It doesn't say "Identifier", but I think this is what you are after.
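To see where the name ends up, here is a small sketch on made-up sample data (ids stands in for the first two entries of a.b). The name is attached to the element of the inner list; it is not stored as a column of the data frame.

```r
# Two outer elements, each wrapping a single data frame (as in the question)
dfs_list <- list(list(mtcars[1:5, 1:5]), list(mtcars[6:9, 1:5]))
ids <- c("1.1", "2.1")  # illustrative stand-in for the first entries of a.b

named <- Map(`names<-`, dfs_list, ids)

names(named[[1]])          # name of the inner list element in the first group
names(named[[2]])          # name of the inner list element in the second group
colnames(named[[1]][[1]])  # the data frame's own columns are unchanged
```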
Edit:
Map(function(x, y) list(cbind(x[[1]], "Identifier" = y)), dfs_list, a.b)
This adds a new column of identifiers. x[[1]] reaches inside the nested structure, and list() re-creates the original nesting. Map is the same as mapply(..., SIMPLIFY = FALSE).
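One thing worth checking in the updated attempt from the question: mapply and Map recycle shorter arguments to the length of the longest one. If each inner list holds a single data frame and a.b holds 1000 identifiers, the inner mapply call would return 1000 data frames per inner list, which may be why the result grew from 18 MB to 13 GB. A small sketch of that behaviour, using a made-up three-element ids vector in place of a.b:

```r
inner <- list(mtcars[1:5, 1:5])   # an inner list holding a single data frame
ids <- c("1.1", "2.1", "3.1")     # illustrative stand-in for a.b

# inner is recycled to match ids, so one data frame becomes three
recycled <- Map(cbind, inner, Identifier = ids)
length(recycled)

# Passing exactly one identifier keeps one data frame per input
single <- Map(cbind, inner, Identifier = ids[1])
length(single)
```

Pairing the outer list with the identifiers, as in the Map call above, avoids this because both arguments then have matching lengths.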