Add unique identifier column to a nested list of data frames

Time:10-12

I've created a nested list of data frames (note the data frames have different lengths):

[[1]]
[[1]][[1]]
# A tibble: 25 × 5

[[2]]
[[2]][[1]]
# A tibble: 35 × 5

[[3]]
[[3]][[1]]
# A tibble: 20 × 5

...       

I tried to use mapply() to generate a column containing a unique identifier for each data frame in my list. The column would be based on two sequences of numbers: 1:10 and 1:100. For example, the column in the first data frame would contain 1.1, the second would contain 2.1, and so on, all the way up to 10.100.

a.b <- apply(expand.grid(1:10, 1:100), 1, paste, collapse = ".")
mapply(cbind, dfs_list, "Identifier" = a.b, SIMPLIFY = FALSE)

However, the column is added to the parent list rather than to the data frames themselves:

[[1]]
               Identifier    
[1,] tbl_df,11 "1.1"

[[2]]
               Identifier    
[1,] tbl_df,11 "2.1"

[[3]]
               Identifier    
[1,] tbl_df,11 "3.1"

...

Is this a syntax error or do I need to be using a different approach entirely?


[Update]

Since first writing this post, I tried a slightly different approach after some trial and error. At first I thought I'd solved my problem, but the resulting list was 13 GB instead of 18 MB and took much longer to write. I don't fully grasp the differences between the various apply functions, so something tells me I'm still off.

lapply(dfs_list, function(z) mapply(cbind, z, "Identifier" = a.b, SIMPLIFY = FALSE))

CodePudding user response:

You may try this approach with lapply and Map:

result <- lapply(seq_along(dfs_list), function(x) {
  Map(cbind, dfs_list[[x]], 
             Identifier = paste(x, seq_along(dfs_list[[x]]), sep = '.'))
})

result

#[[1]]
#[[1]][[1]]
#                   mpg cyl disp  hp drat Identifier
#Mazda RX4         21.0   6  160 110 3.90        1.1
#Mazda RX4 Wag     21.0   6  160 110 3.90        1.1
#Datsun 710        22.8   4  108  93 3.85        1.1
#Hornet 4 Drive    21.4   6  258 110 3.08        1.1
#Hornet Sportabout 18.7   8  360 175 3.15        1.1

#[[1]][[2]]
#                   mpg cyl disp  hp drat Identifier
#Mazda RX4 Wag     21.0   6  160 110 3.90        1.2
#Datsun 710        22.8   4  108  93 3.85        1.2
#Hornet 4 Drive    21.4   6  258 110 3.08        1.2
#Hornet Sportabout 18.7   8  360 175 3.15        1.2


#[[2]]
#[[2]][[1]]
#                   mpg cyl disp  hp drat Identifier
#Mazda RX4         21.0   6  160 110 3.90        2.1
#Mazda RX4 Wag     21.0   6  160 110 3.90        2.1
#Datsun 710        22.8   4  108  93 3.85        2.1
#Hornet 4 Drive    21.4   6  258 110 3.08        2.1
#Hornet Sportabout 18.7   8  360 175 3.15        2.1

data

It is easier to help if you provide data in a reproducible format

dfs_list <- list(list(mtcars[1:5, 1:5],mtcars[2:5, 1:5]), list(mtcars[1:5, 1:5]))
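As a quick check of the approach above, here is a minimal, self-contained sketch that runs it on the same sample data. It assumes the identifier should be "outer index.inner index", which is this answer's reading of the question rather than something the question confirms, and the helper name ids is only illustrative.

```r
# Sample data from the answer: a list of lists of data frames.
dfs_list <- list(
  list(mtcars[1:5, 1:5], mtcars[2:5, 1:5]),
  list(mtcars[1:5, 1:5])
)

# x is the position in the parent list; the inner index is the position
# of each data frame within that sublist.
result <- lapply(seq_along(dfs_list), function(x) {
  ids <- paste(x, seq_along(dfs_list[[x]]), sep = ".")  # illustrative name
  Map(cbind, dfs_list[[x]], Identifier = ids)
})

# The nesting is preserved: same number of data frames per sublist.
sapply(result, length)

# Each data frame gains one column holding a single repeated identifier.
unique(result[[1]][[2]]$Identifier)
```

Because cbind() recycles the single identifier across all rows, data frames of different lengths need no special handling.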

CodePudding user response:

Map(`names<-`, dfs_list, a.b)

This gives each list item the name you made. It doesn't say "Identifier", but I think this is what you are after.

Edit:

Map(function(x, y) list(cbind(x[[1]], "Identifier" = y)), dfs_list, a.b)

This gives a new column of identifiers. x[[1]] reaches inside the nested structure, and list() re-creates the original nesting. Map is the same as mapply(..., SIMPLIFY = FALSE).
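To illustrate the difference, the sketch below contrasts the original mapply() call with the Map() version on small stand-in data. The sample dfs_list (one data frame per sublist, built from mtcars) and the name ids are assumptions made for illustration, and trimming a.b to the length of the list is a precaution, not part of the original code.

```r
# Hypothetical sample data: each sublist holds exactly one data frame,
# mirroring the structure shown in the question.
dfs_list <- list(
  list(mtcars[1:5, 1:5]),
  list(mtcars[1:3, 1:5]),
  list(mtcars[1:4, 1:5])
)

# Identifiers as in the question, trimmed to one per list element so that
# Map() does not recycle dfs_list up to the length of a.b.
a.b <- apply(expand.grid(1:10, 1:100), 1, paste, collapse = ".")
ids <- a.b[seq_along(dfs_list)]

# Original attempt: each element passed to cbind() is a list, not a data
# frame, so cbind() builds a 1 x 2 list-matrix instead of adding a column.
bad <- mapply(cbind, dfs_list, Identifier = ids, SIMPLIFY = FALSE)
is.matrix(bad[[1]])

# x[[1]] extracts the data frame so the column is bound to it, and
# list() restores the original nesting.
good <- Map(function(x, y) list(cbind(x[[1]], Identifier = y)), dfs_list, ids)
names(good[[2]][[1]])
```

If the two arguments differ in length, Map() and mapply() recycle the shorter one, which may be one reason the attempt in the update produced a much larger result than expected.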
