I am working with the R programming language. I have a list ("my_list") that looks something like this; each element of the list (e.g. [[i]]) has a different number of subelements (e.g. [[i]][j]):
> my_list
[[1]]
[1] "subelement1" "subelement2" "subelement3"
[[2]]
[1] "subelement1" "subelement2" "subelement3" "subelement4" "subelement5"
[[3]]
[1] "subelement1" "subelement2" "subelement3" "subelement4" "subelement5"
[[4]]
[1] "subelement1" "subelement2" "subelement3" "subelement4" "subelement5"
> summary(my_list)
Length Class Mode
[1,] 3 -none- character
[2,] 5 -none- character
[3,] 5 -none- character
[4,] 5 -none- character
[5,] 5 -none- character
[6,] 5 -none- character
[7,] 5 -none- character
[8,] 5 -none- character
[9,] 5 -none- character
[10,] 5 -none- character
[11,] 5 -none- character
[12,] 6 -none- character
For each element in this list, I want to extract each of its subelements and combine them all into one data frame (each row of this data frame will not necessarily have the same number of filled columns). Since I don't know the maximum number of subelements, I first tried to find it, but some parsing is still involved (many entries in the "Length" column are not numbers, for some reason?):
summary = summary(my_list)
> summary
Var1 Var2 Freq
1 A Length 3
2 B Length 5
3 C Length 5
4 D Length 5
5 E Length 5
6 F Length 5
7 G Length 5
8 H Length 5
####
96 R3 Length 5
97 S3 Length 5
98 T3 Length 5
99 U3 Length 5
100 V3 Length 5
####
101 A Class -none-
102 B Class -none-
103 C Class -none-
104 D Class -none-
######
296 R3 Mode character
297 S3 Mode character
298 T3 Mode character
299 U3 Mode character
300 V3 Mode character
Next:
summary = data.frame(summary)
freq = as.numeric(gsub("([0-9]+).*$", "\\1", summary$Freq))
freq = freq[!is.na(freq)]
> max(freq)
[1] 6
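For comparison, a shorter sketch. summary() of a list returns a table of character strings in which the lengths are mixed with the Class and Mode entries, which is why the numbers had to be parsed out above. Base R's lengths() (available since R 3.2.0) instead returns the length of every list element as an integer vector, so the maximum can be read off directly. The list below is a hypothetical stand-in for my_list.

```r
# hypothetical stand-in for my_list: elements of different lengths
my_list <- list(
  paste0("subelement", 1:3),
  paste0("subelement", 1:5),
  paste0("subelement", 1:6)
)

# lengths() gives one integer per list element
element_lengths <- lengths(my_list)
print(element_lengths)   # 3 5 6

# the maximum number of subelements
max_length <- max(element_lengths)
print(max_length)        # 6
```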
With this very roundabout approach, I now know there are at most 6 subelements, and I can create 6 corresponding columns:
col1 = sapply(my_list,function(x) x[1])
col2 = sapply(my_list,function(x) x[2])
col3 = sapply(my_list,function(x) x[3])
col4 = sapply(my_list,function(x) x[4])
col5 = sapply(my_list,function(x) x[5])
col6 = sapply(my_list,function(x) x[6])
# final answer: desired output
final_data = data.frame(col1, col2, col3, col4, col5, col6)
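The six copy-pasted sapply() calls all follow one pattern (column j holds element j of every list entry, or NA when it is missing), so they can be generated for any number of columns. A minimal sketch, assuming the maximum length is computed with lengths() and using a hypothetical example list:

```r
# hypothetical example list with elements of different lengths
my_list <- list(
  paste0("subelement", 1:3),
  paste0("subelement", 1:5)
)

max_length <- max(lengths(my_list))

# one column per position j: element j of every list entry (NA if it is missing)
columns <- lapply(seq_len(max_length), function(j) sapply(my_list, function(x) x[j]))
names(columns) <- paste0("col", seq_len(max_length))

final_data <- as.data.frame(columns)
print(final_data)
```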
My question: Is there an easier way to find the maximum number of subelements in this list and then create a data frame with the correct number of columns? I.e., is there an "automatic" way to create a data frame with as many columns as the longest element has subelements, and to name these columns accordingly (e.g. col1, col2, col3, etc.)?
Thanks!
Answer:
Your solution works, so take this with a grain of salt, but it's possible to find the maximum length of the sublists with a single loop.
max_length <- 0
# a plain for loop; assigning inside an lapply() function would only change a local copy
for (x in my_list) if (length(x) > max_length) max_length <- length(x)
> max_length
[1] 6
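A function passed to lapply() runs in its own environment, so a plain `=` or `<-` inside it only creates a local variable and the outer max_length is never updated; a for loop does not have this problem. A small sketch contrasting the two on a hypothetical list:

```r
# hypothetical list with elements of length 3, 5 and 6
my_list <- list(1:3, 1:5, 1:6)

# assignment inside the anonymous function only changes a local variable
max_length <- 0
invisible(lapply(my_list, function(x) {
  if (length(x) > max_length) max_length <- length(x)
}))
max_after_lapply <- max_length
print(max_after_lapply)  # still 0

# a for loop runs in the calling environment, so the update sticks
max_length <- 0
for (x in my_list) {
  if (length(x) > max_length) max_length <- length(x)
}
print(max_length)  # 6
```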
To make a data frame with the corresponding columns, a similar approach can be used:
# create an empty data frame to add rows to
df <- data.frame(matrix(ncol = max_length, nrow = 0))
colnames(df) <- sprintf("col%d", seq_len(max_length))
# add rows; pad each element with NA so every row has max_length values
for (x in my_list) df[nrow(df) + 1, ] <- c(x, rep(NA, max_length - length(x)))
See this post regarding sprintf. Since you need to know the maximum row length going in, two loops are necessary: one to find the maximum length and one to fill the data frame.
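Growing a data frame one row at a time copies it on every step, which can get slow for long lists. An alternative sketch that still makes the two passes, but fills everything at once: indexing a vector past its end returns NA, so x[seq_len(max_length)] pads each element, and vapply() stacks the padded vectors into a matrix. The example list is hypothetical.

```r
# hypothetical example list
my_list <- list(
  paste0("subelement", 1:3),
  paste0("subelement", 1:5),
  paste0("subelement", 1:6)
)

# pass 1: maximum number of subelements
max_length <- max(lengths(my_list))

# pass 2: pad every element to max_length (out-of-range indices give NA);
# vapply() returns a max_length x n matrix, so transpose to one row per element
padded <- t(vapply(my_list, function(x) x[seq_len(max_length)], character(max_length)))

df <- as.data.frame(padded)
colnames(df) <- sprintf("col%d", seq_len(max_length))
print(df)
```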
Answer:
Try this
mx <- max(sapply(my_list , length))
df <- do.call(rbind , lapply(my_list , \(x) if(length(x) == mx) x
else c(x , rep(NA , mx - length(x)))))
df <- data.frame(df)
colnames(df) <- paste0("col" , 1:mx)
- output
col1 col2 col3 col4 col5
1 subelement1 subelement2 subelement3 <NA> <NA>
2 subelement1 subelement2 subelement3 subelement4 subelement5
3 subelement1 subelement2 subelement3 subelement4 subelement5
4 subelement1 subelement2 subelement3 subelement4 subelement5
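The if/else padding can also be expressed with the `length<-` replacement function, which extends a vector with NA when the new length is larger than the old one. A sketch of the same idea, on a hypothetical list shaped like the one in the question:

```r
# hypothetical list shaped like the example in the question
my_list <- list(
  paste0("subelement", 1:3),
  paste0("subelement", 1:5),
  paste0("subelement", 1:5),
  paste0("subelement", 1:5)
)

mx <- max(lengths(my_list))

# `length<-`(x, mx) pads x with NA up to mx elements
padded <- lapply(my_list, `length<-`, mx)

df <- data.frame(do.call(rbind, padded))
colnames(df) <- paste0("col", 1:mx)
print(df)
```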