How to subset a list of data.frames?

Time:04-09

I have a nested list of data.frames and want to subset part of it. In this case, I want to subset rainfall and yield from sublist1 and sublist2 of mainlist and create a new list called mainlist_new.

But I get this error: Error in mainlist[[1]][[, 2:3]] : incorrect number of subscripts.

Any ideas and thoughts?

Here is the code and the sample data:

> df1 <- data.frame(station = c("MADA1", "MADA2", "MADA3", "MADA4", "MADA5"),
                   rainfall = c(0, 5, 10, 15, 20),
                   yield = c(2000, 3000, 4000, 5000, 6000))
> df2 <- df1
> df3 <- df1
> 
> list_1 <- list(df1, df2, df3)
> 
> list_2 <- list(df1, df2, df3)
> 
> mainlist <- list(list_1, list_2)
> names(mainlist) <- c("sublist1", 'sublist2')
> names(mainlist[[1]]) <- c("station", "rainfall", "yield")
> names(mainlist[[2]]) <-  c("station", "rainfall", "yield")
> 
> names(mainlist)
[1] "sublist1" "sublist2"
> names(mainlist[[1]])
[1] "station"  "rainfall" "yield"   
> 
> # subset `rainfall` and `yield` in sublist1 and sublist2 and create a new list
> mainlist_new <- list()
> mainlist_new[[1]] <- mainlist[[1]][[,2:3]] 
Error in mainlist[[1]][[, 2:3]] : incorrect number of subscripts
> mainlist_new[[2]] <- mainlist[[2]][[,2:3]] 
Error in mainlist[[2]][[, 2:3]] : incorrect number of subscripts
>

CodePudding user response:

If we want to subset the list elements by name, we can apply `[` to each sublist with lapply:

mainlist_new <- lapply(mainlist, `[`, c("rainfall", "yield"))

-output

> str(mainlist_new)
List of 2
 $ :List of 2
  ..$ rainfall:'data.frame':    5 obs. of  3 variables:
  .. ..$ station : chr [1:5] "MADA1" "MADA2" "MADA3" "MADA4" ...
  .. ..$ rainfall: num [1:5] 0 5 10 15 20
  .. ..$ yield   : num [1:5] 2000 3000 4000 5000 6000
  ..$ yield   :'data.frame':    5 obs. of  3 variables:
  .. ..$ station : chr [1:5] "MADA1" "MADA2" "MADA3" "MADA4" ...
  .. ..$ rainfall: num [1:5] 0 5 10 15 20
  .. ..$ yield   : num [1:5] 2000 3000 4000 5000 6000
 $ :List of 2
  ..$ rainfall:'data.frame':    5 obs. of  3 variables:
  .. ..$ station : chr [1:5] "MADA1" "MADA2" "MADA3" "MADA4" ...
  .. ..$ rainfall: num [1:5] 0 5 10 15 20
  .. ..$ yield   : num [1:5] 2000 3000 4000 5000 6000
  ..$ yield   :'data.frame':    5 obs. of  3 variables:
  .. ..$ station : chr [1:5] "MADA1" "MADA2" "MADA3" "MADA4" ...
  .. ..$ rainfall: num [1:5] 0 5 10 15 20
  .. ..$ yield   : num [1:5] 2000 3000 4000 5000 6000
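The call above keeps the list elements named rainfall and yield, each of which is still a full three-column data.frame. If the goal is instead to keep only the rainfall and yield columns inside every data.frame, a nested lapply is one possible sketch. The name cols_only is hypothetical, and the data is rebuilt here so the snippet runs on its own:

```r
# Rebuild the sample data from the question
df1 <- data.frame(station = c("MADA1", "MADA2", "MADA3", "MADA4", "MADA5"),
                  rainfall = c(0, 5, 10, 15, 20),
                  yield = c(2000, 3000, 4000, 5000, 6000))
sublist <- list(station = df1, rainfall = df1, yield = df1)
mainlist <- list(sublist1 = sublist, sublist2 = sublist)

# Outer lapply walks the sublists, inner lapply walks the data.frames.
# `[` with a character vector selects data.frame columns by name.
# cols_only is a hypothetical name, not part of the original answer.
cols_only <- lapply(mainlist, function(sub) {
  lapply(sub, `[`, c("rainfall", "yield"))
})

names(cols_only$sublist1$station)
#> [1] "rainfall" "yield"
```

The list structure (two sublists of three data.frames) is unchanged; only the station column is dropped from each data.frame.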

CodePudding user response:

If you want to subset a list by index at multiple positions, use single square brackets [ instead of double [[. Single brackets return a sub-list and accept several indices, while double brackets extract exactly one element. With that change it should behave as you expect.
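A minimal sketch of the difference between the two operators, using a made-up list x (not from the question):

```r
# x is a hypothetical example list
x <- list(a = 1:3, b = letters[1:2], c = TRUE)

# `[` keeps the list wrapper and can take several positions at once
sub <- x[2:3]
class(sub)
#> [1] "list"
names(sub)
#> [1] "b" "c"

# `[[` extracts the contents of exactly one element
el <- x[["a"]]
class(el)
#> [1] "integer"
```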

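In your example, all you need to do is change mainlist_new[[1]] <- mainlist[[1]][[,2:3]] to mainlist_new[[1]] <- mainlist[[1]][2:3]. The comma in [[,2:3]] asks for two-dimensional indexing, but mainlist[[1]] is a plain list with no dimensions, which is why R reports "incorrect number of subscripts".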

Simple example here:

# create list
l <- as.list(1:4)
# subset list by index
l[2:3]
#> [[1]]
#> [1] 2
#> 
#> [[2]]
#> [1] 3

Created on 2022-04-08 by the reprex package (v2.0.1)
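Applied to the data from the question, the fix could look like the sketch below. The data is rebuilt so the snippet runs on its own; copying the sublist names onto mainlist_new is an extra step assumed here for readability, not something the question asked for:

```r
# Rebuild the sample data from the question
df1 <- data.frame(station = c("MADA1", "MADA2", "MADA3", "MADA4", "MADA5"),
                  rainfall = c(0, 5, 10, 15, 20),
                  yield = c(2000, 3000, 4000, 5000, 6000))
list_1 <- list(station = df1, rainfall = df1, yield = df1)
mainlist <- list(sublist1 = list_1, sublist2 = list_1)

# Single brackets, no comma: keep elements 2 and 3 of each sublist
mainlist_new <- list()
mainlist_new[[1]] <- mainlist[[1]][2:3]
mainlist_new[[2]] <- mainlist[[2]][2:3]

# Assumption: carry over the sublist names for readability
names(mainlist_new) <- names(mainlist)

names(mainlist_new[[1]])
#> [1] "rainfall" "yield"
```

The same result can be written in one line as lapply(mainlist, `[`, 2:3), which also scales to any number of sublists.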
