How to change certain columns in a list of data.frames to factors in R?

Time:04-11

I have a nested list of data.frames (a list of lists of data.frames) and want to convert certain columns, c("station", "season"), to factors in every data.frame. I have tried various ways, but none of them worked for me.

Any help please?

Here is code that creates data representing my dataset:

> df1 <- data.frame(station = c("MADA1", "MADA2", "MADA3", "MADA4", "MADA5"),
                   rainfall = c(0, 5, 10, 15, 20),
                   yield = c(2000, 3000, 4000, 5000, 6000),
                   season = c('S1', 'S1', 'S2', 'S2', 'S1'))
> df2 <- df1
> df3 <- df1
> 
> list_1 <- list(df1, df2, df3)
> list_2 <- list(df1, df2, df3)
> mainlist <- list(list_1, list_2)
> 
> lapply(mainlist, head)
[[1]]
[[1]][[1]]
  station rainfall yield season
1   MADA1        0  2000     S1
2   MADA2        5  3000     S1
3   MADA3       10  4000     S2
4   MADA4       15  5000     S2
5   MADA5       20  6000     S1

[[1]][[2]]
  station rainfall yield season
1   MADA1        0  2000     S1
2   MADA2        5  3000     S1
3   MADA3       10  4000     S2
4   MADA4       15  5000     S2
5   MADA5       20  6000     S1

[[1]][[3]]
  station rainfall yield season
1   MADA1        0  2000     S1
2   MADA2        5  3000     S1
3   MADA3       10  4000     S2
4   MADA4       15  5000     S2
5   MADA5       20  6000     S1


[[2]]
[[2]][[1]]
  station rainfall yield season
1   MADA1        0  2000     S1
2   MADA2        5  3000     S1
3   MADA3       10  4000     S2
4   MADA4       15  5000     S2
5   MADA5       20  6000     S1

[[2]][[2]]
  station rainfall yield season
1   MADA1        0  2000     S1
2   MADA2        5  3000     S1
3   MADA3       10  4000     S2
4   MADA4       15  5000     S2
5   MADA5       20  6000     S1

[[2]][[3]]
  station rainfall yield season
1   MADA1        0  2000     S1
2   MADA2        5  3000     S1
3   MADA3       10  4000     S2
4   MADA4       15  5000     S2
5   MADA5       20  6000     S1


Answer:

A possible solution, based on rrapply::rrapply() (recursive apply), which applies f to every element whose name satisfies the condition:

# \(x) needs R >= 4.1; .xname is rrapply's name of the current element
res <- rrapply::rrapply(mainlist, condition = \(x, .xname) .xname %in%
                          c("station", "season"), f = \(x) as.factor(x))
str(res)

#> List of 2
#>  $ :List of 3
#>   ..$ :'data.frame': 5 obs. of  4 variables:
#>   .. ..$ station : Factor w/ 5 levels "MADA1","MADA2",..: 1 2 3 4 5
#>   .. ..$ rainfall: num [1:5] 0 5 10 15 20
#>   .. ..$ yield   : num [1:5] 2000 3000 4000 5000 6000
#>   .. ..$ season  : Factor w/ 2 levels "S1","S2": 1 1 2 2 1
#>   ..$ :'data.frame': 5 obs. of  4 variables:
#>   .. ..$ station : Factor w/ 5 levels "MADA1","MADA2",..: 1 2 3 4 5
#>   .. ..$ rainfall: num [1:5] 0 5 10 15 20
#>   .. ..$ yield   : num [1:5] 2000 3000 4000 5000 6000
#>   .. ..$ season  : Factor w/ 2 levels "S1","S2": 1 1 2 2 1
#> ...
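For comparison, the same recursive idea can be sketched in base R without knowing the nesting depth in advance. `convert_cols_recursive()` is a hypothetical helper written for this illustration, not rrapply's implementation: it stops at each data.frame and converts the named columns there, whereas rrapply matches any element by name. The example data repeats the question's df1 with `stringsAsFactors = FALSE` made explicit.

```r
# Hypothetical helper (not part of rrapply): walk an arbitrarily nested list
# and convert the named columns of every data.frame it finds to factors.
convert_cols_recursive <- function(x, cols) {
  if (is.data.frame(x)) {
    # Only touch the requested columns that exist in this data.frame
    present <- intersect(cols, names(x))
    x[present] <- lapply(x[present], as.factor)
    return(x)
  }
  if (is.list(x)) {
    # Recurse into each element; lapply() keeps the element names
    return(lapply(x, convert_cols_recursive, cols = cols))
  }
  x  # any other object is returned unchanged
}

df1 <- data.frame(station = c("MADA1", "MADA2", "MADA3", "MADA4", "MADA5"),
                  rainfall = c(0, 5, 10, 15, 20),
                  yield = c(2000, 3000, 4000, 5000, 6000),
                  season = c("S1", "S1", "S2", "S2", "S1"),
                  stringsAsFactors = FALSE)
mainlist <- list(list(df1, df1, df1), list(df1, df1, df1))

res <- convert_cols_recursive(mainlist, c("station", "season"))
str(res[[2]][[3]])
```

Because the helper recurses until it reaches a data.frame, it works unchanged if the list gains or loses a level of nesting.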

Answer:

A purrr approach, using map_depth() with depth 2 to reach the data.frames in the second list "layer":

library(dplyr); library(purrr)  # mutate(), across(), map_depth()
map_depth(mainlist, 2, ~ mutate(.x, across(c("station", "season"), as.factor)))
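Roughly, mapping at depth 2 corresponds to two nested lapply() calls in base R. The sketch below assumes the question's two-level structure; `to_factor()` is a hypothetical stand-in for the `mutate(across(...))` formula. It relies on lapply() forwarding extra arguments through `...`, so `lapply(mainlist, lapply, to_factor)` calls `lapply(mainlist[[i]], to_factor)` for each sub-list.

```r
# Same example data as in the question (stringsAsFactors = FALSE made explicit)
df1 <- data.frame(station = c("MADA1", "MADA2", "MADA3", "MADA4", "MADA5"),
                  rainfall = c(0, 5, 10, 15, 20),
                  yield = c(2000, 3000, 4000, 5000, 6000),
                  season = c("S1", "S1", "S2", "S2", "S1"),
                  stringsAsFactors = FALSE)
mainlist <- list(list(df1, df1, df1), list(df1, df1, df1))

cols <- c("station", "season")

# Hypothetical helper applied to each data.frame at depth 2
to_factor <- function(df) {
  df[cols] <- lapply(df[cols], as.factor)
  df
}

# The outer lapply() passes each sub-list to the inner lapply(), which
# receives to_factor through `...` and runs it once per data.frame
res <- lapply(mainlist, lapply, to_factor)
str(res[[1]][[1]])
```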

Answer:

You could also use nested lapply() calls, which is a bit more cumbersome but needs only base R:

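lapply(mainlist,
       function(.x, .cols) {
         # Inner lapply(): visit each data.frame in one sub-list
         lapply(.x, function(.y) {
           # Convert only the requested columns; leave the rest untouched
           .y[.cols] <- lapply(.y[.cols], as.factor)
           .y
         })
       },
       .cols = c("station", "season"))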
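Whichever approach you pick, a quick base-R check can confirm that the target columns are factors in every data.frame. `all_factors()` is a hypothetical helper for this sketch and assumes exactly two levels of nesting, as in the question's mainlist; the conversion step reuses the nested lapply() from this answer.

```r
df1 <- data.frame(station = c("MADA1", "MADA2", "MADA3", "MADA4", "MADA5"),
                  rainfall = c(0, 5, 10, 15, 20),
                  yield = c(2000, 3000, 4000, 5000, 6000),
                  season = c("S1", "S1", "S2", "S2", "S1"),
                  stringsAsFactors = FALSE)
mainlist <- list(list(df1, df1, df1), list(df1, df1, df1))

# Hypothetical helper: TRUE only if every column in `cols` of every
# data.frame at depth 2 is a factor
all_factors <- function(nested, cols) {
  all(vapply(nested, function(inner) {
    all(vapply(inner, function(df) all(vapply(df[cols], is.factor, logical(1))),
               logical(1)))
  }, logical(1)))
}

res <- lapply(mainlist, function(.x, .cols) {
  lapply(.x, function(.y) {
    .y[.cols] <- lapply(.y[.cols], as.factor)
    .y
  })
}, .cols = c("station", "season"))

all_factors(mainlist, c("station", "season"))  # FALSE: columns are character
all_factors(res, c("station", "season"))       # TRUE
sapply(res[[1]][[1]], class)  # station/season "factor", rainfall/yield "numeric"
```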