for loop to change columns with a specified unique length to factor in multiple dataframes

Time:10-05

I have several dataframes in which I need to fix the classes of multiple columns before I can proceed. The dataframes all have the same variables, but the column classes differ from one dataframe to the next, so I figured I would use a for loop and specify the number of unique values below which a column should be coded as factor rather than numeric.

I tried the following for factor:

dataframes <- list(dataframe1, dataframe2, dataframe2, dataframe3)

for (i in dataframes){

cols.to.factor <-sapply(i, function(col) length(unique(col)) < 6)

i[cols.to.factor] <- apply(i[cols.to.factor] , factor)
}

Now the code runs, but it doesn't change anything in the original dataframes. What am I missing? Thanks in advance for the help!

CodePudding user response:

library(tidyverse)
# example data
list(
  iris,
  iris %>% mutate(Sepal.Length = Sepal.Length %>% as.character())
) %>%
  # unify column classes
  map(~ .x %>% mutate(across(everything(), as.character))) %>%
  # optional joining if wished
  bind_rows() %>%
  mutate(Species = Species %>% as.factor()) %>%
  as_tibble()
#> # A tibble: 300 x 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>    <chr>        <chr>       <chr>        <chr>       <fct>  
#>  1 5.1          3.5         1.4          0.2         setosa 
#>  2 4.9          3           1.4          0.2         setosa 
#>  3 4.7          3.2         1.3          0.2         setosa 
#>  4 4.6          3.1         1.5          0.2         setosa 
#>  5 5            3.6         1.4          0.2         setosa 
#>  6 5.4          3.9         1.7          0.4         setosa 
#>  7 4.6          3.4         1.4          0.3         setosa 
#>  8 5            3.4         1.5          0.2         setosa 
#>  9 4.4          2.9         1.4          0.2         setosa 
#> 10 4.9          3.1         1.5          0.1         setosa 
#> # … with 290 more rows

Created on 2021-10-05 by the reprex package (v2.0.1)
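
If the goal is to keep the question's rule (columns with fewer than 6 unique values become factors) rather than converting everything to character, a minimal tidyverse sketch could look like the following. The example list and the names dfs and fixed are made up for illustration, and the threshold of 6 is taken from the question.

```r
library(dplyr)
library(purrr)

# Hypothetical example list: same variables, different classes for Species
dfs <- list(
  a = iris,
  b = iris %>% mutate(Species = as.character(Species))
)

# In each data frame, convert every column with fewer than 6 unique values to factor
fixed <- map(
  dfs,
  ~ .x %>% mutate(across(where(function(col) n_distinct(col) < 6), factor))
)

# Species is now a factor in both data frames; the measurement columns stay numeric
map_chr(fixed, ~ class(.x$Species))
```

Because map() returns a new list, the result has to be assigned (here to fixed) for the change to persist.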

CodePudding user response:

The instruction

for(i in dataframes)

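assigns each element of the list dataframes to i in turn, so the loop body modifies a copy that is never written back to the list. Note also that apply expects a MARGIN argument and is meant for matrices; lapply is the appropriate function for applying factor to each selected column. One way to correct the problem is to loop over the indices and assign the modified data frame back: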

for (i in seq_along(dataframes)) {
  x <- dataframes[[i]]
  cols.to.factor <- sapply(x, function(col) length(unique(col)) < 6)
  x[cols.to.factor] <- lapply(x[cols.to.factor], factor)
  dataframes[[i]] <- x
}

An equivalent lapply-based solution (the \(x) shorthand for function(x) requires R 4.1 or later) is

dataframes <- lapply(dataframes, \(x){
  cols.to.factor <- sapply(x, function(col) length(unique(col)) < 6)
  x[cols.to.factor] <- lapply(x[cols.to.factor], factor)
  x
})
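
To see the copy behaviour in isolation, here is a small self-contained sketch. The toy data frames df_a and df_b and the helper to_factor are hypothetical names invented for this example; the threshold of 6 comes from the question.

```r
# Toy data (hypothetical): same variables, different column classes
df_a <- data.frame(group = rep(c("a", "b"), 4),
                   value = seq(1.5, 8.5, by = 1),
                   stringsAsFactors = FALSE)
df_b <- data.frame(group = rep(c(1, 2), 4),
                   value = as.character(1:8),
                   stringsAsFactors = FALSE)
dataframes <- list(df_a, df_b)

# Looping over the elements changes only the loop variable, not the list
for (d in dataframes) {
  d$group <- factor(d$group)
}
class(dataframes[[1]]$group)  # still "character"

# Hypothetical helper: columns with fewer than max_unique unique values become factors
to_factor <- function(x, max_unique = 6) {
  cols <- vapply(x, function(col) length(unique(col)) < max_unique, logical(1))
  x[cols] <- lapply(x[cols], factor)
  x
}

# Assigning the result of lapply keeps the modified data frames
fixed <- lapply(dataframes, to_factor)
sapply(fixed, function(x) class(x$group))  # "factor" for both
```

In this sketch value has 8 unique values in both data frames, so it is left untouched, while group (2 unique values) is converted.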