How to get and work with the information from a column through a for loop in R?

Time:11-09

I have a list of column names, and I intend to run a for loop to get the data in each column so that I can work with it later.

For instance, imagine that I have this data frame:

> mtcars
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4

I only want to work with some of the columns, so I create a vector with the ones that I want:

MyList <- c("mpg", "cyl")

For each of these columns, I want to calculate the min and max values, and I want to wrap the code in a function.

I created this function, but it doesn't work because it doesn't recognize the columns:

my_func <- function(DF, MyList){
  for(element in MyList){
    print(DF$element) # it doesn't work

    print(c(min(DF$element), max(DF$element))) # it doesn't work
  }
}

# Calling the function
my_func(DF = mtcars, MyList = MyList)

What I get:

NULL
[1]  Inf -Inf
NULL
[1]  Inf -Inf
Warning messages:
1: In min(DF$element) : no non-missing arguments to min; returning Inf
2: In max(DF$element) : no non-missing arguments to max; returning -Inf
3: In min(DF$element) : no non-missing arguments to min; returning Inf
4: In max(DF$element) : no non-missing arguments to max; returning -Inf

I know that I can use dplyr::select(DF, element) to select the columns that I want, but, although I save it in a variable, I cannot continue with the next step (calculating the min and max).

I would like to have something like this (but for every column that I have in the list):

print(c(min(mtcars$mpg), max(mtcars$mpg)))
[1] 10.4 33.9

Could someone help me with this?

Thanks very much in advance

Regards

CodePudding user response:

You could use across and from there you can reshape to your liking:

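library(dplyr)
library(tidyr)  # only needed for pivot_longer() in the second variant

# all_of() selects the columns named in the character vector MyList
mtcars %>%
  summarize(across(all_of(MyList), list(min = min, max = max), .names = "{.col}_{.fn}"))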

which gives:

  mpg_min mpg_max cyl_min cyl_max
1    10.4    33.9       4       8

Or:

mtcars %>%
  summarize(across(all_of(MyList), list(min = min, max = max), .names = "{.col}_{.fn}")) %>%
  pivot_longer(everything(), names_to = c("column", "stat"), names_sep = "_")

which gives:

# A tibble: 4 x 3
  column stat  value
  <chr>  <chr> <dbl>
1 mpg    min    10.4
2 mpg    max    33.9
3 cyl    min     4  
4 cyl    max     8  
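
One caveat with names_sep = "_": it splits on every underscore, so it only works cleanly when the column names themselves contain no underscores (true for mpg and cyl). Here is a minimal sketch of a more defensive variant, assuming dplyr and tidyr are installed. The data frame df and the vector cols are made up for illustration; names_pattern is a regular pivot_longer() argument.

```r
library(dplyr)
library(tidyr)

# Hypothetical data frame whose column names contain underscores
df <- data.frame(fuel_eff = c(21.0, 22.8, 18.7), n_cyl = c(6, 4, 8))
cols <- c("fuel_eff", "n_cyl")

res <- df %>%
  summarise(across(all_of(cols), list(min = min, max = max),
                   .names = "{.col}_{.fn}")) %>%
  # Capture everything before the final "_min" / "_max" as the column name
  pivot_longer(everything(),
               names_to = c("column", "stat"),
               names_pattern = "^(.*)_(min|max)$")

print(res)
```

The regular expression anchors on the known statistic names, so the underscores inside fuel_eff and n_cyl are left alone.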

CodePudding user response:

Maybe:

library(tidyverse)
map(MyList,  ~ mtcars %>% select(all_of(.x)) %>% 
      summarise(across(everything(), list(~min(.), ~max(.)))))
# [[1]]
#   mpg_1 mpg_2
# 1  10.4  33.9

# [[2]]
#   cyl_1 cyl_2
# 1     4     8

CodePudding user response:

Try this; it worked for me:

library(dplyr)  # select(), summarize() and %>% come from dplyr, not tidyr
mtcars %>%
  select(mpg, cyl) %>%
  summarize(min_mpg = min(mpg),
            max_mpg = max(mpg),
            min_cyl = min(cyl),
            max_cyl = max(cyl))

CodePudding user response:

One way to do this is to write a function that returns the min and max values of a vector:

myMinMax <- function(x) c(min = min(x), max = max(x))

and then apply it (see ?apply) to the columns you are interested in, i.e. mtcars[, MyList]:

apply(mtcars[, MyList], 2, myMinMax)
     mpg cyl
min 10.4   4
max 33.9   8
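
Note that apply() first converts the data frame to a matrix, which is fine here because both columns are numeric but would turn everything into character if a non-numeric column were selected. A minimal base-R sketch that iterates over the columns directly instead, using vapply(); MyList and myMinMax are redefined only so that the block runs on its own:

```r
MyList <- c("mpg", "cyl")
myMinMax <- function(x) c(min = min(x), max = max(x))

# mtcars[MyList] is still a data frame (a list of columns), so vapply()
# calls myMinMax once per column; FUN.VALUE declares that each result
# must be a numeric vector of length 2
res <- vapply(mtcars[MyList], myMinMax, FUN.VALUE = c(min = 0, max = 0))
print(res)
```

The result is a numeric matrix with rows min and max and one column per selected variable, the same shape as the apply() output.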

A tidyverse solution might be more readable:

library(tidyverse)
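mtcars %>% select(all_of(MyList)) %>% summarise(across(all_of(MyList), list(min = min, max = max)))

You could also mix and match the two approaches. Because myMinMax returns two values per column, dplyr 1.1.0 and later expects reframe() here rather than summarise():

mtcars %>% select(all_of(MyList)) %>% reframe(across(all_of(MyList), myMinMax))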

CodePudding user response:

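The $ operator does not evaluate what comes after it, so DF$element looks for a column literally named "element" instead of the column whose name is stored in the variable element. Indexing with [ (or [[) uses the variable's value, so the solution that I found is: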

my_func <- function(DF, MyList){
  for(element in MyList){
    print(DF[,element])
    print(c(min(DF[,element]), max(DF[,element])))
  }
}
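
To make the difference concrete, here is a small base-R sketch. It also shows a variant that returns the results instead of printing them, which may be handier if you want to keep working with the values. col_ranges is a hypothetical helper name, not something from the question.

```r
MyList <- c("mpg", "cyl")
element <- "mpg"

# `$` takes the name literally: mtcars has no column called "element"
is.null(mtcars$element)       # TRUE

# `[[` evaluates the variable and fetches the column it names
head(mtcars[[element]], 3)    # 21.0 21.0 22.8

# Hypothetical helper: returns a named list with c(min, max) per column
col_ranges <- function(DF, cols) {
  out <- lapply(cols, function(col) range(DF[[col]]))
  names(out) <- cols
  out
}

ranges <- col_ranges(mtcars, MyList)
print(ranges)
```

DF[[col]] always returns a plain vector, whereas DF[, col] returns a one-column tibble if DF happens to be a tibble, so [[ is the safer choice inside a function.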