Add dataframe column containing minimum value of a list

Time:06-23

I have a data frame containing three columns, two of which can contain either single numeric values or lists of values. I would like to add columns containing the per-row min / max values of each of these two columns. For example, my data frame may look like:

df <- structure(list(ID = c(1L, 2L, 3L), A = structure(list(
    5, c(0.5, 0.6), 2), names = c("", "", "")), B = structure(list(
    c(0.2, 0.3), 6, c(0.1, 0.1)), names = c("", "", ""))), row.names = c(NA, 
3L), class = "data.frame")

I would like to mutate this to add the columns:

ID  A         B         min_A  max_A  min_B  max_B
1   5         0.2, 0.3  5      5      0.2    0.3
2   0.5, 0.6  6         0.5    0.6    6      6
3   2         0.1, 0.1  2      2      0.1    0.1

I have tried mutate(min_A = min(unlist(A))), but this seems to take the minimum value of the entire column of A rather than just the list on any given row. mutate(min_A = min(A)) errors out because list is an invalid argument type for the min command. So how might I go about adding the data I'm after?

CodePudding user response:

You should be able to get the answer by adding rowwise(), which makes mutate() evaluate each row separately. I also used across() in my answer, but that part isn't 100% necessary; it just saves repeating the same code for each column:

library(tidyverse)

df %>%
  rowwise() %>%
  mutate(across(A:B, function(x) min(unlist(x)), .names = "min_{.col}")) %>%
  mutate(across(A:B, function(x) max(unlist(x)), .names = "max_{.col}"))

# A tibble: 3 × 7
# Rowwise: 
     ID A         B         min_A min_B max_A max_B
  <dbl> <list>    <list>    <dbl> <dbl> <dbl> <dbl>
1     1 <dbl [1]> <dbl [2]>   5     0.2   5     0.3
2     2 <dbl [2]> <dbl [1]>   0.5   6     0.6   6  
3     3 <dbl [1]> <dbl [2]>   2     0.1   2     0.1
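
To see why the row-by-row evaluation matters, here is a minimal base R sketch. It rebuilds only the A column from the question (the names col_wide_min and per_row_min are made up for illustration) and compares flattening the whole column with taking the minimum of each element:

```r
# Rebuild a small version of the question's data (column A only).
df <- data.frame(ID = 1:3)
df$A <- list(5, c(0.5, 0.6), 2)

# unlist() flattens the entire list column into one vector,
# so min() returns a single value for the whole column.
col_wide_min <- min(unlist(df$A))

# Applying min() to each list element gives one value per row.
per_row_min <- sapply(df$A, min)

print(col_wide_min)
print(per_row_min)
```

Without rowwise(), mutate() sees the whole column at once, which corresponds to the first case; with rowwise(), each call sees one row's element, which corresponds to the second.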

CodePudding user response:

Base R with a loop:

cols <- c("A", "B")
for(col in cols){
  df[,paste0("min_", col)] <- sapply(df[,col], function(x) min(unlist(x)))
  df[,paste0("max_", col)] <- sapply(df[,col], function(x) max(unlist(x)))
}
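
A possible variation on the loop above, as a sketch rather than a replacement: vapply() can be used instead of sapply() so that the result is guaranteed to be a numeric vector with one value per row, and [[ ]] extracts the list column directly. The data frame is rebuilt here only to keep the example self-contained.

```r
# Rebuild the question's data with two list columns.
df <- data.frame(ID = 1:3)
df$A <- list(5, c(0.5, 0.6), 2)
df$B <- list(c(0.2, 0.3), 6, c(0.1, 0.1))

for (col in c("A", "B")) {
  # numeric(1) tells vapply() to expect exactly one number per element
  # and to raise an error otherwise.
  df[[paste0("min_", col)]] <- vapply(df[[col]], min, numeric(1))
  df[[paste0("max_", col)]] <- vapply(df[[col]], max, numeric(1))
}

print(df)
```

The stricter return type means an unexpected element (for example a character value) fails loudly instead of silently changing the column's type.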

CodePudding user response:

Using map with across

library(purrr)
library(dplyr)
df %>%
  mutate(across(A:B, ~ map_dbl(.x, min), .names = "min_{.col}"),
         across(A:B, ~ map_dbl(.x, max), .names = "max_{.col}"))

-output

  ID        A        B min_A min_B max_A max_B
1  1        5 0.2, 0.3   5.0   0.2   5.0   0.3
2  2 0.5, 0.6        6   0.5   6.0   0.6   6.0
3  3        2 0.1, 0.1   2.0   0.1   2.0   0.1