Add dataframe column containing minimum value of a list

I have a dataframe containing three columns, two of which can contain either single numeric values or lists of values. I would like to add columns containing the min / max values of each of these two columns, computed row by row. For example, my data frame might look like this:

df <- structure(list(ID = c(1L, 2L, 3L), A = structure(list(
    5, c(0.5, 0.6), 2), names = c("", "", "")), B = structure(list(
    c(0.2, 0.3), 6, c(0.1, 0.1)), names = c("", "", ""))), row.names = c(NA, 
3L), class = "data.frame")

I would like to mutate this to add the following columns:

ID	A	B	min_A	max_A	min_B	max_B
1	5	0.2, 0.3	5	5	0.2	0.3
2	0.5, 0.6	6	0.5	0.6	6	6
3	2	0.1, 0.1	2	2	0.1	0.1

I have tried mutate(min_A = min(unlist(A))), but this seems to take the minimum value of the entire column of A rather than just the list on any given row. mutate(min_A = min(A)) errors out because list is an invalid argument type for the min command. So how might I go about adding the data I'm after?

CodePudding user response：

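You should be able to get the answer by adding rowwise(), which makes mutate() evaluate each row separately instead of the whole column at once. I also used across() in my answer; that part isn't strictly necessary, it just avoids repeating the same code for each column: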

library(tidyverse)

df %>%
  rowwise() %>%
  mutate(across(A:B, function(x) min(unlist(x)), .names = "min_{.col}")) %>%
  mutate(across(A:B, function(x) max(unlist(x)), .names = "max_{.col}"))

# A tibble: 3 × 7
# Rowwise: 
     ID A         B         min_A min_B max_A max_B
  <dbl> <list>    <list>    <dbl> <dbl> <dbl> <dbl>
1     1 <dbl [1]> <dbl [2]>   5     0.2   5     0.3
2     2 <dbl [2]> <dbl [1]>   0.5   6     0.6   6  
3     3 <dbl [1]> <dbl [2]>   2     0.1   2     0.1
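
To see why the original attempt returned one value for the whole column, it may help to look at what unlist() does to a list column outside of mutate(). The following is a minimal base R sketch using the df from the question; the variable names (flat_A, col_min_A, row_min_A) are made up for illustration:

```r
# Data frame from the question: A and B are list columns
df <- structure(list(ID = c(1L, 2L, 3L), A = structure(list(
    5, c(0.5, 0.6), 2), names = c("", "", "")), B = structure(list(
    c(0.2, 0.3), 6, c(0.1, 0.1)), names = c("", "", ""))), row.names = c(NA,
3L), class = "data.frame")

# Without rowwise(), mutate() hands the whole column to the expression.
# unlist() then flattens every row's values into a single vector:
flat_A <- unlist(df$A)       # values 5, 0.5, 0.6, 2
col_min_A <- min(flat_A)     # a single value for the entire column

# Taking the minimum of each row's element separately (which is what
# rowwise() arranges) gives one result per row instead:
row_min_A <- c(min(df$A[[1]]), min(df$A[[2]]), min(df$A[[3]]))
```

Here col_min_A is a single number that mutate() would recycle into every row, while row_min_A has one entry per row, which is what the question asks for.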

CodePudding user response：

Base R with a loop:

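cols <- c("A", "B")
for (col in cols) {
  # df[[col]] extracts the list column itself (this also works if df is a tibble),
  # so sapply() visits one row's values at a time
  df[[paste0("min_", col)]] <- sapply(df[[col]], function(x) min(unlist(x)))
  df[[paste0("max_", col)]] <- sapply(df[[col]], function(x) max(unlist(x)))
}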
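
As a variation on the loop above, vapply() can be used in place of sapply() so that R raises an error if any row fails to produce exactly one number. This is only a sketch, assuming the df from the question; the expected values in the check at the end are copied from the desired table in the question:

```r
# Data frame from the question: A and B are list columns
df <- structure(list(ID = c(1L, 2L, 3L), A = structure(list(
    5, c(0.5, 0.6), 2), names = c("", "", "")), B = structure(list(
    c(0.2, 0.3), 6, c(0.1, 0.1)), names = c("", "", ""))), row.names = c(NA,
3L), class = "data.frame")

cols <- c("A", "B")
for (col in cols) {
  # numeric(1) declares that each row must yield a single numeric value
  df[[paste0("min_", col)]] <- vapply(df[[col]], min, numeric(1), USE.NAMES = FALSE)
  df[[paste0("max_", col)]] <- vapply(df[[col]], max, numeric(1), USE.NAMES = FALSE)
}

# Compare against the desired table from the question
stopifnot(
  all.equal(df$min_A, c(5, 0.5, 2)),
  all.equal(df$max_A, c(5, 0.6, 2)),
  all.equal(df$min_B, c(0.2, 6, 0.1)),
  all.equal(df$max_B, c(0.3, 6, 0.1))
)
```

The stricter return-type check is the main reason to prefer vapply() here; for this particular data the results are the same as with sapply().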

CodePudding user response：

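Using map_dbl() with across(). map_dbl() applies min / max to each element of the list column, so rowwise() is not needed: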

library(purrr)
library(dplyr)
df %>%
  mutate(across(A:B, ~ map_dbl(.x, min), .names = "min_{.col}"),
         across(A:B, ~ map_dbl(.x, max), .names = "max_{.col}"))

Output:

 ID        A        B min_A min_B max_A max_B
1  1        5 0.2, 0.3   5.0   0.2   5.0   0.3
2  2 0.5, 0.6        6   0.5   6.0   0.6   6.0
3  3        2 0.1, 0.1   2.0   0.1   2.0   0.1