Home > Blockchain >  How to find the max and min values of string rows of a dataframe in R?
How to find the max and min values of string rows of a dataframe in R?

Time:01-13

For each row of my data, I want to get the min and max values which are originally stored as a character. For example, consider the following data:

df <- data.frame(id=c(1:3),
                 yr=c("2000,2009,1999,2022","2019,2018,2006,2007","1998,2012,2000,2020"))

Output needed:

id                   yr  min_yr    max_yr
1   2000,2009,1999,2022    1999      2022
2   2019,2018,2006,2007    2006      2019
3   1998,2012,2000,2020    1998      2020

CodePudding user response:

This will work also for years like 860, 1543, 2023, ...

df[c("min_yr", "max_yr")] <-
   t(sapply(strsplit(df$yr, ","), \(x) range(as.numeric(x))))

df
#  id                  yr min_yr max_yr
#1  1 2000,2009,1999,2022   1999   2022
#2  2 2019,2018,2006,2007   2006   2019
#3  3 1998,2012,2000,2020   1998   2020

CodePudding user response:

Here's one-liner in base R that also works on any number.

df[c('min_yr', 'max_yr')] <- t(sapply(df$yr, \(x) range(scan(text=x, sep = ','))))

Resulting in

df
#>   id                  yr min_yr max_yr
#> 1  1 2000,2009,1999,2022   1999   2022
#> 2  2 2019,2018,2006,2007   2006   2019
#> 3  3 1998,2012,2000,2020   1998   2020

CodePudding user response:

df$min_yr=as.numeric(unlist(lapply(strsplit(df$yr,","),min)))
df$max_yr=as.numeric(unlist(lapply(strsplit(df$yr,","),max)))

  id                  yr min_yr max_yr
1  1 2000,2009,1999,2022   1999   2022
2  2 2019,2018,2006,2007   2006   2019
3  3 1998,2012,2000,2020   1998   2020

CodePudding user response:

using dplyr and purrr:

library(dplyr)
library(purrr)
mutate(df, strsplit(yr, ",") |>
           map(as.numeric) |>
           map(range) |>
           map_dfr(setNames, c("min", "max")))

##>   id                  yr  min  max
##> 1  1 2000,2009,1999,2022 1999 2022
##> 2  2 2019,2018,2006,2007 2006 2019
##> 3  3 1998,2012,2000,2020 1998 2020

CodePudding user response:

library(stringr)
library(dplyr)
df %>% 
  rowwise() %>% 
  mutate(min_yr = min(as.numeric(str_split_1(yr, ","))),
         max_yr = max(as.numeric(str_split_1(yr, ","))))
 
     id yr                  min_yr max_yr
  <int> <chr>                <dbl>  <dbl>
1     1 2000,2009,1999,2022   1999   2022
2     2 2019,2018,2006,2007   2006   2019
3     3 1998,2012,2000,2020   1998   2020

CodePudding user response:

Using pmin/pmax from base R - read the yr column with read.csv to create a data.frame and then use pmin/pmax

d1 <- read.csv(text = df$yr, header = FALSE)
df$min_yr <- do.call(pmin, d1)
df$max_yr <- do.call(pmax, d1)

-output

> df
  id                  yr min_yr max_yr
1  1 2000,2009,1999,2022   1999   2022
2  2 2019,2018,2006,2007   2006   2019
3  3 1998,2012,2000,2020   1998   2020
  • Related