Home > database >  How to find the max and min number of string rows of dataframe in R?
How to find the max and min number of string rows of dataframe in R?

Time:01-13

For each row of my data, I want to get the min and max values which are originally stored as a character. For example, consider the following data:

df <- data.frame(id=c(1:3),
                 yr=c("2000,2009,1999,2022","2019,2018,2006,2007","1998,2012,2000,2020"))

Output needed:

id                   yr  min_yr    max_yr
1   2000,2009,1999,2022    1999      2022
2   2019,2018,2006,2007    2006      2019
3   1998,2012,2000,2020    1998      2020

CodePudding user response:

This will work also for years like 860, 1543, 2023, ...

df[c("min_yr", "max_yr")] <-
   t(sapply(strsplit(df$yr, ","), \(x) range(as.numeric(x))))

df
#  id                  yr min_yr max_yr
#1  1 2000,2009,1999,2022   1999   2022
#2  2 2019,2018,2006,2007   2006   2019
#3  3 1998,2012,2000,2020   1998   2020

CodePudding user response:

Here's one-liner in base R that also works on any number.

df[c('min_yr', 'max_yr')] <- t(sapply(df$yr, \(x) range(scan(text=x, sep = ','))))

Resulting in

df
#>   id                  yr min_yr max_yr
#> 1  1 2000,2009,1999,2022   1999   2022
#> 2  2 2019,2018,2006,2007   2006   2019
#> 3  3 1998,2012,2000,2020   1998   2020

CodePudding user response:

df$min_yr=as.numeric(unlist(lapply(strsplit(df$yr,","),min)))
df$max_yr=as.numeric(unlist(lapply(strsplit(df$yr,","),max)))

  id                  yr min_yr max_yr
1  1 2000,2009,1999,2022   1999   2022
2  2 2019,2018,2006,2007   2006   2019
3  3 1998,2012,2000,2020   1998   2020

CodePudding user response:

using dplyr and purrr:

library(dplyr)
library(purrr)
mutate(df, strsplit(yr, ",") |>
           map(as.numeric) |>
           map(range) |>
           map_dfr(setNames, c("min", "max")))

##>   id                  yr  min  max
##> 1  1 2000,2009,1999,2022 1999 2022
##> 2  2 2019,2018,2006,2007 2006 2019
##> 3  3 1998,2012,2000,2020 1998 2020

CodePudding user response:

library(stringr)
library(dplyr)
df %>% 
  rowwise() %>% 
  mutate(min_yr = min(as.numeric(str_split_1(yr, ","))),
         max_yr = max(as.numeric(str_split_1(yr, ","))))
 
     id yr                  min_yr max_yr
  <int> <chr>                <dbl>  <dbl>
1     1 2000,2009,1999,2022   1999   2022
2     2 2019,2018,2006,2007   2006   2019
3     3 1998,2012,2000,2020   1998   2020

CodePudding user response:

Using pmin/pmax from base R - read the yr column with read.csv to create a data.frame and then use pmin/pmax

d1 <- read.csv(text = df$yr, header = FALSE)
df$min_yr <- do.call(pmin, d1)
df$max_yr <- do.call(pmax, d1)

-output

> df
  id                  yr min_yr max_yr
1  1 2000,2009,1999,2022   1999   2022
2  2 2019,2018,2006,2007   2006   2019
3  3 1998,2012,2000,2020   1998   2020
  • Related