For each row of my data, I want to get the min and max values which are originally stored as a character. For example, consider the following data:
df <- data.frame(id=c(1:3),
yr=c("2000,2009,1999,2022","2019,2018,2006,2007","1998,2012,2000,2020"))
Output needed:
id yr min_yr max_yr
1 2000,2009,1999,2022 1999 2022
2 2019,2018,2006,2007 2006 2019
3 1998,2012,2000,2020 1998 2020
CodePudding user response:
This will work also for years like 860
, 1543
, 2023
, ...
df[c("min_yr", "max_yr")] <-
t(sapply(strsplit(df$yr, ","), \(x) range(as.numeric(x))))
df
# id yr min_yr max_yr
#1 1 2000,2009,1999,2022 1999 2022
#2 2 2019,2018,2006,2007 2006 2019
#3 3 1998,2012,2000,2020 1998 2020
CodePudding user response:
Here's one-liner in base R that also works on any number.
df[c('min_yr', 'max_yr')] <- t(sapply(df$yr, \(x) range(scan(text=x, sep = ','))))
Resulting in
df
#> id yr min_yr max_yr
#> 1 1 2000,2009,1999,2022 1999 2022
#> 2 2 2019,2018,2006,2007 2006 2019
#> 3 3 1998,2012,2000,2020 1998 2020
CodePudding user response:
df$min_yr=as.numeric(unlist(lapply(strsplit(df$yr,","),min)))
df$max_yr=as.numeric(unlist(lapply(strsplit(df$yr,","),max)))
id yr min_yr max_yr
1 1 2000,2009,1999,2022 1999 2022
2 2 2019,2018,2006,2007 2006 2019
3 3 1998,2012,2000,2020 1998 2020
CodePudding user response:
using dplyr
and purrr
:
library(dplyr)
library(purrr)
mutate(df, strsplit(yr, ",") |>
map(as.numeric) |>
map(range) |>
map_dfr(setNames, c("min", "max")))
##> id yr min max
##> 1 1 2000,2009,1999,2022 1999 2022
##> 2 2 2019,2018,2006,2007 2006 2019
##> 3 3 1998,2012,2000,2020 1998 2020
CodePudding user response:
library(stringr)
library(dplyr)
df %>%
rowwise() %>%
mutate(min_yr = min(as.numeric(str_split_1(yr, ","))),
max_yr = max(as.numeric(str_split_1(yr, ","))))
id yr min_yr max_yr
<int> <chr> <dbl> <dbl>
1 1 2000,2009,1999,2022 1999 2022
2 2 2019,2018,2006,2007 2006 2019
3 3 1998,2012,2000,2020 1998 2020
CodePudding user response:
Using pmin/pmax
from base R
- read the yr column with read.csv
to create a data.frame and then use pmin/pmax
d1 <- read.csv(text = df$yr, header = FALSE)
df$min_yr <- do.call(pmin, d1)
df$max_yr <- do.call(pmax, d1)
-output
> df
id yr min_yr max_yr
1 1 2000,2009,1999,2022 1999 2022
2 2 2019,2018,2006,2007 2006 2019
3 3 1998,2012,2000,2020 1998 2020