I want an indicator variable that tells me if the date is the lowest or highest when its group by ID. However, I don't want it to count anything that has practice as 1. This is what the data frame looks now and what I want it to look like.
ID | date | practice |
---|---|---|
1 | 02-27-2020 | 1 |
1 | 04-21-2021 | 0 |
1 | 06-24-2022 | 0 |
2 | 03-21-2019 | 0 |
2 | 09-19-2020 | 0 |
2 | 01-21-2021 | 0 |
End product:
ID | date | practice | lowest | highest |
---|---|---|---|---|
1 | 02-27-2020 | 1 | 0 | 0 |
1 | 04-21-2021 | 0 | 1 | 0 |
1 | 06-24-2022 | 0 | 0 | 1 |
2 | 03-21-2019 | 0 | 1 | 0 |
2 | 09-19-2020 | 0 | 0 | 0 |
2 | 01-21-2021 | 0 | 0 | 1 |
CodePudding user response:
This code delivered the desired solution using the tidyverse
. Note that I had to force date
into a date format; by default the mm-dd-YYYY format gets read as character, which causes min(date)
and max(date)
to read different values as min and max.
x<-'
ID date practice
1 02-27-2020 1
1 04-21-2021 0
1 06-24-2022 0
2 03-21-2019 0
2 09-19-2020 0
2 01-21-2021 0'
df1 <- read.table(textConnection(x), header = TRUE)
library(tidyverse)
df1$date <- as.Date(df1$date, format = "%m-%d-%Y")
desired_result <- df1 %>%
group_by(ID) %>%
mutate(
lowest = ifelse(date == min(date[practice == 0]), 1, 0),
highest = ifelse(date == max(date[practice == 0]), 1, 0)
)
desired_result
# A tibble: 6 × 5
# Groups: ID [2]
ID date practice lowest highest
<int> <date> <int> <dbl> <dbl>
1 1 2020-02-27 1 0 0
2 1 2021-04-21 0 1 0
3 1 2022-06-24 0 0 1
4 2 2019-03-21 0 1 0
5 2 2020-09-19 0 0 0
6 2 2021-01-21 0 0 1
CodePudding user response:
Here is a base R solution with ave
.
x<-'
ID date practice
1 02-27-2020 1
1 04-21-2021 0
1 06-24-2022 0
2 03-21-2019 0
2 09-19-2020 0
2 01-21-2021 0'
df1 <- read.table(textConnection(x), header = TRUE)
df1$date <- as.Date(df1$date, "%m-%d-%Y")
y1 <- with(df1, ave(as.integer(date), ID, practice, FUN = \(x) {
if(length(x))
min(x) == x
else NULL
}))
y2 <- with(df1, ave(as.integer(date), ID, practice, FUN = \(x) {
if(length(x))
max(x) == x
else NULL
}))
df1$lowest <- as.integer(y1 & (df1$practice != 1))
df1$highest <- as.integer(y2 & (df1$practice != 1))
df1
#> ID date practice lowest highest
#> 1 1 2020-02-27 1 0 0
#> 2 1 2021-04-21 0 1 0
#> 3 1 2022-06-24 0 0 1
#> 4 2 2019-03-21 0 1 0
#> 5 2 2020-09-19 0 0 0
#> 6 2 2021-01-21 0 0 1
Created on 2022-04-25 by the reprex package (v2.0.1)
CodePudding user response:
A data.table approach:
f <- function(x,p) list(1*(x==min(x[p!=1])), 1*(x==max(x[p!=1])))
setDT(df)[,date:=as.IDate(date, "%m-%d-%Y")][,c("lowest","highest"):=f(date,practice), by=ID][]
Output:
ID date practice lowest highest
<int> <IDat> <int> <num> <num>
1: 1 2020-02-27 1 0 0
2: 1 2021-04-21 0 1 0
3: 1 2022-06-24 0 0 1
4: 2 2019-03-21 0 1 0
5: 2 2020-09-19 0 0 0
6: 2 2021-01-21 0 0 1
Input:
structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 2L), date = c("02-27-2020",
"04-21-2021", "06-24-2022", "03-21-2019", "09-19-2020", "01-21-2021"
), practice = c(1L, 0L, 0L, 0L, 0L, 0L)), row.names = c(NA, -6L
), class = "data.frame")