I have the following dataframe
df =
id date medication related_medication
1 2017-02-18 A yes
1 2017-02-07 D yes
2 2017-02-18 S yes
2 2017-02-18 F no
3 2017-02-18 A yes
3 2017-02-01 F yes
I want to keep only the minimum date on which related medications appeared per id. In the example above, only individuals 1 and 3 have two related medications (related_medication takes the value yes on both of their rows). For those ids I would like to keep only the row with the earliest date. The resulting table should look like the following:
df =
id date medication related_medication
1 2017-02-07 D yes
2 2017-02-18 S yes
2 2017-02-18 F no
3 2017-02-01 F yes
So far I have tried:
df_final <- df %>%
slice(which.min(date))
but I can't find a way to apply this operation only when a certain condition is met, i.e. related_medication == "yes".
cheers
CodePudding user response:
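You can use slice_min after grouping by both id and related_medication, so the minimum date is taken separately for the "yes" and "no" rows of each id: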
library(dplyr)
df %>%
group_by(id, related_medication) %>%
slice_min(date)
output
id date medication related_medication
1 1 2017-02-07 D yes
2 2 2017-02-18 F no
3 2 2017-02-18 S yes
4 3 2017-02-01 F yes
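For anyone who wants to run this end to end, here is a minimal reproducible sketch. It assumes date is stored as a Date (the question does not say how the column is typed), and it passes with_ties = FALSE, which is my addition: by default slice_min keeps every row tied for the minimum, so leave it out if you want ties returned.

```r
library(dplyr)

# Rebuild the example data from the question; dates are parsed as Date
# (an assumption) so they compare chronologically.
df <- data.frame(
  id = c(1, 1, 2, 2, 3, 3),
  date = as.Date(c("2017-02-18", "2017-02-07", "2017-02-18",
                   "2017-02-18", "2017-02-18", "2017-02-01")),
  medication = c("A", "D", "S", "F", "A", "F"),
  related_medication = c("yes", "yes", "yes", "no", "yes", "yes")
)

# One row per (id, related_medication) group: the earliest date.
result <- df %>%
  group_by(id, related_medication) %>%
  slice_min(date, n = 1, with_ties = FALSE) %>%
  ungroup()

print(result)
```

The ungroup() at the end just avoids carrying the grouping into later steps.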
If you want to keep all observations where related_medication == "no" and only the earliest "yes" row per id, compare each row's date against the minimum "yes" date of its group:
df %>%
group_by(id) %>%
filter(related_medication == "no" |
date == min(date[related_medication == "yes"]))
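To see where this differs from the grouped slice_min, here is a small sketch on made-up data (df2 and its extra medication G are hypothetical, not from the question) in which id 2 has two "no" rows on different dates. Every "no" row is kept, while only the earliest "yes" row survives per id. One caveat I would expect: for an id with no "yes" rows at all, min() is called on an empty vector and warns, although the "no" rows are still kept.

```r
library(dplyr)

# Hypothetical data: id 2 has two "no" rows with different dates.
df2 <- data.frame(
  id = c(1, 1, 2, 2, 2),
  date = as.Date(c("2017-02-18", "2017-02-07", "2017-02-18",
                   "2017-02-18", "2017-02-10")),
  medication = c("A", "D", "S", "F", "G"),
  related_medication = c("yes", "yes", "yes", "no", "no")
)

# Keep every "no" row; among "yes" rows keep only those on the
# earliest "yes" date within the same id.
kept <- df2 %>%
  group_by(id) %>%
  filter(
    related_medication == "no" |
      date == min(date[related_medication == "yes"])
  ) %>%
  ungroup()

print(kept)
```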
CodePudding user response:
Using data.table
library(data.table)
setDT(df)[, .SD[which.min(date)], .(id, related_medication)]
output
id related_medication date medication
<int> <char> <Date> <char>
1: 1 yes 2017-02-07 D
2: 2 yes 2017-02-18 S
3: 2 no 2017-02-18 F
4: 3 yes 2017-02-01 F
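A self-contained version of the data.table approach, again assuming date is a Date column. It builds a data.table directly instead of calling setDT on an existing data frame, which is just a convenience for the sketch. Note that which.min returns only the first minimum, so ties within a group yield a single row.

```r
library(data.table)

# Example data from the question, built directly as a data.table.
dt <- data.table(
  id = c(1L, 1L, 2L, 2L, 3L, 3L),
  date = as.Date(c("2017-02-18", "2017-02-07", "2017-02-18",
                   "2017-02-18", "2017-02-18", "2017-02-01")),
  medication = c("A", "D", "S", "F", "A", "F"),
  related_medication = c("yes", "yes", "yes", "no", "yes", "yes")
)

# For each (id, related_medication) group, keep the row with the
# earliest date; .SD is the subset of rows belonging to the group.
res <- dt[, .SD[which.min(date)], by = .(id, related_medication)]

print(res)
```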