For one of my analyses I would like to evaluate the course of hemoglobin levels (art_hb) of different subjects (id) over a certain duration (nmp_time; time in the example data). We would like to put the ids in different categories of hemoglobin level (0-3; 3-6; 6-9 and >9), based on each subject's first measurement. So if a subject's hemoglobin level changes over time, we don't want the subject to switch categories.
My example data:
df <- data.frame(id=factor(c(1,1,1,2,2,2)), time=c(0,30,60,0,30,60), art_hb=c(5.8,6.1,5.9,6.7,6.9,NA))
So far I've managed to create the categories based on the hemoglobin measurement at time == 0:
df$art_hb_cat <- ifelse(df$art_hb < 3 & df$time == 0, "0-3", ifelse(df$art_hb >= 3 & df$art_hb < 6 & df$time == 0, "3-6", ifelse(df$art_hb >= 6 & df$art_hb < 9 & df$time == 0, "6-9", ifelse(df$art_hb >= 9 & df$time == 0, ">9", ""))))
Which leads to: df <- data.frame(id=factor(c(1,1,1,2,2,2)), time=c(0,30,60,0,30,60), art_hb=c(5.8,6.1,5.9,6.7,6.9,NA), art_hb_cat=c("3-6","","","6-9","",""))
Now I would like to copy these categories to every row of the same id (-> group_by(id)), to end up with a df like:
df <- data.frame(id=factor(c(1,1,1,2,2,2)), time=c(0,30,60,0,30,60), art_hb=c(5.8,6.1,5.9,6.7,6.9,NA), art_hb_cat=c("3-6","3-6","3-6","6-9","6-9","6-9"))
But I have not managed to do this, even after trying for a couple of days. Could anyone help me out? Many, many thanks in advance.
P.S. It's my first post, so I hope this is clear enough. Sorry.
CodePudding user response:
You could use cut instead of ifelse, and apply it on art_hb when time == 0 for each group of id:
library(dplyr)
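df %>%
  group_by(id) %>%
  # right = FALSE makes the bins [0,3), [3,6), [6,9), [9,Inf): lower bound inclusive, as in the question
  mutate(art_hb_cat = cut(art_hb[time == 0],
                          breaks = c(0, 3, 6, 9, Inf),
                          labels = c("0-3", "3-6", "6-9", ">9"),
                          right = FALSE))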
  id     time art_hb art_hb_cat
  <fct> <dbl>  <dbl> <fct>
1 1         0    5.8 3-6
2 1        30    6.1 3-6
3 1        60    5.9 3-6
4 2         0    6.7 6-9
5 2        30    6.9 6-9
6 2        60   NA   6-9
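One thing worth checking with cut is which end of each interval is closed. By default right = TRUE, so a value of exactly 3 lands in "0-3", whereas the ifelse in the question treats the lower bound as inclusive, which corresponds to right = FALSE. A minimal base R sketch (the vector x is made up purely to show the boundary cases):

```r
# Hypothetical values sitting on or near the bin boundaries
x <- c(2.9, 3, 6, 9)
breaks <- c(0, 3, 6, 9, Inf)
labels <- c("0-3", "3-6", "6-9", ">9")

# Default (right = TRUE): intervals are (0,3], (3,6], (6,9], (9,Inf]
right_closed <- as.character(cut(x, breaks = breaks, labels = labels))

# right = FALSE: intervals are [0,3), [3,6), [6,9), [9,Inf)
left_closed <- as.character(cut(x, breaks = breaks, labels = labels,
                                right = FALSE))

# Side-by-side comparison of the two settings
print(data.frame(x, right_closed, left_closed))
```

If your hemoglobin values never fall exactly on 3, 6 or 9, both settings give the same categories.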
CodePudding user response:
We may use case_when after grouping by 'id':
library(dplyr)
df %>%
  group_by(id) %>%
  # [1] keeps the first row's category per id, so the time == 0 row must come first within each id
  mutate(art_hb_cat = case_when(art_hb < 3 & time == 0 ~ "0-3",
                                art_hb >= 3 & art_hb < 6 & time == 0 ~ "3-6",
                                art_hb >= 6 & art_hb < 9 & time == 0 ~ "6-9",
                                art_hb >= 9 & time == 0 ~ ">9")[1]) %>%
  ungroup()
-output
# A tibble: 6 × 4
  id     time art_hb art_hb_cat
  <fct> <dbl>  <dbl> <chr>
1 1         0    5.8 3-6
2 1        30    6.1 3-6
3 1        60    5.9 3-6
4 2         0    6.7 6-9
5 2        30    6.9 6-9
6 2        60   NA   6-9
Or with data.table
library(data.table)
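# fcase() returns the value for the first condition that is TRUE, so each bin only
# needs its upper bound; between() would include both ends and put 3 in "0-3"
setDT(df)[df[time == 0, .(id, art_hb_cat = fcase(art_hb < 3, "0-3",
    art_hb < 6, "3-6", art_hb < 9, "6-9", default = ">9"))], on = .(id)]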
       id  time art_hb art_hb_cat
   <fctr> <num>  <num>     <char>
1:      1     0    5.8        3-6
2:      1    30    6.1        3-6
3:      1    60    5.9        3-6
4:      2     0    6.7        6-9
5:      2    30    6.9        6-9
6:      2    60     NA        6-9
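If you would rather not depend on dplyr or data.table, the same idea can be sketched in base R: categorise the time == 0 rows once, then copy the result to every row of the same id with match. This is only a sketch, assuming each id has exactly one time == 0 row; the helper name baseline is mine, not from the question.

```r
# Example data from the question
df <- data.frame(id = factor(c(1, 1, 1, 2, 2, 2)),
                 time = c(0, 30, 60, 0, 30, 60),
                 art_hb = c(5.8, 6.1, 5.9, 6.7, 6.9, NA))

# Keep only the baseline rows (assumption: exactly one time == 0 row per id)
baseline <- df[df$time == 0, c("id", "art_hb")]

# Categorise the baseline value; right = FALSE gives [0,3), [3,6), [6,9), [9,Inf)
baseline$art_hb_cat <- cut(baseline$art_hb,
                           breaks = c(0, 3, 6, 9, Inf),
                           labels = c("0-3", "3-6", "6-9", ">9"),
                           right = FALSE)

# Look up each row's id in the baseline table and copy its category across
df$art_hb_cat <- baseline$art_hb_cat[match(df$id, baseline$id)]
print(df)
```

Because the lookup goes through id rather than row position, this does not rely on the time == 0 row being the first row of each id.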