How to repeat a categorical variable that was determined at a start point, and continues for the dur

Time:08-30

For one of my analyses I would like to evaluate the course of hemoglobin levels (art_hb) of different subjects (id) over a certain duration (nmp_time; time in the example data below). We would like to put the ids in different hemoglobin categories (0-3, 3-6, 6-9 and >9), based on each subject's first measurement. So if a subject's hemoglobin level changes over time, we don't want the subject to switch categories.

My example data:

df <- data.frame(id = factor(c(1, 1, 1, 2, 2, 2)),
                 time = c(0, 30, 60, 0, 30, 60),
                 art_hb = c(5.8, 6.1, 5.9, 6.7, 6.9, NA))

So far I've managed to create the categories based on the hemoglobin measurement at time == 0:

df$art_hb_cat <- ifelse(df$art_hb < 3 & df$time == 0, "0-3",
                 ifelse(df$art_hb >= 3 & df$art_hb < 6 & df$time == 0, "3-6",
                 ifelse(df$art_hb >= 6 & df$art_hb < 9 & df$time == 0, "6-9",
                 ifelse(df$art_hb >= 9 & df$time == 0, ">9", ""))))

Which leads to:

df <- data.frame(id = factor(c(1, 1, 1, 2, 2, 2)),
                 time = c(0, 30, 60, 0, 30, 60),
                 art_hb = c(5.8, 6.1, 5.9, 6.7, 6.9, NA),
                 art_hb_cat = c("3-6", "", "", "6-9", "", ""))

Now I would like to copy these categories to every row of the same id (-> group_by(id)), to end up with a df like:

df <- data.frame(id = factor(c(1, 1, 1, 2, 2, 2)),
                 time = c(0, 30, 60, 0, 30, 60),
                 art_hb = c(5.8, 6.1, 5.9, 6.7, 6.9, NA),
                 art_hb_cat = c("3-6", "3-6", "3-6", "6-9", "6-9", "6-9"))

But after trying for a couple of days, I have not managed to. Could anyone help me out? Many, many thanks in advance.

P.S. It's my first post, so I hope this is clear enough. Sorry.

CodePudding user response:

You could use cut instead of ifelse, and apply it on art_hb when time == 0 for each group of id:

library(dplyr)
df %>% 
  group_by(id) %>% 
  mutate(art_hb_cat = cut(art_hb[time == 0],
                          breaks = c(0, 3, 6, 9, Inf), 
                          labels = c("0-3", "3-6", "6-9", ">9")))
  id     time art_hb art_hb_cat
  <fct> <dbl>  <dbl> <fct>     
1 1         0    5.8 3-6       
2 1        30    6.1 3-6       
3 1        60    5.9 3-6       
4 2         0    6.7 6-9       
5 2        30    6.9 6-9       
6 2        60   NA   6-9       
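
One detail worth checking before adopting this: cut() uses right-closed intervals by default, so a baseline of exactly 6 would land in "3-6", whereas the ifelse conditions in the question (art_hb >= 6 & art_hb < 9) put it in "6-9". A minimal sketch, assuming the question's left-closed boundaries are the intended ones, passes right = FALSE. The helper name hb_category is made up for this illustration and is not part of the answer above.

```r
# Hypothetical helper (not from the original answer): bins hemoglobin values
# into left-closed intervals [0,3), [3,6), [6,9), [9,Inf).
hb_category <- function(x) {
  cut(x,
      breaks = c(0, 3, 6, 9, Inf),
      labels = c("0-3", "3-6", "6-9", ">9"),
      right = FALSE)
}

# Boundary values fall into the upper category, as in the question's ifelse code.
boundary_cats <- as.character(hb_category(c(2.9, 3, 6, 9)))
boundary_cats
```

Inside the mutate() call above, cut(art_hb[time == 0], ...) could then be swapped for hb_category(art_hb[time == 0]) if that boundary behavior is what you want.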

CodePudding user response:

We may use case_when after grouping by 'id':

library(dplyr)
df %>% 
  group_by(id) %>%
  # [1] keeps the first row's category, so the time == 0 row must come first within each id
  mutate(art_hb_cat = case_when(
    art_hb < 3 & time == 0 ~ "0-3",
    art_hb >= 3 & art_hb < 6 & time == 0 ~ "3-6",
    art_hb >= 6 & art_hb < 9 & time == 0 ~ "6-9",
    art_hb >= 9 & time == 0 ~ ">9"
  )[1]) %>%
  ungroup()

Output:

# A tibble: 6 × 4
  id     time art_hb art_hb_cat
  <fct> <dbl>  <dbl> <chr>     
1 1         0    5.8 3-6       
2 1        30    6.1 3-6       
3 1        60    5.9 3-6       
4 2         0    6.7 6-9       
5 2        30    6.9 6-9       
6 2        60   NA   6-9   

Or with data.table:

library(data.table)
setDT(df)[df[time == 0,
             .(id, art_hb_cat = fcase(between(art_hb, 0, 3), "0-3",
                                      between(art_hb, 3, 6), "3-6",
                                      between(art_hb, 6, 9), "6-9",
                                      default = ">9"))],
          on = .(id)]
       id  time art_hb art_hb_cat
   <fctr> <num>  <num>     <char>
1:      1     0    5.8        3-6
2:      1    30    6.1        3-6
3:      1    60    5.9        3-6
4:      2     0    6.7        6-9
5:      2    30    6.9        6-9
6:      2    60     NA        6-9
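
For readers who prefer to avoid extra packages, the same idea can probably be expressed in base R: categorize the time == 0 rows once, then look up each row's category by id. This is only a sketch on the question's example data, assuming exactly one time == 0 row per id. The intermediate name baseline is introduced here for illustration.

```r
df <- data.frame(id = factor(c(1, 1, 1, 2, 2, 2)),
                 time = c(0, 30, 60, 0, 30, 60),
                 art_hb = c(5.8, 6.1, 5.9, 6.7, 6.9, NA))

# Baseline rows: one per id, taken at time == 0 (assumed to exist for every id).
baseline <- df[df$time == 0, c("id", "art_hb")]

# Categorize the baseline values (cut() intervals are right-closed by default).
baseline$art_hb_cat <- cut(baseline$art_hb,
                           breaks = c(0, 3, 6, 9, Inf),
                           labels = c("0-3", "3-6", "6-9", ">9"))

# Copy each id's baseline category to all of its rows.
df$art_hb_cat <- baseline$art_hb_cat[match(df$id, baseline$id)]
df
```

Because the lookup goes through match() on id, the rows do not need to be sorted by time for this to work.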