Home > database >  Create a field that tracks the order in which an item falls in a sequence based on a condition
Create a field that tracks the order in which an item falls in a sequence based on a condition

Time:12-14

I'm looking to track the order in which an item falls in a sequence. For example, the final product should look something like this.

df <- data.frame(paitent=c('Sally', 'Josh', 'Josh', 'Abram','Sally', 'Josh'),
                 visit=mdy(c('2/10/2022', '2/11/2022', '2/12/2022', '2/13/2022', '2/14/2022', '2/15/2022')),
                visit_count=c(1,1,2,1,2,3))
  paitent      visit
1   Sally 2022-02-10
2    Josh 2022-02-11
3    Josh 2022-02-12
4   Abram 2022-02-13
5   Sally 2022-02-14
6    Josh 2022-02-15

The 'visit_count' column would be automatically populated based on the Patient name and where it falls in sequence based on the date.

I'm not exactly sure where to go. I've looked into using the mutate and nrow() functions to count the rows but I'm having trouble finding a way to filter for the specific name and then only count the dates that are less than the current records date.

CodePudding user response:

dplyr

library(dplyr)
df %>%
  group_by(paitent) %>%
  mutate(visit_count2 = rank(visit, ties.method = "first")) %>%
  ungroup()
# # A tibble: 6 x 4
#   paitent visit      visit_count visit_count2
#   <chr>   <date>           <dbl>        <int>
# 1 Sally   2022-02-10           1            1
# 2 Josh    2022-02-11           1            1
# 3 Josh    2022-02-12           2            2
# 4 Abram   2022-02-13           1            1
# 5 Sally   2022-02-14           2            2
# 6 Josh    2022-02-15           3            3

base R

df$visit_count2 <- ave(as.numeric(df$visit), df$paitent, FUN = function(z) rank(z, ties.method = "first"))
df
#   paitent      visit visit_count visit_count2
# 1   Sally 2022-02-10           1            1
# 2    Josh 2022-02-11           1            1
# 3    Josh 2022-02-12           2            2
# 4   Abram 2022-02-13           1            1
# 5   Sally 2022-02-14           2            2
# 6    Josh 2022-02-15           3            3

data.table

library(data.table)
as.data.table(df)[, visit_count2 := rank(visit, ties.method = "first"), by = .(paitent)]

Data

df <- structure(list(paitent = c("Sally", "Josh", "Josh", "Abram", "Sally", "Josh"), visit = structure(c(19033, 19034, 19035, 19036, 19037, 19038), class = "Date"), visit_count = c(1, 1, 2, 1, 2, 3), visit_count2 = c(1, 1, 2, 1, 2, 3)), row.names = c(NA, -6L), class = "data.frame")

CodePudding user response:

We can group by patient, sort visit in ascending order and then create visit_count

df%>%
  group_by(paitent)%>%
  arrange(visit)%>%
  mutate(visit_count=row_number())

# A tibble: 6 x 3
# Groups:   paitent [3]
  paitent visit      visit_count
  <fct>   <date>           <int>
1 Sally   2022-02-10           1
2 Josh    2022-02-11           1
3 Josh    2022-02-12           2
4 Abram   2022-02-13           1
5 Sally   2022-02-14           2
6 Josh    2022-02-15           3

CodePudding user response:

library(dplyr)

df %>% 
  group_by(patient) %>% 
  mutate(visit_count =1:n())
  patient visit      visit_count
  <chr>   <date>           <int>
1 Sally   2022-02-10           1
2 Josh    2022-02-11           1
3 Josh    2022-02-12           2
4 Abram   2022-02-13           1
5 Sally   2022-02-14           2
6 Josh    2022-02-15           3
  • Related