I am working on a project that uses patients' medication histories, and I would like to ask for your help. The database contains medication start dates in random order, and I would like to number the medications in order of use.
So I would like to transform:
ID 001 002 003
medA 2001 2005 2003
medB 1999 2000 2015
medC 2019 2014 2000
To:
ID 001 002 003
medA 1 3 2
medB 1 2 3
medC 3 2 1
The real database has 700 subjects and 10 medications.
Is there a way to do this in R?
Thanks in advance for your help!
NB this is my first post, please let me know if I'm doing something wrong forum-wise :)
CodePudding user response:
If you want to keep the original columns:
df[, paste0("rank", 1:3)] <- t(apply(df[,2:4], 1, rank))
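For reference, here is a minimal sketch of that one-liner run on the sample data from the question. The construction of df is an assumption for illustration (the answer does not show it); check.names = FALSE is used only so the numeric column names survive.

```r
# Sample data from the question; check.names = FALSE keeps the names "001", "002", "003"
df <- data.frame(
  ID = c("medA", "medB", "medC"),
  `001` = c(2001, 1999, 2019),
  `002` = c(2005, 2000, 2014),
  `003` = c(2003, 2015, 2000),
  check.names = FALSE
)

# Rank the years within each row. apply() returns the result for each row
# as a column, so t() turns it back into one row per original row.
df[, paste0("rank", 1:3)] <- t(apply(df[, 2:4], 1, rank))
df
```

The original year columns stay in place, and rank1 to rank3 hold the within-row ranks next to them.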
CodePudding user response:
Here's an approach:
library(tidyverse)
tribble(
~ID, ~"001", ~"002", ~"003",
"medA", 2001, 2005, 2003,
"medB", 1999, 2000, 2015,
"medC", 2019, 2014, 2000
) |>
pivot_longer(- ID) |>
group_by(ID) |>
mutate(rank = rank(value)) |>
select(-value) |>
pivot_wider(names_from = name, values_from = rank)
#> # A tibble: 3 × 4
#> # Groups: ID [3]
#> ID `001` `002` `003`
#> <chr> <dbl> <dbl> <dbl>
#> 1 medA 1 3 2
#> 2 medB 1 2 3
#> 3 medC 3 2 1
Created on 2022-04-28 by the reprex package (v2.0.1)
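One thing that may matter with the real data: rank() has to decide what to do with identical or missing years. The sketch below uses made-up values (not from the question) to show the ties.method and na.last arguments of base R's rank(); which behaviour is wanted depends on the analysis.

```r
years <- c(2001, 2001, 2005)

# Default: tied values share the average of their ranks
rank(years)
#> [1] 1.5 1.5 3.0

# ties.method = "min" gives tied values the same, lowest rank
rank(years, ties.method = "min")
#> [1] 1 1 3

# na.last = "keep" leaves a missing start year as NA instead of ranking it last
rank(c(2001, NA, 1999), na.last = "keep")
#> [1]  2 NA  1
```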
CodePudding user response:
Another approach in base R:
#Your data
mydf <- structure(list(
ID = c("medA", "medB", "medC"),
`001` = c(2001L,1999L, 2019L),
`002` = c(2005L, 2000L, 2014L),
`003` = c(2003L, 2015L, 2000L)),
class = "data.frame",
row.names = c(NA, -3L))
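# Transform: rank() numbers the values within each row
# (order() returns the sorting indices instead, which only happen to match for this example)
mydf[,2:4] <- t(apply(mydf[,2:4], 1, rank))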
# Result
mydf
ID 001 002 003
1 medA 1 3 2
2 medB 1 2 3
3 medC 3 2 1
In case more explanation is helpful:
order is a function that returns the indices that would sort a numeric vector, such as a numeric row or column. For example, order(c(6,4,5)) returns 2 3 1. Note that this is not the same as the rank of each value: rank(c(6,4,5)) returns 3 1 2. The two happen to coincide for every row of the example data, but rank is the function that numbers values in order of size, so it is the safer choice for the real data.
mydf[, 2:4] means the second through fourth columns of mydf.
apply is a function that applies another function to each row or each column of a data frame or matrix. Here the function is applied to each row of mydf[, 2:4], so the index 1 is used. If a function is to be applied to each column, the index 2 should be used.
t is a function that transposes a matrix or data frame. Here it restores the row layout: apply returns the result for each row as a column, so the result is transposed to turn those columns back into rows.
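To make the difference between order and rank concrete, here is a small sketch using the same vector as in the explanation above. The variable name x is just for illustration.

```r
x <- c(6, 4, 5)

# order() gives the positions of the elements in sorted order:
# the smallest value (4) is at position 2, then 5 at position 3, then 6 at position 1
order(x)
#> [1] 2 3 1

# rank() gives each element's place in the sorted sequence:
# 6 is the 3rd smallest, 4 the 1st, 5 the 2nd
rank(x)
#> [1] 3 1 2

# Without ties, applying order() twice gives the same numbers as rank()
order(order(x))
#> [1] 3 1 2
```

For the three example rows the two functions return identical numbers, which is why either appears to work there; on longer rows they will generally differ.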