i am working with quite a large longitudinal dataset in which the same individuals are tested at more than one timepoint (in long format so that one individual has more than one row). i would like to create new variables with the prefix "first_" (for example on the variable "age", making the new variable "first_age"). i would like this first-variable to have the age from the first timepoint this individual was tested (ie the lowest value on "timepoint"). also, i want this new variable to be non-timevarying, so that the same "first age" is the value on each row for that specific individual.
hopefully i can make this clearer with some example data:
ID <- c(1010, 1010, 1011, 1011, 1012, 1012, 1013, 1013)
timepoint <- c(2, 5, 3, 6, 5, 6, 3, 5)
age <- c(55, 59, 60, 67, 71, 73, 58, 64)
first_age <- c(55, 55, 60, 60, 71, 71, 58, 58)
cell_prop <- c(0.7, 0.1, 0.5, 0.6, 0.7, 0.4, 0.8, 0.1)
first_cell <- c(0.7, 0.7, 0.5, 0.5, 0.7, 0.7, 0.8, 0.8)
df<-data.frame(matrix(ncol = 0, nrow = length(ID)))
df$ID <- ID
df <- df %>%
mutate(timepoint = timepoint,
age = age,
cell_prop = cell_prop)
based on this, i want to create two new variables, called "first_age" and "first_cell". i want these to have the values from "age" and "cell_prop", respectively, from the lowest timepoint-value for each ID. so for ID 1010 i want the "first_age" to be 55 in both 1010-rows, and the "first_cell" to be 0.7 in both.
basically, i want it to look like this (with additional variables "first_age" and "first_cell"):
df <- df %>%
mutate(first_age = first_age,
first_cell = first_cell)
does someone know how i would do this, preferably using tidyverse?
it works to do it like this:
df <- df %>%
group_by(ID) %>%
mutate(first_age = min(age))
the problem here is that the new value in "first_age" is based on the lowest value in "age". it is fine for this variable (since age doesn't decrease with time), BUT, in the variable "cell_prop" there are cases when the lowest value is measured at a later timepoint, which won't work. i still want the values from the first timepoint.
i have also tried to do it like this:
df <- df %>%
group_by(ID) %>%
mutate(first_age = age==min(timepoint))
but here, the new variable "first_age" only has "FALSE" as values for every row.
what i would like to do is to write it like this (in words):
df <- df %>% group_by(ID) %>% mutate(first_age = age at min(timepoint))
it feels like this shouldn't be that hard, but i can't figure out how to code this "at", indicating that i want the value in the new variable to be the same as in "age" at the lowest value in "timepoint".
can someone please help me? i am really new to R programming so just let me know if there is something missing in my question! super grateful for help
CodePudding user response:
Using fill()
from the tidyr
-package:
library(dplyr)
library(tidyr)
df %>% group_by(ID) %>%
mutate(first_age = ifelse(timepoint == min(timepoint),age,NA),
first_cell_prop = ifelse(timepoint == min(timepoint),cell_prop,NA)) %>%
fill(first_age,first_cell_prop)
Which gives
# A tibble: 8 x 6
# Groups: ID [4]
ID timepoint age cell_prop first_age first_cell_prop
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1010 2 55 0.7 55 0.7
2 1010 5 59 0.1 55 0.7
3 1011 3 60 0.5 60 0.5
4 1011 6 67 0.6 60 0.5
5 1012 5 71 0.7 71 0.7
6 1012 6 73 0.4 71 0.7
7 1013 3 58 0.8 58 0.8
8 1013 5 64 0.1 58 0.8
This should work, assuming you don't have duplicate timepoints with different age and cell_prop values
CodePudding user response:
Another approach is to make a summary table of the minimum values and join them back in. I find this approach a bit safer than filling because you don't have to worry about something going wrong with the sort order of your data (easily corrected by sorting, of course).
library(dplyr)
group_mins <- df %>%
group_by(ID) %>%
slice(which.min(timepoint)) %>%
rename(first_age = age,
first_cell_prop = cell_prop) %>%
select(-timepoint)
df %>%
left_join(group_mins, by = c("ID"))
# ID timepoint age cell_prop first_age first_cell_prop
# 1 1010 2 55 0.7 55 0.7
# 2 1010 5 59 0.1 55 0.7
# 3 1011 3 60 0.5 60 0.5
# 4 1011 6 67 0.6 60 0.5
# 5 1012 5 71 0.7 71 0.7
# 6 1012 6 73 0.4 71 0.7
# 7 1013 3 58 0.8 58 0.8
# 8 1013 5 64 0.1 58 0.8