Obtaining values in one variable (height/weight) based on when it was collected (dates)

Time:11-12

I'm working with a dataset that records each weight measurement along with the date it was collected. Some participants have multiple weights because they came back more than once; others have only one. Is there an easy way to have R produce a new data frame with one value per person, based on the earliest date, so that participants with only one value are included by default?

I wondered whether it would be advantageous to group by subject ID and take the mean weight, since I don't expect it to fluctuate drastically. But for consistency, taking the earliest recorded weight per subject would be ideal. I suspect a function in the 'lubridate' package might be useful, but I'm not 100% sure.

CodePudding user response:

Sort by date, group by id, then take the first row per group:

library(dplyr)

weights %>% 
  arrange(date) %>% 
  group_by(id) %>% 
  slice(1) %>% 
  ungroup()

#> # A tibble: 3 × 3
#>      id date       weight
#>   <int> <date>      <dbl>
#> 1     1 2021-03-15   182.
#> 2     2 2021-05-12   133.
#> 3     3 2021-08-09   151.
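
As a possible alternative to sorting first, the same "earliest row per id" selection can be sketched with slice_min(), which needs dplyr 1.0.0 or later. The small data frame here is hypothetical toy data, not the asker's dataset, and the name earliest is only illustrative. Setting with_ties = FALSE is an assumption about what is wanted: it keeps exactly one row per id even if two measurements share the same earliest date.

```r
library(dplyr)

# Hypothetical toy data for illustration only
weights <- tibble::tibble(
  id     = c(1, 1, 2, 3, 3),
  date   = as.Date(c("2021-09-16", "2021-03-15", "2021-05-12",
                     "2021-11-16", "2021-08-09")),
  weight = c(165, 182, 133, 123, 151)
)

earliest <- weights %>%
  group_by(id) %>%
  # Keep the row with the smallest (earliest) date within each id;
  # with_ties = FALSE returns a single row even when dates are tied
  slice_min(date, n = 1, with_ties = FALSE) %>%
  ungroup()

earliest
```

Participants with a single measurement pass through unchanged, since their only row is also their earliest one.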

Example data:

set.seed(13)
weights <- tibble::tibble(
  id = rep(1:3, each = 3),
  date = lubridate::ymd("2021-01-01") + sample(0:364, 9),
  weight = rnorm(9, 160, 20)
)

weights

#> # A tibble: 9 × 3
#>      id date       weight
#>   <int> <date>      <dbl>
#> 1     1 2021-09-16   165.
#> 2     1 2021-12-23   153.
#> 3     1 2021-03-15   182.
#> 4     2 2021-07-24   138.
#> 5     2 2021-09-19   169.
#> 6     2 2021-05-12   133.
#> 7     3 2021-11-16   123.
#> 8     3 2021-08-09   151.
#> 9     3 2021-09-05   156.

Created on 2022-11-11 with reprex v2.0.2
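
The question also raises the option of averaging each participant's weights instead of taking the first one. A minimal sketch of that, assuming the same id, date and weight columns, could use summarise(). The toy data and the output column names (n_visits, first_date, mean_weight) are made up for illustration.

```r
library(dplyr)

# Hypothetical toy data for illustration only
weights <- tibble::tibble(
  id     = c(1, 1, 2, 3, 3),
  date   = as.Date(c("2021-09-16", "2021-03-15", "2021-05-12",
                     "2021-11-16", "2021-08-09")),
  weight = c(165, 182, 133, 123, 151)
)

mean_weights <- weights %>%
  group_by(id) %>%
  summarise(
    n_visits    = n(),           # number of measurements per participant
    first_date  = min(date),     # earliest collection date
    mean_weight = mean(weight),  # average over all visits
    .groups = "drop"             # return an ungrouped tibble
  )

mean_weights
```

Keeping n_visits alongside the mean makes it easy to see which participants contributed only a single measurement.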
