How to create a new row to ensure time series lengths are equal?

I am trying to classify the effectiveness of a treatment. Each id should contain 4 timeframes.

Dataframe

id	timeframe	distance
1	1	1.1
1	2	1.1
1	3	1.2
1	4	1.1
2	1	1.1
2	2	1.1
2	4	1.1

For example, id 2 is missing timeframe 3. How can I add a new row for each missing timeframe, with distance set to the average distance value, for every id that has this problem?

I am getting a 'not all time is the same length' error when running longitudinal clustering with "longitudinal k-means (KML)".

CodePudding user response:

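We can use complete() from tidyr to create the missing id/timeframe combinations and then replace the resulting NA with the mean. A row-number column (rn) is added first so that only the rows created by complete() are filled; any NA that was already in the original data is left untouched.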

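library(dplyr)
library(tidyr)

df1 %>%
    # tag the original rows so rows added by complete() can be told apart
    mutate(rn = row_number()) %>%
    # add a row for every missing id/timeframe combination (distance and rn are NA)
    complete(id, timeframe) %>%
    # fill only the newly created rows with the overall mean distance
    mutate(distance = replace(distance, is.na(distance) & is.na(rn), 
          mean(distance, na.rm = TRUE)))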
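
For a reproducible check, here is a minimal sketch that rebuilds the question's table as df1 and runs the same pipeline. The result name filled and the final select(-rn), which drops the helper column, are additions for illustration and not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Example data from the question; id 2 has no row for timeframe 3
df1 <- data.frame(
  id        = c(1, 1, 1, 1, 2, 2, 2),
  timeframe = c(1, 2, 3, 4, 1, 2, 4),
  distance  = c(1.1, 1.1, 1.2, 1.1, 1.1, 1.1, 1.1)
)

filled <- df1 %>%
  mutate(rn = row_number()) %>%          # marks the original rows
  complete(id, timeframe) %>%            # adds id 2 / timeframe 3 with NA values
  mutate(distance = replace(distance, is.na(distance) & is.na(rn),
                            mean(distance, na.rm = TRUE))) %>%
  select(-rn)                            # helper column is no longer needed

filled
```

The new row for id 2, timeframe 3 receives the mean of the seven observed distances. Note that complete(id, timeframe) only uses timeframe values that occur somewhere in the data; if a timeframe could be missing for every id, the full range can be given explicitly, for example complete(id, timeframe = 1:4), assuming the timeframes run from 1 to 4.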

If the mean should instead be calculated within each 'id', add a group_by(id) before the mutate, so that each missing row is filled with the mean of that id's own observed distances.

df1 %>%
    mutate(rn = row_number()) %>%
    complete(id, timeframe) %>%
    # compute the mean separately for each id
    group_by(id) %>%
    mutate(distance = replace(distance, is.na(distance) & is.na(rn), 
          mean(distance, na.rm = TRUE))) %>%
    ungroup()
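
Before passing the result on to the clustering step, it may help to confirm that every id now has the same number of timeframes. The sketch below applies the grouped version to the question's data and then counts rows per id; the names filled_by_id and lengths_per_id are hypothetical and chosen only for this example.

```r
library(dplyr)
library(tidyr)

# Example data from the question; id 2 has no row for timeframe 3
df1 <- data.frame(
  id        = c(1, 1, 1, 1, 2, 2, 2),
  timeframe = c(1, 2, 3, 4, 1, 2, 4),
  distance  = c(1.1, 1.1, 1.2, 1.1, 1.1, 1.1, 1.1)
)

filled_by_id <- df1 %>%
  mutate(rn = row_number()) %>%
  complete(id, timeframe) %>%
  group_by(id) %>%                       # mean is taken within each id
  mutate(distance = replace(distance, is.na(distance) & is.na(rn),
                            mean(distance, na.rm = TRUE))) %>%
  ungroup() %>%
  select(-rn)

# One row per id with the number of timeframes it now has
lengths_per_id <- count(filled_by_id, id)

# TRUE when every id has one row for each distinct timeframe
all(lengths_per_id$n == n_distinct(filled_by_id$timeframe))
```

Here the missing row for id 2 is filled from id 2's own three observations rather than from the whole table, so the two variants can give different values when ids differ in their typical distance.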