How to create new row to ensure time series length is equal?

Time:09-22

I am trying to classify the effectiveness of a treatment. Each id should contain 4 timeframes.

Dataframe
id timeframe distance
1 1 1.1
1 2 1.1
1 3 1.2
1 4 1.1
2 1 1.1
2 2 1.1
2 4 1.1
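
For reproducibility, the table above can be rebuilt as a data frame. This is only a sketch: the name df1 is an assumption (chosen to match the answer below), and the numeric column types are assumed as well.

```r
# Rebuild the example data; the name df1 is an assumption, not from the question
df1 <- data.frame(
  id        = c(1, 1, 1, 1, 2, 2, 2),
  timeframe = c(1, 2, 3, 4, 1, 2, 4),
  distance  = c(1.1, 1.1, 1.2, 1.1, 1.1, 1.1, 1.1)
)

# Count rows per id; id 2 has only 3 of the expected 4 timeframes
rows_per_id <- table(df1$id)
print(rows_per_id)
```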

For example, timeframe 3 is missing for id 2. How can I add a new row for each missing timeframe, filled with the average distance value, for every id that has this issue?

I get the error 'not all time is the same length' when running longitudinal clustering with longitudinal k-means (KML).

Answer:

We can use complete() from tidyr to create the missing id/timeframe combinations, and then replace the resulting NA values with the mean:

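library(dplyr)
library(tidyr)
df1 %>%
    # tag the original rows so rows added by complete() can be told apart
    mutate(rn = row_number()) %>%
    # add a row (with NA in distance and rn) for every missing id/timeframe pair
    complete(id, timeframe) %>%
    # fill only the newly added rows with the overall mean distance
    mutate(distance = replace(distance, is.na(distance) & is.na(rn), 
          mean(distance, na.rm = TRUE)))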

If the mean should instead be calculated within each id, add a group_by(id) before the mutate:

df1 %>%
    mutate(rn = row_number()) %>%
    complete(id, timeframe) %>%
    # compute the mean separately for each id
    group_by(id) %>%
    mutate(distance = replace(distance, is.na(distance) & is.na(rn), 
          mean(distance, na.rm = TRUE))) %>%
    ungroup()
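
As a quick sanity check, the grouped version can be run on the question's example data to confirm that every id ends up with the same number of rows. This is a minimal sketch: the data frame is retyped from the question, and the names filled and counts, as well as dropping the helper column rn at the end, are assumptions added for illustration.

```r
library(dplyr)
library(tidyr)

# Example data retyped from the question
df1 <- data.frame(
  id        = c(1, 1, 1, 1, 2, 2, 2),
  timeframe = c(1, 2, 3, 4, 1, 2, 4),
  distance  = c(1.1, 1.1, 1.2, 1.1, 1.1, 1.1, 1.1)
)

filled <- df1 %>%
  mutate(rn = row_number()) %>%
  complete(id, timeframe) %>%
  group_by(id) %>%
  mutate(distance = replace(distance, is.na(distance) & is.na(rn),
                            mean(distance, na.rm = TRUE))) %>%
  ungroup() %>%
  select(-rn)  # drop the helper column (an optional extra step)

# Every id should now have one row per timeframe
counts <- count(filled, id)
stopifnot(all(counts$n == 4))
```

Here the added row for id 2, timeframe 3 takes the mean of the three observed distances for id 2.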