Proportion calculation based on time

I have a dataset that contains measurements taken at different points in time. I would like to calculate the percentage of times a measurement in one time period is followed by the same measurement in the next time period. In other words, for each row, I want to know how often the measurement stays the same from one period to the next. How can I do this?

Sample data:

structure(list(t1 = c(1, 2, 1), t2 = c(1, 1, 1), t3 = c(1, 3, 
4), t4 = c(2, 2, 2), t5 = c(3, 3, 3), t6 = c(3, 3, 3), t7 = c(1, 
1, 1)), row.names = c(NA, -3L), spec = structure(list(cols = list(
    t1 = structure(list(), class = c("collector_double", "collector"
    )), t2 = structure(list(), class = c("collector_double", 
    "collector")), t3 = structure(list(), class = c("collector_double", 
    "collector")), t4 = structure(list(), class = c("collector_double", 
    "collector")), t5 = structure(list(), class = c("collector_double", 
    "collector")), t6 = structure(list(), class = c("collector_double", 
    "collector")), t7 = structure(list(), class = c("collector_double", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), delim = ","), class = "col_spec"), class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"))

CodePudding user response：

To compare each time period to the previous time period, it's probably easiest to put the data in long form and compare to the lag:

library(dplyr)
library(tidyr)

timedata |>
    mutate(id = row_number()) |>
    pivot_longer(
        -id,
        names_to = "time"
    ) |>
    group_by(id) |>
    mutate(nochange = value == lag(value)) |>
    group_by(time) |>
    summarise(
        num_repeated = sum(nochange, na.rm = TRUE), 
        percent_repeated = num_repeated / n() * 100
    )

# A tibble: 7 x 3
#   time  num_repeated percent_repeated
#   <chr>        <int>            <dbl>
# 1 t1               0              0
# 2 t2               2             66.7
# 3 t3               1             33.3
# 4 t4               0              0
# 5 t5               0              0
# 6 t6               3            100
# 7 t7               0              0
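The summary above is per time period. Since the question also asks how often each row keeps the same measurement, the same long-form idea can be grouped by row instead. The following is only a sketch under some assumptions: the tibble is rebuilt by hand from the sample values, the name per_row is made up for illustration, and the denominator counts only the comparisons actually made (the first period of each row has nothing before it).

```r
library(dplyr)
library(tidyr)

# Same values as the sample data in the question, rebuilt by hand
timedata <- tibble(
    t1 = c(1, 2, 1), t2 = c(1, 1, 1), t3 = c(1, 3, 4), t4 = c(2, 2, 2),
    t5 = c(3, 3, 3), t6 = c(3, 3, 3), t7 = c(1, 1, 1)
)

# per_row is a hypothetical name used only in this sketch
per_row <- timedata |>
    mutate(id = row_number()) |>
    pivot_longer(-id, names_to = "time") |>
    group_by(id) |>
    # the first period of each row has no predecessor, so lag() gives NA there
    mutate(nochange = value == lag(value)) |>
    summarise(
        num_repeated = sum(nochange, na.rm = TRUE),
        # divide by the number of comparisons made, i.e. the non-NA entries
        percent_repeated = num_repeated / sum(!is.na(nochange)) * 100
    )

per_row
```

With seven periods there are six comparisons per row, so a row that repeats its previous value three times comes out at 50%.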

CodePudding user response：

If your data frame is called df, then:

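# equal[j, i] will be TRUE when row j has the same value in columns i and i + 1
equal <- as.data.frame(NA)

for (i in 1:(length(df) - 1)) {
  for (j in 1:nrow(df)) {
    equal[j, i] <- df[[j, i]] == df[[j, i + 1]]
  }
}
# percentage of matches among all comparisons made (the last column has none)
sum(equal) * 100 / (nrow(df) * (length(df) - 1))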

Note that this compares each column with the one after it (t1 with t2, t2 with t3, and so on). The last column gets no comparison of its own because no later measurement follows it.
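The same column-by-column comparison can probably be written without loops by working on a matrix. This is a sketch rather than the answerer's method: df is rebuilt by hand from the sample values, and the names m, same, overall_percent, row_percent and col_percent are made up for illustration. All three percentages divide by the number of comparisons actually made.

```r
# Same values as the sample data in the question, rebuilt by hand
df <- data.frame(
  t1 = c(1, 2, 1), t2 = c(1, 1, 1), t3 = c(1, 3, 4), t4 = c(2, 2, 2),
  t5 = c(3, 3, 3), t6 = c(3, 3, 3), t7 = c(1, 1, 1)
)

m <- as.matrix(df)

# Compare columns 2..n with columns 1..(n - 1) in a single step;
# same[j, i] is TRUE when row j has equal values in columns i and i + 1
same <- m[, -1, drop = FALSE] == m[, -ncol(m), drop = FALSE]

overall_percent <- mean(same) * 100   # share of matches over all comparisons
row_percent <- rowMeans(same) * 100   # share of matches within each row
col_percent <- colMeans(same) * 100   # share of matches for each transition

overall_percent
row_percent
col_percent
```

Taking the mean of a logical matrix counts TRUE as 1 and FALSE as 0, which is why no explicit sum and division are needed here.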