I have a dataset that contains measurements taken at different points in time. I would like to calculate the percentage of times a measurement in one time period is followed by the same measurement in the next time period. In other words, for each row, how often does the measurement stay the same from one period to the next? How can I do this?
Sample data:
timedata <- structure(list(t1 = c(1, 2, 1), t2 = c(1, 1, 1), t3 = c(1, 3,
4), t4 = c(2, 2, 2), t5 = c(3, 3, 3), t6 = c(3, 3, 3), t7 = c(1,
1, 1)), row.names = c(NA, -3L), spec = structure(list(cols = list(
t1 = structure(list(), class = c("collector_double", "collector"
)), t2 = structure(list(), class = c("collector_double",
"collector")), t3 = structure(list(), class = c("collector_double",
"collector")), t4 = structure(list(), class = c("collector_double",
"collector")), t5 = structure(list(), class = c("collector_double",
"collector")), t6 = structure(list(), class = c("collector_double",
"collector")), t7 = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), delim = ","), class = "col_spec"), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"))
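One way to make the requested quantity precise, using notation introduced here only for illustration: for a row $r$ with measurements $x_{r,1}, \dots, x_{r,T}$, the percentage of unchanged period-to-period transitions is

$$
p_r = \frac{100}{T - 1} \sum_{t=2}^{T} \mathbf{1}\left[x_{r,t} = x_{r,t-1}\right]
$$

where $\mathbf{1}[\cdot]$ is 1 when the condition holds and 0 otherwise. With seven periods (t1 to t7) each row has $T - 1 = 6$ transitions, so the first sample row above (1, 1, 1, 2, 3, 3, 1) has three repeats and $p_1 = 50$.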
Answer:
To compare each time period with the previous one, it's probably easiest to reshape the data to long form and compare each value with its lag:
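library(dplyr)
library(tidyr)

# The native pipe |> needs R >= 4.1
timedata |>
  mutate(id = row_number()) |>               # remember which row each value came from
  pivot_longer(-id, names_to = "time") |>    # one row per (id, time), in column order t1..t7
  group_by(id) |>
  mutate(nochange = value == lag(value)) |>  # TRUE if equal to the previous period; NA for t1
  group_by(time) |>
  summarise(
    num_repeated = sum(nochange, na.rm = TRUE),
    percent_repeated = num_repeated / n() * 100
  )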
# A tibble: 7 x 3
#   time  num_repeated percent_repeated
#   <chr>        <int>            <dbl>
# 1 t1               0              0
# 2 t2               2             66.7
# 3 t3               1             33.3
# 4 t4               0              0
# 5 t5               0              0
# 6 t6               3            100
# 7 t7               0              0
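The summary above is per time period. If you want the rate per row, as the question asks, a minimal sketch is to keep the grouping by id and divide by the number of comparisons actually made (six per row here, because t1 has no predecessor). The sample data is rebuilt below as a plain data.frame called timedata; like the original pipeline, this assumes the columns are already in time order.

```r
library(dplyr)
library(tidyr)

# Sample data from the question, rebuilt as a plain data frame
timedata <- data.frame(
  t1 = c(1, 2, 1), t2 = c(1, 1, 1), t3 = c(1, 3, 4), t4 = c(2, 2, 2),
  t5 = c(3, 3, 3), t6 = c(3, 3, 3), t7 = c(1, 1, 1)
)

per_row <- timedata |>
  mutate(id = row_number()) |>
  pivot_longer(-id, names_to = "time") |>   # relies on columns being in t1..t7 order
  group_by(id) |>
  # TRUE when the value equals the previous period's value; NA for t1
  mutate(nochange = value == lag(value)) |>
  summarise(
    num_repeated = sum(nochange, na.rm = TRUE),
    # Divide by the number of real comparisons, not by the number of periods
    percent_repeated = num_repeated / sum(!is.na(nochange)) * 100
  )

per_row
```

For the sample data this gives 3, 1 and 2 repeats, i.e. 50%, about 16.7% and about 33.3% for rows 1 to 3.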
Answer:
If you call your data frame df (for example, df <- timedata), then:
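# equal[j, i] is TRUE when row j has the same value in column i and column i + 1
equal <- matrix(NA, nrow = nrow(df), ncol = ncol(df) - 1)
for (i in seq_len(ncol(df) - 1)) {
  for (j in seq_len(nrow(df))) {
    # [[ ]] extracts a plain vector, so this compares two single values
    equal[j, i] <- df[[i]][j] == df[[i + 1]][j]
  }
}
# Percentage of all period-to-period comparisons with no change
sum(equal) * 100 / length(equal)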
Note that this compares each column with the next one (t1 with t2, t2 with t3, and so on), so each row has only ncol(df) - 1 comparisons: the last column has no later measurement to be compared with.
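The double loop can also be replaced by a single vectorized comparison: drop the first column to get t2..t7, drop the last to get t1..t6, and compare the two matrices element by element. This is a base R sketch rather than the answerer's code; it rebuilds the sample data as a plain data frame named df and assumes the columns are in time order and contain no NA values.

```r
# Sample data from the question, rebuilt as a plain data frame
df <- data.frame(
  t1 = c(1, 2, 1), t2 = c(1, 1, 1), t3 = c(1, 3, 4), t4 = c(2, 2, 2),
  t5 = c(3, 3, 3), t6 = c(3, 3, 3), t7 = c(1, 1, 1)
)

m <- as.matrix(df)
# m[, -1] holds t2..t7 and m[, -ncol(m)] holds t1..t6, so same[j, i] is TRUE
# when row j keeps its value between consecutive periods.
# drop = FALSE keeps a matrix even if df has a single row.
same <- m[, -1, drop = FALSE] == m[, -ncol(m), drop = FALSE]

overall <- mean(same) * 100         # share of all transitions with no change
per_row <- rowMeans(same) * 100     # share per row, as the question asks
per_period <- colMeans(same) * 100  # share per period, named t2..t7

overall
per_row
per_period
```

Because same is a logical matrix, mean(), rowMeans() and colMeans() count TRUE as 1, so the overall figure matches the loop above and per_period matches the percent_repeated column of the first answer.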