I have data in the following format:
name = c("john", "john", "jack", "jack", "jason", "jason")
time_to_run_100_meters_last_year_this_year = c(22.3, 22.1, 12.4, 12.3, 15.1, 15.6)
my_data = data.frame(name, time_to_run_100_meters_last_year_this_year)
   name time_to_run_100_meters_last_year_this_year
1  john                                       22.3
2  john                                       22.1
3  jack                                       12.4
4  jack                                       12.3
5 jason                                       15.1
6 jason                                       15.6
I want to find the relative change in time for each student, i.e. this year's time divided by last year's. This would mean: (22.1/22.3, 12.3/12.4, 15.6/15.1).
I thought of the following way to solve this problem:
library(dplyr)
my_data = my_data %>%
  arrange(name) %>%
  group_by(name) %>%
  mutate(id = row_number()) %>%
  ungroup()
id_1 = my_data[which(my_data$id == 1), ]
id_2 = my_data[which(my_data$id == 2), ]
division = id_2$time_to_run_100_meters_last_year_this_year/id_1$time_to_run_100_meters_last_year_this_year
unique = unique(my_data$name)
final_data = data.frame(unique, division)
In the end, I think my idea worked:
> final_data
  unique  division
1   jack 0.9919355
2  jason 1.0331126
3   john 0.9910314
My question: are there better ways to solve this problem?
CodePudding user response:
Using first() and last() from dplyr could be another option, given that you have only two observations per name:
library(dplyr)
my_data |>
  group_by(name) |>
  summarize(division = last(time_to_run_100_meters_last_year_this_year) /
              first(time_to_run_100_meters_last_year_this_year)) |>
  ungroup()
Output:
# A tibble: 3 × 2
  name  division
  <chr>    <dbl>
1 jack     0.992
2 jason    1.03
3 john     0.991
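If an actual percent change is wanted rather than the ratio, the same pattern can probably be extended in one step. The following is only a sketch: it assumes that the rows within each name are ordered last year first, this year second (as in the question), and the column name pct_change is introduced here purely for illustration.

```r
library(dplyr)

# Data from the question
name <- c("john", "john", "jack", "jack", "jason", "jason")
time_to_run_100_meters_last_year_this_year <- c(22.3, 22.1, 12.4, 12.3, 15.1, 15.6)
my_data <- data.frame(name, time_to_run_100_meters_last_year_this_year)

result <- my_data |>
  group_by(name) |>
  summarize(
    # ratio: this year's time over last year's (assumes that row order)
    division = last(time_to_run_100_meters_last_year_this_year) /
      first(time_to_run_100_meters_last_year_this_year),
    # percent change derived from the ratio; pct_change is a hypothetical name
    pct_change = (division - 1) * 100
  )
```

A negative pct_change then means the student got faster, and a positive one means slower.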
CodePudding user response:
You can use group_by and summarize from the dplyr package. Use lead to get the value from the next row within each group, and na.omit to drop the NA that lead produces for the last row of each group.
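To see what lead and na.omit do here, it may help to try them on a single student's values first. This is a small sketch using john's two times from the question; the variable names x, shifted, ratios and kept are made up for illustration.

```r
library(dplyr)

x <- c(22.3, 22.1)        # john's times: last year, this year
shifted <- lead(x)        # each element replaced by the next one: 22.1, NA
ratios <- shifted / x     # 22.1/22.3 for the first row, NA for the last
kept <- na.omit(ratios)   # the NA is dropped, leaving a single ratio
```

With exactly two rows per name, this leaves one value per group, which is why summarize returns one row per student.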
library(dplyr)
final_data <-
  my_data %>%
  group_by(name) %>%
  summarize(division = na.omit(lead(time_to_run_100_meters_last_year_this_year) /
                                 time_to_run_100_meters_last_year_this_year))
final_data
# A tibble: 3 × 2
  name  division
  <chr>    <dbl>
1 jack     0.992
2 jason    1.03
3 john     0.991
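For comparison, the same ratios can also be computed without dplyr. The following base R sketch uses tapply and is not taken from either answer above; it again assumes that each name's rows are ordered last year first, this year second, and the name ratio is introduced only for this example.

```r
# Data from the question
name <- c("john", "john", "jack", "jack", "jason", "jason")
time_to_run_100_meters_last_year_this_year <- c(22.3, 22.1, 12.4, 12.3, 15.1, 15.6)
my_data <- data.frame(name, time_to_run_100_meters_last_year_this_year)

# For each name, divide the last value by the first one
ratio <- tapply(
  my_data$time_to_run_100_meters_last_year_this_year,
  my_data$name,
  function(x) x[length(x)] / x[1]
)

# tapply returns a named array; turn it into a plain data frame
final_data <- data.frame(name = names(ratio), division = as.numeric(ratio))
```

The groups come back in alphabetical order (jack, jason, john), matching the outputs shown above.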