Dividing second row by first per group

I have data in the following format:

name = c("john", "john", "jack", "jack", "jason", "jason")
time_to_run_100_meters_last_year_this_year = c(22.3, 22.1, 12.4, 12.3, 15.1, 15.6)

my_data = data.frame(name, time_to_run_100_meters_last_year_this_year)


   name time_to_run_100_meters_last_year_this_year
1  john                                       22.3
2  john                                       22.1
3  jack                                       12.4
4  jack                                       12.3
5 jason                                       15.1
6 jason                                       15.6

I want to find out how each student's time changed from last year to this year, expressed as the ratio of the second row to the first. This would mean: (22.1/22.3, 12.3/12.4, 15.6/15.1).
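As a side note, each of these ratios maps to a percent change via (ratio - 1) * 100. A minimal base R sketch, with the numbers copied from the data above (the variable names ratio and percent_change are only illustrative):

```r
# Ratios of this year's time to last year's time, copied from the data above
ratio <- c(john = 22.1 / 22.3, jack = 12.3 / 12.4, jason = 15.6 / 15.1)

# Percent change: a negative value means the student got faster
percent_change <- (ratio - 1) * 100
round(percent_change, 2)
```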

I thought of the following way to solve this problem:

library(dplyr)

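# Sort by name and number the rows within each name (1 = last year, 2 = this year)
my_data = my_data %>% 
  arrange(name) %>%
  group_by(name) %>% 
  mutate(id = row_number()) %>%
  ungroup()


# Split into the first and the second row of each name
id_1 = my_data[which(my_data$id == 1), ]

id_2 = my_data[which(my_data$id == 2), ]

# Second row divided by first row; both are in the same sorted name order
division = id_2$time_to_run_100_meters_last_year_this_year/id_1$time_to_run_100_meters_last_year_this_year

unique = unique(my_data$name)

final_data = data.frame(unique, division)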

In the end, I think my idea worked:

> final_data
  unique  division
1   jack 0.9919355
2  jason 1.0331126
3   john 0.9910314

My question: are there better ways to solve this problem?

CodePudding user response:

Using first and last from dplyr could be another option given that you have only two observations per name:

library(dplyr)

my_data |> 
  group_by(name) |> 
  summarize(division = last(time_to_run_100_meters_last_year_this_year)/first(time_to_run_100_meters_last_year_this_year)) |>
  ungroup()

Output:

# A tibble: 3 × 2
  name  division
  <chr>    <dbl>
1 jack     0.992
2 jason    1.03 
3 john     0.991
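Since this approach relies on every name having exactly two rows, it may be worth checking that assumption before dividing. A minimal sketch, rebuilding the data from the question so it runs on its own (rows_per_name and result are illustrative names, not part of the original answer):

```r
library(dplyr)

my_data <- data.frame(
  name = c("john", "john", "jack", "jack", "jason", "jason"),
  time_to_run_100_meters_last_year_this_year = c(22.3, 22.1, 12.4, 12.3, 15.1, 15.6)
)

# Assumption check: stop early unless every name appears exactly twice
rows_per_name <- count(my_data, name)
stopifnot(all(rows_per_name$n == 2))

# Last row divided by first row within each name
result <- my_data |>
  group_by(name) |>
  summarize(
    division = last(time_to_run_100_meters_last_year_this_year) /
      first(time_to_run_100_meters_last_year_this_year)
  )
```

With more than two rows per name, first and last would still return a value, but it would compare only the earliest and latest rows and ignore everything in between.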

CodePudding user response:

You can use group_by and summarize from the dplyr package.

Use lead to get the value from the next row within each group (the last row of a group gets NA), and use na.omit to drop the resulting NA from the calculation.

library(dplyr)

final_data <- 
  my_data %>% 
  group_by(name) %>% 
  summarize(division = na.omit(lead(time_to_run_100_meters_last_year_this_year)/time_to_run_100_meters_last_year_this_year))

final_data
# A tibble: 3 × 2
  name  division
  <chr>    <dbl>
1 jack     0.992
2 jason    1.03 
3 john     0.991
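To see what lead and na.omit do inside each group, here is a small sketch that walks through a single student's two values step by step (the vector john and the intermediate names are illustrative only):

```r
library(dplyr)

# John's two times: last year first, then this year
john <- c(22.3, 22.1)

next_value <- lead(john)             # value from the next row; NA after the last row
ratio <- next_value / john           # first element is 22.1 / 22.3, second is NA
clean <- as.numeric(na.omit(ratio))  # drop the NA, leaving a single ratio
```

This only yields one value per group when each group has exactly two rows. With three or more rows, na.omit would leave several ratios per group, and how summarize handles that depends on the dplyr version.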