Dividing second row by first per group

Time:08-03

I have data in the following format:

name = c("john", "john", "jack", "jack", "jason", "jason")
time_to_run_100_meters_last_year_this_year = c(22.3, 22.1, 12.4, 12.3, 15.1, 15.6)

my_data = data.frame(name, time_to_run_100_meters_last_year_this_year)


   name time_to_run_100_meters_last_year_this_year
1  john                                       22.3
2  john                                       22.1
3  jack                                       12.4
4  jack                                       12.3
5 jason                                       15.1
6 jason                                       15.6

I want to find the percent change in time for each student, expressed as the ratio of this year's time to last year's. This would mean: (22.1/22.3, 12.3/12.4, 15.6/15.1).
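As a side note, the ratios listed above are not percentages yet. The small sketch below (base R only) shows one way to turn such a ratio into a percent change. The vector names last_year and this_year are made up for illustration; the values are copied from the data above.

```r
# Values copied from the example data (john, jack, jason)
last_year <- c(22.3, 12.4, 15.1)
this_year <- c(22.1, 12.3, 15.6)

# Ratio of this year's time to last year's time
ratio <- this_year / last_year

# Percent change: negative means the student got faster
percent_change <- (ratio - 1) * 100
round(percent_change, 2)
# roughly -0.90, -0.81 and 3.31
```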

I thought of the following way to solve this problem:

library(dplyr)

my_data = my_data %>% 
  arrange(name) %>%
  group_by(name) %>% 
  mutate(id = row_number()) %>%
  ungroup()


id_1 =  my_data[which(my_data$id == 1), ]

id_2 =  my_data[which(my_data$id == 2), ]

division =  id_2$time_to_run_100_meters_last_year_this_year/id_1$time_to_run_100_meters_last_year_this_year

unique = unique(my_data$name)

final_data = data.frame(unique, division)

In the end, I think my idea worked:

> final_data
  unique  division
1   jack 0.9919355
2  jason 1.0331126
3   john 0.9910314

My question: are there better ways to solve this problem?

CodePudding user response:

Using first and last from dplyr could be another option given that you have only two observations per name:

library(dplyr)

my_data |> 
  group_by(name) |> 
  summarize(division = last(time_to_run_100_meters_last_year_this_year)/first(time_to_run_100_meters_last_year_this_year)) |>
  ungroup()

Output:

# A tibble: 3 × 2
  name  division
  <chr>    <dbl>
1 jack     0.992
2 jason    1.03 
3 john     0.991
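Note that first and last only give the intended ratio if every name has exactly two rows and last year's row comes before this year's. A minimal sketch of how that assumption could be checked before summarizing is shown below. The shortened column name time is a stand-in for time_to_run_100_meters_last_year_this_year, and the check itself is an addition for illustration, not part of the answer above.

```r
library(dplyr)

# Same data as in the question, with a shortened (hypothetical) column name
name <- c("john", "john", "jack", "jack", "jason", "jason")
time <- c(22.3, 22.1, 12.4, 12.3, 15.1, 15.6)
my_data <- data.frame(name, time)

# Count rows per name and stop early if any name does not have exactly two
counts <- my_data |> count(name)
stopifnot(all(counts$n == 2))

# Ratio of the last row to the first row within each name
result <- my_data |>
  group_by(name) |>
  summarize(division = last(time) / first(time))

result
```

If a name had one or three rows, the stopifnot call would raise an error instead of silently returning a misleading ratio.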

CodePudding user response:

You can use group_by and summarize from the dplyr package.

lead returns the value from the next row within each group. The last row of a group has no next row, so lead returns NA there, and na.omit drops that NA from the result.

library(dplyr)

final_data <- 
  my_data %>% 
  group_by(name) %>% 
  summarize(division = na.omit(lead(time_to_run_100_meters_last_year_this_year)/time_to_run_100_meters_last_year_this_year))

final_data
# A tibble: 3 × 2
  name  division
  <chr>    <dbl>
1 jack     0.992
2 jason    1.03 
3 john     0.991
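To make the lead and na.omit step easier to follow, here is a small sketch of what happens inside one group, using john's two times from the question. The variable names x, shifted, ratios and result are only for illustration.

```r
library(dplyr)

# Times for a single group (john), taken from the question's data
x <- c(22.3, 22.1)

# lead() shifts the values up by one position; the last element has no successor
shifted <- lead(x)
shifted
# 22.1, then NA

# Dividing by the original vector gives the ratio in row 1 and NA in row 2
ratios <- shifted / x

# na.omit() drops the NA, leaving a single value for the group
result <- as.numeric(na.omit(ratios))
result
```

With more than two rows per group this would leave several ratios per name (each row divided into the next), so the approach fits best when each name has exactly two rows.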