Row operation based on reference category R

Time:12-08

I have a dataframe:

df <- data.frame(category = rep(c('Cat','Dog','Chicken'),2),
                 value = c(3,4,5,4,6,7),
                 time = c(rep(1,3),rep(2,3)))

  category value time
       Cat     3    1
       Dog     4    1
   Chicken     5    1
       Cat     4    2
       Dog     6    2
   Chicken     7    2

How can I calculate the difference between a reference category (in this case 'Cat') and each of the other categories at every time point? The output I am looking for would be like this:

  category value time
       Cat     0    1
       Dog     1    1
   Chicken     2    1
       Cat     0    2
       Dog     2    2
   Chicken     3    2

CodePudding user response:

Assuming the reference value occurs exactly once within each time point, you can group the rows with dplyr::group_by(time), then use logical indexing to pick out the value where category == "Cat" and subtract it from every value in that group:

library(dplyr)

df %>%
  group_by(time) %>%
  mutate(value = value - value[category == "Cat"]) %>%
  ungroup()
# A tibble: 6 × 3
  category value  time
  <chr>    <dbl> <dbl>
1 Cat          0     1
2 Dog          1     1
3 Chicken      2     1
4 Cat          0     2
5 Dog          2     2
6 Chicken      3     2
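
If you would rather not depend on dplyr, the same idea can be sketched in base R. This is only one possible approach, not part of the answer above: the names ref and result are made up for illustration, and the stopifnot() line checks the same assumption (exactly one 'Cat' row per time point) explicitly instead of relying on it silently.

```r
df <- data.frame(category = rep(c("Cat", "Dog", "Chicken"), 2),
                 value = c(3, 4, 5, 4, 6, 7),
                 time = c(rep(1, 3), rep(2, 3)))

# Reference rows: one "Cat" value per time point (assumed)
ref <- df[df$category == "Cat", c("time", "value")]

# Fail early if a time point has several reference rows or none at all
stopifnot(!anyDuplicated(ref$time), all(df$time %in% ref$time))

# Look up each row's reference value by time and subtract it
result <- df
result$value <- df$value - ref$value[match(df$time, ref$time)]
result
```

With the dplyr version, a time point without a 'Cat' row makes mutate() fail with a size error, and a time point with several 'Cat' rows fails too unless their number happens to equal the group size, in which case the subtraction silently runs element by element. The explicit check reports both cases before anything is computed.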