How to calculate the difference between values in one column based on another column?

I'm trying to calculate the difference in abundance between the time points C1 and C0. I'd like to do this for each gene, so I've used group_by on the gene column, but I can't figure out how to subtract the abundance at one time point from the abundance at the other within each group.

Here is one of my attempts:


IgH_CDR3_post_challenge_unique_vv <- IgH_CDR3_post_challenge_unique_v %>% 
  group_by(gene) %>% 
  mutate(increase_in_abundance = abundance[Timepoint == 'C1'] - abundance[Timepoint == 'C0']) %>% 
  ungroup()

My data looks something like this:

gene	Timepoint	abundance
1	C0	5
2	C1	3
1	C1	6
3	C0	2

CodePudding user response:

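Assuming (!) each gene has exactly one entry per time point (unlike the sample table in the question, where genes 2 and 3 appear at only one time point), you can reshape the data with pivot_wider so that each time point gets its own column, then subtract the columns to get the difference for every gene. The sample data isn't very informative here: only gene 1 has both time points, so the other results are NA.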
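
Since everything hinges on that assumption, it may be worth checking it before reshaping. The following is only a rough sketch using the sample data from the question; the names abundance_long, duplicated_pairs and incomplete_genes are made up for illustration and are not from the original post.

```r
library(dplyr)

# Sample data from the question (time point labels as posted there)
abundance_long <- data.frame(gene = c(1, 2, 1, 3),
                             Timepoint = c("C0", "C1", "C1", "C0"),
                             abundance = c(5, 3, 6, 2))

# Gene/time point combinations that occur more than once
# (these would make the pivoted result ambiguous)
duplicated_pairs <- abundance_long %>%
  count(gene, Timepoint, name = "n_rows") %>%
  filter(n_rows > 1)

# Genes that do not have both C0 and C1
# (these will end up with NA as their difference)
incomplete_genes <- abundance_long %>%
  group_by(gene) %>%
  summarise(has_both = all(c("C0", "C1") %in% Timepoint), .groups = "drop") %>%
  filter(!has_both)

duplicated_pairs
incomplete_genes
```

For the sample data, duplicated_pairs comes back empty and incomplete_genes lists genes 2 and 3.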

library(tidyverse)

# Sample data from the question, with the time point labels as posted there
df <- data.frame(gene = c(1, 2, 1, 3),
                 Timepoint = c("C0", "C1", "C1", "C0"),
                 abundance = c(5, 3, 6, 2))

# One row per gene, one column per time point, then take the difference
df %>%
  pivot_wider(names_from = Timepoint,
              values_from = abundance,
              id_cols = gene) %>%
  mutate(increase_in_abundance = C1 - C0)

# A tibble: 3 x 4
   gene    C0    C1 increase_in_abundance
  <dbl> <dbl> <dbl>                 <dbl>
1     1     5     6                     1
2     2    NA     3                    NA
3     3     2    NA                    NA
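
If you'd rather stay in long format, closer to the attempt in the question, something along these lines might work as well. As far as I can tell, the original mutate call runs into trouble because abundance[Timepoint == 'C0'] has length zero for any gene that lacks that time point, so the subtraction gives a zero-length result that cannot be recycled to the group. The sketch below sidesteps that with a small helper, value_at, which is a hypothetical function written for this example (not part of dplyr) and which returns NA whenever a time point is missing or occurs more than once for a gene.

```r
library(dplyr)

# Sample data from the question (time point labels as posted there)
df <- data.frame(gene = c(1, 2, 1, 3),
                 Timepoint = c("C0", "C1", "C1", "C0"),
                 abundance = c(5, 3, 6, 2))

# Hypothetical helper: the abundance at `target` if there is exactly one
# matching row in the group, otherwise NA
value_at <- function(abundance, timepoint, target) {
  hit <- abundance[timepoint == target]
  if (length(hit) == 1) hit else NA_real_
}

# One row per gene with the C1 - C0 difference
result <- df %>%
  group_by(gene) %>%
  summarise(increase_in_abundance = value_at(abundance, Timepoint, "C1") -
                                    value_at(abundance, Timepoint, "C0"),
            .groups = "drop")

result
```

This gives one row per gene, with a difference of 1 for gene 1 and NA for genes 2 and 3. Use mutate instead of summarise if you want to keep the original rows and just attach the difference to each of them.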