I'm trying to calculate the difference between abundance at the time points C1 and C0. I'd like to do this for the different genes, so I've used group_by for the genes, but can't figure out how to find the difference in abundance at the different time points.
Here is one of my attempts:
IgH_CDR3_post_challenge_unique_vv <- IgH_CDR3_post_challenge_unique_v %>%
  group_by(gene) %>%
  mutate(increase_in_abundance = abundance[Timepoint == 'C1'] - abundance[Timepoint == 'C0']) %>%
  ungroup()
My data looks something like this:
gene | Timepoint | abundance |
---|---|---|
1 | C0 | 5 |
2 | C1 | 3 |
1 | C1 | 6 |
3 | C0 | 2 |
CodePudding user response:
Assuming (!) you have exactly one entry per gene and timepoint (which is not the case in the table posted in the question), you can reshape your data with pivot_wider and then calculate the difference for every gene. The example data is not very illustrative, of course, because most genes are missing one of the two timepoints and therefore end up with NA.
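The one-entry-per-gene-and-timepoint assumption can be checked before reshaping. The following is only a minimal sketch, assuming dplyr is installed; `df_check` and its values are made up for illustration and deliberately contain a repeated gene/timepoint pair.

```r
library(dplyr)

# Hypothetical data: gene 1 has two rows for timepoint C1
df_check <- data.frame(
  gene      = c(1, 1, 1, 2),
  Timepoint = c("C0", "C1", "C1", "C1"),
  abundance = c(5, 6, 4, 3)
)

# Count rows per gene/timepoint pair and keep the pairs that occur more than once
duplicates <- df_check %>%
  count(gene, Timepoint) %>%
  filter(n > 1)

duplicates
```

If this returns any rows, the pairs listed there need to be aggregated first (for example by passing something like `values_fn = mean` to pivot_wider), otherwise the subtraction is not well defined.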
df <- data.frame(gene = c(1, 2, 1, 3),
                 Timepoint = c("C0", "C1", "C1", "C0"),
                 abundance = c(5, 3, 6, 2))
library(tidyverse)
# One row per gene, one column per timepoint, then subtract
df %>%
  pivot_wider(names_from = Timepoint,
              values_from = abundance,
              id_cols = gene) %>%
  mutate(increase_in_abundance = C1 - C0)
# A tibble: 3 x 4
   gene    C0    C1 increase_in_abundance
  <dbl> <dbl> <dbl>                 <dbl>
1     1     5     6                     1
2     2    NA     3                    NA
3     3     2    NA                    NA
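If you would rather keep the data in long format and stay with group_by, as in the question, the subsetting has to cope with genes that lack a timepoint. In that case `abundance[Timepoint == 'C1']` has length zero, which is typically why the mutate call in the question fails. Below is a rough sketch of one way around this, assuming dplyr is installed; `value_at` is a hypothetical helper written for this example, not a dplyr function.

```r
library(dplyr)

df <- data.frame(
  gene      = c(1, 2, 1, 3),
  Timepoint = c("C0", "C1", "C1", "C0"),
  abundance = c(5, 3, 6, 2)
)

# Hypothetical helper: return the single value recorded for timepoint `tp`,
# or NA when the gene has no row (or more than one row) for that timepoint
value_at <- function(values, timepoints, tp) {
  hit <- values[timepoints == tp]
  if (length(hit) == 1) hit else NA_real_
}

# One row per gene with the C1 - C0 difference (NA if either timepoint is missing)
result <- df %>%
  group_by(gene) %>%
  summarise(
    increase_in_abundance = value_at(abundance, Timepoint, "C1") -
      value_at(abundance, Timepoint, "C0"),
    .groups = "drop"
  )

result
```

Swapping summarise for mutate (and dropping the `.groups` argument) keeps every original row and repeats the difference within each gene, which is closer to what the attempt in the question was aiming for.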