Calculating multiple percents with multiple rows, grouping, then iterating over columns in R

Time:08-07

Longtime lurker, first time writer.

Using dataframe A, I am trying to calculate 4 percentages using multiple rows, grouped by a column. I then hope to iterate those same calculations over other columns, saving the outputs into dataframe B.

Dataframe A (output by another program) looks like this:

sample_number <- c("1","1","1","1","1","2","2","2","2","2","3","3","3","3","3")
condition <- c("A","B","C","D","E","A","B","C","D","E","A","B","C","D","E")
celltype_1 <- c(1220,800,700,300,200,1000,900,500,100,100,1700,600,800,300,200)
celltype_2 <- c(950,850,450,50,50,1650,550,750,250,150,1150,750,650,250,150)
dat_a<-data.frame(sample_number,condition, celltype_1, celltype_2)

dat_a

   sample_number condition celltype_1 celltype_2
1              1         A       1220        950
2              1         B        800        850
3              1         C        700        450
4              1         D        300         50
5              1         E        200         50
6              2         A       1000       1650
7              2         B        900        550
8              2         C        500        750
9              2         D        100        250
10             2         E        100        150
11             3         A       1700       1150
12             3         B        600        750
13             3         C        800        650
14             3         D        300        250
15             3         E        200        150

I hope to calculate the following percentages using the values in columns celltype_1 & _2 that correspond with these letters in the condition column:

per_w = 100*((A - B)/(A-D))
per_x = 100 - per_w
per_y = 100*((A - C)/(A-D))
per_z = 100 - per_y
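
As a quick sanity check of these formulas, here is a minimal base-R sketch that applies them by hand to sample 1, celltype_1, using the values from dat_a above. The vector name `v` is only for illustration and is not part of the original data.

```r
# Values for sample 1, celltype_1, keyed by condition (copied from dat_a)
v <- c(A = 1220, B = 800, C = 700, D = 300, E = 200)

# The four percentages exactly as defined above
per_w <- 100 * (v[["A"]] - v[["B"]]) / (v[["A"]] - v[["D"]])
per_x <- 100 - per_w
per_y <- 100 * (v[["A"]] - v[["C"]]) / (v[["A"]] - v[["D"]])
per_z <- 100 - per_y

round(c(per_w = per_w, per_x = per_x, per_y = per_y, per_z = per_z), 1)
# per_w per_x per_y per_z
#  45.7  54.3  56.5  43.5
```

Note that condition E is not used by any of the four formulas, and that the denominator A - D must be non-zero for the result to be finite.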

and output the results into dataframe B:

dat_b

  sample_number celltype per_w per_x per_y per_z
1             1        1    35    65    25    75
2             2        2    20    80    60    40
3             3        1    70    30    40    60
4             1        2    45    55    75    25
5             2        1    15    85     5    95
6             3        2    90    10    30    70

I have tried various combinations of loops, group_by(), and sapply(). The most successful code so far is below: it calculates the results for celltype_1 (though not in the exact layout of dataframe B), but it cannot yet be applied across columns.

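library(dplyr)
library(tidyr)

# Works for celltype_1 only: the column positions are hard-coded
dat_test = dat_a %>%
  select(c(1,2,3)) %>%           # sample_number, condition, celltype_1
  group_by(sample_number) %>%
  spread("condition",3) %>%      # one column per condition (A to E)
  mutate(per_w = 100*((A - B)/(A-D))) %>%
  mutate(per_x = 100 - per_w) %>%
  mutate(per_y = 100*((A - C)/(A-D))) %>%
  mutate(per_z = 100 - per_y)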

dat_test

  sample_number     A     B     C     D     E per_w per_x per_y per_z
  <chr>         <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1              1220   800   700   300   200  45.7  54.3  56.5  43.5
2 2              1000   900   500   100   100  11.1  88.9  55.6  44.4
3 3              1700   600   800   300   200  78.6  21.4  64.3  35.7

I have seen parts of my question in other stack questions, but have not determined how to put all the pieces together. I would appreciate any help you can provide. Thank you!

CodePudding user response:

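If you want to perform the calculation on both cell types, you'll need to put them into separate rows first (that is what the first pivot_longer() does). pivot_wider() then spreads the conditions into columns A to E, so the formulas can be applied to each sample/celltype row.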

library(tidyverse)

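dat_a %>%
  # stack celltype_1 / celltype_2 into rows; keep only the digit as the celltype label
  pivot_longer(starts_with("celltype"), names_to = "celltype", names_pattern = "celltype_(\\d)") %>%
  # one column per condition (A to E), one row per sample/celltype pair
  pivot_wider(names_from = condition, values_from = value) %>%
  group_by(celltype, sample_number) %>%
  mutate(per_w = 100*((A - B)/(A-D)),
         per_x = 100 - per_w,
         per_y = 100*((A - C)/(A-D)),
         per_z = 100 - per_y) %>%
  select(-(A:E)) %>%             # drop the raw condition columns
  ungroup()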

# A tibble: 6 × 6
  sample_number celltype per_w per_x per_y per_z
  <chr>         <chr>    <dbl> <dbl> <dbl> <dbl>
1 1             1         45.7  54.3  56.5  43.5
2 1             2         11.1  88.9  55.6  44.4
3 2             1         11.1  88.9  55.6  44.4
4 2             2         78.6  21.4  64.3  35.7
5 3             1         78.6  21.4  64.3  35.7
6 3             2         44.4  55.6  55.6  44.4
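
For comparison, the same reshape-then-compute idea can be sketched in base R, without the tidyverse. This is only an illustration, not part of the answer above: the helper name `calc_percents` is hypothetical, and the sketch assumes there is exactly one row per sample_number/condition pair and that every value column is named with the `celltype_` prefix.

```r
# Rebuild dat_a from the question so the block is self-contained
dat_a <- data.frame(
  sample_number = rep(c("1", "2", "3"), each = 5),
  condition     = rep(c("A", "B", "C", "D", "E"), times = 3),
  celltype_1    = c(1220, 800, 700, 300, 200, 1000, 900, 500, 100, 100,
                    1700, 600, 800, 300, 200),
  celltype_2    = c(950, 850, 450, 50, 50, 1650, 550, 750, 250, 150,
                    1150, 750, 650, 250, 150)
)

# Hypothetical helper: compute the four percentages for one value column
calc_percents <- function(df, col) {
  # Matrix with one row per sample and one column per condition.
  # sum() is a no-op here because each cell holds a single value (assumption).
  wide <- tapply(df[[col]], list(df$sample_number, df$condition), sum)
  per_w <- unname(100 * (wide[, "A"] - wide[, "B"]) / (wide[, "A"] - wide[, "D"]))
  per_y <- unname(100 * (wide[, "A"] - wide[, "C"]) / (wide[, "A"] - wide[, "D"]))
  data.frame(
    sample_number = rownames(wide),
    celltype      = sub("^celltype_", "", col),
    per_w = per_w,
    per_x = 100 - per_w,
    per_y = per_y,
    per_z = 100 - per_y,
    row.names = NULL
  )
}

# Apply the helper to every celltype_* column and stack the results
value_cols <- grep("^celltype_", names(dat_a), value = TRUE)
dat_b <- do.call(rbind, lapply(value_cols, function(col) calc_percents(dat_a, col)))
dat_b
```

The rows come out grouped by cell type (all samples for celltype 1, then all samples for celltype 2) rather than by sample, but the values should match the tibble above.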