Home > Software design >  Adding one value to another within the data frame based on grouping
Adding one value to another within the data frame based on grouping

Time:11-09

A have a hypothetical dataframe that looks like this

df<-data.frame(Group = rep(c("A","A","B","B", "C","C","D","D","E","E")),
               Sex = rep(c("M","F"),5),
               Rate= rnorm(10))
df
 Group Sex       Rate     
1      A   M -1.0678237 
2      A   F -0.2179749  
3      B   M -1.0260044 
4      B   F -0.7288912   
5      C   M -0.6250393   
6      C   F -1.6866933   
7      D   M  0.8377870  
8      D   F  0.1533731 
9      E   M -1.1381369
10     E   F  1.2538149

BUT the rate of "F" is actually rate of M rate of F, within their own group. Within the dataframe, taking into account the different groups, can I change the rate of F to include the rate of M? Potentially in a different column (df$Rate2)

Hypothetically....

 Group Sex       Rate     Rate2
1      A   M -1.0678237   Same
2      A   F -0.2179749   M F (Group A)
3      B   M -1.0260044   Same
4      B   F -0.7288912   M F (Group B)
5      C   M -0.6250393   Same
6      C   F -1.6866933   M F (Group C)
7      D   M  0.8377870   Same
8      D   F  0.1533731   M F (Group D)
9      E   M -1.1381369   Same
10     E   F  1.2538149   M F (Group E)

CodePudding user response:

Assuming that there is always a M and F in every group, you could do this:

library(tidyverse)

df<-tibble(Group = rep(c("A","A","B","B", "C","C","D","D","E","E")),
               Sex = rep(c("M","F"),5),
               Rate= rnorm(10))

df %>%
  group_by(Group) %>%
  mutate(Rate2 = case_when(Sex == "M" ~ Rate, 
                           Sex == "F" ~ sum(Rate)))
#> # A tibble: 10 x 4
#> # Groups:   Group [5]
#>    Group Sex     Rate  Rate2
#>    <chr> <chr>  <dbl>  <dbl>
#>  1 A     M      1.54   1.54 
#>  2 A     F      1.82   3.35 
#>  3 B     M     -0.778 -0.778
#>  4 B     F      0.507 -0.270
#>  5 C     M     -0.624 -0.624
#>  6 C     F      1.31   0.690
#>  7 D     M      0.760  0.760
#>  8 D     F      0.666  1.43 
#>  9 E     M      1.46   1.46 
#> 10 E     F      0.218  1.68

CodePudding user response:

We can use pivot_wider, which would output a different format, but a tidy one.

data from @AndS. (but this time reproducible, with set.seed)

set.seed(1)
df<-tibble(Group = rep(c("A","A","B","B", "C","C","D","D","E","E")),
               Sex = rep(c("M","F"),5),
               Rate= rnorm(10))

Answer

library(dplyr)
library(tidyr)

df %>% pivot_wider(values_from = Rate, names_from = Sex) %>%
  mutate(F = F   M)

# A tibble: 5 × 3
  Group      M      F
  <chr>  <dbl>  <dbl>
1 A     -0.626 -0.443
2 B     -0.836  0.760
3 C      0.330 -0.491
4 D      0.487  1.23 
5 E      0.576  0.270

In case we eant the original layouot, we can pivot_longer the output, as in:

df %>% pivot_wider(values_from = Rate, names_from = Sex)%>%
  mutate(F = F   M)%>%
  pivot_longer(cols = c(M, F))

# A tibble: 10 × 3
   Group name   value
   <chr> <chr>  <dbl>
 1 A     M     -0.626
 2 A     F     -0.443
 3 B     M     -0.836
 4 B     F      0.760
 5 C     M      0.330
 6 C     F     -0.491
 7 D     M      0.487
 8 D     F      1.23 
 9 E     M      0.576
10 E     F      0.270
  •  Tags:  
  • r
  • Related