Home > Software engineering >  Add a new column with the same value within each group based on value from specific row
Add a new column with the same value within each group based on value from specific row

Time:11-25

I have a data frame with a grouping variable ID, a factor F and a value V that looks something like this:

df <- data.frame(ID = c(rep(1, 3), rep(2, 3)),
                 F = factor(c("A","B","X","C","D","X")),
                 V = c(30, 32, 25, 31, 37, 24)
                  )
> df
  ID F  V
1  1 A 30
2  1 B 32
3  1 X 25
4  2 C 31
5  2 D 37
6  2 X 24

Now, I would like to add a new column New, which has the same value within each group (by ID) based on the value for V in the row where F==X using the tidyverse environment. Ideally, those rows would be removed afterwards so that the new data frame looks like this:

> df
  ID F  V New
1  1 A 30  25
2  1 B 32  25
3  2 C 31  24
4  2 D 37  24

I know that I have to use the group_by() function and probably also mutate(), but I couldn't quite manage to get my desired result.

CodePudding user response:

df %>%
   group_by(ID) %>%
   mutate(New = V[F =='X']) %>%
   filter(F != 'X')

# A tibble: 4 × 4
# Groups:   ID [2]
     ID F         V   New
  <dbl> <fct> <dbl> <dbl>
1     1 A        30    25
2     1 B        32    25
3     2 C        31    24
4     2 D        37    24

CodePudding user response:

library(dplyr)

df %>% 
  group_by(ID) %>% # grouping variables by ID
  mutate(New = ifelse(F == "X",
                            V,
                            NA)) %>% # adding New column
  summarise(New = max(New, na.rm = T)) %>% # Filtering rows with filled New column
  right_join(df %>% filter(F != "X"), by = "ID") %>% # SQL-like join
  select(ID, F, V, New) # reordering the columns to the desired order
 

And you get this output:

# A tibble: 4 × 4
     ID F         V   New
  <dbl> <fct> <dbl> <dbl>
1     1 A        30    25
2     1 B        32    25
3     2 C        31    24
4     2 D        37    24

Or even simplier:

  df %>% filter(F == "X") %>% # filtering the rows with "X" in F column
    right_join(df %>% filter(F != "X"), by = "ID") %>% joining to the same dataset without "X" rows
    select(ID, F= F.y, V = V.y, New = V.x) #reordering and renaming of columns
  • Related