Home > Blockchain >  Merging dataframe rows with text, numeric, and NA values
Merging dataframe rows with text, numeric, and NA values

Time:04-05

I have a dataframe like the one below (though much larger).

name age sex favcolor grade score
tim NA NA blue 12 100
tim 18 male red 12 50
dave 17 male red 12 85
mike 15 male green 10 95
john 12 male NA 7 80
john 12 NA orange 7 90

There are a few things I want to do. My primary goal is to merge the rows by the name variable, such that each name gets one row. Second, when merging rows, I want cells with data to override cells that are NA (tim with age and sex. Third, if the rows that are merging both have text values (e.g. tim with favcolor), I want to keep the first one. And lastly, for rows that both have values in numeric columns (age, grade, and score), I want the new value to be the mean of the merging rows.

If all these rules are followed, the dataframe should look something like this.

name age sex favcolor grade score
tim 18 male blue 12 75
dave 17 male red 12 85
mike 15 male green 10 95
john 12 male orange 7 85

Is there a straightforward way to accomplish this? I've tried about 30 different things, but it never turns out the way I want it to. Any help would be greatly appreciated.

CodePudding user response:

You can group_by(name) and use summarize to collapse the rows into a single one.

Finally, relocate to reorder the columns as the input.

library(dplyr)

df %>% 
  group_by(name) %>% 
  summarize(across(where(is.numeric), ~ mean(.x, na.rm = T)),
            across(where(is.character), ~.x[!is.na(.x)][1])) %>% 
  relocate(colnames(df))

# A tibble: 4 × 6
  name    age sex   favcolor grade score
  <chr> <dbl> <chr> <chr>    <dbl> <dbl>
1 dave     17 male  red         12    85
2 john     12 male  orange       7    85
3 mike     15 male  green       10    95
4 tim      18 male  blue        12    75
  • Related