Merging dataframe rows with text, numeric, and NA values-CodePudding

I have a dataframe like the one below (though much larger).

name	age	sex	favcolor	grade	score
tim	NA	NA	blue	12	100
tim	18	male	red	12	50
dave	17	male	red	12	85
mike	15	male	green	10	95
john	12	male	NA	7	80
john	12	NA	orange	7	90

There are a few things I want to do. My primary goal is to merge the rows by the name variable, such that each name gets one row. Second, when merging rows, I want cells with data to override cells that are NA (tim with age and sex. Third, if the rows that are merging both have text values (e.g. tim with favcolor), I want to keep the first one. And lastly, for rows that both have values in numeric columns (age, grade, and score), I want the new value to be the mean of the merging rows.

If all these rules are followed, the dataframe should look something like this.

name	age	sex	favcolor	grade	score
tim	18	male	blue	12	75
dave	17	male	red	12	85
mike	15	male	green	10	95
john	12	male	orange	7	85

Is there a straightforward way to accomplish this? I've tried about 30 different things, but it never turns out the way I want it to. Any help would be greatly appreciated.

CodePudding user response：

You can group_by(name) and use summarize to collapse the rows into a single one.

Finally, relocate to reorder the columns as the input.

library(dplyr)

df %>% 
  group_by(name) %>% 
  summarize(across(where(is.numeric), ~ mean(.x, na.rm = T)),
            across(where(is.character), ~.x[!is.na(.x)][1])) %>% 
  relocate(colnames(df))

# A tibble: 4 × 6
  name    age sex   favcolor grade score
  <chr> <dbl> <chr> <chr>    <dbl> <dbl>
1 dave     17 male  red         12    85
2 john     12 male  orange       7    85
3 mike     15 male  green       10    95
4 tim      18 male  blue        12    75