Home > OS >  Collapse rows with duplicate ID and average values of all other variables
Collapse rows with duplicate ID and average values of all other variables

Time:11-11

I'm using a dataset with several thousand words and over 60 values associated with each word. Most of the words are unique, but there are a few that have been duplicated. I'd like to merge these and replace the associated values with the averages. If there's a way of doing this without having to specify which words are duplicated, that would be great.

So just going from this:

Word    measure1    measure2    measure3 
aids    3.52        2.2         21 
aids    1.33        0.8         21  
coke    6.55        1.99        22  
coke    6.62        1.91        21  

To this:

Word    measure1    measure2    measure3 
aids    2.425       1.5         21  
coke    6.585       1.95        21.5 

(I am using this data)

CodePudding user response:

You could use

library(dplyr)

df1 %>% 
  group_by(Word) %>% 
  summarise(across(where(is.numeric), mean))

This returns

# A tibble: 2 x 4
  Word  measure1 measure2 measure3
  <chr>    <dbl>    <dbl>    <dbl>
1 aids     2.425     1.5      21  
2 coke     6.585     1.95     21.5
  • Related