I'm using a dataset with several thousand words and over 60 values associated with each word. Most of the words are unique, but there are a few that have been duplicated. I'd like to merge these and replace the associated values with the averages. If there's a way of doing this without having to specify which words are duplicated, that would be great.
So just going from this:
Word measure1 measure2 measure3
aids 3.52 2.2 21
aids 1.33 0.8 21
coke 6.55 1.99 22
coke 6.62 1.91 21
To this:
Word measure1 measure2 measure3
aids 2.425 1.5 21
coke 6.585 1.95 21.5
(I am using this data)
CodePudding user response:
You could use
library(dplyr)
df1 %>%
group_by(Word) %>%
summarise(across(where(is.numeric), mean))
This returns
# A tibble: 2 x 4
Word measure1 measure2 measure3
<chr> <dbl> <dbl> <dbl>
1 aids 2.425 1.5 21
2 coke 6.585 1.95 21.5