I am looking to find a way to average common-names in a certain column, and based off of that similarity, to average the rest of the values into a common row. I looked into aggregate, but I think I am missing how to do this in a large dataset, since it isnt working to how I want it to work.
Say i have DF1
ID | Location | Value2 | Value3 | Value4 |
---|---|---|---|---|
First | 2 | 4 | 3 | 3 |
Second | 1 | 2 | 5 | 2 |
Second | 2 | 4 | 5 | 6 |
Third | 6 | 1 | 8 | 8 |
Third | 4 | 7 | 5 | 4 |
Fourth | 1 | 4 | 5 | 1 |
I would want to average any duplicate values
What is the best way to do that when there is a large number of columns, and this is only a partial number of them? Any help would be appreciated. Thanks!
ID <- c("First", "Second", "Second", "Third", "Third", "Fourth")
Location <- c(2,1,2,6,4,1)
Value2 <- c(4,2,4,1,7,4)
Value3 <- c(3,5,5,8,5,5)
Value4 <- c(3,2,6,8,4,1)
DF <- data.frame(ID, Location, Value2, Value3, Value4)
CodePudding user response:
You can try this using aggregate
aggregate(df[-1], list(df$ID), mean)
Group.1 Location Value2 Value3 Value4
1 First 2.0 4 3.0 3
2 Fourth 1.0 4 5.0 1
3 Second 1.5 3 5.0 4
4 Third 5.0 4 6.5 6
With dplyr
library(dplyr)
df %>%
group_by(ID) %>%
summarise(across(starts_with(c("Lo","Val")),mean))
# A tibble: 4 × 5
ID Location Value2 Value3 Value4
<chr> <dbl> <dbl> <dbl> <dbl>
1 First 2 4 3 3
2 Fourth 1 4 5 1
3 Second 1.5 3 5 4
4 Third 5 4 6.5 6
Data
df <- structure(list(ID = c("First", "Second", "Second", "Third", "Third",
"Fourth"), Location = c(2L, 1L, 2L, 6L, 4L, 1L), Value2 = c(4L,
2L, 4L, 1L, 7L, 4L), Value3 = c(3L, 5L, 5L, 8L, 5L, 5L), Value4 = c(3L,
2L, 6L, 8L, 4L, 1L)), class = "data.frame", row.names = c(NA,
-6L))