Home > Software engineering >  I want to merge all rows that match in two specific variables
I want to merge all rows that match in two specific variables

Time:11-19

I am looking to merge all rows corresponding to the same year and location into one that represents the average.

Let's say this is my data frame

data<-data.frame(year=c(2000,2000,2000,2000,2001,2001,2001,2001,2002,2002,2002,2002),
                 location=c(1,1,2,2,1,1,2,2,1,1,2,2),
                 x=c(1,2,3,4,5,6,7,8,9,10,11,12))

I want to merge all rows that are representative of the same year (f.e. 2000) and the same location (f.e. 1) into one. The x-value of this new row should be the average of the x-values of the merged rows.

Unfortunately I have no idea how to do this and haven't been able to find a way in documentation or online.

CodePudding user response:

Looking for somethin like this...?

  library(dplyr)
  data %>% 
    group_by(year, location) %>% 
    summarise(x_mean = mean(x)) %>% 
    ungroup()
`summarise()` has grouped output by 'year'. You can override using the `.groups` argument.
# A tibble: 6 × 3
   year location x_mean
  <dbl>    <dbl>  <dbl>
1  2000        1    1.5
2  2000        2    3.5
3  2001        1    5.5
4  2001        2    7.5
5  2002        1    9.5
6  2002        2   11.5

CodePudding user response:

Here is a base R solution

aggregate(data, list(data$year, data$location), mean)[, -c(1:2)]
  year location    x
1 2000        1  1.5
2 2001        1  5.5
3 2002        1  9.5
4 2000        2  3.5
5 2001        2  7.5
6 2002        2 11.5

or dplyr

library(dplyr)

data %>% 
  group_by(year, location) %>% 
  summarize(x = mean(x), .groups = "drop")
# A tibble: 6 × 3
   year location     x
  <dbl>    <dbl> <dbl>
1  2000        1   1.5
2  2000        2   3.5
3  2001        1   5.5
4  2001        2   7.5
5  2002        1   9.5
6  2002        2  11.5
  •  Tags:  
  • r
  • Related