Home > Blockchain >  Group by in r and calculating Average of Unique values
Group by in r and calculating Average of Unique values

Time:06-25

I want to calculate three metric from the following dataframe after summarizing at country,state and date column

  1. count of unique household_id and individual_id
  2. Avg weights
  3. Total duration

enter image description here

Household_id 100 have two unique individual(1and 2) and househld_id 101 have three unique individual(1,2,3). So total unique is 5 after summarizing.

The Average weights I want to calculate of this 5 unique individuals i.e. (100 50 200 200 200)/5 =150

Final dataset:

enter image description here

What I did is

 data %>% group_by(country,state,date) %>% 
 summarise(Total_unique = n_distinct(household_id,individual_id),
        Tot_Duration = sum(duration))

But not able to calculate the Average_weights. Any help is highly appreciated.

Below is the sample dataset

 library(dplyr)
 data <- data.frame(country = c("US","US","US","US","US","US","IND","IND"),
               state = c("TX","TX","TX","TX","TX","TX","AP","AP"),
               date = c(20220601,20220601,20220601,20220601,20220601,20220601,20220601,20220601),
               household_id = c(100,100,100,101,101,101,102,102),
               individual_id=c(1,2,1,1,2,3,1,1),
               weights = c(100,50,100,200,200,200,100,100),
               duration = c(10,20,30,40,50,60,70,80))

CodePudding user response:

You may create a unique key to distinguish each individual and then calculate the values by group.

library(dplyr)
library(tidyr)

data %>%
  unite(unique_key, country,state,date, household_id, 
        individual_id, remove = FALSE) %>%
  group_by(country,state,date) %>%
  summarise(Total_unique = n_distinct(unique_key), 
            Avg_weights = mean(weights[!duplicated(unique_key)]), 
            Tot_Duration = sum(duration), .groups = "drop")

#  country state     date Total_unique Avg_weights Tot_Duration
#  <chr>   <chr>    <dbl>        <int>       <dbl>        <dbl>
#1 IND     AP    20220601            1         100          150
#2 US      TX    20220601            5         150          210
  •  Tags:  
  • r
  • Related