I want to calculate three metric from the following dataframe after summarizing at country,state and date column
- count of unique household_id and individual_id
- Avg weights
- Total duration
Household_id 100 have two unique individual(1and 2) and househld_id 101 have three unique individual(1,2,3). So total unique is 5 after summarizing.
The Average weights I want to calculate of this 5 unique individuals i.e. (100 50 200 200 200)/5 =150
Final dataset:
What I did is
data %>% group_by(country,state,date) %>%
summarise(Total_unique = n_distinct(household_id,individual_id),
Tot_Duration = sum(duration))
But not able to calculate the Average_weights. Any help is highly appreciated.
Below is the sample dataset
library(dplyr)
data <- data.frame(country = c("US","US","US","US","US","US","IND","IND"),
state = c("TX","TX","TX","TX","TX","TX","AP","AP"),
date = c(20220601,20220601,20220601,20220601,20220601,20220601,20220601,20220601),
household_id = c(100,100,100,101,101,101,102,102),
individual_id=c(1,2,1,1,2,3,1,1),
weights = c(100,50,100,200,200,200,100,100),
duration = c(10,20,30,40,50,60,70,80))
CodePudding user response:
You may create a unique key to distinguish each individual and then calculate the values by group.
library(dplyr)
library(tidyr)
data %>%
unite(unique_key, country,state,date, household_id,
individual_id, remove = FALSE) %>%
group_by(country,state,date) %>%
summarise(Total_unique = n_distinct(unique_key),
Avg_weights = mean(weights[!duplicated(unique_key)]),
Tot_Duration = sum(duration), .groups = "drop")
# country state date Total_unique Avg_weights Tot_Duration
# <chr> <chr> <dbl> <int> <dbl> <dbl>
#1 IND AP 20220601 1 100 150
#2 US TX 20220601 5 150 210