In the following dataframe I want to calculate unique household_id - individual_id combination and Average weights and Total Duration after summarizing at country, state and date column.
Household_id 100 have two unique individual(1and 2) and househld_id 101 have three unique individual(1,2,3). So total unique is 5 after summarizing.
The Average weights I want to calculate of this 5 unique individuals i.e. (100 50 200 200 200)/5 =150
Final dataset:
what I did is
data %>% group_by(country,state,date) %>%
summarise(total_unique = n_distinct(household_id,individual_id),
Tot_Duration = sum(duration))
But not able to calculate the Average_weights. Any help is highly appreciated.
Sample Dataset
library(dplyr)
data <- data.frame(country = c("US","US","US","US","US","US","IND","IND"),
state = c("TX","TX","TX","TX","TX","TX","AP","AP"),
date = c(20220601,20220601,20220601,20220601,20220601,20220601,20220601,20220601),
household_id = c(100,100,100,101,101,101,102,102),
individual_id=c(1,2,1,1,2,3,1,1),
weights = c(100,50,100,200,200,200,100,100),
duration = c(10,20,30,40,50,60,70,80))
EDIT Apologies for not putting the right dataset which I realized later. Two update in Dataset
- Different individuals may have same weight as in household_id 101
- Duration column is added with solution 1 above distinct will not work and with solution 2 unique will not work. Please suggest
I have updated the sample dataset
CodePudding user response:
Building on to your code, you could add an extra statement in summarise
:
library(tidyverse)
data %>%
group_by(country,state,date) %>%
summarise(total_unique = n_distinct(household_id,individual_id),
Average_weights = sum(unique(weights), na.rm = T)/total_unique)
Output
country state date total_unique Average_weights
<chr> <chr> <dbl> <int> <dbl>
1 IND AP 20220601 1 100
2 US TX 20220601 5 210
CodePudding user response:
You may try
library(dplyr)
data %>%
group_by(country, state, date) %>%
distinct() %>%
summarize(total_unique = n(),
average_Weights = sum(weights)/total_unique)
country state date total_unique average_Weights
<chr> <chr> <dbl> <int> <dbl>
1 IND AP 20220601 1 100
2 US TX 20220601 5 210