I have a list with two elements, each holding a variable of interest, value, for each combination of the variables country and gender.
# Toy example
A <- list()
A[[1]] <- data.frame(country = c("Spain", "Spain", "France", "France"),
                     gender = c("M", "F", "M", "F"),
                     value = c(100, 125, 10, 200))
A[[2]] <- data.frame(country = c("Spain", "Spain", "France"),
                     gender = c("M", "F", "F"),
                     value = c(150, 75, 100))
Data looks like this:
[[1]]
country gender value
1 Spain M 100
2 Spain F 125
3 France M 10
4 France F 200
[[2]]
country gender value
1 Spain M 150
2 Spain F 75
3 France F 100
I would like to compute the mean of value (across the elements of the list) for each possible combination of gender and country, taking into account that not all combinations are present in every element of the list (in this example, the second element has no value for males in France).
What I expect is something like this:
country gender value
1 Spain M 125
2 Spain F 100
3 France M 5
4 France F 150
Any ideas on how to deal with that?
CodePudding user response:
What you need here is complete from {tidyr}.
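library(dplyr)
library(purrr)
library(tidyr)

A %>%
  # add the missing gender rows to each data frame, with value = 0
  map(tidyr::complete, country,
      gender = c("M", "F"), fill = list(value = 0)) %>%
  bind_rows() %>%
  group_by(country, gender) %>%
  summarise(value = mean(value)) %>%
  ungroup()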
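First, map applies complete to each data frame in the list, so that every one of them has a row for both M and F in gender for each country present in that data frame (you can also specify the countries that should be present). The fill argument then sets value to 0 instead of NA in the rows that were added. Finally, the data frames are stacked and the mean is taken per country and gender.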
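If a country could be missing entirely from one of the list elements, the countries can be passed explicitly as mentioned above. The following is only a sketch of that variant: the helper names all_countries and all_genders are made up for illustration, and it assumes the full set of countries is whatever appears anywhere in the list.

```r
library(dplyr)
library(purrr)
library(tidyr)

# Toy data from the question
A <- list(
  data.frame(country = c("Spain", "Spain", "France", "France"),
             gender  = c("M", "F", "M", "F"),
             value   = c(100, 125, 10, 200)),
  data.frame(country = c("Spain", "Spain", "France"),
             gender  = c("M", "F", "F"),
             value   = c(150, 75, 100))
)

# Assumption: every country seen anywhere in the list should appear in each element
all_countries <- unique(unlist(map(A, "country")))
all_genders   <- c("M", "F")

result <- A %>%
  # complete each data frame against the full country x gender grid, filling value with 0
  map(function(df) complete(df,
                            country = all_countries,
                            gender  = all_genders,
                            fill    = list(value = 0))) %>%
  bind_rows() %>%
  group_by(country, gender) %>%
  summarise(value = mean(value), .groups = "drop")

result
```

With the toy data this should give the same four rows as the code above, since both countries already appear in both elements; the difference only shows up when a whole country is absent from one of them.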
CodePudding user response:
library(dplyr)

A %>%
  bind_rows() %>%
  group_by(country, gender) %>%
  # dividing by length(A) counts a combination missing from an element as 0
  summarise(value = sum(value) / length(A))
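This works because dividing the per-group sum by the number of list elements is the same as averaging with a 0 for every element where the combination is absent. For anyone not using dplyr, roughly the same idea can be sketched in base R; this is an illustrative alternative, not part of the answer above, and the names stacked and result are made up here.

```r
# Toy data from the question
A <- list(
  data.frame(country = c("Spain", "Spain", "France", "France"),
             gender  = c("M", "F", "M", "F"),
             value   = c(100, 125, 10, 200)),
  data.frame(country = c("Spain", "Spain", "France"),
             gender  = c("M", "F", "F"),
             value   = c(150, 75, 100))
)

# Stack all list elements into one data frame
stacked <- do.call(rbind, A)

# Sum per country/gender and divide by the number of list elements,
# so a combination absent from an element contributes 0 to the mean
result <- aggregate(value ~ country + gender,
                    data = stacked,
                    FUN  = function(v) sum(v) / length(A))

result
```

Note that both this sketch and the dplyr version only return combinations that occur in at least one list element; a combination that never appears gets no row at all.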