I've the following table
Data = structure(list(Countries = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("China", "India", "Vietnam"), class = "factor"), Year = c(2019L, 2018L, 2018L, 2018L, 2017L, 2017L, 2019L, 2019L, 2018L, 2018L, 2017L, 2018L, 2018L, 2018L,2017L, 2017L, 2019L, 2018L, 2018L, 2018L, 2017L, 2017L, 2019L, 2019L, 2019L, 2018L, 2017L, 2017L), Food = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Bread","Rice"), class = "factor"), Price = c(2.8, 2.8, 2.7, NA, 2.6, 2.58, 2.53, 2.5, NA, NA, 2.395, 2.9, 2.8, 2.75, 2.66, 2.5, 11.5,11.3, 11.2, 11, NA, 10.7, 10.7, NA, NA, 10.3, 10.1, 10)), class = "data.frame", row.names = c(NA, -28L))
The table shows the following:
Countries | Year | Food | Price |
---|---|---|---|
China | 2019 | Bread | 2.8 |
China | 2018 | Bread | 2.8 |
China | 2018 | Bread | 2.7 |
China | 2018 | Bread | NA |
China | 2017 | Bread | 2.6 |
China | 2017 | Bread | 2.58 |
India | 2019 | Bread | 2.53 |
India | 2019 | Bread | 2.5 |
India | 2018 | Bread | NA |
India | 2018 | Bread | NA |
India | 2017 | Bread | 2.395 |
Vietnam | 2018 | Bread | 2.9 |
Vietnam | 2018 | Bread | 2.8 |
Vietnam | 2018 | Bread | 2.75 |
Vietnam | 2017 | Bread | 2.66 |
Vietnam | 2017 | Bread | 2.5 |
China | 2019 | Rice | 11.5 |
China | 2018 | Rice | 11.3 |
China | 2018 | Rice | 11.2 |
China | 2018 | Rice | 11.0 |
China | 2017 | Rice | NA |
China | 2017 | Rice | 10.7 |
Vietnam | 2019 | Rice | 10.7 |
Vietnam | 2019 | Rice | NA |
Vietnam | 2019 | Rice | NA |
Vietnam | 2018 | Rice | 10.3 |
Vietnam | 2017 | Rice | 10.1 |
Vietnam | 2017 | Rice | 10.0 |
Does anyone know how to aggregate the price of the individual food items based on the countries and the year (the actual dataset has more countries, year and food item but in the same format) using dplyr and/or tiderverse while accounting for NA? i.e.
- If the price of bread for 2018 is 2.8, 2.7 and NA, the average will be (2.8 2.7)/2 instead of (2.8 2.7 0)/3
- If the price of bread for the entire year is NA, we can discard this and don't even bother printing it on the output table.
Output table
Countries | Year | Food | Price |
---|---|---|---|
China | 2019 | Bread | 2.8 |
China | 2018 | Bread | 2.8 |
China | 2017 | Bread | 2.6 |
India | 2019 | Bread | 2.5 |
India | 2017 | Bread | 2.4 |
Vietnam | 2018 | Bread | 2.8 |
Vietnam | 2017 | Bread | 2.6 |
China | 2019 | Rice | 11.5 |
China | 2018 | Rice | 11.2 |
China | 2017 | Rice | 10.7 |
Vietnam | 2019 | Rice | 10.7 |
Vietnam | 2018 | Rice | 10.3 |
Vietnam | 2017 | Rice | 10.1 |
Also out of genuine curiosity, is this even possible to do in base R?
CodePudding user response:
For a base R answer, aggregate
will drop the missing values automatically (you can change this behaviour with the na.action
argument if you want to). So:
aggregate( Price ~ Food Year Countries , mean , data=Data)
gives you:
Food Year Countries Price
1 Bread 2017 China 2.590000
2 Rice 2017 China 10.700000
3 Bread 2018 China 2.750000
4 Rice 2018 China 11.166667
5 Bread 2019 China 2.800000
6 Rice 2019 China 11.500000
7 Bread 2017 India 2.395000
8 Bread 2019 India 2.515000
9 Bread 2017 Vietnam 2.580000
10 Rice 2017 Vietnam 10.050000
11 Bread 2018 Vietnam 2.816667
12 Rice 2018 Vietnam 10.300000
13 Rice 2019 Vietnam 10.700000
If you want them in a different order just reorder the RHS of the formula.
CodePudding user response:
With dplyr
:
Data %>%
group_by(Countries, Year, Food) %>%
summarise(Price = mean(Price, na.rm = TRUE), .groups = 'drop') %>%
filter(!is.na(Price)) %>%
arrange(Food, Countries, desc(Year))
#> # A tibble: 13 × 4
#> Countries Year Food Price
#> <fct> <int> <fct> <dbl>
#> 1 China 2019 Bread 2.8
#> 2 China 2018 Bread 2.75
#> 3 China 2017 Bread 2.59
#> 4 India 2019 Bread 2.51
#> 5 India 2017 Bread 2.40
#> 6 Vietnam 2018 Bread 2.82
#> 7 Vietnam 2017 Bread 2.58
#> 8 China 2019 Rice 11.5
#> 9 China 2018 Rice 11.2
#> 10 China 2017 Rice 10.7
#> 11 Vietnam 2019 Rice 10.7
#> 12 Vietnam 2018 Rice 10.3
#> 13 Vietnam 2017 Rice 10.0
CodePudding user response:
Try this:
library(tidyverse)
Data <- structure(list(Countries = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("China", "India", "Vietnam"), class = "factor"), Year = c(2019L, 2018L, 2018L, 2018L, 2017L, 2017L, 2019L, 2019L, 2018L, 2018L, 2017L, 2018L, 2018L, 2018L, 2017L, 2017L, 2019L, 2018L, 2018L, 2018L, 2017L, 2017L, 2019L, 2019L, 2019L, 2018L, 2017L, 2017L), Food = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Bread", "Rice"), class = "factor"), Price = c(2.8, 2.8, 2.7, NA, 2.6, 2.58, 2.53, 2.5, NA, NA, 2.395, 2.9, 2.8, 2.75, 2.66, 2.5, 11.5, 11.3, 11.2, 11, NA, 10.7, 10.7, NA, NA, 10.3, 10.1, 10)), class = "data.frame", row.names = c(NA, -28L))
Data |>
group_by(Countries, Year, Food) |>
summarise(Price = round(mean(Price, na.rm = TRUE), 1)) |>
arrange(Food, Countries, desc(Year)) |>
filter(!is.nan(Price))
#> # A tibble: 13 × 4
#> # Groups: Countries, Year [8]
#> Countries Year Food Price
#> <fct> <int> <fct> <dbl>
#> 1 China 2019 Bread 2.8
#> 2 China 2018 Bread 2.8
#> 3 China 2017 Bread 2.6
#> 4 India 2019 Bread 2.5
#> 5 India 2017 Bread 2.4
#> 6 Vietnam 2018 Bread 2.8
#> 7 Vietnam 2017 Bread 2.6
#> 8 China 2019 Rice 11.5
#> 9 China 2018 Rice 11.2
#> 10 China 2017 Rice 10.7
#> 11 Vietnam 2019 Rice 10.7
#> 12 Vietnam 2018 Rice 10.3
#> 13 Vietnam 2017 Rice 10.1
Created on 2022-05-11 by the reprex package (v2.0.1)
CodePudding user response:
with data.table
:
library(data.table)
setDT(Data)[!is.na(Price), .(Price=mean(Price,na.rm=T)), by=.(Countries,Year,Food)]
Output:
Countries Year Food Price
1: China 2019 Bread 2.800000
2: China 2018 Bread 2.750000
3: China 2017 Bread 2.590000
4: India 2019 Bread 2.515000
5: India 2017 Bread 2.395000
6: Vietnam 2018 Bread 2.816667
7: Vietnam 2017 Bread 2.580000
8: China 2019 Rice 11.500000
9: China 2018 Rice 11.166667
10: China 2017 Rice 10.700000
11: Vietnam 2019 Rice 10.700000
12: Vietnam 2018 Rice 10.300000
13: Vietnam 2017 Rice 10.050000