I have the following data set containing the quantity of phytosanitary products purchased per zip code in france between 2015 and 2019 with its classification (other,toxic,mineral,organic).
the dataframe looks like this, so with the zip_code, the year and the classification you can see the quantity that was purchased
zip_code | year | classification | total_quantity |
---|---|---|---|
01000 | 2015 | other | 44.305436 |
01000 | 2015 | toxic | 212.783330 |
01000 | 2015 | mineral | value |
01000 | 2015 | organic | value |
01000 | 2016 | other | value |
01000 | 2016 | toxic | value |
01000 | 2016 | mineral | value |
it follows the same pattern .....
zip_code | year | classification | total_quantity |
---|---|---|---|
01000 | 2019 | organic | value |
01090 | 2015 | other | value |
but I would like something where you have only one entry per zip code like this (of course going to 2019 and not stoping at 2016 like i did in my exemple)
zip_code | other_total-quantity-2015 | Toxic_total-quantity-2015 | Mineral_total-quantity-2015 | organic_total-quantity-2015 | other_total-quantity-2016 | Toxic_total-quantity-2016 |
---|---|---|---|---|---|---|
01000 | value | value | value | value | value | value |
01090 | value | value | value | value | value |
I tried to do this using the reshape function but the closest i got from what i want is a table where the zip_code is repeated 4 times (for every classification).
Thank you
CodePudding user response:
The following uses pivot_wider
from package tidyr
to do the reshape. I'm aware that's a personal preference, but maybe it's helpful though.
library(tidyr)
library(dplyr)
## or install and load these and related packages
## in bulk through the `tidyverse` package
df %>%
pivot_wider(
id_cols = zip_code,
names_from = c(year, classification),
values_from = total_quantity,
names_prefix = 'total-quantity' ## redundant, actually
)
CodePudding user response:
I created a sample dataset that looks like this:
# A tibble: 40 × 4
zip_code year classification total_quantity
<dbl> <dbl> <chr> <dbl>
1 1000 2015 other 61.1
2 1000 2015 toxic 32.8
3 1000 2015 mineral 11.4
4 1000 2015 organic 38.9
5 1000 2016 other 18.8
6 1000 2016 toxic 65.0
7 1000 2016 mineral 0.382
8 1000 2016 organic 18.8
9 1000 2017 other 96.0
10 1000 2017 toxic 60.4
# … with 30 more rows
If you run the following code you will get your requested table:
df %>%
pivot_wider(
id_cols = zip_code,
names_from = c(year, classification),
values_from = total_quantity,
names_glue = "{classification}_total-quantity-{year}"
)
Output:
# A tibble: 2 × 21
zip_code `other_total-quantity…` `toxic_total-q…` `mineral_total…` `organic_total…` `other_total-q…` `toxic_total-q…` `mineral_total…` `organic_total…` `other_total-q…` `toxic_total-q…` `mineral_total…` `organic_total…` `other_total-q…` `toxic_total-q…` `mineral_total…` `organic_total…` `other_total-q…` `toxic_total-q…` `mineral_total…` `organic_total…`
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1000 61.1 32.8 11.4 38.9 18.8 65.0 0.382 18.8 96.0 60.4 80.4 81.2 47.4 87.4 52.2 9.65 19.7 11.3 45.7 12.8
2 1090 75.2 40.1 47.9 10.3 86.2 97.9 11.2 93.3 55.0 88.5 63.5 46.6 5.30 13.1 20.4 83.9 58.6 61.3 6.56 46.7
As you can see the year
and classification
are added to the columnnames using names_glue
in the pivot_wider
function.