Reshaping Long data frame to wide using R-CodePudding

I have the following data set containing the quantity of phytosanitary products purchased per zip code in france between 2015 and 2019 with its classification (other,toxic,mineral,organic).

the dataframe looks like this, so with the zip_code, the year and the classification you can see the quantity that was purchased

zip_code	year	classification	total_quantity
01000	2015	other	44.305436
01000	2015	toxic	212.783330
01000	2015	mineral	value
01000	2015	organic	value
01000	2016	other	value
01000	2016	toxic	value
01000	2016	mineral	value

it follows the same pattern .....

zip_code	year	classification	total_quantity
01000	2019	organic	value
01090	2015	other	value

but I would like something where you have only one entry per zip code like this (of course going to 2019 and not stoping at 2016 like i did in my exemple)

zip_code	other_total-quantity-2015	Toxic_total-quantity-2015	Mineral_total-quantity-2015	organic_total-quantity-2015	other_total-quantity-2016	Toxic_total-quantity-2016
01000	value	value	value	value	value	value
01090	value	value	value	value	value

I tried to do this using the reshape function but the closest i got from what i want is a table where the zip_code is repeated 4 times (for every classification).

Thank you

CodePudding user response：

The following uses pivot_wider from package tidyr to do the reshape. I'm aware that's a personal preference, but maybe it's helpful though.

library(tidyr)
library(dplyr)
## or install and load these and related packages 
## in bulk through the `tidyverse` package

df %>%
    pivot_wider(
        id_cols = zip_code,
        names_from = c(year, classification),
        values_from = total_quantity,
        names_prefix = 'total-quantity' ## redundant, actually
        )

CodePudding user response：

I created a sample dataset that looks like this:

# A tibble: 40 × 4
   zip_code  year classification total_quantity
      <dbl> <dbl> <chr>                   <dbl>
 1     1000  2015 other                  61.1  
 2     1000  2015 toxic                  32.8  
 3     1000  2015 mineral                11.4  
 4     1000  2015 organic                38.9  
 5     1000  2016 other                  18.8  
 6     1000  2016 toxic                  65.0  
 7     1000  2016 mineral                 0.382
 8     1000  2016 organic                18.8  
 9     1000  2017 other                  96.0  
10     1000  2017 toxic                  60.4  
# … with 30 more rows

If you run the following code you will get your requested table:

df %>%
  pivot_wider(
    id_cols = zip_code,
    names_from = c(year, classification),
    values_from = total_quantity,
    names_glue = "{classification}_total-quantity-{year}"
  )

Output:

# A tibble: 2 × 21
  zip_code `other_total-quantity…` `toxic_total-q…` `mineral_total…` `organic_total…` `other_total-q…` `toxic_total-q…` `mineral_total…` `organic_total…` `other_total-q…` `toxic_total-q…` `mineral_total…` `organic_total…` `other_total-q…` `toxic_total-q…` `mineral_total…` `organic_total…` `other_total-q…` `toxic_total-q…` `mineral_total…` `organic_total…`
     <dbl>                   <dbl>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>
1     1000                    61.1             32.8             11.4             38.9             18.8             65.0            0.382             18.8             96.0             60.4             80.4             81.2            47.4              87.4             52.2             9.65             19.7             11.3            45.7              12.8
2     1090                    75.2             40.1             47.9             10.3             86.2             97.9           11.2               93.3             55.0             88.5             63.5             46.6             5.30             13.1             20.4            83.9              58.6             61.3             6.56             46.7

As you can see the year and classification are added to the columnnames using names_glue in the pivot_wider function.