Home > Software design >  Reshaping Long data frame to wide using R
Reshaping Long data frame to wide using R

Time:03-25

I have the following data set containing the quantity of phytosanitary products purchased per zip code in france between 2015 and 2019 with its classification (other,toxic,mineral,organic).

the dataframe looks like this, so with the zip_code, the year and the classification you can see the quantity that was purchased

zip_code year classification total_quantity
01000 2015 other 44.305436
01000 2015 toxic 212.783330
01000 2015 mineral value
01000 2015 organic value
01000 2016 other value
01000 2016 toxic value
01000 2016 mineral value

it follows the same pattern .....

zip_code year classification total_quantity
01000 2019 organic value
01090 2015 other value

but I would like something where you have only one entry per zip code like this (of course going to 2019 and not stoping at 2016 like i did in my exemple)

zip_code other_total-quantity-2015 Toxic_total-quantity-2015 Mineral_total-quantity-2015 organic_total-quantity-2015 other_total-quantity-2016 Toxic_total-quantity-2016
01000 value value value value value value
01090 value value value value value

I tried to do this using the reshape function but the closest i got from what i want is a table where the zip_code is repeated 4 times (for every classification).

Thank you

CodePudding user response:

The following uses pivot_wider from package tidyr to do the reshape. I'm aware that's a personal preference, but maybe it's helpful though.

library(tidyr)
library(dplyr)
## or install and load these and related packages 
## in bulk through the `tidyverse` package

df %>%
    pivot_wider(
        id_cols = zip_code,
        names_from = c(year, classification),
        values_from = total_quantity,
        names_prefix = 'total-quantity' ## redundant, actually
        )

CodePudding user response:

I created a sample dataset that looks like this:

# A tibble: 40 × 4
   zip_code  year classification total_quantity
      <dbl> <dbl> <chr>                   <dbl>
 1     1000  2015 other                  61.1  
 2     1000  2015 toxic                  32.8  
 3     1000  2015 mineral                11.4  
 4     1000  2015 organic                38.9  
 5     1000  2016 other                  18.8  
 6     1000  2016 toxic                  65.0  
 7     1000  2016 mineral                 0.382
 8     1000  2016 organic                18.8  
 9     1000  2017 other                  96.0  
10     1000  2017 toxic                  60.4  
# … with 30 more rows

If you run the following code you will get your requested table:

df %>%
  pivot_wider(
    id_cols = zip_code,
    names_from = c(year, classification),
    values_from = total_quantity,
    names_glue = "{classification}_total-quantity-{year}"
  )

Output:

# A tibble: 2 × 21
  zip_code `other_total-quantity…` `toxic_total-q…` `mineral_total…` `organic_total…` `other_total-q…` `toxic_total-q…` `mineral_total…` `organic_total…` `other_total-q…` `toxic_total-q…` `mineral_total…` `organic_total…` `other_total-q…` `toxic_total-q…` `mineral_total…` `organic_total…` `other_total-q…` `toxic_total-q…` `mineral_total…` `organic_total…`
     <dbl>                   <dbl>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>
1     1000                    61.1             32.8             11.4             38.9             18.8             65.0            0.382             18.8             96.0             60.4             80.4             81.2            47.4              87.4             52.2             9.65             19.7             11.3            45.7              12.8
2     1090                    75.2             40.1             47.9             10.3             86.2             97.9           11.2               93.3             55.0             88.5             63.5             46.6             5.30             13.1             20.4            83.9              58.6             61.3             6.56             46.7

As you can see the year and classification are added to the columnnames using names_glue in the pivot_wider function.

  • Related