Home > Net >  Having trouble with pivoting data in R
Having trouble with pivoting data in R

Time:12-10

Below is the sample data. My goal is to create a single row (monthly data) for each area/industry/ownership combination. For this sample data set, there will be 24 values per area/industry/ownership combination. I know that you have to do a series of pivots but my attempts have not been successful. At the bottom is the desired result.

In my larger data set, I have far more than 3 years and one industry but this keeps it manageable.

 naicscode <- c("111","111","111","111","111","111","111","111","111","111","111","111","111","111","111","111")
 areavalue <- c("000000","000000","000000","000000","000000","000000","000000","000000","000003","000003","000003","000003","000003","000003","000003","000003")
 ownership <- c("50","50","50","50","50","50","50","50","50","50","50","50","50","50","50","50")
 period <- c("01","02","03","04","01","02","03","04","01","02","03","04","01","02","03","04")
 periodyear <- c("2020","2020","2020","2020","2021","2021","2021","2021", "2020","2020","2020","2020", "2021","2021","2021","2021")
 mnth1emp<- c(25000,25005,25010,25020,25025,20506,20510,21555,16000,16005,16025,16020,16035,13595,14010,13985)
 mnth2emp<- c(25005,25010,25000,24995,25005,25010,25060,24995,15995,16005,16015,16020,16030,14015,14000,14200)
 mnth3emp<- c(24985,25000,25005,25010,25009,25040,25090,25080,15990,16000,16065,16025,16030,14665,14550,14620)


 test <- data.frame(naicscode,areavalue,ownership,periodyear,period,mnth1emp,mnth2emp,mnth3emp)




  naicscode       areavalue    ownership     202001     202002    202003  202004   202005   202006  ... and on until 202112. 
     111            000000        50          25000       25005     24985   25005   25010   25000

CodePudding user response:

I'm assuming period means quarter, and the number in name denotes the month number within that quarter.

If that's the case, your column headers are 100*periodyear (period-1)*4 the number inside name.

library(tidyverse)
test %>%
  pivot_longer(starts_with("mnth")) %>%
  mutate(period_num = as.numeric(periodyear)*100   (as.numeric(period)-1)*4   parse_number(name)) %>%
  select(-c(periodyear:name)) %>%
  pivot_wider(names_from = period_num, values_from = value)
  

result

# A tibble: 2 × 27
  naicscode areavalue ownership 20200…¹ 20200…² 20200…³ 20200…⁴ 20200…⁵ 20200…⁶
  <chr>     <chr>     <chr>       <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
1 111       000000    50          25000   25005   24985   25005   25010   25000
2 111       000003    50          16000   15995   15990   16005   16005   16000
# … with 18 more variables: `202009` <dbl>, `202010` <dbl>, `202011` <dbl>,
#   `202013` <dbl>, `202014` <dbl>, `202015` <dbl>, `202101` <dbl>,
#   `202102` <dbl>, `202103` <dbl>, `202105` <dbl>, `202106` <dbl>,
#   `202107` <dbl>, `202109` <dbl>, `202110` <dbl>, `202111` <dbl>,
#   `202113` <dbl>, `202114` <dbl>, `202115` <dbl>, and abbreviated variable
#   names ¹​`202001`, ²​`202002`, ³​`202003`, ⁴​`202005`, ⁵​`202006`, ⁶​`202007`
# ℹ Use `colnames()` to see all variable names
  • Related