Having trouble with pivoting data in R-CodePudding

Below is the sample data. My goal is to create a single row (monthly data) for each area/industry/ownership combination. For this sample data set, there will be 24 values per area/industry/ownership combination. I know that you have to do a series of pivots but my attempts have not been successful. At the bottom is the desired result.

In my larger data set, I have far more than 3 years and one industry but this keeps it manageable.

 naicscode <- c("111","111","111","111","111","111","111","111","111","111","111","111","111","111","111","111")
 areavalue <- c("000000","000000","000000","000000","000000","000000","000000","000000","000003","000003","000003","000003","000003","000003","000003","000003")
 ownership <- c("50","50","50","50","50","50","50","50","50","50","50","50","50","50","50","50")
 period <- c("01","02","03","04","01","02","03","04","01","02","03","04","01","02","03","04")
 periodyear <- c("2020","2020","2020","2020","2021","2021","2021","2021", "2020","2020","2020","2020", "2021","2021","2021","2021")
 mnth1emp<- c(25000,25005,25010,25020,25025,20506,20510,21555,16000,16005,16025,16020,16035,13595,14010,13985)
 mnth2emp<- c(25005,25010,25000,24995,25005,25010,25060,24995,15995,16005,16015,16020,16030,14015,14000,14200)
 mnth3emp<- c(24985,25000,25005,25010,25009,25040,25090,25080,15990,16000,16065,16025,16030,14665,14550,14620)


 test <- data.frame(naicscode,areavalue,ownership,periodyear,period,mnth1emp,mnth2emp,mnth3emp)




  naicscode       areavalue    ownership     202001     202002    202003  202004   202005   202006  ... and on until 202112. 
     111            000000        50          25000       25005     24985   25005   25010   25000

CodePudding user response：

I'm assuming period means quarter, and the number in name denotes the month number within that quarter.

If that's the case, your column headers are 100*periodyear (period-1)*4 the number inside name.

library(tidyverse)
test %>%
  pivot_longer(starts_with("mnth")) %>%
  mutate(period_num = as.numeric(periodyear)*100   (as.numeric(period)-1)*4   parse_number(name)) %>%
  select(-c(periodyear:name)) %>%
  pivot_wider(names_from = period_num, values_from = value)

result

# A tibble: 2 × 27
  naicscode areavalue ownership 20200…¹ 20200…² 20200…³ 20200…⁴ 20200…⁵ 20200…⁶
  <chr>     <chr>     <chr>       <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
1 111       000000    50          25000   25005   24985   25005   25010   25000
2 111       000003    50          16000   15995   15990   16005   16005   16000
# … with 18 more variables: `202009` <dbl>, `202010` <dbl>, `202011` <dbl>,
#   `202013` <dbl>, `202014` <dbl>, `202015` <dbl>, `202101` <dbl>,
#   `202102` <dbl>, `202103` <dbl>, `202105` <dbl>, `202106` <dbl>,
#   `202107` <dbl>, `202109` <dbl>, `202110` <dbl>, `202111` <dbl>,
#   `202113` <dbl>, `202114` <dbl>, `202115` <dbl>, and abbreviated variable
#   names ¹`202001`, ²`202002`, ³`202003`, ⁴`202005`, ⁵`202006`, ⁶`202007`
# ℹ Use `colnames()` to see all variable names