Below is the sample data. My goal is to create a single row (monthly data) for each area/industry/ownership combination. For this sample data set, there will be 24 values per area/industry/ownership combination. I know that you have to do a series of pivots but my attempts have not been successful. At the bottom is the desired result.
In my larger data set, I have far more than 3 years and one industry but this keeps it manageable.
naicscode <- c("111","111","111","111","111","111","111","111","111","111","111","111","111","111","111","111")
areavalue <- c("000000","000000","000000","000000","000000","000000","000000","000000","000003","000003","000003","000003","000003","000003","000003","000003")
ownership <- c("50","50","50","50","50","50","50","50","50","50","50","50","50","50","50","50")
period <- c("01","02","03","04","01","02","03","04","01","02","03","04","01","02","03","04")
periodyear <- c("2020","2020","2020","2020","2021","2021","2021","2021", "2020","2020","2020","2020", "2021","2021","2021","2021")
mnth1emp<- c(25000,25005,25010,25020,25025,20506,20510,21555,16000,16005,16025,16020,16035,13595,14010,13985)
mnth2emp<- c(25005,25010,25000,24995,25005,25010,25060,24995,15995,16005,16015,16020,16030,14015,14000,14200)
mnth3emp<- c(24985,25000,25005,25010,25009,25040,25090,25080,15990,16000,16065,16025,16030,14665,14550,14620)
test <- data.frame(naicscode,areavalue,ownership,periodyear,period,mnth1emp,mnth2emp,mnth3emp)
naicscode areavalue ownership 202001 202002 202003 202004 202005 202006 ... and on until 202112.
111 000000 50 25000 25005 24985 25005 25010 25000
CodePudding user response:
I'm assuming period
means quarter, and the number in name
denotes the month number within that quarter.
If that's the case, your column headers are 100*periodyear
(period-1)*4
the number inside name
.
library(tidyverse)
test %>%
pivot_longer(starts_with("mnth")) %>%
mutate(period_num = as.numeric(periodyear)*100 (as.numeric(period)-1)*4 parse_number(name)) %>%
select(-c(periodyear:name)) %>%
pivot_wider(names_from = period_num, values_from = value)
result
# A tibble: 2 × 27
naicscode areavalue ownership 20200…¹ 20200…² 20200…³ 20200…⁴ 20200…⁵ 20200…⁶
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 111 000000 50 25000 25005 24985 25005 25010 25000
2 111 000003 50 16000 15995 15990 16005 16005 16000
# … with 18 more variables: `202009` <dbl>, `202010` <dbl>, `202011` <dbl>,
# `202013` <dbl>, `202014` <dbl>, `202015` <dbl>, `202101` <dbl>,
# `202102` <dbl>, `202103` <dbl>, `202105` <dbl>, `202106` <dbl>,
# `202107` <dbl>, `202109` <dbl>, `202110` <dbl>, `202111` <dbl>,
# `202113` <dbl>, `202114` <dbl>, `202115` <dbl>, and abbreviated variable
# names ¹`202001`, ²`202002`, ³`202003`, ⁴`202005`, ⁵`202006`, ⁶`202007`
# ℹ Use `colnames()` to see all variable names