Below is the sample data and the desired outcome. This is a much simplified version of the actual data set. In the actual data set, there are 20 years and 4 quarters apiece. Looking to have each unique company entry listed once and the employment data series running from beginning to end from left to right. In the event that there is no data for Vision Inc in 2019 quarter 3, then I would want it to return a O and not an NA.
library(tidyverse)
library(dplyr)
legalname <- c("Vision Inc.","Expedia","Strong Enterprise","Vision Inc.","Expedia","Strong Enterprise")
year <- c(2019,2019,2019,2019,2019,2019)
quarter <- c(1,1,1,2,2,2)
cnty <- c(031,029,027,031,029,027)
naics <- c(345110,356110,362110,345110,356110,345110)
mnth1emp <- c (11,13,15,15,17,20)
mnth2emp <- c(12,14,15,16,18,22)
mnth3emp <-c(13,15,15,17,21,29)
employers <- data.frame(legalname,year,quarter,naics,mnth1emp,mnth2emp,mnth3emp)
Desired Outcome
legalname cnty naics 2019m1 2019m2 2019m3 2019m4 2019m5 2019m6
Vision Inc 031 345110 11 12 13 15 16 17
Expedia 029 356110 13 14 15 17 18 21
CodePudding user response:
I first pivot to a long form, then arrange by legalname
and year
(just to double-check that they are in numerical order). Then, I create a unique month series for each year for each company. Then, I drop quarter and pivot back to wide form and put name and year together, and finally replace NA
with 0
. Here, I'm assuming that you want each unique naics
on it's own row.
library(tidyverse)
employers %>%
pivot_longer(starts_with("mnth")) %>%
arrange(legalname, year) %>%
group_by(legalname, year, naics) %>%
mutate(name = paste0("m", 1:n())) %>%
select(-quarter) %>%
pivot_wider(names_from = c("year", "name"), names_sep = "", values_from = "value") %>%
mutate(across(everything(), ~replace_na(.,0)))
Output
legalname naics `2019m1` `2019m2` `2019m3` `2019m4` `2019m5` `2019m6`
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Expedia 356110 13 14 15 17 18 21
2 Strong Enterprise 362110 15 15 15 0 0 0
3 Strong Enterprise 345110 0 0 0 20 22 29
4 Vision Inc. 345110 11 12 13 15 16 17
CodePudding user response:
Does this work for you?
First pivot longer to get the months and values in a quarter; and then pivot wider to get the wide format you want.
employers %>%
filter(legalname != "Strong Enterprise") %>%
pivot_longer(mnth1emp:mnth3emp, names_to = "mnth", values_to = "value") %>%
mutate(month_in_quarter = as.numeric(str_extract(mnth, "\\d")),
month =str_c("m", month_in_quarter 3*(quarter - 1))) %>%
select(-c(month_in_quarter, mnth)) %>%
pivot_wider(c(legalname,cnty, naics), names_from = c(year, month),
values_from = value,
values_fill = 0)
values_fill
will fill NA
s with 0s.