I have a long dataframe that lists the top 3 employers of each occupation code (3 rows per occupation code). It looks like this.
occcode | employer |
---|---|
1 | top employer for occcode1 |
1 | 2nd employer for occcode 1 |
1 | 3rd employer for occcode 1 |
2 | top employer for occcode2 |
2 | 2nd employer for occcode 2 |
2 | 3rd employer for occcode 1 |
I want to reshape it so that I have one row per occupation code, and columns named "emp1", "emp2", and "emp3" that are respectively populated with the 1st-3rd employers of that occupation code.
occcode | employer1 | employer2 | employer3 |
---|---|---|---|
1 | top employer for occcode1 | 2nd employer for occcode 1 | 3rd employer for occcode 1 |
2 | top employer for occcode2 | 2nd employer for occode2 | 3rd employer for occcode 1 |
I previously thought using the spread()
function would work. But reading the documentation and testing it out, it doesn't produce what I have in mind because it requires that the values in "employer" in the long version of the data be standardized (such that there are only 3 employer names); that's not the case because employer names vary a lot across occupation codes.
What is the best way to do reshape the data in line with what I need?
CodePudding user response:
I removed the last row of source data to show that this should work for variable numbers of employers per occcode:
library(tidyverse)
data.frame(
stringsAsFactors = FALSE,
occcode = c(1L, 1L, 1L, 2L, 2L),
employer = c("top employer for occcode1",
"2nd employer for occcode 1","3rd employer for occcode 1",
"top employer for occcode2",
"2nd employer for occcode 2")
) %>%
group_by(occcode) %>%
mutate(col = paste0("employer", row_number())) %>%
ungroup() %>%
pivot_wider(names_from = col, values_from = employer)
Result
# A tibble: 2 × 4
occcode employer1 employer2 employer3
<int> <chr> <chr> <chr>
1 1 top employer for occcode1 2nd employer for occcode 1 3rd employer for occcode 1
2 2 top employer for occcode2 2nd employer for occcode 2 NA
CodePudding user response:
Here is another approach:
library(data.table)
dcast(
setDT(df)[, emp:={emp=substr(employer,1,1);emp=paste0("employer",fifelse(emp=="t","1",emp))}],
occcode~emp, value.var="employer"
)
Output:
occcode employer1 employer2 employer3
1: 1 top employer for occcode1 2nd employer for occcode 1 3rd employer for occcode 1
2: 2 top employer for occcode2 2nd employer for occcode 2 3rd employer for occcode 2
Input:
structure(list(occcode = c(1L, 1L, 1L, 2L, 2L, 2L), employer = c("top employer for occcode1",
"2nd employer for occcode 1", "3rd employer for occcode 1", "top employer for occcode2",
"2nd employer for occcode 2", "3rd employer for occcode 2")), row.names = c(NA,
-6L), class = "data.frame")