I have a slight problem. I need a table in a very different format but I can't seem to do it. That's the Code:
year_2005 <- c("Category_A",
"Cook", "Butcher", "Stewardess",
"Category_B",
"Actress", "Cook", "Repairman",
"Category_C",
"Stewardess", "Singer", "Scientist")
value_2005 <- c(300,
150, 100, 50,
180,
110, 60, 10,
120,
50, 50, 20)
year_2006 <- c("Category_A",
"Actress", "Stewardess", "Butcher",
"Category_B",
"Cook", "Stewardess", "Singer",
"Category_C",
"Scientist", "Singer", "Butcher")
value_2006 <- c(330,
160, 110, 60,
210,
120, 70, 20,
120,
50, 50, 20)
df <- as.data.frame(cbind(year_2005, value_2005,
year_2006, value_2006))
So every Category has three careers (number is always the same) but I would want each Category in a different column with its three careers for each year. In this example, I only have two years but in reality it's more than 10 years. So essentially I want seven columns: Year, Category_A, Value_Category_A, Category_B, Value_Category_B, Category_C and Value_Category_C. I also need a row for the sum for each category.
So this is what I want it to look like:
year <- rep(c(2005, 2006), each = 4)
category_a <- c("Sum","Cook", "Butcher", "Stewardess")
value_category_a <- c(300, 150, 100, 50)
category_b <- c("Sum", "Cook", "Stewardess", "Singer")
value_category_b <- c(210, 120, 70, 20)
category_c <- c("Sum", "Stewardess", "Singer", "Scientist")
value_category_c <- c(120, 50, 50, 20)
df <- as.data.frame(cbind(year,
category_a, value_category_a,
category_b, value_category_b,
category_c, value_category_c))
Thank you everyone for your help!
CodePudding user response:
This is a fairly complex data wrangling task, but it can be achieved with a couple of pivots:
library(tidyverse)
df %>%
mutate(Category = cumsum(grepl('Category', year_2005))) %>%
group_by(Category) %>%
mutate(Category = first(year_2005)) %>%
mutate(across(everything(), ~ gsub("Category_.", "Sum", .x))) %>%
pivot_longer(-Category, names_sep = "_", names_to = c('name', 'year')) %>%
mutate(name = ifelse(name == 'year', 'name', 'value')) %>%
pivot_wider(year, names_from = c('Category', 'name'), values_from = 'value',
values_fn = list) %>%
unnest(-year) %>%
as.data.frame()
#> year Category_A_name Category_A_value Category_B_name Category_B_value
#> 1 2005 Sum 300 Sum 180
#> 2 2005 Cook 150 Actress 110
#> 3 2005 Butcher 100 Cook 60
#> 4 2005 Stewardess 50 Repairman 10
#> 5 2006 Sum 330 Sum 210
#> 6 2006 Actress 160 Cook 120
#> 7 2006 Stewardess 110 Stewardess 70
#> 8 2006 Butcher 60 Singer 20
#> Category_C_name Category_C_value
#> 1 Sum 120
#> 2 Stewardess 50
#> 3 Singer 50
#> 4 Scientist 20
#> 5 Sum 120
#> 6 Scientist 50
#> 7 Singer 50
#> 8 Butcher 20
Created on 2023-01-05 with reprex v2.0.2
CodePudding user response:
Similar, yet different to @Allan Cameron, for inspiration ( closer to your desired output):
library(dplyr)
library(tidyr)
library(stringr)
df |>
pivot_longer(everything(), names_sep = "_", names_to = c(".value", "year_")) |>
transmute(category = str_extract(year, "Category.*"),
name = str_replace(year, "Category.*", "Sum"),
value_ = value,
year = year_) |>
fill(category) |>
pivot_wider(c("year"),
names_from = "category",
values_from = c("name", "value_"),
names_vary = "slowest",
names_glue = "{str_remove(.value, 'name')}{category}",
values_fn = "list") |>
unnest(-year)
Output:
# A tibble: 8 × 7
year Category_A value_Category_A Category_B value_Category_B Category_C value_Category_C
<chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 2005 Sum 300 Sum 180 Sum 120
2 2005 Cook 150 Actress 110 Stewardess 50
3 2005 Butcher 100 Cook 60 Singer 50
4 2005 Stewardess 50 Repairman 10 Scientist 20
5 2006 Sum 330 Sum 210 Sum 120
6 2006 Actress 160 Cook 120 Scientist 50
7 2006 Stewardess 110 Stewardess 70 Singer 50
8 2006 Butcher 60 Singer 20 Butcher 20