Home > Net >  Format table differently (Rows to Columns per Category)
Format table differently (Rows to Columns per Category)

Time:01-06

I have a slight problem. I need a table in a very different format but I can't seem to do it. That's the Code:

year_2005 <- c("Category_A", 
               "Cook", "Butcher", "Stewardess",
               "Category_B", 
               "Actress", "Cook", "Repairman",
               "Category_C",
               "Stewardess", "Singer", "Scientist")
value_2005 <- c(300,
                150, 100, 50,
                180,
                110, 60, 10,
                120,
                50, 50, 20)


year_2006 <- c("Category_A", 
               "Actress", "Stewardess", "Butcher",
               "Category_B", 
               "Cook", "Stewardess", "Singer",
               "Category_C",
               "Scientist", "Singer", "Butcher")
value_2006 <- c(330,
                160, 110, 60,
                210,
                120, 70, 20,
                120,
                50, 50, 20)

df <- as.data.frame(cbind(year_2005, value_2005,
            year_2006, value_2006))

So every Category has three careers (number is always the same) but I would want each Category in a different column with its three careers for each year. In this example, I only have two years but in reality it's more than 10 years. So essentially I want seven columns: Year, Category_A, Value_Category_A, Category_B, Value_Category_B, Category_C and Value_Category_C. I also need a row for the sum for each category.

So this is what I want it to look like:

year <- rep(c(2005, 2006), each = 4)
category_a <- c("Sum","Cook", "Butcher", "Stewardess")
value_category_a <- c(300, 150, 100, 50)
category_b <- c("Sum", "Cook", "Stewardess", "Singer")
value_category_b <- c(210, 120, 70, 20)
category_c <- c("Sum", "Stewardess", "Singer", "Scientist")
value_category_c <- c(120, 50, 50, 20)

df <- as.data.frame(cbind(year, 
                          category_a, value_category_a, 
                          category_b, value_category_b,
                          category_c, value_category_c))

Thank you everyone for your help!

CodePudding user response:

This is a fairly complex data wrangling task, but it can be achieved with a couple of pivots:

library(tidyverse)

df %>%
  mutate(Category = cumsum(grepl('Category', year_2005))) %>%
  group_by(Category) %>%
  mutate(Category = first(year_2005)) %>%
  mutate(across(everything(), ~ gsub("Category_.", "Sum", .x))) %>%
  pivot_longer(-Category, names_sep = "_", names_to = c('name', 'year')) %>%
  mutate(name = ifelse(name == 'year', 'name', 'value')) %>%
  pivot_wider(year, names_from = c('Category', 'name'), values_from = 'value',
              values_fn = list) %>%
  unnest(-year) %>%
  as.data.frame()
#>   year Category_A_name Category_A_value Category_B_name Category_B_value
#> 1 2005             Sum              300             Sum              180
#> 2 2005            Cook              150         Actress              110
#> 3 2005         Butcher              100            Cook               60
#> 4 2005      Stewardess               50       Repairman               10
#> 5 2006             Sum              330             Sum              210
#> 6 2006         Actress              160            Cook              120
#> 7 2006      Stewardess              110      Stewardess               70
#> 8 2006         Butcher               60          Singer               20
#>   Category_C_name Category_C_value
#> 1             Sum              120
#> 2      Stewardess               50
#> 3          Singer               50
#> 4       Scientist               20
#> 5             Sum              120
#> 6       Scientist               50
#> 7          Singer               50
#> 8         Butcher               20

Created on 2023-01-05 with reprex v2.0.2

CodePudding user response:

Similar, yet different to @Allan Cameron, for inspiration ( closer to your desired output):

library(dplyr)
library(tidyr)
library(stringr)

df |> 
  pivot_longer(everything(), names_sep = "_", names_to = c(".value", "year_")) |>
  transmute(category = str_extract(year, "Category.*"),
            name = str_replace(year, "Category.*", "Sum"),
            value_ = value,
            year = year_) |>
  fill(category) |>
  pivot_wider(c("year"),
              names_from = "category",
              values_from = c("name", "value_"),
              names_vary = "slowest",
              names_glue = "{str_remove(.value, 'name')}{category}",
              values_fn = "list") |>
  unnest(-year)

Output:

# A tibble: 8 × 7
  year  Category_A value_Category_A Category_B value_Category_B Category_C value_Category_C
  <chr> <chr>      <chr>            <chr>      <chr>            <chr>      <chr>           
1 2005  Sum        300              Sum        180              Sum        120             
2 2005  Cook       150              Actress    110              Stewardess 50              
3 2005  Butcher    100              Cook       60               Singer     50              
4 2005  Stewardess 50               Repairman  10               Scientist  20              
5 2006  Sum        330              Sum        210              Sum        120             
6 2006  Actress    160              Cook       120              Scientist  50              
7 2006  Stewardess 110              Stewardess 70               Singer     50              
8 2006  Butcher    60               Singer     20               Butcher    20              
  • Related