Pivot wider dataframe with difficult structure dplyr-CodePudding

I was working on something I thought would be simple, but maybe today my brain isn't working. My data is like this:

tibble(metric = c('income', 'income_upp', 'income_low', 'n_house', 'n_house_upp', 'n_house_low'),
       value = c(120, 140, 100, 10, 8, 12))


metric     value
income      120
income_low  100
income_upp  140
n            10
n_low        8
n_upp       12

And I want to pivot_wider so it looks like this:

metric     value value_low  value_upp
income      120    100        140
n            10     8         12

I'm having trouble separating metrics, because pivot_wider as is, brings a dataframe that's too wide:

df %>% pivot_wider(names_from = 'metric', values_from = value)

How can I achieve this or should I pivot longer after the pivot wider?

Thanks!

CodePudding user response：

I think if you convert metric into a column with "value", "value_upp" and "value_low" values, you can pivot_wider:

df %>% 
  mutate(param = case_when(str_detect(metric, "upp") ~ "value_upp",
                           str_detect(metric, "low") ~ "value_low",
                           TRUE ~ "value"),
         metric = str_remove(metric, "_low|_upp")) %>% 
  pivot_wider(names_from = param, values_from = value)

CodePudding user response：

I like to use separate() when I have text in a column like this. This function allows you to separate a column into multiple columns if there is a separator in the function.

In particular in this example we would want to use the arguments sep="_" and into = c("metric", "state") to convert into columns with those names.

Then mutate() and pivot_wider() can be used as you had previously specified.

library(tidyverse)

df <- tribble(~metric, ~value,
  "income", 120,
  "income_low", 100,
  "income_upp", 140,
  "n", 10,
  "n_low", 8,
  "n_upp", 12)

df |>
  separate(metric, sep = "_", into = c("metric", "state")) |> 
  mutate(state = ifelse(is.na(state), "value", state)) |> 
  pivot_wider(id_cols = metric, names_from = state, values_from = value, names_sep = "_")
#> Warning: Expected 2 pieces. Missing pieces filled with `NA` in 2 rows [1, 4].
#> # A tibble: 2 × 4
#>   metric value   low   upp
#>   <chr>  <dbl> <dbl> <dbl>
#> 1 income   120   100   140
#> 2 n         10     8    12

^{Created on 2022-12-21 with reprex v2.0.2}

Note you can use the argument names_glue or names_prefix in pivot_wider() to add the "value" as a prefix to the column names.

CodePudding user response：

a data.table approach (if you can live wit the trailing underacore achter value_

library(data.table)
setDT(df)

# create some new columns based on metric
df[, c("first", "second") := tstrsplit(metric, "_")]
#        metric value  first second
# 1:     income   120 income   <NA>
# 2: income_low   100 income    low
# 3: income_upp   140 income    upp
# 4:          n    10      n   <NA>
# 5:      n_low     8      n    low
# 6:      n_upp    12      n    upp

# replace NA with ""
df[is.na(df)] <- ""

# now cast to wide, createing colnames on the fly
dcast(df, first ~ paste0("value_", second), value.var = "value")
#     first value_ value_low value_upp
# 1: income    120       100       140
# 2:      n     10         8        12