I really need your help with processing this dataframe. Let's assume we have a dataframe like the one below.
category value
A NA
high 20
mid 18
low 10
B NA
high 22
mid 12
low 11
C NA
low 14
As you can see, there are 3 categories A, B, and with possible subcategories high, mid, low. The category names have NA value because only each category's subcategory has a value, not the category itself. What makes this complicated is the fact that category C has only a single subcategory, which is low. The format I would like to convert it to, looks something like this:
category subcategory value
A high 20
A mid 18
A low 10
B high 22
B mid 12
B low 11
C low 14
I would really appreciate if someone could help me with this. I have been trying to figure it out for a long while. Thanks!
CodePudding user response:
You can define the sub-category values (cat_value
) and bring them in a separate column (subcategory
). Change the sub-category values to NA
and use tidyr::fill
to fill them with category
values.
library(dplyr)
cat_value <- c('high', 'mid', 'low')
df %>%
mutate(subcategory = replace(category, !category %in% cat_value, NA) ,
category = replace(category, category %in% cat_value, NA)) %>%
tidyr::fill(category) %>%
filter(!is.na(subcategory)) %>%
select(category, subcategory, value)
# category subcategory value
#1 A high 20
#2 A mid 18
#3 A low 10
#4 B high 22
#5 B mid 12
#6 B low 11
#7 C low 14
data
df <- structure(list(category = c("A", "high", "mid", "low", "B", "high",
"mid", "low", "C", "low"), value = c(NA, 20L, 18L, 10L, NA, 22L,
12L, 11L, NA, 14L)), class = "data.frame", row.names = c(NA, -10L))