I'm aware that the question is awkward. If I could phrase it better, I'd probably find the solution in another thread.
I have this data structure...
df <- data.frame(group = c("X", "F", "F", "F", "F", "C", "C"),
subgroup = c(NA, "camel", "horse", "dog", "cat", "orange", "banana"))
... and would like to turn it into this...
data.frame(group = c("X", "F", "camel", "horse", "dog", "cat", "C", "orange", "banana"))
... which is surprisingly confusing. Also, I would prefer not to use a loop.
EDIT: I updated the example to clarify that solutions that depend on sorting unfortunately do not do the trick.
Answer:
Here is an (edited) answer that works with the new data.
Using data.table helps a lot. The idea is to split df into one piece per group and lapply() the reshaping to each piece. A few details need care along the way: keeping the original group order and dropping the NA subgroup.
library(data.table)
# convert df to a data.table (by reference)
setDT(df)
# to maintain the original ordering, turn group into a factor whose
# levels are in order of first appearance; split() follows these levels
df[, group := factor(group, levels = unique(group))]
# split df into a list with one data.table per group
df_list <- split(df, df$group, sorted = FALSE)
# for each group: the group label followed by its subgroups
# (as.character() guards against subgroup being a factor in older R)
df_list <- lapply(df_list, function(x) data.frame(group = unique(c(as.character(x$group), as.character(x$subgroup)))))
# bind into one data.table and remove the NA coming from group X
rbindlist(df_list)[!is.na(group)]
group
1: X
2: F
3: camel
4: horse
5: dog
6: cat
7: C
8: orange
9: banana
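The same split-and-lapply idea can also be sketched in base R, without data.table. This is a minimal sketch built on the updated example data; the names pieces and flat are only illustrative.

```r
df <- data.frame(group = c("X", "F", "F", "F", "F", "C", "C"),
                 subgroup = c(NA, "camel", "horse", "dog", "cat", "orange", "banana"),
                 stringsAsFactors = FALSE)

# Factor levels in order of first appearance make split() return X, F, C
# in their original order instead of alphabetically
pieces <- split(as.character(df$subgroup),
                factor(df$group, levels = unique(df$group)))

# For each group: the group label, then its non-NA subgroups
flat <- unlist(
  lapply(names(pieces), function(g) {
    subs <- pieces[[g]]
    c(g, subs[!is.na(subs)])
  }),
  use.names = FALSE
)

result <- data.frame(group = flat, stringsAsFactors = FALSE)
result
```

The factor levels are what avoid sorting here: split() on a plain character vector would order the groups alphabetically.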
Answer:
One solution, which works at least for the example as originally posted, where arrange() happens to put the groups in the correct order:
library(dplyr)
library(tidyr)

df %>%
  mutate(col1 = 1) %>% # add a column to keep out of pivot_longer
  # pivot group and subgroup into one column
  pivot_longer(-col1, values_to = 'group') %>%
  # arrange the groups alphabetically
  arrange(group) %>%
  # remove duplicate rows
  unique() %>%
  # remove the NA
  filter(!is.na(group))
## Alternatively, if df has other columns besides these
## two, the code below should work.
df %>%
  mutate(col1 = 1) %>%
  pivot_longer(c(group, subgroup), values_to = 'group') %>%
  arrange(group) %>%
  # group by the new group column and keep the first row
  group_by(group) %>%
  slice(1) %>%
  ungroup() %>%
  filter(!is.na(group))
# output (produced with the example data as originally posted)
# A tibble: 9 × 3
col1 name group
<dbl> <chr> <chr>
1 1 group A
2 1 group B
3 1 subgroup BA
4 1 subgroup BB
5 1 subgroup BC
6 1 subgroup BD
7 1 group C
8 1 subgroup CA
9 1 subgroup CB
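The pivot_longer() approach can also be made sort-free. pivot_longer() emits the values row by row (the group, then the subgroup, for each original row), so dropping arrange() and keeping the first occurrence of each label with distinct() already gives the requested order. This is a minimal sketch on the updated example data, assuming no subgroup shares a name with a group (otherwise distinct() would merge them).

```r
library(dplyr)
library(tidyr)

df <- data.frame(group = c("X", "F", "F", "F", "F", "C", "C"),
                 subgroup = c(NA, "camel", "horse", "dog", "cat", "orange", "banana"),
                 stringsAsFactors = FALSE)

result <- df %>%
  # one row per (original row, column): X, NA, F, camel, F, horse, ...
  pivot_longer(c(group, subgroup), values_to = "group") %>%
  # keep the first occurrence of each label, in the order it appears
  distinct(group) %>%
  # drop the NA coming from the X row, which has no subgroup
  filter(!is.na(group))

result
```

No loop and no sorting is involved; the order comes entirely from the row-wise layout that pivot_longer() produces.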