Move subgroup under repeated main group while keeping main group once in data.frame (R)

I'm aware that the question is awkward. If I could phrase it better I'd probably find the solution in another thread.

I have this data structure...

df <- data.frame(group = c("X", "F", "F", "F", "F", "C", "C"),
                 subgroup = c(NA, "camel", "horse", "dog", "cat", "orange", "banana"))

... and would like to turn it into this...

data.frame(group = c("X", "F", "camel", "horse", "dog", "cat", "C", "orange", "banana"))

... which is surprisingly confusing. Also, I would prefer not to use a loop.

EDIT: I updated the example to clarify that solutions that depend on sorting unfortunately do not do the trick.

CodePudding user response：

Here is an (edited) answer that works with the new data. Using data.table helps a lot. The idea is to split df into one piece per group and lapply() over the pieces to build what we need. We have to take care of a few things along the way, mainly keeping the original order of the groups.

library(data.table)
# convert df to a data.table (by reference)
setDT(df)

# to maintain the original ordering, convert group to a factor;
# its levels (in order of first appearance) determine the order of the split
df[, group := factor(group, levels = unique(group))]

# split df into a list with one data.table per group
df_list <- split(df, df$group)

# for each group, put the group name first, followed by its subgroups
df_list <- lapply(df_list, function(x) data.frame(group = unique(c(as.character(x$group), x$subgroup))))

# bind the pieces into one data.table and remove the NAs
res <- rbindlist(df_list)
res[!is.na(group)]

    group
1:      X
2:      F
3:  camel
4:  horse
5:    dog
6:    cat
7:      C
8: orange
9: banana
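
For comparison, the same split-and-combine idea can be sketched in base R, without data.table. This is only an illustration: it assumes both columns are character vectors and that no subgroup label is identical to its group label, and the names f, parts, vals and out are made up for this sketch.

```r
# example data from the question (strings kept as character, not factor)
df <- data.frame(group = c("X", "F", "F", "F", "F", "C", "C"),
                 subgroup = c(NA, "camel", "horse", "dog", "cat", "orange", "banana"),
                 stringsAsFactors = FALSE)

# factor whose levels follow the order of first appearance,
# so split() returns the groups in that order instead of alphabetically
f <- factor(df$group, levels = unique(df$group))

# per group: the group name first, then its subgroups, duplicates dropped
parts <- lapply(split(df, f), function(x) unique(c(x$group, x$subgroup)))

# flatten the list and drop the NA that came from the group without subgroups
vals <- unlist(parts, use.names = FALSE)
out <- data.frame(group = vals[!is.na(vals)], stringsAsFactors = FALSE)
out
```

lapply() still iterates over the groups internally, but no explicit for loop is needed.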

CodePudding user response：

Here is one solution, which works at least when the desired order of the groups is alphabetical, so that arrange() can recover it (as in the A/B/C data shown in the output below).

library(dplyr)
library(tidyr)

df %>% mutate(col1 = 1) %>% # add column for the pivot_longer

  # pivot groups into one column
  pivot_longer(-col1, values_to = 'group') %>%

  # arrange the groups alphabetically. 
  arrange(group) %>% 

  # remove duplicate rows. 
  unique() %>% 

  # remove the NA.
  filter(!is.na(group))

## alternatively if you also have other data than just these
## two columns the code below should work. 

df %>%
    mutate(col1 = 1) %>% 
    pivot_longer(c(group, subgroup), values_to = 'group') %>% 

    arrange(group) %>% 

    # group by the new group column and keep the first row
    group_by(group) %>%
    slice(1) %>%
    ungroup() %>% 
  
    filter(!is.na(group))


# output (note: produced with data whose groups are A, B and C, not the X/F/C example in the question)

# A tibble: 9 × 3
   col1 name     group
  <dbl> <chr>    <chr>
1     1 group    A    
2     1 group    B    
3     1 subgroup BA   
4     1 subgroup BB   
5     1 subgroup BC   
6     1 subgroup BD   
7     1 group    C    
8     1 subgroup CA   
9     1 subgroup CB
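
Because the pipelines above rely on arrange(), they reorder the X/F/C example alphabetically. A possible order-preserving variant is sketched below. It is a sketch under assumptions, not a tested drop-in: it assumes every label is unique across both columns (distinct() would otherwise merge repeated labels), and the column name value and the result name out are made up for illustration.

```r
library(dplyr)
library(tidyr)

# example data from the question
df <- data.frame(group = c("X", "F", "F", "F", "F", "C", "C"),
                 subgroup = c(NA, "camel", "horse", "dog", "cat", "orange", "banana"),
                 stringsAsFactors = FALSE)

out <- df %>%
  # one output row per input row and column; input row order is kept,
  # and within each input row group comes before subgroup
  pivot_longer(c(group, subgroup), values_to = "value") %>%
  # drop the missing subgroup of X
  filter(!is.na(value)) %>%
  # keep only the first occurrence of each label, in order of appearance
  distinct(value) %>%
  rename(group = value)

out
```

No sorting step is involved, so the result follows the row order of df rather than the alphabet.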