Move subgroup under repeated main group while keeping main group once in data.frame R

Time:11-08

I'm aware that the question is awkward. If I could phrase it better, I'd probably find the solution in another thread.

I have this data structure...

df <- data.frame(group = c("X", "F", "F", "F", "F", "C", "C"),
                 subgroup = c(NA, "camel", "horse", "dog", "cat", "orange", "banana"))

... and would like to turn it into this...

data.frame(group = c("X", "F", "camel", "horse", "dog", "cat", "C", "orange", "banana"))

... which is surprisingly confusing. Also, I would prefer not to use a loop.

EDIT: I updated the example to clarify that solutions that depend on sorting unfortunately do not do the trick.

CodePudding user response:

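Here is an (edited) answer for the new data. Using data.table helps a lot. The idea is to split the df into one piece per group and then lapply() over the pieces to build what we need. A few details need care along the way: the original group order has to be preserved, and the NA subgroup has to be dropped at the end.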

library(data.table)
# set as data.table
setDT(df)

# to maintain the original ordering, convert group to a factor;
# the levels (in order of first appearance) tell split() the order to use
df[, group := factor(group, levels = unique(group))]

# split df into a list with one data.table per group
df_list <- split(df, df$group, sorted = FALSE)

# for each piece, put the group label first, followed by its subgroups
df_list <- lapply(df_list, function(x) data.frame(group = unique(c(as.character(x$group), x$subgroup))))

# bind the pieces into one data.table and remove the NAs
df_onecol <- rbindlist(df_list)
df_onecol[!is.na(group)]

    group
1:      X
2:      F
3:  camel
4:  horse
5:    dog
6:    cat
7:      C
8: orange
9: banana
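
For comparison, the same interleaving can be sketched in base R without splitting at all. This is only an illustration under stated assumptions, not part of the answer above: the names group_once and interleaved are made up here, and it assumes each group label should appear once, at the row where it first occurs.

```r
# Example data from the question, recreated so the sketch is self-contained
df <- data.frame(
  group    = c("X", "F", "F", "F", "F", "C", "C"),
  subgroup = c(NA, "camel", "horse", "dog", "cat", "orange", "banana"),
  stringsAsFactors = FALSE
)

# Keep each group label only on its first row; repeats become NA
# (group_once is a hypothetical helper name)
group_once <- ifelse(duplicated(df$group), NA, as.character(df$group))

# rbind() stacks the two vectors as the rows of a 2 x n matrix; c() then
# reads that matrix column by column: group, subgroup, group, subgroup, ...
interleaved <- c(rbind(group_once, as.character(df$subgroup)))

# Drop the NA placeholders (repeated group labels and the missing subgroup)
result <- data.frame(group = interleaved[!is.na(interleaved)])
result
```

Because nothing is sorted, the groups stay in their original order. The as.character() calls are there in case group has already been converted to a factor, as in the data.table code above.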

CodePudding user response:

One solution, which works at least for an example where the group and subgroup labels sort alphabetically into the desired order (so that arrange() can be used to find the correct ordering of the groups).

library(dplyr)
library(tidyr)

df %>% mutate(col1 = 1) %>% # add a constant column to pivot around

  # pivot groups into one column
  pivot_longer(-col1, values_to = 'group') %>%

  # arrange the groups alphabetically. 
  arrange(group) %>% 

  # remove duplicate rows. 
  unique() %>% 

  # remove the NA.
  filter(!is.na(group))

## Alternatively, if the data has other columns besides these
## two, the code below should work.

df %>%
    mutate(col1 = 1) %>% 
    pivot_longer(c(group, subgroup), values_to = 'group') %>% 

    arrange(group) %>% 

    # group by the new group column and keep the first row
    group_by(group) %>%
    slice(1) %>%
    ungroup() %>% 
  
    filter(!is.na(group))


#output (labels here differ from the df shown in the question, which was edited)

# A tibble: 9 × 3
   col1 name     group
  <dbl> <chr>    <chr>
1     1 group    A    
2     1 group    B    
3     1 subgroup BA   
4     1 subgroup BB   
5     1 subgroup BC   
6     1 subgroup BD   
7     1 group    C    
8     1 subgroup CA   
9     1 subgroup CB 
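
Since the edited question rules out sorting, a possible order-preserving variant of the same pivot idea is sketched below. It is not the code from this answer: it simply leaves out arrange() and relies on pivot_longer() emitting the values row by row, with distinct() keeping the first occurrence of each. The column names level and value are placeholders chosen here, and the sketch assumes no label is shared between groups or subgroups.

```r
library(dplyr)
library(tidyr)

# Example data from the question, recreated so the sketch is self-contained
df <- data.frame(
  group    = c("X", "F", "F", "F", "F", "C", "C"),
  subgroup = c(NA, "camel", "horse", "dog", "cat", "orange", "banana")
)

result <- df %>%
  # stack both columns; each input row yields its group, then its subgroup
  pivot_longer(c(group, subgroup), names_to = "level", values_to = "value") %>%
  # remove the missing subgroup
  filter(!is.na(value)) %>%
  # keep the first occurrence of each label, in order of appearance
  distinct(value) %>%
  rename(group = value)

result
```

If the same label can occur under two different groups, distinct() would drop the later one, so this only fits data like the example.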
