Home > Software design >  Nested group_split to add rows in each group
Nested group_split to add rows in each group

Time:05-25

I have the following data frame:

    df <- structure(list(file = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2),
                         model = c("a", "b", "c", "x", "x", "x", "y", "y", "y", "d", "e", "f", "x", "x", "x", "z", "z", "z"),
                         model_nr = c(0, 0, 0, 1, 1, 1, 2, 2, 2, 0, 0, 0, 1, 1, 1, 2, 2, 2)),
                    row.names = c(NA, -18L),
                    class = "data.frame")

   file model model_nr
1     1     a        0
2     1     b        0
3     1     c        0
4     1     x        1
5     1     x        1
6     1     x        1
7     1     y        2
8     1     y        2
9     1     y        2
10    2     d        0
11    2     e        0
12    2     f        0
13    2     x        1
14    2     x        1
15    2     x        1
16    2     z        2
17    2     z        2
18    2     z        2

I now want to basically split up or group this data frame by file and within each group I want to "repeat" the rows/add rows with model_nr == 0 to each model.

So for file 1 we have three rows with model_nr == 0, then this is followed by group model_nr == 1. So I now want to add the three rows with model_nr == 0 before these three rows. Same for model_nr == 2 - I want to add the three rows with model_nr == 0. And then same for file 2.

My desired output could be either a list of data frames split up by file/model_nr or a joint data frame.

   file model model_nr
1     1     a        0
2     1     b        0
3     1     c        0
4     1     x        1
5     1     x        1
6     1     x        1
7     1     a        0
8     1     b        0
9     1     c        0
10    1     y        2
11    1     y        2
12    1     y        2
13    2     d        0
14    2     e        0
15    2     f        0
16    2     x        1
17    2     x        1
18    2     x        1
19    2     d        0
20    2     e        0
21    2     f        0
22    2     z        2
23    2     z        2
24    2     z        2

My problem is that I first want to group by file and then within each file group would like to split up by model_nr and then add to each group the rows with model_nr == 0.

This doesn't work, e.g.:

library(tidyverse)
df |> 
  group_split(file) |>
  map(.x = _,
      .f = ~.x |> 
        group_split(model_nr) |> 
        add_row(.x |> filter(model_nr == 0)))

Error:
! New rows can't add columns.
✖ Can't find columns `file`, `model`, and `model_nr` in `.data`.

Not sure what's not working here, because I thought what I'M doing here, is splitting up by file, then go through each list element (i.e. file = 1 or file = 2), take the respective data frame, split it up again by model number and then add to each group the rows with model_nr == 0.

Any idea how I could achieve this preferably in a tidyverse/purrr way in one pipe?

CodePudding user response:

This is an option -- split up the rows with model_nr == 0 from the others and then join them back in.

library(tidyverse)
df <- structure(list(file = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2),
                     model = c("a", "b", "c", "x", "x", "x", "y", "y", "y", "d", "e", "f", "x", "x", "x", "z", "z", "z"),
                     model_nr = c(0, 0, 0, 1, 1, 1, 2, 2, 2, 0, 0, 0, 1, 1, 1, 2, 2, 2)),
                row.names = c(NA, -18L),
                class = "data.frame")

df %>% 
  filter(model_nr == 0) %>% 
  rename(model_0 = model_nr) %>% 
  left_join(expand(filter(df, model_nr != 0), 
                   nesting(file, model_nr)), by = "file") %>% 
  full_join(filter(df, model_nr != 0), by = c("file", "model", "model_nr")) %>% 
  arrange(file, model_nr) %>% 
  mutate(model_nr = ifelse(is.na(model_0), model_nr, 0)) %>% 
  select(-model_0)
#>    file model model_nr
#> 1     1     a        0
#> 2     1     b        0
#> 3     1     c        0
#> 4     1     x        1
#> 5     1     x        1
#> 6     1     x        1
#> 7     1     a        0
#> 8     1     b        0
#> 9     1     c        0
#> 10    1     y        2
#> 11    1     y        2
#> 12    1     y        2
#> 13    2     d        0
#> 14    2     e        0
#> 15    2     f        0
#> 16    2     x        1
#> 17    2     x        1
#> 18    2     x        1
#> 19    2     d        0
#> 20    2     e        0
#> 21    2     f        0
#> 22    2     z        2
#> 23    2     z        2
#> 24    2     z        2

Created on 2022-05-24 by the reprex package (v2.0.1)

CodePudding user response:

Would this be ok?

library(dplyr)

df %>% 
  filter(model_nr == 0) %>% 
  bind_rows(df) %>% 
  arrange(file)
   file model model_nr
1     1     a        0
2     1     b        0
3     1     c        0
4     1     a        0
5     1     b        0
6     1     c        0
7     1     x        1
8     1     x        1
9     1     x        1
10    1     y        2
11    1     y        2
12    1     y        2
13    2     d        0
14    2     e        0
15    2     f        0
16    2     d        0
17    2     e        0
18    2     f        0
19    2     x        1
20    2     x        1
21    2     x        1
22    2     z        2
23    2     z        2
24    2     z        2

CodePudding user response:

Just to show how I currently do it and what workaround I found:

# Extract the overhead from each model that we paste to all models later on
overhead <- df |> 
  group_by(file) |> 
  filter(model_nr == 0) |> 
  ungroup()

# Add the overhead to all models
df |> 
  group_split(file, model_nr) |> 
  map(.x = _,
      .f = ~ . |> 
        add_row(overhead |> filter(file == unique(.x$file)), .before = 1) |> 
        mutate(model_nr = if_else(model_nr == 0, NA_real_, model_nr)) |> 
        fill(model_nr, .direction = "updown") |>
        fill(model, .direction = "updown") |> 
        filter(!is.na(model_nr)))

From here I can filter out the empty tibbles or bind_rows them. In the map command, only the add_row is required to answer my question above. Everything else is just sugar to get a bit beyond.

  • Related