I have the following data frame:
df <- structure(list(file = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2),
model = c("a", "b", "c", "x", "x", "x", "y", "y", "y", "d", "e", "f", "x", "x", "x", "z", "z", "z"),
model_nr = c(0, 0, 0, 1, 1, 1, 2, 2, 2, 0, 0, 0, 1, 1, 1, 2, 2, 2)),
row.names = c(NA, -18L),
class = "data.frame")
file model model_nr
1 1 a 0
2 1 b 0
3 1 c 0
4 1 x 1
5 1 x 1
6 1 x 1
7 1 y 2
8 1 y 2
9 1 y 2
10 2 d 0
11 2 e 0
12 2 f 0
13 2 x 1
14 2 x 1
15 2 x 1
16 2 z 2
17 2 z 2
18 2 z 2
I now want to basically split up or group this data frame by file
and within each group I want to "repeat" the rows/add rows with model_nr == 0
to each model.
So for file 1 we have three rows with model_nr == 0, then this is followed by group model_nr == 1. So I now want to add the three rows with model_nr == 0 before these three rows. Same for model_nr == 2 - I want to add the three rows with model_nr == 0. And then same for file 2.
My desired output could be either a list of data frames split up by file/model_nr or a joint data frame.
file model model_nr
1 1 a 0
2 1 b 0
3 1 c 0
4 1 x 1
5 1 x 1
6 1 x 1
7 1 a 0
8 1 b 0
9 1 c 0
10 1 y 2
11 1 y 2
12 1 y 2
13 2 d 0
14 2 e 0
15 2 f 0
16 2 x 1
17 2 x 1
18 2 x 1
19 2 d 0
20 2 e 0
21 2 f 0
22 2 z 2
23 2 z 2
24 2 z 2
My problem is that I first want to group by file and then within each file group would like to split up by model_nr and then add to each group the rows with model_nr == 0.
This doesn't work, e.g.:
library(tidyverse)
df |>
group_split(file) |>
map(.x = _,
.f = ~.x |>
group_split(model_nr) |>
add_row(.x |> filter(model_nr == 0)))
Error:
! New rows can't add columns.
✖ Can't find columns `file`, `model`, and `model_nr` in `.data`.
Not sure what's not working here, because I thought what I'M doing here, is splitting up by file, then go through each list element (i.e. file = 1 or file = 2), take the respective data frame, split it up again by model number and then add to each group the rows with model_nr == 0.
Any idea how I could achieve this preferably in a tidyverse/purrr way in one pipe?
CodePudding user response:
This is an option -- split up the rows with model_nr == 0
from the others and then join them back in.
library(tidyverse)
df <- structure(list(file = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2),
model = c("a", "b", "c", "x", "x", "x", "y", "y", "y", "d", "e", "f", "x", "x", "x", "z", "z", "z"),
model_nr = c(0, 0, 0, 1, 1, 1, 2, 2, 2, 0, 0, 0, 1, 1, 1, 2, 2, 2)),
row.names = c(NA, -18L),
class = "data.frame")
df %>%
filter(model_nr == 0) %>%
rename(model_0 = model_nr) %>%
left_join(expand(filter(df, model_nr != 0),
nesting(file, model_nr)), by = "file") %>%
full_join(filter(df, model_nr != 0), by = c("file", "model", "model_nr")) %>%
arrange(file, model_nr) %>%
mutate(model_nr = ifelse(is.na(model_0), model_nr, 0)) %>%
select(-model_0)
#> file model model_nr
#> 1 1 a 0
#> 2 1 b 0
#> 3 1 c 0
#> 4 1 x 1
#> 5 1 x 1
#> 6 1 x 1
#> 7 1 a 0
#> 8 1 b 0
#> 9 1 c 0
#> 10 1 y 2
#> 11 1 y 2
#> 12 1 y 2
#> 13 2 d 0
#> 14 2 e 0
#> 15 2 f 0
#> 16 2 x 1
#> 17 2 x 1
#> 18 2 x 1
#> 19 2 d 0
#> 20 2 e 0
#> 21 2 f 0
#> 22 2 z 2
#> 23 2 z 2
#> 24 2 z 2
Created on 2022-05-24 by the reprex package (v2.0.1)
CodePudding user response:
Would this be ok?
library(dplyr)
df %>%
filter(model_nr == 0) %>%
bind_rows(df) %>%
arrange(file)
file model model_nr
1 1 a 0
2 1 b 0
3 1 c 0
4 1 a 0
5 1 b 0
6 1 c 0
7 1 x 1
8 1 x 1
9 1 x 1
10 1 y 2
11 1 y 2
12 1 y 2
13 2 d 0
14 2 e 0
15 2 f 0
16 2 d 0
17 2 e 0
18 2 f 0
19 2 x 1
20 2 x 1
21 2 x 1
22 2 z 2
23 2 z 2
24 2 z 2
CodePudding user response:
Just to show how I currently do it and what workaround I found:
# Extract the overhead from each model that we paste to all models later on
overhead <- df |>
group_by(file) |>
filter(model_nr == 0) |>
ungroup()
# Add the overhead to all models
df |>
group_split(file, model_nr) |>
map(.x = _,
.f = ~ . |>
add_row(overhead |> filter(file == unique(.x$file)), .before = 1) |>
mutate(model_nr = if_else(model_nr == 0, NA_real_, model_nr)) |>
fill(model_nr, .direction = "updown") |>
fill(model, .direction = "updown") |>
filter(!is.na(model_nr)))
From here I can filter out the empty tibbles or bind_rows them. In the map
command, only the add_row
is required to answer my question above. Everything else is just sugar to get a bit beyond.