I'm using R and the tidyverse and need to be able to systematically search a data frame for "problem rows", and then replace them with a row from above.
For example, let's say I have the following data:
df <- data.frame(
name = c("blank", "A", "B", "blank", "C", "D", "blank", "E"),
problem = c(0, 0, 0, 0, 0, 1, 0, 1)
)
df
<name> <problem>
blank 0
A 0
B 0
blank 0
C 0
D 1
blank 0
E 1
In this example both names D and E have a "problem", so I need to search for the nearest "blank" row above and reposition it where the problem row currently sits. So the solution would look like the following:
<name> <problem>
blank 0
A 0
B 0
C 0
D 1
blank 0
E 1
blank 0
Any help would be tremendously appreciated, as I have a much larger data frame in real life and can't make the adjustments manually. Thank you!
CodePudding user response:
To keep excess blanks in place:
library(tidyverse)
df <- data.frame(
name = c("blank", "A", "B", "blank", "C", "D", "blank", "E"),
problem = c(0, 0, 0, 0, 0, 1, 0, 1)
)
df |>
mutate(temp = if_else(problem == 1, name, NA_character_)) |>
fill(temp, .direction = "up") |>
mutate(temp = if_else(name == "blank" | problem == 1, temp, name)) |>
add_count(temp) |>
group_by(temp) |>
mutate(temp = if_else(n > 2 & row_number() == n - 2,
NA_character_, temp)) |>
ungroup() |>
fill(temp, .direction = "up") |>
arrange(temp, desc(problem)) |>
select(-temp, -n)
#> # A tibble: 8 × 2
#> name problem
#> <chr> <dbl>
#> 1 blank 0
#> 2 A 0
#> 3 B 0
#> 4 C 0
#> 5 D 1
#> 6 blank 0
#> 7 E 1
#> 8 blank 0
Created on 2022-06-02 by the reprex package (v2.0.1)
CodePudding user response:
Carl, I've built off your approach to create something I think is more generalizable. It's not super elegant, but I do think it works. If anyone else wants to have a go, that would be fantastic.
# add more data to top and bottom to ensure only "problems" are being addressed
df <- data.frame(
name = c("blank", "blank", "A", "B", "blank", "C",
"D", "blank", "E", "blank", "F", "blank"),
problem = c(0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0)
)
# find position of the first non-problem name
no_prob_name <- df %>%
filter(name != "blank") %>%
slice_head(n = 1) %>%
pull(name)
row_np_name <- which(df$name == no_prob_name)
# rearrange
df |>
mutate(temp = if_else(problem == 1, name, NA_character_)) |>
fill(temp, .direction = "up") |>
mutate(temp = ifelse(is.na(temp) & name != "blank", name, temp)) %>%
fill(temp, .direction = "up") %>%
fill(temp, .direction = "down") %>%
mutate(temp = if_else(name == "blank" | problem == 1, temp, name)
,temp = as.character(str_glue("{temp}{problem}"))
) |>
add_count(temp) |>
mutate(temp = if_else(n > 1 & row_number() < row_np_name,
NA_character_, temp)) |>
fill(temp, .direction = "up") |>
mutate(temp = str_sub(temp, 1, 1)) %>%
arrange(temp, desc(problem)) |>
select(-temp, -n)