Suppose I have two data frames like this:
df1 <- data.frame(a = c(1,2,4,0,0),
b = c(0,3,5,5,0),
c = c(0,0,6,7,6))
df2 <- data.frame(a = c(3,6,8,0,0),
b = c(0,9,10,4,0),
c = c(0,0,1,4,9))
I then join them with dplyr's full_join(), like this:
df3 <- full_join(df1, df2)
print(df3)
a b c
1 1 0 0
2 2 3 0
3 4 5 6
4 0 5 7
5 0 0 6
6 3 0 0
7 6 9 0
8 8 10 1
9 0 4 4
10 0 0 9
Note that the pattern is always the same: zeros in rows 1 and 2, zeros in rows 9 and 10, and zeros in rows 4 to 7. I want to remove only the zeros in rows 4 to 7, so that the remaining values in each column shift up. I can do it by hand, like this:
df3[4,1] <- NA
df3[5,1] <- NA
df3[5,2] <- NA
df3[6,2] <- NA
df3[6,3] <- NA
df3[7,3] <- NA
new.df3 <- as.data.frame(lapply(df3, na.omit))
print(new.df3)
a b c
1 1 0 0
2 2 3 0
3 4 5 6
4 3 5 7
5 6 9 6
6 8 10 1
7 0 4 4
8 0 0 9
But this is not elegant, and it is very time-consuming to write out for every cell. Any thoughts? I really appreciate it, thanks in advance. Best!
CodePudding user response:
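First, find which columns hold a zero in each of rows 4 to 7.
# apply() returns a list named by row number here, because the rows contain different numbers of zeros
to_remove <- apply(df3[4:7, ], 1, function(x) which(x == 0))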
Then replace those zeros with NA.
for (i in seq_along(to_remove)) {
df3[as.numeric(names(to_remove))[i], to_remove[[i]]] <- NA
}
Finally, drop the NAs from each column.
new.df3 <- as.data.frame(lapply(df3, na.omit))
print(new.df3)
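The three steps above can also be sketched without the loop, by indexing the 4:7 block with a logical matrix. This is only a base R sketch under the assumption that df3 has exactly the contents printed in the question; df3 is rebuilt by hand here so the example does not need dplyr, and the names rows and block are made up for illustration.

```r
# df3 rebuilt by hand from the printed output in the question (assumption).
df3 <- data.frame(a = c(1, 2, 4, 0, 0, 3, 6, 8, 0, 0),
                  b = c(0, 3, 5, 5, 0, 0, 9, 10, 4, 0),
                  c = c(0, 0, 6, 7, 6, 0, 0, 1, 4, 9))

rows <- 4:7                # the range in which zeros should be dropped
block <- df3[rows, ]       # copy of just those rows
block[block == 0] <- NA    # logical-matrix indexing marks every zero in the block
df3[rows, ] <- block       # write the block back into the full data frame

# Drop the NAs column by column; this only yields a data frame
# if every column loses the same number of values.
new.df3 <- as.data.frame(lapply(df3, function(x) x[!is.na(x)]))
print(new.df3)
```

Like the loop, this relies on every column having the same number of zeros in the chosen rows; otherwise as.data.frame() stops with an error about differing numbers of rows.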
CodePudding user response:
Here is a tidyverse solution. Note that it will fail unless every variable in the data frame has the same number of zeros to omit in the specified range.
library(tidyverse)
df3 |>
# Define the range of rows for omission
mutate(flag = row_number() %in% 4:7) |>
# Group variable to preserve order, increments when flag changes
group_by(grp = (flag - lag(flag)) |>
replace_na(0) |>
abs() |>
cumsum()) |>
summarize(across(a:c,
# when flag then omit zeros
\(x, flag) if (first(flag)) x[x != 0] else x,
flag = flag),
.groups = "drop"
) |>
select(-grp)
#> # A tibble: 8 × 3
#> a b c
#> <dbl> <dbl> <dbl>
#> 1 1 0 0
#> 2 2 3 0
#> 3 4 5 6
#> 4 3 5 7
#> 5 6 9 6
#> 6 8 10 1
#> 7 0 4 4
#> 8 0 0 9
If you don't care about preserving the order of the rows, then it can be shortened:
df3 |>
# Define the range of rows for omission
group_by(flag = row_number() %in% 4:7) |>
summarize(across(a:c,
# when flag then omit zeros
\(x, flag) if (first(flag)) x[x != 0] else x,
flag = flag),
.groups = "drop") |>
select(-flag)
#> # A tibble: 8 × 3
#> a b c
#> <dbl> <dbl> <dbl>
#> 1 1 0 0
#> 2 2 3 0
#> 3 4 5 6
#> 4 8 10 1
#> 5 0 4 4
#> 6 0 0 9
#> 7 3 5 7
#> 8 6 9 6
Created on 2022-02-08 by the reprex package (v2.0.1)
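The caveat about equal numbers of zeros can be checked before running either pipeline. The following is a small base R sketch, assuming df3 has the contents printed in the question (rebuilt by hand here); df_bad is a hypothetical variant invented only to show a case where the check fails.

```r
# df3 rebuilt by hand from the printed output in the question (assumption).
df3 <- data.frame(a = c(1, 2, 4, 0, 0, 3, 6, 8, 0, 0),
                  b = c(0, 3, 5, 5, 0, 0, 9, 10, 4, 0),
                  c = c(0, 0, 6, 7, 6, 0, 0, 1, 4, 9))
rows <- 4:7

# Count the zeros per column inside the range: a = 2, b = 2, c = 2.
zeros_per_col <- colSums(df3[rows, ] == 0)
same_count <- length(unique(zeros_per_col)) == 1   # TRUE: safe to proceed

# Hypothetical variant: one extra zero in column a inside the range.
df_bad <- df3
df_bad$a[6] <- 0
bad_counts <- colSums(df_bad[rows, ] == 0)         # a = 3, b = 2, c = 2
bad_same <- length(unique(bad_counts)) == 1        # FALSE: columns would end up unequal

print(zeros_per_col)
print(bad_counts)
```

When the counts differ, the columns would have different lengths after the zeros are removed, so they can no longer be combined into one rectangular result.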
CodePudding user response:
Here's a different approach:
mask <- !(seq(nrow(df3)) %in% 4:7 & df3 == 0)
df.lst <- lapply(1:3, function(x) df3[mask[, x], x])
sapply(df.lst, length)
# [1] 8 8 8 # Check to make sure the columns are the same length
names(df.lst) <- colnames(df3)
(new.df3 <- as.data.frame(df.lst))
# a b c
# 1 1 0 0
# 2 2 3 0
# 3 4 5 6
# 4 3 5 7
# 5 6 9 6
# 6 8 10 1
# 7 0 4 4
# 8 0 0 9
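If this has to be repeated for other data frames or row ranges, the mask approach can be wrapped in a function. The sketch below is one possible way to do that, not part of the answer above: the function name drop_zeros_in_rows and its arguments are hypothetical, and df3 is rebuilt by hand from the question's printed output.

```r
# Hypothetical helper: drop zeros that fall inside `rows`, column by column.
drop_zeros_in_rows <- function(df, rows) {
  in_range <- seq_len(nrow(df)) %in% rows
  # in_range is recycled down each column of the logical matrix df == 0
  keep <- !(in_range & df == 0)
  cols <- lapply(seq_along(df), function(j) df[keep[, j], j])
  names(cols) <- names(df)
  # Columns can only be recombined if they all kept the same number of values.
  if (length(unique(lengths(cols))) > 1) {
    stop("columns have different numbers of zeros in the given rows")
  }
  as.data.frame(cols)
}

# df3 rebuilt by hand from the printed output in the question (assumption).
df3 <- data.frame(a = c(1, 2, 4, 0, 0, 3, 6, 8, 0, 0),
                  b = c(0, 3, 5, 5, 0, 0, 9, 10, 4, 0),
                  c = c(0, 0, 6, 7, 6, 0, 0, 1, 4, 9))

new.df3 <- drop_zeros_in_rows(df3, 4:7)
print(new.df3)
```

The explicit stop() replaces the manual sapply(df.lst, length) check, so a mismatch is reported with a clear message instead of being noticed by eye.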
CodePudding user response:
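A shorter dplyr version: flag rows 4 to 7, then keep every value that is not a zero inside the flagged rows.
library(dplyr)
# In dplyr >= 1.1.0, reframe() is the replacement for summarise() when the result has more than one row.
df3 %>%
mutate(rn = between(row_number(), 4, 7)) %>%
summarise(across(-rn, ~.x[!(.x == 0 & rn)]))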
a b c
1 1 0 0
2 2 3 0
3 4 5 6
4 3 5 7
5 6 9 6
6 8 10 1
7 0 4 4
8 0 0 9