I have the following three data.frame
:
area1 <- data.frame(ua = c(1, 2, 3),
sub_ua1 = c(0, 100, 0),
sub_ua2 = c(100, 100, 100),
sub_ua3 = c(100, 0, 0))
area2 <- data.frame(ua = c(1, 2, 3),
sub_ua1 = c(100, 100, 0),
sub_ua2 = c(100, 100, 0),
sub_ua3 = c(100, 0, 0))
df <- data.frame(ua = c(rep(1, 5), rep(2, 4), rep(3, 7)),
subua = c(rep("sub_ua1", 3), "sub_ua2", "sub_ua3",
"sub_ua1", "sub_ua1", "sub_ua2", "sub_ua3",
"sub_ua1", c(rep("sub_ua2", 2)), rep("sub_ua3", 4)),
value = c(rep(2, 3), rep(4, 3), rep(2, 2), rep(1, 8)))
What I'm trying to do is, based on column ua
in dfs area_1
and area_2
, filter only sub_ua
(1 to 3) that have a match of 100 in each df. For example, the first value of sub_ua2
is 100 in both area_1
and area_2
. This is a "sub_ua" I want.
Then, after having this list of "sub_ua" per "ua", filter only them on df
to obtain the filtered value
.
The results should be:
For ua
== 1, get both sub_ua2
and sub_ua3
For ua
== 2, get both sub_ua1
and sub_ua2
For ua
== 3, get sub_ua2
EDIT:
I was using the following approach to obtain a data.frame
of rows and columns indices:
library(prodlim)
# Indices for data frame 1 and 2 for values = 100
indices_1 <- which(area1 == 100, arr.ind = TRUE)
indices_2 <- which(area2 == 100, arr.ind = TRUE)
# Rows where indices are matched between the two data frame indices
indices_rows <- na.omit(row.match(as.data.frame(indices_1), as.data.frame(indices_2)))
# Row-column indices where both data frames have values of 100
indices_2[indices_rows, ]
I just don't know how to use this to filter in the final dataset df
CodePudding user response:
If I understood correctly this should work:
area1 <- data.frame(ua = c(1, 2, 3),
sub_ua1 = c(0, 100, 0),
sub_ua2 = c(100, 100, 100),
sub_ua3 = c(100, 0, 0))
area2 <- data.frame(ua = c(1, 2, 3),
sub_ua1 = c(100, 100, 0),
sub_ua2 = c(100, 100, 0),
sub_ua3 = c(100, 0, 0))
library(dplyr)
library(tidyr)
area1 %>%
left_join(area2, by = "ua", suffix = c(".area1",".area2")) %>%
pivot_longer(cols = -ua,names_to = "var",values_to = "value") %>%
separate(col = var,into = c("var","area"),sep = "\\.") %>%
pivot_wider(names_from = area,values_from = value) %>%
filter(area1 == 100, area2 == 100) %>%
select(-starts_with("area"))
# A tibble: 4 x 2
ua var
<dbl> <chr>
1 1 sub_ua2
2 1 sub_ua3
3 2 sub_ua1
4 2 sub_ua2