Home > database >  Match two equal-sized data.frames and then filter results on a third
Match two equal-sized data.frames and then filter results on a third

Time:11-29

I have the following three data.frame:

area1 <- data.frame(ua = c(1, 2, 3),
                   sub_ua1 = c(0, 100, 0),
                   sub_ua2 = c(100, 100, 100),
                   sub_ua3 = c(100, 0, 0))

area2 <- data.frame(ua = c(1, 2, 3),
                    sub_ua1 = c(100, 100, 0),
                    sub_ua2 = c(100, 100, 0),
                    sub_ua3 = c(100, 0, 0))

df <- data.frame(ua = c(rep(1, 5), rep(2, 4), rep(3, 7)),
                 subua = c(rep("sub_ua1", 3), "sub_ua2", "sub_ua3",
                           "sub_ua1", "sub_ua1", "sub_ua2", "sub_ua3",
                           "sub_ua1", c(rep("sub_ua2", 2)), rep("sub_ua3", 4)),
                 value = c(rep(2, 3), rep(4, 3), rep(2, 2), rep(1, 8)))

What I'm trying to do is, based on column ua in dfs area_1 and area_2, filter only sub_ua (1 to 3) that have a match of 100 in each df. For example, the first value of sub_ua2 is 100 in both area_1 and area_2. This is a "sub_ua" I want.

Then, after having this list of "sub_ua" per "ua", filter only them on df to obtain the filtered value.

The results should be: For ua == 1, get both sub_ua2 and sub_ua3 For ua == 2, get both sub_ua1 and sub_ua2 For ua == 3, get sub_ua2

EDIT:

I was using the following approach to obtain a data.frame of rows and columns indices:

library(prodlim)
# Indices for data frame 1 and 2 for values = 100
indices_1 <- which(area1 == 100, arr.ind = TRUE)
indices_2 <- which(area2 == 100, arr.ind = TRUE)

# Rows where indices are matched between the two data frame indices
indices_rows <- na.omit(row.match(as.data.frame(indices_1), as.data.frame(indices_2)))

# Row-column indices where both data frames have values of 100
indices_2[indices_rows, ]

I just don't know how to use this to filter in the final dataset df

CodePudding user response:

If I understood correctly this should work:

area1 <- data.frame(ua = c(1, 2, 3),
                    sub_ua1 = c(0, 100, 0),
                    sub_ua2 = c(100, 100, 100),
                    sub_ua3 = c(100, 0, 0))

area2 <- data.frame(ua = c(1, 2, 3),
                    sub_ua1 = c(100, 100, 0),
                    sub_ua2 = c(100, 100, 0),
                    sub_ua3 = c(100, 0, 0))
library(dplyr)
library(tidyr)

area1 %>% 
  left_join(area2, by = "ua", suffix = c(".area1",".area2")) %>% 
  pivot_longer(cols = -ua,names_to = "var",values_to = "value") %>% 
  separate(col = var,into = c("var","area"),sep = "\\.") %>% 
  pivot_wider(names_from = area,values_from = value) %>% 
  filter(area1 == 100, area2 == 100) %>% 
  select(-starts_with("area"))

# A tibble: 4 x 2
     ua var    
  <dbl> <chr>  
1     1 sub_ua2
2     1 sub_ua3
3     2 sub_ua1
4     2 sub_ua2
  • Related