Create new column using grouped column and numeric column from different data frame - R

Time:09-27

I have two data frames (regionSum and areaSum) shown below.

regionSum:

Area Region void_pts
0    010    0.125
1    110    1.566
1    111    1.350
3    310    2.004
3    312    1.652

areaSum:

Area void_pts
0    0.455
1    1.436
2    1.396
3    1.981

I'm trying to create a new column (alert) in the regionSum data frame using these two conditions: regionSum$void_pts >= areaSum$void_pts and regionSum$Area == areaSum$Area.

Here is a snippet of the code that I've tried to use, but it errors out.

t  %>% 
  mutate(alert = case_when(void_pts >= areaSum$void_pts & area == areaSum$area ~ "Red",
                                    TRUE ~ "Blue" )

What am I missing to get the results below, and how would I tackle the same problem when the Area field has many levels?

Area Region void_pts alert
0    010    0.125    blue
1    110    1.566    red
1    111    1.350    blue
3    310    2.004    red
3    312    1.652    blue

CodePudding user response:

We can perform a join:

library(dplyr)

regionSum %>%
  inner_join(areaSum, by = "Area") %>%
  mutate(alert = case_when(void_pts.x >= void_pts.y ~ "Red",
                           TRUE ~ "Blue"))
# A tibble: 5 × 5
   Area Region void_pts.x void_pts.y alert
  <dbl> <chr>       <dbl>      <dbl> <chr>
1     0 010         0.125      0.455 Blue 
2     1 110         1.57       1.44  Red  
3     1 111         1.35       1.44  Blue 
4     3 310         2.00       1.98  Red  
5     3 312         1.65       1.98  Blue 
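
If you would rather not carry the second void_pts column around, the same lookup can be sketched in base R with match(). This is only a sketch that assumes each Area appears at most once in areaSum; the names idx and area_threshold are made up for illustration. It also scales to any number of Area levels, since match() aligns the thresholds row by row.

```r
# Data copied from the "Data" section of this thread
regionSum <- data.frame(
  Area     = c(0, 1, 1, 3, 3),
  Region   = c(10, 110, 111, 310, 312),
  void_pts = c(0.125, 1.566, 1.35, 2.004, 1.652)
)
areaSum <- data.frame(
  Area     = c(0, 1, 2, 3),
  void_pts = c(0.455, 1.436, 1.396, 1.981)
)

# For every regionSum row, find the position of its Area in areaSum
# (idx is a hypothetical helper name)
idx <- match(regionSum$Area, areaSum$Area)

# Area-level value lined up row by row with regionSum
area_threshold <- areaSum$void_pts[idx]

# Compare each region against the threshold of its own area
regionSum$alert <- ifelse(regionSum$void_pts >= area_threshold, "Red", "Blue")
regionSum
```

Note that an Area missing from areaSum would give an NA threshold here, and ifelse() would then return NA for that row rather than "Blue".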

CodePudding user response:

Here is an option to achieve this:

regionSum  %>% 
  left_join(areaSum, by = "Area", suffix = c("_region", "_area")) %>%
  mutate(alert = case_when(void_pts_region >= void_pts_area ~ "Red", TRUE ~ "Blue" ))

  Area Region void_pts_region void_pts_area alert
1    0     10           0.125         0.455  Blue
2    1    110           1.566         1.436   Red
3    1    111           1.350         1.436  Blue
4    3    310           2.004         1.981   Red
5    3    312           1.652         1.981  Blue
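
One thing to watch with left_join: if regionSum contains an Area that is not in areaSum, the joined value is NA, the comparison is NA, and case_when falls through to TRUE ~ "Blue". A possible variant, assuming you want such rows flagged as NA instead, is sketched below. The column name area_void_pts is my own choice, not something from the question, and the helper column is dropped at the end so the result has the layout asked for.

```r
library(dplyr)

# Data copied from the "Data" section of this thread
regionSum <- data.frame(
  Area     = c(0, 1, 1, 3, 3),
  Region   = c(10, 110, 111, 310, 312),
  void_pts = c(0.125, 1.566, 1.35, 2.004, 1.652)
)
areaSum <- data.frame(
  Area     = c(0, 1, 2, 3),
  void_pts = c(0.455, 1.436, 1.396, 1.981)
)

result <- regionSum %>%
  # Rename before joining so no .x/.y suffixes are needed
  # (area_void_pts is a hypothetical name)
  left_join(rename(areaSum, area_void_pts = void_pts), by = "Area") %>%
  mutate(alert = case_when(
    is.na(area_void_pts)      ~ NA_character_,  # Area not found in areaSum
    void_pts >= area_void_pts ~ "Red",
    TRUE                      ~ "Blue"
  )) %>%
  select(-area_void_pts)  # keep only the original columns plus alert

result
```

With the data from the question every Area has a match, so the NA branch is never hit and the alerts are the same as above.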

Data


regionSum <- structure(list(Area = c(0, 1, 1, 3, 3),
                            Region = c(10, 110, 111, 310, 312),
                            void_pts = c(0.125, 1.566, 1.35, 2.004, 1.652)),
                       class = "data.frame", row.names = c(NA, -5L))

areaSum <- structure(list(Area = c(0, 1, 2, 3),
                          void_pts = c(0.455, 1.436, 1.396, 1.981)),
                     class = "data.frame", row.names = c(NA, -4L))