Home > Software engineering >  Iterate over two dataframes to create a criterion column in R
Iterate over two dataframes to create a criterion column in R

Time:02-23

I am looking to form a final data frame by iterating over two data frames.

The first df looks like this


employee_name <- c("rob", "peter")
employee_attribute1 <- c("10", "5")
employee_attribute2 <- c("5", "5")
employee_df <- data.frame(employee_name, employee_attribute1,employee_attribute2)

the second one looks like this. Employees can and do belong to multiple groups

employee_group <- c("1", "2","3" )
employee_attributes <- c("employee_attribute2", "employee_attribute2","employee_attribute1" )
group_att_mapping_df <- data.frame(employee_group,employee_attributes)

I would like to create a data frame like this. The rating condition is if employee attribute score in employee_df for the employee_attributes ( from group_att_mapping_df) = 10 then the rating is 1 else it is 2.

 employee_group| employee_name | employee_rating
     1         | Rob            |  2 
     1         | Peter          |  2
     2         | Rob            |  2
     2         |Peter           |  2
     3         | Rob            |  1
     3         | Peter         |  2

Need help writing valid R code to iterate over the two data frames. I will attach a screenshot of the loop I have written so far. Many thanks in advance.

CodePudding user response:

You don't need to iterate over the dataframes. You can use pivoting and joining to get the expected output:

employee_df %>% 
  tidyr::pivot_longer(
    contains("attribute"), 
    names_to = "employee_attributes", 
    values_to = "attribute_score"
  ) %>% 
  left_join(group_att_mapping_df) %>% 
  mutate(employee_rating = ifelse(attribute_score == 10, 1, 2)) %>% 
  select(employee_group, employee_name, employee_rating) %>% 
  arrange(employee_group)

#> Joining, by = "employee_attributes"
#> # A tibble: 6 x 3
#>   employee_group employee_name employee_rating
#>   <chr>          <chr>                   <dbl>
#> 1 1              rob                         2
#> 2 1              peter                       2
#> 3 2              rob                         2
#> 4 2              peter                       2
#> 5 3              rob                         1
#> 6 3              peter                       2

Explaination:

  • First we pivot employee_df to get all the attributes variables as one column.
  • Next we left join with group_att_mapping_df to get the employee_group column.
  • Then we create the employee_rating column based on your logic: 1 if the attribute score is 10, and 2 otherwise.
  • Finally I just make the data look the same as your expected output, by selecting the columns you want and sorting them by employee_group.

Data without the ...

employee_name <- c("rob", "peter")
employee_attribute1 <- c("10", "5")
employee_attribute2 <- c("5", "5")
employee_df <- data.frame(employee_name, employee_attribute1,employee_attribute2)

employee_group <- c("1", "2","3")
employee_attributes <- c("employee_attribute2", "employee_attribute2","employee_attribute1")
group_att_mapping_df <- data.frame(employee_group,employee_attributes)

  • Related