Home > Software engineering >  How to iterate through dataset comparing rows next to one another and output the relationship for re
How to iterate through dataset comparing rows next to one another and output the relationship for re

Time:03-31

I have a data set which lists a range of teams (e.g. "A", "B") and the id of the users in each team.

Example data set below (first 2 columns).Actual data runs in to the 500k plus, so I have been trying to implement a loop in R, without success.

example dataset

  1. I want to be able assess each row against the next,

    i.e. if row 1 = team A & row 2 = team A, "pass",

    if row 1 = team A & row 2 = team B, "fail"

    if row 1 = team B & row 2 = team B, "pass"

    if row 1 = team B & row 2 = team A, "fail",

    with each iteration comparing the next set of rows (e.g. 1 compared to 2, then 2 to 3, and so on).

    The output for each being in a new column, detailing the outcome i.e pass/fail.

  2. If the outcome is "pass", I would like 2 new columns the first having the user id in the first row and the second the user in the row to which it was compared - like an edge list -e.g. in the desired output format below, row 1 and 2 are compared, each having team, and "Pass", therefore user 1 is inserted into user_id_out, and user 2 into the user_id_in column.

If the outcome was fail, the content for user_id_out (and in), will be NA.

Desired output

Any help on how to do this will be greatly appreciated. :)

CodePudding user response:

Welcome to stackoverflow. Remember to always include a reproducible example of your data, do not share screenshots.

You can do it using dplyr functions.

library(dplyr)

# The data
data <- data.frame(
  team = c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'A', 'B', 'A', 'A', 'B', 'B'),
  user_id = c(1, 2, 3, 4, 5, 6, 7, 1, 5, 3, 4, 6, 7)
)

data %>%
  mutate(
    outcome = case_when(
      team == lead(team) ~ 'pass',
      TRUE ~ 'fail'
    ),
    user_id_out = ifelse(outcome == 'pass', user_id, NA),
    user_id_in = ifelse(outcome == 'pass', lead(user_id), NA)
  )
#>    team user_id outcome user_id_out user_id_in
#> 1     A       1    pass           1          2
#> 2     A       2    pass           2          3
#> 3     A       3    pass           3          4
#> 4     A       4    fail          NA         NA
#> 5     B       5    pass           5          6
#> 6     B       6    pass           6          7
#> 7     B       7    fail          NA         NA
#> 8     A       1    fail          NA         NA
#> 9     B       5    fail          NA         NA
#> 10    A       3    pass           3          4
#> 11    A       4    fail          NA         NA
#> 12    B       6    pass           6          7
#> 13    B       7    fail          NA         NA

Created on 2022-03-29 by the reprex package (v2.0.1)

  • Related