I am trying to create the similar_player_selected column; I already have the first four columns. In row 1, player_id = 1 and the most similar player to player 1 is player 3. Player 3 (row 3) isn't selected for campaign 1 (player_selected = 0), so I assign 0 to similar_player_selected for row 1. In row 2, player_id = 2 and the most similar player to player 2 is player 4. Player 4 is selected for campaign 1 (row 4), so I assign 1 to similar_player_selected for row 2. Please note that there are more than 1,000 campaigns overall.
campaign_id | player_id | most_similar_player | player_selected | similar_player_selected |
---|---|---|---|---|
1 | 1 | 3 | 1 | 0 |
1 | 2 | 4 | 0 | 1 |
1 | 3 | 4 | 0 | ? |
1 | 4 | 1 | 1 | ? |
2 | 1 | 3 | 1 | ? |
2 | 2 | 4 | 1 | ? |
2 | 3 | 4 | 0 | ? |
2 | 4 | 1 | 0 | ? |
CodePudding user response:
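Using match(): within each campaign, match(most_similar_player, player_id) gives the row position of each player's most similar player, so we can subset player_selected at those matched positions.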
    library(dplyr)

    df |>
      group_by(campaign_id) |>
      mutate(
        similar_player_selected = player_selected[match(most_similar_player, player_id)]
      ) |>
      ungroup()
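To see what the lookup does inside a single group, here is a minimal sketch using campaign 1 from the question as plain vectors. The variable `idx` is only an illustrative name, and the player 5 lookup is an invented case to show the behaviour for a missing player.

```r
# Campaign 1 from the question, as plain vectors
player_id           <- c(1, 2, 3, 4)
most_similar_player <- c(3, 4, 4, 1)
player_selected     <- c(1, 0, 0, 1)

# Position of each player's most similar player within player_id
idx <- match(most_similar_player, player_id)
idx
#> [1] 3 4 4 1

# Look up player_selected at those positions
player_selected[idx]
#> [1] 0 1 1 1

# A similar player that does not appear in the campaign yields NA
player_selected[match(5, player_id)]
#> [1] NA
```

So if a most_similar_player never appears as a player_id in the same campaign, the new column will hold NA for that row rather than 0.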
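Faster base R alternative (unsplit() puts the results back in the original row order, so df does not need to be sorted by campaign_id)

    df$similar_player_selected <- split(df, df$campaign_id) |>
      lapply(\(x) with(x, player_selected[match(most_similar_player, player_id)])) |>
      unsplit(df$campaign_id)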
Output:

      campaign_id player_id most_similar_player player_selected similar_player_selected
            <dbl>     <dbl>               <dbl>           <dbl>                   <dbl>
    1           1         1                   3               1                       0
    2           1         2                   4               0                       1
    3           1         3                   4               0                       1
    4           1         4                   1               1                       1
    5           2         1                   3               1                       0
    6           2         2                   4               1                       0
    7           2         3                   4               0                       0
    8           2         4                   1               0                       1
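For anyone who wants to reproduce this, below is a rough, self-contained sketch that rebuilds the question's table and runs a per-campaign lookup in base R. The helper name `lookup_similar` is hypothetical (it is not from any package), and it uses `unsplit()` to return the values in the original row order. It needs R 4.1 or later for `|>` and `\(x)`.

```r
# Example data copied from the question's table
df <- data.frame(
  campaign_id         = c(1, 1, 1, 1, 2, 2, 2, 2),
  player_id           = c(1, 2, 3, 4, 1, 2, 3, 4),
  most_similar_player = c(3, 4, 4, 1, 3, 4, 4, 1),
  player_selected     = c(1, 0, 0, 1, 1, 1, 0, 0)
)

# Hypothetical helper: per-campaign lookup, returned in the original row order
lookup_similar <- function(d) {
  split(d, d$campaign_id) |>
    lapply(\(x) with(x, player_selected[match(most_similar_player, player_id)])) |>
    unsplit(d$campaign_id)
}

df$similar_player_selected <- lookup_similar(df)
df$similar_player_selected
#> [1] 0 1 1 1 0 0 0 1

# Each row gets the same value even when the campaigns are interleaved
shuffled <- df[c(5, 1, 6, 2, 7, 3, 8, 4), ]
stopifnot(identical(lookup_similar(shuffled), shuffled$similar_player_selected))
```

The last check is there because combining the per-group results with a plain `unlist()` returns them grouped by campaign, which only lines up with the data frame when its rows are already sorted by campaign_id.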