I have a dataset like the below:
df <-
read.table(textConnection("CustomerID Code1 Code2
A 1 1
A 0 1
A 1 1
B 1 0
B 0 1
C 1 1
C 1 1
D 1 0
D 0 1
D 1 1 "), header=TRUE)
What I am looking to do is loop through each CustomerID and, if the first row of the group has Code1 = 1 and Code2 = 1, change the Code2 value to 0 for that row. So in the above case, below would be the correct final dataset - the change only applies to the very first row in each customer group, if it has a 1 for both Code1 and Code2.
CustomerID | Code1 | Code2 |
---|---|---|
A | 1 | 0 |
A | 0 | 1 |
A | 1 | 1 |
B | 1 | 0 |
B | 0 | 1 |
C | 1 | 0 |
C | 1 | 1 |
D | 1 | 0 |
D | 0 | 1 |
D | 1 | 1 |
Any help very much appreciated, thank you.
CodePudding user response:
Here's a solution using dplyr::case_when()
. Group by CustomerID, create an index within each group using row_number()
, and apply the case_when logic to the first row.
library(dplyr)
#> Warning: package 'dplyr' was built under R version 4.2.1
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
df <-
read.table(textConnection("CustomerID Code1 Code2
A 1 1
A 0 1
A 1 1
B 1 0
B 0 1
C 1 1
C 1 1
D 1 0
D 0 1
D 1 1 "),
header = TRUE)
df |>
group_by(CustomerID) |>
mutate(
row = row_number(),
Code2 = case_when(row == 1 &
Code1 == 1 & Code2 == 1 ~ as.integer(0),
TRUE ~ Code2)
) |>
ungroup() |>
select(-row)
#> # A tibble: 10 × 3
#> CustomerID Code1 Code2
#> <chr> <int> <int>
#> 1 A 1 0
#> 2 A 0 1
#> 3 A 1 1
#> 4 B 1 0
#> 5 B 0 1
#> 6 C 1 0
#> 7 C 1 1
#> 8 D 1 0
#> 9 D 0 1
#> 10 D 1 1
Created on 2022-10-26 with reprex v2.0.2