I am a beginner in R, and I have a table that consists of two columns. The first column is the ID. The second column is the label (actually CUSIP in CRSP data). The label is the eight-digit number where the first six digits identify the ID, and the last two-digit may differ by certain attributes of the ID.
I would like to have a list of IDs that has two different labels where one of the two labels ends with a different two-digit number.
for example, if the table looks as follows,
ID | label |
---|---|
1 | 11223330 |
1 | 11223341 |
2 | 11224430 |
3 | 11225530 |
3 | 11225531 |
4 | 11226630 |
5 | 11227730 |
5 | 11227753 |
in this case, I hope to see
ID | label |
---|---|
1 | 11223330 |
1 | 11223341 |
3 | 11225530 |
3 | 11225531 |
5 | 11227730 |
5 | 11227753 |
Thank you very much in advance!
CodePudding user response:
This should do it:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
dat <- structure(list(ID = c(1L, 1L, 2L, 3L, 3L, 4L, 5L, 5L), label = c(11223330L,
11223341L, 11224430L, 11225530L, 11225531L, 11226630L, 11227730L,
11227753L)), row.names = c(NA, 8L), class = "data.frame")
dat %>%
distinct() %>%
group_by(ID) %>%
filter(n() >= 2)
#> # A tibble: 6 × 2
#> # Groups: ID [3]
#> ID label
#> <int> <int>
#> 1 1 11223330
#> 2 1 11223341
#> 3 3 11225530
#> 4 3 11225531
#> 5 5 11227730
#> 6 5 11227753
Created on 2022-04-24 by the reprex package (v2.0.1)