Getting a list of companies with dual share classes using cusip-CodePudding

I am a beginner in R, and I have a table that consists of two columns. The first column is the ID. The second column is the label (actually CUSIP in CRSP data). The label is the eight-digit number where the first six digits identify the ID, and the last two-digit may differ by certain attributes of the ID.

I would like to have a list of IDs that has two different labels where one of the two labels ends with a different two-digit number.

for example, if the table looks as follows,

ID	label
1	11223330
1	11223341
2	11224430
3	11225530
3	11225531
4	11226630
5	11227730
5	11227753

in this case, I hope to see

ID	label
1	11223330
1	11223341
3	11225530
3	11225531
5	11227730
5	11227753

Thank you very much in advance!

CodePudding user response：

This should do it:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

dat <- structure(list(ID = c(1L, 1L, 2L, 3L, 3L, 4L, 5L, 5L), label = c(11223330L, 
11223341L, 11224430L, 11225530L, 11225531L, 11226630L, 11227730L, 
11227753L)), row.names = c(NA, 8L), class = "data.frame")

dat %>% 
  distinct() %>% 
  group_by(ID) %>%  
  filter(n() >= 2)
#> # A tibble: 6 × 2
#> # Groups:   ID [3]
#>      ID    label
#>   <int>    <int>
#> 1     1 11223330
#> 2     1 11223341
#> 3     3 11225530
#> 4     3 11225531
#> 5     5 11227730
#> 6     5 11227753

^{Created on 2022-04-24 by the reprex package (v2.0.1)}