Filter rows with consecutive numbers

I have a DataFrame with an ID column and a numeric column X.

I want to keep only the rows that belong to a run of at least 4 consecutive X values within the same ID. For example, if ID 1 has the values 100, 101, 102, 103, 105, the row with 105 should be excluded.

Data:

    ID    X
0    1  100
1    1  101
2    1  102
3    1  103
4    1  105
5    2  100
6    2  102
7    2  103
8    2  104
9    3  100
10   3  101
11   3  102
12   3  103
13   3  106
14   3  107
15   3  108
16   3  109
17   3  110
18   3  112
19   4  100
20   4  102
21   4  103
22   4  104
23   4  105
24   4  107

Expected results:

    ID    X
0    1  100
1    1  101
2    1  102
3    1  103
4    3  100
5    3  101
6    3  102
7    3  103
8    3  106
9    3  107
10   3  108
11   3  109
12   3  110
13   4  102
14   4  103
15   4  104
16   4  105

CodePudding user response:

You can label each run of consecutive values, then keep only the runs with at least 4 rows using groupby.filter:

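# label runs of consecutive X: a new run starts wherever X jumps by more than 1
# (no need to group by ID here, ID is added as a grouping key below)
g = df['X'].diff().gt(1).cumsum()

# keep only the (ID, run) groups with at least 4 rows
out = df.groupby(['ID', g]).filter(lambda grp: len(grp) >= 4)
# append .reset_index(drop=True) to renumber the index as in the expected results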

output:

    ID    X
0    1  100
1    1  101
2    1  102
3    1  103
9    3  100
10   3  101
11   3  102
12   3  103
13   3  106
14   3  107
15   3  108
16   3  109
17   3  110
20   4  102
21   4  103
22   4  104
23   4  105
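
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. The DataFrame is rebuilt from the sample in the question; the names run, sizes and out_reset are only illustrative. It assumes X is sorted in ascending order within each ID and has no duplicates, since diff().gt(1) only detects upward jumps.

```python
import pandas as pd

# Rebuild the sample data from the question (25 rows).
df = pd.DataFrame({
    "ID": [1] * 5 + [2] * 4 + [3] * 10 + [4] * 6,
    "X": [100, 101, 102, 103, 105,
          100, 102, 103, 104,
          100, 101, 102, 103, 106, 107, 108, 109, 110, 112,
          100, 102, 103, 104, 105, 107],
})

# A new run starts wherever X jumps by more than 1 from the previous row.
# The label is not reset per ID, which is fine because ID is a grouping key too.
run = df["X"].diff().gt(1).cumsum().rename("run")

# Number of rows in each (ID, run) group, handy for inspecting the labelling.
sizes = df.groupby(["ID", run]).size()

# Keep only the groups with at least 4 rows; the original index is preserved.
out = df.groupby(["ID", run]).filter(lambda grp: len(grp) >= 4)

# Renumber the index from 0 to match the layout of the expected results.
out_reset = out.reset_index(drop=True)
print(out_reset)
```

Printing sizes before filtering is a quick way to check that the runs were split where you expect.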

CodePudding user response:

Another method, taking the difference within each ID so that a single run label is enough to group on:

out = df.groupby(df.groupby('ID')['X'].diff().ne(1).cumsum()).filter(lambda x: len(x) >= 4)
print(out)

# Output
    ID    X
0    1  100
1    1  101
2    1  102
3    1  103
9    3  100
10   3  101
11   3  102
12   3  103
13   3  106
14   3  107
15   3  108
16   3  109
17   3  110
20   4  102
21   4  103
22   4  104
23   4  105
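
If the lambda in filter turns out to be slow on a large frame, a variant worth trying is to broadcast each run's length back to its rows with transform and select with a boolean mask. This is only a sketch under the same assumptions as above (sample data rebuilt from the question, rows already ordered within each ID); run and run_len are illustrative names, and I have not benchmarked it.

```python
import pandas as pd

# Rebuild the sample data from the question (25 rows).
df = pd.DataFrame({
    "ID": [1] * 5 + [2] * 4 + [3] * 10 + [4] * 6,
    "X": [100, 101, 102, 103, 105,
          100, 102, 103, 104,
          100, 101, 102, 103, 106, 107, 108, 109, 110, 112,
          100, 102, 103, 104, 105, 107],
})

# Within each ID, a run starts wherever the step is not exactly 1.
# The first row of every ID has a NaN diff, so it always starts a new run.
run = df.groupby("ID")["X"].diff().ne(1).cumsum()

# Length of the run each row belongs to, aligned with df's index.
run_len = df.groupby(run)["X"].transform("size")

# Keep rows whose run has at least 4 members.
out = df[run_len >= 4]
print(out)
```

Because ne(1) treats any step other than exactly 1 as a break, this labelling also splits on repeated or decreasing values.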