Home > database >  How to identify one column with continuous number and same value of another column?
How to identify one column with continuous number and same value of another column?

Time:01-02

I have a DataFrame with two columns A and B.

I want to create a new column named C to identify the continuous A with the same B value.

Here's an example

import pandas as pd

df = pd.DataFrame({'A':[1,2,3,5,6,10,11,12,13,18], 'B':[1,1,2,2,3,3,3,3,4,4]})

I found a similar question, but that method only identifies the continuous A regardless of B.

df['C'] = df['A'].diff().ne(1).cumsum().sub(1) 

I have tried to groupby B and apply the function like this:

df['C'] = df.groupby('B').apply(lambda x: x['A'].diff().ne(1).cumsum().sub(1))

However, it doesn't work: TypeError: incompatible index of inserted column with frame index.

The expected output is

A   B   C
1   1   0
2   1   0
3   2   1
5   2   2
6   3   3
10  3   4
11  3   4
12  3   4
13  4   5
18  4   6

CodePudding user response:

Let's create a sequential counter using groupby, diff and cumsum then factorize to reencode the counter

df['C'] = df.groupby('B')['A'].diff().ne(1).cumsum().factorize()[0]

Result

    A  B  C
0   1  1  0
1   2  1  0
2   3  2  1
3   5  2  2
4   6  3  3
5  10  3  4
6  11  3  4
7  12  3  4
8  13  4  5
9  18  4  6

CodePudding user response:

Use DataFrameGroupBy.diff with compare not equal 1 and Series.cumsum, last subtract 1:

df['C'] = df.groupby('B')['A'].diff().ne(1).cumsum().sub(1)
print (df)
    A  B  C
0   1  1  0
1   2  1  0
2   3  2  1
3   5  2  2
4   6  3  3
5  10  3  4
6  11  3  4
7  12  3  4
8  13  4  5
9  18  4  6
  • Related