Add a flag indicating that a column has changed for each group, except if group has only one row-CodePudding

I have a problem which is similar to the one in this question:

Add a flag indicating that a column has changed for each group in pandas

with the exception that some of my groups only contain one single row. Here is an extension of the problem formulated in the question above:

import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2, 3, 3, 3,4,5,5,5],
    "company": ["A", "A", "B", "C", "C", "C", "D", "E", "F","G","H","I","J"]},
    columns=["id", "company"]
)

Now, the solution offered by jezrael is really nice:

df['flag'] = df.groupby("id")["company"].shift().bfill().ne(df['company']).astype(int)

which produces

id company  flag
0    1       A     0
1    1       A     0
2    1       B     1
3    2       C     0
4    2       C     0
5    2       C     0
6    3       D     0
7    3       E     1
8    3       F     1
9    4       G     1
10   5       H     0
11   5       I     1
12   5       J     1

but it has a drawback for my purposes. Indeed, for id = 4, I wish the flag to be 0, not 1 since there is no change. I've tried to change shift but haven't had any luck.

Any ideas would be greatly appreciated.

CodePudding user response：

Change the bfill method to fillna:

df['flag'] = df.groupby('id')['company'].shift().fillna(df['company']).ne(df['company']).astype(int)
print(df)

# Output:
    id company  flag
0    1       A     0
1    1       A     0
2    1       B     1
3    2       C     0
4    2       C     0
5    2       C     0
6    3       D     0
7    3       E     1
8    3       F     1
9    4       G     0
10   5       H     0
11   5       I     1
12   5       J     1

CodePudding user response：

You can change that to:

df['flag'] = df.groupby("id")["company"].apply(lambda x: x.shift().fillna(x).ne(x).astype(int))

Output:

    id company  flag
0    1       A     0
1    1       A     0
2    1       B     1
3    2       C     0
4    2       C     0
5    2       C     0
6    3       D     0
7    3       E     1
8    3       F     1
9    4       G     0
10   5       H     0
11   5       I     1
12   5       J     1