I have a problem which is similar to the one in this question:
Add a flag indicating that a column has changed for each group in pandas
with the exception that some of my groups only contain one single row. Here is an extension of the problem formulated in the question above:
import pandas as pd
df = pd.DataFrame({
"id": [1, 1, 1, 2, 2, 2, 3, 3, 3,4,5,5,5],
"company": ["A", "A", "B", "C", "C", "C", "D", "E", "F","G","H","I","J"]},
columns=["id", "company"]
)
Now, the solution offered by jezrael is really nice:
df['flag'] = df.groupby("id")["company"].shift().bfill().ne(df['company']).astype(int)
which produces
id company flag
0 1 A 0
1 1 A 0
2 1 B 1
3 2 C 0
4 2 C 0
5 2 C 0
6 3 D 0
7 3 E 1
8 3 F 1
9 4 G 1
10 5 H 0
11 5 I 1
12 5 J 1
but it has a drawback for my purposes. Indeed, for id = 4
, I wish the flag to be 0, not 1 since there is no change. I've tried to change shift
but haven't had any luck.
Any ideas would be greatly appreciated.
CodePudding user response:
Change the bfill
method to fillna
:
df['flag'] = df.groupby('id')['company'].shift().fillna(df['company']).ne(df['company']).astype(int)
print(df)
# Output:
id company flag
0 1 A 0
1 1 A 0
2 1 B 1
3 2 C 0
4 2 C 0
5 2 C 0
6 3 D 0
7 3 E 1
8 3 F 1
9 4 G 0
10 5 H 0
11 5 I 1
12 5 J 1
CodePudding user response:
You can change that to:
df['flag'] = df.groupby("id")["company"].apply(lambda x: x.shift().fillna(x).ne(x).astype(int))
Output:
id company flag
0 1 A 0
1 1 A 0
2 1 B 1
3 2 C 0
4 2 C 0
5 2 C 0
6 3 D 0
7 3 E 1
8 3 F 1
9 4 G 0
10 5 H 0
11 5 I 1
12 5 J 1