I have a dataframe like:
col1 customer
1 a
3 a
1 b
2 b
3 b
5 b
I want the logic to be like this:
col1 customer col2
1 a 1
3 a 1
1 b 1
2 b 2
3 b 3
5 b 3
as you can see, if the customer has consistent values in col1, give it, if not, give the last consistent number which is 3
I tried using the df.shift() but I was stuck
Further Example:
col1
1
1
1
3
5
8
10
he should be given a value of 1 because that's the last consistent value for him!
CodePudding user response:
Update
If you have more than one month, you can use this version:
import numpy as np
inc_count = lambda x: np.where(x.diff(1) == 1, x, x.shift(fill_value=x.iloc[0]))
df['col2'] = df.groupby('customer')['col1'].transform(inc_count)
print(df)
# Output
col1 customer col2
0 1 a 1
1 3 a 1
2 1 b 1
3 2 b 2
4 3 b 3
5 5 b 3
Maybe you want to increment a counter if the next row value following the current one:
# Same as df['col1'].diff().eq(1).cumsum().add(1)
df['col2'] = df['col1'].eq(df['col1'].shift() 1).cumsum().add(1)
print(df)
# Output
col1 customer col2
0 1 a 1
1 3 a 1
2 1 b 1
3 2 b 2
4 3 b 3
5 5 b 3
Or for each customer:
inc_count = lambda x: x.eq(x.shift() 1).cumsum().add(1)
df['col2'] = df['col2'] = df.groupby('customer')['col1'].transform(inc_count)
print(df)
# Output
col1 customer col2
0 1 a 1
1 3 a 1
2 1 b 1
3 2 b 2
4 3 b 3
5 5 b 3