I have a dataset with supplier and its invoices and I need to determinate which of the invoices are consecutives marking it with a 1 or a 0.
For example:
df1 = pd.DataFrame()
df1['Supplier'] = ['ABC','ABC','ABC','ABC','ABC','ABC','ABC','ABC','ABC','ABC','ABC']
df1['Invoice'] = ['101','102','105','109','110','114','120','121','122','130','131']
Supplier Invoice
0 ABC 101
1 ABC 102
2 ABC 105
3 ABC 109
4 ABC 110
5 ABC 114
6 ABC 120
7 ABC 121
8 ABC 122
9 ABC 130
10 ABC 131
And what I want is a third column like this:
Supplier Invoice Consecutive
0 ABC 101 0
1 ABC 102 1
2 ABC 105 0
3 ABC 109 0
4 ABC 110 1
5 ABC 114 0
6 ABC 120 0
7 ABC 121 1
8 ABC 122 1
9 ABC 130 0
10 ABC 131 1
Thanks in advance!
CodePudding user response:
Make sure your 'Invoice' column is of a numeric type first. Then you can use:
df1['Consecutive'] = (df1.Invoice - df1.Invoice.shift(1) == 1).astype(int)
prints:
Supplier Invoice Consecutive
0 ABC 101 0
1 ABC 102 1
2 ABC 105 0
3 ABC 109 0
4 ABC 110 1
5 ABC 114 0
6 ABC 120 0
7 ABC 121 1
8 ABC 122 1
9 ABC 130 0
10 ABC 131 1
CodePudding user response:
using npwhere and diff
# take difference b/w consecutive rows using diff
# update to 0 when difference is not 1
df1['consecutive']=np.where(df1['Invoice'].astype(int).diff() !=1, 0, 1)
df1
Supplier Invoice consecutive
0 ABC 101 0
1 ABC 102 1
2 ABC 105 0
3 ABC 109 0
4 ABC 110 1
5 ABC 114 0
6 ABC 120 0
7 ABC 121 1
8 ABC 122 1
9 ABC 130 0
10 ABC 131 1
CodePudding user response:
df1['Consecutive']=df1.Invoice.astype(int).diff().eq(1).astype(int)
Supplier Invoice Consecutive
0 ABC 101 0
1 ABC 102 1
2 ABC 105 0
3 ABC 109 0
4 ABC 110 1
5 ABC 114 0
6 ABC 120 0
7 ABC 121 1
8 ABC 122 1
9 ABC 130 0
10 ABC 131 1