Home > database >  Create an incremental serial no for filtered rows in pandas dataframe
Create an incremental serial no for filtered rows in pandas dataframe

Time:10-30

Can you please help me change my code from current output to expected output? I am using apply() function of dataframe. It would be great if it could also be done more efficiently using vector operation.

Current output:

  col1 col2  serialno
0  100    0         4
1  100  100         0
2  100  100         0
3  200  100         4
4  200  200         0
5  300  200         4
6  300  300         0

Expected output:

  col1 col2  serialno
0  100    0         4
1  100  100         4
2  100  100         4
3  200  100         5
4  200  200         5
5  300  200         6
6  300  300         6

My current code contains a static value (4). I need to increment it by one based on a condition (col1 != col2). In addition, I need to repeat the serial no for all rows that meet the condition (col1 == col2).

My code:

import pandas as pd
columns = ['col1']
data =      ['100','100','100','200','200','300','300']
df = pd.DataFrame(data=data,columns=columns)
df['col2'] = df.col1.shift(1).fillna(0)
print(df)

start = 4
series = (df['col1']!=df['col2']).apply(lambda x: start if x==True else 0)
df['serialno'] = series
print(df)

CodePudding user response:

You can try this

import itertools
start = 4
counter = itertools.count(start) # to have incremental counter

df["serialno"] = [start if (x["col1"]==x["col2"] or (start:=next(counter))) else start for _, x in df.iterrows()]

This if condition will be executed in two manner: if col1 and col2 have same value then it will not go the next condition so start value will be same and if first condition is false then our counter will be incremented by 1.

CodePudding user response:

Here is how you can do it with apply function:

ID = 3
def check_value(A, B):
    global ID
    if A != B:
        ID  = 1
    return ID

df['id'] = df.apply(lambda row: check_value(row['col1'], row['col2']), axis=1)

You just need to start from 3 since the first row will increment it. print(df) will give you this:

   col1  col2  id
0   100     0   4
1   100   100   4
2   100   100   4
3   200   100   5
4   200   200   5
5   300   200   6
6   300   300   6
  • Related