Can you please help me change my code from current output to expected output? I am using apply() function of dataframe. It would be great if it could also be done more efficiently using vector operation.
Current output:
col1 col2 serialno
0 100 0 4
1 100 100 0
2 100 100 0
3 200 100 4
4 200 200 0
5 300 200 4
6 300 300 0
Expected output:
col1 col2 serialno
0 100 0 4
1 100 100 4
2 100 100 4
3 200 100 5
4 200 200 5
5 300 200 6
6 300 300 6
My current code contains a static value (4). I need to increment it by one based on a condition (col1 != col2). In addition, I need to repeat the serial no for all rows that meet the condition (col1 == col2).
My code:
import pandas as pd
columns = ['col1']
data = ['100','100','100','200','200','300','300']
df = pd.DataFrame(data=data,columns=columns)
df['col2'] = df.col1.shift(1).fillna(0)
print(df)
start = 4
series = (df['col1']!=df['col2']).apply(lambda x: start if x==True else 0)
df['serialno'] = series
print(df)
CodePudding user response:
You can try this
import itertools
start = 4
counter = itertools.count(start) # to have incremental counter
df["serialno"] = [start if (x["col1"]==x["col2"] or (start:=next(counter))) else start for _, x in df.iterrows()]
This if condition will be executed in two manner: if col1 and col2 have same value then it will not go the next condition so start value will be same and if first condition is false then our counter will be incremented by 1.
CodePudding user response:
Here is how you can do it with apply function:
ID = 3
def check_value(A, B):
global ID
if A != B:
ID = 1
return ID
df['id'] = df.apply(lambda row: check_value(row['col1'], row['col2']), axis=1)
You just need to start from 3 since the first row will increment it.
print(df)
will give you this:
col1 col2 id
0 100 0 4
1 100 100 4
2 100 100 4
3 200 100 5
4 200 200 5
5 300 200 6
6 300 300 6