Iterating through a DataFrame Column

I have a DataFrame column that I am trying to iterate through using a for loop.

In my loop, I use the DataFrame index to drive the iteration, but the code runs slowly and raises a KeyError when it reaches the last index. I am trying to loop through the column final_df["MAN"] and compare the value at the current index with the previous one: if they are equal, put 0 in the new column final_df["MAN_ID"], and 1 if they are not. Any optimization of my code would be appreciated.

%%time

#Sort by MAN column
final_df.sort_values(by=['MAN'])


#Loop through the column
final_df["MAN_ID"] = ""

try:
    for i in final_df.index:
       if final_df["MAN"][i + 1] == final_df["MAN"][i]:
           final_df["MAN_ID"][i] = 0

       elif final_df["MAN"][i + 1] != final_df["MAN"][i]:
           final_df["MAN_ID"][i] = 1

except: 
       print("No value to loop")

CodePudding user response：

Do not iterate; looping over rows is slow.

Use vectorized operations instead: shift to shift the values by one row, eq to perform the comparison, and numpy.where to assign 0/1 depending on the equality.

import pandas as pd
import numpy as np

# dummy example
df = pd.DataFrame({'MAN': list('AABBABBBAA')})

df['MAN_ID'] = np.where(df['MAN'].eq(df['MAN'].shift(-1)), 0, 1)

output:

  MAN  MAN_ID
0   A       0
1   A       1
2   B       0
3   B       1
4   A       1
5   B       0
6   B       0
7   B       1
8   A       0
9   A       1
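
The question mentions comparing with the previous row, while the loop and the snippet above look at the next row (shift(-1)). If the previous row is what is actually wanted, a minimal sketch on the same dummy data could use shift() with its default of one row down. The column name MAN_ID_PREV is only an illustrative choice, not something from the question.

```python
import pandas as pd
import numpy as np

# same dummy example as above
df = pd.DataFrame({'MAN': list('AABBABBBAA')})

# shift() moves every value down one row, so row i is compared with row i-1.
# The first row is compared with NaN, which is never equal, so it gets 1.
df['MAN_ID_PREV'] = np.where(df['MAN'].eq(df['MAN'].shift()), 0, 1)

prev_flags = df['MAN_ID_PREV'].tolist()
print(df)
```

With this direction, a 1 marks the first row of each run of identical values, whereas shift(-1) marks the last row of each run.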

Alternatively, you can use:

(df['MAN'].ne(df['MAN'].shift(-1))).astype(int)

This outputs True if the values are different and False if they are identical (using the ne method); converting to int then turns True into 1 and False into 0.
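
Putting this together with the sorting step from the question, a rough sketch might look like the following. The small final_df built here is a hypothetical stand-in for the real data. Note that sort_values returns a new DataFrame, so its result has to be assigned back for the sort to take effect, and reset_index is an assumption about what is wanted after sorting.

```python
import pandas as pd

# hypothetical stand-in for the asker's final_df
final_df = pd.DataFrame({'MAN': list('BABAAB')})

# sort_values does not sort in place by default; assign the result back
final_df = final_df.sort_values(by='MAN').reset_index(drop=True)

# 1 where the next row differs (or does not exist), 0 where it is identical
final_df['MAN_ID'] = final_df['MAN'].ne(final_df['MAN'].shift(-1)).astype(int)

print(final_df)
```

Because shift aligns on position rather than looking up the label i + 1, there is no KeyError at the last row: it is simply compared with NaN and receives 1.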