Iterating over rows in a dataframe in Pandas: is there a difference between using df.index and df.it

Time:12-05

When iterating through rows in a dataframe in Pandas, is there a difference in performance between using:

for index in df.index:
    ....

And:

for index, row in df.iterrows():
    ....

? Which one should be preferred?

CodePudding user response:

A for loop over df.index only yields the index labels, so getting the data requires an additional .loc lookup:

for index in df.index:
    value = df.loc[index, 'col']

With df.iterrows(), each row is already available as a Series:

for index, row in df.iterrows():
    value = row['col']

Since you are already using pandas, neither approach is recommended unless you need a function that cannot be vectorized.

That said, IMO, I prefer df.index.
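
To illustrate the point about vectorization, here is a minimal sketch comparing a df.index loop with the equivalent column-wise expression. The DataFrame and the column names price and qty are made up for the example, not taken from the question.

```python
import pandas as pd

# Hypothetical data, only for illustration
df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [1, 2, 3]})

# Loop version: one .loc lookup per cell, per row
totals_loop = []
for index in df.index:
    totals_loop.append(df.loc[index, "price"] * df.loc[index, "qty"])

# Vectorized version: one operation over whole columns
totals_vec = df["price"] * df["qty"]
```

Both produce the same values; the vectorized form avoids the Python-level loop entirely, which is why it is usually the first thing to try.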

CodePudding user response:

Pandas is significantly faster for column-wise operations, so consider transposing your dataset and carrying out the operation on columns instead. If you absolutely need to iterate through rows and want to keep it simple, you can use:

for row in df.itertuples():
    print(row.column_1)

df.itertuples is significantly faster than df.iterrows() and iterating over the indices. However, there are faster ways to perform row-wise operations. Check out this answer for an overview.
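
If you want to check the relative speed on your own data, a rough timing sketch along these lines may help. The DataFrame, its size, and the helper function names are assumptions made for the example; actual numbers depend on your pandas version and data.

```python
import timeit

import pandas as pd

# Hypothetical data, only for illustration
df = pd.DataFrame({"column_1": range(1000), "column_2": [0.5] * 1000})

def with_index():
    # Iterate over index labels and look each value up with .loc
    return sum(df.loc[i, "column_1"] for i in df.index)

def with_iterrows():
    # Each row is built as a Series
    return sum(row["column_1"] for _, row in df.iterrows())

def with_itertuples():
    # Each row is a namedtuple; columns are accessed as attributes
    return sum(row.column_1 for row in df.itertuples())

# Time each approach; no expected numbers are given on purpose
for fn in (with_index, with_iterrows, with_itertuples):
    seconds = timeit.timeit(fn, number=5)
    print(f"{fn.__name__}: {seconds:.4f} s")
```

All three functions compute the same sum, so any difference in the printed times comes from the iteration method alone.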
