Trying to get the i+1 index on a pandas DataFrame fails

I am trying to loop over a DataFrame and compare the value at index i with the value at index i+1, as follows:

import pandas as pd

d = {'col1': [1, 2, 0, 55, 12, 1, 3, 1, 56, 13], 'col2': [3, 4, 44, 34, 46, 2, 3, 43, 35, 47], 'col3': ['A', 'A', 'A', 'B', 'B', 'A', 'B', 'B', 'B', 'B']}
df = pd.DataFrame(data=d)
df

for index, row in df.iterrows():
    if df.at[index,"col3"] != df.at[index + 1,"col3"]:
        print('True')
    else:
        print("false")

I get this output, followed by an error:

false
false
True
false
True
True
false
false
false

KeyError                                  Traceback (most recent call last)
in ()
      3
      4 for index, row in df.iterrows():
----> 5     if df.at[index,"col3"] != df.at[index + 1,"col3"]:
      6         print('True')
      7     else:

in __getitem__(self, key)
   2140
   2141         key = self._convert_key(key)
-> 2142         return self.obj._get_value(*key, takeable=self._takeable)
   2143
   2144     def __setitem__(self, key, value):

   2538         try:
-> 2539             return engine.get_value(series._values, index)
   2540         except (TypeError, ValueError):
   2541 

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

KeyError: 10

CodePudding user response：

Your code will always fail on the last row, because index + 1 then refers to a row past the end of the DataFrame. The index labels run from 0 to 9, so looking up label 10 raises KeyError: 10.
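To see the failure in isolation, here is a minimal sketch that assumes the same default integer index as in the question. The names last_label and raised are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"col3": ['A', 'A', 'A', 'B', 'B', 'A', 'B', 'B', 'B', 'B']})

# The last label in the default index is 9, so label 10 is not present.
last_label = df.index[-1]
print(last_label)
print((last_label + 1) in df.index)

# Looking up a missing label with .at raises KeyError.
raised = False
try:
    df.at[last_label + 1, "col3"]
except KeyError:
    raised = True
print(raised)
```

Any loop that reads index + 1 therefore has to stop one row before the end.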

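Generally, when you need to walk over pairs of consecutive items, the zip function is a good fit. It stops as soon as the shorter of its inputs is exhausted, so pairing the column with a copy of itself offset by one never reads past the end: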

for this_row, next_row in zip(df["col3"], df["col3"][1:]):
    if this_row != next_row:
        print('True')
    else:
        print("false")

Note that this code does not raise an exception even if your DataFrame has only one row.
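As a quick check of that edge case, the sketch below uses a hypothetical one-row frame named single. It uses .iloc[1:] to make the positional slice explicit.

```python
import pandas as pd

# Hypothetical one-row frame, used only to illustrate the edge case.
single = pd.DataFrame({"col3": ["A"]})

# The offset slice is empty, and zip stops at the shorter input,
# so no pairs are produced and nothing is raised.
pairs = list(zip(single["col3"], single["col3"].iloc[1:]))
print(pairs)  # []
```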

If you prefer to iterate over the index, an alternative is:

for this_index, next_index in zip(df.index, df.index[1:]):
    if df.at[this_index,"col3"] != df.at[next_index,"col3"]:
        print('True')
    else:
        print("false")

CodePudding user response：

Below is how I would do it. It might not be the optimal solution, but it should help you solve the issue.

For why pandas raised the exception, see the comments in the code below.

I wrote it as a function so that you can reuse it for different DataFrames or tasks later on.

In addition, as a personal habit, when I iterate over a DataFrame and do not need the row values, I avoid the iterrows method. It can be computationally expensive, especially with large data, although this is only based on my own experience.

I would love to see other solutions from other people!

def identify_same_or_not(data=None, col_index=None):
    # 1: holder collects the result of each comparison
    holder = []
    # 2: Only the row position is needed here, so iterrows is not required.
    # To compare row x with row x + 1, loop over range(len(data) - 1).
    # Otherwise the final iteration (in the example provided) would compare row 9 with row 10;
    # row 10 does not exist in the DataFrame, so pandas raises an exception.
    for row_index in range(len(data) - 1):
        # Same comparison as in the question
        if data.iloc[row_index, col_index] != data.iloc[row_index + 1, col_index]:
            holder.append(True)
        else:
            holder.append(False)
    return holder
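As one more option, the comparison can be done without an explicit Python loop. This is a sketch, not something either answer above proposes: it swaps in Series.shift, and the names next_vals and changed are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"col3": ['A', 'A', 'A', 'B', 'B', 'A', 'B', 'B', 'B', 'B']})

# shift(-1) moves every value up one row, so row i lines up with row i + 1.
next_vals = df["col3"].shift(-1)

# The last row has no successor (shift leaves a missing value there),
# so drop it to keep only the real comparisons.
changed = (df["col3"] != next_vals).iloc[:-1]
print(changed.tolist())
```

The result has one entry fewer than the DataFrame has rows, matching the nine lines printed before the error in the question.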