Filter Dataframe rows depending on if there has been a change to the previous row


I have the following problem to solve:

I have a pandas dataframe which looks more or less like the following:

Timestamp   Value 1   Value 2
05:05:01    1         4
05:05:02    1         4
05:05:03    1         3
05:05:04    1         3
05:05:05    1         4

What I need to achieve is to keep only the rows in which any of the "Value X" columns changes relative to the previous row. If there are multiple consecutive rows where the "Value X" values are the same, only the first should be kept. Dropping duplicates with the "Value X" columns as the subset does not work, because if the same combination reoccurs later it is deleted as well, even though it may differ from its immediately preceding row.

So in the example, only rows 1, 3 and 5 are supposed to be kept.
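To illustrate why plain deduplication falls short, here is a minimal reproduction of the problem (column names taken from the table above):

```python
import pandas as pd

df = pd.DataFrame({
    "Timestamp": ["05:05:01", "05:05:02", "05:05:03", "05:05:04", "05:05:05"],
    "Value 1": [1, 1, 1, 1, 1],
    "Value 2": [4, 4, 3, 3, 4],
})

# drop_duplicates removes the last row too, because the combination (1, 4)
# already occurred in row 1 -- even though the last row differs from the
# row directly above it, which is the row we actually want to keep.
deduped = df.drop_duplicates(subset=["Value 1", "Value 2"])
print(deduped)
```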

Thanks in advance!

CodePudding user response:

You could try creating some new columns that are shifted and then sub-setting:

import pandas as pd

a = pd.DataFrame({'value_1': [1, 1, 1, 1, 1], 'value_2': [4, 4, 3, 3, 4]})
# Shift each column down by one so every row can see its predecessor's values
a = a.assign(value_1_shift=a['value_1'].shift(), value_2_shift=a['value_2'].shift())
# Keep rows where either value differs from the previous row
b = a[(a.value_1_shift != a.value_1) | (a.value_2_shift != a.value_2)]
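The same idea also works without helper columns, and scales to any number of value columns. A compact sketch (using the same column names as the snippet above):

```python
import pandas as pd

a = pd.DataFrame({"value_1": [1, 1, 1, 1, 1], "value_2": [4, 4, 3, 3, 4]})

# Compare every row to the one above it and keep rows where any column changed.
# The first row has no predecessor, so shift() yields NaN there; NaN compares
# unequal to any value, so the first row is always kept.
cols = ["value_1", "value_2"]
changed = a[cols].ne(a[cols].shift()).any(axis=1)
b = a[changed]
print(b)
```

Adding more "Value X" columns then only requires extending the cols list.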

CodePudding user response:

Add the keep parameter:

df.drop_duplicates(subset=['Timestamp', 'Value 2'], keep='first', ignore_index=True)