Home > Enterprise >  Subtract values associated with repeating variables in pandas
Subtract values associated with repeating variables in pandas

Time:03-01

I have a df like this:

df = pd.DataFrame([[1000, 'Yellow'], [2100, 'Blue'], [3400, 'Purple'], [3416,'Grey'], [3420,'Purple'], [1100,'Yellow']], columns=['Time', 'Marker'])
df

    Time    Marker
0   1000    Yellow
1   2100    Blue
2   3400    Purple
3   3416    Grey
4   3420    Purple
5   1100    Yellow

Goal: 1. Iterate through column 'Marker'. 2. For every cell with a consecutively repeating value (but not necessarily adjacent; for example, 'purple' repeats but is not right after the first occurrence), test whether they are less than 1,000 ms apart (by subtracting the value of the 'time' column for the two instances). If the difference is less than 1,000 - delete the row of the second occurrence.

I started with this, but am stuck on how to proceed:

for val in df['Marker']:
     df[~((df['Marker']==df['Marker'].shift())&(df['Marker'].isin(['180','181'])))]

In this example, we would end up with only the first stance of 'purple' because the second instance occurs too soon:

    Time    Marker
0   1000    Yellow
1   2100    Blue
2   3400    Purple
3   3416    Grey

I would love to hear suggestions.

CodePudding user response:

Assuming you want to drop differences strictly superior to 100, use groupby on Marker and compute the diff. If lower than (lt) 100, keep the row:

df[~df.groupby('Marker')['Time'].diff().lt(100)]

output:

   Time  Marker
0  1000  Yellow
1  2100    Blue
2  3400  Purple
3  3416    Grey
5  1100  Yellow
  • Related