I have a df like this:
df = pd.DataFrame([[1000, 'Yellow'], [2100, 'Blue'], [3400, 'Purple'], [3416,'Grey'], [3420,'Purple'], [1100,'Yellow']], columns=['Time', 'Marker'])
df
Time Marker
0 1000 Yellow
1 2100 Blue
2 3400 Purple
3 3416 Grey
4 3420 Purple
5 1100 Yellow
Goal: 1. Iterate through column 'Marker'. 2. For every cell with a consecutively repeating value (but not necessarily adjacent; for example, 'purple' repeats but is not right after the first occurrence), test whether they are less than 1,000 ms apart (by subtracting the value of the 'time' column for the two instances). If the difference is less than 1,000 - delete the row of the second occurrence.
I started with this, but am stuck on how to proceed:
for val in df['Marker']:
df[~((df['Marker']==df['Marker'].shift())&(df['Marker'].isin(['180','181'])))]
In this example, we would end up with only the first stance of 'purple' because the second instance occurs too soon:
Time Marker
0 1000 Yellow
1 2100 Blue
2 3400 Purple
3 3416 Grey
I would love to hear suggestions.
CodePudding user response:
Assuming you want to drop differences strictly superior to 100, use groupby
on Marker and compute the diff
. If lower than (lt
) 100, keep the row:
df[~df.groupby('Marker')['Time'].diff().lt(100)]
output:
Time Marker
0 1000 Yellow
1 2100 Blue
2 3400 Purple
3 3416 Grey
5 1100 Yellow