Say I have this dataframe:
import pandas as pd
df = {'ID': [1, 1, 1, 1, 1, 1, 1, 2, 2],
      'x': [76.551, 79.529, 78.336, 77, 76.02, 79.23, 77.733, 79.249, 76.077],
      'y': [151.933, 152.945, 153.970, 119.369, 120.615, 118.935, 119.115, 152.004, 153.027],
      'position': ['start', 'end', 'start', 'NA', 'NA', 'NA', 'end', 'start', 'end']}
df = pd.DataFrame(df)
df
   ID       x        y position
0   1  76.551  151.933    start
1   1  79.529  152.945      end
2   1  78.336  153.970    start
3   1  77.000  119.369       NA
4   1  76.020  120.615       NA
5   1  79.230  118.935       NA
6   1  77.733  119.115      end
7   2  79.249  152.004    start
8   2  76.077  153.027      end
I want to delete all the rows that are associated with an end point between certain values. I can specify the end points that I want to remove with:
df[(df['position'] == 'end') & (df['x'] > 75) & (df['x'] < 78)]
but how do I remove all the rows associated with that condition?
Output would look like:
   ID       x        y position
0   1  76.551  151.933    start
1   1  79.529  152.945      end
EDIT: the context is that these are trajectories from different animals (each with a particular ID). If an animal's end coordinate lies between particular x-axis values, I want to remove that animal's whole trajectory from the model.
CodePudding user response:
Try this, using DataFrame.drop:
rows_to_remove = df[(df['position'] == 'end') & (df['x'] > 75) & (df['x'] < 78)].index.values
df = df.drop(rows_to_remove)
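Note that this drops only the rows that match the condition themselves. If the rest of each affected trajectory should go as well, a minimal sketch could label the trajectories first and then filter with isin. It assumes every trajectory opens with a 'start' row, and the names traj, end_in_range and bad are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 1, 1, 1, 1, 1, 1, 2, 2],
    'x': [76.551, 79.529, 78.336, 77, 76.02, 79.23, 77.733, 79.249, 76.077],
    'y': [151.933, 152.945, 153.970, 119.369, 120.615, 118.935, 119.115, 152.004, 153.027],
    'position': ['start', 'end', 'start', 'NA', 'NA', 'NA', 'end', 'start', 'end'],
})

# Assumption: each 'start' row opens a new trajectory, so a running count
# of 'start' rows gives every row a trajectory label.
traj = df['position'].eq('start').cumsum()

# End points whose x value lies strictly between 75 and 78.
end_in_range = (df['position'] == 'end') & (df['x'] > 75) & (df['x'] < 78)

# Labels of the trajectories that contain such an end point.
bad = traj[end_in_range].unique()

# Keep only the rows that belong to the remaining trajectories.
result = df[~traj.isin(bad)]
print(result)
```

For the sample data this should leave only the first start/end pair, which matches the output asked for in the question.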
CodePudding user response:
You can use a boolean mask:
m = (df['position'] == 'end') & (df['x'] > 75) & (df['x'] < 78)
out = df[~m.groupby(df['position'].eq('start').cumsum()).transform('max')]
print(out)
# Output
   ID       x        y position
0   1  76.551  151.933    start
1   1  79.529  152.945      end
I already used df['position'].eq('start').cumsum() in your previous question
to create virtual groups that identify the different trajectories.
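To see what the virtual groups and the transform step are doing, here is a small sketch that lays the intermediate values side by side for the sample data. The column names traj, hit and drop are only illustrative and not part of the original code:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 1, 1, 1, 1, 1, 1, 2, 2],
    'x': [76.551, 79.529, 78.336, 77, 76.02, 79.23, 77.733, 79.249, 76.077],
    'y': [151.933, 152.945, 153.970, 119.369, 120.615, 118.935, 119.115, 152.004, 153.027],
    'position': ['start', 'end', 'start', 'NA', 'NA', 'NA', 'end', 'start', 'end'],
})

# Same mask as above: end points with x strictly between 75 and 78.
m = (df['position'] == 'end') & (df['x'] > 75) & (df['x'] < 78)

# Running count of 'start' rows: the label increases by one at every 'start'.
groups = df['position'].eq('start').cumsum()

# transform('max') spreads a True to every row of a group that contains
# at least one True in m, so whole trajectories get flagged.
drop = m.groupby(groups).transform('max')

# Show the intermediate values next to the original rows.
steps = df[['ID', 'position']].assign(traj=groups, hit=m, drop=drop)
print(steps)
```

Only rows where drop is False survive the df[~...] selection, which is why a single matching end row removes its whole trajectory.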