I want to delete particular rows and keep only the row where each machine ID first appears in a consecutive run. So I want rows 0, 5, 10 and 15:
0,M_0003
1,M_0003
2,M_0003
3,M_0003
4,M_0003
5,M_0005
6,M_0005
7,M_0005
8,M_0005
9,M_0005
10,M_0007
11,M_0007
12,M_0007
13,M_0007
14,M_0007
15,M_0003
16,M_0003
17,M_0003
18,M_0003
19,M_0003
This is what the result should look like:
0,M_0003
1,M_0005
2,M_0007
3,M_0003
Is there a function in Python that will help? The only thing I found is this, but it does not work:
y_data = y_data.groupby(np.arange(len(y_data)) // 5)
CodePudding user response:
Use GroupBy.first:
y_data = y_data.groupby(np.arange(len(y_data)) // 5).first()
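For reference, here is a minimal self-contained sketch of this approach. It assumes y_data is a DataFrame with one column; the column name machine_id and the way the data is rebuilt are made up for illustration, since the question does not show how y_data is created.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the question's data: 4 blocks of 5 rows each.
ids = ["M_0003"] * 5 + ["M_0005"] * 5 + ["M_0007"] * 5 + ["M_0003"] * 5
y_data = pd.DataFrame({"machine_id": ids})

# Integer-divide the row position by 5: rows 0-4 get key 0, rows 5-9 get key 1, ...
block_keys = np.arange(len(y_data)) // 5

# first() returns the first non-null value of each column within each block.
# The result is indexed by the block key (0, 1, 2, 3).
result = y_data.groupby(block_keys).first()
print(result)
```

The groupby call on its own only builds a GroupBy object, which is probably why the attempt in the question appeared not to work; an aggregation such as first() is what turns it back into a DataFrame. Keep in mind that first() skips missing values, so with NaNs in the data it may not be the same as taking the first row of each block.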
CodePudding user response:
You can use boolean indexing:
y_data = y_data[np.arange(len(y_data))%5==0]
Intermediate results:
np.arange(len(y_data))%5
# array([0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4])
np.arange(len(y_data))%5==0
# array([ True, False, False, False, False, True, False, False, False,
# False, True, False, False, False, False, True, False, False,
# False, False])
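Both answers rely on every machine ID occupying exactly 5 consecutive rows. If the run lengths can vary, one possible alternative (not taken from the answers above) is to keep each row whose ID differs from the row before it, using Series.shift. The sketch below assumes a single column named machine_id, which is a made-up name, and compares the result with the modulo mask on the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the question's data: 4 blocks of 5 rows each.
ids = ["M_0003"] * 5 + ["M_0005"] * 5 + ["M_0007"] * 5 + ["M_0003"] * 5
y_data = pd.DataFrame({"machine_id": ids})

# Fixed-size approach from the answer: keep every 5th row.
every_fifth = y_data[np.arange(len(y_data)) % 5 == 0]

# Run-based alternative: True wherever the ID differs from the previous row.
# Row 0 is compared against NaN (nothing precedes it), so it is always kept.
starts_new_run = y_data["machine_id"] != y_data["machine_id"].shift()
run_starts = y_data[starts_new_run]

# Renumber the kept rows 0..n-1 to match the desired output in the question.
result = run_starts.reset_index(drop=True)
print(result)
```

On this data both selections keep rows 0, 5, 10 and 15. They would differ only if some machine ID had more or fewer than 5 consecutive rows.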