I have a CSV file with data, but it contains rows I don't need, so the task is to remove those rows from the table. For example:
0 A
1 B
2 C
3 D
4 E * to delete
5 F *
6 G *
7 H *
8 I
9 J
10 k
11 L
12 M *
13 N *
14 O *
15 P *
So I want to remove the last 4 rows of every 8 rows in the table. The table has 3089 rows.
I tried slicing the table, but did not get a good result.
CodePudding user response:
Use numpy to craft a mask:
import numpy as np
mask = (np.arange(len(df))%8//4) == 0
out = df[mask]
Other option:
mask = np.arange(len(df))%8 < 4
out = df[mask]
output:
col
0 A
1 B
2 C
3 D
8 I
9 J
10 k
11 L
How it works
We first take the row number modulo 8 to get the position within each group of 8, then floor-divide by 4 and compare the result to 0 to keep only the first 4 rows per group:
   col  arange  %8  //4   mask
0    A       0   0    0   True
1    B       1   1    0   True
2    C       2   2    0   True
3    D       3   3    0   True
4    E       4   4    1  False
5    F       5   5    1  False
6    G       6   6    1  False
7    H       7   7    1  False
8    I       8   0    0   True
9    J       9   1    0   True
10   k      10   2    0   True
11   L      11   3    0   True
12   M      12   4    1  False
13   N      13   5    1  False
14   O      14   6    1  False
15   P      15   7    1  False
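To check the breakdown above on your own machine, here is a minimal self-contained sketch. The 16-row sample frame and the helper column names (mod8, div4) are only illustrative stand-ins, not part of the original data:

```python
import numpy as np
import pandas as pd

# Sample data mirroring the question: 16 rows labelled A..P (illustrative only)
df = pd.DataFrame({"col": list("ABCDEFGHIJkLMNOP")})

pos = np.arange(len(df))   # 0-based row position, independent of the index labels
in_group = pos % 8         # position inside each block of 8 (0..7)
half = in_group // 4       # 0 for the first four rows of a block, 1 for the last four
mask = half == 0           # True only for the first half of each block

# Side-by-side view of the intermediate steps
steps = df.assign(arange=pos, mod8=in_group, div4=half, mask=mask)
print(steps)

out = df[mask]
print(out)
```

Because the mask is built from np.arange(len(df)) rather than from the index, it should work the same way even if the DataFrame index is not 0..n-1.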
CodePudding user response:
Group the rows into blocks of eight, number the rows within each block in a temporary column id (0 to 7), and keep only the rows with id less than 4. Note that df.index//8 assumes the default 0..n-1 index.
df=df.assign(id=df.groupby(df.index//8).cumcount()).query('id<4').drop(columns='id')
item
0 A
1 B
2 C
3 D
8 I
9 J
10 k
11 L
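A position-based variant of the grouping idea, as a sketch: it keeps the first 4 rows of every block of 8 and builds the block numbers from row positions instead of index labels, so it does not depend on the index being 0..n-1. The 3089-row frame below is a made-up stand-in for the real CSV, and the names block and pos_in_block are only for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 3089-row CSV; the index deliberately does not start at 0
n_rows = 3089
df = pd.DataFrame({"item": np.arange(n_rows)}, index=np.arange(n_rows) + 100)

block = np.arange(len(df)) // 8               # block number by position, not by index label
pos_in_block = df.groupby(block).cumcount()   # 0..7 inside each block
out = df[pos_in_block < 4]                    # keep the first 4 rows of each block

print(len(out))  # 1545: 386 full blocks * 4 rows + 1 row from the final partial block
```

With 3089 rows the last block contains a single row, and that row is kept because it is the first of its block.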