I have a big DataFrame df like the one below (only the first lines are shown; the real one has more than 60000k rows):
Id Name Age Friends
0 Will 33 385
1 Jean 26 2
2 Hugh 55 221
3 Deanna 40 465
4 Quark 68 21
5 Weyoun 59 318
6 Gowron 37 220
7 Will 54 307
8 Jadzia 38 380
9 Hugh 27 181
10 Odo 53 191
11 Ben 57 372
........
I would like to store in another DataFrame the first 12 rows out of every 100 rows.
I know that with .loc and .iloc you can select 1 row out of every n rows (100 in the example below):
df1 = df.loc[::100]
I am trying not to iterate over the DataFrame with a for loop: since df is so large, the process slows down a lot. Is there any way to achieve this more complex row selection with .loc?
CodePudding user response:
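You can actually just trim the hundreds and everything above them off the index values by taking the index modulo 100, so e.g. 200-299 becomes 0-99, 123400-123499 becomes 0-99, etc., and then filter for values less than 12 (this assumes the index is the default 0, 1, 2, ... integer index):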
filtered = df[df.index % 100 < 12]
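To check what that filter selects, here is a minimal sketch on made-up data. The 250-row frame and its column values are assumptions for illustration only, not the asker's data. It also shows a position-based variant, built with np.arange(len(df)), that should be safer when the index is not the default 0, 1, 2, ... range:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data: 250 rows, default 0..249 index.
df = pd.DataFrame({
    "Id": np.arange(250),
    "Age": np.arange(250) % 60 + 18,
})

# Label-based filter from the answer: relies on the index being 0, 1, 2, ...
filtered = df[df.index % 100 < 12]

# Position-based variant: the mask depends only on row position, not on labels.
mask = np.arange(len(df)) % 100 < 12
filtered_pos = df.loc[mask]

# With a shifted index the label-based filter would pick different rows,
# while the positional mask still takes the first 12 rows of each block of 100.
df_shifted = df.copy()
df_shifted.index = df.index + 5
filtered_shifted = df_shifted.loc[mask]

print(len(filtered))
print(filtered.index[:14].tolist())
```

Both versions are vectorized, so no Python-level for loop runs over the rows.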