How to do a complex selection of rows in a Pandas dataframe in Python

Time:03-14

I have a big df like the one below (only the first rows are shown; the real one has more than 60000k rows):

Id  Name    Age Friends
0   Will    33  385
1   Jean    26  2
2   Hugh    55  221
3   Deanna  40  465
4   Quark   68  21
5   Weyoun  59  318
6   Gowron  37  220
7   Will    54  307
8   Jadzia  38  380
9   Hugh    27  181
10  Odo     53  191
11  Ben     57  372
........

I would like to store in another dataframe the first 12 rows out of every 100. I know that with .loc and .iloc you can take 1 row out of every n rows (n = 100 in the example below):

df1 = df.loc[::100]

I am trying to avoid iterating over the dataframe with a for loop: since the df is so large, that slows the process down a lot. Is there any way to achieve this complex row selection with .loc?

CodePudding user response:

You can take the index values modulo 100, which strips off everything from the hundreds upward, so e.g. 200-299 becomes 0-99, 123400-123499 becomes 0-99, etc., and then filter for values less than 12 (this relies on the index being the default 0, 1, 2, ... integers):

filtered = df[df.index % 100 < 12]
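As a rough check of the idea, here is a minimal sketch on made-up data. The 250-row frame and the names mask and filtered_by_position are assumptions for illustration, not part of the original question. It also shows a position-based variant, which may be safer if the index is not the default 0, 1, 2, ... sequence:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: 250 rows with the default integer index 0..249.
df = pd.DataFrame({
    "Age": np.arange(250) % 70,
    "Friends": np.arange(250) * 2,
})

# Label-based version from the answer: keeps rows whose index label,
# modulo 100, is below 12 (labels 0-11, 100-111, 200-211 here).
filtered = df[df.index % 100 < 12]

# Position-based variant (an assumption, not from the answer): builds the
# mask from row positions, so it does not depend on the index labels.
mask = np.arange(len(df)) % 100 < 12
filtered_by_position = df.iloc[mask]
```

Both versions are vectorized, so no Python-level for loop over the rows is needed. With a default index they should select the same rows.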