Home > database >  How to efficiently extract segments from pandas dataframe given list of start stop index
How to efficiently extract segments from pandas dataframe given list of start stop index

Time:07-05

Given a dataframe df

    A   B   C
0   0   0   0
1   0   0   0
2   0   0   0
3   0   0   0
4   0   0   0
... ... ... ...
410 0   0   0
411 0   0   0
412 0   0   0
413 0   0   0
414 0   0   0

and a list contains pairs of start and stop index index_list. I want to extract subsets (segments) from df using these indexes.

[[0, 41],
 [42, 57],
 [62, 70],
 [71, 78],
 [84, 90],
 [91, 98],
 [105, 113],
 [114, 352],
 [353, 407]]

Apart from a for loop like this:

for start_index, stop_index in index_list:
    segments = df.iloc[start_index:stop_index].copy()

are there any better/faster way to extract these segments?

CodePudding user response:

You could transform your list of start/stop indexes into a mapping of the dataframe row id to segment id. You could then add a column that includes the segment number instead of extracting each into a separate dataframe.

import pandas

df = pandas.DataFrame(data={'A':[0]*415, 'B':[0]*415, 'C':[0]*415})
segment_ids = [[0, 41],
 [42, 57],
 [62, 70],
 [71, 78],
 [84, 90],
 [91, 98],
 [105, 113],
 [114, 352],
 [353, 407]]
segments = pandas.Series(
    {
        index: number
        for number, segment in enumerate([range(*segment) for segment in segment_ids])
        for index in segment
    },
    name='segment'
)
s = pandas.Series(ids, name='segment')
result = df.loc[df.index.isin(ids.keys())].join(s)

Output:

     A  B  C   segment
0    0  0  0         0
1    0  0  0         0
2    0  0  0         0
3    0  0  0         0
4    0  0  0         0
..  .. .. ..       ...
402  0  0  0         8
403  0  0  0         8
404  0  0  0         8
405  0  0  0         8
406  0  0  0         8

[384 rows x 4 columns]
  • Related