Given a dataframe df
A B C
0 0 0 0
1 0 0 0
2 0 0 0
3 0 0 0
4 0 0 0
... ... ... ...
410 0 0 0
411 0 0 0
412 0 0 0
413 0 0 0
414 0 0 0
and a list contains pairs of start and stop index index_list. I want to extract subsets (segments) from df using these indexes.
[[0, 41],
[42, 57],
[62, 70],
[71, 78],
[84, 90],
[91, 98],
[105, 113],
[114, 352],
[353, 407]]
Apart from a for loop like this:
for start_index, stop_index in index_list:
segments = df.iloc[start_index:stop_index].copy()
are there any better/faster way to extract these segments?
CodePudding user response:
You could transform your list of start/stop indexes into a mapping of the dataframe row id to segment id. You could then add a column that includes the segment number instead of extracting each into a separate dataframe.
import pandas
df = pandas.DataFrame(data={'A':[0]*415, 'B':[0]*415, 'C':[0]*415})
segment_ids = [[0, 41],
[42, 57],
[62, 70],
[71, 78],
[84, 90],
[91, 98],
[105, 113],
[114, 352],
[353, 407]]
segments = pandas.Series(
{
index: number
for number, segment in enumerate([range(*segment) for segment in segment_ids])
for index in segment
},
name='segment'
)
s = pandas.Series(ids, name='segment')
result = df.loc[df.index.isin(ids.keys())].join(s)
Output:
A B C segment
0 0 0 0 0
1 0 0 0 0
2 0 0 0 0
3 0 0 0 0
4 0 0 0 0
.. .. .. .. ...
402 0 0 0 8
403 0 0 0 8
404 0 0 0 8
405 0 0 0 8
406 0 0 0 8
[384 rows x 4 columns]