pandas: select a row by value and then the next 3 rows after it

I'm a Data Science beginner and have the following task.

I have a large DataFrame and need to pick every row whose value is Scope_list, plus the 3 rows that follow each of them. Scope_list occurs one or more times in the data (see below). Selecting the Scope_list rows themselves is no problem, but I can't work out how to also get the next 3 rows.

df_new = df.loc[df['parts'] == 'Scope_list']
df_new

This returns all rows (with their index labels) where the value in column parts is Scope_list:

         parts
0   Scope_list
10  Scope_list
18  Scope_list

But I need not only each Scope_list row but also the 3 rows after it, like this:

          parts
0    Scope_list
1   Light_front
2          Box1
3        Cable1
4    Scope_list
5   Light_front
6        Cable1
7     Connector
8    Scope_list
9    Light_left
10         Box2
11       Cable3

Here is part of my df:

import pandas as pd

df = pd.DataFrame(['Scope_list', 'Light_front', 'Box1', 'Cable1', 'Connector', 'Switch', 'Info_list', 'can be used for 1', '456 not used','','Scope_list', 'Light_front', 'Cable1', 'Connector', 'Code_list', '345,456,567', '567', '', 'Scope_list', 'Light_left', 'Box2', 'Cable3', 'Switch3'], columns = ['parts'])

Could anybody give me a hint? Any help would be great. I use Jupyter Notebook and Python 3.

CodePudding user response:

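First get the index labels where 'Scope_list' is the value, then take each of those rows plus the next 3:

# Index labels of every row whose value is 'Scope_list'
scope_idx = df.loc[df.parts == 'Scope_list'].index
# For each label, expand to idx, idx + 1, idx + 2, idx + 3 and flatten into one list
out = df.loc[[e for lst in [range(idx, idx + 4) for idx in
             scope_idx] for e in lst]].copy()
# Renumber the result from 0
out = out.reset_index(drop=True)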

print(out):

          parts
0    Scope_list
1   Light_front
2          Box1
3        Cable1
4    Scope_list
5   Light_front
6        Cable1
7     Connector
8    Scope_list
9    Light_left
10         Box2
11       Cable3
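As a variation on the approach above, here is a minimal sketch that builds the same row positions with NumPy broadcasting instead of nested comprehensions. It is not part of the original answer; the names `starts` and `positions` are just illustrative. It works on positions with `iloc` rather than labels with `loc`, and it drops any position past the end of the frame, so a 'Scope_list' in the last 3 rows does not raise an error.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(['Scope_list', 'Light_front', 'Box1', 'Cable1', 'Connector', 'Switch',
                   'Info_list', 'can be used for 1', '456 not used', '',
                   'Scope_list', 'Light_front', 'Cable1', 'Connector', 'Code_list',
                   '345,456,567', '567', '',
                   'Scope_list', 'Light_left', 'Box2', 'Cable3', 'Switch3'],
                  columns=['parts'])

# Integer positions (not index labels) of every 'Scope_list' row.
starts = np.flatnonzero((df['parts'] == 'Scope_list').to_numpy())

# Broadcast each start against the offsets 0..3, then flatten row by row.
positions = (starts[:, None] + np.arange(4)).ravel()

# Discard positions that would fall past the last row.
positions = positions[positions < len(df)]

out = df.iloc[positions].reset_index(drop=True)
print(out)
```

If two 'Scope_list' rows are fewer than 4 rows apart, the overlapping rows appear twice in `out`, which is also what the list-comprehension version does.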

CodePudding user response:

indexes = df[df['parts'].str.contains('Scope_list')].index
# + 4 = the matching row plus the next 3 rows (assumes the default 0..n-1 index)
pd.concat([df.iloc[indexes[i]:indexes[i] + 4] for i in range(len(indexes))])

I hope this works for you. You can also wrap this code in a function and pass in the column name, the keyword you want to search for, and the number of rows to show after each match.

def func(column_name: str, keyword: str, show_items_after_keyword: int):
    # Uses the module-level df; index labels equal positions only with the default index
    indexes = df[df[column_name].str.contains(keyword)].index
    # + 1 so the matching row itself is included along with the rows after it
    result = pd.concat([df.iloc[i:i + 1 + show_items_after_keyword] for i in indexes])
    return result
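To show how such a function might be called, here is a self-contained sketch of a variant that takes the DataFrame as an argument instead of relying on a global `df`. The name `rows_after_keyword` and its parameters are hypothetical, not from the answer above. It also assumes a plain substring match is wanted (`regex=False`) and that missing values should count as non-matches (`na=False`).

```python
import pandas as pd

def rows_after_keyword(frame, column_name, keyword, n_after):
    """Return each row whose column contains `keyword`, plus the next `n_after` rows."""
    # Boolean mask turned into integer positions, so any index labels work.
    mask = frame[column_name].str.contains(keyword, regex=False, na=False).to_numpy()
    starts = [pos for pos, hit in enumerate(mask) if hit]
    if not starts:
        # pd.concat raises on an empty list, so return an empty frame instead.
        return frame.iloc[0:0]
    # iloc slices are clipped at the end of the frame, so a late match is safe.
    return pd.concat([frame.iloc[s:s + n_after + 1] for s in starts])

df = pd.DataFrame(['Scope_list', 'Light_front', 'Box1', 'Cable1', 'Connector', 'Switch',
                   'Info_list', 'can be used for 1', '456 not used', '',
                   'Scope_list', 'Light_front', 'Cable1', 'Connector', 'Code_list',
                   '345,456,567', '567', '',
                   'Scope_list', 'Light_left', 'Box2', 'Cable3', 'Switch3'],
                  columns=['parts'])

result = rows_after_keyword(df, 'parts', 'Scope_list', 3)
print(result)
```

The result keeps the original index labels, so you can still see where each block came from; call `.reset_index(drop=True)` on it if you want it renumbered from 0.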