I'm a Data Science beginner and have the following task.
I have a huge list of data and need to pick the rows whose value is Scope_list,
but also the 3 rows that follow each of them. The value Scope_list appears
1 to x times in the list, see below. Selecting the Scope_list rows themselves is no problem for me, but I can't get the next 3 rows.
df_new = df.loc[df['parts'] == 'Scope_list']
df_new
This gets all indexes and rows where the value in column parts
is Scope_list:
parts
0 Scope_list
10 Scope_list
18 Scope_list
But I need not only the "Scope_list" row but also the next 3 rows, like this:
parts
0 Scope_list
1 Light_front
2 Box1
3 Cable1
4 Scope_list
5 Light_front
6 Cable1
7 Connector
8 Scope_list
9 Light_left
10 Box2
11 Cable3
This is part of my df:
import pandas as pd
df = pd.DataFrame(['Scope_list', 'Light_front', 'Box1', 'Cable1', 'Connector', 'Switch', 'Info_list', 'can be used for 1', '456 not used', '',
                   'Scope_list', 'Light_front', 'Cable1', 'Connector', 'Code_list', '345,456,567', '567', '',
                   'Scope_list', 'Light_left', 'Box2', 'Cable3', 'Switch3'], columns=['parts'])
Can anybody give me a hint? Any help would be great. I use Jupyter Notebook and Python 3.
Answer:
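First get the indexes where the value is 'Scope_list', then select each of those rows together with the 3 rows that follow it:
scope_idx = df.loc[df.parts == 'Scope_list'].index
# for every match, take its label and the next 3 labels (4 rows per match);
# assumes the default 0..n-1 index, and a match in the last 3 rows raises KeyError
out = df.loc[[label for idx in scope_idx for label in range(idx, idx + 4)]].copy()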
out = out.reset_index(drop=True)
print(out)
Output:
parts
0 Scope_list
1 Light_front
2 Box1
3 Cable1
4 Scope_list
5 Light_front
6 Cable1
7 Connector
8 Scope_list
9 Light_left
10 Box2
11 Cable3
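A different technique, not taken from either answer here, is to build one boolean mask that marks every Scope_list row plus the 3 rows after it, by shifting the match flags down by 1, 2 and 3 positions. This is a sketch assuming the sample df from the question. Because shift works on positions and the result is a plain filter, it cannot raise a KeyError when a match sits in the last 3 rows, and overlapping blocks are not duplicated.

```python
import pandas as pd

df = pd.DataFrame(['Scope_list', 'Light_front', 'Box1', 'Cable1', 'Connector', 'Switch', 'Info_list', 'can be used for 1', '456 not used', '',
                   'Scope_list', 'Light_front', 'Cable1', 'Connector', 'Code_list', '345,456,567', '567', '',
                   'Scope_list', 'Light_left', 'Box2', 'Cable3', 'Switch3'], columns=['parts'])

# True on every row whose value is exactly 'Scope_list'
is_start = df['parts'].eq('Scope_list')

# Keep a row if it is a start row or lies 1, 2 or 3 rows after one.
# shift(k) moves the flags down by k rows; fill_value=False fills the top k rows.
keep = is_start.copy()
for k in range(1, 4):
    keep |= is_start.shift(k, fill_value=False)

out = df[keep].reset_index(drop=True)
print(out)
```

The number of trailing rows is just the upper bound of the range, so range(1, n + 1) keeps n rows after each match.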
Answer:
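indexes = df[df['parts'].str.contains('Scope_list')].index
# iloc slices by position; with the default 0..n-1 index the labels equal the positions.
# idx:idx + 4 is the Scope_list row plus the 3 rows after it.
pd.concat([df.iloc[idx:idx + 4] for idx in indexes])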
I hope this works. You can also wrap this code in a function: just pass the column name, the keyword you want to search for, and how many rows after the keyword to show.
def func(df: pd.DataFrame, column_name: str, keyword: str,
         show_items_after_keyword: int) -> pd.DataFrame:
    indexes = df[df[column_name].str.contains(keyword)].index
    # + 1 keeps the keyword row itself in addition to the rows after it
    result = pd.concat([df.iloc[idx:idx + show_items_after_keyword + 1]
                        for idx in indexes])
    return result
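Another option, also not from the answers above, numbers the blocks with a cumulative sum over the match flags and keeps the first rows of each block with groupby(...).head(). The helper name first_rows_of_each_block is hypothetical. In this sketch n_rows counts the keyword row too, each block stops at the next match, and nothing relies on the index being 0..n-1.

```python
import pandas as pd


def first_rows_of_each_block(df: pd.DataFrame, column: str, keyword: str,
                             n_rows: int) -> pd.DataFrame:
    """Hypothetical helper: return the first n_rows of every block that starts
    at a row equal to keyword (the keyword row itself is included)."""
    is_start = df[column].eq(keyword)
    # cumsum numbers the blocks: 0 before the first match, then 1, 2, 3, ...
    block_id = is_start.cumsum()
    in_block = block_id > 0  # drop rows that come before the first match
    return df[in_block].groupby(block_id[in_block]).head(n_rows)


df = pd.DataFrame(['Scope_list', 'Light_front', 'Box1', 'Cable1', 'Connector', 'Switch', 'Info_list', 'can be used for 1', '456 not used', '',
                   'Scope_list', 'Light_front', 'Cable1', 'Connector', 'Code_list', '345,456,567', '567', '',
                   'Scope_list', 'Light_left', 'Box2', 'Cable3', 'Switch3'], columns=['parts'])

# 4 rows per block: the Scope_list row and the 3 rows after it
out = first_rows_of_each_block(df, 'parts', 'Scope_list', 4).reset_index(drop=True)
print(out)
```

Because groupby(...).head() returns rows in their original order, reset_index(drop=True) gives the same 0..11 numbering as the desired output in the question.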