I have the below sample dataset:
Protocol Number: xx-yzm2

Section Major  Task                          Budget
1              Study Setup                   25303.18
2              Study Setup Per-Location      110037.8
3              Site Identified by CRO        29966.25
4              Pre-study Site Visit (PSSV)   130525.92
I would like to search the entire DataFrame using contains, pass the keyword 'protocol', and return the value in the cell next to the match.
The layout of the sheet could change, so I can't filter by a specific column. Is this possible with pandas?
Input keyword: protocol
Expected output: xx-yzm2
Answer:
You could try as follows:
import pandas as pd
import numpy as np

# Rebuild the sheet as it looks when read without a header row
# (e.g. pd.read_excel(..., header=None)), so every cell is plain data
data = {0: ['Protocol Number:', np.nan, 'Section Major', '1', '2', '3', '4'],
        1: ['xx-yzm2', np.nan, 'Task', 'Study Setup', 'Study Setup Per-Location',
            'Site Identified by CRO', 'Pre-study Site Visit (PSSV)'],
        2: [np.nan, np.nan, 'Budget', '25303.18', '110037.8', '29966.25', '130525.92']}
df = pd.DataFrame(data)
print(df)
#                   0                            1          2
# 0  Protocol Number:                      xx-yzm2        NaN
# 1               NaN                          NaN        NaN
# 2     Section Major                         Task     Budget
# 3                 1                  Study Setup   25303.18
# 4                 2     Study Setup Per-Location   110037.8
# 5                 3       Site Identified by CRO   29966.25
# 6                 4  Pre-study Site Visit (PSSV)  130525.92
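In practice the frame would come from the sheet itself rather than a hand-written dict. The sketch below assumes the sheet is available as CSV (simulated here with io.StringIO; the csv_text content is just the sample above) and reads it with header=None, so no row is consumed as column names and the "Protocol Number:" line stays searchable data. For an Excel file, pd.read_excel also accepts header=None. The search step is the same one used in the answer, with regex=False added so the keyword is matched literally.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV export of the sheet; in practice this would be a file path
csv_text = """Protocol Number:,xx-yzm2,
,,
Section Major,Task,Budget
1,Study Setup,25303.18
2,Study Setup Per-Location,110037.8
3,Site Identified by CRO,29966.25
4,Pre-study Site Visit (PSSV),130525.92
"""

# header=None: treat every row as data, so the columns are labelled 0, 1, 2
df = pd.read_csv(io.StringIO(csv_text), header=None)

keyword = 'protocol'
row, col = np.where(
    df.apply(lambda x: x.astype(str).str.contains(keyword, case=False, regex=False))
)
# Value in the column to the right of the first match
result = df.iat[row[0], col[0] + 1]
print(result)  # xx-yzm2
```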
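keyword = 'protocol'
# Convert every cell to str and test it: case=False makes the match
# case-insensitive, regex=False treats the keyword as a literal string
mask = df.apply(lambda x: x.astype(str).str.contains(keyword, case=False, regex=False))
# np.where returns the row and column positions of every True cell,
# here row = array([0]) and col = array([0])
row, col = np.where(mask)
# Take the value in the column to the right of the first match
result = df.iat[row[0], col[0] + 1]
print(result)
# xx-yzm2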
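To see what np.where is working on, it can help to print the intermediate boolean DataFrame: True marks every cell whose string form contains the keyword. A small sketch using only the first three rows of the example frame as an illustration:

```python
import numpy as np
import pandas as pd

# A reduced copy of the example frame (first three rows only)
df = pd.DataFrame({0: ['Protocol Number:', np.nan, 'Section Major'],
                   1: ['xx-yzm2', np.nan, 'Task'],
                   2: [np.nan, np.nan, 'Budget']})

keyword = 'protocol'
# Column-wise apply returns a boolean DataFrame with the same shape as df;
# NaN becomes the string 'nan' after astype(str), so it does not match here
mask = df.apply(lambda x: x.astype(str).str.contains(keyword, case=False, regex=False))
print(mask)

# np.where on the mask gives parallel arrays of row and column positions
row, col = np.where(mask)
print(row, col)  # [0] [0]
```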
If there are multiple matches, the code above returns only the first one. To get all matches, loop over the positions, and add a check for edge cases such as a match in the last column, which has no neighbouring column to its right.
for i in range(len(row)):
    if col[i] + 1 < len(df.columns):
        print(df.iat[row[i], col[i] + 1])
    else:
        # error handling: your keyword was found in the last column,
        # i.e. there is no `next` column
        pass
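The loop can also be wrapped in a reusable helper that returns every neighbouring value as a list. This is a sketch; values_next_to is a hypothetical name, not a pandas function. It skips matches in the last column and returns an empty list when the keyword is not found, instead of raising IndexError on row[0].

```python
import numpy as np
import pandas as pd


def values_next_to(df: pd.DataFrame, keyword: str) -> list:
    """Hypothetical helper: value to the right of every cell containing keyword.

    Matches in the last column are skipped because they have no right-hand
    neighbour. An empty list means the keyword was not found.
    """
    mask = df.apply(lambda x: x.astype(str).str.contains(keyword, case=False, regex=False))
    rows, cols = np.where(mask)
    n_cols = df.shape[1]
    return [df.iat[r, c + 1] for r, c in zip(rows, cols) if c + 1 < n_cols]


data = {0: ['Protocol Number:', np.nan, 'Section Major', '1', '2', '3', '4'],
        1: ['xx-yzm2', np.nan, 'Task', 'Study Setup', 'Study Setup Per-Location',
            'Site Identified by CRO', 'Pre-study Site Visit (PSSV)'],
        2: [np.nan, np.nan, 'Budget', '25303.18', '110037.8', '29966.25', '130525.92']}
df = pd.DataFrame(data)

print(values_next_to(df, 'protocol'))     # ['xx-yzm2']
print(values_next_to(df, 'study setup'))  # ['25303.18', '110037.8']
print(values_next_to(df, 'budget'))       # [] (match is in the last column)
```

Because np.where scans row by row, the returned values are in top-to-bottom, left-to-right order of the matches.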