How to search entire dataframe and return value next to match

Time:08-20

I have the below sample dataset:

Protocol Number:    xx-yzm2 
        
Section Major       Task                        Budget
1                   Study Setup                 25303.18
2                   Study Setup Per-Location    110037.8
3                   Site Identified by CRO      29966.25
4                   Pre-study Site Visit (PSSV) 130525.92

I would like to search the entire dataframe with contains, passing the keyword 'protocol', and return the value in the cell next to the match.

In practice the layout of the sheet could change, so I can't filter by a specific column. Is it possible to do this with pandas?

Input keyword: protocol. Expected output: xx-yzm2.

Answer:

You could try as follows:

import pandas as pd
import numpy as np

data = {0: ['Protocol Number:', np.nan, 'Section Major', '1', '2', '3', '4'],
         1: ['xx-yzm2', np.nan, 'Task', 'Study Setup', 'Study Setup Per-Location', 
             'Site Identified by CRO', 'Pre-study Site Visit (PSSV)'],
         2: [np.nan, np.nan, 'Budget', '25303.18', '110037.8', '29966.25', '130525.92']}

df = pd.DataFrame(data)
print(df)

                  0                            1          2
0  Protocol Number:                      xx-yzm2        NaN
1               NaN                          NaN        NaN
2     Section Major                         Task     Budget
3                 1                  Study Setup   25303.18
4                 2     Study Setup Per-Location   110037.8
5                 3       Site Identified by CRO   29966.25
6                 4  Pre-study Site Visit (PSSV)  130525.92
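If the data actually comes from a spreadsheet whose layout may change, a frame shaped like the one above can be obtained by reading it with header=None, so that no row is consumed as column names and every cell (including "Protocol Number:") stays searchable. A minimal sketch, assuming CSV text as a stand-in for the real sheet; pd.read_excel accepts header=None in the same way.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for the real sheet. The first row has a
# trailing empty field so that every row has three columns.
raw = """Protocol Number:,xx-yzm2,
,,
Section Major,Task,Budget
1,Study Setup,25303.18
2,Study Setup Per-Location,110037.8
3,Site Identified by CRO,29966.25
4,Pre-study Site Visit (PSSV),130525.92
"""

# header=None: no row is used as column names, so the columns are labelled
# 0, 1, 2 and the "Protocol Number:" row is kept as ordinary data.
df = pd.read_csv(io.StringIO(raw), header=None)
print(df)
```

Because the columns are just positions, the search below does not depend on any particular header text.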

keyword = 'protocol'

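# case=False: case-insensitive match; regex=False: treat keyword as plain text
# astype(str) lets the search run on every column, whatever its dtype
mask = df.apply(lambda x: x.astype(str).str.contains(keyword, case=False, regex=False))

# np.where on the boolean mask gives the row and column positions of the matches
# row: array([0], dtype=int64), col: array([0], dtype=int64)
row, col = np.where(mask)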

# take the cell one column to the right of the first match
result = df.iat[row[0], col[0] + 1]
print(result)

# xx-yzm2
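One detail worth knowing: Series.str.contains treats the keyword as a regular expression by default (regex=True). That is harmless for 'protocol', but a keyword containing characters such as parentheses or dots can silently fail to match. A minimal check, using a cell value from the sample data as the keyword:

```python
import pandas as pd

cell = pd.Series(['Pre-study Site Visit (PSSV)'])
keyword = 'Pre-study Site Visit (PSSV)'

# Default regex=True: "(PSSV)" is parsed as a regex group, so the pattern
# looks for "Visit PSSV" and the literal parentheses in the cell never match.
# (pandas may also emit a UserWarning about match groups here.)
as_regex = bool(cell.str.contains(keyword, case=False).iloc[0])

# regex=False: the keyword is matched as a plain substring.
as_literal = bool(cell.str.contains(keyword, case=False, regex=False).iloc[0])

print(as_regex, as_literal)  # False True
```

So unless you actually want pattern matching, passing regex=False to the contains call inside the lambda is the safer choice for arbitrary keywords.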

If there are multiple matches, the code above returns only the first one; if there is no match at all, row is empty and row[0] raises an IndexError. To get all matches, use a loop, and add a check for edge cases such as a match in the last column, which has no cell to its right.

for r, c in zip(row, col):
    if c + 1 < len(df.columns):
        print(df.iat[r, c + 1])
    else:
        # the keyword was found in the last column,
        # i.e. there is no `next` column to read from
        pass
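To reuse this, the loop can be wrapped in a small function that collects every neighboring value into a list instead of printing it; an empty list then signals "no match" without an IndexError. This is a sketch: the function name values_next_to and the sample frame are hypothetical, while the matching logic is the same np.where approach.

```python
import numpy as np
import pandas as pd


def values_next_to(df, keyword):
    """Return the value to the right of every cell that contains `keyword`.

    Matching is case-insensitive and literal (regex=False). Matches in the
    last column are skipped because they have no right-hand neighbor.
    """
    mask = df.apply(lambda s: s.astype(str).str.contains(keyword, case=False, regex=False))
    rows, cols = np.where(mask)  # positions of matches, in row-major order
    last_col = df.shape[1] - 1
    return [df.iat[r, c + 1] for r, c in zip(rows, cols) if c < last_col]


# Hypothetical sheet with three matches; the third one is in the last column.
df = pd.DataFrame({0: ['Protocol Number:', 'Section Major', 'Amended protocol:'],
                   1: ['xx-yzm2', 'Task', 'xx-yzm3'],
                   2: [np.nan, 'Budget', 'see protocol']})

print(values_next_to(df, 'protocol'))  # ['xx-yzm2', 'xx-yzm3']
print(values_next_to(df, 'missing'))   # []
```

Returning a list keeps the caller in control: take result[0] when only the first match matters, or check for an empty list before using it.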