Home > Mobile >  How to write Pandas DataFrame filter condition from parameters in a function
How to write Pandas DataFrame filter condition from parameters in a function

Time:12-04

Let say I have pandas DataFrame like this:

  NO_DOC | NO_SEQ | DESC
  A1     | 1      | A
  B1     | 2      | B
  C1     | 3      | C

I want to search row index from data above on a certain filter: I am able to find it using statement:

idx = data.index[data["NO_DOC"] == "A1" && data["NO_SEQ"] == 1]

but what I want to ask is how to make the "condition" is variable because I want to make it in a function.

Example:

def find(self, data, columns, value):
  return data.index[data[...] == value[...] && ....].tolist()

columns parameters is ["NO_DOC","NO_SEQ"] value is ["A1", 1]

the filter condition above is: data["NO_DOC"] == "A1" && data["NO_SEQ"] == 1

How do I make a function to find row index on pandas DataFrame with a certain filter condition?

CodePudding user response:

How to write Pandas dataframe filter condition from parameters in a function import pandas as pd

raw_data = {'NO_DOC':['A1','B1','C1'],'NO_SEQ':[1,2,3],'DESC':['A','B','C']}
data = pd.DataFrame(raw_data)

columns = ["NO_DOC","NO_SEQ"]
value = ["A1", 1]

create masks and send them to functions instead of cols and values

mask1 = data[columns[0]] == value[0] 
mask2 = data[columns[1]] == value[1]

def find(data, mask1,mask2):
    return data.index[mask1 & mask2].tolist()

Call your function as follows:

find(data,mask1,mask2)

CodePudding user response:

We can use zip with a list comprehension to build a DataFrame from each Series of boolean values:

mask_df = pd.DataFrame([
    data[col] == val for col, val in zip(columns, values)
])

           0      1      2
NO_DOC  True  False  False
NO_SEQ  True  False  False

Then we can reduce this DataFrame of booleans with all to check which columns (indexes) have only True values (logical AND):

mask = mask_df.all()

0     True
1    False
2    False
dtype: bool

Note: logical OR can be achieved with any instead of all

Now we can use the mask to filter the indexes:

data.index[mask].tolist()

[0]

Together find can look something like:

def find(data, columns, values):
    # Create a DataFrame of boolean conditions
    # Take logical AND using DataFrame.all
    mask = pd.DataFrame([
        data[col] == val for col, val in zip(columns, values)
    ]).all()
    # Filter and convert to list
    return data.index[mask].tolist()

Sample Program:

import pandas as pd


def find(data, columns, values):
    mask = pd.DataFrame([
        data[col] == val for col, val in zip(columns, values)
    ]).all()
    return data.index[mask].tolist()


print(
    find(
        data=pd.DataFrame({
            'NO_DOC': ['A1', 'B1', 'C1'],
            'NO_SEQ': [1, 2, 3],
            'DESC': ['A', 'B', 'C']
        }),
        columns=["NO_DOC", "NO_SEQ"],
        values=["A1", 1]
    )
)
  • Related