Let say I have pandas DataFrame like this:
NO_DOC | NO_SEQ | DESC
A1 | 1 | A
B1 | 2 | B
C1 | 3 | C
I want to search row index from data above on a certain filter: I am able to find it using statement:
idx = data.index[data["NO_DOC"] == "A1" && data["NO_SEQ"] == 1]
but what I want to ask is how to make the "condition" is variable because I want to make it in a function.
Example:
def find(self, data, columns, value):
return data.index[data[...] == value[...] && ....].tolist()
columns parameters is ["NO_DOC","NO_SEQ"]
value is ["A1", 1]
the filter condition above is: data["NO_DOC"] == "A1" && data["NO_SEQ"] == 1
How do I make a function to find row index on pandas DataFrame with a certain filter condition?
CodePudding user response:
How to write Pandas dataframe filter condition from parameters in a function import pandas as pd
raw_data = {'NO_DOC':['A1','B1','C1'],'NO_SEQ':[1,2,3],'DESC':['A','B','C']}
data = pd.DataFrame(raw_data)
columns = ["NO_DOC","NO_SEQ"]
value = ["A1", 1]
create masks and send them to functions instead of cols and values
mask1 = data[columns[0]] == value[0]
mask2 = data[columns[1]] == value[1]
def find(data, mask1,mask2):
return data.index[mask1 & mask2].tolist()
Call your function as follows:
find(data,mask1,mask2)
CodePudding user response:
We can use zip
with a list comprehension to build a DataFrame from each Series of boolean values:
mask_df = pd.DataFrame([
data[col] == val for col, val in zip(columns, values)
])
0 1 2
NO_DOC True False False
NO_SEQ True False False
Then we can reduce this DataFrame of booleans with all
to check which columns (indexes) have only True values (logical AND):
mask = mask_df.all()
0 True
1 False
2 False
dtype: bool
Note: logical OR can be achieved with any
instead of all
Now we can use the mask
to filter the indexes:
data.index[mask].tolist()
[0]
Together find
can look something like:
def find(data, columns, values):
# Create a DataFrame of boolean conditions
# Take logical AND using DataFrame.all
mask = pd.DataFrame([
data[col] == val for col, val in zip(columns, values)
]).all()
# Filter and convert to list
return data.index[mask].tolist()
Sample Program:
import pandas as pd
def find(data, columns, values):
mask = pd.DataFrame([
data[col] == val for col, val in zip(columns, values)
]).all()
return data.index[mask].tolist()
print(
find(
data=pd.DataFrame({
'NO_DOC': ['A1', 'B1', 'C1'],
'NO_SEQ': [1, 2, 3],
'DESC': ['A', 'B', 'C']
}),
columns=["NO_DOC", "NO_SEQ"],
values=["A1", 1]
)
)