data = {'a':['a','b','c','d','e','f','g'],
'b':['Y','N','Y','Y','Y','N','Y'],
'c':['Qualified','Unqualified','Qualified','Unqualified','Qualified','Unqualified','Qualified']}
df = pd.DataFrame(data)
df_para = {'Y/N':['','y','n'],
'Q/U':['unqualified','','unqualified']}
df_para = pd.DataFrame(df_para)
I would like to filter the df using df_para, My code is:
df_output = pd.DataFrame()
for para in df_para.iterrows():
df_result = df
# filter Q/U
if '' not in df_para['Q/U']:
mask_qu = df_result['c'].str.lower().isin(df_para['Q/U'])
df_result = df_result.loc[(mask_qu)]
# filter Y/N
if '' not in df_para['Y/N']:
mask_yn = df_result['b'].str.lower().isin(df_para['Y/N'])
df_result = df_result.loc[(mask_yn)]
df_output = df_output.append(df_result)
If I use my code, it returns all rows within df three times. However, the df_output should be like:
a b c
1 b N Unqualified
3 d Y Unqualified
5 f N Unqualified
0 a Y Qualified
2 c Y Qualified
3 d Y Unqualified
4 e Y Qualified
6 g Y Qualified
1 b N Unqualified
5 f N Unqualified
How could I fix it?
CodePudding user response:
Reason is in operator test indices:
Using the Python in operator on a Series tests for membership in the index, not membership among the values.
If this behavior is surprising, keep in mind that using in on a Python dictionary tests keys, not values, and Series are dict-like.
#pairs for filtering
cols = [('c','Q/U'), ('b','Y/N')]
#for each unique value in df_para filter rows in list
dfs = [df[df[a].str.lower().eq(x)] for a, b in cols for x in df_para[b].unique()]
#join subDataFrames
df_out = pd.concat(dfs)
print (df_out)
a b c
1 b N Unqualified
3 d Y Unqualified
5 f N Unqualified
0 a Y Qualified
2 c Y Qualified
3 d Y Unqualified
4 e Y Qualified
6 g Y Qualified
1 b N Unqualified
5 f N Unqualified
CodePudding user response:
try this:
import pandas as pd
import numpy as np
data = {'a':['a','b','c','d','e','f','g'],
'b':['Y','N','Y','Y','Y','N','Y'],
'c':['Qualified','Unqualified','Qualified','Unqualified','Qualified','Qualified','Unqualified']}
df = pd.DataFrame(data)
df_result = df[df["c"] == "Unqualified"]
print(df_result)
print(type(df_result))