I have a CSV with a few people. I would like to build a function that filters on whichever parameters are given, or returns the entire dataframe as-is if no arguments are passed.
So, given this as the CSV:
FirstName  LastName  City
Matt       Fred      Austin
Jim        Jack      NYC
Larry      Bob       Houston
Matt       Spencer   NYC
if I were to call my function find, here is what I would expect to see depending on what I pass as arguments:
find(first="Matt", last="Fred")
Output: Matt Fred Austin
find()
Output: Full Dataframe
find(last="Spencer")
Output: Matt Spencer NYC
find(city="NYC")
Output: all people in the dataframe living in NYC
This is what I have tried:
import pandas as pd

def find(first=None, last=None, city=None):
    file = pd.read_csv(list)  # "list" holds the path to the CSV
    searched = file.loc[(file["FirstName"] == first) & (file["LastName"] == last) & (file["City"] == city)]
    return searched
This returns an empty dataframe if I just pass in a first name and nothing else.
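For reference, a minimal reproduction, continuing from the snippet above (the filename is hypothetical and stands in for the sample CSV):

list = "people.csv"  # hypothetical path to the sample CSV shown above

print(find(first="Matt"))
# Empty DataFrame
# Columns: [FirstName, LastName, City]
# Index: []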
CodePudding user response:
You could do something like this:
import numpy as np

def find(**kwargs):
    # every keyword must match an existing column name
    assert np.isin(list(kwargs.keys()), df.columns).all()
    # keep the rows where each given column equals its given value
    return df.loc[df[list(kwargs.keys())].eq(list(kwargs.values())).all(axis=1)]
search = find(FirstName="Matt", LastName="Fred")
print(search)
#   FirstName LastName    City
# 0      Matt     Fred  Austin

find(LastName="Spencer")
#   FirstName LastName City
# 3      Matt  Spencer  NYC
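Note that find() with no keyword arguments should return the full dataframe, because .all(axis=1) over an empty column selection is True for every row; a quick check, assuming df holds the sample data above:

print(find())
#   FirstName LastName     City
# 0      Matt     Fred   Austin
# 1       Jim     Jack      NYC
# 2     Larry      Bob  Houston
# 3      Matt  Spencer      NYC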
If you want to use "first", "last" and "city" as the keyword names:
def find(**kwargs):
    # expose the columns under the short names "first", "last" and "city"
    df_index = df.rename(columns={"FirstName": "first",
                                  "LastName": "last",
                                  "City": "city"})
    assert np.isin(list(kwargs.keys()), df_index.columns).all()
    return df.loc[df_index[list(kwargs.keys())]
                  .eq(list(kwargs.values())).all(axis=1)]
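Usage then looks like this (output again assuming the sample data above):

find(first="Matt", city="Austin")
#   FirstName LastName    City
# 0      Matt     Fred  Austin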
CodePudding user response:
Another approach to filtering on the columns:
import os
import pandas as pd

csv_path = os.path.abspath('test.csv')
df = pd.read_table(csv_path, sep=r'\s+')  # the sample data is whitespace-separated

def find_by_attrs(df, **attrs):
    # reject any keyword that is not an actual column name
    if attrs.keys() - df.columns:
        raise KeyError('Improper column name(s)')
    return df[df[attrs.keys()].eq(attrs.values()).all(1)]
print(find_by_attrs(df, City="NYC"))
The output:
  FirstName LastName City
1       Jim     Jack  NYC
3      Matt  Spencer  NYC
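Multiple keywords combine the same way, and a keyword that is not a column name raises the KeyError (again assuming the sample data above):

print(find_by_attrs(df, FirstName="Matt", LastName="Fred"))
#   FirstName LastName    City
# 0      Matt     Fred  Austin

find_by_attrs(df, address="NYC")
# KeyError: 'Improper column name(s)'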