Conditionally filter on multiple columns or else return entire dataframe


I have a CSV with a few people. I would like to build a function that either filters on all of the parameters that are given, or returns the entire dataframe as-is if no arguments are passed.

So given this as the csv:

FirstName    LastName    City
Matt         Fred        Austin
Jim          Jack        NYC
Larry        Bob         Houston
Matt         Spencer     NYC

If I were to call my function find, here is what I would expect to see depending on which arguments are passed:

find(first="Matt", last="Fred")
Output: Matt   Fred Austin
find()
Output: Full Dataframe
find(last="Spencer")
Output: Matt Spencer NYC
find(city="NYC")
Output: All people living in NYC in the dataframe

This is what I have tried:

import pandas as pd

def find(first=None, last=None, city=None):
    file = pd.read_csv(list)
    searched = file.loc[(file["FirstName"] == first) & (file["LastName"] == last) & (file["City"] == city)]
    return searched

This returns an empty dataframe if I just pass in a first name and nothing else.
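
I suspect the problem is that the unspecified parameters stay None, so a condition like file["LastName"] == None is False for every row, and chaining all three conditions with & then matches nothing. A minimal reproduction of the empty mask (the dataframe below is just the sample data rebuilt inline for illustration):

import pandas as pd

df = pd.DataFrame({
    "FirstName": ["Matt", "Jim", "Larry", "Matt"],
    "LastName": ["Fred", "Jack", "Bob", "Spencer"],
    "City": ["Austin", "NYC", "Houston", "NYC"],
})

# Only a first name is supplied; last and city default to None
mask = (df["FirstName"] == "Matt") & (df["LastName"] == None) & (df["City"] == None)
print(mask.any())  # False, so .loc[mask] returns an empty dataframe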

CodePudding user response:

You could do something like this:

import numpy as np

# assumes `df` already holds the dataframe read from the CSV
def find(**kwargs):
    # every keyword must name an existing column
    assert np.isin(list(kwargs.keys()), df.columns).all()
    # keep the rows where all of the given columns equal the given values
    return df.loc[df[list(kwargs.keys())].eq(list(kwargs.values())).all(axis=1)]

search = find(FirstName="Matt", LastName="Fred")
print(search)

#  FirstName LastName    City
#0      Matt     Fred  Austin

find(LastName="Spencer")

#   FirstName     LastName   City
#3       Matt      Spencer    NYC
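
To see how the mask is built: selecting the keyword columns and calling .eq(values) compares each column against its value position by position, and .all(axis=1) keeps only the rows where every comparison is True. A sketch of the intermediate results, assuming df holds the sample data:

df[["FirstName", "LastName"]].eq(["Matt", "Fred"])

#   FirstName  LastName
#0       True      True
#1      False     False
#2      False     False
#3       True     False

df[["FirstName", "LastName"]].eq(["Matt", "Fred"]).all(axis=1)

#0     True
#1    False
#2    False
#3    False
#dtype: bool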

If you want to use "first", "last" and "city" as the keyword names:

def find(**kwargs):
    # map the lowercase keyword names onto the real column names
    df_index = df.rename(columns={"FirstName": "first",
                                  "LastName": "last",
                                  "City": "city"})
    assert np.isin(list(kwargs.keys()), df_index.columns).all()

    # build the mask on the renamed copy, but return rows from the original df
    return df.loc[df_index[list(kwargs.keys())]
                    .eq(list(kwargs.values())).all(axis=1)]
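
Usage would then look like this (assuming df holds the sample data from the question):

print(find(last="Spencer"))

#  FirstName LastName City
#3      Matt  Spencer  NYC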

CodePudding user response:

Another approach to filtering on the columns:

import os
import pandas as pd

csv_path = os.path.abspath('test.csv')
df = pd.read_table(csv_path, sep=r'\s+')

def find_by_attrs(df, **attrs):
    # reject any keyword that is not an actual column name
    if attrs.keys() - df.columns:
        raise KeyError('Improper column name(s)')
    # keep the rows where every given column equals its given value
    return df[df[attrs.keys()].eq(attrs.values()).all(1)]


print(find_by_attrs(df, City="NYC"))

The output:

  FirstName LastName City
1       Jim     Jack  NYC
3      Matt  Spencer  NYC
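
And if a keyword does not match any column, the check raises (hypothetical call for illustration):

find_by_attrs(df, Address="NYC")

# KeyError: 'Improper column name(s)'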